Managing Linux Software RAID (Part I)

Redundant Array of Independent Disks (RAID) is a storage virtualization technology that first gained traction in the late 1980s. It combines multiple physical disk drives into one logical unit to improve performance, provide redundancy, or both. How data is distributed across the drives depends on the RAID level. Though RAID is not a substitute for backups, redundant RAID levels can keep your system running through common hardware problems such as disk failures or unrecoverable (sector) read errors.

RAID can be implemented in several ways: as Software RAID (e.g. Linux Software RAID, managed with mdadm), as Hardware RAID (a dedicated controller card), or as FakeRAID (also called BIOS or Onboard RAID). This article focuses on how to use Linux Software RAID and its management tool, mdadm.

Linux Software RAID presents each array as a block device, and it can be built on top of most other block devices: SATA, USB, IDE, SCSI, or a combination of these. The RAID layer does not touch the file system layer at all; you can place a file system on a RAID device just as you would on any other block device.

How Software RAID devices work

Software RAID devices should be seen as ordinary disks or disk partitions. They can be “built” from a number of other block devices.

An array can also include spare disks (hot spares), which do not take part in the RAID set until one of the active disks fails. When a device failure is detected, that device is marked as ‘faulty’ and the array immediately starts reconstruction on the first available hot spare. Because the spare takes the failed disk's place, your system can keep running even with a faulty disk in the array. If no spare disk is available, the array runs in ‘degraded’ mode, without full redundancy, until the failed disk is replaced.

Note that faulty disks still appear as members of the array; the RAID layer simply stops reading from and writing to them. If you need to remove a device from an array, remember to mark it as ‘faulty’ first.
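The spare behaviour described above can be sketched with mdadm as follows. This is a minimal sketch, assuming example device names (/dev/sda1 and /dev/sdb1 as active members, /dev/sdc1 as the spare) and a hypothetical run helper that only prints each command unless you set DRY_RUN=0. The function names are illustrative and not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): prints each command by default;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create a two-member RAID 1 mirror plus one hot spare.
# --raid-devices counts only active members; --spare-devices counts spares.
create_mirror_with_spare() {  # $1 = array, $2 $3 = active members, $4 = spare
    run mdadm --create "$1" --level=1 --raid-devices=2 --spare-devices=1 \
        "$2" "$3" "$4"
}

# Mark an active member as faulty; the kernel then rebuilds onto the spare.
fail_member() {  # $1 = array, $2 = member
    run mdadm "$1" --fail "$2"
}
```

For example, create_mirror_with_spare /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 prints the corresponding mdadm --create command. A device added with mdadm /dev/md0 --add to an array that already has all its active members also becomes a spare.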

Creating a SATA drive-based RAID device

You can use whole drives as members of a RAID array, but avoid this if you want to make the drives bootable. Instead, create a partition on each drive (using fdisk or Parted) and set its partition type to ‘fd’ (Linux raid autodetect).

After creating a partition on each drive (/dev/sda1 and /dev/sdb1 in this example), create the array:

mdadm --create /dev/md0 -n 2 --level=1 /dev/sda1 /dev/sdb1

Where:

  • /dev/md0 – the device path of your new array
  • -n 2 (long form --raid-devices=2) – the number of active devices in the array. This number plus any spare devices (-x) must equal the number of component devices listed; to create an array in degraded mode, write the word missing in place of a device
  • --level=<RAID level> – can be any of the following:

o   linear
o   raid0, 0, or stripe
o   raid1, 1, or mirror
o   raid4 or 4
o   raid5 or 5
o   raid6 or 6
o   raid10 or 10
o   multipath or mp
o   faulty
o   container

  • /dev/sda1, /dev/sdb1 – each component device’s path
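The preparation and creation steps above can be sketched as a short script. This is a minimal sketch, assuming MBR-partitioned drives whose first partition already exists and a util-linux sfdisk recent enough to support --part-type (fdisk's interactive t command or Parted's set 1 raid on does the same job). It uses the same hypothetical dry-run run helper, and the long options shown are equivalent to the -n 2 --level=1 form used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): prints each command by default;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set the MBR type of partition 1 on a whole disk to fd (Linux raid autodetect).
tag_raid_partition() {  # $1 = whole disk, e.g. /dev/sda
    run sfdisk --part-type "$1" 1 fd
}

# Equivalent to: mdadm --create /dev/md0 -n 2 --level=1 /dev/sda1 /dev/sdb1
create_raid1() {  # $1 = array, $2 $3 = member partitions
    run mdadm --create "$1" --level=1 --raid-devices=2 "$2" "$3"
}

# Degraded mirror: the word "missing" leaves the second slot empty for now.
create_degraded_raid1() {  # $1 = array, $2 = the only member
    run mdadm --create "$1" --level=1 --raid-devices=2 "$2" missing
}
```

Run tag_raid_partition once per drive, then create_raid1 /dev/md0 /dev/sda1 /dev/sdb1. A degraded mirror can be completed later by adding the second partition with mdadm /dev/md0 -a /dev/sdb1.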

IMPORTANT NOTE: If you want to boot from a RAID device, install the bootloader code at the start of the physical drive that underlies the RAID device. For example, if you are using /dev/sda1 as a member of the array /dev/md0 and that drive (your primary HDD/SSD) is set as the BIOS boot device, install the bootloader to /dev/sda (the whole drive), not to /dev/sda1. One way to do this is to run grub-install /dev/sda.
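If both drives of a RAID 1 mirror should be able to boot the system, a common approach is to install the bootloader on each underlying drive. The sketch below assumes BIOS (legacy) boot with GRUB 2 and the grub-install command named in the note; on RHEL/CentOS 7 and later the command is called grub2-install. The drive names, the function name, and the dry-run run helper are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Install GRUB to the start of each whole drive (not to a partition or md device).
install_bootloader() {  # $@ = whole drives, e.g. /dev/sda /dev/sdb
    for disk in "$@"; do
        run grub-install "$disk" || return 1
    done
}
```

For example, install_bootloader /dev/sda /dev/sdb covers both members of a mirror built from /dev/sda1 and /dev/sdb1.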

After you create a RAID device, you can use it as you would a regular local drive. For example, you can use it as a physical volume for the Linux Logical Volume Manager (LVM) or put a file system on it.
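Both uses can be sketched as follows, assuming /dev/md0 from the example above, an ext4 file system, and hypothetical names for the mount point (/mnt/raid) and the LVM volume group (vg_raid). Pick one of the two functions for a given array; as before, the hypothetical run helper only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Option 1: put a file system directly on the array and mount it.
use_as_filesystem() {  # $1 = array, $2 = mount point (example name)
    run mkfs.ext4 "$1" || return 1
    run mkdir -p "$2" || return 1
    run mount "$1" "$2"
}

# Option 2: use the array as an LVM physical volume in a new volume group.
use_as_lvm_pv() {  # $1 = array, $2 = volume group name (example name)
    run pvcreate "$1" || return 1
    run vgcreate "$2" "$1"
}
```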

Although RAID metadata is stored on each component drive, you should still save your RAID configuration to mdadm's configuration file. The saved configuration helps with monitoring and keeps md device names consistent; without it, arrays may come up with names such as md126 and md127. To save the configuration, run:

mdadm --detail --scan >> /etc/mdadm.conf

On Ubuntu (and Debian), the configuration file is /etc/mdadm/mdadm.conf instead.
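Because the path differs between distributions, you can wrap the command in a small script. This sketch relies on a simple heuristic that is an assumption, not a documented rule: if the /etc/mdadm directory exists, the Debian/Ubuntu layout is in use. In that case it also runs update-initramfs -u, since the Debian/Ubuntu initramfs carries its own copy of mdadm.conf. The ROOT prefix is a hypothetical testing hook; leave it unset on a real system.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Heuristic (assumption): Debian/Ubuntu keep mdadm.conf under /etc/mdadm/.
mdadm_conf_path() {
    if [ -d "${ROOT:-}/etc/mdadm" ]; then
        printf '%s\n' "${ROOT:-}/etc/mdadm/mdadm.conf"
    else
        printf '%s\n' "${ROOT:-}/etc/mdadm.conf"
    fi
}

# Append ARRAY lines for all running arrays to the configuration file.
save_config() {
    conf=$(mdadm_conf_path)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'mdadm --detail --scan >> %s\n' "$conf"
    else
        mdadm --detail --scan >> "$conf" || return 1
    fi
    # The Debian/Ubuntu initramfs holds a copy of mdadm.conf; refresh it.
    case "$conf" in
        */etc/mdadm/mdadm.conf) run update-initramfs -u ;;
    esac
}
```

Because >> appends, running save_config twice leaves duplicate ARRAY lines, so check the file before running it again.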

Monitoring a RAID device

You can proactively monitor the health of a RAID device using mdadm's --monitor mode. If mdadm detects a problem with an array, it can take actions you specify on the command line: for example, send an email, run a specified program, or log a message to the system log. See the Monitor section of man mdadm for the full list of options.

Most Linux distributions provide wrappers for mdadm --monitor. CentOS/RHEL, for example, provides an mdmonitor service, which you can set to start automatically at boot by running chkconfig mdmonitor on.
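A manual mdadm --monitor invocation, and the systemd counterpart of chkconfig mdmonitor on, can be sketched as below. The alert address is a hypothetical argument; with --scan, mdadm needs a mail address or alert program either on the command line or in mdadm.conf (MAILADDR or PROGRAM). The systemctl line applies to systemd-based releases such as CentOS/RHEL 7 and later, and the run helper is the same hypothetical dry-run wrapper.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Monitor every array found by --scan in the background, mailing alerts.
start_monitor() {  # $1 = alert email address
    run mdadm --monitor --scan --daemonise --mail="$1"
}

# Check once and send a TestMessage alert per array to verify delivery.
test_alerts() {  # $1 = alert email address
    run mdadm --monitor --scan --oneshot --test --mail="$1"
}

# systemd counterpart of "chkconfig mdmonitor on" (also starts it now).
enable_mdmonitor() {
    run systemctl enable --now mdmonitor
}
```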

Another way to check device health is to look into the /proc/mdstat text file to see what arrays your kernel recognizes and to see their respective states.
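To turn that check into something a script or cron job can use, you can scan /proc/mdstat for arrays whose member map (for example [U_]) contains an underscore, which marks a missing or failed member. This minimal sketch assumes the usual /proc/mdstat layout, in which the status line of each redundant array ends with that map; RAID 0 and linear arrays have no map and are skipped. The function name is hypothetical.

```shell
#!/bin/sh
# Print the names of md arrays whose member map (e.g. [U_]) shows a gap.
# Reads /proc/mdstat by default; pass another file path to test the parser.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sdb1[1](F) sda1[0]".
        /^md[^ ]* : / { name = $1; next }
        # Status line ending in the member map, e.g. "... [2/1] [U_]".
        name != "" && $NF ~ /^\[[U_]+]$/ {
            if ($NF ~ /_/) print name   # "_" marks a missing or failed member
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means no redundant array is degraded, so a check can be as simple as [ -z "$(degraded_arrays)" ] || logger "degraded md array".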

Replacing the component drive of a RAID device

To replace a drive (e.g. /dev/sdb1) in an active RAID array, the drive first needs to be marked as ‘faulty’ by the kernel, which shows up as (F) next to the device in /proc/mdstat. If it is not yet marked as faulty, mark it yourself by running mdadm /dev/md0 -f /dev/sdb1. Then remove it from the array with mdadm /dev/md0 -r /dev/sdb1.

After the drive is removed, you can add another drive as a replacement using mdadm /dev/md0 -a /dev/sdc1. You can also perform all of these operations with a single command, e.g. mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1 -a /dev/sdc1.

Afterwards, check the contents of /proc/mdstat to confirm that everything went as expected. If mdadm produced no error output during the replacement, the operation succeeded, and /proc/mdstat shows the array recovering onto the new drive until the rebuild completes.
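The replacement steps can be collected into one function. This sketch assumes the example names used above (/dev/md0, /dev/sdb1, /dev/sdc1) and the hypothetical dry-run run helper. It issues fail, remove, and add as separate commands rather than the single combined command, so it stops at the first step that reports an error, and it finishes by printing /proc/mdstat.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

replace_member() {  # $1 = array, $2 = old member, $3 = new member
    run mdadm "$1" --fail "$2" || return 1     # mark the old member faulty
    run mdadm "$1" --remove "$2" || return 1   # detach it from the array
    run mdadm "$1" --add "$3" || return 1      # recovery onto the new member starts
    run cat /proc/mdstat                       # shows rebuild progress
}
```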

Destroying a RAID device

Sometimes you may need to destroy a RAID device completely so that its underlying devices can be reused by the system. For example, you might attach a spare HDD that still carries RAID metadata from an earlier array, and the kernel recognizes and assembles that array. In this case the array has only one member, and you cannot simply remove its single drive.

To destroy the RAID device, first stop the array with mdadm -S /dev/md0. Then remove the RAID metadata from the member partition with mdadm --zero-superblock /dev/sda1. Make sure you zero the superblock of the partition that belongs to the array (/dev/sda1), not that of the whole drive (/dev/sda).
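The two steps can be sketched as a function that stops the array and then clears the superblock of each member you name. This is a minimal sketch with example device names and the hypothetical dry-run run helper; pass the member partitions (such as /dev/sda1), not the whole drives, when the partitions are what belong to the array. Running mdadm --detail /dev/md0 first shows which devices are members.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

destroy_array() {  # $1 = array, remaining arguments = member partitions
    md=$1
    shift
    run mdadm --stop "$md" || return 1             # same as mdadm -S
    for member in "$@"; do
        run mdadm --zero-superblock "$member" || return 1
    done
}
```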

References:

See also Managing Linux Software RAID (Part II): Recovery After a Disk Failure.
