What is RAID?
Originally, as envisaged in 1987 by Patterson, Gibson and Katz at the University of California, Berkeley, the acronym RAID stood for a "Redundant Array of Inexpensive Disks". In brief, a number of smaller, cheaper disks could be used in place of a single, much more expensive large hard disk, or even to create a disk larger than any then available.
They went a stage further and postulated a variety of options that would not only provide a big disk at a lower cost, but could also improve performance or increase reliability at the same time. The options for improved reliability were needed partly because using multiple disks reduces the Mean Time Between Failures (MTBF): divide the MTBF of a single drive by the number of drives in the array and, in theory, the array will fail that much sooner than a single disk.
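The MTBF arithmetic above can be sketched in a few lines. This is a simplified model with a hypothetical manufacturer figure, not real drive data:

```python
# Simplified model: with no redundancy, the MTBF of the array is roughly
# the single-drive MTBF divided by the number of drives in the array.
drive_mtbf_hours = 500_000   # hypothetical manufacturer figure
drives_in_array = 5

array_mtbf_hours = drive_mtbf_hours / drives_in_array
print(array_mtbf_hours)  # 100000.0
```

A five-drive array built from these hypothetical drives would thus be expected to suffer a failure roughly five times as often as one drive on its own, which is why the redundant RAID levels matter.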
Today RAID is usually described as a "Redundant Array of Independent Disks"; technology has moved on, and even the most costly disks are no longer particularly expensive.
Five levels of RAID were originally defined, some geared towards performance, others towards improved fault tolerance, though the first of these (RAID 0) has no "redundancy" or "fault tolerance", so arguably it is not truly RAID.
Whilst some RAID levels add a high degree of data security, RAID data recovery can still be needed: multiple disk failures can occur, and the levels of protection are no guard against file system corruption or accidental file deletion.
RAID levels 0 and 2-5 use a technique known as "data striping". Rather than filling one disk and then moving on to the next (spanning), a unit of data transfer known as a stripe is defined. This could be as small as 512 bytes multiplied by the number of disks, or as large as several megabytes, but it is usually in the 64KB to 256KB range. A "stripe" of data is written as a sequence of equal-sized sections, one to each drive in turn: e.g. 64KB is written to disk 0, then 64KB to disk 1, and so on until 64KB has been written to the final disk in the RAID set. These 64KB sections, one on each of the disks in the set, together form one "stripe".
This technique balances load across the disks and provides a performance gain, especially when writing large volumes of sequential data: write to one drive, then move on to the next, and by the time you get back to the first drive it has had more than enough time to commit its data and be ready for more.
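The round-robin layout described above can be sketched as a simple address calculation. This is an illustrative sketch of a plain RAID 0 layout (the disk count, 64KB stripe unit and the `locate` helper are assumptions for the example, not any particular controller's scheme):

```python
def locate(logical_byte, stripe_unit=64 * 1024, num_disks=4):
    """Map a logical byte offset to (disk index, offset on that disk)
    for a simple RAID 0 round-robin striping layout."""
    unit = logical_byte // stripe_unit       # which stripe unit, counting from 0
    disk = unit % num_disks                  # stripe units rotate across the disks
    stripe_row = unit // num_disks           # which stripe (row) this unit sits in
    offset_on_disk = stripe_row * stripe_unit + logical_byte % stripe_unit
    return disk, offset_on_disk

# The first 64KB goes to disk 0, the next 64KB to disk 1, and so on;
# after the fourth unit the layout wraps back to disk 0, one stripe further in.
print(locate(0))            # (0, 0)
print(locate(64 * 1024))    # (1, 0)
print(locate(256 * 1024))   # (0, 65536)
```

Sequential writes therefore land on different spindles in turn, which is where the load balancing comes from.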
Hardware or Software RAID?
RAID describes how data is stored, not how disks are managed. RAID capability can be provided either by a controller that presents the host system with a single data space (hardware RAID), or by the operating system (software RAID): the O/S provides an abstraction layer whereby it sees several drives but presents a single volume to applications.
Hardware RAID has advantages in that the RAID can be transferred to another system more easily and the RAID operation does not require any OS resources.
Implementations of hardware RAID do differ. Some use a RAID controller in the host system: it has several disks attached to it but presents one disk to the OS. Other implementations mount the RAID controller in a case along with the disks; this makes the RAID "box" appear to be a single disk, and the "box" is plugged into a standard SCSI controller.
This latter implementation is highly independent: the RAID "box" is, to the host, simply a disk, and there are systems that use lower-cost IDE and SATA disks within the "box" but present them as a single SCSI disk to the host's SCSI controller.
Parity or Error Correction Code (ECC)
RAID uses parity to enable it to rebuild data from a failed disk drive. Think of a basic arithmetic equation. Take the sum 5 + 3 = 8: nice and simple, and you can remove any number from the equation and work out what it should be. Take 5 + n = 8, n + 3 = 8 or 5 + 3 = n, and you can work out what n should be. With RAID the process differs in that the values are not added together: that could not work, as for each sequence of bytes being worked upon there is only one byte available to hold the answer. So RAID instead XORs the value of each byte and stores the result in the parity (ECC) byte.
Parity calculated from the data being written is stored alongside it, so if one disk fails the data that was on it can be re-calculated. Later RAID levels add extra parity, because a single set of parity can only cope with the loss of one drive: take the sum n + x = 8 and you cannot work out what n and x used to be.
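The XOR rebuild described above can be demonstrated in a few lines. A minimal sketch, with three tiny two-byte "drives" standing in for real stripe units (the values are invented for the example):

```python
# Three data "drives" each hold one stripe unit; the parity unit is the
# byte-wise XOR of all of them, as on a RAID level with parity.
data = [b"\x05\x10", b"\x03\x20", b"\x06\x0f"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*data))

# Simulate losing one drive: XORing the survivors with the parity
# recreates exactly the bytes that were on it.
lost = data.pop(1)
rebuilt = bytes(a ^ b ^ p for a, b, p in zip(data[0], data[1], parity))
print(rebuilt == lost)  # True
```

This also shows why one parity unit only protects against one failure: remove two of the drives and, like n + x = 8, the XOR equation no longer has a unique solution.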
Different RAID standards stored this parity data either on a dedicated drive, or in stripes interspersed within the data.
A "Hot Spare" drive is a drive that is connected in with the RAID but is not in use as part of that RAID. Its purpose is to brought into play automatically by the RAID in the even of a single disk failure, removing the human reaction time to deal with the problem.
Whilst many RAID levels provide additional protection against hardware failures, a RAID system can still fail with the result that vital data becomes inaccessible. RAID 5, for example, can survive one disk failing, but not two. The RAID data recovery process differs for each RAID level: a RAID 5 data recovery procedure can use the error correction information encoded across the set, whereas for a mirrored set no parity rebuilding is possible, but data can be drawn from each of two failing disk drives to reconstruct the original data.
Last Updated (Thursday, 11 June 2009 12:09)