Tuesday, May 25, 2010
syntax
In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form.[1] Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).
The lexical grammar of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens; the process of assigning meaning to those token sequences is part of semantics.
The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, a term it shares with syntactic analysis in linguistics. Tools such as Yacc (yet another compiler compiler) automatically generate parsers from a specification of a language grammar written in Backus-Naur form.
Syntax definition
[Figure: parse tree of Python code, with inset tokenization]
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category.[1] Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.
Below is a simple grammar, based on Lisp, which defines productions for the syntactic categories expression, atom, number, symbol, and list:
expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'
This grammar specifies the following:
* an expression is either an atom or a list;
* an atom is either a number or a symbol;
* a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
* a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
* a list is a matched pair of parentheses, with zero or more expressions inside it.
Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.
The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'
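To make this concrete, here is a minimal sketch of a recursive-descent recognizer for the toy grammar, written in C (the language of the fragment later in this post). The function names, the treatment of parentheses as symbol terminators, and the test strings are this sketch's own assumptions, not part of the grammar above.

#include <ctype.h>
#include <stdio.h>

/* Minimal recursive-descent recognizer for the toy grammar above.
   Each function consumes characters from *s and returns 1 on success. */
static const char *skip_ws(const char *s) {
    while (isspace((unsigned char)*s)) s++;
    return s;
}

static int parse_expression(const char **s); /* forward declaration */

static int parse_number(const char **s) {      /* number ::= [+-]?['0'-'9']+ */
    const char *p = *s;
    if (*p == '+' || *p == '-') p++;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    *s = p; return 1;
}

static int parse_symbol(const char **s) {      /* a letter, then non-space characters */
    const char *p = *s;
    if (!isalpha((unsigned char)*p)) return 0;
    while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')') p++;
    *s = p; return 1;
}

static int parse_list(const char **s) {        /* list ::= '(' expression* ')' */
    const char *p = skip_ws(*s);
    if (*p != '(') return 0;
    p++;
    for (;;) {
        p = skip_ws(p);
        if (*p == ')') { *s = p + 1; return 1; }
        if (*p == '\0' || !parse_expression(&p)) return 0;
    }
}

static int parse_expression(const char **s) {  /* expression ::= atom | list */
    const char *p = skip_ws(*s);
    int ok = parse_list(&p) || parse_number(&p) || parse_symbol(&p);
    if (ok) *s = p;
    return ok;
}

int main(void) {
    const char *tests[] = { "12345", "()", "(a b c232 (1))" };
    for (int i = 0; i < 3; i++) {
        const char *p = tests[i];
        int ok = parse_expression(&p) && *skip_ws(p) == '\0';
        printf("%-15s %s\n", tests[i], ok ? "well-formed" : "ill-formed");
    }
    return 0;
}

Note that parse_expression tries each alternative in turn; because the grammar's alternatives start with distinct characters (a parenthesis, a sign or digit, a letter), one character of lookahead suffices, which is what makes this grammar easy to parse top-down.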
The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[2] However, there are exceptions. In some languages like Perl and Lisp the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity of the remaining code.[3] Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast C macros are merely string replacements, and do not require code execution.[4][5]
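To illustrate the last point, here is a short sketch of C's purely textual macro expansion; the macro name SQUARE is this sketch's own example, not from the article.

#include <stdio.h>

/* A C macro is expanded by the preprocessor as pure text substitution,
   before parsing proper; no user code runs during compilation. */
#define SQUARE(x) ((x) * (x))

int main(void) {
    /* The preprocessor rewrites SQUARE(3 + 1) as ((3 + 1) * (3 + 1));
       only then does the compiler parse the result. */
    printf("%d\n", SQUARE(3 + 1)); /* prints 16 */
    return 0;
}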
Syntax versus semantics
The syntax of a language describes the form of a valid program, but provides no information about the program's meaning or the results of executing it. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Not all syntactically correct programs are semantically correct: many are nonetheless ill-formed under the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.
Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false:
* "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
* "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.
The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):
#include <math.h>
typedef struct { double real, im; } complex; /* illustrative definition */
complex *p = NULL;
double abs_p = sqrt(p->real * p->real + p->im * p->im); /* dereferences a null pointer */
Wednesday, May 19, 2010
RAID
From Wikipedia, the free encyclopedia
RAID, an acronym for redundant array of independent disks (originally redundant array of inexpensive disks), is a technology that provides high levels of storage reliability from low-cost, less reliable PC-class disk-drive components by arranging the devices into redundant arrays. The concept was first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 as redundant array of inexpensive disks.[1] Marketers representing industry RAID manufacturers later reinvented the term as redundant array of independent disks, to dissociate the low-cost expectation from RAID technology.[2]
RAID is now used as an umbrella term for computer data storage schemes that divide and replicate data among multiple hard disk drives. The different schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID designs involve two key goals: increased data reliability and/or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array.[3] The array distributes data across multiple disks, but the computer user and operating system see the array as one single disk. RAID can be set up to serve several different purposes.
[*] Principles
RAID combines two or more physical hard disks into a single logical unit using special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system would be unaware of the technical workings. For example, if one were to configure a hardware-based RAID-5 volume using three 250 GB hard drives (two drives for data, and one for parity), the operating system would be presented with a single 500 GB volume. Software solutions are typically implemented in the operating system and would present the RAID volume as a single drive to applications running within the operating system.
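As a quick sanity check on these numbers, here is a tiny sketch of the usable-capacity arithmetic; the three-drive, 250 GB configuration mirrors the example above, while the RAID 0 and RAID 1 lines are added for comparison.

#include <stdio.h>

/* Usable capacity for n identical drives of `size` GB each. */
int main(void) {
    int n = 3;
    double size = 250.0;                          /* GB per drive */
    printf("RAID 0: %.0f GB\n", n * size);        /* striping: all capacity, no redundancy */
    printf("RAID 1: %.0f GB\n", size);            /* mirroring: one drive's worth */
    printf("RAID 5: %.0f GB\n", (n - 1) * size);  /* one drive's worth goes to parity */
    return 0;
}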
There are three key concepts in RAID: mirroring, the writing of identical data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant parity data is stored to allow problems to be detected and possibly repaired (known as fault tolerance). Different RAID schemes use one or more of these techniques, depending on the system requirements. The purpose of using RAID is to improve reliability and availability of data, ensuring that important data is not harmed in case of hardware failure, and/or to increase the speed of file input/output.
Each RAID scheme affects reliability and performance in different ways. Every additional disk included in an array increases the likelihood that one will fail, but by using error checking and/or mirroring, the array as a whole can be made more reliable, able to survive and recover from a failure. Basic mirroring can speed up reading, since a system can read different data from multiple disks at the same time, but it may be slow for writing if the configuration requires that all disks confirm that the data is correctly written. Striping, often used to increase performance, writes successive segments of data to different disks, allowing the data to be read back from multiple disks in parallel faster than a single disk could deliver it. Error checking typically slows performance, as data must be read from multiple places and compared. The design of any RAID scheme is therefore a compromise in one or more respects, and understanding the requirements of a system is important. Modern disk arrays typically provide the facility to select an appropriate RAID configuration.
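A sketch of the striping idea in code, assuming a hypothetical round-robin mapping of logical blocks to disks (the variable names are this sketch's own):

#include <stdio.h>

/* Map a logical block number to a (disk, offset) pair in an
   n-disk RAID 0 stripe: blocks are dealt out round-robin. */
int main(void) {
    int n_disks = 3;
    for (long block = 0; block < 6; block++) {
        int disk = (int)(block % n_disks);  /* which disk receives the block */
        long offset = block / n_disks;      /* position within that disk */
        printf("logical block %ld -> disk %d, offset %ld\n", block, disk, offset);
    }
    return 0;
}

Consecutive blocks land on different disks, so a large read can be serviced by all disks in parallel, which is the performance effect described above.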
[*] Organization
[*] RAID 0
[*] RAID 1
[*] RAID 3, RAID 4
[*] RAID 5
[*] RAID 6
[*] RAID 10
[*] RAID 01
[*] Hardware/Software
RAID can involve significant computation when reading and writing information. With traditional "real" RAID hardware, a separate controller does this computation. In other cases the operating system or simpler and less expensive controllers require the host computer's processor to do the computing, which reduces the computer's performance on processor-intensive tasks (see Operating system based ("software RAID") and Firmware/driver-based RAID below). Simpler RAID controllers may provide only levels 0 and 1, which require less processing.
RAID systems with redundancy continue working without interruption when one disk (or possibly more, depending on the type of RAID) fails, although they are then vulnerable to further failures. When the bad disk is replaced by a new one, the array is rebuilt while the system continues to operate normally. Some systems have to be powered down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down. RAID with hot-swapping is often used in high availability systems, where it is important that the system remain running as much of the time as possible.
Note that a RAID controller itself can become the single point of failure within a system.
[*] Standard levels
Main article: Standard RAID levels
A number of standard schemes have evolved which are referred to as levels. There were five RAID levels originally conceived, but many more variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary).
- RAID 0: striped set without parity or mirroring. Provides improved performance and additional storage but no redundancy or fault tolerance. Because there is no redundancy, this level is not actually a Redundant Array of Independent Disks, i.e. not true RAID; however, because of the similarities to RAID (especially the need for a controller to distribute data across multiple disks), simple stripe sets are normally referred to as RAID 0. Any single disk failure destroys the entire array, and the consequences grow with the number of disks: when data is written to a RAID 0 array it is broken into fragments, one per disk, which are written to their respective disks simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drives in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any error is unrecoverable. More disks in the array mean higher bandwidth, but greater risk of data loss.
- RAID 1: mirrored set without parity or striping. Provides fault tolerance from disk errors and from failure of all but one of the drives. Read performance increases when using a multi-threaded operating system that supports split seeks, at the cost of a very small reduction in write performance. The array continues to operate so long as at least one drive is functioning. Using RAID 1 with a separate controller for each disk is sometimes called duplexing.
- RAID 2: Hamming code parity. Disks are synchronized and striped in very small stripes, often single bytes or words. Hamming-code error correction is calculated across corresponding bits on the data disks and stored on multiple parity disks.
- RAID 3/RAID 4: striped set with dedicated parity (bit-interleaved or byte-level parity for RAID 3; block-level parity for RAID 4).
[*] Nested (hybrid) RAID
Main article: Nested RAID levels
As there is no basic RAID level numbered larger than 9, nested RAIDs are usually unambiguously described by concatenating the numbers indicating the RAID levels, sometimes with a "+" in between. For example, RAID 10 (or RAID 1+0) consists of several level 1 arrays of physical drives, each of which is one of the "drives" of a level 0 array striped over the level 1 arrays. It is not called RAID 01, to avoid confusion with the distinct RAID 0+1 layout described below. When the top array is a RAID 0 (such as in RAID 10 and RAID 50) most vendors omit the "+", though RAID 5+0 is clearer.
- RAID 0+1: striped sets in a mirrored set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.
- RAID 1+0: mirrored sets in a striped set (minimum two disks but more commonly four disks to take advantage of speed benefits; even number of disks) provides fault tolerance and improved performance but increases complexity.
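A sketch of RAID 1+0 addressing under the same hypothetical round-robin scheme as the earlier striping example, with four disks arranged as two mirrored pairs (the disk numbering is this sketch's own):

#include <stdio.h>

/* RAID 1+0: stripe (RAID 0) across mirrored pairs (RAID 1).
   Pair k consists of disks 2k and 2k+1; every write goes to both. */
int main(void) {
    int n_pairs = 2; /* four disks total */
    for (long block = 0; block < 4; block++) {
        int pair = (int)(block % n_pairs);
        long offset = block / n_pairs;
        printf("block %ld -> disks %d and %d, offset %ld\n",
               block, 2 * pair, 2 * pair + 1, offset);
    }
    return 0;
}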
[*] New RAID classification
The following list gives the criteria for the three classes of RAID:
- Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1 - 6):
- Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 7 - 15 ):
- Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 16 - 21):
[*] Non-standard levels
Main article: Non-standard RAID levels
Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialised needs of a small niche group. Most of these non-standard RAID levels are proprietary.
- Storage Computer Corporation, now defunct, marketed a cached version of RAID 3 and 4 as RAID 7.
- EMC Corporation used to offer RAID S as an alternative to RAID 5 on their Symmetrix systems. Their latest generations of Symmetrix, the DMX and the V-Max series, do not support RAID S (instead they support RAID-1, RAID-5 and RAID-6.)
- The ZFS filesystem, available in Solaris, OpenSolaris and FreeBSD, offers RAID-Z, which solves RAID 5's write hole problem.
- Hewlett-Packard's Advanced Data Guarding (ADG) is a form of RAID 6.
- NetApp's Data ONTAP uses RAID-DP (also referred to as "double", "dual", or "diagonal" parity), a form of RAID 6 which, unlike many RAID 6 implementations, does not use distributed parity as RAID 5 does. Instead, two unique parity disks with separate parity calculations are used. This is a modification of RAID 4 with an extra parity disk.
- Accusys Triple Parity (RAID TP) implements three independent parities by extending RAID 6 algorithms on its FC-SATA and SCSI-SATA RAID controllers to tolerate three-disk failure.
- Linux MD RAID10 (RAID10) implements a general RAID driver that defaults to a standard RAID 1 with 2 drives, and a standard RAID 1+0 with four drives, but can have any number of drives, including odd numbers. MD RAID10 can run striped and mirrored, even with only two drives with the f2 layout (mirroring with striped reads, giving the read performance of RAID 0; normal Linux software RAID 1 does not stripe reads, but can read in parallel[6]).[7]
- Infrant (now part of Netgear) X-RAID offers dynamic expansion of a RAID 5 volume without having to back up or restore the existing content: add larger drives one at a time, let each resync, then add the next, and the volume capacity is increased without user downtime. (This is also possible in Linux using the mdadm utility, and has been possible in EMC Clariion and HP MSA arrays for several years.) The newer X-RAID2, found on x86 ReadyNAS (that is, ReadyNAS with Intel CPUs), offers dynamic expansion of a RAID 5 or RAID 6 volume without back up or restore (X-RAID2 dual redundancy is not available on all x86 ReadyNAS). A major advantage over X-RAID is that X-RAID2 does not require replacing all the disks to gain space: replacing two disks under single redundancy, or four disks under dual redundancy, yields more redundant space.
- BeyondRAID, created by Data Robotics and used in the Drobo series of products, implements both mirroring and striping simultaneously or individually, depending on disk and data context. It offers expandability without reconfiguration, the ability to mix and match drive sizes, and the ability to reorder disks. It supports the NTFS, HFS+, FAT32, and ext3 file systems.[8] It also uses thin provisioning to allow single volumes up to 16 TB, depending on host operating system support.
- Hewlett-Packard's EVA series arrays implement vRAID: vRAID-0, vRAID-1, vRAID-5, and vRAID-6. The EVA allows drives to be placed in groups (called Disk Groups) that form a pool of data blocks on top of which the RAID level is implemented. Any Disk Group may have "virtual disks" or LUNs of any vRAID type, including a mix of vRAID types in the same Disk Group - a unique feature. vRAID levels align more closely with nested RAID levels: vRAID-1 is actually a RAID 1+0 (RAID 10), vRAID-5 is actually a RAID 5+0 (RAID 50), etc. Drives may also be added on-the-fly to an existing Disk Group, and the existing virtual disks' data is redistributed evenly over all the drives, allowing dynamic performance and capacity growth.
- IBM (among others) has implemented RAID 1E (Level 1 Enhanced). With an even number of disks it is similar to a RAID 10 array; unlike RAID 10, however, it can also be implemented with an odd number of drives. In either case, the usable capacity is n/2 drives' worth. It requires a minimum of three drives.
[*] Parity calculation; rebuilding failed drives
Parity data in a RAID environment is calculated using the Boolean XOR function. For example, here is a simple RAID 4 three-disk setup consisting of two drives that hold 8 bits of data each and a third drive that will be used to hold parity data.
Drive 1: 01101101
Drive 2: 11010100
To calculate the parity, an XOR is performed on the two drives' data:

01101101 XOR 11010100 = 10111001

The resulting parity data, 10111001, is then stored on Drive 3, the dedicated parity drive.

Should any of the three drives fail, the contents of the failed drive can be reconstructed by applying the same XOR operation to the remaining drives. If Drive 2 were to fail, its data could be rebuilt from Drive 3 and Drive 1:

Drive 3: 10111001
Drive 1: 01101101

i.e. 10111001 XOR 01101101 = 11010100

Another way of saying this is that the XOR of any two drives' contents yields the third.
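The same arithmetic in C, using the example's bit patterns (0x6D is 01101101 and 0xD4 is 11010100):

#include <stdio.h>

/* Parity over 8-bit "drives": parity = d1 XOR d2, and a lost drive
   is recovered by XOR-ing the survivors. */
int main(void) {
    unsigned char drive1 = 0x6D;             /* 01101101 */
    unsigned char drive2 = 0xD4;             /* 11010100 */
    unsigned char drive3 = drive1 ^ drive2;  /* parity: 10111001 (0xB9) */

    unsigned char rebuilt = drive3 ^ drive1; /* simulate losing Drive 2 */
    printf("parity  = %02X\n", (unsigned)drive3);
    printf("rebuilt = %02X (Drive 2 was D4)\n", (unsigned)rebuilt);
    return 0;
}

Real implementations apply the same XOR byte-by-byte across whole stripes, but the principle is exactly the one shown here.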
[*] RAID is not data backup
A RAID system used as a main drive is not a replacement for backing up data. Data may become damaged or destroyed without harm to the drive(s) on which it is stored. For example, some of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks. RAID can also be overwhelmed by catastrophic failure that exceeds its recovery capacity, and, of course, the entire array is at risk of physical damage by fire, natural disaster, or human forces. RAID is also vulnerable to controller failure, since it is not always possible to migrate a RAID to a new controller without data loss.[9]
[*] Implementations
(Specifically, the section comparing hardware / software raid)
The distribution of data across multiple drives can be managed either by dedicated hardware or by software. When done in software the software may be part of the operating system or it may be part of the firmware and drivers supplied with the card.
[*] Operating system based ("software RAID")
Software implementations are now provided by many operating systems. A software layer sits above the (generally block-based) disk device drivers and provides an abstraction layer between the logical drives (RAIDs) and the physical drives. The most common levels are RAID 0 (striping across multiple drives for increased space and performance) and RAID 1 (mirroring two drives), followed by RAID 1+0, RAID 0+1, and RAID 5 (data striping with parity).
- Apple's Mac OS X Server[10] and Mac OS X[11] support RAID 0, RAID 1 and RAID 1+0.
- FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all layerings of the above, via GEOM modules[12][13] and ccd,[14] as well as RAID 0, RAID 1, RAID-Z, and RAID-Z2 (similar to RAID 5 and RAID 6 respectively), plus nested combinations of those, via ZFS.
- Linux supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6 and all layerings of the above.[15][16]
- Microsoft's server operating systems support three RAID levels: RAID 0, RAID 1, and RAID 5. Some Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID 0, in addition to spanning multiple disks, but only when using dynamic disks and volumes. Windows XP supports RAID 0, 1, and 5 with a simple file patch.[17] RAID functionality in Windows is slower than hardware RAID, but allows a RAID array to be moved to another machine with no compatibility issues.
- NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5 (and any nested combination of those like 1+0) via its software implementation, named RAIDframe.
- OpenBSD aims to support RAID 0, RAID 1, RAID 4 and RAID 5 via its software implementation softraid.
- OpenSolaris and Solaris 10 support RAID 0, RAID 1, RAID 5 (or the similar "RAID-Z" found only on ZFS), and RAID 6 (and any nested combination of those, like 1+0) via ZFS, and can now boot from a ZFS volume on both x86 and UltraSPARC. Through SVM, Solaris 10 and earlier versions support RAID 1 for the boot filesystem, and add RAID 0 and RAID 5 support (and various nested combinations) for data drives.
Hardware RAID controllers can also carry battery-backed cache memory. For data safety in modern systems, a user of software RAID may need to turn off the write-back cache on each disk (although some drives protect their write-back cache with their own battery or capacitors, rely on a UPS, or implement atomicity in other ways). Turning off the write cache carries a performance penalty that can be significant, depending on the workload and on how well command queuing is supported in the disk system. The battery-backed cache on a RAID controller is one way to retain a safe write-back cache.
Most operating system-based implementations allow RAIDs to be created from partitions rather than entire physical drives. For instance, an administrator could divide an odd number of disks into two partitions per disk, mirror partitions across disks, and stripe a volume across the mirrored partitions to emulate IBM's RAID 1E configuration. Using partitions in this way also allows mixing reliability levels on the same set of disks. For example, one could have a very robust RAID 1 partition for important files, and a less robust RAID 5 or RAID 0 partition for less important data. (Some BIOS-based controllers offer similar features, e.g. Intel Matrix RAID.) Using two partitions from the same drive in the same RAID is, however, dangerous: if all partitions of a RAID 1 sit on the same drive, all the data becomes inaccessible when that single drive fails; and in a RAID 5 array composed of 250 GB + 250 GB + 250 GB + 500 GB drives, with the 500 GB drive split into two 250 GB partitions, a failure of that drive removes two partitions from the array at once, causing the loss of all data held on it.
[*] Hardware-based
A hardware implementation of RAID requires at least a special-purpose RAID controller. On a desktop system this may be a PCI expansion card, PCI-e expansion card or built into the motherboard. Controllers supporting most types of drive may be used – IDE/ATA, SATA, SCSI, SSA, Fibre Channel, sometimes even a combination. The controller and disks may be in a stand-alone disk enclosure, rather than inside a computer. The enclosure may be directly attached to a computer, or connected via SAN. The controller hardware handles the management of the drives, and performs any parity calculations required by the chosen RAID level.
Most hardware implementations provide a read/write cache, which, depending on the I/O workload, will improve performance. In most systems the write cache is non-volatile (i.e. battery-protected), so pending writes are not lost on a power failure.
Hardware implementations provide guaranteed performance, add no overhead to the local CPU complex and can support many operating systems, as the controller simply presents a logical disk to the operating system.
However, hardware RAID controllers are often slower than software RAID, because the dedicated processor on the controller card is not as fast as the host computer's CPU. More expensive RAID controllers have faster processors; when buying a hardware RAID controller, check the specifications for throughput.
[*] Firmware/driver-based RAID ("FakeRAID")
Operating system-based RAID doesn't always protect the boot process and is generally impractical on desktop versions of Windows (as described above). Hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a RAID controller chip, but simply a standard disk controller chip with special firmware and drivers. During early stage bootup the RAID is implemented by the firmware; when a protected-mode operating system kernel such as Linux or a modern version of Microsoft Windows is loaded the drivers take over.
These controllers are described by their manufacturers as RAID controllers, and it is rarely made clear to purchasers that the burden of RAID processing is borne by the host computer's central processing unit, not the RAID controller itself, thus introducing the aforementioned CPU overhead from which hardware controllers don't suffer. Firmware controllers often can only use certain types of hard drives in their RAID arrays (e.g. SATA for Intel Matrix RAID), as there is neither SCSI nor PATA support in modern Intel ICH southbridges; however, motherboard makers implement RAID controllers outside of the southbridge on some motherboards. Before their introduction, a "RAID controller" implied that the controller did the processing, and the new type has become known by some as "fake RAID" even though the RAID itself is implemented correctly. Adaptec calls them "HostRAID".
[*] Network-attached storage
Main article: Network-attached storage
While not directly associated with RAID, Network-attached storage (NAS) is an enclosure containing disk drives and the equipment necessary to make them available over a computer network, usually Ethernet. The enclosure is basically a dedicated computer in its own right, designed to operate over the network without screen or keyboard. It contains one or more disk drives; multiple drives may be configured as a RAID.
[*] Hot spares
Both hardware and software RAIDs with redundancy may support hot spares: drives physically installed in the array that remain inactive until an active drive fails, at which point the system automatically replaces the failed drive with the spare and rebuilds the array. This reduces the mean time to recovery (MTTR), though it does not eliminate it completely: subsequent failures in the same RAID redundancy group before the array is fully rebuilt can still result in loss of data, and rebuilding can take several hours, especially on busy systems.
Rapid replacement of failed drives is important as the drives of an array will all have had the same amount of use, and may tend to fail at about the same time rather than randomly.[citation needed] RAID 6 without a spare uses the same number of drives as RAID 5 with a hot spare and protects data against simultaneous failure of up to two drives, but requires a more advanced RAID controller. Further, a hot spare can be shared by multiple RAID sets.
[*] Reliability terms
Mean time to data loss (MTTDL): the average time before a loss of data occurs in a given array.
Unrecoverable bit error rate (UBE): the rate at which a disk drive is unable to recover data even after applying its internal error-correction and re-read algorithms.
Some RAID systems use RAM write cache to increase performance. A power failure can result in data loss unless this sort of disk buffer is supplemented with a battery to ensure that the buffer has enough time to write from RAM back to disk.
[*] Problems with RAID
[*] Correlated failures
A common misconception is that "server-grade" drives fail less frequently than consumer-grade drives. Two independent studies, one by Carnegie Mellon University and the other by Google, have shown that the “grade” of the drive does not relate to failure rates.[22][23]
[*] Atomicity
This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not use transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple"[24] during the early days of relational database commercialization. The warning largely went unheeded, however, and fell by the wayside upon the advent of RAID, which many software engineers mistook for a solution to all data storage integrity and reliability problems. Many software programs update a storage object "in place"; that is, they write a new version of the object onto the same disk addresses as the old version. While the software may also log some delta information elsewhere, it expects the storage to present "atomic write semantics", meaning that the write of the data either occurred in its entirety or did not occur at all.
Since transactional support is not universally present in hardware RAID, many operating systems include transactional support to protect against data loss during an interrupted write. Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft introduced transaction tracking via the journaling feature in NTFS. Ext4 has journaling with checksums; ext3 has journaling without checksums, though an "append-only" option, or ext3cow (copy-on-write), can be used instead. If the journal itself in a filesystem is corrupted, though, this can be problematic. The journaling in NetApp's WAFL file system provides atomicity by never updating data in place, as does ZFS. An alternative method to journaling is soft updates, used in some BSD-derived systems' implementations of UFS.
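As a sketch of the alternative to updating in place, here is the classic write-then-rename pattern in C; the file names are illustrative, and a production implementation on POSIX systems would also fsync before the rename for durability.

#include <stdio.h>

/* Write a complete new version of an object, then atomically swap it
   for the old one. Readers see either the old or the new file in full,
   never a torn, half-written update. */
int main(void) {
    FILE *f = fopen("data.tmp", "w");
    if (!f) return 1;
    fputs("new version of the object\n", f);
    if (fclose(f) != 0) return 1;             /* flush buffered data */
    if (rename("data.tmp", "data.txt") != 0)  /* atomic replacement on POSIX */
        return 1;
    return 0;
}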
[*] Unrecoverable data
An unrecoverable bit error can present as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE (unrecoverable bit error) rate is typically specified at 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS), and 1 bit in 10^14 for desktop-class disk drives (IDE/ATA/PATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives. Double-protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
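A back-of-the-envelope sketch of why this happens; the drive count, sizes, and rates below are illustrative assumptions, not figures from this article.

#include <math.h>
#include <stdio.h>

/* Probability that a RAID 5 rebuild completes without hitting an
   unrecoverable bit error (UBE). Assumes four 2 TB desktop-class
   drives (UBE 1 in 10^14): rebuilding one failed drive requires
   reading the three survivors in full. */
int main(void) {
    double drive_bits = 2e12 * 8.0;       /* 2 TB per drive, in bits */
    double bits_read = 3.0 * drive_bits;  /* three surviving drives */
    double ube = 1e-14;                   /* desktop-class error rate */

    /* P(no error) = (1 - ube)^bits_read, approximately exp(-ube * bits_read) */
    double p_ok = exp(-ube * bits_read);
    printf("bits read during rebuild: %.1e\n", bits_read);
    printf("P(no UBE during rebuild) = about %.0f%%\n", 100.0 * p_ok);
    return 0;
}

Under these assumptions roughly one rebuild in three fails on an unrecoverable sector, which is the motivation for the double-parity schemes mentioned above.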
[*] Write cache reliability
The disk system can acknowledge a write operation as soon as the data is in the cache, without waiting for the data to be physically written. This typically occurs in old, non-journaled systems such as FAT32, or when the Linux/Unix "writeback" option is chosen without protections like the "soft updates" option (promoting I/O speed while trading away data reliability). A power outage or system hang, such as a BSOD, can then mean a significant loss of any data queued in such a cache.
[*] Equipment compatibility
[*] Data recovery in the event of a failed array
[*] Drive error recovery algorithms
Many modern drives have internal error recovery algorithms that can take upwards of a minute to recover and re-map data that the drive fails to read easily. Many RAID controllers, however, will drop a non-responsive drive in about 8 seconds. The array can thus drop a good drive simply because it was not given enough time to complete its internal error recovery procedure, leaving the rest of the array vulnerable. So-called enterprise-class drives limit the error recovery time to prevent this problem, but desktop drives can be quite risky for this reason. A fix is known for Western Digital drives: a utility called WDTLER.exe can limit the error recovery time of a Western Digital desktop drive so that it will not be dropped from the array, by enabling TLER (time-limited error recovery), which caps the error recovery time at 7 seconds. (Update: as of October 2009, Western Digital has locked out this feature in desktop drives such as the Caviar Black, and running the WDTLER program on them may actually damage the drive's firmware.) Western Digital enterprise-class drives are shipped from the factory with TLER enabled to prevent their being dropped from RAID arrays. Similar technologies are used by Seagate, Samsung, and other manufacturers.
[*] Increasing recovery time
Drive capacity has grown at a much faster rate than transfer speed, and error rates have fallen only a little in comparison. Larger-capacity drives may therefore take hours, if not days, to rebuild, and rebuild speed is further limited if the array remains in operation at reduced capacity.[26] Given a RAID array with only one disk of redundancy (RAID 3, 4, or 5), a second failure during this window would cause complete failure of the array; the longer the rebuild takes, the larger that window of vulnerability becomes, even when the drives' mean time between failures (MTBF) is high.[27]
[*] Other problems and viruses
[*] History
Norman Ken Ouchi at IBM was awarded a 1978 U.S. patent 4,092,732[30] titled "System for recovering data stored in failed memory unit." The claims for this patent describe what would later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that disk mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (that would later be termed RAID 4) were prior art at that time.
The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley, in 1987.
One of the early uses of RAID 0 and 1 was the Crosfield Electronics Studio 9500 page layout system based on the Python workstation. The Python workstation was a Crosfield managed international development using PERQ 3B electronics, benchMark Technology's Viper display system and Crosfield's own RAID and fibre-optic network controllers. RAID 0 was particularly important to these workstations as it dramatically speeded image manipulation for the pre-press markets. Volume production started in
[*] Non-RAID drive architectures
Main article: Non-RAID drive architectures
Non-RAID drive architectures also exist, and are often referred to, similarly to RAID, by standard acronyms, several tongue-in-cheek. A single drive is referred to as a SLED (Single Large Expensive Drive).