Tuesday, May 25, 2010
syntax
In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form.[1] Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).
The lexical grammar of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens; the process of assigning meaning to those token sequences is part of semantics.
The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, a term it shares with syntactic analysis in linguistics. Tools such as Yacc (yet another compiler compiler) automatically generate parsers from a specification of a language grammar written in Backus-Naur form.
Syntax definition
[Figure: parse tree of Python code, with inset tokenization]
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category.[1] Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.
Below is a simple grammar, based on Lisp, which defines productions for the syntactic categories expression, atom, number, symbol, and list:
expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'
This grammar specifies the following:
* an expression is either an atom or a list;
* an atom is either a number or a symbol;
* a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
* a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
* a list is a matched pair of parentheses, with zero or more expressions inside it.
Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.
The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'
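To make this concrete, here is a minimal sketch of a recursive-descent recognizer for the toy grammar, written in C (the language of the fragment later in this post). The function names, the treatment of parentheses as symbol terminators, and the test strings are this sketch's own assumptions, not part of the grammar above.

#include <ctype.h>
#include <stdio.h>

/* Minimal recursive-descent recognizer for the toy grammar above.
   Each function consumes characters from *s and returns 1 on success. */
static const char *skip_ws(const char *s) {
    while (isspace((unsigned char)*s)) s++;
    return s;
}

static int parse_expression(const char **s); /* forward declaration */

static int parse_number(const char **s) {      /* number ::= [+-]?['0'-'9']+ */
    const char *p = *s;
    if (*p == '+' || *p == '-') p++;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    *s = p; return 1;
}

static int parse_symbol(const char **s) {      /* a letter, then non-space characters */
    const char *p = *s;
    if (!isalpha((unsigned char)*p)) return 0;
    while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')') p++;
    *s = p; return 1;
}

static int parse_list(const char **s) {        /* list ::= '(' expression* ')' */
    const char *p = skip_ws(*s);
    if (*p != '(') return 0;
    p++;
    for (;;) {
        p = skip_ws(p);
        if (*p == ')') { *s = p + 1; return 1; }
        if (*p == '\0' || !parse_expression(&p)) return 0;
    }
}

static int parse_expression(const char **s) {  /* expression ::= atom | list */
    const char *p = skip_ws(*s);
    int ok = parse_list(&p) || parse_number(&p) || parse_symbol(&p);
    if (ok) *s = p;
    return ok;
}

int main(void) {
    const char *tests[] = { "12345", "()", "(a b c232 (1))" };
    for (int i = 0; i < 3; i++) {
        const char *p = tests[i];
        int ok = parse_expression(&p) && *skip_ws(p) == '\0';
        printf("%-15s %s\n", tests[i], ok ? "well-formed" : "ill-formed");
    }
    return 0;
}

Note that parse_expression tries each alternative in turn; because the grammar's alternatives start with distinct characters (a parenthesis, a sign or digit, a letter), one character of lookahead suffices, which is what makes this grammar easy to parse top-down.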
The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[2] However, there are exceptions. In some languages like Perl and Lisp the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity of the remaining code.[3] Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast C macros are merely string replacements, and do not require code execution.[4][5]
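To illustrate the last point, here is a short sketch of C's purely textual macro expansion; the macro name SQUARE is this sketch's own example, not from the article.

#include <stdio.h>

/* A C macro is expanded by the preprocessor as pure text substitution,
   before parsing proper; no user code runs during compilation. */
#define SQUARE(x) ((x) * (x))

int main(void) {
    /* The preprocessor rewrites SQUARE(3 + 1) as ((3 + 1) * (3 + 1));
       only then does the compiler parse the result. */
    printf("%d\n", SQUARE(3 + 1)); /* prints 16 */
    return 0;
}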
Syntax versus semantics
The syntax of a language describes the form of a valid program, but provides no information about the program's meaning or the results of executing it. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Not all syntactically correct programs are semantically correct: many are nonetheless ill-formed under the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.
Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false:
* "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
* "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.
The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):
#include <math.h>
typedef struct { double real, im; } complex; /* illustrative definition */
complex *p = NULL;
double abs_p = sqrt(p->real * p->real + p->im * p->im); /* dereferences a null pointer */
Wednesday, May 19, 2010
RAID
From Wikipedia, the free encyclopedia
RAID, an acronym for redundant array of independent disks (originally redundant array of inexpensive disks), is a technology that provides high levels of storage reliability from low-cost, less reliable PC-class disk-drive components by arranging the devices into redundant arrays. The concept was first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 as redundant array of inexpensive disks.[1] Marketers representing industry RAID manufacturers later reinvented the term as redundant array of independent disks, to dissociate the low-cost expectation from RAID technology.[2]
RAID is now used as an umbrella term for computer data storage schemes that divide and replicate data among multiple hard disk drives. The different schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID designs involve two key goals: increased data reliability and/or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array.[3] The array distributes data across multiple disks, but the computer user and operating system see the array as one single disk. RAID can be set up to serve several different purposes.
[*] Principles
RAID combines two or more physical hard disks into a single logical unit using special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system would be unaware of the technical workings. For example, if one were to configure a hardware-based RAID-5 volume using three 250 GB hard drives (two drives for data, and one for parity), the operating system would be presented with a single 500 GB volume. Software solutions are typically implemented in the operating system and would present the RAID volume as a single drive to applications running within the operating system.
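As a quick sanity check on these numbers, here is a tiny sketch of the usable-capacity arithmetic; the three-drive, 250 GB configuration mirrors the example above, while the RAID 0 and RAID 1 lines are added for comparison.

#include <stdio.h>

/* Usable capacity for n identical drives of `size` GB each. */
int main(void) {
    int n = 3;
    double size = 250.0;                          /* GB per drive */
    printf("RAID 0: %.0f GB\n", n * size);        /* striping: all capacity, no redundancy */
    printf("RAID 1: %.0f GB\n", size);            /* mirroring: one drive's worth */
    printf("RAID 5: %.0f GB\n", (n - 1) * size);  /* one drive's worth goes to parity */
    return 0;
}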
There are three key concepts in RAID: mirroring, the writing of identical data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant parity data is stored to allow problems to be detected and possibly repaired (known as fault tolerance). Different RAID schemes use one or more of these techniques, depending on the system requirements. The purpose of using RAID is to improve reliability and availability of data, ensuring that important data is not harmed in case of hardware failure, and/or to increase the speed of file input/output.
Each RAID scheme affects reliability and performance in different ways. Every additional disk included in an array increases the likelihood that one will fail, but by using error checking and/or mirroring, the array as a whole can be made more reliable, able to survive and recover from a failure. Basic mirroring can speed up reading, since a system can read different data from multiple disks at the same time, but it may be slow for writing if the configuration requires that all disks confirm that the data is correctly written. Striping, often used to increase performance, writes successive segments of data to different disks, allowing the data to be read back from multiple disks in parallel faster than a single disk could deliver it. Error checking typically slows performance, as data must be read from multiple places and compared. The design of any RAID scheme is therefore a compromise in one or more respects, and understanding the requirements of a system is important. Modern disk arrays typically provide the facility to select an appropriate RAID configuration.
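A sketch of the striping idea in code, assuming a hypothetical round-robin mapping of logical blocks to disks (the variable names are this sketch's own):

#include <stdio.h>

/* Map a logical block number to a (disk, offset) pair in an
   n-disk RAID 0 stripe: blocks are dealt out round-robin. */
int main(void) {
    int n_disks = 3;
    for (long block = 0; block < 6; block++) {
        int disk = (int)(block % n_disks);  /* which disk receives the block */
        long offset = block / n_disks;      /* position within that disk */
        printf("logical block %ld -> disk %d, offset %ld\n", block, disk, offset);
    }
    return 0;
}

Consecutive blocks land on different disks, so a large read can be serviced by all disks in parallel, which is the performance effect described above.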
[*] Organization
[*] RAID 0
[*] RAID 1
[*] RAID 3, RAID 4
[*] RAID 5
[*] RAID 6
[*] RAID 10
[*] RAID 01
[*] Hardware/Software
RAID can involve significant computation when reading and writing information. With traditional "real" RAID hardware, a separate controller does this computation. In other cases the operating system or simpler and less expensive controllers require the host computer's processor to do the computing, which reduces the computer's performance on processor-intensive tasks (see Operating system based ("software RAID") and Firmware/driver-based RAID below). Simpler RAID controllers may provide only levels 0 and 1, which require less processing.
RAID systems with redundancy continue working without interruption when one disk (or possibly more, depending on the type of RAID) fails, although they are then vulnerable to further failures. When the bad disk is replaced by a new one, the array is rebuilt while the system continues to operate normally. Some systems have to be powered down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down. RAID with hot-swapping is often used in high availability systems, where it is important that the system remain running as much of the time as possible.
Note that a RAID controller itself can become the single point of failure within a system.
[*] Standard levels
Main article: Standard RAID levels
A number of standard schemes have evolved which are referred to as levels. There were five RAID levels originally conceived, but many more variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary).
- RAID 0: striped set without parity or mirroring. Provides improved performance and additional storage but no redundancy or fault tolerance. Because there is no redundancy, this level is not actually a Redundant Array of Independent Disks, i.e. not true RAID; however, because of the similarities to RAID (especially the need for a controller to distribute data across multiple disks), simple stripe sets are normally referred to as RAID 0. Any single disk failure destroys the entire array, and the consequences grow with the number of disks: when data is written to a RAID 0 array it is broken into fragments, one per disk, which are written to their respective disks simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drives in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any error is unrecoverable. More disks in the array mean higher bandwidth, but greater risk of data loss.
- RAID 1: mirrored set without parity or striping. Provides fault tolerance from disk errors and from failure of all but one of the drives. Read performance increases when using a multi-threaded operating system that supports split seeks, at the cost of a very small reduction in write performance. The array continues to operate so long as at least one drive is functioning. Using RAID 1 with a separate controller for each disk is sometimes called duplexing.
- RAID 2: Hamming code parity. Disks are synchronized and striped in very small stripes, often single bytes or words. Hamming-code error correction is calculated across corresponding bits on the data disks and stored on multiple parity disks.
- RAID 3/RAID 4: striped set with dedicated parity (bit-interleaved or byte-level parity for RAID 3; block-level parity for RAID 4).
[*] Nested (hybrid) RAID
Main article: Nested RAID levels
As there is no basic RAID level numbered larger than 9, nested RAIDs are usually unambiguously described by concatenating the numbers indicating the RAID levels, sometimes with a "+" in between. For example, RAID 10 (or RAID 1+0) consists of several level 1 arrays of physical drives, each of which is one of the "drives" of a level 0 array striped over the level 1 arrays. It is not called RAID 01, to avoid confusion with the distinct RAID 0+1 layout described below. When the top array is a RAID 0 (such as in RAID 10 and RAID 50) most vendors omit the "+", though RAID 5+0 is clearer.
- RAID 0+1: striped sets in a mirrored set (minimum four disks; even number of disks) provides fault tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.
- RAID 1+0: mirrored sets in a striped set (minimum two disks but more commonly four disks to take advantage of speed benefits; even number of disks) provides fault tolerance and improved performance but increases complexity.
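A sketch of RAID 1+0 addressing under the same hypothetical round-robin scheme as the earlier striping example, with four disks arranged as two mirrored pairs (the disk numbering is this sketch's own):

#include <stdio.h>

/* RAID 1+0: stripe (RAID 0) across mirrored pairs (RAID 1).
   Pair k consists of disks 2k and 2k+1; every write goes to both. */
int main(void) {
    int n_pairs = 2; /* four disks total */
    for (long block = 0; block < 4; block++) {
        int pair = (int)(block % n_pairs);
        long offset = block / n_pairs;
        printf("block %ld -> disks %d and %d, offset %ld\n",
               block, 2 * pair, 2 * pair + 1, offset);
    }
    return 0;
}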
[*] New RAID classification
The following list gives the criteria for the three classes of RAID:
- Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1 - 6):
- Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 7 - 15 ):
- Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 16 - 21):
[*] Non-standard levels
Main article: Non-standard RAID levels
Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialised needs of a small niche group. Most of these non-standard RAID levels are proprietary.
- Storage Computer Corporation, now defunct, marketed a cached version of RAID 3 and 4 as RAID 7.
- EMC Corporation used to offer RAID S as an alternative to RAID 5 on their Symmetrix systems. Their latest generations of Symmetrix, the DMX and the V-Max series, do not support RAID S (instead they support RAID-1, RAID-5 and RAID-6.)
- The ZFS filesystem, available in Solaris, OpenSolaris and FreeBSD, offers RAID-Z, which solves RAID 5's write hole problem.
- Hewlett-Packard's Advanced Data Guarding (ADG) is a form of RAID 6.
- NetApp's Data ONTAP uses RAID-DP (also referred to as "double", "dual", or "diagonal" parity), a form of RAID 6 which, unlike many RAID 6 implementations, does not use distributed parity as RAID 5 does. Instead, two unique parity disks with separate parity calculations are used. This is a modification of RAID 4 with an extra parity disk.
- Accusys Triple Parity (RAID TP) implements three independent parities by extending RAID 6 algorithms on its FC-SATA and SCSI-SATA RAID controllers to tolerate three-disk failure.
- Linux MD RAID10 (RAID10) implements a general RAID driver that defaults to a standard RAID 1 with 2 drives, and a standard RAID 1+0 with four drives, but can have any number of drives, including odd numbers. MD RAID10 can run striped and mirrored, even with only two drives with the f2 layout (mirroring with striped reads, giving the read performance of RAID 0; normal Linux software RAID 1 does not stripe reads, but can read in parallel[6]).[7]
- Infrant (now part of Netgear) X-RAID offers dynamic expansion of a RAID 5 volume without having to back up or restore the existing content: add larger drives one at a time, let each resync, then add the next, and the volume capacity is increased without user downtime. (This is also possible in Linux using the mdadm utility, and has been possible in EMC Clariion and HP MSA arrays for several years.) The newer X-RAID2, found on x86 ReadyNAS (that is, ReadyNAS with Intel CPUs), offers dynamic expansion of a RAID 5 or RAID 6 volume without back up or restore (X-RAID2 dual redundancy is not available on all x86 ReadyNAS). A major advantage over X-RAID is that X-RAID2 does not require replacing all the disks to gain space: replacing two disks under single redundancy, or four disks under dual redundancy, yields more redundant space.
- BeyondRAID, created by Data Robotics and used in the Drobo series of products, implements both mirroring and striping simultaneously or individually, depending on disk and data context. It offers expandability without reconfiguration, the ability to mix and match drive sizes, and the ability to reorder disks. It supports the NTFS, HFS+, FAT32, and ext3 file systems.[8] It also uses thin provisioning to allow single volumes up to 16 TB, depending on host operating system support.
- Hewlett-Packard's EVA series arrays implement vRAID: vRAID-0, vRAID-1, vRAID-5, and vRAID-6. The EVA allows drives to be placed in groups (called Disk Groups) that form a pool of data blocks on top of which the RAID level is implemented. Any Disk Group may have "virtual disks" or LUNs of any vRAID type, including a mix of vRAID types in the same Disk Group - a unique feature. vRAID levels align more closely with nested RAID levels: vRAID-1 is actually a RAID 1+0 (RAID 10), vRAID-5 is actually a RAID 5+0 (RAID 50), etc. Drives may also be added on-the-fly to an existing Disk Group, and the existing virtual disks' data is redistributed evenly over all the drives, allowing dynamic performance and capacity growth.
- IBM (among others) has implemented RAID 1E (Level 1 Enhanced). With an even number of disks it is similar to a RAID 10 array; unlike RAID 10, however, it can also be implemented with an odd number of drives. In either case, the usable capacity is n/2 drives' worth. It requires a minimum of three drives.
[*] Parity calculation; rebuilding failed drives
Parity data in a RAID environment is calculated using the Boolean XOR function. For example, here is a simple RAID 4 three-disk setup consisting of two drives that hold 8 bits of data each and a third drive that will be used to hold parity data.
Drive 1: 01101101
Drive 2: 11010100
To calculate the parity, an XOR is performed on the two drives' data:

01101101 XOR 11010100 = 10111001

The resulting parity data, 10111001, is then stored on Drive 3, the dedicated parity drive.

Should any of the three drives fail, the contents of the failed drive can be reconstructed by applying the same XOR operation to the remaining drives. If Drive 2 were to fail, its data could be rebuilt from Drive 3 and Drive 1:

Drive 3: 10111001
Drive 1: 01101101

i.e. 10111001 XOR 01101101 = 11010100

Another way of saying this is that the XOR of any two drives' contents yields the third.
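The same arithmetic in C, using the example's bit patterns (0x6D is 01101101 and 0xD4 is 11010100):

#include <stdio.h>

/* Parity over 8-bit "drives": parity = d1 XOR d2, and a lost drive
   is recovered by XOR-ing the survivors. */
int main(void) {
    unsigned char drive1 = 0x6D;             /* 01101101 */
    unsigned char drive2 = 0xD4;             /* 11010100 */
    unsigned char drive3 = drive1 ^ drive2;  /* parity: 10111001 (0xB9) */

    unsigned char rebuilt = drive3 ^ drive1; /* simulate losing Drive 2 */
    printf("parity  = %02X\n", (unsigned)drive3);
    printf("rebuilt = %02X (Drive 2 was D4)\n", (unsigned)rebuilt);
    return 0;
}

Real implementations apply the same XOR byte-by-byte across whole stripes, but the principle is exactly the one shown here.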
[*] RAID is not data backup
A RAID system used as a main drive is not a replacement for backing up data. Data may become damaged or destroyed without harm to the drive(s) on which it is stored. For example, some of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks. RAID can also be overwhelmed by catastrophic failure that exceeds its recovery capacity, and, of course, the entire array is at risk of physical damage by fire, natural disaster, or human forces. RAID is also vulnerable to controller failure, since it is not always possible to migrate a RAID to a new controller without data loss.[9]
[*] Implementations
(Specifically, the section comparing hardware / software raid)
The distribution of data across multiple drives can be managed either by dedicated hardware or by software. When done in software the software may be part of the operating system or it may be part of the firmware and drivers supplied with the card.
[*] Operating system based ("software RAID")
Software implementations are now provided by many operating systems. A software layer sits above the (generally block-based) disk device drivers and provides an abstraction layer between the logical drives (RAIDs) and the physical drives. The most common levels are RAID 0 (striping across multiple drives for increased space and performance) and RAID 1 (mirroring two drives), followed by RAID 1+0, RAID 0+1, and RAID 5 (data striping with parity).
- Apple's Mac OS X Server[10] and Mac OS X[11] support RAID 0, RAID 1 and RAID 1+0.
- FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all layerings of the above, via GEOM modules[12][13] and ccd,[14] as well as RAID 0, RAID 1, RAID-Z, and RAID-Z2 (similar to RAID 5 and RAID 6 respectively), plus nested combinations of those, via ZFS.
- Linux supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6 and all layerings of the above.[15][16]
- Microsoft's server operating systems support three RAID levels: RAID 0, RAID 1, and RAID 5. Some Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID 0, in addition to spanning multiple disks, but only when using dynamic disks and volumes. Windows XP supports RAID 0, 1, and 5 with a simple file patch.[17] RAID functionality in Windows is slower than hardware RAID, but allows a RAID array to be moved to another machine with no compatibility issues.
- NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5 (and any nested combination of those like 1+0) via its software implementation, named RAIDframe.
- OpenBSD aims to support RAID 0, RAID 1, RAID 4 and RAID 5 via its software implementation softraid.
- OpenSolaris and Solaris 10 support RAID 0, RAID 1, RAID 5 (or the similar "RAID-Z" found only on ZFS), and RAID 6 (and any nested combination of those, like 1+0) via ZFS, and can now boot from a ZFS volume on both x86 and UltraSPARC. Through SVM, Solaris 10 and earlier versions support RAID 1 for the boot filesystem, and add RAID 0 and RAID 5 support (and various nested combinations) for data drives.
Hardware RAID controllers can also carry battery-backed cache memory. For data safety in modern systems, a user of software RAID may need to turn off the write-back cache on each disk (although some drives protect their write-back cache with their own battery or capacitors, rely on a UPS, or implement atomicity in other ways). Turning off the write cache carries a performance penalty that can be significant, depending on the workload and on how well command queuing is supported in the disk system. The battery-backed cache on a RAID controller is one way to retain a safe write-back cache.
Most operating system-based implementations allow RAIDs to be created from partitions rather than entire physical drives. For instance, an administrator could divide an odd number of disks into two partitions per disk, mirror partitions across disks, and stripe a volume across the mirrored partitions to emulate IBM's RAID 1E configuration. Using partitions in this way also allows mixing reliability levels on the same set of disks. For example, one could have a very robust RAID 1 partition for important files, and a less robust RAID 5 or RAID 0 partition for less important data. (Some BIOS-based controllers offer similar features, e.g. Intel Matrix RAID.) Using two partitions from the same drive in the same RAID is, however, dangerous: if all partitions of a RAID 1 sit on the same drive, all the data becomes inaccessible when that single drive fails; and in a RAID 5 array composed of 250 GB + 250 GB + 250 GB + 500 GB drives, with the 500 GB drive split into two 250 GB partitions, a failure of that drive removes two partitions from the array at once, causing the loss of all data held on it.
[*] Hardware-based
A hardware implementation of RAID requires at least a special-purpose RAID controller. On a desktop system this may be a PCI expansion card, PCI-e expansion card or built into the motherboard. Controllers supporting most types of drive may be used – IDE/ATA, SATA, SCSI, SSA, Fibre Channel, sometimes even a combination. The controller and disks may be in a stand-alone disk enclosure, rather than inside a computer. The enclosure may be directly attached to a computer, or connected via SAN. The controller hardware handles the management of the drives, and performs any parity calculations required by the chosen RAID level.
Most hardware implementations provide a read/write cache, which, depending on the I/O workload, will improve performance. In most systems the write cache is non-volatile (i.e. battery-protected), so pending writes are not lost on a power failure.
Hardware implementations provide guaranteed performance, add no overhead to the local CPU complex and can support many operating systems, as the controller simply presents a logical disk to the operating system.
However, hardware RAID controllers are often slower than software RAID, because the dedicated processor on the controller card is not as fast as the host computer's CPU. More expensive RAID controllers have faster processors; when buying a hardware RAID controller, check the specifications for throughput.
[*] Firmware/driver-based RAID ("FakeRAID")
Operating system-based RAID doesn't always protect the boot process and is generally impractical on desktop versions of Windows (as described above). Hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a RAID controller chip, but simply a standard disk controller chip with special firmware and drivers. During early stage bootup the RAID is implemented by the firmware; when a protected-mode operating system kernel such as Linux or a modern version of Microsoft Windows is loaded the drivers take over.
These controllers are described by their manufacturers as RAID controllers, and it is rarely made clear to purchasers that the burden of RAID processing is borne by the host computer's central processing unit, not the RAID controller itself, thus introducing the aforementioned CPU overhead from which hardware controllers don't suffer. Firmware controllers often can only use certain types of hard drives in their RAID arrays (e.g. SATA for Intel Matrix RAID), as there is neither SCSI nor PATA support in modern Intel ICH southbridges; however, motherboard makers implement RAID controllers outside of the southbridge on some motherboards. Before their introduction, a "RAID controller" implied that the controller did the processing, and the new type has become known by some as "fake RAID" even though the RAID itself is implemented correctly. Adaptec calls them "HostRAID".
[*] Network-attached storage
Main article: Network-attached storage
While not directly associated with RAID, Network-attached storage (NAS) is an enclosure containing disk drives and the equipment necessary to make them available over a computer network, usually Ethernet. The enclosure is basically a dedicated computer in its own right, designed to operate over the network without screen or keyboard. It contains one or more disk drives; multiple drives may be configured as a RAID.
[*] Hot spares
Both hardware and software RAIDs with redundancy may support hot spares: drives physically installed in the array that remain inactive until an active drive fails, at which point the system automatically replaces the failed drive with the spare and rebuilds the array. This reduces the mean time to recovery (MTTR), though it does not eliminate it completely: subsequent failures in the same RAID redundancy group before the array is fully rebuilt can still result in loss of data, and rebuilding can take several hours, especially on busy systems.
Rapid replacement of failed drives is important as the drives of an array will all have had the same amount of use, and may tend to fail at about the same time rather than randomly.[citation needed] RAID 6 without a spare uses the same number of drives as RAID 5 with a hot spare and protects data against simultaneous failure of up to two drives, but requires a more advanced RAID controller. Further, a hot spare can be shared by multiple RAID sets.
[*] Reliability terms
Mean time to data loss (MTTDL): the average time before a loss of data occurs in a given array.
Unrecoverable bit error rate (UBE): the rate at which a disk drive is unable to recover data even after applying its internal error-correction and re-read algorithms.
Some RAID systems use RAM write cache to increase performance. A power failure can result in data loss unless this sort of disk buffer is supplemented with a battery to ensure that the buffer has enough time to write from RAM back to disk.
[*] Problems with RAID
[*] Correlated failures
A common misconception is that "server-grade" drives fail less frequently than consumer-grade drives. Two independent studies, one by Carnegie Mellon University and the other by Google, have shown that the “grade” of the drive does not relate to failure rates.[22][23]
[*] Atomicity
This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not use transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple"[24] during the early days of relational database commercialization. The warning largely went unheeded, however, and fell by the wayside upon the advent of RAID, which many software engineers mistook for a solution to all data storage integrity and reliability problems. Many software programs update a storage object "in place"; that is, they write a new version of the object onto the same disk addresses as the old version. While the software may also log some delta information elsewhere, it expects the storage to present "atomic write semantics", meaning that the write of the data either occurred in its entirety or did not occur at all.
Since transactional support is not universally present in hardware RAID, many operating systems include transactional support to protect against data loss during an interrupted write. Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft introduced transaction tracking via the journaling feature in NTFS. Ext4 has journaling with checksums; ext3 has journaling without checksums, though an "append-only" option, or ext3cow (copy-on-write), can be used instead. If the journal itself in a filesystem is corrupted, though, this can be problematic. The journaling in NetApp's WAFL file system provides atomicity by never updating data in place, as does ZFS. An alternative method to journaling is soft updates, used in some BSD-derived systems' implementations of UFS.
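As a sketch of the alternative to updating in place, here is the classic write-then-rename pattern in C; the file names are illustrative, and a production implementation on POSIX systems would also fsync before the rename for durability.

#include <stdio.h>

/* Write a complete new version of an object, then atomically swap it
   for the old one. Readers see either the old or the new file in full,
   never a torn, half-written update. */
int main(void) {
    FILE *f = fopen("data.tmp", "w");
    if (!f) return 1;
    fputs("new version of the object\n", f);
    if (fclose(f) != 0) return 1;             /* flush buffered data */
    if (rename("data.tmp", "data.txt") != 0)  /* atomic replacement on POSIX */
        return 1;
    return 0;
}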
[*] Unrecoverable data
An unrecoverable bit error can present as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE (unrecoverable bit error) rate is typically specified at 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS), and 1 bit in 10^14 for desktop-class disk drives (IDE/ATA/PATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives. Double-protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
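A back-of-the-envelope sketch of why this happens; the drive count, sizes, and rates below are illustrative assumptions, not figures from this article.

#include <math.h>
#include <stdio.h>

/* Probability that a RAID 5 rebuild completes without hitting an
   unrecoverable bit error (UBE). Assumes four 2 TB desktop-class
   drives (UBE 1 in 10^14): rebuilding one failed drive requires
   reading the three survivors in full. */
int main(void) {
    double drive_bits = 2e12 * 8.0;       /* 2 TB per drive, in bits */
    double bits_read = 3.0 * drive_bits;  /* three surviving drives */
    double ube = 1e-14;                   /* desktop-class error rate */

    /* P(no error) = (1 - ube)^bits_read, approximately exp(-ube * bits_read) */
    double p_ok = exp(-ube * bits_read);
    printf("bits read during rebuild: %.1e\n", bits_read);
    printf("P(no UBE during rebuild) = about %.0f%%\n", 100.0 * p_ok);
    return 0;
}

Under these assumptions roughly one rebuild in three fails on an unrecoverable sector, which is the motivation for the double-parity schemes mentioned above.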
[*] Write cache reliability
The disk system can acknowledge a write operation as soon as the data is in the cache, without waiting for the data to be physically written. This typically occurs in old, non-journaled systems such as FAT32, or when the Linux/Unix "writeback" option is chosen without protections like the "soft updates" option (promoting I/O speed while trading away data reliability). A power outage or system hang, such as a BSOD, can then mean a significant loss of any data queued in such a cache.
[*] Equipment compatibility
[*] Data recovery in the event of a failed array
[*] Drive error recovery algorithms
Many modern drives have internal error recovery algorithms that can take upwards of a minute to recover and re-map data that the drive fails to read easily. Many RAID controllers, however, will drop a non-responsive drive in about 8 seconds. The array can thus drop a good drive simply because it was not given enough time to complete its internal error recovery procedure, leaving the rest of the array vulnerable. So-called enterprise-class drives limit the error recovery time to prevent this problem, but desktop drives can be quite risky for this reason. A fix is known for Western Digital drives: a utility called WDTLER.exe can limit the error recovery time of a Western Digital desktop drive so that it will not be dropped from the array, by enabling TLER (time-limited error recovery), which caps the error recovery time at 7 seconds. (Update: as of October 2009, Western Digital has locked out this feature in desktop drives such as the Caviar Black, and running the WDTLER program on them may actually damage the drive's firmware.) Western Digital enterprise-class drives are shipped from the factory with TLER enabled to prevent their being dropped from RAID arrays. Similar technologies are used by Seagate, Samsung, and other manufacturers.
[*] Increasing recovery time
Drive capacity has grown at a much faster rate than transfer speed, and error rates have fallen only a little in comparison. Larger-capacity drives may therefore take hours, if not days, to rebuild, and rebuild speed is further limited if the array remains in operation at reduced capacity.[26] Given a RAID array with only one disk of redundancy (RAID 3, 4, or 5), a second failure during this window would cause complete failure of the array; the longer the rebuild takes, the larger that window of vulnerability becomes, even when the drives' mean time between failures (MTBF) is high.[27]
[*] Other problems and viruses
[*] History
Norman Ken Ouchi at IBM was awarded a 1978 U.S. patent 4,092,732[30] titled "System for recovering data stored in failed memory unit." The claims for this patent describe what would later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that disk mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (that would later be termed RAID 4) were prior art at that time.
The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley, in 1987.
One of the early uses of RAID 0 and 1 was the Crosfield Electronics Studio 9500 page layout system based on the Python workstation. The Python workstation was a Crosfield managed international development using PERQ 3B electronics, benchMark Technology's Viper display system and Crosfield's own RAID and fibre-optic network controllers. RAID 0 was particularly important to these workstations as it dramatically speeded image manipulation for the pre-press markets. Volume production started in
[*] Non-RAID drive architectures
Main article: Non-RAID drive architectures
Non-RAID drive architectures also exist, and are often referred to, similarly to RAID, by standard acronyms, several tongue-in-cheek. A single drive is referred to as a SLED (Single Large Expensive Drive).