Redundant array of independent (or inexpensive) disks (RAID) is one of the oldest storage technologies, yet it is far from outdated. It is used to improve storage systems by increasing the performance, reliability and availability of stored data.
What is RAID?
RAID is a data storage virtualization technology that has been used for many years to combine multiple physical disk drive components into one or more logical units, for the purpose of data redundancy, performance improvement, or both. In its redundant configurations it stores the same data in different places on multiple hard drives or solid-state drives to protect it against drive failure or physical damage. Data is normally distributed across the drives in one of several ways, referred to as RAID levels, depending on the required level of redundancy and performance. Each scheme, or data distribution layout, is named by the word RAID followed by a number, for example RAID 0 or RAID 5.
Each RAID level provides a different balance between reliability, availability, performance and capacity. The levels are built from three techniques: mirroring, striping and parity. Depending on its requirements, a level may combine striping with parity or striping with mirroring, while others use only one technique. Striping can be done at different granularities, ranging from bit level to block level. Mirroring replicates the data written to a drive, creating an exact copy on another drive. Parity, on the other hand, adds check information computed from the data. In its simplest form, a single check bit is added to a string of binary code so that the number of 1 bits is always even (or always odd); if the receiving end finds a different parity, an error is detected and the data should be resent. In RAID, parity is calculated across the drives so that the contents of a failed drive can be rebuilt from the remaining ones.
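To make the check-bit idea concrete, here is a minimal sketch of even parity on a short bit string. The function names are made up for this illustration and do not come from any RAID tool; the sketch only shows how a single flipped bit is detected, not how a RAID controller stores parity.

```python
def even_parity_bit(bits: str) -> str:
    """Return the check bit that makes the total number of 1s even."""
    return str(bits.count("1") % 2)


def add_parity(bits: str) -> str:
    """Append the even-parity check bit to a string of binary digits."""
    return bits + even_parity_bit(bits)


def is_valid(codeword: str) -> bool:
    """A codeword is valid when it holds an even number of 1s."""
    return codeword.count("1") % 2 == 0


sent = add_parity("1011001")   # four 1s, so the check bit is 0
corrupted = "0" + sent[1:]     # flip the first bit "in transit"

print(sent, is_valid(sent))            # 10110010 True
print(corrupted, is_valid(corrupted))  # 00110010 False
```

A single parity bit can only tell that something changed, not which bit changed, and it misses errors that flip an even number of bits.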
Types of RAID
Depending on how the array is administered, there are two types of RAID controller deployment: hardware based and software based. A RAID controller is a storage component that manages the disk drives in a RAID infrastructure and presents the physical disk drives as a logical unit to the computer or server managing it. This helps improve performance and also protects data in case of a crash. In hardware-based RAID, a physical controller manages the entire array. This can be either an on-board chip or a RAID controller card that interfaces through a PCIe expansion slot.
With software-based RAID, the RAID services are delivered by the host, using the resources of the hardware system such as the central processing unit and the main memory. Although it performs the same function as hardware-based RAID, it generally does not deliver as much of a performance boost. It comes in two categories: pure software, and a hybrid of software and hardware. The hybrid exists because pure software RAID can affect the performance of other applications, while hardware-based RAID can be too expensive. Hybrid, or firmware, RAID is only implemented at the beginning of the boot process of the OS, with the RAID functions delivered from the RAID BIOS on the motherboard.
RAID vs Backup
The purpose of RAID is to provide redundancy. If one disk fails, the other drives essentially take over until the failed drive is replaced. There are a good number of RAID schemes, or levels, available; the most common ones are RAID 0, 1, 5, 6 and 10. The redundant levels mainly protect against data loss caused by physical damage to, or the loss of, a drive, because the data can be recovered from the other drives, which hold either an exact copy of the information or the parity needed to rebuild it. Backup, on the other hand, provides recovery from data corruption and accidental loss. It comes with many options, such as secondary local storage, backup servers or cloud services. Unlike RAID, which is associated with redundancy, backup is about keeping a separate copy that can be restored later.
Comparing the two, the main difference is the type of data recovery that each can perform. With backup alone, you cannot recover data if the drive holding the backup is physically damaged or stolen, and this is where RAID becomes important. With RAID alone, on the other hand, if you accidentally delete a file on one drive, the deletion is applied across the entire array, making the file impossible to retrieve from the array; with backup it can easily be restored from the backed-up files. Although there are constant debates about which is best, I recommend a hybrid of both for better performance and reliability.
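The difference between the two kinds of recovery can be shown with a toy model. This is only a sketch under simple assumptions: the `Mirror` class is hypothetical, each drive is a plain dictionary, and the backup is a copy taken at one point in time.

```python
import copy


class Mirror:
    """Toy two-drive mirror: every write or delete hits all working drives."""

    def __init__(self):
        self.drives = [{}, {}]

    def write(self, name, data):
        for drive in self.drives:
            if drive is not None:
                drive[name] = data

    def delete(self, name):
        for drive in self.drives:
            if drive is not None:
                drive.pop(name, None)

    def fail_drive(self, index):
        self.drives[index] = None  # simulate a dead drive

    def read(self, name):
        for drive in self.drives:
            if drive is not None and name in drive:
                return drive[name]
        return None


mirror = Mirror()
mirror.write("report.txt", "v1")
backup = copy.deepcopy(mirror.drives[0])  # separate point-in-time copy

mirror.fail_drive(0)
print(mirror.read("report.txt"))  # v1   (the mirror survives a drive failure)

mirror.delete("report.txt")
print(mirror.read("report.txt"))  # None (the deletion reached every drive)
print(backup["report.txt"])       # v1   (the backup still has the file)
```

The mirror covers the drive failure and the backup covers the accidental deletion, which is the reasoning behind using both.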
Many RAID levels employ an error protection scheme called parity, which is a widely used method of providing fault tolerance for a given data set and offers some level of data security. Originally there were only a few levels of RAID, but many variations have evolved over the years, which now include nested and non-standard levels. Whether implemented in hardware or software, RAID is available in different levels, or schemes. These include:
- RAID 0: consists of striping, with no mirroring or parity. It splits files and stripes the data across two or more disks, and the striped data is treated as a single partition. Throughput is generally faster, as reading and writing are done on multiple disks at the same time. The main disadvantage is that it isn't fault tolerant: if one drive fails, the entire volume and all its files are lost.
- RAID 1: consists of data mirroring, with no parity or striping. Data is written identically to two or more disks, producing a mirrored set of drives. The main advantage is that data can be recovered when one disk fails: once the failed drive is replaced, the data is automatically mirrored back onto the new drive.
- RAID 2: uses bit-level striping, with Hamming code parity for error detection and correction. All disk spindles rotate in synchrony, and data is striped so that each sequential bit is on a different drive. The major disadvantage is that it requires extra drives for the error-correction code, which makes the structure complex and expensive.
- RAID 3: unlike RAID 2, it uses byte-level striping with dedicated parity. Striping data across multiple disks gives it an edge when transferring data in bulk, along with the ability to access data in parallel. The disadvantages are that an additional drive is required for parity and that it slows down under random-access workloads.
- RAID 4: consists of block-level striping with a dedicated parity disk. The striping provides high performance for random reads. Because RAID 4 needs to write all parity data to one disk, random write performance suffers.
- RAID 5: uses block-level striping with distributed parity. Unlike RAID 4, the parity is distributed among all the drives. This RAID level requires at least three disk drives, and all drives but one must be present for the array to operate. It is often described as combining the performance of RAID 0 with the redundancy of RAID 1. When one drive fails, subsequent reads can be calculated from the distributed parity, so the data can be restored.
- RAID 6: like RAID 5, it uses block-level striping with distributed parity, but with double parity, providing fault tolerance for up to two drive failures. It requires a minimum of four disks in its array. It offers higher redundancy and increased read performance. Note that the larger the drive capacities and the larger the array, the more important it becomes to use RAID 6.
- Nested or hybrid RAID: these are RAID levels embedded within other RAID levels, since the elements of an array can be either individual drives or arrays themselves. They are rarely nested more than one level deep, and the final array is known as the top array. Take the nested levels RAID (1 + 0) and RAID (0 + 1): the first involves mirroring and then striping, meaning the top level of RAID (1 + 0) is RAID 0; the second involves striping and then mirroring, making RAID 1 the top level of RAID (0 + 1). Bear in mind that the plus sign is usually omitted, so RAID (1 + 0) is written RAID 10 while RAID (0 + 1) is written RAID 01.
- Nonstandard RAID: these are levels developed by companies or open-source projects, often for proprietary use, which may differ from the standard RAID levels.
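The way parity-based levels such as RAID 5 recover from a single drive failure can be sketched with XOR parity. This is an illustrative model, not a real implementation: the function names are invented for this example, it handles a single stripe, and it keeps the parity block in the last position, whereas RAID 5 rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: bytes, data_drives: int) -> List[bytes]:
    """Split data into one block per data drive and append a parity block."""
    block_size = -(-len(data) // data_drives)  # ceiling division
    padded = data.ljust(block_size * data_drives, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(data_drives)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recompute the single missing block (marked None) from the survivors."""
    missing = stripe.index(None)
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing] = xor_blocks(survivors)
    return repaired


stripe = make_stripe(b"RAIDDATA!", data_drives=3)  # 3 data blocks + 1 parity
damaged = list(stripe)
damaged[1] = None                                  # the second drive fails

repaired = rebuild(damaged)
print(repaired[1])         # b'DDA'
print(repaired == stripe)  # True
```

Because XOR is its own inverse, XOR-ing every surviving block (data and parity alike) yields the missing one. The same property explains why a single parity block cannot recover from two simultaneous failures, which is the gap RAID 6 closes with its second parity.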
The main benefit of RAID is its increased fault tolerance: by storing data redundantly, it increases the mean time between failures. Using multiple drives also increases availability, resiliency, reliability and speed compared with what a single drive can do, and combining RAID with backup reduces the chance of losing data to almost zero. Like anything else, RAID has its own disadvantages and limitations. The cost of implementing a nested RAID can be very high. In addition, data in the array becomes more vulnerable if a failed drive is not replaced promptly, because the remaining drives may themselves develop bad sectors before the array is rebuilt.