Before you start
Objectives: Learn what RAID is and how the most important RAID levels work.
Prerequisites: no prerequisites
Key terms: disk, data, hard disk, striping, parity, controller, mirroring, JBOD
Redundant Array of Independent Disks (RAID) is a disk subsystem that combines multiple physical disks into a single logical storage unit. Depending on the configuration, a RAID array can improve performance, provide fault tolerance, or both. There are two ways to implement RAID. In most cases we purchase a RAID controller card and install it in our system. Many servers come with a RAID controller built into the system, so in that case we don’t have to install a card at all.
We can also implement RAID in software. Most network operating systems support software RAID. Instead of using a hardware card to manage the array, the operating system uses the system CPU to manage it. Software RAID works well, it is easy to implement, and it is less expensive because we don’t have to buy a RAID card. However, it is usually slower than hardware RAID. With hardware RAID, a chip on the RAID card is dedicated to managing the array, while with software RAID the CPU has to spend its own time on RAID operations that would otherwise be handled by that chip.
When we combine multiple disks in a RAID array, the operating system sees them as one logical hard disk drive. It does not distinguish between the individual drives, even though the array is composed of multiple physical hard disk drives.
Striping in RAID
When we implement striping in a RAID array, we take the data being written and split it into multiple parts so that the parts can be written to multiple hard drives at the same time. For example, we can set up a RAID array that contains two hard disk drives and stripe data across them.
332.1 – Striping
Data comes from the operating system. We split that data into chunks, write some of the chunks to Disk 1 and write the rest to Disk 2. The key benefit of striping is speed: two drives, each with its own heads, write their halves of the data simultaneously. Theoretically we should be able to write data twice as fast. In reality it is not quite twice as fast, but it is close. The same applies to reading: instead of reading all the data from one hard disk drive, we read from two drives at the same time. The problem with striping is that, used alone, it does not add fault tolerance. In fact, it reduces fault tolerance, because every disk in the array becomes a point of failure. If we lose any one hard disk drive, all of our data is lost, since every file is split across both disks. Therefore, RAID includes a couple of other concepts that make data redundant rather than just faster. The first of these is mirroring.
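To make the idea concrete, here is a minimal Python sketch of striping. It is only an illustration, not how a real controller works: the "disks" are plain lists, and the 4-byte chunk size (CHUNK_SIZE) and the function names are hypothetical choices for the example. Chunks are written round-robin, so with two disks every other chunk lands on Disk 2.

```python
# Minimal sketch of RAID 0-style striping. "Disks" are plain Python lists of
# chunks; CHUNK_SIZE is a hypothetical value chosen for illustration only.
CHUNK_SIZE = 4  # bytes per chunk (real arrays use much larger stripe units)


def stripe_write(data: bytes, num_disks: int) -> list[list[bytes]]:
    """Split data into chunks and write them round-robin across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        disks[(i // CHUNK_SIZE) % num_disks].append(chunk)
    return disks


def stripe_read(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading chunks back in the same round-robin order."""
    num_disks = len(disks)
    total = sum(len(d) for d in disks)
    out = [disks[n % num_disks][n // num_disks] for n in range(total)]
    return b"".join(out)


disks = stripe_write(b"HELLO RAID WORLD", num_disks=2)
print(disks)  # [[b'HELL', b'ID W'], [b'O RA', b'ORLD']]
print(stripe_read(disks))  # b'HELLO RAID WORLD'
```

Reading the file back needs every disk, which is exactly why losing one disk loses the whole file.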
Mirroring in RAID
With mirroring, we connect two hard disk drives together in an array. When we set up a mirrored array, the data being written is not split between the disks. Instead, each piece of data is written as a duplicate copy to both hard disk drives at the same time, so each disk holds a full copy of the data.
332.2 – Mirroring
Mirroring does not increase write speed the way striping does (some implementations can spread reads across both disks, but speed is not the purpose of mirroring). Instead, it provides redundancy. If something goes wrong with Disk 1, everything is still OK, because an exact copy of the data exists on Disk 2. Most RAID systems can be configured so that if one drive in a mirrored array fails, the other drive automatically takes over. The system stays up, users keep working with the second hard disk drive, and they will not notice any difference. Mirroring still has a single point of failure: the RAID controller that manages the mirrored array. If the RAID controller fails, we lose access to all of our data. To get around this, we can implement duplexing.
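The behaviour described above can be sketched in a few lines of Python. This is a simplified model, assuming each "disk" is a dictionary from block number to contents and that a failed disk is represented by None; the function names are hypothetical. Every write goes to all working disks, and a read can be served by any surviving copy.

```python
# Minimal sketch of RAID 1-style mirroring. Each "disk" is a dict mapping a
# block number to its contents; None marks a failed disk. Names are illustrative.
from typing import Optional

Disk = Optional[dict[int, bytes]]


def mirror_write(disks: list[Disk], block: int, data: bytes) -> None:
    """Write the same block, unchanged, to every working disk in the mirror."""
    for disk in disks:
        if disk is not None:
            disk[block] = data


def mirror_read(disks: list[Disk], block: int) -> bytes:
    """Read the block from the first working disk; any surviving copy will do."""
    for disk in disks:
        if disk is not None:
            return disk[block]
    raise IOError("all disks in the mirror have failed")


mirror: list[Disk] = [{}, {}]
mirror_write(mirror, 0, b"payroll")
mirror[0] = None  # simulate Disk 1 failing
print(mirror_read(mirror, 0))  # b'payroll', served from Disk 2
```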
Duplexing in RAID
Duplexing is similar to mirroring. We still have two hard disk drives, but instead of connecting both of them to a single controller, we connect one drive in the mirrored array to one RAID controller and the other drive to a different RAID controller.
332.3 – Duplexing
When data needs to be written, it is written to both disks at the same time through two different RAID controller boards. If one RAID controller fails, disk I/O continues: the second RAID controller keeps servicing requests using the second hard disk drive, and users won’t notice any difference.
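Here is a minimal Python sketch of duplexing, assuming a hypothetical Controller class that owns exactly one disk and is either online or offline. Writes go through every controller that is still online, and reads are served through whichever controller survives, so losing one controller does not stop disk I/O.

```python
# Minimal sketch of duplexing: a mirror whose two disks sit behind two separate
# controllers. The Controller class is hypothetical and only models "online or not".
class Controller:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.disk: dict[int, bytes] = {}  # the single disk attached to this controller

    def write(self, block: int, data: bytes) -> None:
        if not self.online:
            raise IOError(f"{self.name} is offline")
        self.disk[block] = data

    def read(self, block: int) -> bytes:
        if not self.online:
            raise IOError(f"{self.name} is offline")
        return self.disk[block]


def duplex_write(controllers: list[Controller], block: int, data: bytes) -> None:
    """Send the same write through every controller that is still online."""
    written = 0
    for c in controllers:
        if c.online:
            c.write(block, data)
            written += 1
    if written == 0:
        raise IOError("no controller available")


def duplex_read(controllers: list[Controller], block: int) -> bytes:
    """Serve the read through the first controller that is still online."""
    for c in controllers:
        if c.online:
            return c.read(block)
    raise IOError("no controller available")


a, b = Controller("controller A"), Controller("controller B")
duplex_write([a, b], 7, b"orders")
a.online = False  # controller A fails, so its disk becomes unreachable
print(duplex_read([a, b], 7))  # b'orders', served through controller B
```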
Parity in RAID
Mirroring and duplexing work well, but we lose the speed advantage of striping. To get both speed and redundancy, we can use parity. To set up parity, we first create a striped array. Let’s say we stripe across two disks. We then add a third drive to the array.
332.4 – Parity
Depending on the RAID level, parity is stored either on one dedicated disk or in portions spread across all three disks. In our example, the third disk holds the parity information. Parity information can be used to reconstruct data if one of the disks in the striped array fails. This gives us a striped array with some redundancy: if one disk goes down, we can reconstruct the missing data from the remaining disk and the parity information.
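A common way to compute parity, and the one RAID 5 uses, is bitwise XOR. The Python sketch below assumes two equal-length data blocks (the names disk1, disk2 and xor_blocks are illustrative). XOR-ing the surviving data block with the parity block gives back the missing one, because any value XOR-ed with itself cancels out.

```python
# Minimal sketch of XOR parity for a two-disk stripe plus one parity disk.
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


disk1 = b"DATA"  # first part of the stripe
disk2 = b"MORE"  # second part of the stripe
parity = xor_blocks(disk1, disk2)  # stored on the third disk

# Disk 2 fails: XOR the surviving data block with the parity to rebuild it.
rebuilt = xor_blocks(disk1, parity)
print(rebuilt)  # b'MORE'
```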
We will talk about RAID levels 0, 1 and 5, which are the most important. Keep in mind that other RAID levels exist as well. Also note that a higher RAID level number doesn’t mean a better level. A RAID level only specifies a particular way of configuring an array; which level is best depends on what we need to achieve.
RAID 0 is simple striping of data across two or more hard disk drives. As such, RAID 0 increases the performance of both read and write operations. The disadvantage of striping is that there is no redundancy: if one drive dies, we lose all the data in the array, because our files are split across the disks. RAID 0 requires a minimum of two disks. It has no overhead, because all disk space is available for storing data.
RAID 1 is simple mirroring. As we discussed earlier, with mirroring we have two hard disk drives controlled by a single RAID controller, and the data is written to both drives as an exact duplicate. If one drive dies, the other drive can immediately take over and continue servicing clients. RAID 1 provides redundancy and fault tolerance, but it does not speed up writes. It requires a minimum of two disks. It has a 50% overhead, because data is written twice and half of the disk space stores the second copy. RAID 1 tolerates the failure of a single disk.
To set up a RAID 5 array, we need a minimum of three disks. We can use more, but three is the minimum. In RAID 5 we stripe across all of the disks, which improves read performance, but we reserve a portion of each disk for parity information. If a single disk fails, its data can be recovered from the data and parity stored on the remaining disks.
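The following Python sketch prints which disk holds parity in each stripe of a RAID 5 array. It assumes one simple rotation scheme (parity moves one disk to the left with each stripe); real controllers and software RAID implementations support several different layouts, so treat this as an illustration of the idea rather than any specific product’s format. "D" marks a data block and "P" a parity block.

```python
# Sketch of how RAID 5 spreads parity across all disks. This uses one simple
# rotation (parity moves one disk to the left per stripe); real implementations
# support several different layouts.
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows = []
    block = 0  # running number of the next data block
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")  # parity for this stripe
            else:
                row.append(f"D{block}")  # next data block
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=3, num_stripes=3):
    print("  ".join(f"{cell:>3}" for cell in row))
```

With three disks, stripe 0 places parity on the third disk, stripe 1 on the second and stripe 2 on the first, so no single disk holds all the parity.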
332.5 – RAID 5
RAID 5 works well. However, one of its weaknesses is performance after a disk failure. The parity information keeps the array running and we can still access our files, but any data that was on the failed disk has to be reconstructed from the remaining disks on every read, so performance drops significantly until the disk is replaced and the array is rebuilt. In normal operation RAID 5 improves read performance, but write operations are slower because of the time required to compute and write the parity information.
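The Python sketch below shows where both costs come from, assuming XOR parity and a "read-modify-write" update for a small write (controllers may choose other strategies, for example rewriting a whole stripe at once). Changing one data block needs four disk I/Os: read old data, read old parity, write new data, write new parity. Reading a block from a failed disk needs every surviving block in that stripe. The function names are hypothetical.

```python
# Sketch of why RAID 5 writes and degraded reads cost extra I/O. Assumes XOR
# parity and a read-modify-write strategy for small writes.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe: list[bytes], parity_index: int, data_index: int,
                new_data: bytes) -> int:
    """Update one data block in place and return the number of disk I/Os."""
    old_data = stripe[data_index]          # I/O 1: read old data
    old_parity = stripe[parity_index]      # I/O 2: read old parity
    new_parity = xor(xor(old_parity, old_data), new_data)  # compute new parity
    stripe[data_index] = new_data          # I/O 3: write new data
    stripe[parity_index] = new_parity      # I/O 4: write new parity
    return 4


def degraded_read(stripe: list[bytes], failed_index: int) -> tuple[bytes, int]:
    """Rebuild the block on a failed disk; return (data, disk reads needed)."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    rebuilt = survivors[0]
    for blk in survivors[1:]:
        rebuilt = xor(rebuilt, blk)
    return rebuilt, len(survivors)


d0, d1 = b"AAAA", b"BBBB"
stripe = [d0, d1, xor(d0, d1)]  # two data blocks plus parity on the third disk
ios = small_write(stripe, parity_index=2, data_index=0, new_data=b"CCCC")
data, reads = degraded_read(stripe, failed_index=0)  # the first disk has failed
print(ios, data, reads)  # 4 b'CCCC' 2
```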
Some RAID controllers support combined RAID levels. For example, RAID 0+1 is a striped array that is mirrored. Other combined configurations that might be supported include RAID 1+0, RAID 5+0 and RAID 5+1. In all RAID configurations, each disk contributes the same amount of space to the array. If the disks are of different sizes, each one contributes only as much space as the smallest disk. The remaining space on the larger drives can be used in other RAID sets or as traditional storage.
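As a quick check of the capacity rules for RAID 0, 1 and 5, here is a small Python calculator. It assumes sizes in GB, treats RAID 1 as all disks holding the same copy, and applies the smallest-disk rule to every level (some software RAID 0 implementations can use leftover space on larger disks, which this sketch ignores). The function name is hypothetical.

```python
# Sketch of usable capacity for RAID 0, 1 and 5. Every member disk contributes
# only as much space as the smallest disk in the array.
def usable_capacity(level: str, disk_sizes_gb: list[int]) -> int:
    n = len(disk_sizes_gb)
    smallest = min(disk_sizes_gb)
    if level == "0":
        if n < 2:
            raise ValueError("RAID 0 needs at least two disks")
        return n * smallest            # no overhead
    if level == "1":
        if n < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return smallest                # every disk holds the same copy
    if level == "5":
        if n < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (n - 1) * smallest      # one disk's worth of space goes to parity
    raise ValueError(f"unsupported level: {level}")


sizes = [500, 500, 1000]  # GB; only 500 GB of the 1000 GB disk is used
print(usable_capacity("0", sizes))       # 1500
print(usable_capacity("5", sizes))       # 1000
print(usable_capacity("1", [500, 500]))  # 500, i.e. 50% overhead
```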
JBOD (just a bunch of disks) is not a RAID configuration, but, like RAID, it combines multiple disks into a single logical storage unit. JBOD creates a single volume using space from two or more disks. Spanning is another term for JBOD, because the volume spans multiple physical disks. In a JBOD configuration, data is not striped across the disks. On a new JBOD volume, data is typically saved to the first disk until it is full, then to the second disk, and so on. The disks in a spanned volume can be of different sizes, and there is no overhead. However, JBOD provides no performance or fault tolerance benefits.
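The spanning idea can be sketched as an address mapping in Python: a logical offset in the volume is located by walking the disks in order, so the first disk is filled before the second one is used. The disk sizes and the function name below are hypothetical, and in practice the file system decides where files are actually placed.

```python
# Sketch of JBOD/spanning: a logical offset is mapped onto the disks in order,
# filling the first disk before moving to the next. Sizes are hypothetical.
def jbod_locate(disk_sizes: list[int], offset: int) -> tuple[int, int]:
    """Return (disk index, offset within that disk) for a logical offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for disk, size in enumerate(disk_sizes):
        if offset < size:
            return disk, offset
        offset -= size
    raise ValueError("offset beyond end of spanned volume")


sizes = [100, 250, 50]          # disks of different sizes are allowed
print(sum(sizes))               # 400: all space is usable, no overhead
print(jbod_locate(sizes, 99))   # (0, 99): still on the first disk
print(jbod_locate(sizes, 100))  # (1, 0): first unit of the second disk
print(jbod_locate(sizes, 360))  # (2, 10)
```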
RAID stands for Redundant Array of Independent Disks. RAID can be implemented with a hardware card or in software. With striping, we write different parts of a file to multiple disks. With mirroring, we write the same file to multiple disks. With duplexing, we mirror across multiple RAID controllers. Parity information can be used to reconstruct data if a disk in the array fails. RAID 0 is simple striping. RAID 1 is simple mirroring. In RAID 5, we stripe across multiple disks (minimum three) and reserve a portion of each disk for parity information. JBOD (just a bunch of disks) is not a RAID configuration, but, like RAID, it combines multiple disks into a single logical storage unit.