RAID principle: the advantages and disadvantages of different levels of RAID

September 25, 2020

1. Introduction


RAID stands for "Redundant Array of Independent Disks" and is sometimes also referred to as a disk array.


RAID combines multiple independent (physical) hard disks in different ways to form a hard disk group (a logical hard disk), which provides higher storage performance than a single hard disk and also provides data backup. The different ways of composing a disk array are called RAID levels.


RAID levels:


RAID 0, RAID 1, RAID 0+1 (also known as RAID 10), RAID 2, RAID 3, RAID 5, RAID 6, RAID 7, RAID 53.


Principle analysis


Why do we need disk arrays?


Disk array technology is gradually gaining recognition. It is divided into RAID levels 0 through 5, and newer combined levels such as RAID 10, 30, and 50 have since been developed. RAID originally stood for Redundant Array of Inexpensive Disks. Put simply, the advantages of using RAID are high security, high speed, and large data capacity.


Certain RAID levels can increase speed to 400% of that of a single hard drive. A disk array makes multiple hard disk drives work together, which greatly improves speed and at the same time raises the reliability of the disk system to a nearly error-free state. These "fault-tolerant" systems are both very fast and very reliable.


This article will discuss these new technologies and the advantages and disadvantages of different levels of RAID.


Hard disk data spanning (Spanning)


Data spanning technology makes multiple hard disks work like one hard disk, which enables users to break through the existing hard disk space limitations cheaply by combining existing resources or adding some resources.
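
As a rough illustration of how spanning makes several disks behave like one, the sketch below maps a logical block number to a physical disk. It assumes simple concatenation (disk 0 is filled first, then disk 1, and so on); the function name `locate_spanned_block` and the block-count representation are hypothetical and not taken from any particular product.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_spanned_block(logical_block, disk_sizes):
    """Return (disk index, block on that disk) for a spanned volume.

    disk_sizes lists the capacity of each disk in blocks. Assumption:
    the disks are simply concatenated in the order given.
    """
    boundaries = list(accumulate(disk_sizes))  # cumulative end of each disk
    if not 0 <= logical_block < boundaries[-1]:
        raise ValueError("logical block outside the spanned volume")
    disk = bisect_right(boundaries, logical_block)
    first_block_on_disk = boundaries[disk] - disk_sizes[disk]
    return disk, logical_block - first_block_on_disk
```

With disks of 100, 50, and 200 blocks, the logical volume has 350 blocks, and logical block 100 is the first block of the second disk. Note that spanning alone adds capacity only: a request for one block still goes to a single disk.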


The following RAID forms are commonly used:


(1) RAID 0


RAID 0 is also called Stripe or Striping, and it offers the highest storage performance of all RAID levels. RAID 0 improves storage performance by distributing consecutive data across multiple disks. In this way, a data request from the system can be executed by multiple disks in parallel, with each disk handling its own part of the request. This parallel operation makes full use of the bus bandwidth and significantly improves overall disk access performance.

RAID 0 structure diagram


As shown in the figure, an I/O request sent by the system to the logical hard disk (a RAID 0 disk group) composed of four disks is converted into four operations, each of which goes to one physical hard disk. The figure makes clear that, with RAID 0 in place, what was originally a sequential data request is distributed across all four hard drives for simultaneous execution. In theory, the parallel operation of four hard disks increases read and write speed fourfold.


In practice, factors such as bus bandwidth keep the actual gain below the theoretical value. Even so, compared with transferring a large amount of data serially, the speed improvement is obvious.


The disadvantage of RAID 0 is that it does not provide data redundancy, so once user data is damaged, the damaged data cannot be recovered. The characteristics of RAID 0 make it especially suitable for areas that require high performance but don't care about data security, such as graphics workstations. For individual users, RAID 0 is also an excellent choice to improve hard disk storage performance.
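
The striping described above can be sketched as a simple address calculation. This is a minimal sketch that assumes round-robin placement with a stripe unit of one block; the names `stripe_location` and `split_request` are hypothetical, and real controllers typically use larger, configurable stripe units.

```python
def stripe_location(logical_block, num_disks):
    """Return (disk index, block on that disk) under round-robin striping."""
    if logical_block < 0 or num_disks < 1:
        raise ValueError("invalid block number or disk count")
    return logical_block % num_disks, logical_block // num_disks


def split_request(first_block, count, num_disks):
    """Group a run of consecutive logical blocks by the disk that serves them."""
    per_disk = {disk: [] for disk in range(num_disks)}
    for block in range(first_block, first_block + count):
        disk, offset = stripe_location(block, num_disks)
        per_disk[disk].append(offset)
    return per_disk
```

For example, `split_request(0, 8, 4)` gives each of the four disks two blocks to transfer, which is why the four disks can work on one sequential request at the same time. Nothing in this mapping stores a second copy of any block, which matches the lack of redundancy noted above.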


(2) RAID 1


RAID 1 is also called Mirror or Mirroring. Its purpose is to maximize the availability and repairability of user data. RAID 1 works by automatically copying 100% of the data that the user writes to one hard disk onto another hard disk.

RAID 1 structure diagram


As shown in the figure, when reading data the system first reads from the source disk of the RAID 1 pair. If the read succeeds, the system ignores the data on the backup disk; if the read from the source disk fails, the system automatically reads the data from the backup disk without interrupting the user's work. Of course, the damaged hard disk should be replaced promptly and the mirror re-established from the backup data, to avoid irreparable data loss if the backup disk is damaged as well.


As 100% of the stored data is backed up, among all RAID levels, RAID 1 provides the highest data security. Similarly, because 100% of the data is backed up, the backup data occupies half of the total storage space, so the disk space utilization of Mirror is low and the storage cost is high. Although Mirror cannot improve storage performance, its high data security makes it especially suitable for storing important data, such as server and database storage.
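
The read, failover, and rebuild behaviour described above can be modelled in a few lines. This is a toy in-memory sketch, not a driver: the class `MirroredPair` and its methods are hypothetical, and each "disk" is just a dictionary from block number to data.

```python
class MirroredPair:
    """Toy model of a two-disk RAID 1 mirror."""

    def __init__(self):
        self.disks = [{}, {}]          # disk 0 = source, disk 1 = backup
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to all healthy disks, so both hold 100% of the data.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block):
        # Read from the first healthy disk; fall back to the other on failure.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both disks have failed; data is lost")

    def fail(self, index):
        self.failed[index] = True
        self.disks[index] = {}         # contents of the failed disk are gone

    def rebuild(self, index):
        # Re-establish the mirror on a replacement disk from the survivor.
        survivor = 1 - index
        if self.failed[survivor]:
            raise IOError("no healthy disk to rebuild from")
        self.disks[index] = dict(self.disks[survivor])
        self.failed[index] = False
```

In this model, a read still succeeds after `fail(0)` because the backup disk answers it, and calling `rebuild(0)` restores the protection against a later failure of disk 1.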


(3) RAID 0+1


Just like its name, RAID 0+1 is a combination of RAID 0 and RAID 1, also known as RAID 10.


Taking a RAID 0+1 array composed of four disks as an example, the data storage method is shown in the figure. RAID 0+1 is a solution that takes both storage performance and data security into account: it provides the same data security as RAID 1 together with storage performance similar to RAID 0.


Because RAID 0+1 also provides data security through the 100% data backup function, the disk space utilization of RAID 0+1 is the same as that of RAID 1, and the storage cost is high.

RAID-10 structure diagram


The characteristics of RAID 0+1 make it particularly suitable for areas that require access to a large amount of data and require strict data security, such as banking, finance, commercial supermarkets, warehouses, and various file management.
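
To illustrate how a four-disk array can combine striping with mirroring, the sketch below lists the physical locations that hold a given logical block. It assumes one particular arrangement (data striped across mirrored pairs, with disks 0 and 1 forming the first pair and disks 2 and 3 the second); the figure may number the disks differently, and `mirrored_stripe_targets` is a hypothetical name.

```python
def mirrored_stripe_targets(logical_block, num_disks=4):
    """Return the two (disk, block on disk) locations that store a logical block.

    Assumption: disks (0, 1), (2, 3), ... form mirrored pairs, and data is
    striped round-robin across the pairs with a stripe unit of one block.
    """
    if num_disks < 2 or num_disks % 2 != 0:
        raise ValueError("an even number of disks (at least 2) is required")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    pairs = num_disks // 2
    pair = logical_block % pairs
    offset = logical_block // pairs
    return [(2 * pair, offset), (2 * pair + 1, offset)]
```

Consecutive logical blocks alternate between the two pairs, which gives the RAID 0 style parallelism, while each block exists on two disks, which gives the RAID 1 style protection and the same 50% space utilization.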


(4) RAID 3


RAID 3 divides the data into multiple "blocks" and stores them on N+1 hard disks according to a fault-tolerance algorithm. The effective space available for actual data is the combined space of N hard disks, and the (N+1)th hard disk stores the parity (error-tolerance) information. When one of the N+1 hard disks fails, the original data can be restored from the data on the other N hard disks. In this way the array can continue to work in a degraded state using only those N hard disks (for example, to keep capturing and playing back material). Once the failed disk is replaced, the system can rebuild the complete parity and fault-tolerance information. Because the probability of more than one hard disk in an array failing at the same time is very small, RAID 3 can guarantee data security under normal circumstances.
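
The recovery step described above can be sketched with XOR parity, which is the scheme conventionally used for a dedicated parity disk; the text itself only says "a fault-tolerance algorithm", so treat XOR as an assumption here. The helper names `xor_blocks`, `make_parity`, and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block stored on the (N+1)th disk for one stripe of N data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over N = 3 data disks, with parity on a fourth disk.
stripe = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = make_parity(stripe)

# Suppose the second data disk fails: its block is rebuilt from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
assert recovered == stripe[1]
```

The same identity explains why only one failure can be tolerated: with two blocks missing from a stripe, a single parity block does not carry enough information to separate them.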

RAID 3 structure diagram


Compared with RAID 0, RAID 3 is relatively slow in terms of read and write speed. The fault-tolerance algorithm and the block size determine which applications RAID 3 suits. Under normal circumstances, RAID 3 is best suited to applications with large files and high security requirements, such as video editing, hard disk broadcast systems, and large databases.


(5) RAID 5


RAID 5 is a storage solution that balances storage performance, data security, and storage cost.


Taking a RAID 5 array composed of four hard disks as an example, its data storage mode is shown in Figure 4. In the figure, P0 is the parity information of D0, D1, and D2, and so on. As the figure shows, RAID 5 does not keep a backup copy of the stored data. Instead, it stores the data and the corresponding parity information across all the disks that make up the array, with each piece of parity information stored on a different disk from the data it protects. When the data on one RAID 5 disk is damaged, the remaining data and the corresponding parity information are used to recover it.
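
One way to picture the placement described above is to generate the stripe layout. The sketch below assumes a rotation in which the parity block starts on the last disk and moves one disk to the left with each stripe; the figure may use a different rotation, and `raid5_layout` is a hypothetical name. The labels D0, D1, ... and P0, P1, ... follow the naming used in the text.

```python
def raid5_layout(num_stripes, num_disks=4):
    """Return one row per stripe, listing what each disk stores in that stripe."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows = []
    next_data_block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity on the last disk first, then shifting left.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data_block}")
                next_data_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(4):
    print("  ".join(f"{cell:<4}" for cell in row))
```

In the first row, P0 sits alongside D0, D1, and D2, matching the description above, and no disk ever holds both a data block and the parity for that same stripe. Spreading the parity blocks over all disks also avoids concentrating every parity write on one drive.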

RAID 5 structure diagram


RAID 5 can be understood as a compromise between RAID 0 and RAID 1. It provides data security for the system, but with a lower degree of protection than mirroring and a higher disk space utilization. RAID 5 reads data at a speed similar to RAID 0, but because the parity information must also be written, writing is slightly slower than writing to a single disk. At the same time, because several data blocks share one piece of parity information, RAID 5 uses disk space more efficiently than RAID 1, and its storage cost is relatively low.


(6) RAID 6


RAID 6 is a RAID method designed to strengthen data protection further on the basis of RAID 5; it is in effect an extended RAID 5. The difference from RAID 5 is that, in addition to the XOR check area for the data at the same level on each hard disk, there is also an XOR check area for each data block. Of course, the check data for the data blocks of a given disk cannot be stored on that same disk, so it is interleaved across the others. The specific form is shown in the figure.


In this way, each data block is protected by two checks (one per level and one overall), so the data redundancy of RAID 6 is very good. However, because of the additional parity, write efficiency is worse than that of RAID 5, and the control system is more complicated to design. The second parity area also reduces the effective storage space. Because the extra parity offers a limited advantage while performance and cost-effectiveness are noticeably worse than RAID 5, RAID 6 was long regarded more as an experiment in more advanced data redundancy than as a widely deployed level.
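
The space trade-offs mentioned for the different levels can be compared with a small calculation. This is a sketch under stated assumptions: all disks are the same size, RAID 3 and RAID 5 give up one disk's worth of space to parity, RAID 6 gives up two, and RAID 1 and RAID 0+1 keep one full copy. The function `usable_fraction` is a hypothetical helper.

```python
from fractions import Fraction


def usable_fraction(level, num_disks):
    """Fraction of raw capacity available for user data (equal-sized disks)."""
    minimum = {"0": 2, "1": 2, "0+1": 4, "3": 3, "5": 3, "6": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return Fraction(1)                          # no redundancy at all
    if level in ("1", "0+1"):
        return Fraction(1, 2)                       # one full mirror copy
    if level in ("3", "5"):
        return Fraction(num_disks - 1, num_disks)   # one disk's worth of parity
    return Fraction(num_disks - 2, num_disks)       # RAID 6: two disks' worth


for level in ("0", "0+1", "5", "6"):
    print(f"RAID {level}: {usable_fraction(level, 4)} of raw capacity usable")
```

With four disks, RAID 5 keeps three quarters of the raw capacity while RAID 6 keeps half, the same as mirroring. With more disks the gap to mirroring widens in favour of the parity-based levels.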

RAID-6 structure diagram


(7) RAID 7


The RAID 7 level is, in theory, the highest-performance RAID mode to date, because it is constructed in a fundamentally different way. The basic form is shown in the figure. In the earlier levels, a single hard disk is one "pillar" of the array; in RAID 7, multiple hard disks together form one "pillar", and each of them has its own channel. You can therefore read the figure as individual hard drives connected to the main channel, only more finely subdivided than in the previous levels. The advantage is that, when reading or writing data in a certain area, the array can locate it quickly, rather than being limited, as before, by a single hard disk that can access only one part of the data area at a time. In RAID 7, what used to be a single hard disk is in effect divided into multiple independent hard disks, each with its own read and write channel, so the gain in efficiency is self-evident.


However, the design of RAID 7 and the scale of the corresponding configuration mean that it is bound to be a turnkey solution. Generally speaking, RAID 7 is a complete system with its own operating system, its own processor, and its own bus, rather than a simple plug-in card. In summary, the main features of RAID 7 are as follows:


All I/O transfers are asynchronous, because each has its own independent controller and cached interface and is not synchronized with the system clock


All read and write operations pass through a high-speed system bus with a central Cache, which is called the X-Bus


The dedicated parity hard disk can be on any channel


A fully functional real-time operating system is embedded in the array control microprocessor. This is the heart of RAID 7: it handles communication between the channels and Cache management, which is one of the biggest differences between RAID 7 and the other levels


Connectivity: up to 12 host interfaces


Scalability: linear capacity can be increased to 48 hard drives


Open system, using standard SCSI hard disk, standard PC bus, motherboard and SIMM memory


High-speed, integrated Cache data bus (the X-bus mentioned above)


Complete verification generation work inside Cache


Multiple additional drives can be kept on hot standby at any time to improve redundancy and flexibility


Easy to manage: SNMP (Simple Network Management Protocol) allows administrators to monitor and control the system remotely


According to the designers of RAID 7, this kind of array improves write I/O performance by 150-600% compared with other RAID levels, although this claim has caused considerable controversy.

RAID-7 structure diagram


(8) RAID 53


Like RAID 10, RAID 53 is a combined RAID level, but do not apply the RAID 10 naming logic and assume that it is a combination of RAID 5 and RAID 3. In fact, RAID 53 should be called RAID 30 or RAID 03 (or RAID 0+3), that is, a combination of RAID 3 and RAID 0. The specific form is shown in the figure. Compared with Figure 1, the backup level in RAID 53 changes from RAID 0 to RAID 3, which means that the original mirrored array becomes a segmented (Segments) storage array. Rather than applying a RAID 3 system to each RAID 0 hard drive, it uses RAID 3 for the redundant storage (parity) of all data, and its read, write, and ECC efficiency is much higher than that of RAID 0.


It is worth noting that RAID 3 plays a very important role in RAID 53 data transfer. RAID 3 offers a high transfer rate for large reads and writes, so when handling large data throughput, RAID 53 performs better than RAID 10 (because the time spent on redundant backup is shortened). Moreover, thanks to RAID 0, its I/O bandwidth is not reduced. However, its configuration shows that its storage space utilization is lower than that of RAID 10, at 40%.