Posted March 26th, 2012 by Jason N
So after months of nothing, I finally come back to this website and I see 84 spam comments… I really should get some sort of spam filter, but since nobody ever really comments much anyway (0 real comments in total, I believe), I’ll just go ahead and delete all of the comments.
And now to the actual post…
Hard drives are fragile devices indeed: dropping them can break them, shaking them can break them, magnets can break them, and time alone can break them. It’s interesting how the devices we rely on so heavily for our data can fail at pretty much any moment.
So why do hard drives break, anyway?
It turns out that hard drives are actually mechanical devices: they consist of spinning platters that store data, which is read and written by a ‘read/write head’ that sits just above them. Because the head and platters move, too much shock or pressure on the drive can make the head hit and scratch the platters, and since the platters spin several thousand times per minute, the head quickly destroys the platter surface, rendering the drive ‘failed’ and useless.
So how do we protect our precious collection of pirated videos and games from such a failure?
One option is to have two drives where one is a mirror of the other, so that if one drive fails, the other continues to work. This is known as RAID1 and offers greater reliability than a single drive, although most normal users would rather use the two drives as separate storage than halve the available capacity.
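To make the mirroring idea concrete, here’s a toy Python sketch. It assumes each “drive” is just a list of blocks and that a failure is simulated with a flag; the `Mirror` class and its methods are hypothetical names for illustration, not how a real RAID controller works.

```python
# Toy model of RAID1 mirroring. Assumption: a "drive" is a Python list of
# blocks and a failure is simulated by a flag (hypothetical, for illustration).

class Mirror:
    """Two drives that always hold identical copies of every block."""

    def __init__(self):
        self.drives = [[], []]        # drive 0 and drive 1
        self.failed = [False, False]  # simulated failure flags

    def write(self, block):
        # Every write goes to both drives, so usable capacity is one drive's worth.
        for drive in self.drives:
            drive.append(block)

    def read(self, index):
        # Read from the first drive that is still working.
        for drive, failed in zip(self.drives, self.failed):
            if not failed:
                return drive[index]
        raise OSError("both drives have failed")


mirror = Mirror()
mirror.write("holiday.mkv")
mirror.failed[0] = True   # the first drive dies
print(mirror.read(0))     # still readable from the second drive
```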
But let’s now look at a different situation:
Over the years, you’ve collected so many videos and so much music that you’ve now spread your data across multiple drives. The bad thing about spreading data across multiple drives is that when one of those drives fails, its data becomes unrecoverable. To allow for data recovery, we can use RAID4 or RAID5, which allow the data to be recovered when one of the drives has failed.
RAID, which stands for Redundant Array of Independent Disks, allows data to be spread across multiple drives (an array). One immediate advantage is that reading and writing data can be much faster. Let’s take a look at how this works.
A single drive reads and writes data sequentially, i.e. to read or write a large piece of data, it has to handle it in parts, one after the other. Now imagine you have two drives… You can now read and write up to twice as fast. Why? Because you can read from or write to both drives at the same time, with each drive handling half of the parts. The principle remains the same for larger numbers of drives.
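Here’s a rough Python sketch of that splitting, usually called striping (it’s the idea behind RAID0). It assumes drives are plain lists and that data is dealt out in fixed 4-byte chunks; `stripe`, `unstripe` and `CHUNK_SIZE` are hypothetical names. It only shows where each chunk ends up; in a real array, each drive would handle its own chunks at the same time.

```python
# Sketch of striping. Assumptions: drives are lists, chunks are 4 bytes
# (CHUNK_SIZE is a hypothetical value chosen for readability).

CHUNK_SIZE = 4

def stripe(data: bytes, num_drives: int) -> list[list[bytes]]:
    """Deal consecutive chunks out to the drives in turn (round-robin)."""
    drives = [[] for _ in range(num_drives)]
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        drives[(i // CHUNK_SIZE) % num_drives].append(chunk)
    return drives

def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order."""
    out = []
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)

drives = stripe(b"ABCDEFGHIJKLMNOP", 2)
print(drives)            # [[b'ABCD', b'IJKL'], [b'EFGH', b'MNOP']]
print(unstripe(drives))  # b'ABCDEFGHIJKLMNOP'
```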
But now that you’ve spread the data across multiple drives, what happens when one disk fails? All of the data will be lost, because the remaining drive only contains half of the data the computer needs. So how do we allow for data recovery? The answer is to use three or more drives, along with what’s known as ‘parity’ data.
Parity bits are what allow you to recover the contents of a failed drive (as long as the others are still working). Let’s imagine that 4 bits are written across four data drives; say we write 1011. The first drive will contain a ‘1’, the second a ‘0’, the third a ‘1’ and the fourth also a ‘1’. Now let’s calculate the parity data. If we ‘XOR’ each drive’s bit with the next, we end up with the parity bit.
XOR (‘exclusive OR’) is a Boolean logic operator.
You use it as X XOR Y, where X and Y are each 1 or 0.
The XOR operation will return ‘1’ if:
X = 0 and Y = 1
or X = 1 and Y = 0
It will return ‘0’ if:
X = 1 and Y = 1
or X = 0 and Y = 0
In other words:
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0
0 XOR 0 = 0
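If you want to check the table yourself, Python’s `^` operator performs bitwise XOR, and on single bits it gives exactly the results above. The name `truth_table` is just for illustration.

```python
# Build the XOR truth table using Python's bitwise XOR operator (^).
truth_table = {(x, y): x ^ y for x in (0, 1) for y in (0, 1)}

for (x, y), result in truth_table.items():
    print(f"{x} XOR {y} = {result}")
```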
So if we XOR each drive’s bit with the next, we end up with this:
1 XOR 0 XOR 1 XOR 1
Let’s do this slowly so you understand:
We do this left to right, so we first take 1 XOR 0
1 XOR 0 = 1
Now we have this:
1 XOR 1 XOR 1
So we take 1 XOR 1
1 XOR 1 = 0
And now we have this:
0 XOR 1
Which equals 1, so our parity bit is 1.
Now let’s write that parity bit to a fifth drive, and we have a RAID4 setup.
Our data is now 10111.
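The same parity calculation can be sketched in Python with `functools.reduce` and `operator.xor`, which fold the XOR over the bits left to right just like the working above. The variable names are just for illustration.

```python
from functools import reduce
from operator import xor

data_bits = [1, 0, 1, 1]   # one bit on each of the four data drives

# XOR the bits left to right, like the step-by-step working:
# ((1 XOR 0) XOR 1) XOR 1
parity = reduce(xor, data_bits)

# The fifth drive holds the parity bit, giving the RAID4 layout 10111.
array = data_bits + [parity]
print(parity)                          # 1
print("".join(str(b) for b in array))  # 10111
```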
Imagine now that the second drive decides that it’s sick of your abuse and fails.
We then have the following data remaining: 1X111, where X is the missing data.
We can now recalculate the missing X by treating the parity bit as if it were the second drive and redoing the ‘parity calculation’ over the surviving bits:
1 XOR 1 XOR 1 XOR 1 = 0
And so the missing data was 0. Now you can replace the failed drive, rebuild its contents this way, and keep working with all of your data intact.
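The reason this works is that anything XORed with itself gives 0, so XORing the parity bit with the surviving data bits cancels them out and leaves only the missing bit. Here’s a small Python sketch of the rebuild, assuming the five-drive array from above is stored as a list of bits (`failed`, `survivors` and `recovered` are illustrative names).

```python
from functools import reduce
from operator import xor

# Four data bits plus the parity bit on the fifth drive.
array = [1, 0, 1, 1, 1]

failed = 1   # the second drive (counting from 0) has died
survivors = [bit for i, bit in enumerate(array) if i != failed]

# XOR of everything that survived (parity included) gives the lost bit.
recovered = reduce(xor, survivors)
print(recovered)   # 0
```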
This technique also works with larger amounts of data (i.e. you can XOR more than one bit at a time). So you could have 1011 XOR 1000 XOR 1010 XOR 1110 and use the result as the parity data.
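Here’s the multi-bit version as a Python sketch, treating each of the four example values above as a block on a different data drive. It also rebuilds one of them from the others plus the parity; which block is “lost” (`lost = 2`) is an arbitrary choice for the example.

```python
from functools import reduce
from operator import xor

# The four example values, one per data drive.
blocks = [0b1011, 0b1000, 0b1010, 0b1110]

# Bitwise XOR works on every bit position at once.
parity = reduce(xor, blocks)
print(f"{parity:04b}")    # 0111

# Lose one block and rebuild it from the others, starting from the parity.
lost = 2
rebuilt = reduce(xor, [b for i, b in enumerate(blocks) if i != lost], parity)
print(f"{rebuilt:04b}")   # 1010
```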
And that’s my explanation of RAID4.
RAID5 is similar and uses the same parity data as RAID4, but instead of having a dedicated parity drive, it spreads the parity data across all of the drives.
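To give a rough idea of what spreading the parity means, here’s a Python sketch that prints which drive holds the parity block (P) and which hold data (D) for each stripe. It assumes 4 drives and a simple rotation where parity moves one drive to the left on each stripe; `parity_drive` and `NUM_DRIVES` are hypothetical names, and this is just one possible layout, since real implementations use several different rotation schemes.

```python
# Assumed layout: 4 drives, parity moves one drive left on each stripe.
NUM_DRIVES = 4

def parity_drive(stripe: int, num_drives: int = NUM_DRIVES) -> int:
    """Index of the drive holding parity for a given stripe (assumed scheme)."""
    return (num_drives - 1 - stripe) % num_drives

for stripe in range(NUM_DRIVES):
    p = parity_drive(stripe)
    row = ["P" if d == p else "D" for d in range(NUM_DRIVES)]
    print(f"stripe {stripe}: {' '.join(row)}")
# stripe 0: D D D P
# stripe 1: D D P D
# stripe 2: D P D D
# stripe 3: P D D D
```

Because the parity lands on a different drive each time, no single drive has to absorb every parity update, which is the main practical difference from RAID4’s dedicated parity drive.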
I might write an article on that some other time, but I’m tired and lazy, so I’ll just end it here.
Filed under: articles | Tagged: backup, data, failure, hard disks, hard drives, raid, redundancy