How a Data Recovery Engineer Sees RAID

As somebody who recovers data from RAID arrays, I see them a little differently to the norm. In most cases I would say avoid RAID wherever possible. Simplicity is key.

Below are my answers to some real questions I have received from clients about RAIDs.

Why did this RAID disk fail?

Hard drive failure is not unusual and is often unavoidable. The truth is that all hard drives fail eventually, whether they are used in a RAID or not. A RAID system can provide some fault tolerance against physical drive failure, but it has limits. A RAID5 on three disks, for example, can only handle a single drive failure at any one time. It is common for a second disk to fail whilst the first failed disk is being replaced. This is when RAID recovery is required: first to access the failed drives, and then to rebuild the RAID. The best protection against RAID failure is to make backups, in as many formats and in as many different physical locations as possible.
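To show why a 3-disk RAID5 survives one failed drive but not two, here is a minimal Python sketch of XOR parity on a single stripe. It is an illustration only: the helper name xor_blocks and the sample data are made up for this example, and a real controller also rotates parity across the disks and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-disk RAID5: two data blocks and one parity block.
data_a = b"INVOICES"
data_b = b"PAYROLL!"
parity = xor_blocks(data_a, data_b)

# Single failure: the disk holding data_a is lost.
# XOR of the two surviving blocks gives the missing block back.
rebuilt_a = xor_blocks(data_b, parity)
print(rebuilt_a == data_a)  # True

# Double failure: only the parity block survives.
# Any guess for data_a implies a data_b that matches the parity,
# so the parity alone cannot tell us what was really on the disks.
guess_a = b"XXXXXXXX"
implied_b = xor_blocks(guess_a, parity)
print(xor_blocks(guess_a, implied_b) == parity)  # True
```

With one block missing there is exactly one answer. With two blocks missing, every guess is equally consistent with the parity, which is why the failed drives themselves then have to be recovered.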

Why did the server fail so badly? Isn’t RAID meant to prevent this?

A 3-disk RAID5 can only cope with one bad disk, which doesn’t help when two drives fail at the same time. Although a RAID array can provide some leeway when it comes to disk failures, it doesn’t always help when you have multiple failures in quick succession. Adding more disks can provide more redundancy, but only if they are used for extra parity or mirroring (a RAID6, for example, can survive two failed disks); this costs more money and also adds complexity when things go wrong. Even then, you could be in a similar position if three disks happen to fail next time. A live system could fail at any time, so prepare for the worst. Backups are cheap and take a relatively short amount of time. RAID recovery can be expensive and cause unnecessary downtime.

Why couldn’t our IT support recover this?

We are a specialist data recovery company, with access to tools and resources which are not available to IT support staff. We have spent the last fifteen years perfecting the process of extracting data from failed and failing hard drives and RAID arrays. For the best chance of recovery, we like to receive the drives as soon as possible after the failure. The more work that is carried out on the drives in the meantime, the worse things can become.

How can we avoid this happening again in the future?

To avoid similar problems in the future, the best way forward is some form of regular backup. The backups should be verified and then tested by restoring from them as often as possible. This is where disaster recovery comes in, which can involve simulating certain types of failure and making sure you can get up and running again from your backups. At the very least, it wouldn’t hurt to put the really crucial business files onto an external hard drive every few weeks and store it in your company safe. It’s low-tech, but as a last resort you could plug it into any PC and access the important business data.
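As one possible way to verify a backup, the Python sketch below compares SHA-256 checksums of every file in a source folder against its copy in a backup folder. The function names and folder layout are assumptions made for illustration; it presumes the backup is a plain file-for-file copy (such as the external hard drive mentioned above) rather than an archive or image produced by backup software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return a list of files that are missing from the backup or differ from the source."""
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative}")
    return problems


# Example call with hypothetical paths:
# problems = verify_backup(Path("/srv/business-files"), Path("/mnt/backup-drive/business-files"))
# An empty list means every source file has an identical copy in the backup.
```

A matching checksum only shows that the copy is intact. It is still worth opening a few restored files on another machine to confirm they are usable.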

I’m not against RAIDs. They do have their place, but cannot be relied upon as a replacement for regular backups.

We have more articles about RAID here.
