Dan has been a data recovery engineer at Dataquest International Ltd for over 8 years. When not recovering data, Dan can often be found writing articles, maintaining this website, or riding his old bicycle around Portsmouth.
October was a pretty busy month for us, so I thought it would be a good chance to check on our success rate. As you can see from the graphic below, we have a great success rate of at least 69%. We always keep an eye on our success rate, to make sure we are still recovering as many drives as possible. Our success rate is often higher than 69% but we did get a few non-recoverable drives which had suffered physical media damage. For an example of why those are unrecoverable, have a look at a photo of a head crash. (Tip: Those dark circular lines are not meant to be there!)
Of those successful jobs, a whopping 90% of them were recovered without even needing to repair them in our cleanroom. This is interesting as cleanroom facilities are often advertised as one of the most important factors when choosing a data recovery company. Not to undermine the need for cleanroom facilities, but they are not required for most hard drives.
In the graphic above we have classified non-cleanroom jobs as external, and cleanroom jobs as internal.
As somebody recovering data from RAID arrays, my view on them is a little different to the norm. In most cases I would say avoid RAID wherever possible. Simplicity is key.
Below are my answers to some real questions I have received from clients about RAIDs.
Why did this RAID disk fail?
Hard drive failure is not unusual and is often not avoidable. The truth is that all hard drives fail eventually, whether they are used in a RAID or not. Although a RAID system can provide some fault tolerance against physical drive failure, it has limits. A three-disk RAID5, for example, can only handle a single drive failure at any one time, and it is common for a second disk to fail whilst the first is being replaced. This is when RAID recovery is required: first to access the failed drives, and then to rebuild the RAID. The best protection against RAID failure is to make backups: in as many formats, and in as many different physical locations, as possible.
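To see why a three-disk RAID5 tolerates exactly one lost drive, here is a toy sketch (plain Python, nothing like real controller firmware): the parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the survivors, but two missing blocks cannot.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR same-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

# One stripe of a three-disk RAID5: two data blocks plus one parity block.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks([d0, d1])

# The disk holding d1 fails: rebuild it from the surviving disk and parity.
rebuilt = xor_blocks([d0, parity])
assert rebuilt == d1

# If a second disk dies before the rebuild completes, only one block of the
# stripe remains. XOR cannot solve for two unknowns, so the stripe is lost.
```

Real arrays rotate the parity block across the disks stripe by stripe, but the arithmetic is the same.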
Why did the server fail so badly? Isn’t RAID meant to prevent this?
A 3-disk RAID5 can only cope with one bad disk, which doesn't help when two drives fail at the same time. Although a RAID array can provide some leeway when it comes to disk failures, it cannot save you from multiple failures in quick succession. Adding more disks to the RAID provides more redundancy, but it costs more money and adds complexity when things go wrong, and you could still end up in the same position if one more disk than you planned for fails next time. A live system could fail at any time, so prepare for the worst. Backups are cheap and take relatively little time; RAID recovery can be expensive and cause unnecessary downtime.
Why couldn’t our IT support recover this?
We are a specialist data recovery company, with access to tools and resources which are not available to IT Support staff. We have spent the last fifteen years perfecting the process of extracting data from failed & failing hard drives and RAID arrays. For the best chance of recovery, we like to get the drives as soon after failure as possible. If more work gets carried out on the drives, things can be made much worse.
How can we avoid this happening again in the future?
To avoid similar problems in the future, the best way forward is some form of regular backup. The backups should be verified and then tested / restored as often as possible. This is where disaster recovery comes in, which can involve simulating certain types of failure and making sure you can get up and running again from your backups. At the very least, it wouldn’t hurt to put the really crucial business files onto an external hard drive every few weeks and store it in your company safe. It’s low-tech but at least you could plug it in to any PC and access the important business data if required as a last resort.
I’m not against RAIDs. They do have their place, but cannot be relied upon as a replacement for regular backups.
This immense 2TB iMac drive may be heavy, but have you ever wondered why?
When we recover these drives we often have to work on individual heads. As you can see from the image, this monster has 10 heads (the first is numbered zero). This means there are 5 spinning disks inside the drive.
From the outside, the only clue that these drives are so rammed full of disks is their weight. They are no bigger physically than any other desktop hard drive.
It is common to hear of hard drive problems happening as a result of a system update, or operating system upgrade. We have a theory that could possibly explain this.
First of all, you should always make a full backup of your system before installing an update. It’s not unheard of for updates to go wrong, so this is crucial.
During a software update, a large amount of data gets read and written to and from the hard drive. If the hard drive is functioning fine, this happens without issues. Installing updates is a normal (and necessary) part of computing.
If the hard drive is not quite 100%, then running a software update may be the last straw. It puts the failing drive under a bit of extra strain and, bang, the hard drive fault which had been lying dormant for months rears its head. The drive gives up, leaving you stranded from your data. Bear in mind that the drive would have failed eventually anyway; the heavy disk usage probably just accelerated the failure.
There are a couple of things to look out for, that may predict an imminent hard drive failure. (Please don’t wait for these signs before backing up. Do it now!)
Warnings or messages during boot up
Computer being unresponsive / slow at times
The dreaded beachball animation (On the Mac)
Clicking / chirping noises
If you are running any computer with important data, you should back up immediately and as often as possible. That way it doesn’t matter if your hard drive fails; just throw in a new drive and reload it from your backup.
Myth 1: When files are deleted they are gone forever.
Fact: When files are deleted they are actually only removed from an index. Unless you then overwrite those sectors with new data, the files will still be there. If you delete a file by mistake, it is important to stop using the computer. Even browsing the internet causes cache files and images to be downloaded to the hard drive, potentially overwriting the deleted files.
Myth 2: Putting a hard drive in the freezer will bring it back to life.
Fact: This is an old one, which will not die. We have never had to put a hard drive in a freezer. There is only anecdotal evidence that freezing a hard drive helps in any way. One of the most common types of hard drive failure is firmware corruption, which cannot be fixed in a freezer. I would be worried about introducing condensation into the drive, which could be devastating. If anyone knows where this idea came from, or how the freezer is supposed to help, then I would love to hear about it.
Myth 3: The FBI can recover anything.
Fact: The FBI are bound by the same laws of physics as we are. If a hard drive has had a head crash, and scraped the magnetic coating off the platter, there is no data left to recover. You cannot read magnetic data from particles of dust! Even the FBI can’t recover that.
Myth 4: The best way to recover a hard drive is by swapping the platters out.
Fact: In almost all cases, you should not disturb the alignment of the platters. They are manufactured within strict tolerances which cannot be recreated outside of a manufacturing environment. If the problem lies with the on-disk firmware, electronic components, or read / write heads, then swapping the platters would not solve anything.
Note – If the spindle motor gets stuck then it can be necessary to swap the platters, but only as a last resort.
Hard drive firmware is the embedded software which controls the running of your hard drive. Most of it is stored within hidden sectors on the hard drive, and in normal operation you wouldn’t know it was there. Whenever you power up a drive, the firmware makes the motor spin, starts the read / write heads, and checks against a list of bad sectors. Only then will the computer be able to access the data area and allow you to see your files. If there is a problem with the firmware, the drive will get stuck and you won’t be able to access your data at all.
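As a purely illustrative sketch (hypothetical names, vastly simplified), the start-up sequence described above behaves something like this: if any firmware module stored in the hidden sectors is unreadable, the drive gets stuck before the data area ever becomes accessible.

```python
# Hypothetical sketch of a drive's start-up sequence -- not real firmware.
# The boot loader on the PCB reads firmware modules from hidden service
# sectors; if any module is corrupt, the drive never reaches the point
# of exposing the user data area.

HIDDEN_SECTORS = {
    "motor_control": "ok",
    "head_calibration": "ok",
    "bad_sector_list": "ok",
}

def power_up(hidden_sectors):
    for module in ("motor_control", "head_calibration", "bad_sector_list"):
        if hidden_sectors.get(module) != "ok":
            return "stuck: cannot load " + module
    return "ready: data area accessible"

assert power_up(HIDDEN_SECTORS) == "ready: data area accessible"
assert power_up({**HIDDEN_SECTORS, "bad_sector_list": "corrupt"}).startswith("stuck")
```

Note that in the failure case the user data itself is untouched; the drive simply never gets far enough to serve it, which is why firmware repair can bring the data back intact.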
Failed firmware is almost impossible to diagnose without specialist equipment. In fact, it is hard to confirm that the firmware is faulty at all. Many hard drive problems manifest themselves in the same way; by clicking, or spinning down, or just generally not being identified by the PC. You shouldn’t start changing components until you know where the problem lies.
In the early days, most firmware could fit onto the electronic circuit board; simply swapping a damaged PCB with a good one was a common fix. Firmware is now too large to fit on the PCB, so the PCB contains just a very simple boot loader which starts off the drive and then loads the firmware from the disk surface. This means that swapping the PCB is no longer a common fix, and won’t work on most modern hard drives.
We have specialist hardware and software that allows us to check and repair the firmware on most hard drives. We have also dealt with many of these problems before and have a huge database of previous experience to draw on.
In the vast majority of cases, deleted data is actually still lurking around on your hard drive. If you put data in the Recycle Bin or Trash, and then empty it, all you are actually doing is telling the system that it can reuse those parts of the disk when it wants. Until you replace those areas with new data, the old data will still be there.
The Filing Cabinet
The tried and trusted analogy is of a filing cabinet. When you delete a file, you are removing the index card from the front of the drawer, but the actual file is still in there.
This is why it is really important to switch off your computer as soon as possible if you have accidentally deleted some files. You may not realise it, but even small actions like checking e-mail or browsing the internet can write cache files to the disk. That is when data could be lost.
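The filing cabinet analogy translates directly into a toy model (illustrative only): the index maps file names to disk blocks, deleting removes only the index entry, and it is a later write, such as a browser cache file, that actually destroys the contents.

```python
# Toy model of the "filing cabinet": the index maps file names to disk
# blocks. Deleting only removes the index entry; the blocks still hold
# their bytes until something new is written over them.
disk = {0: b"holiday photo", 1: b"tax return"}    # block -> contents
index = {"photo.jpg": 0, "tax.pdf": 1}            # name -> block

def delete(name):
    index.pop(name)          # index card removed; the drawer is untouched

def write(name, block, data):
    disk[block] = data       # THIS is what actually destroys old data
    index[name] = block

delete("photo.jpg")
assert disk[0] == b"holiday photo"   # deleted, but still recoverable

write("cache.tmp", 0, b"browser cache")
assert disk[0] != b"holiday photo"   # now it really is gone
```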
Overwritten / Deleted Data
We often hear about the FBI being able to recover overwritten files. While this may have been possible on very old, low-capacity hard drives (~100MB), it is unlikely to be possible on modern hard drives. The magnetic material is far too densely packed. Even then, it would only be tiny fragments of data recovered, and not whole files.
The Problem With SSDs
Solid state drives bring a whole new problem of their own. Due to the way the data is distributed around the device, known as wear levelling, you can never be sure of which sector you are writing or overwriting. Wear levelling is necessary to prolong the life of an SSD, but it means the drive could be moving data around behind the scenes, making deleted files much more difficult to track down.
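A toy flash translation layer (hypothetical, and far simpler than a real SSD controller) shows the effect: every write of a logical block lands on a fresh physical page, so "overwriting" a file leaves the previous copy sitting in a stale page somewhere else on the flash.

```python
# Toy wear-levelling sketch -- a real SSD's flash translation layer is
# far more complex, but the principle is the same.
class ToyFTL:
    def __init__(self, pages=8):
        self.pages = [None] * pages   # physical flash pages
        self.mapping = {}             # logical block -> physical page
        self.next_free = 0

    def write(self, logical, data):
        # Writes always go to a fresh page to spread the wear out.
        self.pages[self.next_free] = data
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical):
        return self.pages[self.mapping[logical]]

ftl = ToyFTL()
ftl.write(0, b"secret v1")
ftl.write(0, b"secret v2")            # logical overwrite...
assert ftl.read(0) == b"secret v2"
assert b"secret v1" in ftl.pages      # ...but the old copy still exists
```

From the computer's point of view block 0 was overwritten, yet the old data survives in a page the drive may or may not recycle later, which is exactly what makes deleted SSD data so hard to pin down.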
In most cases, we can recover deleted files with the original file names and folders. With deleted Mac data, this is often not possible. In that case we have to use a special type of scan, which finds all files of a given type and saves them to numbered files. This means camera photos may be recovered into a JPG folder, with files named like photo0001.jpg, photo0002.jpg and so on.
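That type of scan is often called file carving: ignore the file system entirely and search the raw sectors for the signatures that mark a file type. Here is a minimal sketch for JPEGs (real carvers are much smarter about false positives and fragmented files):

```python
# Toy file carver: scan a raw disk image for JPEG start/end markers and
# collect each match. Real tools validate the structure in between.
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"

def carve_jpegs(raw):
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break
        end = raw.find(JPEG_END, start)
        if end == -1:
            break
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found

# Fake "disk image" with two JPEG-like blobs buried in junk.
raw_image = (b"noise" + b"\xff\xd8\xff\xe0photoA\xff\xd9"
             + b"gap" + b"\xff\xd8\xff\xe0photoB\xff\xd9" + b"tail")
recovered = carve_jpegs(raw_image)
assert len(recovered) == 2
```

Each recovered blob would then be written out as a numbered file, photo0001.jpg, photo0002.jpg and so on, which is why the original names and folders are lost.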
If required we can process certain types of these files into more meaningful order. For photos we can arrange into folders by date taken, and for music files we can arrange into Artist / Album order.
The Important Bit
If you accidentally delete some files, they are likely to be recoverable. It’s the actions you take next which can make the recovery difficult – if not impossible.
RAID is often touted as the silver bullet in data storage: increased storage capacity, resilience to hardware failures, and improved performance. While these are all valid upsides to a RAID setup, there are also a few downsides which need to be addressed.
1. Extra Storage.
RAID can allow for a huge pool of storage, but with that storage comes great responsibility. You should factor in at least enough capacity to back up the RAID data somewhere else. If you can only afford 8TB of storage then you should only use 4TB for data and the other 4TB to back it up, preferably on another machine / standalone system.
The R in RAID stands for Redundant. This means you can afford to lose a certain number of disks without losing access to your data. It also means that if you have a disk failure you need to get the disk replaced immediately, otherwise you're running without redundancy.
Nobody likes downtime. If your 16TB RAID array goes offline without a backup then you have a couple of options. One option is to attempt to get the RAID back online by replacing disks, rebuilding the array etc, but this is risky. If this is your only copy of the data then rebuilding / reformatting the RAID could corrupt the data beyond recovery. Don’t do this if you don’t have a backup to fall back on.
The second and preferable option is to get the RAID professionally recovered. When we receive a RAID, the first thing we do is make images of all disks. This allows us to work on the RAID without risk. Then we use a read-only process to extract the data onto another form of storage. This is where downtime comes in. Unless you go for an emergency process, you could have to make do without the data for a number of days.
So What’s The Way Forward?
It’s one word. Redundancy.
Whatever you do, make sure your data is replicated across as many types of storage as possible. In an ideal world you would have a duplicate system running alongside the live system, which can take over if anything goes wrong. Then have the data on another type of storage, which you can access from somewhere else. Imagine if the RAID controller failed, and you could only access the data from that one machine.
It doesn’t matter how many backups you have if they all require the same system to access them.
I’ve only just scratched the surface here, but you should always look to make extra copies of your data. It may seem redundant now, but when the server containing all your data, all your accounts, all your client details and your website fails, you’ll be glad you kept that extra copy.
My main computer is an old MacBook Pro. I often download Linux ISOs to install on other computers. In recent Debian-esque releases this is actually really simple.
1. I find it quicker and easier to install from USB so first insert a USB pen / stick of some sort.
Note: This USB stick will be erased, so don’t use one with data that you need to keep!
2. Next we need to find out which number has been assigned to the USB stick. If you only have one disk in your Mac then the USB will usually be disk1, but always check first. (Note: Disks are numbered from zero, so your internal drive should be disk0) On your Mac open Disk Utility, which is located within Applications / Utilities. (See Image)
Select the USB stick from the lefthand window and then click the Info button which is on the toolbar. (See Image)
You will get a pop up window with loads of information about the device. We only need the Disk Identifier. Make a note of this for later.
3. To allow us to write data to the USB stick we need to unmount any volumes currently on there. (see image)
4. Now comes the actual writing. First locate the Terminal application, again within Applications / Utilities. (see image)
5. Remember to change the code to match your Disk Identifier from earlier. There are a few things to note about the following command.
sudo – allows you to run dangerous commands, so will require an administrator password
Instead of typing the location of the ISO file you can just drag the ISO onto the terminal when required.
“if” means input file (in this case the ISO file), “of” means output file (the USB stick)
When we found out the Disk Identifier, it was disk1. That will work in the command, but we use rdisk1 instead, which gives us raw access to the disk. This may not be necessary, but it works for me.
There is a lot of discussion about block sizes, but I find around 4MB reasonable for writing ISOs to USB. In Linux we often type bs=4M, but the Mac’s dd wants a lowercase suffix, so use bs=4m there. A plain number such as bs=4096 is measured in bytes, so it actually means 4KB rather than 4MB; it can be slower, but it works fine too.
sudo dd if=[drag iso here] of=/dev/r[disk number] bs=4096; sync
If you’ve got it right, you shouldn’t get any feedback until it finishes. Your USB stick may have a blinking LED whilst the data is being written. For reference the 200MB debian-netinst ISO took just over a minute to write.
Once complete you should get something like:
48896+0 records in
48896+0 records out
200278016 bytes transferred in 95.151719 secs (2104828 bytes/sec)
This means you’re finished. Now eject the USB and try to boot your PC with it. The Mac may complain that the disk is not readable but just ignore that and try it on a PC.
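As a quick sanity check on that output, the record count multiplied by the block size comes to exactly the byte total dd reported:

```python
# Checking the dd output above: 48896 blocks of 4096 bytes each.
records = 48896                          # "48896+0 records out"
block_size = 4096                        # bs=4096 from the command
total_bytes = records * block_size
assert total_bytes == 200278016          # matches "bytes transferred"

rate = int(total_bytes / 95.151719)      # dd reports whole bytes/sec
assert rate == 2104828
```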
We have just completed a complex data recovery, where a Mac system had been inadvertently overwritten with Windows. The Mac drive originally had over 500GB of data, so we expected to get most of it back, we just didn’t know how good the structure would be.
It helps to visualise the layout of the data on the disk. Before it was overwritten, the data would have looked something like this:
Although the fresh Windows system is much smaller than the original data, it prevents you from seeing any of that old Mac data.
Once we made copies of the drive, we were able to reconstruct the missing parts of the Mac data, and could see all the original files and folders, with their original structure.
Luckily nobody had tried to fix the problem with this drive. Often the fixes people attempt are worse to recover from than the original problems.