Wednesday, July 01, 2009

How to lose your data

 
 

via Ask The Admin by Joe Glessner on 6/22/09

[Image: a destroyed computer]

See that picture? When it happens to you, it may not look quite that bad (or be quite that obvious), but data loss sucks. And it does happen. I've been working with computers for 10+ years, and I've had it happen a couple times myself. Did I mention how much it sucks?

I'm not going to spend a couple of pages telling you why you should back up. I'm just going to be straight about it: unless you really couldn't care less if that happened to your computer, you are flat out stupid if you are not backing up your data on a regular basis.

Instead of telling you why to back up, I'm going to tell you how to ensure that you are not going to get your data back, even if you think you are backing it up.

Method 1: I'll just back the data up to CD/DVD.

Well sure, this will work for a bit, but:

  1. Ever try to save 20GB to CD? Or 250GB to DVD? Ugh.
  2. How long do you think that optical disk is going to be readable?

Going this route, you can quickly end up trapped behind a small mountain of plastic. Or let's say you somehow manage to keep the optical disks to a manageable quantity. Will the marker you labeled the disk with make it unreadable in a year? Is the dye layer unstable, rendering the disk unreadable in six months? Will the glue on the label you made make it worthless in a year or two? These are just a few examples of why optical media should not be considered an archive-grade solution.
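To put a number on that mountain of plastic, here is a quick back-of-the-envelope sketch. The capacities are my assumptions (a 700 MB CD-R and a 4.7 GB single-layer DVD, in decimal gigabytes), not figures from any particular product, and `disks_needed` is just an illustrative helper.

```python
import math

# Assumed nominal capacities in decimal gigabytes (illustrative values).
CD_CAPACITY_GB = 0.7    # 700 MB CD-R
DVD_CAPACITY_GB = 4.7   # single-layer DVD


def disks_needed(data_gb: float, disk_capacity_gb: float) -> int:
    """Return how many disks it takes to hold data_gb of data."""
    return math.ceil(data_gb / disk_capacity_gb)


print(disks_needed(20, CD_CAPACITY_GB))    # CDs for 20 GB
print(disks_needed(250, DVD_CAPACITY_GB))  # DVDs for 250 GB
```

Under those assumptions, that works out to 29 CDs or 54 DVDs for a single full backup, before you burn a second copy or run the next backup.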

Method 2: OK then, I'll just copy the data to a USB hard drive.

Sure, it's better than nothing, but single-HDD solutions are not going to keep your data safe. Hard drives fail. In fact, it will happen to every single hard disk you will ever come across. The only question is when. It's not a matter of if, or of MTBF (mean time between failures); it is more a matter of "you never know, it could fail in ten years, or in ten seconds".

Don't get me wrong, if this is the only way you can back the data up, then it is your only choice, and it's better than nothing. Just be aware, as soon as you copy the data to that USB HDD, the "Clock of Death" is ticking.

Much better would be to copy the data over to a machine with a RAID storage system (preferably RAID5).

Method 3: I bought actual Backup Software (or use a vetted Open Source solution), and run Incremental Backups (to tape!) every single day!

OK, so you spent some money on a tape backup solution, and spent hours reading the manual and configuring your backup. Congratulations, I bet you think your data is safe! Until you find out how Incremental Backups really work (this usually happens after a disaster, when the tapes are all you have left of your pr0n, illegal mp3's, downloaded movies, warez... er, mission critical data).

Let's pretend for a minute that your backup tapes look something like this:

Full_backup_tape (tape 1 – doesn't matter what you tell it to be, the first backup is always and without exception, a full backup)

Incremental_backup_1 (tape 2)

Incremental_backup_2 (tape 3)

Incremental_backup_3 (tape 4)

Incremental_backup_4 (tape 5)

Incremental_backup_5 (tape 6)

And then you have a catastrophic failure. So you're sitting there at 2am merrily running the restore, and you hit a snag: tape 2 won't read. It doesn't matter why. The tape could be bad, or maybe you left it out of the tape safe overnight and the radio station next door managed to erase it with the magnetic waves they transmit (this actually happened). Either way, the data is gone. So is all the data after it. See, incremental backups require that all tapes since the last full backup be present and working. So tapes 3-6 may as well be empty, because you are never getting the data off of them. Ever.

If you can't run full backups every day, use differential backups instead of incrementals. A differential backup contains everything that has changed since the last full backup, so it does not depend on the differentials before it. Let's say that in the scenario above the user had been running differentials rather than incrementals. They could then restore to current using just the original full backup and the last differential.
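The difference is easier to see in a few lines of code. This is a deliberately simplified sketch of the restore rules as described above, not how any particular backup product works; the function names and tape names are made up for illustration.

```python
def tapes_needed(backups, strategy):
    """Return the tapes required to restore to the most recent backup.

    backups: tape names in the order written; backups[0] is the full backup.
    strategy: "incremental" or "differential".
    """
    if strategy == "incremental":
        # Every tape since the last full backup is required.
        return list(backups)
    if strategy == "differential":
        # Only the full backup and the newest differential are required.
        return [backups[0], backups[-1]] if len(backups) > 1 else [backups[0]]
    raise ValueError(f"unknown strategy: {strategy}")


def can_restore_to_current(backups, strategy, unreadable):
    """True if none of the required tapes is in the unreadable set."""
    return not any(tape in unreadable for tape in tapes_needed(backups, strategy))


tapes = ["tape1", "tape2", "tape3", "tape4", "tape5", "tape6"]
bad = {"tape2"}
print(can_restore_to_current(tapes, "incremental", bad))   # False
print(can_restore_to_current(tapes, "differential", bad))  # True
```

Note that neither strategy saves you if tape 1 (the full backup) is the one that won't read.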

Method 4: Now I'm running differential backups to tape every single day!

But you fail to check the backup logs every day, and the backup job you thought had been running for the last year actually failed 273 days ago, and has been requesting the "correct" tape ever since. I've seen this one a lot (in fact, I think this is the most common reason for data loss when you do have backup software running).

You've got to check your backup logs. It sucks, and it's boring, but it's one of those things you just have to do.
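If reading logs by hand is the part you keep skipping, even a tiny script can tell you when the last good backup was. The sketch below assumes a made-up log format of one line per job, "YYYY-MM-DD HH:MM STATUS", with STATUS being OK or FAILED. Your backup software's logs will look different, so treat the parsing as a placeholder.

```python
from datetime import date, datetime, timedelta


def last_successful_backup(log_lines):
    """Return the date of the newest line marked OK, or None if there is none."""
    newest = None
    for line in log_lines:
        parts = line.split()
        # Hypothetical format: "<date> <time> <status>"
        if len(parts) < 3 or parts[-1] != "OK":
            continue
        day = datetime.strptime(parts[0], "%Y-%m-%d").date()
        if newest is None or day > newest:
            newest = day
    return newest


def backup_is_stale(log_lines, today, max_age_days=1):
    """True if there is no successful backup within max_age_days of today."""
    last = last_successful_backup(log_lines)
    return last is None or (today - last) > timedelta(days=max_age_days)


log = [
    "2008-09-21 02:00 OK",
    "2008-09-22 02:00 FAILED",
    "2008-09-23 02:00 FAILED",
]
print(last_successful_backup(log))
print(backup_is_stale(log, today=date(2009, 6, 22)))
```

A script like this is a safety net, not a replacement for actually looking at the logs; it only catches the failures it knows how to recognize.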

Method 5: Alright, I'm running differentials to tape, and have been checking my logs for the last 2 years every single day!

But you've never run a test restore. If you haven't restored data from the tape successfully, there is no data on the tape. Maybe the tape was bad, maybe the backup software failed (silently, of course), maybe the gremlins ate it.
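One way to make a test restore mean something is to compare what came back against the originals. Here is a minimal sketch that restores nothing itself: it assumes you have already restored into a scratch directory with your backup software, and it compares SHA-256 checksums file by file. The function names are mine, and files that legitimately changed since the backup ran will show up as mismatches, so pick data you know has not changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or differ in restored_dir."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_restore` means every file under the source directory came back byte for byte identical; anything else is a list of files to go investigate.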

Method 6: Ok, now I spend two hours reading the log and then randomly restoring files from my backups (before putting the tapes in the tape safe) every single day!

And then your server room catches fire. All the machines, and the safe holding the backup tapes, are destroyed. You never took any tapes offsite, because you have a tape safe. It happens. It's unfortunate.

Method 7: Enough, I give up on tape! Now I run a full backup to a RAID5 NAS every single day!

But you ordered your NAS with the drives from the manufacturer, and they used 4 HDDs from the same batch, and two failed. This is the one that always gets them! The strength of RAID5 is that more than one drive has to fail before the RAID is unrecoverable. The weakness is that hard drives from the same batch tend to fail at the same time (or thereabouts).

To strengthen your RAID system, always make sure that you have drives from different batches, if not from different manufacturers (this is not always the best idea, but that is an argument for another time). For instance: to take care of my backup needs at home, I bought a Buffalo Terastation. Unfortunately, Buffalo sent me a Terastation with 4 drives from the same batch (you can usually tell if they all have the same date on them; sometimes there will be a batch code on the drive). I bought 3 more of the same model drive from 3 different manufacturers, and now have the healthiest RAID I can.
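If you have more than a handful of drives to keep track of, you can write down the date or batch code from each label and let a few lines of code flag the duplicates. This is only a sketch: the bay numbers and batch codes below are invented, and `shared_batches` is a hypothetical helper, not part of any RAID tool.

```python
from collections import defaultdict


def shared_batches(drives):
    """Map each batch code used by more than one drive to its drive bays.

    drives: iterable of (bay, batch_code) pairs read off the drive labels.
    """
    by_batch = defaultdict(list)
    for bay, batch_code in drives:
        by_batch[batch_code].append(bay)
    return {batch: bays for batch, bays in by_batch.items() if len(bays) > 1}


# Invented example: four drives as shipped, all from one batch.
as_shipped = [(1, "B0901"), (2, "B0901"), (3, "B0901"), (4, "B0901")]
print(shared_batches(as_shipped))  # {'B0901': [1, 2, 3, 4]}

# After swapping three of them for drives from other batches.
after_swap = [(1, "B0901"), (2, "C0417"), (3, "D1102"), (4, "E0630")]
print(shared_batches(after_swap))  # {}
```

An empty result means no two drives in the array share a batch code, which is the state you are aiming for.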

These are not the only ways to lose data, but they are by far the most common. How would I know? I was the Worldwide Manager of Technical Support for a backup software company for several years. And I always got to be the one to explain to the customers why their data is gone.

So what do I do?

There are as many answers to that question as there are IT shops with backup systems. Here is how I protect data at my office:

I back up all data every day (full backup) to a NAS configured in RAID5, with a hot spare. I check the health of the RAID every day (it takes about two minutes). Once a week I back up the entire RAID to LTO3 tape, and take the tapes offsite. Currently I am taking them home, where they go into a DATA rated fire safe (there is a difference, do your homework), and then into my large safe where I keep all my other valuables. My ideal would be to have them delivered to a bank safety deposit box, but that costs money.

At home, I back up all my data to the aforementioned Terastation. Once per month, I copy all the data off to a USB HDD (actually two of them), and take one to work where it goes into the tape safe.

Is it perfect? No. Does it stand a much better chance of keeping that data alive through a catastrophic event? Absolutely. You don't have to go to these lengths to protect your data, but you should be aware of the risks.


 
 
