My Data Backup Strategy

DVDs (optical media backup)

Since I started using a digital camera, I have saved images in folders named after the photo date in yyyy-mm-dd format (details are in an earlier blog post). I have also systematically backed up every photo to a series of data DVDs (data CDs at first).
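One practical benefit of the yyyy-mm-dd format is that a plain alphabetical listing is also a chronological one. The sketch below is illustrative only: the folder names and the temporary directory are made up, and it uses today's date rather than reading the date from a photo.

```shell
#!/bin/sh
# Hypothetical photo root in a temp directory (not the author's real layout).
photo_root=$(mktemp -d)

# Made-up example dates, deliberately created out of order.
mkdir "$photo_root/2008-10-16" "$photo_root/2007-01-05" "$photo_root/2008-02-03"

# A folder for today's photos, named in the same yyyy-mm-dd format.
today=$(date +%Y-%m-%d)
mkdir -p "$photo_root/$today"

# A plain alphabetical listing comes out oldest first.
sorted=$(ls "$photo_root")
echo "$sorted"
```

Formats such as mm-dd-yyyy would not sort this way, which is the main reason to put the year first.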

Some Perl code and a database table keep track of what's on the backup discs. The discs are stored in a separate physical location from my main hard drive.

Backup hard drive

My DVD backup gives me peace of mind, but there are over 100 backup discs. If my main hard drive failed, re-creating my image collection from them would take substantial effort. Also, my documents (such as computer code, invoices, development website, and email archive) are not on DVDs because they change too often.

I considered mirroring (RAID 1), but was put off by basic questions about how it worked (did my system need a RAID card? How would I know when one of the drives failed -- would it tell me on boot-up?). My own priority was ease of use and simple recovery; I didn't require the 1-for-1 real-time mirroring of RAID 1.

I eventually settled on buying a single internal backup drive (750GB) and backing up both my entire digital archive and selected documents (email, code, development website, etc.). I use the Unix/Linux "rsync" utility to make the backup, and I re-run it often (at least once a week) to keep the backup current. This allows for easy recovery in the event of another hard drive failure.


Below is the exact syntax I use for rsync:

rsync -avhr --progress --stats --delete /original/folder/path /backup

Note that there is no trailing slash on the source path above; this changes what rsync copies. The syntax above copies all of /original/folder/path (including any subfolders) into /backup; if you list /backup afterwards you will see a folder named path that holds all of path's contents. With a trailing slash on the source (/original/folder/path/), rsync would instead copy path's contents directly into /backup without creating the path folder.

The switches I used, -avhr, stand for archive (preserves permissions, timestamps, etc.), verbose, human-readable numbers, and recursive (descend into and copy subfolders). Archive mode already implies recursive, so the r is redundant but harmless. The --delete switch means any files found in the backup copy that no longer exist in the original will be deleted. I want this option so that if I delete old email, for example, it is also deleted from my backup.

My Failed Drives

I got serious about this after losing two hard drives in the space of a couple of months. The first one held my digital image archive (a fairly new drive), and it took a week's effort to rebuild the data (most was recovered with a Linux data recovery program, but the files were not in separate folders and had to be sorted through). A second drive crashed shortly after that. It was not an important drive, but it was an illustration of hard drive reliability (or lack thereof). I researched my options, weighed the costs, and settled on the system above. I suspect RAID is better and I may migrate to it later, but this approach has the advantage of simplicity.

This page last modified on 2008-10-16
