
Digital photo archiving

I began taking photos with a digital camera in 2001 and realized that all the original files would have to be preserved "forever", backed up and organized so that I could locate any image again. My original system for organizing the files has continued to work, but I tried several methods of backing up the entire collection. Below is the approach I've settled on, and it's worked for me since 2001. Note that I use the Linux operating system and download photos with a Compact Flash card reader. My approach is enhanced by an SQL database and the Perl programming language, but those may not be necessary.

Save ALL your images

Hard drive space is cheap enough that you should save all your images, other than those that are out of focus or badly exposed. Years into the future, shots that seemed worthy of deletion may have greater significance.

As one example, I have photographed people whose images later became valuable (personally). I photographed several dozen neighbor kids when living in the Philippines. Later, a boy in the neighborhood died in a handgun accident. I had several unremarkable photos of the boy, which were now the only images that existed of him alive. They were very important to me and his family, and one was used in his funeral procession (see photo).

Save files in folders named for the date using year-month-day ("2007-12-21")

This gives a directory structure with a natural sorting order that looks like this:
/2000
|
+----2000-01-01
     2000-02-01
     2000-03-01
     ...

/2001
|
+----2001-01-01
     2001-02-01
     2001-03-01
     ...
etc.

Each image is downloaded into a folder named for the date it was taken, according to the photo's EXIF information. I wrote a Perl script that reads the EXIF data from images, creates any necessary folders, and copies the images. It uses a utility called "jhead". It is not very elegant, but an excerpt is below:

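use strict;
use warnings;
use File::Path qw(make_path);

# mount the flash card reader (this will be different for your system)
system("mount -t vfat /dev/sdd1 /mnt/flash") == 0
    or die "could not mount the flash card\n";
# the glob below is based on Canon's particular directory structure
my @files = glob("/mnt/flash/dcim/???canon/*");

foreach my $filename (@files) {
    # the list form of open runs jhead without a shell, so odd filenames are safe
    open(my $jhead, '-|', 'jhead', $filename) or die "cannot run jhead: $!";
    my $out = do { local $/; <$jhead> };
    close($jhead);

    # the jhead command will return, in part, the following:
    # Date/Time    : 2005:08:18 02:16:53
    unless (defined $out && $out =~ /Date\/Time\s*:\s*(\d{4}):(\d\d):(\d\d) /) {
        warn "no EXIF date found in $filename, skipping\n";
        next;
    }
    my ($year_taken, $date_taken) = ($1, "$1-$2-$3");

    # make_path also creates the year folder, and does nothing if both exist
    my $dir = "/digicam/$year_taken/$date_taken";
    make_path($dir);
    system("cp", "-v", $filename, "$dir/") == 0
        or warn "could not copy $filename\n";
}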
The same thing can be done by hand, if necessary.

Use an SQL database to keep track of your original image files

I have several thousand images on this website. I might need the original, high-resolution digital file for any image on short notice. I could locate it by the picture date, browsing the folder named after that date, but I've found that to be less than foolproof. I use an SQL table which mirrors everything in the folders on my hard drive:


+--------------+-----------------------+------+-----+------------+
| Field        | Type                  | Null | Key | Default    | 
+--------------+-----------------------+------+-----+------------+
| id           | mediumint(8) unsigned | NO   | PRI | NULL       |
| folder       | date                  | NO   | MUL | 0000-00-00 |                
| filename     | varchar(100)          | NO   |     |            |                
| archive_no   | tinyint(3) unsigned   | YES  |     | NULL       |                
+--------------+-----------------------+------+-----+------------+

The important function of this table is that whatever I may do down the road with a given image, I always associate it with its "file id number". The need for this is apparent from the filename scheme for Canon, "img_0000.jpg"; if you let the camera name your images with this naming scheme, after shooting 10,000 images you begin re-using filenames (you have two "img_0001.jpg", etc.). So I don't track images by filenames, or dates, or a combination, but by a single, unique file id number.
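A table like this could be filled by walking the dated folders and emitting one INSERT statement per file. The following is only a minimal sketch under stated assumptions, not the script actually used for this archive: the table name "files", the helper names scan_archive, sql_quote and insert_statements, and the idea that the database assigns the id column automatically are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Return [folder, filename] pairs for every file stored under
# $root/<year>/<year-month-day>/, sorted by folder and then by name.
# (Assumes $root itself contains no glob metacharacters.)
sub scan_archive {
    my ($root) = @_;
    my @rows;
    foreach my $path (sort(bsd_glob("$root/*/*/*"))) {
        next unless -f $path;
        # keep only paths that match the year/year-month-day layout
        next unless $path =~ m!/(\d\d\d\d)/(\1-\d\d-\d\d)/([^/]+)$!;
        push @rows, [$2, $3];
    }
    return @rows;
}

# Quote a string for SQL by doubling single quotes. This is enough for
# camera-style names such as img_0001.jpg; a real script would be safer
# using the placeholders of a database driver.
sub sql_quote {
    my ($s) = @_;
    $s =~ s/'/''/g;
    return "'$s'";
}

# Build one INSERT per file; the id column is left to the database
# (assumption: it is an automatically assigned number).
sub insert_statements {
    my (@rows) = @_;
    my @sql;
    foreach my $row (@rows) {
        push @sql, sprintf("INSERT INTO files (folder, filename) VALUES (%s, %s);",
                           sql_quote($row->[0]), sql_quote($row->[1]));
    }
    return @sql;
}

# Print the statements for the archive root given on the command line.
if (@ARGV) {
    print "$_\n" for insert_statements(scan_archive($ARGV[0]));
}
```

Two files that are both called img_0001.jpg but sit in different date folders produce two separate rows, and so receive two different id numbers, which is the point of tracking images by id rather than by name.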

Make backups

This goes without saying, but it meshes with the above system. I used CD-ROMs, then later DVDs, which I number sequentially. The database table has a field for "archive_no" which refers to the numbered disc on which the photo was backed up.

Note: in 2010, I tried to access 30 discs that were 2-3 years old. The CD-ROMs all read fine, but two of the DVDs were not readable on my computer. They were readable on a different computer, though. I believe there is some question about the longevity of optical media. As of 2009, I employ a hard drive on a separate computer as a backup, along with optical discs.

When enough images have accumulated that are not yet backed up to fill a DVD, I make a new backup and update the files table to reflect which images are on that backup disc. I also periodically use rsync (over NFS) to back up images to my backup drive. That is, a second computer mounts the primary computer's drive via Linux's Network File System (NFS), and the rsync program copies any new images to the backup drive.
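The disc-filling step can be sketched roughly as follows. This is an illustration under assumptions rather than the actual backup script: the helper names next_disc_batch and archive_update are hypothetical, the file sizes are assumed to come from the filesystem (the table above has no size column), and the capacity in the example is a toy number. A single-layer DVD holds about 4.7 billion bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick files for the next backup disc. $files is a reference to a list of
# [id, size_in_bytes] pairs for images whose archive_no is still NULL.
# Files are taken in order until the next one would not fit.
sub next_disc_batch {
    my ($files, $capacity) = @_;
    my $total = 0;
    my @ids;
    foreach my $file (@$files) {
        my ($id, $size) = @$file;
        last if $total + $size > $capacity;
        $total += $size;
        push @ids, $id;
    }
    return ($total, @ids);
}

# Build the statement that records which disc now holds the given ids
# (assumes the table is named "files", as in the text above).
sub archive_update {
    my ($disc_no, @ids) = @_;
    return sprintf("UPDATE files SET archive_no = %d WHERE id IN (%s);",
                   $disc_no, join(", ", map { int($_) } @ids));
}

# Toy example: three pending files and a disc that holds 6,000,000 bytes.
my @pending = ([101, 3_000_000], [102, 2_500_000], [103, 4_000_000]);
my ($bytes, @ids) = next_disc_batch(\@pending, 6_000_000);
print archive_update(12, @ids), "\n" if @ids;
# prints: UPDATE files SET archive_no = 12 WHERE id IN (101, 102);
```

Comparing the returned byte total with the disc capacity shows whether enough images are waiting to justify burning a new disc.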

I haven't lost my original archive due to fire, theft, or catastrophic system failure yet, but I'm ready if it happens. I try to keep my backup in a separate location from my original archive, but that has led to problems of access. I do keep the archive in a fireproof area.

This page last modified on 2011-11-24

phone cgstock.com at 612-245-4306 email us:chris@cgstock.com
Chris Gregerson, 150 Green Ave. N., New Richmond, WI 54017 USA

Domains created and maintained by Chris Gregerson, chris@cgstock.com www.cgstock.com www.gregerson.org