Photo archiving

I tried several methods of saving digital photos for long-term storage. Below is the approach I've settled on, which has worked for me since 2001. Note: I'm using the Linux operating system and downloading photos with a CompactFlash card reader. I'm also using an SQL database and the Perl programming language.

Save ALL your images

Hard drive space is now cheap enough that you should save every image, other than those that are out of focus or badly exposed. Years later, shots that once seemed worth deleting can take on greater significance.

As one example, I photographed several dozen kids over a period of months in the Philippines, after which one of the children died in a handgun accident. Otherwise unremarkable photos of the boy became very important to me and to his family (one rode on his hearse in the funeral procession; see photo).

Save files in folders named for the date using year-month-day ("2007-12-21")

This gives a directory structure with a natural sorting order that looks like this:
/2000
|
+----2000-01-01
     2000-02-01
     2000-03-01
     ...

/2001
|
+----2001-01-01
     2001-02-01
     2001-03-01
     ...
etc.
Each image is downloaded into a folder named for the date it was taken, according to the photo's EXIF information. I wrote a Perl script that reads the EXIF data from the images, creates any necessary folders, and copies the images. It uses a utility called "jhead". It is not very elegant, but an excerpt is below:
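use strict;
use warnings;

# Mount the flash card reader (the device and mount point will differ on your system)
system("mount", "-t", "vfat", "/dev/sdd1", "/mnt/flash") == 0
    or die "could not mount the flash card\n";

# The path below is based on Canon's particular directory structure
my @files = glob("/mnt/flash/dcim/???canon/*");

foreach my $filename (@files) {
    # Run jhead without going through the shell, so odd filenames are safe
    open(my $jhead, "-|", "jhead", $filename) or die "cannot run jhead: $!\n";
    my $out = join '', <$jhead>;
    close($jhead);

    # The jhead command will return, in part, the following:
    # Date/Time    : 2005:08:18 02:16:53
    unless ($out =~ /Date\/Time.*(\d\d\d\d):(\d\d):(\d\d) /) {
        warn "no EXIF date found in $filename, skipping\n";
        next;
    }
    my ($year_taken, $date_taken) = ($1, "$1-$2-$3");

    # Create the year and date folders if they do not exist yet
    my $dir = "/digicam/$year_taken/$date_taken";
    unless (-d $dir) {
        system("mkdir", "-p", $dir) == 0 or die "cannot create $dir\n";
    }
    system("cp", "-v", $filename, "$dir/") == 0
        or warn "could not copy $filename\n";
}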
The same thing can be done by hand, if necessary.

Use an SQL database to keep track of your original image files

I have several thousand images on this website, and I might need the original, high-resolution digital file for any of them on short notice. I could locate it by the picture's date and browse the folder named for that date, but I've found that to be less than foolproof. Instead, I use an SQL table that mirrors everything in the folders on my hard drive:

+--------------+-----------------------+------+-----+------------+
| Field        | Type                  | Null | Key | Default    | 
+--------------+-----------------------+------+-----+------------+
| id           | mediumint(8) unsigned | NO   | PRI | NULL       |
| folder       | date                  | NO   | MUL | 0000-00-00 |                
| filename     | varchar(100)          | NO   |     |            |                
| archive_no   | tinyint(3) unsigned   | YES  |     | NULL       |                
+--------------+-----------------------+------+-----+------------+
The important function of this table is that, whatever I may do down the road with a given image, I always associate it with its "file id number". The need for this is apparent from Canon's filename scheme, "img_0000.jpg": if you let the camera name your images this way, after shooting 10,000 images you begin re-using filenames (you end up with two files named "img_0001.jpg", and so on). So I don't track images by filename, or date, or a combination of the two, but by a single, unique file id number.
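To make the point concrete, here is a minimal sketch of the idea in Perl. It is an illustration under assumptions, not the code behind this site: the hash stands in for the files table above, the names register_file and path_for_id are hypothetical, and the path layout is the /digicam/year/year-month-day/ scheme described earlier.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the files table: id => row.
my %files;
my $next_id = 1;

# Record one original file and return its new, never-reused id.
sub register_file {
    my ($folder, $filename) = @_;
    my $id = $next_id++;
    $files{$id} = { folder => $folder, filename => $filename, archive_no => undef };
    return $id;
}

# Rebuild the path of the original from its id, assuming the
# /digicam/<year>/<year-month-day>/ layout described earlier.
sub path_for_id {
    my ($id) = @_;
    my $row = $files{$id} or return;
    my ($year) = $row->{folder} =~ /^(\d{4})-/;
    return "/digicam/$year/$row->{folder}/$row->{filename}";
}

# Two different photos to which the camera gave the same name.
my $first  = register_file("2005-08-18", "img_0001.jpg");
my $second = register_file("2007-12-21", "img_0001.jpg");

print "$first: ",  path_for_id($first),  "\n";
print "$second: ", path_for_id($second), "\n";
```

The two files share a filename but get different ids, so anything that refers to an image by id stays unambiguous even after the camera's counter wraps around.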

Make backups

This goes without saying, but it meshes with the above system. I use DVDs, which I number sequentially. The files table has a field, "archive_no", which refers to the numbered DVD on which the photo was backed up.

When enough images that have not yet been backed up accumulate to fill a DVD, I make a new backup and update the files table to reflect which images are on that backup disc.
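The "fill a DVD" step could be sketched roughly as follows. This is only an illustration under stated assumptions: file sizes are supplied by the caller (the files table above has no size column), the capacity is passed in rather than fixed, the rows live in memory instead of SQL, and next_dvd_batch is a hypothetical name.

```perl
use strict;
use warnings;

# Take the files that are not yet archived, in id order, and collect them
# until the next one would no longer fit on the disc. Returns an empty
# list when the pending files would not fill a disc yet.
sub next_dvd_batch {
    my ($rows, $capacity) = @_;
    my @pending = sort { $a->{id} <=> $b->{id} }
                  grep { !defined $_->{archive_no} } @$rows;
    my @batch;
    my $total = 0;
    my $full  = 0;
    for my $row (@pending) {
        if ($total + $row->{size} > $capacity) { $full = 1; last; }
        push @batch, $row;
        $total += $row->{size};
    }
    return $full ? @batch : ();
}

# Toy data: sizes and capacity are in arbitrary units, not real bytes.
my @rows = (
    { id => 1, size => 300, archive_no => 1 },       # already on DVD 1
    { id => 2, size => 400, archive_no => undef },
    { id => 3, size => 500, archive_no => undef },
    { id => 4, size => 300, archive_no => undef },
);

my $archive_no = 2;    # the number written on the new disc
my @batch = next_dvd_batch(\@rows, 1000);

# In a real setup this would be an UPDATE on the files table.
$_->{archive_no} = $archive_no for @batch;

print join(",", map { $_->{id} } @batch), "\n";
```

Files 2 and 3 go onto disc 2, and file 4 waits for the next disc. In practice the capacity should be set somewhat below the disc's nominal size to leave room for filesystem overhead.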

I haven't yet lost my original archive to fire, theft, or catastrophic system failure, but I'm ready for when it happens. I try to keep my backups in a separate location from the original archive, but they are in the same building (in a coal bin, which seems fireproof).

This page last modified on 2007-12-22

cgstock.com provides quality stock photos for commercial, fine-art, education, and non-profit use, with an emphasis on pictures of the Twin Cities of Minneapolis and St. Paul, Minnesota and China & the Philippines.
phone cgstock.com at 612-245-4306   email us:chris@cgstock.com
Chris Gregerson, 150 Green Ave. N., New Richmond, WI 54017 USA
http://www.cgstock.com/