Help me archive a roomful of old Mac data
January 11, 2010 9:09 AM

How do I automate dumping the contents of a couple hundred different data CDs into a directory in Mac OS X?

I've got a couple hundred data CD-Rs - mostly graphic design and illustration projects dating from before 2002 - that I want to dump into an archive drive (which itself will be duplicated and safely stored) before the CD-Rs succumb to bit rot. These are all data CDs, no applications, no music, no DVDs. I don't want to create disk images, I only want to copy off the files. There's no concern with cracking or working around copy protections.

The challenge I face is how to automate this as much as possible - ideally the human component of the process should be no more than dumbly inserting and removing CDs from the drive, never having to go to the Finder to identify the disk, drag it to a target, or eject it.

One possible process would watch for a disk insertion (the name of the volume is unknown until it's mounted), rsync the disk and its contents to a common target directory, unmount and eject the disk on success, and wait for the next disk insertion. What's stumping me right now is how to automate that when there don't seem to be any "on disk insertion" triggers or events in AppleScript or Automator. Thanks for any help or leads.
posted by ardgedee to Technology (5 answers total)
 
I'd do this with a shell script, or just a one-liner from the command line. The Mac should auto-mount the CD when you insert it. Open up Terminal, do 'df' at the command prompt. Insert the CD and do 'df' again - you should see the drive mounted.

Then do:

cp -rv /path/to/CD/* /path/to/target/drive/directory

Eject the disk, put the new one in. Give it a second, then hit the Up arrow (to bring the command back out of the history) and hit Enter. And so on.
posted by jquinby at 9:16 AM on January 11, 2010


Best answer: You will want to use a launchd item combined with a shell script.

This launchd item will watch the /Volumes folder and execute the script whenever the contents of /Volumes change (such as when a disk is inserted or ejected). It will also keep itself from executing twice if the previous job is already running.

Here is an example launchd item; change the program arguments to match the path to the script you've installed. You may want to insert a disk and run the script manually from the Terminal first, to ensure it works. Put the launchd item in ~/Library/LaunchAgents (create the directory if it's not there). You can then load it with launchctl load ~/Library/LaunchAgents/com.sneezingdog.copycds.plist and unload it with launchctl unload ~/Library/LaunchAgents/com.sneezingdog.copycds.plist
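
For reference, a launchd item of this kind could look roughly like what the sketch below writes out. It is not necessarily identical to the example item mentioned above: the function name write_copycds_plist and the script path you pass in are placeholders, and the only launchd keys it relies on are Label, ProgramArguments, and WatchPaths.

```shell
#!/bin/bash
# Sketch only: writes a LaunchAgent plist that runs a script whenever
# the contents of /Volumes change. write_copycds_plist is a made-up
# helper name; the script path is whatever you pass as the second argument.
write_copycds_plist() {
    local agent_dir="$1"    # e.g. "$HOME/Library/LaunchAgents"
    local script_path="$2"  # hypothetical, e.g. "$HOME/bin/copycds.sh"
    local plist="$agent_dir/com.sneezingdog.copycds.plist"

    mkdir -p "$agent_dir" || return 1

    # Note: a script path containing & or < would need XML escaping first.
    cat > "$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.sneezingdog.copycds</string>
    <key>ProgramArguments</key>
    <array>
        <string>$script_path</string>
    </array>
    <key>WatchPaths</key>
    <array>
        <string>/Volumes</string>
    </array>
</dict>
</plist>
EOF

    # Print the path of the file that was written.
    echo "$plist"
}
```

Calling write_copycds_plist "$HOME/Library/LaunchAgents" "/path/to/your/script.sh" writes the file and prints its path, after which the launchctl load command above applies.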

A sample script that should work (but I haven't tested):

#!/bin/bash

# Find the mount point of the optical media
opticalMedia=`df | grep disk2s1 | awk '{ print $6 }'`

# Check for the optical media (quoted, so an empty result fails the test)
if [ -d "$opticalMedia" ]; then
    # It is here, let's get copying. Without a trailing / on $opticalMedia,
    # rsync copies the disk into the destination as a new folder named after
    # the disk; if you just want the contents, add a slash after it.
    rsync -aEp "$opticalMedia" /destination/etc/etc/
    # Once done copying, kick out the disk. Make sure to change this to
    # match your optical drive's device.
    diskutil eject /dev/disk2s1
else
    echo "nothing to sync"
fi

You can grab the script here (it has to be chmod +x'ed, and you should review it to make sure I don't hack your system; know your coders, etc.)

If your optical media doesn't mount at disk2s1 (easily discoverable by inserting a disk and just looking at the output of df by itself), edit that line.
posted by mrzarquon at 10:10 AM on January 11, 2010


Best answer: With the launchd item and the script working, you should be able to just keep a stack of disks next to the machine: whenever the tray ejects, remove the disk on it, put that to the side, put a new one from the stack in the tray, and close it. Pretty simple.

If you are worried about overlapping disk names, add this line next to opticalMedia=

rightnow=`date +"%Y%m%d-%H%M"`

then use

rsync -aEp "$opticalMedia/" "/destination/$rightnow/"

which will create a new folder named after $rightnow (the date as yearmonthday-hourminute) and copy the contents of the optical media into it. If you remove the / after $opticalMedia, it will nest the disk as its own folder inside the folder named $rightnow. This should keep the script working, since you shouldn't be inserting a disc and running the script within a minute of the last run, and it creates a new, unique folder for each disk inserted.
posted by mrzarquon at 10:18 AM on January 11, 2010


Best answer: mrzarquon hit closest. There were some problems that resulted in a more complicated shell script, but in the end it worked.

Here's the script in production, edited and commented for clarity and privacy:
#!/bin/bash

# Find the volume name of the disc in the optical drive
opticalMedia=`df | grep disk1s1s2 | cut -d '/' -f 5`

# Require a non-empty name; otherwise the test below would match /Volumes itself
if [ -n "$opticalMedia" ] && [ -d "/Volumes/$opticalMedia" ]; then
    echo "$opticalMedia"
    if [ -d "/Users/myhomedirectory/target/$opticalMedia" ]; then
        # Is there a name conflict? Fix it.
        timestamp=`date +"%Y%m%d_%H%M"`
        targetname="$opticalMedia-$timestamp"
    else
        # No conflict. Yay.
        targetname="$opticalMedia"
    fi
    # rsync using full archive method, extended attributes, full permissions
    # (makes literal some default Mac OS X rsync behaviors, just in case)
    rsync -aEp "/Volumes/$opticalMedia/" "/Users/myhomedirectory/target/$targetname"
    diskutil eject /dev/disk1s1s2
else
    echo "whoops. nothing there, drive misidentified, or some other bug"
fi
The complications:
  1. Mac OS X accommodates spaces in filenames gracefully and shell scripts accommodate them only with hammer and tongs. The awk one-liner fails when volumes have spaces in their names, but in private messages mrzarquon suggested the cut command used above, which works. Quoting variable names ended up being mandatory throughout.
  2. The disk ID number will vary because external devices - including the CD drive - get plugged and unplugged regularly (Today the CD drive is disk1s1s2; last week it was disk2s1s2). In lieu of writing a script to take snapshots of df regularly and making comparisons to determine what the drive's ID is today, I manually run df, check the device ID, and edit the shell script accordingly before doing the day's CD dump; it only risks changing during my workday if the drive is disconnected and then reconnected after something else is plugged in.
  3. Finally, I didn't want to have to create more nested or renamed directories than absolutely necessary within the target archive, so there's an added clause to rename the destination directory only in the case of a name conflict.
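
On complication 2, one possible way to avoid editing the device ID by hand is to pick the disc out of the output of mount instead of hard-coding it. This is only a sketch under assumptions: that mount prints lines of the form "/dev/diskN on /Volumes/Name (options)", and that the CD is the only read-only volume under /Volumes (a mounted read-only disk image would match too). The helper name readonly_volumes is made up for illustration.

```shell
#!/bin/bash
# Sketch only. Assumes `mount` prints lines like
#   /dev/disk1s1s2 on /Volumes/My Disk (hfs, local, nodev, read-only)
# and that the CD is the only read-only volume under /Volumes.
# readonly_volumes is a hypothetical helper, not a system command.
readonly_volumes() {
    # Reads mount output on stdin and prints "<device> <mount point>"
    # for each read-only volume mounted under /Volumes.
    sed -n 's|^\(/dev/disk[^ ]*\) on \(/Volumes/.*\) (.*read-only.*)$|\1 \2|p'
}

# Example use inside the copy script (commented out so the sketch itself
# has no side effects). The device never contains spaces, so `read` puts
# the whole mount point, spaces included, into the second variable.
# mount | readonly_volumes | while read -r device mountpoint; do
#     echo "would copy $mountpoint and then eject $device"
# done
```

Because the device comes first and contains no spaces, volume names with spaces survive the split, which sidesteps complication 1 as well.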
Thanks to everybody who helped.
posted by ardgedee at 8:59 AM on January 15, 2010


After thinking about it some more, this command works as well, assuming your only ATA device is your optical drive (it won't work on newer laptops, I think):

system_profiler SPParallelATADataType | grep "Mount Point" | cut -d ":" -f 2 | cut -d "/" -f 3
posted by mrzarquon at 10:55 AM on January 15, 2010

