Automator? I hardly know 'er!
September 25, 2011 5:56 PM

Help me write a shell script for use in Automator that will actually do what I need it to.

I'm trying to write a shell script (for use in an Automator folder action) that will generate an MD5 checksum for each file in a directory and then write that checksum to a text file (along with the name of the file the checksum came from). Right now, the shell script I have is generating a checksum for lots of directories; I'm not sure whether it's the entire computer, but at the very least it's not just files in a specific directory, which is what I need. I have pretty much zero knowledge of this stuff, and trying to find a script that someone else has written hasn't helped me, so I'm hoping that AskMeFi can!

Note: Ideally, the text file should be appended with new file names and checksums whenever new files are acted upon by the folder action, but I can live with it if that's not possible.

In any case, here's what I have so far (removing cd "$1" doesn't actually seem to affect anything, but I'm including it anyway):

cd "$1"
find . -print0 |xargs -0 md5 >> ~/Desktop/md5.txt
posted by divisjm to Computers & Internet (8 answers total)
I think you want something like this for your script -- find recurses into subdirectories, which isn't the behavior you seem to want.
for file in `ls $1`; do
  # no spaces around = in an assignment; -q makes md5 print only the checksum
  checksum=`md5 -q "$1/$file"`
  echo "$file $checksum" >> ~/Desktop/md5.txt
done

posted by axiom at 6:03 PM on September 25, 2011

I'm assuming that you only want the checksums of regular files (i.e., no symlinks, sockets, etc.), so find is actually useful for this sort of thing.

Also, do you care about the exact output format? I.e., does it have to be "filename checksum" on each line, or are you OK with the default MD5 output format "MD5 (filename) = checksum"?

If you're outputting all your info to the same file, then you probably want to include full pathnames -- otherwise you'll have a hard time identifying exactly which file you checksummed in which directory when reading the list later.

Try this (one-line) variation on your original script:
find "$1" -depth 1 -type f -print0 | xargs -0 md5 >> ~/Desktop/md5.txt

That will give you only the entries directly inside the target directory (-depth 1) that are regular files (-type f), so you don't get error messages from md5 about giving it a directory to checksum. (-depth 1 is a BSD find test, which is what OS X ships; GNU find spells the limit -maxdepth 1.) Here's some sample output of running it on my VVVVVV save directory:

hostname:VVVVVV username$ find `pwd` -depth 1 -type f -print0 | xargs -0 md5
MD5 (/Users/username/Documents/VVVVVV/4kvvvv.vvvvvv) = cd2ea7731a700b40224508d4bd2c656f
MD5 (/Users/username/Documents/VVVVVV/a_new_dimension.vvvvvv) = f8ebdfaf18c391eb1b60f0ca23e3e279
MD5 (/Users/username/Documents/VVVVVV/linewrap.vvvvvv) = ff7e1e339cb896ef1eacef27cf04d075
posted by McCoy Pauley at 6:32 PM on September 25, 2011
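
On the output-format question: if "filename checksum" lines are preferred over the default "MD5 (filename) = checksum", one option is to post-process md5's output with sed. This is a minimal sketch, assuming the default md5 line format shown in the sample output above; reformat_md5 is a made-up helper name, not part of md5, and filenames containing newlines would confuse it.

```shell
# Hypothetical helper: turn "MD5 (path) = checksum" lines from stdin
# into "path checksum" lines. Lines in any other shape pass through unchanged.
reformat_md5() {
    sed 's/^MD5 (\(.*\)) = \([0-9a-f]*\)$/\1 \2/'
}

# Possible use in the folder-action script:
# find "$1" -depth 1 -type f -print0 | xargs -0 md5 | reformat_md5 >> ~/Desktop/md5.txt
```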

Best answer: (not on my mac at present... sorry).

Is the problem here simply that automator is running the script with a working directory which is not the one you want? So your "$1" isn't being interpreted correctly?

Firstly, I'd probably just go

find "$1" -print0 |xargs -0 md5 >> ~/Desktop/md5.txt

and if that's not working, then check what's in "$1" using echo or something.
posted by pompomtom at 6:45 PM on September 25, 2011
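
For the "check what's in "$1"" step, a small logging helper makes it visible what Automator actually hands the script. This is only a sketch: log_args and the md5-debug.txt path are invented for illustration. One thing it can reveal is an empty argument list, which may happen when the Run Shell Script action's "Pass input" option is set to "to stdin" rather than "as arguments"; in that case cd "$1" does nothing useful and find . walks whatever directory the script happened to start in.

```shell
# Hypothetical helper: append the argument count and each argument to a log file.
# The brackets make leading/trailing spaces and empty arguments easy to spot.
log_args() {
    log_file=$1
    shift
    {
        printf 'argument count: %s\n' "$#"
        for arg in "$@"; do
            printf 'argument: [%s]\n' "$arg"
        done
    } >> "$log_file"
}

# Possible use at the top of the Automator script:
# log_args ~/Desktop/md5-debug.txt "$@"
```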

Ahh, it's recursing too far? Sorry, please ignore me!! Use McCoy Pauley's depth flag thing.
posted by pompomtom at 6:46 PM on September 25, 2011

cd "$1"
find . -maxdepth 1 -type f -newer ~/Desktop/md5.txt -print0 |
    xargs -0 md5sum >> ~/Desktop/md5.txt
-newer FILE will limit it to files newer than a given file (duh). md5sum is the GNU tool, so on a stock Mac substitute md5. If you're planning to send multiple folders to a single sum file, you may need something like:
cd "$1"
find . -maxdepth 1 \( -type f -name .last.checksum -prune \) -o \
    \( -type f -newer .last.checksum \) -print0 |
    xargs -0 md5sum >> ~/Desktop/md5.txt
touch .last.checksum
which will create (and ignore) a per-directory hidden file to track the last run on that directory.

But you probably don't want to cd into the directory in that case since you would lose the path information, so find "$1" ... "$1/.last.checksum" ... as appropriate.
posted by zengargoyle at 7:19 PM on September 25, 2011
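
One gap in the stamp-file idea is the very first run: find reports an error if the file given to -newer does not exist yet. The sketch below is one way to cover that case while keeping full paths, as suggested above. new_files_since_last_run is a hypothetical name, and the sketch assumes a find that supports -maxdepth and -print0 (both BSD and GNU find do).

```shell
# Hypothetical helper: print (NUL-separated) the regular files directly inside "$1"
# that are newer than the per-directory stamp file, then refresh the stamp.
new_files_since_last_run() {
    dir=$1
    stamp="$dir/.last.checksum"
    if [ -e "$stamp" ]; then
        find "$dir" -maxdepth 1 -type f ! -name .last.checksum -newer "$stamp" -print0
    else
        # First run: no stamp yet, so every regular file counts as new.
        find "$dir" -maxdepth 1 -type f ! -name .last.checksum -print0
    fi
    touch "$stamp"
}

# Possible use in the folder-action script:
# new_files_since_last_run "$1" | xargs -0 md5 >> ~/Desktop/md5.txt
```

A file modified in the short window between the find and the touch would be skipped on the next run, so treat this as a starting point rather than a complete solution.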

Response by poster: find "$1" -print0 |xargs -0 md5 >> ~/Desktop/md5.txt actually worked (the others didn't have quite the right result), I'm guessing because of the way I'm using the script within the larger context of the Automator workflow. I still can't seem to get things to work properly as a folder action, but it's doing just fine as a workflow and app, and that's good enough for now. Thanks, everyone!
posted by divisjm at 3:34 AM on September 26, 2011

for file in `ls $1`

This is not a good idea. It will fail for files with spaces in their names. Any time you find yourself putting ls in ` `, you're probably doing something wrong, because the shell can expand globs itself much more efficiently and without having to parse the output (and thus get hung up on filenames with whitespace.) The proper way to write this, assuming that $1 is a directory name, is:

for file in "$1"/*; do ... done
posted by Rhomboid at 12:25 PM on September 26, 2011

And that's ignoring the fact that a for loop is not even needed, assuming that the command can take any number of arguments: md5 "$1"/*
posted by Rhomboid at 12:28 PM on September 26, 2011
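
One wrinkle with a bare "$1"/* is that the glob also matches subdirectories, and md5 complains when handed a directory. If that matters, a loop with a -f test is one way to filter them out. A minimal sketch: checksum_dir is a made-up name, and the fallback to md5sum is an assumption for systems without md5.

```shell
# Assumption: use md5 where it exists (OS X), otherwise fall back to md5sum.
if command -v md5 >/dev/null 2>&1; then
    md5_cmd=md5
else
    md5_cmd=md5sum
fi

# Hypothetical helper: checksum every regular file directly inside "$1".
checksum_dir() {
    for file in "$1"/*; do
        # The glob also matches subdirectories, and stays as a literal
        # "dir/*" when nothing matches; -f skips both cases.
        [ -f "$file" ] || continue
        "$md5_cmd" "$file"
    done
}

# Possible use in the folder-action script:
# checksum_dir "$1" >> ~/Desktop/md5.txt
```

Note that * does not match hidden files (names starting with a dot), which is probably what you want here.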
