Help me consolidate thousands of pictures
January 25, 2007 11:07 AM
I have a bunch of directories of pictures across multiple disks. Many images are duplicated. Additionally, I had to use some disk recovery software to rescue other images and the filenames changed. At the moment, I can't guarantee that all of the filenames are unique.
I'd like to weed out the duplicates and then ultimately consolidate everything into iPhoto. I'm talking close to 20,000 pictures.
I had two backup volumes fail simultaneously. I was able to recover some pictures from drive A and some from drive B, but I don't really know what the two sets have in common. I'm sure there is LOTS of overlap.
Is there a program (hello, Perl wizards) that will weed out duplicates by CRC? I'm sure that through the various recovery procedures, the pictures were given different names. And I'm not quite sure the naming is unique, so I don't want to just
find . -name \*.jpg -exec mv {} mynewdirectory \;
for fear that I'll overwrite files. Plus, that won't weed out the dupes. Any better ideas?
posted by neilkod to computers & internet (13 comments total)
The best I could come up with in my brainstorming was to pick a few identifying attributes and possibly do this in a few steps. Filename is the obvious one, but file size and creation date might work too.
Sort by name and identify the ones with the same name. If the file size and creation date are the same for each file, then I'd say there's strong evidence they are indeed dupes.
Then sort by file size and see how many files are exactly the same size, even if the names don't match exactly. This might require a quick verification between the two files, for that rare case where the file size is identical but the pictures are in fact different.
My point is, do it in a few iterations to try to narrow the 20,000 pics down to maybe a hundred or so questionable ones that can't be automatically removed from the group.
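The size-then-verify passes above could be sketched roughly like this in Python. This is only an illustration under a few assumptions: the function names `file_digest` and `find_duplicates` are made up for the example, the "quick verification" is done with a SHA-256 hash of the file contents (swapped in for the CRC mentioned in the question, since a cryptographic hash is far less likely to collide), and names and dates are ignored entirely because the recovery software changed them. It only reports groups of identical files; it does not move or delete anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots, extensions=(".jpg", ".jpeg")):
    """Group files under `roots` whose contents are byte-for-byte identical.

    `extensions` must be a tuple of lowercase suffixes (an assumption of
    this sketch). Returns a list of groups; each group is a sorted list
    of two or more paths with the same size and the same digest.
    """
    # Pass 1 (cheap): bucket every matching file by its size in bytes.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(extensions):
                    path = os.path.join(dirpath, name)
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2 (expensive): hash only files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(["/Volumes/A", "/Volumes/B"])` (the paths here are placeholders) would return one list per set of identical pictures. Keeping the first path in each list and setting the rest aside leaves only unique images to rename and import.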
posted by johnstein at 11:43 AM on January 25, 2007