How do I sort out this folder full of images (after a bungled Aperture export)?
July 11, 2010 2:02 PM

I have a folder full of images -mostly jpeg but some RAW (I'm on a Mac, OS X 10.6.4)- that I exported from Aperture on an older machine. Like a fool, I grabbed all the jpegs in the Aperture vault in Finder and dragged them to a new folder, and now it turns out that I have several versions of each photograph. How do I find the largest/best version of each image and discard the rest?

N.B. I no longer have access to the old computer - these are from an external HDD. There are 7,654 images in total in the folder, with at least three versions of each image e.g. DSC_0686.jpg (a 16kb thumb), DSC_0686 2.jpg (the original/master) and DSC_0686 3.jpg (a reduced-quality preview).

I just need the master images, though in some cases there are more than three versions of each image, and it doesn't always hold true that the ___ 2.jpg version is the master.

I'm hoping there might be some bit of Automator wizardry that can fish out the biggest images, or a nifty bit of software (or indeed trick with Aperture or iPhoto).

This is my first question here; I'm sorry it's so banal, but in my defense, I'm a polar explorer (for want of a better job description) so these are some slightly unusual holiday snaps. Thanks!
posted by bsaunders to Computers & Internet (9 answers total)
would sorting the content of the folder by size solve the problem?
posted by phil at 2:09 PM on July 11, 2010

Response by poster: Alas no, as the images span several years, so some originals are <2>5mb.
posted by bsaunders at 2:12 PM on July 11, 2010

Response by poster: Oops - that was meant to say that some originals are less than 2mb whereas half-size previews of images taken on a 5D Mk2 are often bigger than 5mb.
posted by bsaunders at 2:14 PM on July 11, 2010

Is the naming format always XXXX.jpg (thumb), XXXX2.jpg (master) and XXXX3.jpg (low quality), or does it vary? Sorting by size would be my first thought, as phil stated above, but you can use some bash scripting to sort and move things around if there is a pattern to the file names. Might be easier to just manually sort by size with the Finder.
posted by jdoss at 2:15 PM on July 11, 2010

Best answer: Duplicate Annihilator will do what you want. It's a fairly incredible little program/plug-in that does exactly what it says on the tin, but it's not free and it is a single-tasker.
posted by The Bellman at 2:32 PM on July 11, 2010

The big question is whether your various versions of a photo all share a common 'DSC_####' component, so that all photos that match "(something)DSC_1234(something_else)" are the same photo. If so, it's not too bad to do. For instance, ImageMagick has an identify program that reports each image's dimensions:
$ identify *.jpg
earth_day.jpg[10] JPEG 2560x1024 2560x1024+0+0 8-bit DirectClass 265KiB 0.000u 0:00.000
Orion-Nebula-Again.jpg[11] JPEG 2560x1024 2560x1024+0+0 8-bit DirectClass 132KiB 0.000u 0:00.000
perfect-cosplay.jpg[12] JPEG 700x513 700x513+0+0 8-bit DirectClass 115KiB 0.000u 0:00.000
Samurai_Champloo_by_cman.jpg[13] JPEG 2560x1024 2560x1024+0+0 8-bit DirectClass 356KiB 0.000u 0:00.000
Spike_Wallpaper.jpg[14] JPEG 2560x1024 2560x1024+0+0 8-bit DirectClass 252KiB 0.030u 0:00.039
sunspectrum_noao_big.jpg[15] JPEG 3071x2048 3071x2048+0+0 8-bit DirectClass 681KiB 0.000u 0:00.000
the_swordfish_dual_mon_versio.jpg[16] JPEG 2560x1024 2560x1024+0+0 8-bit DirectClass 294KiB 0.000u 0:00.000
web2-1.jpg[17] JPEG 600x338 600x338+0+0 8-bit DirectClass 40KiB 0.000u 0:00.000
With a bit of scripting you could build a list of file names and pixel dimensions, then group them by the common part of the name, like:
1234 DSC_1234.JPG 1024x768
1234 DSC_1234_2.JPG 640x480
So, pretty doable if your names have a bit of uniqueness even if they are inconsistent in the suffixes for the various sizes. But it's a lot of playing around to figure out things like the common format of the original pictures, common size of thumbnails/preview... there may even be specific EXIF information that can be used to discern an original from something manipulated by a program that made the smaller versions.
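A minimal sketch of that grouping idea, in Python, in case it helps. It is not tested against a real Aperture export. It assumes every version of a photo shares a stem like DSC_1234 (letters, underscore, four digits), and that the version with the most pixels is the master. The names photo_key, parse_dimensions and pick_masters are made up for this example. Note that a camera whose counter rolled over could reuse DSC_1234 for a different photo, which this would wrongly merge.

```python
import re

# Assumption: all versions of one photo share a stem such as "DSC_0686"
# (letters, an underscore, four digits), whatever suffix follows it.
STEM_PATTERN = re.compile(r"[A-Za-z]+_\d{4}")


def photo_key(filename):
    """Return the shared part of the name, upper-cased.

    Falls back to the whole name (minus extension) if no stem is found.
    """
    stem = filename.rsplit(".", 1)[0]
    match = STEM_PATTERN.search(stem)
    return (match.group(0) if match else stem).upper()


def parse_dimensions(lines):
    """Turn lines of 'width height filename' into (filename, width, height)."""
    for line in lines:
        parts = line.strip().split(None, 2)  # filename may contain spaces
        if len(parts) == 3:
            yield parts[2], int(parts[0]), int(parts[1])


def pick_masters(records):
    """For each photo key, keep the filename with the largest pixel area."""
    best = {}
    for name, width, height in records:
        key = photo_key(name)
        area = width * height
        if key not in best or area > best[key][0]:
            best[key] = (area, name)
    return {key: name for key, (area, name) in best.items()}
```

The input lines could come from something like identify -format "%w %h %f\n" *.jpg (width, height, file name), fed to parse_dimensions and then pick_masters. Copy the chosen files to a new folder rather than deleting anything until you've checked a sample by eye.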
posted by zengargoyle at 2:37 PM on July 11, 2010

To clarify, you don't have any access to the original Aperture library (or a backup Aperture "vault"), right? You're never going to get back some of your metadata and organization if this is the case.
posted by zachlipton at 2:38 PM on July 11, 2010

Response by poster: Fantastiche! Thank you Steve.
posted by bsaunders at 2:38 PM on July 11, 2010

Response by poster: To clarify, you don't have any access to the original Aperture library (or a backup Aperture "vault"), right?

Absolutely, though I'm not concerned about preserving albums etc. - just getting rid of the 4k+ duplicates without spending weeks doing it manually! The Duplicate Annihilator plugin looks just the ticket.
posted by bsaunders at 2:41 PM on July 11, 2010
