Run to yer mama
February 3, 2018 6:01 AM

I have a folder full of low-resolution images, basically thumbnails by today's standards, with descriptive file names, and a stack of DVDs full of high-resolution jpgs straight from cameras with long serial-number-type file names. I'm certain that the originals for most of the thumbnails are on the DVDs and I need a magic way to match them up, so that each little lost orphan thumbnail can find its mama original.

I am helping a local nonprofit transfer its web site to a new web host, a site which contains a plethora of photo galleries. The original web site was created in the medieval era of the web when disk space and bandwidth were precious commodities in short supply and were fought over brutally by merciless armored noblemen wielding exotic pole arms, so my predecessor dutifully fiddled with compression codec settings and color palettes and things to crush down each camera image into a small version which would perform well on the site and in the browsers of that more civilized age.

In contrast, the new web host has no constraints on disk space or bandwidth and encourages you to upload the maximum resolution you've got, then automatically creates multiple downsized images and thumbnails and handles sending the appropriate resolution version to each site visitor's device. There are no copyright concerns related to publishing the high-resolution versions.

What I really need is a piece of software that can churn through all the files, compute some sort of resolution-independent hash for each of them, and use that hash to match up the old site thumbnails to their originals. Ideally a little shell script for Linux that calls imagemagick or something; but I also have access to Windows machines and if this is a standard feature of an Adobe product or other fancy expensive software, then just the name of the feature may be a useful search term, or I can probably track down someone with that software and bring all of the files with me on an external HD.
posted by XMLicious to Technology (6 answers total) 2 users marked this as a favorite
Just spitballing here... I would check the metadata in the thumbnail and the original file and see if some of the fields were carried over when the thumbnail was created from the original (like the timestamp when the photo was taken).

If that pans out it should be possible to write a Python or Perl script to fetch the metadata for both sets of images, match them up, and output a mapping of thumbnail to original.
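
A rough sketch of what that script might look like in Python, assuming the exiftool command-line program is installed and that the thumbnails actually kept the DateTimeOriginal field (neither is confirmed in this thread). The function names and folder arguments are made up for illustration. Photos shot in the same second share a timestamp, so each thumbnail maps to a list of candidates rather than a single file.

```python
import json
import subprocess
from collections import defaultdict


def read_timestamps(folder):
    """Return {path: DateTimeOriginal} for every file under folder.

    Assumption: exiftool is on PATH. Files without the tag are skipped.
    """
    result = subprocess.run(
        ["exiftool", "-json", "-r", "-DateTimeOriginal", folder],
        capture_output=True, text=True, check=False,
    )
    records = json.loads(result.stdout or "[]")
    return {r["SourceFile"]: r["DateTimeOriginal"]
            for r in records if "DateTimeOriginal" in r}


def match_by_timestamp(thumbs, originals):
    """Map each thumbnail path to the sorted originals sharing its timestamp."""
    by_time = defaultdict(list)
    for path, stamp in originals.items():
        by_time[stamp].append(path)
    return {thumb: sorted(by_time.get(stamp, []))
            for thumb, stamp in thumbs.items()}


def report(thumb_dir, original_dir):
    """Print one 'thumbnail -> candidates' line per thumbnail."""
    matches = match_by_timestamp(read_timestamps(thumb_dir),
                                 read_timestamps(original_dir))
    for thumb, candidates in sorted(matches.items()):
        print(thumb, "->", ", ".join(candidates) or "NO MATCH")
```

Calling report("thumbs", "/mnt/dvd1") with your own paths would print the mapping. Thumbnails whose metadata was stripped during compression simply won't appear in it, which is a quick way to find out whether this approach pans out at all.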

If that doesn't work out, you could look for utilities that generate a "similarity score" between two images and take the highest scores to be suspected matches. There is also a discussion of doing something similar with OpenCV.
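
For a feel of how a similarity score can work, here is a minimal sketch of an "average hash", which is one common technique and not necessarily what any particular utility does. It assumes ImageMagick's convert command is on PATH (on ImageMagick 7 it may be called magick instead). Every image is squashed to 8x8 grayscale, so the hash is roughly independent of resolution, and the score is the number of differing bits. All the function names are invented for this example.

```python
import subprocess

HASH_SIDE = 8  # 8x8 pixels gives a 64-bit hash


def load_gray_pixels(path):
    """Shrink an image to 8x8 grayscale with ImageMagick; return 64 raw bytes.

    Assumption: the 'convert' binary from ImageMagick is installed.
    The '!' forces the exact size, ignoring the aspect ratio.
    """
    cmd = ["convert", path, "-colorspace", "Gray",
           "-resize", f"{HASH_SIDE}x{HASH_SIDE}!", "-depth", "8", "gray:-"]
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Number of bits that differ; lower means more similar."""
    return bin(a ^ b).count("1")


def best_matches(thumb_hashes, original_hashes):
    """For each thumbnail name, return (closest original name, distance)."""
    if not original_hashes:
        return {}
    return {
        thumb: min(((name, hamming(t_hash, o_hash))
                    for name, o_hash in original_hashes.items()),
                   key=lambda pair: pair[1])
        for thumb, t_hash in thumb_hashes.items()
    }
```

You would build the two dictionaries with something like {p: average_hash(load_gray_pixels(p)) for p in paths}. Heavy compression or a reduced color palette will flip a few bits, so treat a small distance as a suspected match to eyeball rather than as proof. If the thumbnails were cropped, a whole-image hash like this probably won't line up.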
posted by duoshao at 6:26 AM on February 3, 2018 [1 favorite]

Adding to what duoshao said -

How to check metadata in Win and Mac.

One of the fields in metadata (also called EXIF data) is File Name. If that's present, using a script to match thumbnails to originals is easy - for someone who knows scripting. I can't help you there.
posted by Homer42 at 6:31 AM on February 3, 2018 [1 favorite]

This is interesting, and uses ImageMagick.
posted by duoshao at 6:42 AM on February 3, 2018 [2 favorites]

Are you aware of TinEye's MatchEngine? They do this sort of thing; their plans start at $200 per month, cancel any time.
posted by at at 7:01 AM on February 3, 2018 [1 favorite]

I've accidentally had duplicate photo finder software do this for me, but probably not for as big a resolution difference as you're working with. (Shoot me a message if you're still interested; I may be able to dig up the names of some of the programs I've used.)
posted by Real.Wolf at 7:42 PM on July 19, 2018
