Run to yer mama
February 3, 2018 6:01 AM
I have a folder full of low-resolution images, basically thumbnails by today's standards, with descriptive file names, and a stack of DVDs full of high-resolution jpgs straight from cameras with long serial-number-type file names. I'm certain that the originals for most of the thumbnails are on the DVDs and I need a magic way to match them up, so that each little lost orphan thumbnail can find its mama original.
I am helping a local nonprofit transfer its web site to a new web host, a site which contains a plethora of photo galleries. The original web site was created in the medieval era of the web when disk space and bandwidth were precious commodities in short supply and were fought over brutally by merciless armored noblemen wielding exotic pole arms, so my predecessor dutifully fiddled with compression codec settings and color palettes and things to crush down each camera image into a small version which would perform well on the site and in the browsers of that more civilized age.
In contrast, the new web host has no constraints on disk space or bandwidth and encourages you to upload the maximum resolution you've got, then automatically creates multiple downsized images and thumbnails and handles sending the appropriate resolution version to each site visitor's device. There are no copyright concerns related to publishing the high-resolution versions.
What I really need is a piece of software that can churn through all the files, compute some sort of resolution-independent hash for each of them, and use that hash to match up the old site thumbnails to their originals. Ideally a little shell script for Linux that calls ImageMagick or something; but I also have access to Windows machines and if this is a standard feature of an Adobe product or other fancy expensive software, then just the name of the feature may be a useful search term, or I can probably track down someone with that software and bring all of the files with me on an external HD.
Adding to what duoshao said -
How to check metadata in Win and Mac.
One of the fields in metadata (also called EXIF data) is File Name. If that's present, using a script to match thumbnails to originals is easy - for someone who knows scripting. I can't help you there.
posted by Homer42 at 6:31 AM on February 3, 2018 [1 favorite]
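A minimal sketch of the metadata-matching idea, in case it helps someone who does know scripting. It assumes the metadata has already been pulled out of both sets of files by some tool and loaded into plain dictionaries, and that the thumbnails still carry a field the originals share. The capture timestamp `DateTimeOriginal` is used here as that field, which is an assumption; thumbnails that were re-compressed for the web may have had their metadata stripped entirely. The file names and the `match_by_metadata` helper are made up for illustration.

```python
from collections import defaultdict


def match_by_metadata(thumb_meta, orig_meta, key="DateTimeOriginal"):
    """Map each thumbnail to the originals that share the same metadata value.

    Both arguments are dicts of {file name: {field: value}}. The field named
    by `key` is an assumption; use whichever field survives in both sets.
    """
    # Index the originals by the chosen field, skipping files that lack it.
    index = defaultdict(list)
    for name, meta in orig_meta.items():
        value = meta.get(key)
        if value:
            index[value].append(name)

    matches, unmatched = {}, []
    for name, meta in thumb_meta.items():
        candidates = index.get(meta.get(key), [])
        if candidates:
            matches[name] = sorted(candidates)
        else:
            unmatched.append(name)
    return matches, sorted(unmatched)


# Hypothetical sample data standing in for real extracted metadata.
thumbs = {
    "picnic-group.jpg": {"DateTimeOriginal": "2009:06:14 12:31:05"},
    "ribbon-cutting.jpg": {},  # metadata stripped, so no match is possible
}
originals = {
    "DSC_0412.JPG": {"DateTimeOriginal": "2009:06:14 12:31:05"},
    "DSC_0413.JPG": {"DateTimeOriginal": "2009:06:14 12:31:09"},
}

matches, unmatched = match_by_metadata(thumbs, originals)
print(matches)
print(unmatched)
```

Thumbnails that land in `unmatched` would still need a pixel-based comparison, and a thumbnail with several candidates (burst shots taken within the same second) would need a quick look by eye.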
This is interesting, and uses ImageMagick.
posted by duoshao at 6:42 AM on February 3, 2018 [2 favorites]
Are you aware of Tineye's MatchEngine? They do this sort of thing; their plans start at $200 per month, cancel any time.
posted by at at 7:01 AM on February 3, 2018 [1 favorite]
This Python program seems to do what you want.
posted by gregr at 8:25 AM on February 3, 2018 [2 favorites]
I've accidentally had duplicate photo finder software do this for me, but probably not for as big a resolution difference as you're working with. (Shoot me a message if you're still interested; I may be able to dig up the names of some of the programs I've used.)
posted by Real.Wolf at 7:42 PM on July 19, 2018
If that pans out it should be possible to write a Python or Perl script to fetch the metadata for both sets of images, match them up, and output a mapping of thumbnail to original.
If that doesn't work out, you could look for utilities like this to generate a "similarity score" between two images and take the highest scores to be suspected matches. Also, this discussion of doing something similar with OpenCV.
posted by duoshao at 6:26 AM on February 3, 2018 [1 favorite]
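For the "similarity score" route, here is a rough sketch of one simple resolution-independent fingerprint, usually called an average hash. It is not necessarily what the linked utilities do. The idea: shrink every image to the same tiny grid, record which cells are brighter than the grid's mean, and count differing bits between two fingerprints. The sketch assumes each image has already been loaded as a grayscale grid (a list of rows of brightness values) by whatever image library is at hand; the file names, the synthetic test images, and the cutoff of 10 bits are all made up for illustration.

```python
def shrink(pixels, size=8):
    """Box-average a 2D grayscale grid (list of rows) down to size x size."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (0 means identical)."""
    return bin(a ^ b).count("1")


def best_matches(thumb_hashes, orig_hashes, max_distance=10):
    """For each thumbnail, pick the closest original; None if nothing is close."""
    result = {}
    for tname, th in thumb_hashes.items():
        oname, dist = min(
            ((o, hamming(th, oh)) for o, oh in orig_hashes.items()),
            key=lambda pair: pair[1],
        )
        result[tname] = (oname, dist) if dist <= max_distance else (None, dist)
    return result


# Synthetic stand-ins for real images: two 64x64 "originals" and one 16x16
# "thumbnail" that is a downsized copy of the first original.
orig_a = [[x * 4 for x in range(64)] for _ in range(64)]   # left-to-right gradient
orig_b = [[y * 4 for _ in range(64)] for y in range(64)]   # top-to-bottom gradient
thumb = [[x * 16 for x in range(16)] for _ in range(16)]   # small copy of orig_a

orig_hashes = {"DSC_0412.JPG": average_hash(orig_a), "DSC_0413.JPG": average_hash(orig_b)}
thumb_hashes = {"picnic-group.jpg": average_hash(thumb)}
result = best_matches(thumb_hashes, orig_hashes)
print(result)
```

A hash like this tolerates resizing and recompression reasonably well, but a thumbnail that was cropped or heavily color-adjusted may drift past any cutoff, so the suspected matches are worth spot-checking by eye.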