

Is there a tool to help me consolidate all these image files?
October 26, 2010 10:02 AM

MacFilter: Looking for a program or shell script that will allow me to quickly and easily scan my computer (Mac) for duplicate image files and consolidate them, so that there aren't redundant copies taking up space everywhere.

I am currently working for an employer who takes a lot of photos, a lot, and for whatever reason (most likely due to having changed assistants a number of times over the years) many of these files have been duplicated and placed into different directories for various reasons (sets for printing, edited versions with originals, etc.). Now that I am in charge of this mess of files, I am trying to get some basic organization done so that I can find what I need and not fear deleting things that are overly represented. I am looking for any program that can compare the images themselves, not just file names, and then alert me to duplicates. It would be best if I could set up some sort of system to automate this as much as possible. I have some level of comfort working in the shell, so I wouldn't be opposed to a script that could do this for me, but something with more of a UI would be better, as it may be helpful to be able to look at the images from time to time if they have a novel filename or something.
posted by Bengston to Computers & Internet (3 answers total) 12 users marked this as a favorite
 
You'll probably want to do some sort of hash matching (technical speak for content matching) to catch the duplicates that are named differently. Here are a few applications that do just that:

TidyUp
DupeGuru
SingleMizer
Araxis

I'm sure there are lots of other solutions out there; hopefully this helps get you started.
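To see what "content matching" buys you: a byte-identical copy gets the same hash no matter what it's named. A quick illustration (original.jpg is a stand-in for any real file; shasum ships with OS X, and sha1sum is the GNU equivalent):

```shell
# Copy a file under a completely different name, then hash both.
cp original.jpg "copy with new name.jpg"
shasum original.jpg "copy with new name.jpg"
# both output lines begin with the same 40-character hash
```

Tools like the ones above do essentially this across your whole disk, then group files whose hashes collide.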
posted by samsara at 10:16 AM on October 26, 2010


Here's a beast of a command I put together:

find DIR -type f -printf '%20s\t%p\n' | sort -n | uniq -D -w 20 | awk -F$'\t' '{print $2}' | xargs -d '\n' sha1sum | sort | uniq -D -w 40

This will first find files that are the same size, then compare SHA1 hashes of those files to see which are actually identical.
  1. find DIR -type f -printf '%20s\t%p\n' Find all files under directory DIR and print each one's size (padded to 20 characters) and file name.
  2. sort -n Sort this list by size.
  3. uniq -D -w 20 Keep only files with the same size as another file.
  4. awk -F$'\t' '{print $2}' Strip out the file sizes and just print the file names.
  5. xargs -d '\n' sha1sum Perform a second pass, hashing each file that remains (-d '\n' so file names containing spaces survive xargs).
  6. sort | uniq -D -w 40 Sort by hash so identical hashes land on adjacent lines, then print any files whose 40-character SHA1 hash repeats.
Unfortunately I only have a Linux machine to test this on. I'm not at home on my Mac. These commands may need some fiddling to get working as the GNU/Linux utilities tend to have more flags than their BSD counterparts.
posted by Khalad at 11:27 AM on October 26, 2010 [1 favorite]


Khalad's script won't work as written: OS X's find doesn't know -printf, only -print.

I use Hayne.net's perl script, and then massage that output to turn the extra files into links back to the original.
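The "massage that output into links" step can be as small as a shell loop. This is a hedged sketch, assuming you've already reduced the duplicate report to tab-separated original/duplicate path pairs in a file I'm calling dupes.txt (the name and format are my assumption, not the perl script's actual output):

```shell
# Replace each duplicate with a hard link to its original, so the
# bytes are stored on disk only once while both paths keep working.
while IFS=$'\t' read -r orig dupe; do
  ln -f "$orig" "$dupe"
done < dupes.txt
```

Hard links only work within a single volume; across volumes you'd want ln -s instead.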
posted by nomisxid at 12:56 PM on October 26, 2010


This thread is closed to new comments.