Help me find ONLY the duplicate files between folders
August 21, 2019 3:00 PM

I have two folders full of images on my Mac with identical file names, and I want to find only the duplicates between them.

Folder A has low res photos of all the employees at my company. Folder B has high res photos of most of those employees plus some former employees. I need to make a folder that contains just the high res photos of the current employees.

I can't figure out any way to do this within the Finder. I could probably put the file names into Excel, do some kind of lookup, and generate a list of the differences, but then I'd still have to manually move and delete files. There are around 500, so I'd like to do it as automatically as possible.

A nice to have: Identifying the files in Folder A that are not in Folder B.
posted by jonathanhughes to Computers & Internet (12 answers total) 5 users marked this as a favorite
 
This might work at the Mac command line:

diff -rub lowresfolder/ hiresfolder/
posted by pilot pirx at 3:21 PM on August 21, 2019 [1 favorite]


Best answer: If you're comfortable working in a terminal, this can be done fairly quickly.
Assuming the folders are named A and B, that you want a new folder C containing only the files from B whose filenames also appear in A, and that A, B, and C all exist in the same parent folder, you can run the following command in the terminal from that parent folder:

for f in A/*; do name=$(basename "$f"); cp "B/$name" "C/$name"; done

For the nice to have: for f in A/*; do name=$(basename "$f"); if [ ! -f "B/$name" ]; then echo "$name"; fi; done
posted by VeritableSaintOfBrevity at 3:27 PM on August 21, 2019 [4 favorites]
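
For anyone who would rather keep this in a small script, here is a minimal sketch of the same idea as a function. The function name copy_matching and its three arguments are made up for illustration and are not part of the answer above. It assumes it is run with sh or bash, that the photos sit directly in the folders (no subfolders), and it only copies, never deletes. Names missing from the second folder are written to standard error, which doubles as the "nice to have" list.

```shell
# copy_matching NAMES_DIR FILES_DIR DEST (hypothetical helper)
# Copies each file from FILES_DIR whose name also exists in NAMES_DIR into DEST.
copy_matching() {
    names_dir=$1    # folder whose filenames decide what gets copied (low res)
    files_dir=$2    # folder the files are actually copied from (high res)
    dest=$3         # new folder to fill
    mkdir -p "$dest"
    for entry in "$names_dir"/*; do
        [ -f "$entry" ] || continue      # skip subfolders and the empty-folder case
        name=${entry##*/}                # drop the leading folder part
        if [ -f "$files_dir/$name" ]; then
            cp -p "$files_dir/$name" "$dest/$name"
        else
            echo "missing from $files_dir: $name" >&2
        fi
    done
}
```

Something like copy_matching A B C 2> missing.txt would fill C and save the list of names that have no high-res version. Quoting every variable is what lets this cope with file names that contain spaces.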


I believe CCleaner has a search for duplicates feature.
posted by Melismata at 3:30 PM on August 21, 2019 [1 favorite]


I'd probably use VeritableSaintOfBrevity's solution, although this requires that the files in A/ and B/ have the exact same names. If they are PHOTO1234_lowres.jpg and PHOTO1234_hires.jpg, or something like that, you will need to add in some additional logic to strip off the nonmatching parts of the names.

Also, if there are some files in A/ that are not in B/ (i.e. you don't have a high-res version), they won't be copied to C/ via VSOB's command. That's because it's using the first folder (A) as a list of files to copy from B to C. You'll see an error if it's missing from B.

To get the complete set of files in A, you'll also need to run cp -n A/* C/, using the "-n" or "no clobber" option to prevent overwriting any high-res files you already copied in there. Do this last, obviously.

Test all this on a copy of the data; I haven't tested these commands.
posted by Kadin2048 at 3:51 PM on August 21, 2019 [1 favorite]
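
The "additional logic" for mismatched names could look roughly like the sketch below. It assumes the exact pattern from the example above (names ending in _lowres.jpg and _hires.jpg), which may not be how the real files are named, and the function name copy_hires_for_lowres is made up for illustration. It uses the shell's built-in suffix stripping (${var%suffix}) and is meant to be run with sh or bash on a copy of the data.

```shell
# copy_hires_for_lowres LOW_DIR HIGH_DIR DEST (hypothetical helper)
# For each NAME_lowres.jpg in LOW_DIR, copy NAME_hires.jpg from HIGH_DIR into DEST.
copy_hires_for_lowres() {
    low_dir=$1
    high_dir=$2
    dest=$3
    mkdir -p "$dest"
    for low in "$low_dir"/*_lowres.jpg; do
        [ -f "$low" ] || continue          # nothing matched, or not a regular file
        base=${low##*/}                    # e.g. PHOTO1234_lowres.jpg
        stem=${base%_lowres.jpg}           # e.g. PHOTO1234
        high="$high_dir/${stem}_hires.jpg"
        if [ -f "$high" ]; then
            cp -p "$high" "$dest/"
        else
            echo "no high-res match for $base" >&2
        fi
    done
}
```

If the real names differ only in some other suffix, changing the two patterns should be enough. Anything fancier, such as different extensions or letter case, would need more work.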


Grab the latest version of Beyond Compare
posted by pyro979 at 4:26 PM on August 21, 2019 [4 favorites]


Best answer:

$ ls a
bar.jpg  foo.jpg
$ ls b
bat.jpg  foo.jpg
$ mkdir c
$ cp -r a/* c
$ cp -br b/* c
$ find ./c -type f -iname '*~' -delete
$ ls c
bar.jpg  bat.jpg  foo.jpg
First recursive copy b to c. Then recursive copy a to c with the option that adds a '~' to the end of duplicate files. Delete the duplicate files. You are left with the highest resolution version of unique filenames assuming 'b' is high-res and 'a' is low-res.
posted by zengargoyle at 5:57 PM on August 21, 2019 [1 favorite]


Crap, may have mixed up the 'a' and 'b'.... high-res first, low-res with backup second, delete backups, Profit!
posted by zengargoyle at 6:02 PM on August 21, 2019 [1 favorite]


Response by poster: This is all fantastic stuff! Thanks so much!
posted by jonathanhughes at 6:48 PM on August 21, 2019


Seconding the suggestion for Beyond Compare. You can download the 30 day trial to get this done, but BC is so useful for so many things you might end up buying it. I use it for backing up, synchronizing folders, and even comparing the contents of files (who knew?)
posted by lhauser at 6:51 PM on August 21, 2019 [1 favorite]


I think zengargoyle doesn't quite have the logic right. If you copy the lo-res files into the folder with the hi-res files and then delete the duplicates, you are still going to end up with the former employees.
What you need to do is KEEP all of those duplicates (the backups ending with ~ should be the hi-res originals) and delete everything else.

So the find command would need to be this:

find ./c -type f ! -iname '*~' -delete
You will then want to rename all of those to restore the .jpg file extension.
The answer by VeritableSaintOfBrevity looks like a better route.
posted by Lanark at 10:38 AM on August 24, 2019
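
The rename step mentioned above could be sketched like this. It assumes the backups carry a single trailing ~ (the default suffix GNU cp -b adds; the stock macOS cp may not support -b at all), and the function name strip_tilde is made up for illustration. It refuses to overwrite a file that already has the restored name, and is written for sh or bash.

```shell
# strip_tilde DIR (hypothetical helper)
# Renames every "name~" in DIR back to "name", e.g. foo.jpg~ becomes foo.jpg.
strip_tilde() {
    dir=$1
    for backup in "$dir"/*"~"; do
        [ -f "$backup" ] || continue       # nothing matched, or not a regular file
        restored=${backup%"~"}             # quoted so ~ is a literal character, not $HOME
        if [ -e "$restored" ]; then
            echo "skipping $backup: $restored already exists" >&2
        else
            mv "$backup" "$restored"
        fi
    done
}
```

Running strip_tilde c after the find command would leave c holding only the restored .jpg files.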


Yeah, thanks. I totally got caught up in giving a shell solution to a Finder/Excel user while not really being sure of the Mac OS BSD-flavored utilities. I lost track trying not to make it too complicated or answer too many questions.

# setup, m and n are the files in both a(low) and b(high)
$ mkdir foobar; cd foobar
$ mkdir a; for f in a b c m n; do echo low > a/$f; done
$ mkdir b; for f in m n x y z; do echo high > b/$f; done

# make sorted list of files in each directory
$ (cd a; find . -type f | sort ) > a.lst; (cd b; find . -type f | sort) > b.lst

# comm is weird: column 1 is things in a but not b, column 2 is things in b but not a,
# and column 3 is things that are in both a and b.

# -12 suppresses columns 1 and 2, so only the third column (shared files) is shown
$ comm -12 a.lst b.lst
./m
./n

# The -23 is only in a
$ comm -23 a.lst b.lst
./a
./b
./c

# The -13 is only in b
$ comm -13 a.lst b.lst
./x
./y
./z

# Alternate for comm
$ sort a.lst b.lst | uniq -d
./m
./n

# Make a list of the dups.
$ comm -12 a.lst b.lst > dups.lst

# Copy b's (high) versions of those into a
$ cat dups.lst | (cd b; while IFS= read -r f; do cp "$f" ../a; done)

# take a peek
$ head a/*
==> a/a <==
low

==> a/b <==
low

==> a/c <==
low

==> a/m <==
high

==> a/n <==
high
I backed out of just using Perl (and I always mess up really fancy shell, plus shell quoting sucks). :)

jonathanhughes, come to the Dark Side and learn the shell.
posted by zengargoyle at 11:45 PM on August 25, 2019 [1 favorite]


Nice use of comm! If there are no extraneous subdirectories you could do just
$ comm -23 <(ls A) <(ls B) > in-A-but-not-B.txt

$ cd B

$ cp `comm -12 <(ls ../A) <(ls)` ../A

posted by nicwolff at 12:55 PM on August 26, 2019 [1 favorite]
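
If the file names might contain spaces, the backtick version above would split them into pieces. A rough sketch of the same comm idea that reads one name per line is below. The function name copy_common is made up for illustration. It assumes no subfolders and no newlines in file names, writes the shared files into a separate third folder rather than into A, and avoids the <( ) syntax so it also runs under plain sh.

```shell
# copy_common DIR_A DIR_B DEST (hypothetical helper)
# Uses comm to find names present in both DIR_A and DIR_B,
# then copies DIR_B's version of each into DEST.
copy_common() {
    dir_a=$1
    dir_b=$2
    dest=$3
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || return 1
    # ls prints one name per line when piped; comm needs both lists
    # sorted the same way, so the locale is pinned for sort and comm.
    ls "$dir_a" | LC_ALL=C sort > "$list_a"
    ls "$dir_b" | LC_ALL=C sort > "$list_b"
    mkdir -p "$dest"
    LC_ALL=C comm -12 "$list_a" "$list_b" |
    while IFS= read -r name; do
        cp -p "$dir_b/$name" "$dest/$name"
    done
    rm -f "$list_a" "$list_b"
}
```

For the original question that would be copy_common A B C, leaving both source folders untouched.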

