Want to free up some space, don't want to delete non-duplicates.
January 9, 2011 3:58 PM

Two folders with almost identical contents: how do I either delete the duplicates or move the non-duplicates? I'm on OS X and happy to use Terminal.

I have two folders (my current and previous iTunes library, so lots of nested folders etc.).

I would like to compare $oldfolder with $currentfolder, and either:
1. delete all duplicates in $oldfolder; or
2. move all non-duplicates from $oldfolder into a separate location


If option 1. is taken, I would like any empty directories to be removed.
If option 2. is taken, I'm not fussed as to whether the AAC/MP3 files are moved and just dumped in a new folder, or whether the file structure is maintained.

The contents of $oldfolder aren't indexed by my iTunes library (that's only $currentfolder), so this isn't a case of duplicates in my iTunes library, just a case of duplicate files on my system.

I'm happy to use the terminal for this, but doing this from scratch is a little beyond my competence.

Thanks for your help!
posted by djgh to Computers & Internet (12 answers total)
 
Assuming the structures of the folders are pretty close to the same (i.e., if something is in a location in $oldfolder, it's also in the same (relative) place in $currentfolder), you might look into a tool called rsync. You can ask it to sync your $currentfolder from $oldfolder, omitting the --delete option so as not to remove anything from your $currentfolder (and perhaps even specifying -n -v which should tell you what it WOULD do without modifying anything).

I'm not terribly familiar with OS X but rsync is a fairly standard unix tool. Try man rsync in terminal for a summary of its usage.
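A minimal sketch of that dry run, assuming rsync is on your PATH (OS X includes it) and that both folders share the same layout. `--ignore-existing` tells rsync to skip anything already present at the same relative path in `$currentfolder`, so the files it lists are the ones that exist only in `$oldfolder` (directory names may show up in the list too). It compares names and locations only, not contents. `preview_missing` is just a hypothetical name for the wrapper.

```shell
# preview_missing OLD CURRENT   (hypothetical helper name)
# -a  archive mode (recurse, keep times/permissions)
# -n  dry run: report what WOULD be copied, change nothing
# -v  list each file that would be transferred
# --ignore-existing  skip files already present in CURRENT at the same
#                    relative path, so only OLD-only files are listed
# The trailing slashes mean "the contents of" each folder.
preview_missing() {
  rsync -a -n -v --ignore-existing "$1/" "$2/"
}
```

Usage: `preview_missing "$oldfolder" "$currentfolder"`. Because of `-n`, nothing is copied or deleted.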
posted by axiom at 4:26 PM on January 9, 2011


If you have Xcode installed, I think you can use FileMerge for this.
posted by nicwolff at 4:41 PM on January 9, 2011


Or its Terminal interface, opendiff.
posted by nicwolff at 4:42 PM on January 9, 2011


If you can find or install fdupes or a similar program (see the Darwin ports page), it will do this quite easily.

fdupes -r -d -N keepdir deletedir

That's '-r' recursive, '-d' delete, '-N' no prompt (leave it out if you want confirmation prompts). A test run could be done like this:

fdupes -r -f keepdir deletedir

fdupes will find *any* duplicate file, even one duplicated within keepdir, so be warned...

You can use rsync and/or diff -r, but the directories would have to have the same structure, as both tools are meant for ensuring two directory trees are identical, not for finding duplicates.
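For the same-structure case, a quick sketch using `diff -rq` (recursive, and only report whether files differ rather than how). Report lines beginning `Only in $oldfolder` name files or folders with no counterpart at the same place in `$currentfolder`; the other lines (`Only in $currentfolder...`, `Files ... differ`) are filtered out. `only_in_old` is a hypothetical name, and this assumes folder paths without backslashes (awk's `-v` would interpret them).

```shell
# only_in_old OLD CURRENT   (hypothetical helper name)
# Prints diff's "Only in OLD..." lines, i.e. entries missing from CURRENT.
only_in_old() (
  # Drop any trailing slash so the prefix test below is predictable.
  old=${1%/} cur=${2%/}
  # diff exits 1 whenever the trees differ; that's expected, so only the
  # report text is used. A line belongs to OLD if it starts with
  # "Only in OLD:" (top level) or "Only in OLD/" (a subfolder).
  diff -rq "$old" "$cur" |
    awk -v p="Only in $old" 'index($0, p ":") == 1 || index($0, p "/") == 1'
)
```

Usage: `only_in_old "$oldfolder" "$currentfolder"`. A same-named file with different contents shows up as a `Files ... differ` line in plain `diff -rq` output, not here.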
posted by zengargoyle at 4:54 PM on January 9, 2011


Best answer: Or you can try This Code. (yay Perl)
posted by zengargoyle at 5:28 PM on January 9, 2011


Best answer: Without having to install any special new software, I believe the following works:

cd $oldfolder && find `pwd` -name '*' | sort > ~/old_listing.txt && perl -p -i.bak -e "s~^`pwd`~~g;" ~/old_listing.txt
cd $currentfolder && find `pwd` -name '*' | sort > ~/current_listing.txt && perl -p -i.bak -e "s~^`pwd`~~g;" ~/current_listing.txt
comm -12 ~/old_listing.txt ~/current_listing.txt > ~/files_to_move_or_delete.txt
rm ~/current_listing.txt && rm ~/old_listing.txt
less ~/files_to_move_or_delete.txt

Before running, substitute the actual paths of $oldfolder and $currentfolder into the first two lines. After that, you can just copy and paste the whole blob into the command line and it should work.

This will create a list of all the files the two folders have in common, saved to a text file in your home directory called 'files_to_move_or_delete.txt', and open it in 'less' for viewing (if you don't normally use less, you can quit it by pressing 'q').

It does not attempt to actually move or delete anything because I don't want a bug in my script to accidentally delete all your stuff. If the listing looks correct, then you can go ahead and delete all the files in the list by doing:

cd $oldfolder
perl -pe '`rm .$_`' ~/files_to_move_or_delete.txt
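The same deletion step can also be sketched as a plain shell loop, assuming you run it from inside $oldfolder and that no filename contains a newline. Reading each line with `IFS= read -r` and quoting `".$p"` keeps paths with spaces intact, and the `-f` test skips the directory entries (and the blank line for the top folder) that this `find` also lists. `delete_listed` is a hypothetical name.

```shell
# delete_listed LISTFILE   (hypothetical helper name)
# Run from inside $oldfolder. Each line of LISTFILE is a path relative to
# that folder with a leading "/", e.g. "/Artist/Album/01 Track.m4a".
# Only regular files are removed; directories in the list are skipped.
delete_listed() {
  while IFS= read -r p; do
    if [ -f ".$p" ]; then
      rm -- ".$p"
    fi
  done < "$1"
}
```

Usage: `cd "$oldfolder" && delete_listed ~/files_to_move_or_delete.txt`.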

Hopefully that's helpful.
posted by tylerkaraszewski at 5:31 PM on January 9, 2011


Response by poster: tylerkaraszewski, that looks perfect, thank you. One question: $currentfolder's path has spaces in it. I can get round that by cd'ing in myself, but it's causing problems with the pwd's in the find and sort. I can't work out how to escape the spaces within pwd - could you offer any guidance? Google is failing me on this one (and this is a bit beyond my usual usage, so...).

I would normally rename the folder in question, but there are various network drive/Time Machine issues that complicate things and it's not my system, so...gotta love tech support for family...
posted by djgh at 6:21 PM on January 9, 2011


Umm...

cd "$oldfolder"
find . -type f | sort > ~/old.txt
cd "$currentfolder"
find . -type f | sort > ~/new.txt
cd
comm -12 old.txt new.txt > dups.txt
...
cd "$oldfolder"
perl -nle 'unlink $_' ~/dups.txt

You need '-type f' to restrict the listing to regular files and exclude directories, sockets, links, etc.
If you use '.' you don't need to strip out $PWD.
If you're going to delete files from Perl, you should be using unlink rather than shelling out to 'rm'. Or just use the shell directly: 'while read a; do rm "$a"; done <>
And this just deletes files with the same name in the same place in the directory tree. It does not do any actual check for duplicates.
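To address that last caveat without leaving the shell, here's a sketch that byte-compares each listed pair with `cmp -s` before deleting anything, so a same-named file with different contents is left alone. It assumes the `./`-prefixed list written by the `find .` commands above and no newlines in filenames; `delete_verified` is a hypothetical name.

```shell
# delete_verified OLD CURRENT LISTFILE   (hypothetical helper name)
# LISTFILE holds paths like "./Artist/Album/01 Track.m4a" (from find .).
# A file is deleted from OLD only if the file at the same relative path
# in CURRENT has identical contents. cmp -s prints nothing and exits 0
# only on a match; a missing CURRENT file makes it fail, so nothing
# is deleted in that case.
delete_verified() {
  while IFS= read -r p; do
    if [ -f "$1/$p" ] && cmp -s "$1/$p" "$2/$p" 2>/dev/null; then
      echo "deleting $1/$p"
      rm -- "$1/$p"
    fi
  done < "$3"
}
```

Usage: `delete_verified "$oldfolder" "$currentfolder" ~/dups.txt`.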

On the whole, doing this in shell is a bit complicated, especially if your filenames might have spaces or other special characters in them. My first quick attempt was with find + md5sum + sort + uniq -w, but then you have to worry about whether or not to delete from $oldfolder, so add an fgrep, then an awk or cut, then quoting to avoid problematic characters. Then screw that, use Perl.
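For comparison, a shell-only sketch of the content-based approach. It swaps md5sum for POSIX `cksum` (OS X ships `md5` rather than `md5sum`), uses the CRC and size only as a cheap filter, and confirms every candidate with `cmp` before deleting, so files are matched by contents wherever they sit in either tree. It assumes no newlines or backslashes in filenames, and it rescans the checksum list once per file, which is slow but simple. `delete_content_dups` is a hypothetical name.

```shell
# delete_content_dups KEEPDIR DELDIR   (hypothetical helper name)
# Deletes files in DELDIR whose exact contents also exist somewhere in
# KEEPDIR. Nothing in KEEPDIR is touched.
delete_content_dups() (
  keep=$1 del=$2
  sums=$(mktemp) || exit 1
  # One "CRC SIZE PATH" line per regular file in KEEPDIR.
  find "$keep" -type f -exec cksum {} + > "$sums"
  find "$del" -type f | while IFS= read -r f; do
    # CRC and byte count of the candidate (stdin, so no filename field).
    set -- $(cksum < "$f")
    # Path of the first kept file with the same CRC and size, if any:
    # strip the first two fields so paths with spaces survive.
    match=$(awk -v c="$1" -v s="$2" '$1 == c && $2 == s {
      sub(/^[^ ]+ [^ ]+ /, ""); print; exit }' "$sums")
    # The CRC can collide; cmp checks the bytes really are identical.
    if [ -n "$match" ] && cmp -s "$f" "$match"; then
      echo "deleting $f"
      rm -- "$f"
    fi
  done
  rm -f "$sums"
)
```

Usage: `delete_content_dups "$currentfolder" "$oldfolder"`. If several kept files share a CRC and size but differ, only the first is compared, so a real duplicate may occasionally be left in place; it errs on the side of keeping files.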
posted by zengargoyle at 6:27 PM on January 9, 2011


djgh: Check out my code linked to above. It works well.

# these two directories are copies but I deleted a file from the 'Keep These Here' directory.
$ diff -r 'Keep These Here' 'Delete From Here'
Only in Delete From Here: xdotool_20090330-1.dsc

# run proggy (see README; you have to uncomment the unlink part after testing)
$ perl nukedups.pl 'Keep These Here' 'Delete From Here'
Delete From Here/xdotool_20090330-1.diff.gz
Delete From Here/xdotool_20090330.orig.tar.gz
Delete From Here/xdotool-20090330/a.out
Delete From Here/xdotool-20090330/xdo.h
...

# unique file kept
$ find Delete\ From\ Here -type f
Delete From Here/xdotool_20090330-1.dsc

# remove empty directories
$ find Delete\ From\ Here -depth -type d -exec rmdir {} \;
rmdir: failed to remove `Delete From Here': Directory not empty

# all that's left
$ ls Delete\ From\ Here/
xdotool_20090330-1.dsc


The code should run fine on Mac OS X, the modules used are in Perl core since forever.
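For the empty-directory cleanup the question asks about, a quieter variant of the `rmdir` step above, sketched as a hypothetical helper. `-empty` and `-mindepth` are not POSIX, but both GNU find and the BSD find on OS X support them; `-mindepth 1` keeps the top folder itself, which avoids the "Directory not empty" noise shown above.

```shell
# prune_empty_dirs DIR   (hypothetical helper name)
# -depth visits children before their parent, so a folder that only
# contained empty folders is itself empty by the time it is tested.
# -mindepth 1 leaves DIR itself in place even if it ends up empty.
prune_empty_dirs() {
  find "$1" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
}
```

Usage: `prune_empty_dirs "$oldfolder"` after the duplicates have been removed.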
posted by zengargoyle at 6:40 PM on January 9, 2011


Simply wrapping the pwd passed to find with double quotes seems to fix it with spaces:

cd $oldfolder && find "`pwd`" -name '*' | sort > ~/old_listing.txt && perl -p -i.bak -e "s~^`pwd`~~g;" ~/old_listing.txt
cd $currentfolder && find "`pwd`" -name '*' | sort > ~/current_listing.txt && perl -p -i.bak -e

...

You should also change the command used for the actual deletion to:
perl -ne 'chomp;unlink(".".$_);' ~/files_to_move_or_delete.txt

zengargoyle's modified version is largely (but not exactly) equivalent to mine, and prettier. You can use his (or her) version if you like.
posted by tylerkaraszewski at 6:57 PM on January 9, 2011


If you use my Perl you can also accomplish your second goal if you like.

# change unlink line to
use File::Copy;
move $_, $ARGV[2] unless $found{ $d };

And call it by: ./nukedups.pl $currentfolder $oldfolder $wantfolder
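For option 2 without Perl, a shell sketch that moves every file in $oldfolder lacking a byte-identical twin at the same relative path in $currentfolder into a third folder, recreating the subfolders there. It assumes no newlines in filenames and that the destination is outside $oldfolder; `move_non_dups` is a hypothetical name, and unlike nukedups.pl it only compares files at matching paths.

```shell
# move_non_dups OLD CURRENT DEST   (hypothetical helper name)
move_non_dups() (
  # Resolve CURRENT and DEST to absolute paths before cd'ing into OLD.
  cur=$(cd "$2" && pwd) || exit 1
  mkdir -p "$3" && dest=$(cd "$3" && pwd) || exit 1
  cd "$1" || exit 1
  find . -type f | while IFS= read -r p; do
    # cmp -s exits 0 only when both files exist and match byte for byte;
    # anything else (missing or different) counts as a non-duplicate.
    if ! cmp -s "$p" "$cur/$p" 2>/dev/null; then
      mkdir -p "$dest/$(dirname "$p")" && mv -- "$p" "$dest/$p"
    fi
  done
)
```

Usage: `move_non_dups "$oldfolder" "$currentfolder" "$wantfolder"`. Keeping the folder structure costs one `mkdir -p` per file; dumping everything flat into one folder would risk name clashes between albums.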
posted by zengargoyle at 7:06 PM on January 9, 2011


Response by poster: Thank you tylerkaraszewski and zengargoyle!
posted by djgh at 8:48 AM on January 10, 2011

