How to automatically remove duplicates from a union of two iTunes music folders?
February 3, 2006 8:24 AM

Yet another iTunes management question:

A friend has harddrive A, mostly taken up by the contents of ~/Music. His brother has just purchased harddrive B, an external drive, about a quarter taken up by his own ~/Music. They want to create a third ~/Music folder, made up of every .mp3 from harddrives A and B with duplicates removed. This new ~/Music folder will then replace the original music folders on the two drives.

The question: How to do this? Is there some automated way of removing the duplicates from the union of the two drives? I know iTunes will not do it, but maybe some sort of drive synchronisation tool would? Some fancy UNIX magic? (Note: All computers involved are running OS X.)

Note that space isn't a problem, just the weeding of duplicates. And, yes, this question is a bit self-serving as I will be given access to this bounty of music...
posted by docgonzo to Computers & Internet (10 answers total)
Thanks for your help! Note that I am comfortable with though I can't do fancy stuff like write perl scripts, etc.
posted by docgonzo at 8:25 AM on February 3, 2006

I just dealt with this the other night and I could not find anything that did the trick. I spent 3 or 4 hours of weeding out dupes and other assorted crap. In doing so, I also screwed up my iTunes play list so I had to rebuild ALL of my playlists again. FUN!

I was able to find iTunes dupe barrier but couldn't get it to work properly as it kept hanging.
posted by photoslob at 8:30 AM on February 3, 2006

Has anyone used iTunes' "Show Duplicate Songs" feature to weed out dupes? Does it work?
posted by o2b at 8:48 AM on February 3, 2006

1. Import all of the larger source drive to the destination.

2. Create a table of the hash of each mp3 on the larger source drive (preferably as part of the import routine), and the actual destination of that mp3 (folder/file).

3. For each mp3 on the second source drive, take that mp3's hash. See if it collides with an existing hash.

3a. If it does not, add that hash to the hash list and import that mp3.

3b. If it does collide, compare the actual mp3 data to the actual mp3 data of the mp3 with which it collided.

3c. If the mp3s are different, add the hash of the new mp3 to your list (as you need to add the file location even though the hashes collide) and import the mp3.

3d. Otherwise, reject that mp3 as a duplicate.

If you're sure two duplicates are binary identical (same rip, same compression, same tags -- that is, the duplicates originated as copies of the same source mp3 and were not subsequently altered/retagged) use md5sum or similar to hash. If not, hash on known good tags (artist/album/track number may suffice) or on song length (CDDB uses song length and the sum of total song length per album/disk as a low-collision hash.)
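
The steps above could be sketched in Python roughly as follows. This is only an illustration for the binary-identical case: the function names (`file_digest`, `merge_without_duplicates`) and the renaming scheme for same-named but different files are assumptions of the sketch, not part of the procedure described above. It hashes with MD5 and, on a hash match, compares the actual bytes before rejecting a file.

```python
import filecmp
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def merge_without_duplicates(sources, dest):
    """Copy every .mp3 under each source into dest, skipping byte-identical files.

    Returns the list of source files rejected as duplicates.
    """
    dest = Path(dest)
    seen = {}        # digest -> files already copied into dest (steps 2 and 3a)
    rejected = []
    for source in sources:
        source = Path(source)
        for mp3 in sorted(source.rglob("*.mp3")):
            digest = file_digest(mp3)
            candidates = seen.setdefault(digest, [])
            # Step 3b: on a hash match, compare the actual mp3 data.
            if any(filecmp.cmp(mp3, kept, shallow=False) for kept in candidates):
                rejected.append(mp3)          # step 3d: a true duplicate
                continue
            target = dest / mp3.relative_to(source)
            # Assumption: different content at the same relative path is
            # kept under a numbered name instead of being overwritten.
            n = 1
            while target.exists():
                target = target.with_name(f"{mp3.stem} ({n}){mp3.suffix}")
                n += 1
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(mp3, target)
            candidates.append(target)         # steps 3a / 3c
    return rejected
```

Passing the larger drive first, e.g. `merge_without_duplicates(["/Volumes/A/Music", "/Volumes/B/Music"], "/Volumes/New/Music")` (hypothetical paths), matches step 1. Retagged copies of the same song will not be caught, since their bytes differ.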
posted by orthogonality at 8:59 AM on February 3, 2006

If the files and directory structures of the duplicated parts are identical, rsync will do it, i.e. something like:

rsync -r MusicFolder1/ NewMusicFolder/
rsync -r Musicfolder2/ NewMusicFolder/

Of course, this knows nothing about id3 tags or anything, so different copies of the same song (or the same copy in different places) will still be duplicated. I recommend you try it on a backup first.

rsync comes with OS X. There's also a third-party graphical version.
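
Before trying the two rsync runs on real data, it may help to see where the two folders overlap. The following is a minimal Python sketch, not anything rsync itself provides: it lists every relative path that exists in both trees and notes whether the two files are byte-identical. The function name `overlapping_paths` is made up for this example.

```python
import filecmp
from pathlib import Path


def overlapping_paths(folder1, folder2):
    """Map each relative path present in both trees to True if the bytes match."""
    folder1, folder2 = Path(folder1), Path(folder2)
    overlap = {}
    for path in sorted(folder1.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(folder1)
        twin = folder2 / rel
        if twin.is_file():
            # shallow=False forces a byte-by-byte comparison.
            overlap[rel.as_posix()] = filecmp.cmp(path, twin, shallow=False)
    return overlap
```

Any path reported with `False` is a place where the two folders hold different files under the same name, so only one of them would survive the merge.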
posted by beniamino at 9:00 AM on February 3, 2006

I had to do this myself. Here's what I did:

1. Apply a standard naming scheme to both libraries, making sure that things like track numbers are included in song titles to prevent different versions of the same song clashing.

2. Put all the music files from one library into the root level of a folder, no artist/album foldering. You want all the files in the same place.

3. Repeat for the second library.

4. Drag all the files from library A into library B's folder. Select 'replace duplicate files'...

posted by DragonBoy at 9:05 AM on February 3, 2006

DragonBoy writes "2. Put all the music files from one library into the root level of a folder, no artist/album foldering. You want all the files in the same place."

This would work as well if you had a directory structure -- so long as the directory structure is the same.

Or in other words, a filename is really a canonical filename.
posted by orthogonality at 9:13 AM on February 3, 2006

2. Put all the music files from one library into the root level of a folder, no artist/album foldering. You want all the files in the same place.

When I did this, I was really surprised to find that I had "collisions" between same-named tracks in the same position on different CDs. So watch out.
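
One way to spot such collisions before flattening anything is to group the files by bare filename first. This is a rough Python sketch under a couple of assumptions: the function name `flattening_collisions` is invented here, and names are compared in lower case on the assumption that the drives use a case-insensitive filesystem (the usual default on OS X).

```python
from collections import defaultdict
from pathlib import Path


def flattening_collisions(*libraries):
    """Return {filename: [paths]} for names that occur more than once."""
    by_name = defaultdict(list)
    for library in libraries:
        for mp3 in sorted(Path(library).rglob("*.mp3")):
            # Assumption: case-insensitive filesystem, so fold the case.
            by_name[mp3.name.lower()].append(mp3)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Every entry in the result is a set of files that would land on the same name in a single flat folder, whether or not they are actually the same recording.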
posted by smackfu at 10:10 AM on February 3, 2006

I've found that "Show Duplicates" actually works pretty well.
posted by designbot at 11:05 AM on February 3, 2006

I've had good luck with Doug's Applescripts for iTunes.

When I did this, I created a new music library on a BIG external HD, then dragged all the songs from both old iTunes folders into the new iTunes library (drag & drop on the window). After they were all imported, I used the "Corral Dupes" script - you can choose criteria (I think I chose file size, name, and artist) to find the dupes, then it will make a new playlist that features all duplicates (leaving one original). Then, select every song in the "dupe" playlist and hit [option + delete] - it will let you trash them.
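
The idea of matching on chosen criteria can be illustrated in a few lines of Python. To be clear, this is not the script's actual code, just a sketch of the grouping logic; the track records are assumed to be plain dictionaries with hypothetical `size`, `name` and `artist` fields.

```python
from collections import defaultdict


def corral_dupes(tracks, criteria=("size", "name", "artist")):
    """Return every track after the first in each group sharing all criteria."""
    groups = defaultdict(list)
    for track in tracks:
        key = tuple(track[field] for field in criteria)
        groups[key].append(track)
    dupes = []
    for group in groups.values():
        dupes.extend(group[1:])   # group[0] is kept as the original
    return dupes
```

Fewer criteria catch more duplicates but also risk more false matches, for example live and studio versions that share a name and artist.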
posted by sluggo at 1:46 PM on February 3, 2006
