Too much music! I should have such a problem...
December 8, 2009 10:39 AM

Perl script to manage enormous, and growing, music library?

First things first - I am not a programmer by any stretch of the imagination, but I do have some SQL and Perl knowledge (more SQL, less Perl). And, I'm willing to learn more.

The problem: thousands of MP3 files. Some are duplicates (and triplicates) from compilation albums, and some have really bad metadata. What I'd like to do is run a script that will pull all the dupes into a file, so I can see how bad it is. Then, in the interest of saving space, I'd like to delete all redundancies. Further, I'd like to be able to manipulate the metadata: normalize such things as &/and, bulk-change misspellings, etc.

If a tried and true script to do this sort of thing exists, can you point me to it? Thanks in advance.
posted by chez shoes to Technology (9 answers total) 5 users marked this as a favorite

Top Perl tips (and I do it for a living):

Google "CPAN" and look for pre-written modules before writing anything yourself.

"use strict;"

Perl has a reputation as a "write-only language" and it can certainly be used to turn out some world-class gobbledegook. It doesn't have to be written that way. Verbose is better than cryptically concise. And comment!

Personally I'd first try to generate a hash (MD5 probably) of the actual music content of the files, trying to avoid getting metadata involved. De-dupe based on that.
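A minimal sketch of that idea, in Python rather than Perl, under some loud assumptions: the only metadata is an ID3v2 tag at the start of the file and/or an ID3v1 tag in the last 128 bytes (other tag formats are ignored), and the names `audio_bytes` and `audio_md5` are made up for illustration. It strips those two tags and hashes whatever is left.

```python
import hashlib


def audio_bytes(data: bytes) -> bytes:
    """Return the file contents without a leading ID3v2 or trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # footer flag: 10 more bytes after the tag
            start += 10

    # ID3v1: the final 128 bytes, beginning with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(path: str) -> str:
    """MD5 of the audio portion of one file (reads the whole file into memory)."""
    with open(path, "rb") as fh:
        return hashlib.md5(audio_bytes(fh.read())).hexdigest()
```

Two files with the same `audio_md5` value would be the same encoded audio with (possibly) different tags. Copies ripped or encoded separately would still hash differently.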

There may also be libraries out there that can do rhythmic analysis and the like, but how easy they are to use is anyone's guess.
posted by hardcode at 10:58 AM on December 8, 2009

You know, big ol' ugly iTunes has a "find duplicates" function, will automatically pull and sort MP3s by artist/album, and automatically finds and attaches cover art. It also has nice "change the tags of these 30 tracks at once" editing.

Even if you don't use iTunes normally, your tracks might benefit from a detour in and out.

For more hardcore fixing, you can Google "mp3 tag fixers" and find dozens of scripts and apps, but if it's just a few thousand songs... I'd just do a lot of select, control-(i) for get info, edit, repeat. Couple of hours?
posted by rokusan at 11:17 AM on December 8, 2009


id3lib ships with great command-line tools to read/manipulate ID3 tag info (perfect for scripts, and might be easier than learning MP3::Tag and the Perl required to leverage it).

EasyTag is (IMHO) the best graphical id3tag editor.

MP3 Diags fixes tons of problems with mp3 files.

I'd avoid iTunes like the plague if I were you. (I've found that it can be quite recalcitrant when it comes to actually writing ID3 info to the file and not just to the iTunes database, despite invoking every imaginable combination of options to try to force it to do so.)
posted by namewithoutwords at 11:31 AM on December 8, 2009

Response by poster: Thanks rokusan - I wasn't sure what to Google, and now you've helped me!

I'm actually an iTunes fan even though it goes against everything one would expect about me :) That being said, their find dupes function doesn't seem to be able to tell that something is a duplicate if the Album name is different, so what I'm trying to do is go beyond iTunes' basic functionality.

We're talking about 40,000 songs, so it's definitely more than a couple hours' worth of effort. I've been doing exactly what you described, and it's seeming more and more of a Sisyphean task! Plus, it only makes the changes in the iTunes library, not in the actual files themselves - I would like it to be global, as I alternate between Winamp and iTunes.
posted by chez shoes at 11:32 AM on December 8, 2009

Best answer: What? No, iTunes edits actual MP3 tags. In the song file.

The Library contains playlist and other information NOT in the tags.

Yes, 40K is more than what I thought "thousands" meant. That's about the size of my library too, but so far iTunes and a lot of shift-select-edit has been enough. Once it's clean, it just takes a little discipline as you add new things. I have a playlist called "New" for things that aren't cleaned up and "filed" yet.
posted by rokusan at 11:47 AM on December 8, 2009

Best answer: Some programs to look into:
MusicBrainz Picard
The Godfather
posted by ArgentCorvid at 12:26 PM on December 8, 2009

That being said, their find dupes function doesn't seem to be able to tell that something is a duplicate if the Album name is different

Are you using iTunes 9.x (or whatever the latest is)? I've certainly had it offer up dupes that are from different albums. In fact, I'm rather annoyed that it will flag songs along with their live versions from clearly different albums.

If you write something yourself, don't bother with MD5 hashes of the files - that approach would probably be next to useless. You'd only be finding files that are exact duplicates, byte-for-byte. Thus, it will utterly fail to detect identical songs that were encoded by different sources, that use different compression, or that merely have different embedded ID3 tags.
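To illustrate the last point, here is a small Python sketch. The "audio" is a made-up stand-in byte string, not a real MP3, and `make_id3v1` is a hypothetical helper that builds a bare-bones ID3v1 tag. Two copies with identical audio bytes but different titles in the tag produce different whole-file MD5s.

```python
import hashlib


def make_id3v1(title: str) -> bytes:
    """Build a minimal 128-byte ID3v1 tag holding only a title (illustrative)."""
    tag = b"TAG" + title.encode("latin-1")[:30].ljust(30, b"\x00")
    return tag.ljust(127, b"\x00") + b"\xff"  # last byte: genre 255 (unset)


audio = b"\xff\xfb\x90\x00" * 64  # stand-in for MP3 frame data, not real audio
copy_a = audio + make_id3v1("Rock & Roll")
copy_b = audio + make_id3v1("Rock and Roll")

md5_a = hashlib.md5(copy_a).hexdigest()
md5_b = hashlib.md5(copy_b).hexdigest()

# The audio bytes are identical, but the tags differ, so the file hashes differ.
print(md5_a == md5_b)  # False
```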
posted by chrisamiller at 12:33 PM on December 8, 2009

Response by poster: Huh, thanks for the clarification, rokusan - I'll have to fiddle with iTunes a bit more and see if I can make this happen.

And thanks to everyone else for the links so far - these are exactly the sort of things I had in mind.
posted by chez shoes at 12:46 PM on December 8, 2009

Response by poster: I am using the latest iTunes (can't recall which version but I know I updated it recently). The other glitch I'm running into with the find dupes function is that if one file has an ampersand, and its duplicate has the word "and" spelled out, iTunes doesn't seem to know that they are dupes - hence my original desire to bulk search-and-replace the ampersands and so forth.
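For anyone scripting this, a rough sketch of the &/and normalization in Python. Everything here is an assumption for illustration: the function names, the sample paths, and the idea that the artist and title have already been read out of the tags by some tag library (not shown). It builds a comparison key that treats "&" and "and" alike and ignores case and punctuation, then groups files sharing a key. The album is deliberately left out of the key so compilation copies still match.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Comparison key: lowercase, '&' becomes 'and', punctuation dropped."""
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)  # ASCII only; accents are lost
    return " ".join(text.split())  # collapse runs of spaces


def find_dupes(tracks):
    """Group (path, artist, title) tuples by normalized artist and title."""
    groups = defaultdict(list)
    for path, artist, title in tracks:
        groups[(normalize(artist), normalize(title))].append(path)
    # Keep only keys that more than one file maps to.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical sample data standing in for tags read from real files.
tracks = [
    ("albums/01.mp3", "Simon & Garfunkel", "The Boxer"),
    ("compilations/07.mp3", "Simon and Garfunkel", "The Boxer"),
    ("albums/02.mp3", "Nick Drake", "Pink Moon"),
]
dupes = find_dupes(tracks)
```

The key is only used for comparison. The original tag text stays untouched until you decide which spelling to keep, and the ASCII-only filter would need loosening for a library with lots of accented names.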
posted by chez shoes at 12:49 PM on December 8, 2009
