Save me two weeks of manual labor, please
January 26, 2007 2:06 PM   RSS feed for this thread Subscribe

I have a data set of approximately 9,000 hip-hop tracks. It's in an Excel file. I have the song title, the artist, and the album. Now I just need the year. How can I find and automatically import the year into the Excel file without manually looking up every track on Google? Is there a way to do this with MusicBrainz or a similar database?
posted by schroedinger to computers & internet (9 comments total) 1 user marked this as a favorite
Not sure how good the data is, but you can download the whole FreeDB database. If you're handy with simple scripting you may be able to get what you need that way.
posted by Nelson at 2:14 PM on January 26, 2007


Yeah, if you're not good at scripting, it'll be hard- but it's probably quite doable. Download the full .db (not sure what format it is), and then you're doing a simple loop: save your excel spreadsheet as a simple .csv or tab-delimited format, and then for each line take song, artist, album, look it up in the DB, and grab the year for the first match (there could be more than one) and output to file song,artist,album,year. Once you've got the DB up, it should be a fairly quick operation.
posted by hincandenza at 2:38 PM on January 26, 2007


You can do this in Excel too if you can get the FreeDB database into Excel (I don't know if you can). Just create an "ID" field that is BandName&AlbumName - do the same in the FreeDB. Then use the vlookup formula function to match records in your DB with the records in the FreeDB, returning YEAR.

The assumptions I'm making is that the FreeDB can be exported in such a way that it can be opened in Excel, and that both DB's have Artist and Album titles in them.

If there are more than 65,000 entries in the FreeDB, then you'll have to move to Access or Filemaker Pro. Either way the process will be similar.
posted by crapples at 2:50 PM on January 26, 2007


I've worked with FreeDB, and it's paaaainful. If memory serves, the albums are organized as individual (XML?) files in a file system whose top level is genre (!).

In addition, the problem with a script to perform the matching is that it will be predicated on your song/album names being exact matches for what's in the dataset (e.g., "The Beatles" vs. "Beatles, The").

I believe there are publicly-accessible APIs to the FreeDB database for people who have normalized the data; I put together a version for personal use using MySQL, but scrapped it several months ago -- sorry.

If you're going to go the scripting route, I'd suggest querying Amazon, and pulling the "Original Release Date" field of the first item off the search results.

Another option to consider is Google; when you query Google using an artist+album, the first search result is usually a link to a Google info page about the album (e.g., the first link available for this search on "Beatles White Album" leads to this page on something called "Google Music" with the relevant information pertaining to that album.
posted by Doofus Magoo at 3:07 PM on January 26, 2007


There are most certainly more than 65k entries in the FreeDB. There is no way that you would be able to import that into Excel, at least not sanely.

I say your best bet is to export to CSV (or some other text format) and write a short perl/python/whatever script that queries each record against some DB. That could be FreeDB but it also be amazon.com, musicbrainz, etc. It will definitely take a little bit of programming skill, I don't think there's any way around that.
posted by Rhomboid at 3:11 PM on January 26, 2007


And when I suggested considering using Google, I wasn't for a second suggesting that you manually enter all the album+artist combinations in by hand, but scripting the query/parsing process. Find someone that knows Perl and buy them a six-pack of beer; it's 30 lines of code, tops.
posted by Doofus Magoo at 3:12 PM on January 26, 2007


e.g. Net::Amazon
posted by Rhomboid at 3:13 PM on January 26, 2007


Sure, you could do this with FreeDB, as explained above. You could do it with Amazon's SOAP interface. With MusicBrainz'. And, with Google's API -- which may have its new music category online that way.

The most important part is to get it out of the spreadsheet and into a form that a decent language can use. CSV will be an easy step.

In the end, the quality of the results is going to depend on how good your data are. When I think of Hip-hop, I don't think of well-defined naming schemes and meticulous transcriptionists, so YMMV. Whatever you do, I wouldn't expect more than a 70% hit rate unless you are very very very lucky.
posted by cmiller at 5:02 PM on January 26, 2007


Net::Amazon plus some perl regexes.

I'd do it for you, but then I'd have to bill you.

Actually if you're really prepared to do it manually, teach yourself web scripting and do it that way - it's actually quite easy and can be lucrative.
posted by singingfish at 9:51 PM on January 26, 2007


« Older Need an easy way to put notes ...   |   What's the best tomato-based d... Newer »
This thread is closed to new comments.