All Music is %Big?
December 9, 2011 9:10 PM

What is a rough order of magnitude (ROM) estimate of the size of a searchable database of all known recorded music?

Based on my severe misstep in this thread, I have to wonder:

If you had a workable database of all recorded music ever, with Artist/Title/Edition, with one copy for each iteration of that triumvirate, how big would that be?

For the purposes of this question, ignore:

- multiple electronic copies of the "same" Artist/Title/Edition definition of a unique entry
- corrupted copies of same A/T/E

Assume that it is a perfect database, so there will be no duplicate entries for the same A/T/E that are a few seconds shorter, or have different audio artifacts, etc. Basically, there is ONE DEFINITIVE entry for each possible A/T/E.

But, we do have to include:
- any and all cover songs (different A)
- any and all different performance dates or venues (different E)
- any and all different iterations by A and/or E of T

So ---- how big is this gonna get, while keeping it searchable and integratable, à la Amazon-ian expertise?
posted by yesster to Computers & Internet (17 answers total) 3 users marked this as a favorite
Response by poster: Oh, and not just a reference database, but a database that includes the audio files themselves.
posted by yesster at 9:14 PM on December 9, 2011

I assume a decent grade lossless copy vs low grade mp3? (don't have an answer but the storage difference is going to be pretty big)

Hi yesster :)
posted by edgeways at 9:17 PM on December 9, 2011

If you're talking compressed at, say, 256 kbps, I'm guessing that the answer is in the high hundreds or low thousands of terabytes. That's based on estimated total sizes of the iTunes music store.
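
As a rough cross-check on that range, here is a minimal sketch of the bitrate arithmetic. The track count (100 million) and average length (3 minutes) are illustrative assumptions, not figures from the iTunes estimate, and sizes use decimal units (1 TB = 10**12 bytes).

```python
# Rough size of a music library encoded at a constant bitrate.
# ASSUMPTIONS (illustrative only): 100 million tracks, 3 minutes each,
# decimal units (1 TB = 10**12 bytes), no container or metadata overhead.

def library_size_tb(tracks: int, minutes_per_track: float, kbps: float) -> float:
    """Total size in decimal terabytes for constant-bitrate audio."""
    bytes_per_second = kbps * 1000 / 8
    bytes_per_track = bytes_per_second * minutes_per_track * 60
    return tracks * bytes_per_track / 10**12

size_tb = library_size_tb(tracks=100_000_000, minutes_per_track=3, kbps=256)
print(f"{size_tb:.0f} TB")  # prints "576 TB"
```

Under those assumptions the total lands in the high hundreds of terabytes; longer tracks or a larger catalog push it into the low thousands.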
posted by supercres at 9:25 PM on December 9, 2011 [2 favorites]

Ok.. this is going to be really rough but here is an estimate:

Gracenote has about 100,000,000 songs in its database. Some of those are going to be duplicates, but I also know Gracenote doesn't encompass many, many songs, especially by local artists, so let's pretend for the sake of argument it is at least in the right neighborhood.

If we also assume an average song will be ~3 minutes,

then looking at a recent FLAC file I ripped for a song of just about 3 minutes, it comes to 22.5 MB.

22.5 MB * 100,000,000 (songs) is just about 2,146 terabytes (counting 1,024 MB per GB and 1,024 GB per TB).

(so, at current hard drive prices, if you plopped down about $172,000 you could have the storage space to contain, perhaps, all the recorded music in the world)
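
The arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope check using the figures quoted in this comment; the function name and the binary/decimal switch are illustrative choices, not anything from Gracenote.

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
SONGS = 100_000_000   # Gracenote-scale song count from the comment
MB_PER_SONG = 22.5    # one ~3 minute FLAC rip, per the comment
BUDGET_USD = 172_000  # storage budget quoted in the comment

def total_tb(songs: int, mb_per_song: float, binary: bool = True) -> float:
    """Total size in terabytes; binary uses 1024 MB/GB and 1024 GB/TB."""
    base = 1024 if binary else 1000
    return songs * mb_per_song / base**2

tb_binary = total_tb(SONGS, MB_PER_SONG, binary=True)
tb_decimal = total_tb(SONGS, MB_PER_SONG, binary=False)
usd_per_tb = BUDGET_USD / tb_binary

print(round(tb_binary), round(tb_decimal), round(usd_per_tb))
# prints "2146 2250 80"
```

So the 2,146 figure corresponds to binary units (about 2,250 TB in decimal units), and the quoted budget implies roughly $80 per terabyte of raw disk.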
posted by edgeways at 9:39 PM on December 9, 2011 [2 favorites]

Response by poster: So .25 million for one copy. Then we make redundant copies and do that whole fancy load balancing thingy and open an API for 3rd party stuff . . .

Sorry, re-calibrating ..... $175,000. Or about one year's operating costs for an FM radio station. In 1983?
posted by yesster at 9:48 PM on December 9, 2011

According to Brewster Kahle, there are about 3 million published discs. (I'm not sure how that squares with edgeways' Gracenote estimate, but since we're just spitballing, they're not too far apart.) This doesn't include bootlegs, free recordings, etc.

Worst case, assume these are all completely full CDs, stored uncompressed. That's 700 MB * 3M ~= 2,100 terabytes. (hey! same ballpark as edgeways. yay!)

Given that our friends at the Internet Archive now make a Petabox for off-the-shelf storage of a petabyte of data, it's totally doable from a technological standpoint. The rights issues would be a bitch, though ...
posted by chbrooks at 9:53 PM on December 9, 2011 [1 favorite]

Btw, here is the link to the Gracenote page.

Oh yeah, it'd be totally technologically doable, but a pretty darn expensive setup, probably in the realm of a few million, and the rights would definitely be a headache that makes the current European debt talks look like agreeing to house rules for Whist.
posted by edgeways at 10:02 PM on December 9, 2011 [1 favorite]

I really think you need to consider the difference between "all recorded music ever" and music which has made it to a CD issued by a record label. Because Gracenote seems to be a collection of CD data, and the gentleman who gave a "TED Talk" is also referring to "Published discs."

Off the top of my head, I can't imagine these lists of CDs being more than 2 or 3 percent of "all recorded music ever." I guess it kind of comes down to what you mean by "known," but when you consider all the tapes and albums made by bands, individuals, corporations, etc since the dawn of recording, what has made it to a CD release by a record label can't be more than a tiny fraction.
posted by drjimmy11 at 10:06 PM on December 9, 2011 [2 favorites]

Well, the question is aiming towards storage of digitized music for distribution, so while I think your point is valid, drjimmy11 (although I'd argue the % is likely much higher than a "small fraction"), and yesster might have been a little clearer, I think the intent is clear enough.
posted by edgeways at 10:15 PM on December 9, 2011

Response by poster: Sorry, but 100,000,000 songs at 3 minutes each is 3,000,000,000 minutes. edgeways did the estimation. Telegraph did the article, but not about music.

This would mean that every day, the man-hours spent playing that game equals the (non-man-)hours of recorded music available.
posted by yesster at 10:43 PM on December 9, 2011

Response by poster: ^10error, done now
posted by yesster at 10:44 PM on December 9, 2011

How do you feel about remastered editions and suchlike?
posted by box at 11:03 PM on December 9, 2011

Total cost of ownership would easily be in the millions. The company that builds the Petaboxes linked for Internet Archive storage quotes ~$1.50/GB. At ~2,000-3,000 terabytes you are looking at $3-4.5 million just to buy the hardware. That doesn't even include the cost of switches, cabling, transport, setup, monitoring gear, etc. Then you have to consider the software and operation costs. This kind of thing would require custom software development to make it usable. You can't just drop Windows and a copy of iTunes on here; you would be looking at a custom Linux distro tweaked for large compute, ditto filesystem, cluster-aware, searchable database(s) with redundancy, replication/failover/mirroring, and some kind of usable frontend. Keep in mind, too, that the HVAC setup and electricity bills for this kind of thing are non-trivial.
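
For the hardware line item alone, the per-gigabyte quote scales like this. It is a minimal sketch assuming decimal units (1 TB = 1,000 GB) and a flat price with no volume discount; the function name is just illustrative.

```python
# Hardware-only cost from a flat per-gigabyte quote.
# ASSUMPTIONS: decimal units (1 TB = 1000 GB), no volume discount,
# and nothing for networking, power, cooling, software, or staff.
USD_PER_GB = 1.50  # the ~$1.50/GB quote cited above

def hardware_cost_usd(terabytes: float, usd_per_gb: float = USD_PER_GB) -> float:
    """Purchase price in US dollars for the given raw capacity."""
    return terabytes * 1000 * usd_per_gb

for tb in (2000, 3000):
    print(f"{tb} TB -> ${hardware_cost_usd(tb):,.0f}")
# prints:
# 2000 TB -> $3,000,000
# 3000 TB -> $4,500,000
```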
posted by sophist at 1:12 AM on December 10, 2011

If cost is a part of your math here, remember that Apple and Amazon's biggest costs are for licensing, not hardware or software. Record companies still aren't very liberal with this newfangled technology sharing hooey.
posted by rokusan at 5:38 AM on December 10, 2011

Does this database exclude bootleg recordings? Or are we including only official releases? Bootlegs of live shows meet the criteria listed in the original question.
posted by SuperSquirrel at 7:36 AM on December 10, 2011

Possibly relevant: Backblaze $117,000 1 PB disk array. The article includes some comparative pricing.

Anecdotally, large enterprises are starting to buy 1 PB disk arrays, mostly expensive ones from EMC. You can't order them online but they're also not weird experimental technology. In 10 years, we're looking at 300 TB individual hard drives for around $300, so you should be able to spend around $2400 to get enough storage to store the world's music.

Downloading all that music at 100 Mbit/s would take over 5 years though.
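
A quick sketch of that transfer-time estimate, assuming a fully saturated link with no protocol overhead and treating the ~2,146 TB figure from earlier in the thread as decimal terabytes (both simplifying assumptions):

```python
# Time to move a large archive over a fixed-rate link.
# ASSUMPTIONS: link runs at full speed 24/7, no protocol overhead,
# decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def download_years(terabytes: float, mbit_per_s: float) -> float:
    """Years needed to transfer the given volume at the given link rate."""
    total_bits = terabytes * 10**12 * 8
    seconds = total_bits / (mbit_per_s * 10**6)
    return seconds / SECONDS_PER_YEAR

print(f"{download_years(2146, 100):.1f} years")  # prints "5.4 years"
```

Real-world throughput would be lower, so this is closer to a floor than an estimate.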
posted by miyabo at 11:42 AM on December 10, 2011

"All known recorded music" has to include records, tapes, and 8 tracks, only a fraction of which have been digitally remastered. Let's say 20%. So if this was for the whole jar of M&Ms, I would say 10000 terabytes.
Licensing: Personal Use Bitches, OP didn't mention making a public portal :)
Good question!
posted by joecacti at 11:04 AM on December 12, 2011
