Safely Storing Mp3 Files
June 9, 2005 12:44 PM

I have over 20,000 (legal) mp3 files that I want to store safely for decades to come, and I'm looking for device suggestions.

Basically, I know that an external hard drive would be my best bet, probably a 300GB one, but which one is the best and has the least probability of failing me? I want to give my collection to my children someday. Over a year ago I lost 10,000 files when my hard drive died, and I have been paranoid ever since as the collection keeps expanding.
posted by AMWKE to Technology (40 answers total) 5 users marked this as a favorite
The simple answer is to keep more than one backup, and to diversify your technologies. Everything out there can fail, but the more backups you have, the lower the probability of losing everything.

Multiple hard drives are probably your best bet. 20,000 3.5MB MP3 files can be easily stored on an 80GB drive. 80GB drives sometimes go for around $30 after rebates, and that means you could buy 4 or 5 for the price of a single 300GB drive at sale prices. Back up to a different hard drive each week, cycle them out, and if you want to minimize the probability of loss, store each drive in a different, secure, climate-controlled location. You can buy a single external enclosure and swap the hard drives in/out, as long as you put them back in their anti-static bags.
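The sizing arithmetic above is easy to check. A rough sketch in Python, assuming decimal gigabytes (1 GB = 1000 MB, the way drive makers count) and the 3.5 MB average quoted above; the function name is made up for illustration:

```python
def collection_size_gb(num_files: int, avg_file_mb: float) -> float:
    """Estimate total collection size in decimal gigabytes (1 GB = 1000 MB)."""
    return num_files * avg_file_mb / 1000


size_gb = collection_size_gb(20_000, 3.5)
print(f"Estimated collection size: {size_gb:.0f} GB")
print(f"Fits on an 80 GB drive: {size_gb <= 80}")
```

By this estimate the collection fits, but without much headroom if it keeps growing.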

Similarly, you can get a spindle of 20 DVD-R discs to back up on. This will also cost you around $30, but possibly less after rebate. However, DVD-R authoring can be iffy, with certain drives having trouble reading certain discs made with other drives, and there's the possibility of scratching.

Tapes, IMHO, aren't worth it for personal backups.

Ultimately, you'll need to back up your collection on something else; today's hard drives and DVD-Rs are unlikely to be compatible with the storage formats 20 years from now. MFM-format hard drives from 1985 won't work in today's computers, and 5.25" floppy drives are hard to come by. You should probably swap to a different storage mechanism every 3 years or so (which is the typical warranty-life of a hard drive).

Personally, I'd go with a multiple hard-drive backup. They're cheap and robust.
posted by eschatfische at 1:01 PM on June 9, 2005


Hard drives.

The question isn't "What's going to be around in twenty years" because, frankly, newer, better stuff will be out by then.

The real question should be "what will allow me to most easily and quickly transfer my music to this new medium?" And the answer is hard drives.
posted by Civil_Disobedient at 1:05 PM on June 9, 2005


cd or dvd, i think ... that's what i use ... i don't happen to believe the stories about them failing after a few years ... it hasn't happened yet to me

use paper envelopes, not jewel boxes and a file cabinet is very good to keep them in

on preview, yes, cd and dvd may be eventually replaced by something else ... but it won't happen without some warning and you'll have time to transfer to the new media ..
posted by pyramid termite at 1:06 PM on June 9, 2005


Hard drives, no question. Get a USB->IDE enclosure, and put a 200GB hard drive in the enclosure, copy everything over. This should be your backup, with a normal IDE drive inside your computer being your main drive. If either drive fails, replace it and copy the files over from the other. Once you get above a certain number of mp3s CDs become impractical, and as mentioned previously DVD-R support is just a little too spotty for my tastes at this point.
posted by patrickje at 1:23 PM on June 9, 2005


If you really want to be safe, it has to be multiple formats (hard drive, DVD, etc.) in multiple locations. I've also got about 20,000 mp3s, and I keep them on the server at home, with a hard drive backup and a DVD backup in the apartment as well as a hard drive backup at work. They are also stored online on a web-accessible server far from home. I lost 40,000 mp3s a few years ago because I had no backups, so I'm pretty careful.
posted by Cosine at 1:28 PM on June 9, 2005


mp3? Twenty years from now, you may wish that you'd picked lossless compression.
posted by box at 1:35 PM on June 9, 2005


you may wish that you'd picked lossless compression
Exactly. Unless these are voice recordings MP3 is not an archival format. If possible, re-rip (securely, of course) into a lossless format.

I want to give my collection to my children someday
Give them the original CDs. They will likely still be playable and your kids can rip them into a proper format.
posted by Monochrome at 2:35 PM on June 9, 2005


If you're using external drives you could also go for one of the network devices like the linksys NSLU2. Their tech specs indicate you can set it to auto-backup from one attached drive to another.
posted by phearlez at 2:36 PM on June 9, 2005


agreed that something like flac or wavpack or at least musepack should be considered instead of mp3.

although a hard drive is easier and much more convenient, just remember that it is not really meant to be a backup medium. cds and dvds have their own issues as well, but random neutrinos can potentially bork a magnetic disk 10 or 20 years down the line.

I also agree that you should do more than one type of medium.
posted by dorian at 2:37 PM on June 9, 2005


Hard drives are poor archive media, with many points of failure, mechanically and electronically speaking. Additionally, good luck at plugging in a hard drive you buy today into a computer you buy 20+ years from now.

While an inexpensive format with a robust interface, CD-Rs and DVD-Rs are also poor archive media, as the dyes oxidize over time and you lose the ability to recover data.

You will need tape for real archive security, albeit at greater expense. Tape technology matures more slowly and backwards compatibility is built into the various formats. Additionally, tape is technically a safer medium. Its cost reflects its greater long-term utility.
posted by Poltroon at 2:42 PM on June 9, 2005


'good luck at plugging in a hard drive you buy today into a computer you buy 20+ years from now.'

You'll need a way to read the cd and last I checked a cd-rom drive used the same interface as a hard drive.
posted by Cosine at 3:16 PM on June 9, 2005


Digital archivists who want to store data in perpetuity don't talk about a perfect permanent medium; rather, they talk about "refreshing" the data on a regular schedule, such as every 5 years. (They also update the format with each refresh if that is appropriate.)

So to keep your data for 20 years or more, a more realistic approach is to find a medium that will preserve the data for 5 years, and plan on recopying the data to a new medium after that time. Fortunately, the progress of computing means that the media to store the same data will cost much less when you refresh it in 5 years.
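A refresh is only worth something if the new copy is verified against the old one. Here is a minimal sketch of a copy-then-verify step using Python's standard library; the function names and the choice of SHA-256 are assumptions for illustration, not anything prescribed by the archivists mentioned above:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> bool:
    """Copy source to destination, then confirm both files hash identically."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

You would call something like `refresh_copy(Path("old_drive/song.mp3"), Path("new_drive/song.mp3"))` (hypothetical paths) for each file and investigate any that return `False`.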

Personally, I would take other people's advice and use multiple external hard drives as the best bet to get through your first 5 year window.
posted by Tallguy at 4:18 PM on June 9, 2005


atapi, scsi, sata, old wacky proprietary IFs... it's not the interface that matters, it's the data standards. whereas for a hd it really *is* the IF.
posted by dorian at 4:35 PM on June 9, 2005


What Tallguy said.

The only "safe" data is data that exists in multiple locations *and* formats *and* is moving around, or, being refreshed.

Leaving data on a hard disk without multiple copies is data suicide. Keep it in at least two places, with at least a third location in the rotation.

This is why RAID technology is so successful, especially RAID 5 or 7, because it integrates this concept into one system, making it both redundant and mobile.

20 years from now is a very, very long time in the computing world. By then it's very likely we could have optical and/or quantum storage media, and the idea of throwing a terabyte (or multiples or powers of a terabyte) on a crystal the size of a sugarcube wouldn't be science fiction.
posted by loquacious at 4:56 PM on June 9, 2005


Once you get above a certain number of mp3s CDs become impractical

... if you don't have a cataloging program to help you ...

my problem with hard drives is they can fail ... it happens all the time ... i agree that multiple formats and locations are the only real options for 100% certain backup

ah, well, you live long enough, you'll lose everything, right?
posted by pyramid termite at 7:05 PM on June 9, 2005


Yeah, hard drives fail... eventually. But in my two decades of hard drive experience, I've personally had maybe four crashes, and two of those were old ST-225's (20 meg drives).

I cannot count how many CDs have been scratched up enough to cause skipping. Hundreds.
posted by Civil_Disobedient at 7:48 PM on June 9, 2005


I just happen to direct a digital music archive for a university. Welcome to my hell.

One of the world experts on this subject, Diethrich Schuller, Director of the Vienna Phonogrammarchiv, gave a talk in my lab last fall presenting his latest research. Basically, he says we are very likely deeply screwed, and to keep the analog copies and keep making analog backups as long as they keep making the tape (since Quantegy just went Chapter 11, that's become a real warning, and just try to buy archival reel to reel tape right now -- it's all being hoarded!). I won't go into his research (but see the VPA's website). He has me stocking up on backup video and audio decks to put safely away for later in various formats - Hi8 video, VHS, audio cassette, etc., though I can't afford to buy extra Otari r-2-r decks! Same for computer hardware -- keep extra old-format drives, a few old machines (laptops are handy for space-saving) etc. and maintain them in working order. He convinced me that DVD was a disastrous medium for any number of reasons, chiefly the sheer density of data on them and thus the tiny amount of damage needed to render it unrecoverable.

Hard drives are the necessary evil they are, certainly for *delivering* audio over networks, but inarguably as one layer of backup. The reasons have been discussed here already, and many good tips given, especially keeping multiple copies on different high-quality drives -- Seagate or Western Digital, the better quality drives from any manufacturer (pay a premium; it's worth it when you amortize it). Replace the drives on a schedule -- every year or two ideally. You'll find uses for the old drives you take out of service - the sheer quantity of data many of us now need to store is unreal. Put them in your TiVO or something, but keep your music on *newish* drives. Keep them very carefully, and have a rigorous backup strategy that rotates among them. You can lose data during transfers, so do that work over a firewire or better internal SATA or SCSI connector, on a machine with lots of juice and nothing else going on while you backup; better to erase and rewrite (after low level reformat) than to defrag, but the balance is that drives wear out. Don't fill them all the way -- leave some headroom and you save them some wear and tear. Be anal about dust, humidity, magnetic sources, heat, and vibration where you store them, and store copies in different buildings.

Tape is indeed the best archival medium, though it has (well understood) limitations as a physical medium. It's an expensive and time-consuming medium to work with and not scaled to consumer usage in price or availability or interface, though that's changing.

CDs are way better than DVDs, and for mp3 archives they are practical as a redundant copy. Again, don't fill them all the way (the edges, written last, are often first to be damaged). Don't write on them. And use archival grade CDs. Alas, there is only one available archival grade CD-R on the market at the moment, since Maxell's Gold seems to have gone out of production. Brand new from Delkin, also called Archival Gold and marketed to pro photographers. They will run you a buck each, slightly less if you buy 100 (they come in 25s, or jewel cased 10s). They claim a 200 year life, but that's absurd and unknowable. Believe it or not, CDs are susceptible to some fungal contamination in humid climates. And playing them or handling them degrades them.

It is a shame that your music is in mp3 -- another way we are all chasing our tails with technology. (Which reminds me, don't *ever* compress audio for backup or storage if you plan to archive it; I am not sure what happens if you try to compress already-compressed mp3s, but I wouldn't try it.) mp3 is a format for portability and data economy, but even at its highest bit rate it sacrifices a lot of audible data. NO compression at all is the professional standard for digitized analog sources. (There is an issue of whether any of the current audio file formats -- wav, mp3, ogg, aiff, or pcm -- will continue to exist. Some of them are proprietary and only used under license. Companies can withhold permission to use them, or go out of business, or stop supporting them. You may well have to do mass format conversion in the future just to keep your audio data accessible. Pay attention to the news on digital audio if you care about your collection. If you are obsessive, use more than one format for your backup sets: raw PCM, AIFF, etc., or even make a compressed version *just in case* and store it on an iPod.)

The perfect can be the enemy of the good. You can be reasonably secure with a decent multi-hard drive backup for the next few years and then reassess. Anything that inhibits your ability to make those copies and take care of them is not worth thinking about. I have to archive thousands of hours of vintage analog audio and the standards keep rising (16/44 has now become 24/96, we just tested our new D/A converter today). In 24/96 we are talking about several *terabytes* of data. Only very recently have storage prices come down to where this is practical for an archive like mine. A consumer-level archivist has to make even greater compromises. There is no guarantee save obsession.
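For a sense of scale, the storage cost of uncompressed audio follows directly from sample rate, bit depth, and channel count. A back-of-the-envelope sketch, assuming stereo, reading "16/44" as 16-bit/44.1 kHz, and using 1,000 hours as a stand-in figure for "thousands of hours":

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Bytes needed for one hour of uncompressed PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels * 3600


cd_quality = pcm_bytes_per_hour(44_100, 16)
high_res = pcm_bytes_per_hour(96_000, 24)
print(f"16/44.1: {cd_quality / 1e9:.2f} GB per hour")
print(f"24/96:   {high_res / 1e9:.2f} GB per hour")
print(f"1,000 hours at 24/96: {1000 * high_res / 1e12:.2f} TB")
```

At 24/96, stereo audio comes to roughly 2 GB per hour, so a thousand hours is already about 2 TB, which is consistent with the "several terabytes" estimate above.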
posted by realcountrymusic at 7:53 PM on June 9, 2005


The other reason to use an external USB/Firewire drive enclosure for your backup drive(s) is that external port connectors have longer lives than internal drive connectors. Parallel ports are still common, decades into their useful life. USB and Firewire will likely be similarly long-lived, though SATA or IDE will probably not be.
posted by anildash at 7:53 PM on June 9, 2005


16/44 has now become 24/96

Are you upsampling 16/44 audio to 24/96? Or are you just switching your capture rate now that the higher rate is available? Not to derail, but this is an archivist's thread, and I think it may matter to the poster. Personally, I would recommend that he/she re-rip his collection at lossless... but which lossless format?

Finally, another vote for multiple drives on regular retirement rotation. The cost per GB for HDs is actually comparable to DVD-R... so why go through all the hassle and time of burning backups?

Übber-finally, you tape-heads can take a bath! That technology may be fine for banks and accountancies, but damn! what a pain!
posted by squirrel at 4:43 AM on June 10, 2005


No purpose to upsampling. Nope, we start all over at the beginning with the analog masters, a newly serviced Otari, and digitize it all over again.

Absolutely the OP "should" go back and re-rip the entire collection, except that I'd guess like most of us s/he acquired much of it in compressed form to begin with. This is what I mean about chasing our own tails. When you pay 99 cents for a compressed track, you are not getting something comparable to the $1.50 it would cost you on a CD. You're getting, oh, 1/10th the product.

"Lossless" compression is an oxymoron. It sounds better, but it ain't the same as uncompressed audio. You *want* redundant data and even "empty" data. You want it all.

The time it would take to re-rip 20,000 tracks off CDs is significant. But it might be worth re-ripping anything you actually had in 16/44 and really cared about.

cool thread.
posted by realcountrymusic at 5:13 AM on June 10, 2005


Does it matter? In five years, we'll all be networked and connected, and if my drive of 20,000 mp3s dies, I can download what I want from free sites or subscription services (this assumes that music companies will at some point be forced to take their heads out of their collective asses and find a better way to sell music). Let other people worry about the infrastructure of maintaining that much material.

Really, (and as a music fiend, I know that this is true) with all the music, you will never need it all. It's like collecting paperwork: you keep it because you think, "One day, I will need that song." But you probably won't.
posted by rev- at 7:20 AM on June 10, 2005



Only you can answer the question of what you might someday want. But I would not trust my own cherished music collection to a networked future in which content will be tightly controlled by corporate interests. Heck, a lot of stuff is already not available in digital form. Again, services like Rhapsody have a place for the delivery of music (but the audio is awful!) but an archive is another matter.

Further on lossless formats, I don't see the point. They serve the end of portability, where every megabyte matters. But storage prices in all media are so low that the 30-40 percent savings of lossless compression codecs in space aren't worth the risk that some of the compressed data (e.g., silence, I think, though I haven't really paid attention to the algorithms used for lossless compression, because the term is an oxymoron on its face) may be vital in the future. As technology for restoring and converting audio improves (which it has steadily for 100+ years) and changes basic technological foundations (ditto), there is no way to know what might be possible to fix. Any repair/restoration scheme is improved by having the maximum amount of data to build from, including data in supposedly inaudible frequency ranges, and all the noise artifacts that pre-date the digital form (chances are good that some of your mp3 collection is made from long-ago or even recent analog masters; there is quite a bit of analog technology used in current rock and country recording, for example, for tone and for ideological reasons and because some analog technologies -- e.g. distortion, according to many guitarists -- are still superior to their digital replacements). And of course, music is "analog" in many respects as an acoustic phenomenon -- our hearing mechanism contains several analog segments though it may culminate in a quasi-digital neurological sequence.

I have listened to the Apple lossless codec against mp3 and aiff, and it's very very good sounding. I'd definitely consider it for some kinds of delivery of audio, or for storing spoken word materials. But no way I'd risk valuable music by encoding it with a proprietary lossless codec that's almost brand new.

Further on DVD: besides the root cause of the problems -- data is packed too densely to allow for any margin of error with damage or misalignment etc. -- the format is simply not mature. We have several competing DVD formats to deal with, none with the upper hand, and now higher-capacity DVDs are beginning to penetrate the market seriously. Not at all suitable as a stable archival medium for data. The installed base of DVD players is such that as a medium for video playback it will persist in a backwards-compatible fashion for some time, though you'd still want multiple copies (I keep *6* copies of every DVD we create from live events we film for the archive). The numerous technical and legal obstacles to retrieving raw data from DVDs are also terrifying -- and watch CDs carefully in this respect as well. Among other things, you may well want to know how to make an analog-mediated copy of your digital archive. Yeah, it's giving up a generation, so you have to use the very finest analog playback system you can, and the very best re-digitizing platform you can. I'm doing this with things like streamed radio shows we occasionally record.

Tape is less of a hassle and expense than it used to be. Get yourself the IT/Sysadmin version of PCConnection or CDW catalogs (call them, or just order a cheapo router for $20 and they assume you are the head of IT for a major bank and inundate you with catalogs full of switches and servers). There are some good options for tape at the prosumer level now, including some from Iomega, who are the masters at taking pro formats to the consumer level (of course, can anyone remember Zip drives? Jaz? Bernoulli? Orb? yikes.)

You can set up a tape backup system for less than a grand, including a few hundred gigs of tape. Then you have to use it. And store the tapes correctly - just like audio tape.

I swear to god I keep my old cassettes with interviews and live music on them, exercise the tapes, and redigitize them every couple of years in a higher resolution. Some of my Maxell UDXLII90s are 15 years old and still sound pretty good. I don't think I have a single digital storage device about which I can say the same thing, though I do have a working Powerbook 230 from 1993, so one hard drive is getting close!
posted by realcountrymusic at 8:18 AM on June 10, 2005


"Lossless" compression is an oxymoron. It sounds better, but it ain't the same as uncompressed audio.

Er, yes it is, by definition: it is bit-for-bit identical to the uncompressed source. Now the issue of whether you'll be able to de-compress it later is another thing, but if you use an open-source codec or at least a well-documented file format, in the worst case you'll be able to write a decoder yourself.
posted by kindall at 8:45 AM on June 10, 2005


(e.g., silence, I think, though haven't really paid attention to the algorithms used for lossless compression, because the term is an oxymoron on its face)

Kindall is absolutely correct. Prove it to yourself - take a track, compress it with a lossless encoder, then uncompress it and do a bit for bit comparison. It will be perfect!
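The round-trip test is easy to try. A minimal sketch in Python, with the caveat that it uses `zlib` (general-purpose DEFLATE) as a stand-in for a real lossless audio codec and a synthetic 440 Hz tone instead of a real track; the principle being tested, that decompression returns the exact input bytes, is the same:

```python
import math
import struct
import zlib


def make_pcm(num_samples: int = 44_100) -> bytes:
    """Build one second of a 440 Hz tone as 16-bit little-endian mono PCM."""
    samples = (
        int(20_000 * math.sin(2 * math.pi * 440 * n / 44_100))
        for n in range(num_samples)
    )
    return struct.pack(f"<{num_samples}h", *samples)


original = make_pcm()
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# A bit-for-bit comparison is just a byte-string equality check.
print("bit-for-bit identical:", restored == original)
```

With a real codec you would decode back to WAV and compare the audio data (or its checksum) against the source in the same way.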

However, there is an issue of data dependency. In a wave file a single bit error will only affect one sample, but in a lossless file it might affect several samples in a row.

The data dependency is further complicated by the hardware/software reading the storage media. For example, a single bit error might cause the hardware/software to discard an entire block (possibly a lot of data, depending on the details) even though most of the data in that block is still good.
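The dependency problem can be shown with a toy experiment. This sketch again uses `zlib` as a stand-in for a lossless codec (an assumption; real audio codecs frame their data differently), flips one bit in a raw PCM buffer and one bit in the compressed version of the same buffer, and counts how many samples are damaged in each case:

```python
import math
import struct
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def damaged_samples(original: bytes, received: bytes) -> int:
    """Count 16-bit samples that differ between two equal-length PCM buffers."""
    return sum(
        original[i:i + 2] != received[i:i + 2]
        for i in range(0, len(original), 2)
    )


samples = [int(20_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(4096)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Raw PCM: one flipped bit damages exactly one sample.
raw_damage = damaged_samples(pcm, flip_bit(pcm, len(pcm) // 2))

# Compressed: the same kind of error can make the whole buffer undecodable.
packed = zlib.compress(pcm)
try:
    unpacked = zlib.decompress(flip_bit(packed, len(packed) // 2))
    if len(unpacked) == len(pcm):
        packed_damage = damaged_samples(pcm, unpacked)
    else:
        packed_damage = len(samples)
except zlib.error:
    # The decoder rejected the stream, so every sample in it is lost.
    packed_damage = len(samples)

print("samples damaged in raw PCM:   ", raw_damage)
print("samples damaged in compressed:", packed_damage)
```

The point is the asymmetry: the raw buffer degrades gracefully, while the compressed one tends to fail as a unit.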

So the real problem with archival storage is that the requirements are fundamentally different from the assumptions of traditional digital hardware design. Most computer storage assumes that if it isn't bit-perfect it is garbage, so if the ECC can't fix it you trash it.

There must be people out there looking at designing digital storage technologies with a mind to archival purposes...
posted by Chuckles at 9:39 AM on June 10, 2005


However, there is an issue of data dependency. In a wave file a single bit error will only affect one sample, but in a lossless file it might affect several samples in a row.

Ergo my statement that you need all the data points you can preserve, and compressing and decompressing, even lossless, provides yet another point in the cycle where small errors can creep in and ramify until a file is eventually damaged or even unusable. Decompression might be "perfect" 99.9 percent of the time. But I doubt it. Storage is too cheap, and lossless codecs don't save enough space to interest me. Besides, when I listen to "lossless" compressed audio, I could swear it doesn't sound as good as the original. Totally subjective, but I've tried blind comparison and usually get it right as to which is the compressed file. Basically, the idea is to keep your original data intact, and the less you do to alter it, the fewer times you need to convert it or transfer it, etc., the better the odds. We labor often under the mistaken belief that digital copies don't have generational loss. They often do, and tiny bits of damage can have huge consequences -- the errors can compound.

I admit I'm not an expert on the subject of data storage and compression, but the experts I've consulted all say *never* compress archival audio.
posted by realcountrymusic at 10:04 AM on June 10, 2005


Sorry, you are fooling yourself if you think you can tell any difference between PCM and FLAC or any other lossless audio compressor.

Lossless compression uses well-understood algorithms that are mathematically reversible. You don't worry that a zipped doc file will somehow magically change when you unzip it -- and it would literally be magic, i.e. an occurrence outside the laws of the universe as we know it, something like 2+2 suddenly equaling 5 just this once. ACE, FLAC, etc. are the same. Typically they apply some kind of reversible manipulation to the data to make the entropy more sequentially obvious (e.g. encoding the difference between samples rather than the actual sample values, rearranging the high and low bytes of the samples, and similar things), then stuff the result into a Huffman or LZW or arithmetic coder, the same algorithms used in ZIP, RAR, etc.
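To make the "reversible manipulation, then a general-purpose coder" idea concrete, here is a minimal sketch. It is not how any particular codec works: it uses plain sample-to-sample differences (wrapping at 16 bits so the step is exactly invertible) and `zlib`'s DEFLATE as a stand-in for the back-end coder, on a synthetic tone:

```python
import math
import struct
import zlib


def delta_encode(samples):
    """Replace each 16-bit sample with its difference from the previous one (mod 2**16)."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append((sample - previous) & 0xFFFF)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Undo delta_encode with a running sum that wraps at 16 bits."""
    total = 0
    samples = []
    for delta in deltas:
        total = (total + delta) & 0xFFFF
        # Convert the unsigned 16-bit value back to a signed sample.
        samples.append(total - 0x10000 if total >= 0x8000 else total)
    return samples


samples = [int(20_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(44_100)]
raw = struct.pack(f"<{len(samples)}h", *samples)
deltas = delta_encode(samples)
packed = zlib.compress(struct.pack(f"<{len(deltas)}H", *deltas), 9)

decoded = struct.unpack(f"<{len(samples)}H", zlib.decompress(packed))
restored = delta_decode(decoded)

print("round trip exact:  ", restored == samples)
print("raw + zlib bytes:  ", len(zlib.compress(raw, 9)))
print("delta + zlib bytes:", len(packed))
```

Whether the delta step actually helps depends on the signal, so compare the two sizes rather than assuming; the round trip, however, is exact by construction.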

You do run a certain risk of not being able to decode the file completely if you have a media error, but that'd be a problem with an uncompressed file too. A well-designed lossless audio compressor will divide the file into blocks so that a single read error doesn't render the rest of the file unretrievable. In any case, the solution is the same as it is with uncompressed data -- redundant backups.
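Here is a rough sketch of the block idea, assuming a made-up container (a list of checksum/payload pairs) and `zlib` in place of a real audio codec. Each block is compressed independently, so damage to one payload costs only that block:

```python
import math
import struct
import zlib

BLOCK_SIZE = 4096  # bytes of original data per independent block (arbitrary choice)


def pack_blocks(data: bytes):
    """Compress each block on its own and keep a CRC-32 of the original bytes."""
    blocks = []
    for start in range(0, len(data), BLOCK_SIZE):
        chunk = data[start:start + BLOCK_SIZE]
        blocks.append((zlib.crc32(chunk), zlib.compress(chunk)))
    return blocks


def unpack_blocks(blocks):
    """Return (recovered_chunks, lost_block_indexes); bad blocks become None."""
    recovered, lost = [], []
    for index, (checksum, payload) in enumerate(blocks):
        try:
            chunk = zlib.decompress(payload)
            if zlib.crc32(chunk) != checksum:
                raise ValueError("checksum mismatch")
        except (zlib.error, ValueError):
            recovered.append(None)
            lost.append(index)
        else:
            recovered.append(chunk)
    return recovered, lost


count = 10 * BLOCK_SIZE // 2  # number of 16-bit samples, giving exactly 10 blocks
samples = [int(20_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(count)]
data = struct.pack(f"<{count}h", *samples)
blocks = pack_blocks(data)

# Simulate a media error: corrupt one byte in the middle of block 3's payload.
checksum, payload = blocks[3]
damaged = bytearray(payload)
damaged[len(damaged) // 2] ^= 0xFF
blocks[3] = (checksum, bytes(damaged))

recovered, lost = unpack_blocks(blocks)
print("lost blocks:", lost)
print("blocks recovered intact:", sum(chunk is not None for chunk in recovered))
```

The lost block still has to come from somewhere, which is where the redundant backups come in.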
posted by kindall at 10:28 AM on June 10, 2005


Besides, when I listen to "lossless" compressed audio, I could swear it doesn't sound as good as the original.

This just isn't true. You probably aren't doing a properly controlled test. For example, you listen to a CD on your CD player, then you rip it, lossless compress it, then play it back through your sound card... DUH! It will sound different...

Or maybe you play a CD transport through an external DAC and you play the PC back through the same external DAC and you think "well if those don't sound the same I've proved it". Well they won't sound the same. Windows recodes the music during playback - which can be defeated using ASIO.

Even after doing that there may be small issues... It doesn't alter the definition of 'identical', and the files produced by lossless algorithms can be reproduced perfectly.
posted by Chuckles at 10:35 AM on June 10, 2005


You do run a certain risk of not being able to decode the file completely if you have a media error, but that'd be a problem with an uncompressed file too. A well-designed lossless audio compressor will divide the file into blocks so that a single read error doesn't render the rest of the file irretrievable. In any case, the solution is the same as it is with uncompressed data -- redundant backups.

It is my understanding that Audio CDs were designed specifically to 'play through' media errors. So presumably the best archival method would be to store the audio as an Audio CD, not as a wave file. However, I believe you can tell many CD drives to ignore errors, and presumably that would allow the wave file to act much like the Audio CD, and they would each be slightly more robust than the flac file for archival purposes (not commenting on whether it is worth the data overhead).

Of course this is nothing to count on. Another media technology will deal with errors in a different way. Data formats aren't typically designed to play through errors, they are designed to correct errors.

Basically the message is, you really need to understand the technology you are counting on. That, and somebody needs to design archival data storage methods to deal with errors in an archivally optimal way.

For example, think of a photomosaic, or a progressive GIF... If you store the file access information and the macro picture data repeatedly, and then put the micro data in the spaces in between, you have a media that can be damaged very badly but still read. You might not get to see all the little pictures in the photomosaic, but at least you can make out what the whole thing together was supposed to look like...
posted by Chuckles at 11:21 AM on June 10, 2005


I'm a composer, working in various digital mediums for the last 20 years, and find this to be a fascinating discussion. Some random thoughts and observations:

1) Is there a service in existence that will archive digital media for you? If not, what a great idea. Google should take that on as a paying business - I would trust them with my data. Let them maintain the triple-redundancy, climate-controlled, bullet-proof servers on three continents, and I'll just upload my shit. I'd give them, say, $5/year/gigabyte? They could also profit from data redundancy if you let them sniff your packets (a slightly cheaper service?) e.g. they could store only 1 lossless copy of "Blue Suede Shoes" and we could all access it. Think how much overlap there must be in people's mp3 collections.

2) Audio recordings on analog tape deteriorate dramatically over time. I have some tapes I made when I was ten years old in 1973 and they are much noisier now. But you can still listen to them.

3) I have a gigantic library of audio on DAT tape, and I sometimes lie awake at night worrying about it.

4) For archiving projects, we must distinguish between very valuable data and easily replaceable data. The digital pictures of my daughter's first year are irreplaceable. My mp3s of the new Jon Hassell record, beloved as they are, can be re-acquired fairly easily if necessary. Therefore, different archiving strategies for these things may be appropriate.

5) "...in the worst case you'll be able to write a decoder yourself."

ah ha ha ha ha ha ha ha ha ha aaahhhhh ha ha! hoohoohoohooo! rotfl!!
posted by blakeleyh at 1:53 PM on June 10, 2005


3) I have a gigantic library of audio on DAT tape, and I sometimes lie awake at night worrying about it.

Worry. We have no idea how long DAT tape lasts yet, and in my experience (which is extensive because I have used DAT myself as a primary recording medium in my research for a decade now) it holds up well -- my ten year old DATs are ok. Since they are digital, they don't suffer from gradual decay, but one day they may just not work.

A bigger concern, though, is that the format itself is going slowly into oblivion. I have just purchased a redundant backup DAT deck to keep for future use I'm so worried, and a lot of blank DAT tape, which is getting harder to find and more expensive. (It's still easily found, but fewer places carry it and they seem to have given up on selling it to consumers.) Handheld digital solid state recorders are coming on like gangbusters (Edirol's R1 is backordered for months at every major US distributor). Minidisc just went through a huge format upgrade, but is also dying due to screwy limitations on the consumer level machines, but it was replacing DAT in lower cost environments for the last few years. I recommend making digital backups of your DATs soon. If you can get a DAT deck that has an optical/coax/SPDIF out, you can make pristine digital copies.

To my ears, nothing sounds better than DAT (with a good mic and preamp), though I am now using the Marantz PMD-671 flash recorder and am very impressed (it also takes the little microdrives as well as 2GB CF cards).
posted by realcountrymusic at 3:28 PM on June 10, 2005


This just isn't true. You probably aren't doing a properly controlled test. For example, you listen to a CD on your CD player, then you rip it, lossless compress it, then play it back through your sound card... DUH! It will sound different...

Not to be confrontational or anything, but the DUH! is uncalled for. I am effectively an audio professional, certainly experienced enough to know how to set up a controlled A/B blind test. My computer's "sound card," fwiw, is better than most CD decoders (MOTU 828MKII, and Edirol FA66 on my laptop). I'm also a professional musician, or was, and pride myself on my ears. I am aware that spectrographically, lossless and raw audio will look about the same on the FFT. I am reporting only my subjective impression, which has been reinforced by my apparent ability to discriminate accurately (using headphones, of course) between lossless and raw samples of the same audio segment. Or maybe I am a very lucky guesser. I may not be a genius, but I had considered the variables when I formed my opinion. People say you can't hear the difference between 48 and 96 kHz sampling rates either. I hear it, and I know engineers who can hear the difference reliably under blind conditions. Not exactly comparable, but I am humble in the presence of the human auditory system.

Otherwise, some very good points.
posted by realcountrymusic at 3:37 PM on June 10, 2005


my apparent ability to discriminate accurately between lossless and raw samples of the same audio segment.

People say you can't hear the difference between 48 and 96 KHz sampling rates either.

Apples and oranges. RCM, you're really damaging your otherwise interesting contributions to this thread by insisting on this lossless degradation idea. If you're sure there's a difference, something's wrong in the methodology of your A/B comparison.

Or, heck, maybe you've stumbled onto a flaw in the lossless compression algorithm that you used. If it's a mature, well-known algorithm, I doubt it.
posted by intermod at 8:15 PM on June 10, 2005


5) "...in the worst case you'll be able to write a decoder yourself."

ah ha ha ha ha ha ha ha ha ha aaahhhhh ha ha! hoohoohoohooo! rotfl!!

Well, like I said, these lossless algorithms are not that complicated, and on the back-end they use well-understood, efficient algorithms like Huffman and LZW that are never going away precisely because they're well-understood and efficient. (LZW is kind of mind-twisting, and arithmetic coding is more so, but you can treat these as black boxes.) So if in twenty years you can't find a FLAC decoder, and you still lack the skill to write one yourself, go to your nearest college and have a first-year CS student put it together for you over a weekend in exchange for a case of beer.
posted by kindall at 8:45 PM on June 10, 2005


There are some good options for tape at the prosumer level now, including some from IoMega, who are the masters at taking pro formats to the consumer level

Errrrrch! Emergency brake, bro. The next individual who tries to sell me an Iomega storage solution can do so ONLY while simultaneously transferring files from dozens of iffy Jaz drives. When finished with that, he or she can move on to the Zips. I don't care if Iomega is selling water in the Sahara, the ice cubes will require an arcane 12-pin interface.

Add to that my drawer of VideoWriter 400k floppies (which include the system on each disk!), but I'll grind my Magnavox hatchet another day.

Great thread, btw. And, yes, I will forget that you said "prosumer".
posted by squirrel at 12:48 AM on June 11, 2005


AskMetaFilter: a mature, well-known algorithm.

Okay, fine. Delete me.
posted by squirrel at 12:54 AM on June 11, 2005


Errrrrch! Emergency brake, bro. The next individual who tries to sell me an Iomega storage solution can do so ONLY while simultaneously transferring files from dozens of iffy Jaz drives. When finished with that, he or she can move on to the Zips. I don't care if Iomega is selling water in the Sahara, the ice cubes will require an arcane 12-pin interface.

Read on in my paragraph -- I say as much at the end. I'm just a sayin . . . if you want to go low-priced but still want tape, there are options. And what's with this "Rev" format? Proprietary 35GB removable hard drives the size of a small brick and priced at like 3 bucks a gig for cartridges when half the planet now carries around 20GB drives in their pockets and 80GB firewire externals can be had for $100?

Weird.
posted by realcountrymusic at 2:10 PM on June 11, 2005


There must be people out there looking at designing digital storage technologies with a mind to archival purposes...

A Reed-Solomon code is what you're looking for. It appends parity data so that errors can be detected and corrected, and it is widely used, including on CDs and DVDs.
posted by euphorb at 7:41 PM on June 11, 2005


Or the more general Forward Error Correction (FEC). We've been using FEC for years in satellite communications, where bandwidth is very expensive and thus everything gets pushed as close to the edge as we can get it without losing bits. FEC bloats the bandwidth that you use by a small factor, but fixes your bits for you at the receiving end (hence the name).
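
To make "fixes your bits for you at the receiving end" concrete, here is a minimal Python sketch of one of the simplest forward error correcting codes, Hamming(7,4). It is only an illustration: the function names are made up for this example, and the codes used on real satellite links are more elaborate and carry far less overhead than the 7 bits sent per 4 bits of data shown here.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks, read as a binary number, give the 1-based
    # position of the damaged bit (0 means no error was detected).
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                      # one bit is damaged in transit
print(hamming74_decode(received))     # the receiver repairs it on its own
```

The receiver never asks for a retransmission, which is the point of the "forward" in the name. Two flipped bits in the same codeword would defeat this particular code.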

Recently I've been seeing FEC show up in non-satellite applications, like fiber transport of video, which requires the most obscenely high speeds yet virtually perfect transmission. Thus FEC has been coming into the picture (rimshot). Funny thing is, I thought FEC was everywhere already, but found that really it was just confined to my little satellite world. Perhaps it's only recently that processing power has reached the point where it can handle the high bitrates of video, fiber transports, and hard drive transfers.
posted by intermod at 6:41 AM on June 12, 2005


I just spotted this thread from the front-page link on Mefi, and thought I'd chime in a bit.

First, lossless audio is just that, lossless. If you have a stream of bits and compress them, on disk they will be different, but if you run them through the uncompressor, you will get the original bitstream again.
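
A minimal Python sketch of that round trip, assuming the standard-library zlib module as a stand-in for an audio-specific lossless codec (FLAC and friends use different algorithms, but the guarantee is the same). The sample bytes are made up and merely play the role of raw audio.

```python
import zlib

# Made-up bytes standing in for raw PCM audio; any byte string would do.
original = bytes((i * 7 + (i >> 3)) % 256 for i in range(50_000))

compressed = zlib.compress(original, 9)   # what ends up on disk
restored = zlib.decompress(compressed)    # what the "uncompressor" hands back

print("bytes on disk differ from the source:", compressed != original)
print("restored stream is bit-identical:   ", restored == original)
```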

As others have mentioned, compressed audio is likely more susceptible to media damage, as errors have the potential to affect more of the file. In the wrong spot, in fact, a single-bit error could potentially destroy an entire losslessly-compressed file. However, there are methods of dealing with this, too.
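
A rough way to see this for yourself, again assuming zlib as a stand-in for a lossless audio codec (a real audio format may contain the damage differently). The helper names and the test data are invented for the sketch. It flips a single bit in the raw data and then, separately, a single bit at a sample of positions in the compressed data.

```python
import random
import zlib


def try_decompress(blob):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def flip_bit(blob, bit_index):
    """Return a copy of blob with exactly one bit inverted."""
    damaged = bytearray(blob)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


rng = random.Random(0)
original = bytes(rng.choices(b"abcdefgh", k=20_000))   # made-up, compressible data
compressed = zlib.compress(original)

# One flipped bit in the uncompressed data spoils exactly one byte.
raw_damaged = flip_bit(original, 8 * 10_000)
raw_bad_bytes = sum(a != b for a, b in zip(original, raw_damaged))

# One flipped bit in the compressed data, tried at a sample of positions.
positions = range(100, len(compressed) * 8 - 100, 997)
unrecoverable = sum(
    try_decompress(flip_bit(compressed, p)) != original for p in positions
)

print("bytes spoiled by one flip in the raw data:", raw_bad_bytes)
print("single-bit flips in the compressed data that lost the file:",
      unrecoverable, "of", len(positions))
```

zlib carries a checksum, so it refuses to return output it knows is wrong. That is good for detection, but it still leaves you without the file unless you have some way to repair it.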

One very good technology that's in very wide use is the PAR2 file. It breaks up a set of source files into chunks, performs a lengthy series of mathematical calculations, and produces parity files. Each parity file has all the information about all the protected files, and the checksums for each block in them. In addition, most of the parity files will have one or more recovery blocks. If you have a bad block anywhere in the original protected fileset, one recovery block will fix it.

This is used very commonly for posting to USENET, which is a rather lossy and unstable propagation mechanism. Folks post big files of all sorts (many of quite dubious legality :) ) along with a bunch of PAR2s. It's quite common to lose some parts of the original transmission, but with the accompanying PAR2 files, it's easy to fix. The algorithm is incredibly good. If you have enough PAR2 parity blocks, you can recover the entire original files using only parity. Any surviving blocks of the original files will also be used; the algorithm is very good at detecting usable blocks in damaged files.

In other words, if you start with 100 data blocks, after generating parity blocks, then any combination of 100 blocks, from the original data or from the parity data, can reconstruct the original 100 blocks. The more parity data you generate, the more damage your fileset can take. If you generate 100 parity blocks, then you can recreate the original fileset just from parity, without any access to the original files at all. PAR2 is EXTREMELY robust.
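
The "any N blocks out of the pile will do" property can be sketched in a few lines of Python. This is a toy Reed-Solomon-style erasure code, not PAR2's actual implementation: each "block" is a single byte, the arithmetic is done modulo the prime 257 as a simplifying assumption, and all function names are invented for the example. The data bytes are treated as points on a polynomial, and the parity values are simply more points on the same curve.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field


def interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division modulo a prime: multiply by den^(P-2) (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def make_parity(data, parity_count):
    """Extra points on the polynomial that passes through the data values."""
    points = list(enumerate(data))
    k = len(data)
    # Note: a parity value can be 256, so it does not always fit in one byte.
    return [interpolate(points, x) for x in range(k, k + parity_count)]


def recover(surviving, k):
    """Rebuild the k data values from any k surviving {index: value} entries."""
    points = sorted(surviving.items())[:k]
    return [interpolate(points, x) for x in range(k)]


data = list(b"MP3!")                      # 4 data "blocks"
parity = make_parity(data, 4)             # 4 parity "blocks"
blocks = dict(enumerate(data + parity))   # indices 0-3 data, 4-7 parity

# Throw away every original block and rebuild from parity alone.
only_parity = {i: blocks[i] for i in range(4, 8)}
print(bytes(recover(only_parity, 4)))
```

With 4 data blocks and 4 parity blocks, any 4 of the 8 are enough, which is the small-scale version of the 100-and-100 case described above.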

The program I use most for this is QuickPar. I do not know if this is the best or the fastest program out there, but I like it very well.

I'd be willing to wager large, large sums of cash you'd get enormously more robust CDs by extracting, compressing, and adding parity files to fill the freed-up space. Most lossless algorithms compress by about 30%. With a 30% PAR2 rate, the CD would have to be more than 30% unreadable before being unrecoverable. Any less, even 29% unreadable, and you'd get back a bit-perfect copy of your original.
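
The arithmetic behind that, as a small Python sketch. It assumes an idealized disc divided into equal blocks, the roughly 30% compression figure quoted above, and parity that fills all of the freed space; the function name is invented for the example.

```python
def parity_budget(disc_blocks, saving_percent):
    """Split a disc into compressed data plus parity filling the freed space."""
    data_blocks = disc_blocks * (100 - saving_percent) // 100
    parity_blocks = disc_blocks - data_blocks
    return {
        "data_blocks": data_blocks,
        "parity_blocks": parity_blocks,
        # Any `data_blocks` surviving blocks are enough, so this many may be lost:
        "max_lost_share_of_disc": parity_blocks / disc_blocks,
        # The same parity expressed relative to the protected data:
        "parity_relative_to_data": parity_blocks / data_blocks,
    }


budget = parity_budget(disc_blocks=100, saving_percent=30)
for name, value in budget.items():
    print(name, value)
```

Note the two ways of quoting the rate: the parity takes up 30% of the disc, which is about 43% of the size of the compressed data it protects. Either way, the disc survives losing up to 30 of its 100 blocks.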

Personally, I think you archival zealots should be very focused on this kind of error recovery. Errors DO happen, it's guaranteed, and parity files are cheap insurance. And you will never, never get multigenerational error. If you test your data against its PAR2s each time you copy it to new media, you can be absolutely assured you have a bit-perfect copy. You could archive literally forever this way, as long as you never took more damage in any given generational copy than you can fix with the PAR2s you generated.
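
The "test on every copy" habit can be sketched in Python with nothing but checksums. This is not PAR2: the SHA-256 manifest below (function names invented for the example) can only tell you that a file in the new copy is damaged or missing, whereas PAR2 can also repair it. It does show the verification step, though.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full = os.path.join(root, name)
            manifest[os.path.relpath(full, folder)] = sha256_of(full)
    return manifest


def damaged_files(folder, manifest):
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(folder, rel)
        if not os.path.isfile(full) or sha256_of(full) != digest:
            bad.append(rel)
    return sorted(bad)
```

Build the manifest once from a copy you trust, keep it alongside the archive, and run damaged_files against every new generation of media. An empty list means the copy is bit-perfect.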

USENET, btw, is notoriously lossy, just a horrible storage medium, and a 10% PAR2 rate will nearly always allow perfect transmission.
posted by Malor at 11:15 AM on June 19, 2005


PAR/PAR2 use Reed-Solomon codes, I believe. They're quite robust and well-understood by the people who understand those sorts of things (not by me, I blew my mind trying to understand LZW).
posted by kindall at 12:46 PM on June 23, 2005


This thread is closed to new comments.