store the big stuff
June 14, 2008 6:41 PM

Long-term storage of digital media that is too big for DVDs... what's best practice?

I have about 10 hours of VHS tapes that I have converted to AVI files. The shortest one is one hour long - so the size of these things is immense. Altogether I'm looking at about 150gb of media, and I can't break them into pieces any smaller than about 15gb.

What if I really really want these things to be around in 20-30-100 years? What's the best storage strategy? What are my options?
posted by crapples to Computers & Internet (21 answers total) 6 users marked this as a favorite
 
Best answer: There is no economic medium out there to store this data in a format that will be useful in several decades.

Your best practice would be to back up this media in some redundant way (say to two different hard drives kept in two different locations) with the goal of finding some new medium in the near future.

That means you'll repeat your efforts over the years, but as someone who has gone from floppies, to DAT tapes, to CDs, to DVDs... I can say it's a never ending struggle.
posted by wfrgms at 6:55 PM on June 14, 2008


15GB for an hour is pretty insane. I suspect these are uncompressed AVI files (which isn't bad for being able to read in a decade). The best thing to do, though, would be to remaster these files as a proper DVD Video disk. You should be able to get about 2 hours on a single layer DVD, and DVD players will exist (or at least the specification will surely exist) for decades to come, so they'll be readable by something at least. (And DVD video is just MPEG2 and some extra bits for menus, chapters and such).

Barring that as an option, AVI files can be split safely with many tools, and you can then bundle the disks together, and number them so they're played back in order.

The least portable option would be to compress these files with something less common than MPEG2, though preferably still a major standard. This would reduce file sizes, but the format will likely be obsolete by the time a decade rolls by, never mind a century.

And it's important to note that no optical medium today should be trusted blindly to last decades - Sure, they're advertised to last that long, but I'd probably still re-back them up every few years to whatever the modern standard is to ensure reliability and readability down the road.
posted by Rendus at 6:59 PM on June 14, 2008


Sounds like they're uncompressed, or at best losslessly compressed (e.g. HuffYUV). While it does incur something of a loss in quality, lossy compression like one of the MPEG standards (e.g. MPEG-2, MPEG-4) or DV is probably the best bet for long-term storage, primarily because they're a) well-documented genuine international standards which are currently extremely portable and ubiquitous, and b) small enough to fit on media that's cheap and common today.

You should easily get ~1hr on a 4.7GB DVD at bitrates approaching 9Mbps, the highest allowed in the DVD standard. Remember too that normal home VHS only has a vertical resolution of ~240 lines, so for best quality you can get away with encoding at 352x240 and dedicate more bits to encoding all the inevitable VHS noise (which is the real quality-killer in VHS -> digital transfers). At that sort of bitrate and/or frame size, the encode is going to be near-indistinguishable from the original source.
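
The bitrate arithmetic above can be checked with a short sketch. It assumes a constant total bitrate, decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second), and ignores container and filesystem overhead; the function names are made up for illustration.

```python
def size_gb(bitrate_mbps: float, hours: float) -> float:
    """Size in decimal GB of a stream at a constant total bitrate."""
    bits = bitrate_mbps * 1_000_000 * hours * 3600
    return bits / 8 / 1_000_000_000


def bitrate_mbps(size_in_gb: float, hours: float) -> float:
    """Average bitrate in Mbps needed to fill size_in_gb over the given duration."""
    bits = size_in_gb * 1_000_000_000 * 8
    return bits / (hours * 3600) / 1_000_000


print(f"1 hour at 9 Mbps:          {size_gb(9, 1):.2f} GB")
print(f"4.7 GB spread over 1 hour: {bitrate_mbps(4.7, 1):.2f} Mbps")
print(f"15 GB per hour works out to {bitrate_mbps(15, 1):.1f} Mbps")
```

So an hour at 9 Mbps comes in a little under a single-layer disc, while the original 15GB-per-hour files run at several times that rate.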

I'll leave it to others to comment on the actual storage medium to use, except to note that you're not going to find something readily available that is guaranteed to last 100 years (or even 30 or 20). Be prepared to have multiple copies of each, use some decent checksumming/error correcting method, copy the originals to new media every few years, and convert the oldest-accessible format to the then-current encoding format (whilst keeping the previous formats on the new media) as often as necessary. Be prepared to set up an endowment to pay for this, or educate your offspring in the importance of folk & family history ;-)
posted by Pinback at 7:11 PM on June 14, 2008


Best answer: 15GB for an hour is pretty insane.

I don't think that's insane if we're talking long-term storage for archival purposes. I'd argue that if you're talking about keeping stuff around for decades, you'll want to keep the media as high quality (uncompressed is best) as you can. The quality is likely to be significantly lower than what's available decades from now, so I see no need to further reduce it when hard drive space is relatively affordable these days.

I think a couple of hard drives, each in a different location (as recommended above), would be the easiest thing to do. Revisit every few years and upgrade to a more convenient or better storage option as new technology becomes available.

That being said, if you'd like to access the media on a fairly regular basis (let's say more than once or twice a year), having a compressed version on DVD makes a lot of sense.
posted by dhammond at 7:16 PM on June 14, 2008


If they're copied from VHS, MPEG2 compression is not going to lower the quality much at all. Making standard DVD format disks makes sure they play on any DVD player, which will help. If you're serious about saving them for decades, you'll need to have multiple copies and probably a couple of hard drives with the data as well, and have someone check it every couple of years to make sure all the data is still readable.
posted by demiurge at 7:19 PM on June 14, 2008


DVD should never be used for long term storage as the media degrades over time.

Use backup tape.
posted by mattoxic at 7:22 PM on June 14, 2008


What if I really really want these things to be around in 20-30-100 years? What's the best storage strategy?

Forgetting about the specific type of data for a minute, I'd recommend that you plan for the next 5 years or so, then revisit the plan periodically as new storage media appear. And for the next five years, I think that redundant disk storage at multiple physical locations is your best bet. This could be fairly cheap, too, as consumer-grade 1TB disks are available for less than $200 apiece.
posted by me & my monkey at 7:40 PM on June 14, 2008


Response by poster: Thanks a lot for the feedback. To summarize: One way or another, the answer is really to forget about storing anything for a century. Instead, store it redundantly for 3-5 years, then revisit and re-store it in the best way available at that time.

That's quite good advice. I think I'll come up with a plan like that. Thanks.
posted by crapples at 7:43 PM on June 14, 2008


Ahem. I know uncompressed AVIs are typically 13GB/hour (you can get more if you do something crazy like HuffYUV as a codec), but you can do this.

One can purchase MAM-A gold archival DVDs, 4.9 GB, which are rated for 116 years. Last I looked, a spindle of 50 ran me approximately $85. Belkin just came out with 25 GB single-layer Blu-Ray recordable discs, which they are touting for 200 years, at about $25/disc. So, you have the possibility of storage media.

Additionally, there's a skillion utilities to split up a file and then "span" it across various discs. You can later rejoin files. If you are very, very paranoid, you could try computing checksums with a couple of different methods, then storing those, and having two or three copies of each. You could also look at .par2 files for your checksumming needs.
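
As a rough illustration of the split, span, and rejoin idea, here is a minimal Python sketch. It is not any particular utility's method: the piece size, the file naming (`name.000`, `name.001`, ...), and the function names are all assumptions for the example, and it only detects damage; it does not repair it the way .par2 recovery files can.

```python
import hashlib
import tempfile
from pathlib import Path

BUF = 1024 * 1024              # read/write 1 MiB at a time
CHUNK_BYTES = 4_300_000_000    # assumption: leaves headroom on a 4.7 GB disc


def digests(path: Path) -> dict:
    """Hash a file with two independent methods (MD5 and SHA-256)."""
    md5, sha = hashlib.md5(), hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(BUF):
            md5.update(block)
            sha.update(block)
    return {"md5": md5.hexdigest(), "sha256": sha.hexdigest()}


def split_file(src: Path, out_dir: Path, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Write src as numbered pieces of at most chunk_bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        while True:
            remaining = chunk_bytes
            block = f.read(min(BUF, remaining))
            if not block:
                break  # source exhausted
            part = out_dir / f"{src.name}.{len(parts):03d}"
            with part.open("wb") as out:
                while block:
                    out.write(block)
                    remaining -= len(block)
                    if remaining == 0:
                        break  # this piece is full
                    block = f.read(min(BUF, remaining))
            parts.append(part)
    return parts


def join_files(parts: list, dest: Path) -> None:
    """Concatenate the pieces, in name order, back into one file."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            with part.open("rb") as f:
                while block := f.read(BUF):
                    out.write(block)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "tape01.avi"
        src.write_bytes(bytes(range(256)) * 40)  # small stand-in for a real capture
        parts = split_file(src, Path(tmp) / "spans", chunk_bytes=4096)
        rejoined = Path(tmp) / "rejoined.avi"
        join_files(parts, rejoined)
        print(len(parts), "pieces; intact:", digests(src) == digests(rejoined))
```

Storing the digests of the original file alongside the pieces lets you confirm, years later, that the rejoined file is the one you started with.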

This is doable.
posted by adipocere at 9:54 PM on June 14, 2008


I'd use Blu-Ray discs and a burner, being sure to make more than one copy of each, and store them in different locations... and then follow wfrgms's idea of revisiting every 5 years or so.

The cheapness of blank discs more than makes up for their delicacy. Make FIVE copies, for crying out loud.

DVDs and Blu-Ray discs are definitely not forever, but they're far, far more resistant to hot, cold, moisture and vibration than any hard disc or tape.
posted by rokusan at 12:56 AM on June 15, 2008


DVDs and Blu-Ray discs are definitely not forever, but they're far, far more resistant to hot, cold, moisture and vibration than any hard disc or tape.

And far, far easier to damage, less easy to recover when said damage occurs, and will naturally deteriorate over the next few years. If you get a printed disc (an album or movie), you can extend that number by about a factor of two.

The problem with digital archival storage is that the very nature of digital data is such that we are constantly inventing faster, better, cheaper methods to store it. Which means new hardware every decade or so (I don't mean new hard drives--I mean entirely new storage platforms). That means you need to refresh your data every decade. If you have a lot of data, this can take a while.

I used to store everything on floppies and tapes. Then CDs. Then DVDs. When I last transferred my media collection, it was somewhere in the vicinity of 800 DVDs and CDs. Do you have any idea how long that shit took to copy? It took forfuckingever. And naturally, some of the CDs and DVDs had inferior dyes that caused CRC errors, which meant I had to find replacement data (re-ripping audio tracks from borrowed CDs, re-downloading movies, etc.)

Do you think I ever want to go through that hassle again? No fucking way.

That's why I use hard drives for now.

There are four principal benefits of hard drives: they're small (don't have to keep bookshelves' worth of jewel cases or CD wallets), they're fast (backups can be done in a matter of hours as opposed to weeks), they're cheap (backups can be done inexpensively) and plentiful (the technology will be around for at least another decade). They are also far more reliable (when you take RAID into account).

Right now I've got approximately 12 TB of storage accessible at my fingertips. An hour ago, one of my hard drives failed. My RAID setup automatically detected the failure, alerted me, and began rebuilding the data to a RAID swap drive. An hour later, the partition is rebuilt and the drives are humming along like the problem never happened--and all this was handled transparently. Show me that kind of reliability with CDs or DVDs.

Now when my friends want to borrow some music, they can just bring over a portable harddrive or a USB memory stick and I can dump several gigs of data in seconds. With DVDs and CDs, it took me longer just to find the discs I was looking for.
posted by Civil_Disobedient at 4:09 AM on June 15, 2008


They are also far more reliable (when you take RAID into account)

This is true, but it does tempt you to take it into account for more than it's actually good for. 12TB is quite a lot of eggs to have in one basket, and RAID sets do fail; and when they fail, it's often expensive or impractical to recover their contents. A RAID array with N redundant drives that suffers N+1 drive failures tends to damage data in much the same way as a single drive with one failed head - the damage is distributed through the data like the holes in a Swiss cheese.

If you really, really don't want to go through that hassle again, you need to be periodically backing up that 12TB RAID set to another identical set that's normally stored powered down, preferably in a different building.
posted by flabdablet at 5:17 AM on June 15, 2008


One way or another, the answer is really to forget about storing anything for a century. Instead, store it redundantly for 3-5 years, then revisit and re-store it in the best way available at that time.

That's exactly right. You'll probably find that "the best way available" consists of making an exact bit-for-bit copy of your existing archive on some new medium (it will likely take up only a small fraction of the new medium's capacity) and then doing any transcoding you need to do in order to update away from file formats whose support appears to be on the wane. Keep the programs and scripts you use for the transcoding process in your archive along with the data.
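
One way to confirm that a migrated archive really is a bit-for-bit copy is to compare every file's contents after copying. The sketch below is only an illustration: the directory names are hypothetical, and it uses Python's standard `filecmp.cmp` with `shallow=False` so that file contents are compared, not just sizes and timestamps.

```python
import filecmp
import tempfile
from pathlib import Path


def verify_copy(original: Path, copy: Path) -> list:
    """Return a list of files that are missing from, or differ in, the copy."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = copy / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif not filecmp.cmp(src, dst, shallow=False):  # compare actual bytes
            problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_drive"   # hypothetical source archive
        new = Path(tmp) / "new_drive"   # hypothetical freshly copied archive
        old.mkdir()
        new.mkdir()
        (old / "tape01.avi").write_bytes(b"frame data")
        (new / "tape01.avi").write_bytes(b"frame data")
        print(verify_copy(old, new) or "copy is bit-for-bit identical")
```

An empty list means every file in the old archive exists in the new one with identical contents; it does not flag extra files that exist only in the copy.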

For my money, the best media currently available for this kind of approach are, as civil_disobedient suggests, hard disk drives. Their data transfer rate is better than any other current medium, and if your strategy involves periodic copying of a large archive, that's important. They're also the cheapest storage currently available per gigabyte, and their unit capacities are conveniently large.
posted by flabdablet at 5:36 AM on June 15, 2008


This might be a bit out of place but I find Amazon's S3 service surprisingly cheap. No idea for how long they'll be around and how feasible it is to back it up thru the internet to Amazon's servers but my instincts tell me that they would be a lot more reliable than anything a single user can do.
posted by the_dude at 9:34 AM on June 15, 2008


S3 charges $0.15 per gigabyte per month, so archiving 150 gigabytes with them would cost $270 per year. You should be able to buy three 320GB hard drives for that, and those should last at least ten years if they're spun up once a month or so.
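
For anyone who wants to rerun that sum with different numbers, here is a small sketch. It uses only the figures quoted above (the 2008 rate of $0.15 per gigabyte per month and a 150 GB archive), covers storage only, and leaves out the transfer charges mentioned below; the function name is made up for the example.

```python
S3_USD_PER_GB_MONTH = 0.15   # 2008 storage rate quoted in the thread
ARCHIVE_GB = 150             # size of the archive in question


def s3_storage_cost(gb: float, years: float, rate: float = S3_USD_PER_GB_MONTH) -> float:
    """Storage-only cost in USD; transfer fees are not included."""
    return gb * rate * 12 * years


yearly = s3_storage_cost(ARCHIVE_GB, 1)
print(f"S3, 1 year:   ${yearly:,.2f}")
print(f"S3, 10 years: ${s3_storage_cost(ARCHIVE_GB, 10):,.2f}")
# Three drives for one year's S3 bill implies this budget per drive:
print(f"Implied budget per drive: ${yearly / 3:,.2f}")
```

The recurring charge is the point: the drives are bought once, while the monthly fee keeps accumulating for as long as the archive exists.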

Actually getting data on and off S3 is also much, much slower than hard drives, and involves additional expense.

S3 (via something like Jungle Disk) is fantastic for storing the kind of stuff you'd normally back up to a USB flash drive, but it's not a great choice for archiving raw video.
posted by flabdablet at 10:52 PM on June 15, 2008


12TB is quite a lot of eggs to have in one basket, and RAID sets do fail;

Actually, the 12TB is distributed between three subsets of RAID-5 arrays, but yes, you're right about avoiding the "multiple eggs, single basket" problem--a well-placed fire or flood would make quick work of my server, redundant drives or no.

But that goes for any archival method. And--as you point out--the benefit of using hard drives is that they have orders of magnitude more bandwidth to transfer data than any other (consumer) storage medium. So when you do decide to back up dozens of terabytes of data, you can do it in hours instead of weeks (or months) and it won't cost you a fortune.

and their unit capacities are conveniently large

That's the other huge selling point for hard drives: fantastic data density. It'll take you approximately 100 DVDs to match a 500 GB hard drive in storage capacity. That's either 3 feet of jewel cases stacked, or one large DVD "wallet" (~2"x12"x12")--keep in mind, if you use the "wallet" you lose some reliability points due to increased micro-abrasion. Also keep in mind that, unless you've got 100 DVD drives all connected to your computer, you don't have instant access to the contents of those files without additional effort.

Now, if you want to copy those 100 DVDs... figure with error-checking turned on, that's about 15 minutes per DVD--more than a full day (non-stop) baby-sitting your computer, swapping disks, etc. No thanks!
posted by Civil_Disobedient at 9:54 AM on June 16, 2008


DV is roughly 12 or 13 gigabytes per hour, & that's what this likely is. Actually, emitting the video out to DV tape is not a bad idea - not for primary backup, but just in case. It would be compact.

150 gigabytes is only roughly 30 DVDs, split up & spanned. That's about 7 or 8 hours of burning. Setting up error correction can take longer, but that can be done unattended.

I've made relatively high bitrate xvid encodings (I followed the Google Video recommendations, originally 2 meg/sec I think but less with denoising) & put them up on S3 for some peace of mind. At my encoding rates, I think that would be 6 or 7 gigabytes, so, a couple of bucks a month.

But, yeah, yay for par2 & for checking on stuff every few years.

Civil Disobedient: If you leave out jewel boxes & just use spindles, a hundred CDs/DVDs would be a stack only a few inches tall. Offline storage is also less vulnerable to power surges, virus attack, & disgruntled employees.
posted by Pronoiac at 11:02 AM on June 16, 2008


If you leave out jewel boxes & just use spindles, a hundred CDs/DVDs would be a stack only a few inches tall.

Sure. It would also be...
  1. Completely inaccessible ("Oh, you wanted to watch Office Space? I think that was somewhere near the middle of the stack...")
  2. Begging for data loss due to scratches--unless you never touch them, in which case, see point #1.
  3. Slow as molasses if you're burning them for archival purposes ("No, Virginia, you can't burn at 16x if you want to ensure a stable burn.")
  4. Slow as molasses if you've turned on data verification for the burn (only an idiot wouldn't)
  5. Can't hold files larger than 4.7GB unless you split them up. (Once again, I direct your attention to point #1)
Hey, if your time isn't worth anything to you, maybe you don't mind spending 25 hours to back up a single hard drive to DVD, only to risk losing data because you didn't have enough shelf space for your jewel cases and instead opted to stack them up like Belgian Waffles on a spindle.

Maybe you're more cutting edge, and want to use Blu-ray discs? Cool beans. That'll be $500 for the drive, and $300 for the 10-pack of 50GB Blu-ray discs. Or you could get a hard drive, which has better data access, faster data access, and safer data access for... $79.
posted by Civil_Disobedient at 3:58 AM on June 21, 2008


Are you taking this personally?

I'm looking at this as offline backup, not even offline storage - as a fallback when something fails, not a movie collection.

In the original post, the question is about 150 gigabytes of data. Realize that setting up a RAID with hot swap is overkill. This is a two or three inch stack of DVDs. You're looking at your own solution, eight times larger - requiring multiple hard drives, & it's in constant use. That's awesome - really - but a completely different problem.

Personally, I mistrust hard drives a bit since a friend with RAID hooked up two drives per controller, then a controller went out, taking two drives with it, leaving a large broken RAID full of unreadable gibberish. Single hard drives can get flaky or just die. Even with RAID, well, I already mentioned some problems in my previous comment.

The error detection I was talking about wasn't the burner's re-reading with different laser power levels, by the way - is that what you meant by data verification? I use par2 files & md5 checksums, I test the burns, & I keep the par2 & md5 files on a hard drive so I can search quickly for a given file. This makes testing with another DVD drive easier. par2 files can even scan an image of the DVD if the table of contents becomes inaccessible, & the directory's illegible.
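
The md5 half of that routine can be sketched in a few lines of Python. This is an illustration only, not the poster's actual tooling: it writes a manifest in the common `md5sum` layout (hex digest, two spaces, relative path) and re-checks a folder against it later. The function names are assumptions, and unlike par2 it can only detect damage, not repair it.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, buf: int = 1024 * 1024) -> str:
    """MD5 of a file, read in blocks so large videos don't fill memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(buf):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest: Path) -> None:
    """Record '<md5>  <relative path>' for every file under folder."""
    lines = []
    for p in sorted(folder.rglob("*")):
        if p.is_file() and p.resolve() != manifest.resolve():
            lines.append(f"{md5_of(p)}  {p.relative_to(folder).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def check_manifest(folder: Path, manifest: Path) -> list:
    """Return the names of files that are missing or no longer match."""
    bad = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or md5_of(target) != expected:
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disc = Path(tmp) / "disc01"            # stand-in for a mounted DVD
        disc.mkdir()
        (disc / "tape01.avi.000").write_bytes(b"first piece")
        manifest = Path(tmp) / "disc01.md5"    # kept on the hard drive, not the disc
        write_manifest(disc, manifest)
        print(check_manifest(disc, manifest) or "all files match")
```

Keeping the manifest on a hard drive, as described above, also makes it easy to search for which disc holds a given file.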

Long-term, DVD currently looks more accessible than Blu-ray, which is comparatively unproven & far less numerous - no friends of mine have a Blu-ray burner yet.

Eh, whether the poster uses an extra hard drive or a stack of DVDs, as long as there's a backup, that's a victory. Fight entropy!
posted by Pronoiac at 4:07 PM on June 21, 2008


I'm looking at this as offline backup, not even offline storage - as a fallback when something fails, not a movie collection

I think c_d's point is that hard disk prices are now so low that the purchase price of the backup media is no longer the dominant cost to think about when you're looking for backup solutions.

The $79, 500GB hard disk that c_d linked to works out to 16 cents per gigabyte. By way of comparison, a 100 pack of 4.7GB DVD+R from the same supplier is $25, or 5.3 cents per gigabyte.

Although the hard disk is three times as costly per gigabyte as the stack of DVDs, we're only talking $54 in total. The hard disk will be much faster and much more convenient to transfer backup data onto, and this might actually make the difference between doing backups and not doing them.

On the other hand, a single failure in your DVD set will lose you at most 4.7GB of data. A single failure in your hard drive could lose you the lot.

On the other other hand, hard drive reliability is streets ahead of optical disc reliability.

Personally, I use two different hard drives and alternate my backup sessions between them.
posted by flabdablet at 10:19 PM on June 21, 2008


In the original post, the question is about 150 gigabytes of data.

The original post was asking about long-term storage. What starts out as 150 GB soon becomes 200, then 300... As someone who's spent weeks archiving data off of CDs and DVDs onto hard drives, I'd like to save anyone else the hassle in the future. There is simply no comparison between having to nurse your computer for a week and being able to click a button and get a perfect backup in a couple of hours without any further user intervention.

The other reason I recommend hard drives is because of this statement:

Altogether I'm looking at about 150gb of media, and I can't break them into pieces any smaller than about 15gb.

That means you have to RAR the AVI files, split them up, and generate PARs (tack on another bunch of hours). And if you ever want to actually use the data, like to watch one of those movies... you have to spend another bunch of hours copying the DVDs over to hard drive, then decompressing them.

It just seems silly to involve the extra steps when all you're going to do is put it on a hard drive in the end anyway. Silly, orders of magnitude more effort, and you're not saving anything--not money, not security, and obviously not time.
posted by Civil_Disobedient at 4:53 PM on June 24, 2008

