What's the best PHYSICAL way to preserve digital files for 50 years?
September 3, 2018 11:17 AM

Something I read once was about writing a letter to yourself and leaving it in an obscure library book for a few decades. I'd like to do that digitally. Like a personal time capsule. Load up 50 gigs worth of my data onto a flash drive and bury it. Only, IS it a flash drive? And how do you protect it? What's the best way to do this?
posted by rileyray3000 to Technology (25 answers total) 16 users marked this as a favorite
Library person here. Microfilm, hands-down. But don't take my word for it, see what NARA has to say.
posted by rachelpapers at 11:20 AM on September 3, 2018 [5 favorites]

Another library and IS person here. And yep.

Depends on the data (Images, texts, software...?)
Depends on what you mean by "preserve" (Do you want to be able to just view it? Edit it? Interact with it?)
Depends on what you mean by "protect" (From the elements? From societal collapse? Technical obsolescence?)

Not trying to be a pain in the a$$ but these are key questions. Digital preservation is increasingly referred to as "digital stewardship" in many circles -- you're going to have to pay attention to technical change and research needs and migrate formats and storage conditions as needed. It's a complicated space with no super clear answers.

It's not a "one and done" deal (as preservation microfilm might be - that stuff, properly stored, can last centuries).
posted by pantarei70 at 11:34 AM on September 3, 2018 [5 favorites]

The fact is, you can never predict where technology is going to go - so it's often a case of migrating to newer types of storage as time passes. On top of which, even if there were no problems using the flash drive, the file types (.txt .doc .png .jpg .mp3 .pdf etc) may not be readable in 50 years' time. So preserving and burying is really not much of an option when it comes to digital materials. You need to actually manage the material and make sure that it doesn't become corrupted over time. There are file formats that are more stable than others (pdf, for instance) but that doesn't mean you might not encounter problems with it in 50 years when you try to open it again.
posted by acidnova at 11:54 AM on September 3, 2018

The library thing is of course also goofy. Even if nobody checked out the book, there’s no guarantee they’ll still own it (and that’s assuming the library still exists).

The best things to bury and dig up later are things, not bits; bits require regular maintenance that stuff does not.

So microfilm is good because it’s a thing, not bits. Certain video and audio tape/film formats may also be attractive for that reason, assuming you prepare your chest properly and bury it deep enough.

The historical solution is to find things that carry meaning: locks of hair, ticket stubs, maybe a handprint or toy or whatever else people stuffed into time capsules.

And actually, an iPhone or a flash card is a pretty good thing in that regard: don’t expect it to be useful in 50 years as anything other than a keepsake that helps you remember your life at present, because that’s basically all you can count on.
posted by SaltySalticid at 12:08 PM on September 3, 2018 [1 favorite]

the file types (.txt .doc .png .jpg .mp3 .pdf etc) may not be readable in 50 years' time
With the exception of PDF and DOC, that's not really a concern.

TXT: also called plain-text files; this is the simplest file format available for storing written text. If Forensics2068 can't read it, your bigger concern is the fact that civilization no longer exists.

DOC: While the original DOC format was a kludgy hack, and a person with no knowledge of it would have trouble, it's been ten years since Microsoft replaced it with the XML-based DOCX format; as long as the zip algorithm is still around (it's been around since 1989), Forensics2068 should have no trouble with DOCX files.
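To make the "it's just ZIP plus XML" point concrete, here is a minimal Python sketch using only the standard library. It assumes the usual WordprocessingML layout, where body text lives in `word/document.xml` as `w:t` runs inside `w:p` paragraphs, and it ignores formatting, headers and footnotes. `docx_text` is a hypothetical helper, and the in-memory sample stands in for a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source) -> str:
    """Return the plain text of a .docx, one line per paragraph (w:p)."""
    with zipfile.ZipFile(source) as archive:  # a .docx is an ordinary ZIP archive
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W + "p"):
        # Concatenate every text run (w:t) inside this paragraph
        lines.append("".join(node.text or "" for node in paragraph.iter(W + "t")))
    return "\n".join(lines)


# Hypothetical minimal document built in memory so the example runs without a real file
SAMPLE_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Dear future me,</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>hello from 2018.</w:t></w:r></w:p></w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)
buffer.seek(0)

print(docx_text(buffer))
```

A real document has more parts (styles, relationships, media), but the text itself survives as long as someone can unzip a file and read XML.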

PNG, JPG, MP3: these files are so popular that it's easy to imagine some version of them still being in wide use in 2068. They're also all well-documented. There are documents, written in plain English, describing these formats; a savvy computer programmer could write a basic viewer from scratch if necessary. Again, as long as civilization is around, you'll be fine. MP3 is even patented, so theoretically, all of the information you need for it is in the US Patent Office.
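As an illustration of how far the published spec gets you, here is a minimal Python sketch that builds a tiny greyscale PNG and reads its header back using only the chunk layout from the PNG specification (an 8-byte signature, then chunks of length, type, data and CRC, with IHDR first). It decodes the header only, not the pixels; `make_png` and `read_png_header` are hypothetical helper names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, payload: bytes) -> bytes:
    """Serialise one PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def make_png(width: int, height: int) -> bytes:
    """Build a tiny all-black 8-bit greyscale PNG (hypothetical test image)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter type 0 ("None"), then one byte per pixel.
    raw = b"".join(b"\x00" + b"\x00" * width for _ in range(height))
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def read_png_header(data: bytes) -> dict:
    """Check the signature and first chunk, then decode the IHDR fields."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, kind = struct.unpack(">I4s", data[8:16])
    if kind != b"IHDR" or length != 13:
        raise ValueError("first chunk must be a 13-byte IHDR")
    payload = data[16:29]
    (stored_crc,) = struct.unpack(">I", data[29:33])
    if zlib.crc32(kind + payload) & 0xFFFFFFFF != stored_crc:
        raise ValueError("IHDR checksum mismatch: the file is damaged")
    width, height, bit_depth, colour_type, _, _, _ = struct.unpack(">IIBBBBB", payload)
    return {"width": width, "height": height, "bit_depth": bit_depth, "colour_type": colour_type}


print(read_png_header(make_png(3, 2)))
# {'width': 3, 'height': 2, 'bit_depth': 8, 'colour_type': 0}
```

The per-chunk CRC is a side benefit for archiving: a reader can at least tell that a chunk has been damaged.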

PDF: I'd never store a document in PDF long-term. While there will undoubtedly be PDF readers available in 2068, it's intended for storing enough information about a document to print it, not to reread or edit it. You'd be better off saving the pages as individual PNG or JPG files.
posted by Hatashran at 12:36 PM on September 3, 2018 [8 favorites]

How about piggybacking on someone else's careful digital stewardship? In that case, you're betting that something like Project Gutenberg or archive.org is still around in 50 years, and you have to figure out how to make whatever-your-file-is into something worth preserving.
posted by Mogur at 12:39 PM on September 3, 2018 [1 favorite]

You should totally print out this question (using unfadeable inks on archive quality paper, I guess) and include it in your time capsule. Then probably laugh in 50 years' time about how naive we all are about how things will turn out.
posted by rd45 at 12:41 PM on September 3, 2018 [18 favorites]

You might be interested in the Rosetta Project, which basically created a version of microfilm on a disk of nickel for maximum longevity.
posted by ejs at 1:12 PM on September 3, 2018

Microfilm is great for images of paper, but how are you going to store digital data there? 50GB is enormous. Is there some robust image encoding people use for digital data?

The way I'd do this in practice is to store the data "in the cloud". Right now I'd say it's a tossup between Gmail (consumer friendly, but 50GB is a stretch) and Amazon S3. I would expect to migrate the data files every ~10 years to whatever the new cloud storage option is, so it's not a store-and-forget solution. I might store copies in several places at once for extra redundancy. The company Forever advertises 100+ years of archival storage; I'd put them in the mix.

For physical media with a once-and-forget deployment, my guess was magnetic tape. But casual searching suggests magtape has a lifespan of 10-30 years, not enough. Same with CD-ROMs and DVD-ROMs. M-DISC claims their media will last for 1000 years, but that's a pretty obscure product.

I'd break this down into three questions.
  1. Which storage media can last 50 years?
  2. Will I have equipment to read that media in 50 years?
  3. Will I have software to interpret that media?
Point 1 is solvable but awkward. I'm more optimistic on point 3 than many, I think the common digital video and audio formats will still be decodeable in 50 years. It's question 2 that worries me, although in part that depends on how hard your future reader is willing to work.
posted by Nelson at 1:50 PM on September 3, 2018

There are file formats that are more stable than others (pdf, for instance)

The file format may be kind of stable, but the file data most certainly will not be over the course of half a century when stored on a volatile medium like a USB stick or DVD. And the more complex a file is, the more devastating a few flipped bytes will be. Plain text will still be comprehensible even if some words are mangled; a .doc, .pdf or .jpg can become totally invalid and unreadable if bit rot hits the wrong bytes.
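A minimal Python sketch of that difference, assuming a single flipped bit as the model of bit rot and using zlib compression as a stand-in for a "complex" container format (real .doc, .pdf and .jpg files differ in detail, but many use compressed streams internally). The sample text and the position of the flip are arbitrary.

```python
import zlib


def flip_bit(data: bytes, index: int, bit: int = 0) -> bytes:
    """Return a copy of `data` with a single bit flipped, simulating bit rot."""
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)


def survives(original: bytes, packed: bytes) -> bool:
    """True if `packed` still decompresses to exactly `original`."""
    try:
        return zlib.decompress(packed) == original
    except zlib.error:  # corrupt stream or failed built-in checksum
        return False


text = ("Dear future me, this is a letter written in 2018. " * 20).encode("ascii")

# Plain text: one flipped bit changes exactly one character ("Dear" becomes "Eear").
damaged_text = flip_bit(text, 100)
print(sum(a != b for a, b in zip(text, damaged_text)))  # 1
print(damaged_text[100:120].decode("ascii"))  # Eear future me, this

# Compressed copy of the same text: flip one bit in the middle of the stream.
packed = zlib.compress(text)
print(survives(text, packed))  # True
print(survives(text, flip_bit(packed, len(packed) // 2)))
```

The damaged plain text is still a readable letter; the damaged compressed copy typically fails to decode at all, because every later byte depends on the earlier ones.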
posted by Stoneshop at 1:56 PM on September 3, 2018 [2 favorites]

I can't find a link, but some of the best film restorations came from what seemed a primitive approach: three-color separations were printed to paper (for copyright reasons, I think; early film was outright explosive) and forgotten in vaults. Yet they were reconstructed at higher quality than almost anything else.

Another thought:
Get a bunch of old smaller hard drives (smaller being generally more robust), test them well, and heavily duplicate for redundancy. Build a solar-powered archive that on a 2-3 year schedule spins up the drives for a few minutes and then locks the arm. Add a few redundant computers that have a variety of data interfaces and simple software to read the disks.

Another bad thought: give it all to Google.
posted by sammyo at 2:20 PM on September 3, 2018

Microfilm is great for images of paper, but how are you going to store digital data there? 50GB is enormous.

At some point in a discussion elsewhere on backup formats and media, someone pondered the volume required to transcribe 1.4PB onto vellum. I couldn't help there, but stored on paper tape it would require 11.5 Albert Halls; one cubic meter would hold close to 1.5 gigabytes. And, the volume aside (and partially because of it), paper tape is actually quite robust and (when stored well) would certainly be OK after 50 years; we've read tapes that were at least 40 years old and, as far as we know, not stored under archival conditions, without problems. And reading it requires nothing more than whatever passes for a camera in 2068, plus some image recognition software.
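Taking the figures above at face value (about 1.5 GB per cubic metre of paper tape, and 11.5 Albert Halls for 1.4 PB), the arithmetic for the original 50 GB question fits in a few lines of Python. The density is the commenter's estimate, not a measured value, and the "implied volume per Albert Hall" is simply what those two numbers together imply.

```python
# Figures quoted in the comment above (estimates, decimal units)
GB_PER_CUBIC_METRE = 1.5
ALBERT_HALLS_FOR_1_4_PB = 11.5


def paper_tape_volume_m3(gigabytes: float) -> float:
    """Volume of paper tape, in cubic metres, needed for `gigabytes` of data."""
    return gigabytes / GB_PER_CUBIC_METRE


archive_m3 = paper_tape_volume_m3(50)       # the 50 GB time capsule
petabytes_m3 = paper_tape_volume_m3(1.4e6)  # 1.4 PB = 1.4 million GB

print(f"50 GB  -> {archive_m3:.1f} m³")      # 33.3 m³
print(f"1.4 PB -> {petabytes_m3:,.0f} m³")   # 933,333 m³
print(f"implied volume per Albert Hall: {petabytes_m3 / ALBERT_HALLS_FOR_1_4_PB:,.0f} m³")
```

So even the 50 GB capsule would be roughly a room's worth of tape, which is why paper tape only makes sense for small, carefully chosen subsets.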
posted by Stoneshop at 2:32 PM on September 3, 2018

I forgot about paper tape! And I still remember the proper way to wind it between thumb and forefinger from when I was in the army! Yes, that would definitely work, and building a feeder becomes a simple mechanical task, right up until you get to the read head (which, as was pointed out, would be whatever camera you have that still works).
posted by Mogur at 3:41 PM on September 3, 2018

Whatever you do, write down what you did, how to access it, when you last touched it, etc. -- and do so every month (perhaps when you pay bills, in the same place you track bill paying?) so you never lose track of it.
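One way to follow that advice mechanically is a small manifest file kept next to the data, recording what is there, how to open it and when it was last checked. A minimal Python sketch, assuming a folder of files on disk; the `MANIFEST.json` name, the `write_manifest` helper and the access note are all hypothetical.

```python
import datetime
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files never need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, how_to_access: str) -> Path:
    """Record what is in `folder`, how to read it, and when it was last checked."""
    entries = [
        {"file": str(p.relative_to(folder)), "bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p.name != "MANIFEST.json"
    ]
    manifest = {
        "last_checked": datetime.date.today().isoformat(),
        "how_to_access": how_to_access,
        "files": entries,
    }
    target = folder / "MANIFEST.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target


# Demo in a throwaway folder standing in for the real archive
with tempfile.TemporaryDirectory() as tmp:
    capsule = Path(tmp)
    (capsule / "journal.txt").write_text("Dear future me...", encoding="utf-8")
    print(write_manifest(capsule, "Hypothetical note: plain .txt files, open in any text editor").read_text())
```

Re-running it at each check-in updates the date, and comparing the recorded hashes with fresh ones shows whether anything has changed.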

I say this because I composed a song as a pre-teen, my first composition, in an app called Music Composer. Decades later, I found a copy of the composition in the long-dead Music Composer format, and asked around on the internet until I found someone who was willing to convert it to a midi file, and they did...but I have, since then, forgotten where I put the midi file.

However, reading this AskMe triggered my memory not only of the midi file and of the composition's existence in the first place, but also of the actual composition, so I was able to re-hum all the parts into GarageBand just now, so there's that -- the brain is quite a storage device!
posted by davejay at 3:57 PM on September 3, 2018 [5 favorites]

Depending on the experience I was going for, I don't think sending yourself an email via Gmail would be a terrible way to gamble on the longevity of some information. Compared with something physical and not well maintained, it would be a no-brainer for a 10-15 year leap into the future. Barring the apocalypse, I'd bet on a JPG in one of my various cloud services easily making it into the 22nd century, with or without my intent, as the case may be...

I transfer a lot of video tape for people and tell them the best way to make sure it's there for future family members is to simply get those MP4s onto as many other computers as possible. Old folks still want to get their hands on DVDs, but I don't know that there's going to be a simple way for grandkids to access those discs in 30 years. We're already losing magnetic media at a huge rate, and I would guess historians will look back on the video tape era as similar to the silent/nitrate film years. A lot of Boomers still have boxes of VHS tapes in their garages or attics that will NOT be playable, or will be so degraded as to not be worth the effort, by the time they make it to the next generation. 8mm film, sure, but '80s Betamax? Probably not, at least not for normal folks.
posted by Nosmot at 5:09 PM on September 3, 2018

I always recommend revisiting an archive every five years and translating older versions to newer versions. That applies to the hardware and to the software. If you decided 20 years ago that archiving data files to a Zip drive was the way to go, look where you would be now. And even the standards change over time. I recently had a very difficult time dealing with .DOC files that were created just 15 years ago.
posted by megatherium at 5:19 PM on September 3, 2018 [1 favorite]

This is something that's talked about in every library school. Basically with digital files if your goal is to maximize the chances of content still being accessible, the best bet is to store a bunch of copies of your digital content on all your computers, on a bunch of cloud services, on flash drives, in all your email accounts, etc. On top of that, migrate the data to new file types as they come out and save THOSE everywhere you can think of too.
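Many copies also let you repair damage, not just survive it. A minimal Python sketch of one simple (and simplified) approach: byte-by-byte majority voting across at least three equal-length copies. It assumes the damage is scattered so that most copies agree at every position; in practice you would more likely compare checksums and replace a bad copy with a good one. `majority_repair` and the copy names are hypothetical.

```python
from collections import Counter


def majority_repair(copies: list[bytes]) -> bytes:
    """Rebuild data byte by byte by majority vote across equal-length copies.

    Needs at least three copies; a position is only repaired correctly
    if most copies still agree on it.
    """
    if len(copies) < 3:
        raise ValueError("need at least three copies to out-vote a bad one")
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("copies must all be the same length")
    return bytes(Counter(column).most_common(1)[0][0] for column in zip(*copies))


original = b"journal entry, 3 September 2018"
laptop = bytearray(original)
laptop[0] ^= 0x01        # bit rot on one machine
usb_stick = bytearray(original)
usb_stick[10] ^= 0x80    # different damage on another copy
cloud = bytes(original)  # this copy happens to be intact

repaired = majority_repair([bytes(laptop), bytes(usb_stick), cloud])
print(repaired == original)  # True
```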
posted by rabbitrabbit at 8:18 PM on September 3, 2018

So the original idea was to take my journal, all the creative documents, pictures, voice messages and videos of mine I'd created at this point in my life and put them on a USB or some kind of industrial hard disk (if such a thing exists). As much as microfilm would be great for some of those things, it wouldn't be for others.

But as much as I'd like to put the entirety of my personal mementos on Google, I'm already wary they get too much of my data without having my journal up there too.

I mean it's crazy to think a laptop, preserved, would be able to boot up again in 50 years right?
posted by rileyray3000 at 9:54 PM on September 3, 2018

I mean it's crazy to think a laptop, preserved, would be able to boot up again in 50 years right?

posted by aubilenon at 11:18 PM on September 3, 2018 [4 favorites]

Any attempt at long-term preservation of digital information in static physical form is working against the grain of digital information. The entire point of information being digital is that it becomes replicable without loss; therefore, the way to avoid losing it is to replicate it.

Physical media do not contain digital information. Physical media contain structures that can potentially be interpreted as digital information. And the higher the density at which digital information is packed into any given physical medium, and the more time elapses between writing the information into the medium and any subsequent attempt to read it back out, the more susceptible those structures become to misinterpretation.

Misinterpretation can happen in all kinds of different ways. In order for a physical structure to be interpreted as a digit, it needs to differ in some way from other similar structures nearby. Perhaps it's got more electrons stored in it, or perhaps it's magnetised differently, or perhaps it's a different thickness, or perhaps it's chemically distinct, or perhaps it's more or less transparent to some wavelength or other of light. It doesn't really matter what physical property is chosen; the iron rule is that the higher the information density, the physically smaller the structures representing its digits will need to be, and the more susceptible they will become to random thermal degradation over time.

Then again, we can simply forget how to interpret any given medium. Media get continuously superseded by denser media over time, and the machinery required to pull information out of long-obsolete media can simply become unavailable as old machinery wears out and replacements are no longer being manufactured.

And even if we remember how to interpret the physical media as digits, there are endless ways to forget how to interpret the digits themselves as information that means anything at all to a human being.

Our entire digital information ecosystem is now built on physical media that are designed to last for years, not centuries. Because as it turns out, manufacturing immortal physical media is way more expensive, at least up front, than making endlessly perfectly replicable digital information immortal. And as a society we're really not very good at choosing to incur heavy expenditure up front even for causes way less frivolous than the personal handing-on of 50GB of idiosyncratically curated data.

a USB or some kind of industrial hard disk (if such a thing exists)

Sure they exist. Countless millions of them exist. Are any of them designed to preserve data for 50 years without being touched? No.

So let's think about microfilm.

Just to get a handle on this, imagine turning your 50GB archive into QR codes and storing those on microfilm. This is a terrible idea for multiple reasons (the entire point of microfilm is that all you're supposed to need to do to retrieve human-readable information from it is magnify it) but let's stick with it as an exercise.

Like all my best (?) ideas, this one was long since thought of by somebody else, who arrived at a cost estimate of about $100/GB. Their proposed workflow strikes me as highly dubious, but it's a fair bet that microfilm storage of raw digital information could in fact be achieved for somewhere between a tenth of that and ten times it.
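Both the QR-code exercise and the cost estimate are easy to put numbers on. A rough Python sketch, assuming the largest standard QR symbol (version 40, whose maximum binary payload is 2953 bytes at error-correction level L and 1273 bytes at level H), decimal gigabytes, and the $100/GB estimate above with its tenfold uncertainty either way. Framing, alignment and any extra error correction on the film are ignored.

```python
# Largest standard QR symbol (version 40), maximum binary payload per symbol
QR_V40_L_BYTES = 2953  # error-correction level L (about 7% recoverable)
QR_V40_H_BYTES = 1273  # error-correction level H (about 30% recoverable)

ARCHIVE_BYTES = 50 * 10**9  # 50 GB in decimal units
COST_PER_GB_USD = 100       # the rough estimate cited above


def qr_codes_needed(total_bytes: int, bytes_per_code: int) -> int:
    """Number of QR symbols needed, rounding up (integer ceiling division)."""
    return -(-total_bytes // bytes_per_code)


print(f"level L: {qr_codes_needed(ARCHIVE_BYTES, QR_V40_L_BYTES):,} codes")  # 16,931,934
print(f"level H: {qr_codes_needed(ARCHIVE_BYTES, QR_V40_H_BYTES):,} codes")  # 39,277,298

central = (ARCHIVE_BYTES // 10**9) * COST_PER_GB_USD
low, high = central // 10, central * 10
print(f"cost: ${low:,} to ${high:,}, central guess ${central:,}")
# cost: $500 to $50,000, central guess $5,000
```

Tens of millions of frames is another way of seeing why microfilm suits pages meant to be read by eye rather than raw bits.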

The bit density of microfilm is low compared to that of modern digital media, which makes it more resistant to per-bit degradation if properly stored. But actually arranging for anything to be properly stored for 50 years is expensive in and of itself.

Instead of paying to guard a physical thing against heat and moisture and rot and fire and theft and rats and careless excavator operators and simply being forgotten about for 50 years, you'd be far better off paying for something that reliably preserves the information itself by keeping it moving onto the media du jour, with redundancy and checksums to make sure it all stays available. Because 50 years down the track it's not going to be any easier to protect your stash of microfilm, but 50GB is going to be such a laughably tiny quantity of information that the actual preservation of that many bits will involve only trivial amounts of time and money.
posted by flabdablet at 12:26 AM on September 4, 2018 [4 favorites]

Also, in general, the more you care about any given chunk of information the more sense it makes to keep several copies close to you.

If you want to go the flash drive route, blow your 50GB out to 64GB with Par2 and make five copies of it onto 64GB micro SD cards; tape one to the inside of your wallet, sew another into the lining of your favourite jacket, put another in an envelope and stick it in a drawer, leave one in a safety deposit box and bury the last one in a screw-top jam jar packed with silica gel cat litter.
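Par2 uses Reed-Solomon codes, which can rebuild several missing blocks. To show the underlying idea without that machinery, here is a minimal Python sketch that swaps in the much simpler single XOR parity block (the RAID-5 trick), which can rebuild only one lost block. The block size, sample data and helper names are hypothetical.

```python
def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into equal blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together into a single parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value
    return bytes(parity)


data = b"journal, photos, voice messages, videos"
blocks = split_blocks(data, 8)
parity = xor_parity(blocks)  # stored alongside the data, like a (much weaker) .par2 file

# Pretend block 2 landed on an unreadable sector of the SD card.
lost = 2
survivors = [block for i, block in enumerate(blocks) if i != lost]
rebuilt = xor_parity(survivors + [parity])  # XOR of everything else recovers the gap
print(rebuilt == blocks[lost])  # True
```

The trade-off is the same one as padding 50GB out to 64GB: you spend extra space on recovery data so that some damage becomes repairable instead of fatal.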

Every year buy a new set of SD cards, disk-image the old ones onto the new ones, checksum the images to verify that they're all still identical, and put the new ones back where the old ones used to be. You can keep the old ones around if you want to check how long past their expected five year lifetime they actually do last before becoming altogether unreadable or yielding the wrong checksum.
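The verification step of that yearly routine can be sketched in Python: hash each card image with SHA-256 and check that all copies agree. Making the images themselves (e.g. with a disk-imaging tool) is not shown; the `card1.img` ... `card5.img` names are hypothetical, and small stand-in files replace real 64GB images.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly huge) disk image in chunks so it never has to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_agree(images: list[Path]) -> bool:
    """True if every image has the same SHA-256 digest."""
    return len({sha256_of(image) for image in images}) == 1


# Demo with small stand-in files; real card images would be ~64GB each.
with tempfile.TemporaryDirectory() as tmp:
    cards = [Path(tmp) / f"card{n}.img" for n in range(1, 6)]  # hypothetical names
    for card in cards:
        card.write_bytes(b"\x00" * 4096 + b"time capsule")
    print(copies_agree(cards))  # True
    cards[3].write_bytes(b"\x00" * 4096 + b"time capsulf")  # one card has decayed
    print(copies_agree(cards))  # False
```

With five copies, a mismatch also tells you which card is the odd one out, so you can re-image it from one of the good ones.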

This will initially cost you about $150 per year, but that will come down over time. 64GB is currently quite big as micro SD cards go, but in ten years they will be giving them away as promo schwag.
posted by flabdablet at 4:45 AM on September 4, 2018 [1 favorite]

I am in agreement with flabdablet, above: find a place where the people are in the business of preserving & sharing data, and give them your stuff.
Instead of paying to guard a physical thing against heat and moisture and rot and fire and theft and rats and careless excavator operators and simply being forgotten about for 50 years, you'd be far better off paying for something that reliably preserves the information itself by keeping it moving onto the media du jour...
For example, the ibiblio.org servers exist for people to share data long-term, so when I finished the first draft of a project to learn about my grandpa's WWII service, it went there in HTML & JPGs. They keep the servers running, and will migrate the data forward; I just have to check in on it periodically, and make sure they can reach me.
posted by wenestvedt at 6:36 AM on September 4, 2018

Our entire digital information ecosystem is now built on physical media that are designed to last for years, not centuries.

Which is actually a fair decision if you have to deal with permanently-accessible data, even if a percentage of it has no such requirement. Building online storage that's designed to last longer than two or three equipment generations is plain wasteful; you're not going to benefit from any savings in energy use, physical size and speed until your next equipment refresh. This holds true to some extent even for fully offline archives: if your tape vault can hold, say, a thousand tapes, then tapes that hold four times as much can become an enticing proposition when your vault starts nearing capacity, instead of having to build or rent a second one. That holds even if it means migrating all that data from the old tapes to the new ones (not a bad action anyway, as you want to check that the tapes are readable on a somewhat regular basis).

Designing and building long-term digital archival systems is totally unlike any generic computing and storage requirement.
posted by Stoneshop at 8:43 AM on September 4, 2018 [1 favorite]

To provide a different framing for the same thing everyone’s saying: none of the technologies you might consider using have been in use for that long. Nobody in the world has ever successfully stored e.g., a flash drive for fifty years, and if you actually care at all about your data you aren’t going to try to use it to pioneer this stuff. We know some things that will go wrong but not all of them.
posted by aubilenon at 2:03 AM on September 5, 2018 [1 favorite]

Keep an eye on DNA tech as well. At some point you're going to be able to get a permanently self-replicating colony containing millions of copies of your 50GB up under the rim of your toilet bowl.
posted by flabdablet at 6:30 AM on September 5, 2018

