How can I get offsite backup for the 5 GB of data a day my library volunteer role generates?
August 17, 2012 3:24 AM

Help me back up offsite the huge files I'm generating in my new voluntary role (c. 5 GB per session)!

I'm volunteering for a library that has a very large quantity of author talks and readings on cassette tape. My job is to digitize these by creating an archive of .wav files. Using Adobe Audition, I'm storing one copy of each file on the Mac's hard drive, and one copy on a Western Digital 2-terabyte backup drive. Both of these are onsite, which is obviously not ideal. But I'm generating c. 5 GB of data every day, and I'd really appreciate some advice on how to store this offsite, away from fire and theft, in a way that works for the whole organization. Should I save everything onto a DVD at the end of the day, label it and take it home? (I have access to archive-standard DVDs). Buying space on the cloud wouldn't be a problem, but uploading so much might - any tips to make this easier? I don't want to clog up the pipes for other staff. Or am I missing something very obvious?

(The .wav files are to be reference recordings from which .mp3s will be created for download from the library's website in the fullness of time. I might be able to shift to other lighter lossless formats if the experienced heads here think that could be a way forward).

All advice very gratefully received!
posted by pyotrstolypin to Computers & Internet (11 answers total)
I'd start by encoding to FLAC, which is lossless and should (at least) halve the problem. Now you have a small enough file that it will fit on a standard DVD.

For off-site backup I'd break the file into chunks (with, say, WinRAR), and fill out the rest of the DVD with PAR2 files. Then I'd burn at least two copies of the disc.

Parity files are a bit of a magic trick - let's say you've got 100MB of data split into 2MB chunks, with another 10MB of PAR2 files on top. You can lose 10% of those files - any 5 random chunks - and still be able to reconstruct the original file. The parity files will fill in the blanks for you.
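
To make the parity trick concrete, here is a minimal Python sketch. Note that it is not PAR2: PAR2 uses Reed-Solomon codes and can rebuild several lost blocks, while this toy keeps a single XOR parity chunk and can rebuild exactly one. The function names and the 16-byte chunk size are made up for illustration.

```python
from __future__ import annotations

from functools import reduce


def split_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into equal-size chunks, zero-padding the last one."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    if chunks and len(chunks[-1]) < size:
        chunks[-1] = chunks[-1].ljust(size, b"\0")
    return chunks


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: list[bytes]) -> bytes:
    """The parity chunk is the XOR of every data chunk."""
    return reduce(xor_bytes, chunks)


def recover(chunks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Rebuild the one missing chunk (marked None) from the survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can only rebuild one chunk")
    if missing:
        survivors = [c for c in chunks if c is not None]
        # parity XOR all surviving chunks leaves exactly the lost chunk
        chunks[missing[0]] = reduce(xor_bytes, survivors, parity)
    return chunks


original = b"author talk, side A, 1987 " * 4
chunks = split_chunks(original, 16)
parity = make_parity(chunks)

damaged = list(chunks)
damaged[2] = None                      # pretend one chunk was unreadable
restored = recover(damaged, parity)
print(b"".join(restored).rstrip(b"\0") == original)
```

The real thing generalises this: more parity data lets you lose more chunks, which is why the PAR2 set is sized as a percentage of the data.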

For on-site I'd just get a couple of 3TB drives and mirror them in case one fails. Cost would be well under $300, and you'll easily store 4 years of work on there, in FLAC.
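
A quick sanity check of the "4 years" figure, as a rough sketch. The 5 GB per session comes from the question; the halving by FLAC and the number of sessions per week are assumptions, and drive capacity is counted in decimal gigabytes.

```python
DRIVE_GB = 3000           # one 3 TB drive, decimal gigabytes
WAV_GB_PER_SESSION = 5    # figure from the question
FLAC_RATIO = 0.5          # assumption: FLAC roughly halves the WAV size


def sessions_until_full(drive_gb, gb_per_session):
    """How many sessions fit on the drive before it is full."""
    return int(drive_gb // gb_per_session)


sessions = sessions_until_full(DRIVE_GB, WAV_GB_PER_SESSION * FLAC_RATIO)
for per_week in (3, 5, 7):  # assumed volunteering patterns
    years = sessions / (per_week * 52)
    print(f"{per_week} sessions/week: {years:.1f} years")
```

At five sessions a week that is about four and a half years; recording every single day it drops to a bit over three.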
posted by Leon at 3:53 AM on August 17, 2012

Voice recordings are the kind of stuff that can be compressed far more than other audio and still be useful; with a codec tuned for voice you can shrink it down to almost nothing and still have pretty good fidelity. I would say that you should at the very least make really small footprint versions of the files and put them onto some cheap, high-redundancy cloud service immediately, maybe to several different services, so that you'll have those no matter what happens to the multi-GB master recordings.
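
For a sense of scale, a back-of-envelope sketch in Python. The WAV format (44.1 kHz, 16-bit, stereo) and the 24 kbit/s speech bitrate are assumptions, not figures from the question; plug in whatever Audition is actually set to.

```python
def wav_bytes_per_second(sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Uncompressed PCM data rate; the defaults are assumed CD-quality settings."""
    return sample_rate_hz * bit_depth // 8 * channels


def compressed_size_gb(wav_gb, codec_kbps, **wav_format):
    """Size of the same audio at a constant codec bitrate, in decimal GB."""
    seconds = wav_gb * 1e9 / wav_bytes_per_second(**wav_format)
    return seconds * codec_kbps * 1000 / 8 / 1e9


hours = 5e9 / wav_bytes_per_second() / 3600
print(f"5 GB of WAV is about {hours:.1f} hours of audio")
print(f"at 24 kbit/s that is about {compressed_size_gb(5, 24) * 1000:.0f} MB")
```

Under those assumptions a 5 GB day shrinks to roughly 85 MB, which is trivial to upload anywhere.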
posted by XMLicious at 4:34 AM on August 17, 2012 [1 favorite]

[...], automatic unlimited backup for cheap. (I use it)
posted by blue_beetle at 5:48 AM on August 17, 2012 [2 favorites]

Yes, I would first question the assumption that you need a lossless copy of these recordings. You could do a low level of compression (i.e., something not compressed very much) and they would likely be as crisp as lossless without taking up much space.
posted by OmieWise at 6:21 AM on August 17, 2012 [1 favorite]

A requirement to upload multiple GB per day over the internet is problematic in a lot of workplaces, and DVD burning is slow. Hard disk drives are cheaper per gigabyte than DVDs, take up less physical space, and are much faster. I'd use a rotating set of portable 2TB backup drives.

Call the backup drives A, B and C.

First day: A is at work, B and C are at home. When you go to work, take B with you. Do your thing, back it up to A. At the end of the day, update B from A so they're the same, then take A home with you.

Next day: B is at work, C and A are at home. When you go to work, take C with you. Do your thing, back it up to B. At the end of the day, update C from B so they're the same, then take B home with you.

Next day: C is at work, A and B are at home. When you go to work, take A with you. Do your thing, back it up to C. At the end of the day, update A from C so they're the same, then take C home with you.

Now you're back where you started.
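
The three-day cycle above can be sketched as a small Python simulation. The function name and the tuple layout are made up for illustration; the drive movements follow the steps as described.

```python
def rotation(days, drives=("A", "B", "C")):
    """Return one (day, backed_up_to, synced_and_left_at_work, at_home_overnight)
    tuple per working day."""
    at_work, at_home = drives[0], list(drives[1:])
    schedule = []
    for day in range(1, days + 1):
        brought_in = at_home.pop(0)   # drive that has been at home longest
        taken_home = at_work          # today's work is backed up to this one
        # ...at the end of the day, brought_in is updated from taken_home...
        at_home.append(taken_home)
        at_work = brought_in          # the freshly synced copy stays at work
        schedule.append((day, taken_home, brought_in, tuple(at_home)))
    return schedule


for day, backup, synced, home in rotation(4):
    print(f"day {day}: back up to {backup}, sync {synced} and leave it at work, "
          f"home overnight: {', '.join(home)}")
```

Day 4 comes out identical to day 1, and every night there is one up-to-date copy at work and one at home.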

As soon as any of your backup drives shows any sign of going bad, toss it and immediately duplicate the most recent backup onto its replacement. Keeping one spare on hand at work will make this more likely to happen.

The only way to lose work is if work burns down and you crash your car on the way home, or work burns down and your home burns down after you get there.
posted by flabdablet at 6:43 AM on August 17, 2012 [4 favorites]

Just use one of the countless online backup options out there already. CrashPlan, Mozy, whatever. Anything else is going to be suboptimal.
posted by toomuchpete at 7:07 AM on August 17, 2012 [1 favorite]

"I'd start by encoding to FLAC"

Are you ever going to edit your files in a common audio editor? Then don't encode them in FLAC. You'll punch yourself later for having added to your work load. Instead rely on big storage and big pipes.

I'm part of a team that produces a national radio show. We have more than a few terabytes of data after five broadcast seasons. What we do:

--The engineer and editor records everything to two drives at the time of recording in our rented studio. One drive stays at the studio temporarily (backup 1). He'll clear the session out during another visit but it stays there until he gets the second copy to his own studio (backup 2).

--At his own studio, everything, including raw files and his working ProTools files, is automatically mirrored to a third drive every night. That means WAV, session files, and various MP3 and WAV exports (backup 3).

--All of the rough edits are uploaded to a private FTP server, which means it doubles as an online backup. Unfortunately, they're MP3 but at 192 kbps mono so it wouldn't be too terrible if we had to resort to re-editing from these. They're exported directly from a ProTools session. (backup 4).

--All of his final edits are also uploaded to the private FTP server, from where I download them. They are in multiple formats: WAV, MP3, MP2. (backup 5).

--I store a copy on my external drive, which is then backed up to CrashPlan. Whether or not you can upload 5GB a day depends almost completely on your network's upload speed. Usually it's much slower than you think it is. But my home cable Internet service can easily upload 15GB a day; CrashPlan precompresses, then sends to the server. (backup 6).

--Once a year he delivers DVDs to me which contain everything he would need to re-edit all episodes (or create Frankenstein or best-of versions of multiple shows together). I store those on a shelf. They are a last-ditch resource, the kind of thing we'd only need if there were some kind of electromagnetic burst or a fire swept the whole county and happened to get his house and the studio but left my place untouched. (backup 7).
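
The upload-speed point under backup 6 is easy to put into numbers. A rough Python sketch: the link speeds are example values, and the 0.8 efficiency factor is an assumed allowance for protocol overhead and other traffic, so treat the output as an estimate only.

```python
def upload_hours(gigabytes, upload_mbps, efficiency=0.8):
    """Hours needed to push `gigabytes` (decimal GB) up a link rated at
    `upload_mbps` megabits per second. `efficiency` is an assumed fudge factor."""
    bits = gigabytes * 8e9
    return bits / (upload_mbps * 1e6 * efficiency) / 3600


for mbps in (1, 2, 5, 10):  # example upload speeds, not measurements
    print(f"{mbps:>2} Mbit/s up: {upload_hours(5, mbps):.1f} h for 5 GB")
```

For comparison, 15 GB a day works out to roughly 1.4 Mbit/s sustained around the clock, so measure the library's real upload speed before committing to a cloud-only plan.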

So that's multiple iterations stored in unique locations. That's the only way to do it.

I come from an IT background myself and had urged this sort of thing from the start, but like nearly EVERYBODY it took a massive drive loss before he'd do multiple redundant storage locations. There are some episodes for which we no longer have raw files, or edited files, or both, because the drive recovery was only partially successful.
posted by Mo Nickels at 9:04 AM on August 17, 2012 [1 favorite]

Actually, that should be eight backups, since my external drive is also one.
posted by Mo Nickels at 9:05 AM on August 17, 2012

Amazon S3 charges for usage only (no minimums) and costs somewhere in the neighborhood of $0.10 per GB per month, so it's a great option for cloud storage. The popular Mac FTP client Transmit speaks S3 and you could be doing your uploading at night to minimize network impact.
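
One thing to keep in mind with per-GB pricing is that the bill grows as the archive grows. A minimal Python sketch, using the ballpark $0.10 per GB per month above and assuming 20 sessions a month of 5 GB each; it ignores request and transfer fees, and the function name is made up for illustration.

```python
def monthly_storage_bills(months, gb_per_session=5, sessions_per_month=20,
                          usd_per_gb_month=0.10):
    """Storage-only bill at the end of each month as uploads accumulate."""
    bills = []
    stored_gb = 0
    for _ in range(months):
        stored_gb += gb_per_session * sessions_per_month
        bills.append(round(stored_gb * usd_per_gb_month, 2))
    return bills


bills = monthly_storage_bills(12)
print("month 1:", bills[0], "USD; month 12:", bills[-1], "USD")
print("first-year total:", round(sum(bills), 2), "USD")
```

Under those assumptions the first month costs about $10 and the twelfth about $120, so storing compressed copies makes a real difference to the running cost.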
posted by migurski at 10:14 AM on August 17, 2012

Someone already suggested FLAC for lossless compression, which is a great idea (or ALAC if you prefer). Then, once you have less data to deal with, upload it offsite somewhere. There are a bunch of cheap backup services. I've been using Backblaze at home with pretty good results... it's $5/month/computer flat, which works well for your situation. I think their utility lets you schedule backups, too, so you can restrict it to uploading after hours.

Alternatively, there're always DVD-Rs, but then you're stuck keeping a pile of them at home, which isn't the best from a home cleanliness point of view.
posted by tabacco at 3:58 PM on August 17, 2012

Many, many thanks for all these great answers - I've pretty much taken something from everyone in what I've decided to do.
posted by pyotrstolypin at 10:19 AM on August 18, 2012
