Help me backup my system.
July 8, 2008 9:39 AM

Data Backup Filter: I have about 6TB (yes TB, not GB) of data. I need to back it up BEFORE it crashes. I also need a good way to share it on the network. How can I do this? Details within...

Yes, this topic revisited again. None of the questions asked so far have helped in my situation.

It seems the era of digital video and digital audio has created a storage nightmare for me. I have 6TB of data spread across 5 external hard drives (3 500GBs and 4 1 TBs, not all full). They are all currently connected to a Windows PC.

I have 1 blank 1TB drive which I just attached to my Mac Mini as I'm trying to retire my Windows PC and migrate all data to the Mac FS so I can run it off my Airport Extreme's USB port (the Airport Extreme won't read NTFS).

I've looked at "mirroring" my external drives nightly using a backup program but this does not safeguard me in case of electrical storm, fire, angry wife, etc. Besides, there must be a limit to the number of external drives I have...I'm running out of outlets quickly.

But I cannot imagine putting all of this information on DVDs; the time and expense would be insane. Online backup systems are not optimal either: there is sensitive personal data in the files and I don't trust the privacy of online backup solutions, and the cost for 6TB is prohibitive.

I do have a lot of "redundant" files that I am attempting to clean up (previous hard drive crashes in my Windows PC resulted in panicked "copy off all my files!" moments, and as such I'm sure at least 1TB of my data is redundant, but the time it's taking me to sort this out file by file is crazy). Are there any programs (Windows or Mac) that can find these and only back them up once, saving me on DVDs?

Do I have any other options besides "buy a lot more external hard drives"?
posted by arniec to Computers & Internet (28 answers total) 14 users marked this as a favorite
Mozy Home Unlimited is $5/mo. Admittedly though, 5TB would take a loooong time to upload.

Mozy also allows you to choose your own encryption key, which keeps your data private. PC and Mac - it's pretty nice.
posted by unixrat at 9:50 AM on July 8, 2008

Response by poster: You know, even if I only back up my MP3s and such, $5 a month for unlimited would be good. If I kept the MOST sensitive information burned to DVD and backed up the rest, that could work.

Not a bad solution unixrat! Anyone else have one that doesn't trust my data to a third party company? (with the recent ruling for Google to turn over all records to Viacom I just simply don't trust any amount of "privacy" a company offers...)
posted by arniec at 9:58 AM on July 8, 2008

I'd probably go with multiple NETGEAR ReadyNAS units, all with 4x1TB drives in them, set to redundancy. I'd have the units on UPS with line conditioning.

I'd sort out the duplicate files before backing up though.
posted by sharkfu at 10:01 AM on July 8, 2008

6TB means ~8 drive bays which is into server territory, and you really need to do it twice if you want to actually have a backup (rather than just a consolidated store).

NewEgg will gladly sell you the parts if you're technical or something like this HP if you can stomach the price. Ebay will almost certainly have plenty of better deals on used kit if you spend some time looking. Throw Ubuntu on it with software RAID 5 and LVM and you'll have a nice single place to put everything.

Another (expensive) option is to upload it all to S3, but 6TB is nearly a thousand dollars a month in storage costs alone, which I suspect is out of your budget. And the upload time would be nightmarish.
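As a rough check on that figure, here is a back-of-the-envelope sketch. The flat rate of $0.15 per GB-month is an assumption (approximately the list price for storage at the time), and the calculation ignores transfer and request fees entirely.

```python
# Back-of-the-envelope monthly storage cost.
# ASSUMPTION: a flat rate of $0.15 per GB-month; this number is not from the thread.
RATE_PER_GB_MONTH = 0.15


def monthly_storage_cost(terabytes, rate_per_gb=RATE_PER_GB_MONTH, gb_per_tb=1024):
    """Return the monthly storage bill in dollars for the given capacity."""
    return terabytes * gb_per_tb * rate_per_gb


print(f"6 TB: ${monthly_storage_cost(6):,.2f} per month")
```

Under that assumed rate the bill lands a little over $900 a month, before any bandwidth charges for the upload itself.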
posted by Skorgu at 10:03 AM on July 8, 2008

While it still involves buying more and bigger hard drives, the Drobo helps manage expansion as you need it (up to 16TB).

The downside: to get it on the network you need to purchase their Drobo Share separately, and each unit only has space for 4 drives.
posted by howling fantods at 10:04 AM on July 8, 2008

I have a Drobo. You can set it up on your Mac and then plug it into your Airport Extreme as you want to do and it works like a champ. They just released a new version, so the old version has dropped to $349, which is a nice price. As howling fantods said, it'll go up to 16TB, but given that the largest available drive right now is 1TB, it can only get up to 3TB.

I'd stop buying external drives. Either build a NAS with FreeNAS, so you can stick a LOT of space in one computer on your network, or purchase a Drobo or two.
posted by AaRdVarK at 10:07 AM on July 8, 2008

A budget option: Buy a little server and load it up with all the storage you need. Copy your files to the new box over your local network. Then, stick it in your parents'/friend's/relative's basement and use rsync to back up changed files nightly. As long as you're only making small changes daily, DSL lines on both ends will be sufficient.

Of course, if you're downloading gigabytes of movies and music daily, this might not be the best solution. It also doesn't give you versioned backups, but at least you'll have an offsite copy, in case your house burns down or your hard drives fail.
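For the nightly rsync step, a minimal sketch might look like the following. The paths and the host name are hypothetical placeholders, and the choice of flags is an assumption about what suits a slow DSL uplink rather than a recommendation from anyone in this thread.

```python
import subprocess


def build_rsync_command(source_dir, remote_target, dry_run=False):
    """Build an rsync invocation that mirrors source_dir to remote_target over ssh."""
    command = [
        "rsync",
        "-a",         # archive mode: recurse and preserve times, permissions, links
        "-z",         # compress data in transit (helps on a slow uplink)
        "--partial",  # keep partially transferred files so large ones can resume
        "-e", "ssh",  # tunnel the transfer through ssh
    ]
    if dry_run:
        command.append("--dry-run")  # list what would be sent without copying
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    command += [source_dir.rstrip("/") + "/", remote_target]
    return command


def run_nightly_sync(source_dir, remote_target):
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source_dir, remote_target), check=True)
```

Something like run_nightly_sync("/data/archive", "backup@offsite.example:/srv/mirror") could then be called from cron or a scheduled task. Note that this sketch does not pass --delete, so files removed at home stay on the remote copy.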
posted by chrisamiller at 10:22 AM on July 8, 2008

If you are 100% certain none of this can be compressed or deleted and you absolutely need offsite backups, then you need to colocate another server with the same amount of space, perform an initial sync, and then rsync (or whatever tool you want to use) nightly. If the contents of those files change by a few percentage points a day, then you'd better have a 10+ megabit connection to the internet or you'll never sync up.
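To put a number on that, here is a quick sketch of the sustained link speed needed to ship one day's changes before the next day starts. It assumes decimal terabytes, no compression, no protocol overhead, and that changed files are resent in full, so real figures will differ.

```python
def required_mbps(total_tb, daily_change_fraction, window_hours=24.0):
    """Sustained link speed (megabits/s) needed to send one day's changes within the window."""
    changed_bytes = total_tb * 1e12 * daily_change_fraction  # decimal terabytes
    changed_bits = changed_bytes * 8
    return changed_bits / (window_hours * 3600) / 1e6


for percent in (0.5, 1, 2, 3):
    rate = required_mbps(6, percent / 100)
    print(f"{percent}% of 6 TB per day -> {rate:.1f} Mbit/s sustained")
```

On these assumptions, a 2% daily change on 6TB already needs a bit more than 10 Mbit/s of upstream running around the clock.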

Another alternative is to archive all this stuff onto tapes and store them. Do a monthly, quarterly, whatever rewrite. This doesn't seem very practical time-wise.

You can cheapen up option number one if you instead build another server with the drives you need, sync up, and store it at a buddy's house. Do another sync manually each month by driving there, picking it up, and taking it back when you are done. No internet needed.
posted by damn dirty ape at 10:35 AM on July 8, 2008

Also, are these movies? Do you need the quality/codec they are at? If they're raw DVD files you can compress them with xvid or mp4. A few weekends of work, but now you have a 400gig problem, not a 4TB problem.
posted by damn dirty ape at 10:37 AM on July 8, 2008

unixrat writes "Mozy also allows you to choose your own encryption key, which keeps your data private. PC and Mac - it's pretty nice."

But you use their software to do the encryption. Choosing your own key is feel-good window dressing. If you're worried about sensitive data, use TrueCrypt to encrypt it before you upload it.
posted by Mitheral at 10:48 AM on July 8, 2008

Rare (nonexistent) is the private individual with multiple terabytes of critical data that they need to protect against disaster.

critical questions:

1. does the storage need to be "online" at all times or would you be satisfied maintaining a data dump/storage library (i.e., a pile of disks in the closet, outlets are not an issue, and neither is clutter)?

2. how important is this stuff, really? Are these movies downloaded from the internet ("not important at all"), lengthy and unwatched vacation videos (encode them to compress them), DVDs and CDs that you spent a lot of time ripping by hand but for which the source media is still readily available, or projects/etc.?

ReadyNAS units are online hardware-fault-tolerant servers; but they do not provide the properties that people actually expect from a backup (to wit: they do not facilitate recovery of deleted files) and any spinning media is vulnerable to an earthquake or other disaster. In some ways, they rather impede recovery (such as undelete or professional data recovery after a flood). If the stuff is really critical, the key question is what proportion of it needs to be online and how many redundant backups one needs to make.

I have, offsite, complete backups of individual files (time machine and periodically made full system images -- both to a series of hardware encrypted external backup drives where the most recent is kept at work locked in a filing cabinet and the rest at home, powered down) and in addition RAID1 (mirrored) drives in the work system and a pair of ReadyNAS units (single-disk-failure-tolerant) for stuff that is replaceable or currently in progress.
posted by rr at 10:52 AM on July 8, 2008

Lastly, whatever you go with, make sure it has an eSATA connector. Doing the initial transfer over USB is going to be a nightmare. Two of these chassis will max out at 6 TB which gives you some room for the future.

eSATA in the real world is roughly 4 or 5x faster than USB2.
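To see what that ratio means for the initial copy, here is a rough sketch. The 30 MB/s figure for sustained USB2 throughput is an assumption, not a measurement from this thread, and the eSATA rates simply apply the 4x and 5x multipliers to it.

```python
def transfer_hours(total_tb, megabytes_per_second):
    """Hours needed to copy total_tb (decimal terabytes) at a sustained rate."""
    total_megabytes = total_tb * 1_000_000
    return total_megabytes / megabytes_per_second / 3600


USB2_MB_PER_S = 30  # ASSUMPTION: typical sustained USB 2.0 throughput

for label, rate in [("USB2", USB2_MB_PER_S),
                    ("eSATA at 4x", USB2_MB_PER_S * 4),
                    ("eSATA at 5x", USB2_MB_PER_S * 5)]:
    print(f"{label}: {transfer_hours(6, rate):.1f} hours for 6 TB")
```

On those assumptions the USB2 copy is more than two days of continuous transfer, versus roughly half a day over eSATA (if the drives themselves can keep up).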

Gigabit ethernet is something of a scam. The speed you get depends on the chipset of the interfaces, the switch, the length of the cable, and a slew of other factors you may not be able to plan for. It may not even be that much faster than USB2 in some cases.
posted by damn dirty ape at 11:02 AM on July 8, 2008

Stop being a packrat and start deleting. Most things can be redownloaded.
posted by wongcorgi at 11:25 AM on July 8, 2008

Response by poster: The files' origin is quite varied.

1TB are MP3s I have ripped myself from my private (large) CD collection. While it took me months to do it, I view ownership of the actual physical CDs themselves a semi-adequate backup solution, however I need this to be online at all times for our 5 computers.

Just under 1TB are audio files I have made myself (podcast raw data; each show I do produces 2-3GB of raw data). I am parsing these out into 4GB chunks and burning them to DVD and deleting, however it is taking some time to do this and I worry about disk failure or some disaster before I complete the process and would like a backup.

Several gig are personally taken photographs, the majority in 10 megapixel quality. Sure, reducing quality is an option but as technology improves I believe I will eventually regret the quality loss on these.

About 1TB are personal movie files in DVD quality that I need to burn to video DVDs. I have a TiVo and TiVo to go and have transferred files to my PC for burning but need to make the menus and burn them off. Again, a LOT of video, a LOT of hours editing ahead of me. See above about fear of disaster.

Then I have a lot of programs. These days I purchase a software and I download it and get e-mailed a key. So I have to archive the program, and the key.

Plus other random bits and bytes here and there.

Really, no, not all of this data needs to be online at all times. In fact, most of it could be offline except for attachment when needed. But what I want to do is back it up to get it off the external hard drive it's on before the hard drive fails.
posted by arniec at 11:35 AM on July 8, 2008

As others have said, you probably only have <1>critical data. Anything else, it will be easier to buy the hardware again and re-download. But what you need to do (and this isn't clear) is get some sort of hardware protection. The chances of a natural disaster are small, with huge consequences and huge costs to prevent. The chances of a hardware failure are small, with huge consequences and low costs to prevent.

In any case, as others have said, you pretty much have to scrap your current setup and buy a dump server + lots of drives. You want to not lose the data in case of a "crash" or drive failure. This is trivial as even a ReadyNAS runs RAID. I run a RAID5 on an HP storage server with 1.5TB. If a drive goes down, I have to go out and buy a new hard drive and rebuild the array. This will mean downtime (I guess I could run it without RAID if I really wanted to), but this is acceptable.

I highly recommend ReadyNAS if you want to get going without fuss. As others said, if you're technically inclined you can probably do this cheaper, but be aware that if you don't know how, learning how to do it can be a considerable project in itself.
posted by geoff. at 11:51 AM on July 8, 2008

From here, there are several choices to remove duplicate files:

Easy Duplicate Finder
Double Killer
Duplicate Finder
Duplicate File Finder
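If you'd rather script it than install one of these, the usual approach is to group files by size and then hash only the ones that share a size. Below is a minimal sketch of that idea; it is not how any of the tools listed above work internally, and the function names are made up for illustration. It only reports duplicates and deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files never sit in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root_dirs):
    """Return lists of paths that share both a size and a SHA-256 digest."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for root in root_dirs:
        for folder, _subdirs, names in os.walk(root):
            for name in names:
                path = os.path.join(folder, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:
            for path in paths:
                by_hash[(size, file_sha256(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling find_duplicates with the mount points of all the external drives would return one list per set of identical files. The size pass matters at this scale, since it avoids reading terabytes of files that cannot possibly match anything.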
posted by euphorb at 12:06 PM on July 8, 2008 [5 favorites]

arniec: Several gig are personally taken photographs, the majority in 10 megapixel quality. Sure, reducing quality is an option but as technology improves I believe I will eventually regret the quality loss on these.

Are you storing them as RAW or as JPEG? If you're storing them as RAW, then I recommend converting them to the losslessly-compressed format PNG (but see the caveats in this section of the wikipedia article) or TIFF. If you have them stored as JPEG, then 1) shame on you for storing originals in a lossy format, 2) 10MB might be large enough that you could comfortably switch them over to a lossless format (though this will increase the size of the files, but at least you'll be able to edit your pictures without loss of quality).

Then I have a lot of programs. These days I purchase a software and I download it and get e-mailed a key. So I have to archive the program, and the key.

This is definitely something to put on a DVD, and not on an online NAS. Do any of these programs offer free re-download later? If so, then you could just backup the key.
posted by philomathoholic at 1:22 PM on July 8, 2008

"Gigabit ethernet is something of a scam. The speed you get depends on the chipset of the interfaces, the switch, the length of a the cable, and a slew of other factors you may not be able to plan for. It may not even be that much faster than USB2 in some cases."

Really? I have a Shuttle PC with two 500GB drives in RAID-1 and a 400GB drive in it working as a FreeBSD/Samba server on my LAN. With MTU set to 9000, plugged into a cheap Netgear 1000BaseTX switch, it will do 450Mbps read transfers from the server to a Macbook Pro. The Gigabit Ethernet chipset on the Shuttle's motherboard is some totally boring Intel chipset (em interface in *bsd). This is pretty much limited by the write speed capability of the 2.5" HDD in the laptop receiving the data.

A slightly scaled up version of this in a bigger case with eight 1TB HDD in RAID5 (on an Areca or 3Ware controller) would be the right backup solution. I do not suggest using any 4*1TB consumer grade NAS things as there's very little administration flexibility and their RAID technology is a "black box". Of course if you build a server with 8*1TB or 12*1TB drives and use it as your primary data storage source, you'd really need two identical servers to achieve redundancy.
posted by thewalrus at 1:37 PM on July 8, 2008

The speed you get depends on the chipset of the interfaces, the switch, the length of the cable, and a slew of other factors you may not be able to plan for. It may not even be that much faster than USB2 in some cases.

Well, the chipset absolutely but unless you're getting your cables and switch from the dollar store you'll be fine. GigE is in a much more stable state now than it was even a year ago, but chipset difficulties are still around. If you can, pick one manufacturer and stick to it. I've standardized on Intel's Pro/1000 or whatever they're calling it today both at home and at work and my network speed is substantially faster than my disk bandwidth in both environments.
posted by Skorgu at 2:12 PM on July 8, 2008

Seconding the advice from Skorgu above: use the same chipset, and make sure your switch can handle jumbo frames.

I'd also check out openfiler.

My suggestion is to get a real beast of a server, with a redundant RAID setup.
posted by Freen at 4:04 PM on July 8, 2008

Please do not call RAID a backup anything. It has [almost] no recovery from rm -rf /mountpoint/*. Snapshots (supported by some servers, usually [at least in the case of the readynas] with horrible performance) can help here but are not a substitute. RAID itself has other issues (it is not clear to me, for example, that the ReadyNAS units ever do autonomous scrubbing to make sure that some form of silent corruption has not occurred).

There are a lot of benefits to offline storage. When spun down and disconnected, hard disks are very resistant to damage.

It sounds to me that what the poster would find easiest is buying a bunch of retail 750G or 1TB drives [preferably SATA] and then copying every file to two different drives (so one redundant copy per file). An external case per drive is optional but it would be worth buying at least two (I like the external non-cases). Since it sounds like at least half of the data will rarely, if ever, be accessed again, putting these somewhere where they won't get bumped, preferably back in their antistatic bag in the original box, is the last step.

Unless you have a lot of time, it probably is not worth burning anything to DVD at all, let alone putting yourself in front of a hurdle of needing to make menus for it prior to doing so. Disks will make it a hell of a lot faster and easier to migrate your data from temporary storage medium A (what we're discussing now) to temporary storage medium B (what will be cheap/effective in the long run).

Think carefully about your choice of filesystem, though. It seems unlikely that any of the current filesystems will go away, at least in a readonly incarnation, in the next 5 or 6 years, but 10 years out you may very well find it difficult to connect the media let alone read the data therein (though I suspect the aging media file formats, especially RAW images if not DNG, will expire first -- you may very well want to automate the generation of very high quality JPG, TIFF or PNG images from the RAW files if they are in a proprietary format). Most of the time people revisit _bulk_ photo collections for browsing and browsing a folder full of jpgs is much, much faster than browsing a folder full of raws.

It is worth writing a script to generate hashes for each file so that you can differentiate corruption issues from "the importer for the old out of date format I am loading is crashing when I load file X" issues that you will encounter.
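A minimal sketch of such a script follows. The function names are hypothetical, and SHA-256 is just one reasonable choice of hash. It records a digest per file, then later reports which files have gone missing or no longer match.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def verify(root, manifest):
    """Compare files under root against a saved manifest; return (missing, changed)."""
    missing, changed = [], []
    for rel_path, expected in sorted(manifest.items()):
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path):
            missing.append(rel_path)
        elif sha256_of(path) != expected:
            changed.append(rel_path)
    return missing, changed
```

The manifest is a plain dictionary, so it can be saved with json.dump and kept alongside each copy of the data. If a file fails to open years later but its hash still matches, the problem is the importer rather than the bits on disk.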

One thing I would suggest is being extremely realistic about whether the material is really valuable to you. All of the suggestions so far, including mine, involve spending a non-trivial amount of money to retain it.
posted by rr at 5:28 PM on July 8, 2008

Lossless compression will help, but not solve much. I use FLAC for audio, as well as using mono formats as much as possible- using mono for things that aren't stereo cuts storage in half. And lots of things don't need to be stereo.

I'm facing a similar problem, and the simplest thing for me is to have (nearly) everything online. In my case, however, 1.5tb is more than enough for now. That is easily accomplished on one machine (4*500 gb raid 5). In your case, you'll probably have to use more than one machine, separated by the role of the data on it.

For backing up, I am considering using hard drives in a tape-like role. (And/or another machine offsite doing rsyncs). Hard drives are the cheapest, densest storage medium right now. (Unless you relish the idea of backing up to DVDs, which might be cheaper, but by my count such a backup scenario would require ~ 1800 DVDs, which are not reusable.) I'd do a main backup onto a few drives, store them offsite. Then have two additional drives for incremental/differential backups. When they are both full, do another full backup.
posted by gjc at 6:52 PM on July 8, 2008

Everything rr is saying is right on the money. Especially this:
There are a lot of benefits to offline storage. When spun down and disconnected, hard disks are very resistant to damage.
Well, except that I really, really doubt that any filesystem you will use could possibly become inaccessible within the next 15 years. That doesn't make it an unimportant issue, but..

I'd also suggest great care with DVDs. They just don't last that long, and they aren't really much cheaper anymore (if at all?!?!).

A single off line copy of a file, along with an online version (redundant or not), still isn't that secure. If it is really important, you want 5+ copies in 2-3 physical locations.. And more.. And, remember that 2 copies on the same physical media isn't really 2 copies. It seems that many people foolishly use a single external hard drive as their entire backup solution.. Perhaps I'm too pessimistic, but to me that is a disaster waiting to happen.

All that said, I've never seen much written about the life of data on a HD sitting in a drawer compared to data on optical media, and that is really important information for this conversation. Now excuse me while I go back up my important personal data again :)
posted by Chuckles at 8:34 PM on July 8, 2008

Chuckles- it's enough to drive one nuts, isn't it? Someone said it about RAID, but I think it's true for anything- as far as I know, there's no filesystem that will do error correction with an eye toward data integrity. The drive hardware will try to get its best guess as to what's on the media, but who knows if that is what was written, or junk. I'd like to see a FS that has parity/hashing at the FS level. Like a whole drive of par2 files or something. A real solution would cost in terms of speed and space, but for data integrity, I'd think it would be useful.
posted by gjc at 8:45 PM on July 8, 2008

Response by poster: Just for the record, I'm not worried about version-ing my files, nor recovery from accidental deletions and the like. Perhaps once I get my data organized I will be, but right now since I have this gigantic "data heap" that I have to sort out my primary concern is mirroring the heap in some way so that if a hard drive platter spins out of control before I've sorted the heap I have a backup somewhere.

I am an IT professional and actually had a Windows 2003 server running in my house (I wasn't using RAID, but just using it as a file storage unit) until recently it became buggy and I started to think external hard drives were preferable. Also my house is switching from being all Windows to being all Mac.

So I'd thought about getting the server back in order for incremental backups (or even just file synchs) but was hoping there may be another alternative; some hardware device I was unaware of (and I didn't know about the Drobo, which I may include in my network next time I have the urge to buy an external hard drive, buy a Drobo instead)

Tonight I think I'm going to look at doing a combination of all of the above. I have an external hard drive enclosure (USB 2.0) and some spare drives that will go in it, and I'll use it for the most sensitive data. Then if Mozy is truly unlimited storage for $5/mo I'll use it for my TV shows that I've yet to burn on DVD, MP3s, etc. just to avoid having to rip them all again.

Thank you all

(and as a sidenote: Yes, my pictures are JPEG because I just haven't set my camera to take pictures in RAW yet...I need to do that.)
posted by arniec at 6:20 AM on July 9, 2008

gjc writes "Someone said it about RAID, but I think it's true for anything- as far as I know, there's no filesystem that will do error correction with an eye toward data integrity."

This is why I like DVDs for my generated data backups (stuff like images and files). Besides my ongoing incrementals on HD and online, I make two full backups every quarter to DVD. One goes in the drawer here and the other goes to my father. Even if my house burns down, if any one disk gets damaged at my father's place I can hopefully recover the same file from an earlier version. The box is getting pretty full now that I'm up to over 100GB, but it is very reassuring. I wish dual layer media would come down in price.
posted by Mitheral at 10:16 AM on July 9, 2008

I'm going to hazard a guess that while Mozy says they offer unlimited storage for $5/month, they will either go out of business shortly or find some weasel clause to dump you as a customer if you try and store 6TB of data. My guess is that they allow unlimited storage, but not unlimited bandwidth to get it there.

There is no such thing as a free lunch.

If I were facing the same problem as you, I'd buy another 6TB of external HDs, make a duplicate copy of everything exactly as it is now. Then store those external HDs in your Aunt's closet. Now you have bought yourself time to sort through all your files, compress, weed out duplicates, change file formats, etc. Once all that is done you are ready to set up your RAID file server, which will use those extra external HDs as an off-site backup.
posted by ChrisHartley at 11:00 AM on July 9, 2008 [1 favorite]

gjc: as far as I know, there's no filesystem that will do error correction with an eye toward data integrity.

Look into Sun's zfs. It uses checksums by default, & multiple copies of data.

Elsewhere: Make a computer into an NAS, & run something like hardlink on it. Backing stuff up with rsync works, though the first time will be slow. Using, say, rsnapshot, to note the differences may be helpful when you note that the old system's breaking down & you now have very carefully synchronized gibberish.
posted by Pronoiac at 11:49 PM on July 11, 2008

This thread is closed to new comments.