Need to buy a 30TB hard drive, what are my options?
October 9, 2014 1:32 AM

My question is should I go with an enormous Linux raid or a dedicated NAS, and which hardware?

Linux RAID, and if so, what hardware is ideal for connecting so many disks, and which disks can you recommend? I am very familiar with Linux RAID and have used it successfully in the past.

Or should I go with a dedicated NAS? And then which one, as I have zero experience there. Should I look at a USB 3 or a Gigabit Ethernet link?

Thank you for your insights.
posted by CautionToTheWind to Computers & Internet (15 answers total) 2 users marked this as a favorite
USB 3 would make it a JBOD, not a NAS.

Who is going to access the data, and from how many hosts? Just yours? What kind of redundancy do you need, and what kind of backup?
posted by devnull at 1:47 AM on October 9, 2014

Whether a server or NAS appliance, it scales in price and feature set depending upon need so you may want to specify your usage requirements (e.g. number of concurrent users, types of data being stored/accessed, etc) and how those may change or grow.
posted by palionex at 1:51 AM on October 9, 2014

Response by poster: Maybe half a dozen servers will be reading parts of very large files (5TB) from the large storage and writing back more large files. The goal is to create geographical maps from the source data.

The backup is naturally a separate issue, but redundancy is crucial as with that many hard drives, the chance of failure goes up. My idea is to have it survive one drive failure and to have spares standing by for replacement.
posted by CautionToTheWind at 2:22 AM on October 9, 2014

We do exactly this type of thing at my center, but we have a data center with a SAN(storage area network) with a few hundred TB of capacity, and sysadmins to keep things running. Building your own raid array for this kind of thing is really quite risky. If you possibly can, consider doing this in the cloud so that you can focus on the data instead of on the hardware.
posted by rockindata at 3:39 AM on October 9, 2014 [2 favorites]

USB 3 is impractical for this. If you wanted a JBOD (just a bunch of disks) for external storage at this capacity, a SAS external array would be more appropriate.

With multiple servers reading it, you'll want a NAS or SAN. I'd suggest multiple GigE links or a 10GigE connection.
And you'll want some kind of RAID, perhaps RAID 6 and a hot spare or two. You're probably looking at 10x 4TB drives at a minimum, preferably 12x.
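As a quick sanity check of that sizing (a sketch; the drive counts and 4TB capacity are from the suggestion above, and this ignores TB-to-TiB and filesystem overhead):

```python
# Usable capacity of a RAID 6 array: total drives, minus the two
# drives' worth of parity, minus any hot spares, times drive size.
def raid6_usable_tb(total_drives, drive_tb, hot_spares=0):
    data_drives = total_drives - 2 - hot_spares  # RAID 6 = 2 parity drives
    return data_drives * drive_tb

# 10x 4TB with no spare vs. 12x 4TB with two hot spares:
print(raid6_usable_tb(10, 4))                # 32 TB usable
print(raid6_usable_tb(12, 4, hot_spares=2))  # 32 TB usable
```

Either layout clears the 30TB target on paper; the 12-drive version just keeps spares in the chassis ready to rebuild onto.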

RAID 5 (one parity drive) isn't acceptable anymore. With the huge capacity of current drives and the long rebuild times, it's common to lose another drive during the extended stress of a rebuild; RAID 6 uses two parity drives, so it has a better chance of succeeding.

Do you have a tape backup system to back up the data? RAID isn't a backup.

P.S. You'll have to look at SAS vs. SATA and 2.5" vs. 3.5". These things get expensive quickly, but doing it on the cheap is a recipe for losing all your data.
posted by TheAdamist at 4:27 AM on October 9, 2014 [1 favorite]

I'd go for an enclosure or server with lots of drive bays (Supermicro has some) and stack it full of 6TB disks. Then use unRAID or software RAID (or Btrfs/ZFS if you prefer). Serve it over NFS/CIFS.
posted by devnull at 4:32 AM on October 9, 2014

It depends on your budget. If your budget is in the DIY area, I'd check out FreeNAS. I've used it at home and at work with much success. If you want specifics of my builds, MeMail me. As TheAdamist says, RAID 5 is really not OK anymore.
posted by duckstab at 4:40 AM on October 9, 2014

I have been happy with my Synology NAS device.

Western Digital and Seagate have both recently introduced 6TB disks. They are on the expensive side right now (approx 2x the cost of a 4TB disk). You'll want to do some math to decide the optimal number of drive bays / drive size / cost.
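That "do some math" step might look like this (a sketch; the prices are placeholder assumptions, not quotes, so plug in current street prices; parity count and bay count are whatever your layout uses):

```python
# Compare cost per usable terabyte for different drive sizes in a
# fixed-bay NAS, given how many drives' worth of space go to parity.
def cost_per_usable_tb(bays, drive_tb, drive_price, parity_drives=2):
    usable_tb = (bays - parity_drives) * drive_tb
    return (bays * drive_price) / usable_tb

# Hypothetical prices: 4TB at $180, 6TB at $360 (~2x, as noted above),
# in a 4-bay box with two drives' worth of parity:
four_tb = cost_per_usable_tb(4, 4, 180)  # $90 per usable TB
six_tb = cost_per_usable_tb(4, 6, 360)   # $120 per usable TB
print(f"4TB drives: ${four_tb:.2f}/TB, 6TB drives: ${six_tb:.2f}/TB")
```

At a 2x price premium the 6TB drives lose on cost per usable TB; they only win when you're out of bays or the premium shrinks.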

Synology provides a handy product comparison table:

Synology Product Comparison

Some Synology devices have dual Gigabit Ethernet ports; these are set up by default to double the input/output rate of the device. You'll need to do some tweaking if you want to use the ports as a bridge. The Synology DSM software is at version 5.0, i.e., it's rather mature and very easy to use.

Note that the processors on most Synology devices are not beefy enough to do full-on 1080p video transcoding. Maybe you don't care about that, but some people might be happier looking at QNAP or Drobo.

I've been running a Synology DS412+ with 4 x WD Red 4TB in RAID 6 for a year now and I've had no problems; am currently looking at expanding it. If they have a downside, I'd say it's that their support isn't all that great.

[the Usual Disclaimers: I'm not shilling for Synology, I've simply been reasonably happy with their products]
posted by doctor tough love at 5:24 AM on October 9, 2014 [1 favorite]

I've managed storage networks in excess of 200TB and currently manage one that is 120TB. I doubt you will be happy with a consumer-level device, though one could probably do it. I have a Synology DS414 at home, and it's excellent, but I think your performance needs will exceed what the Synology/Drobo/QNAP NAS devices can deliver.

Do you have a Dell or HP rep? I prefer HP, and have been exceedingly happy with HP's service and offerings for my SANs. Good support, especially pre-sale, where good planning can save you many headaches and much money down the line.

Dell is more meh, but several of my colleagues have had decent service from them. There are other SAN vendors - EMC, etc. It would pay to talk to them, but in my experience, Dell and HP tend to be a better value.
posted by Pogo_Fuzzybutt at 6:33 AM on October 9, 2014

Another vote for Synology.
Please note that only a limited number of switches support their dual Ethernet connection.
posted by Mac-Expert at 6:52 AM on October 9, 2014

My experience is that if you roll your own, you will end up with a hobby rather than a solution. Yeah, you will save some money vs. a packaged solution, but you will pay for it in time.

Unless you have actual requirements that push you towards a custom DIY NAS, I would just buy one, if you can afford it.

And yeah, what you want is a NAS, which implies Ethernet. With USB 3 you're buying just a fancy external drive enclosure, and you'd still need a server to expose the files over the network. If you're going to involve a server like that, then I'd revise my recommendation and say you should just get some sort of external SATA backplane enclosure and do the RAID management yourself, since you're already managing a server for this thing.

Buffalo Terastations have always been popular among friends, and have sort of a price/performance sweet spot (and if you really want to, they used to be quite hackable, not sure about recent iterations of the hardware though). They used to be sold almost always unpopulated, bring-your-own drives, which could be nice if you have preferences about what drives to use.

Really though, the performance and features of the SOHO NAS boxes have really converged over the last few years. They are all about the same: small servers with a bunch of disks running some variety of Linux RAID under the hood and a web-management interface. Sometimes they plug into a proprietary cloud backup system, sometimes they do various kinds of content-management-y search stuff. (Kinda 'meh' on all those features personally. Classic case of a device trying to do too many things mediocrely instead of one thing well.) But they all serve up files over SMB well. If that's all you care about you can get a relatively low-end one. If you want to do other stuff on it — basically use it as the core of your network — then you might want to look at Synology or one of the other upmarket brands.
posted by Kadin2048 at 8:37 AM on October 9, 2014

Echoing Kadin2048: do you want to focus on your main problem at hand, or do you have the time to build a custom solution? There's nothing wrong with building your own. But if you're just trying to do a specific job, you're better off buying a packaged NAS solution.
posted by doctor tough love at 10:01 AM on October 9, 2014

The biggest trade-off with home-grown vs COTS is support. Do you need to call someone if it breaks two years down the road and get them to fix it? If so, building your own isn't an option.

If rolling your own is an option, there are appliances (QNAP and Synology) or you can build a computer. I went the build-your-own route because I use the file server for a few other things (VM host, Plex server with media transcoding, etc.). If you build your own, I cannot overemphasize using ZFS over mdadm, with at least two parity drives.

ZFS offers a whole host of features that mdadm doesn't. Plus it's not nearly as fragile when something goes wrong (read: you lose a file, not the entire array). Plus there's that awesome protection against bitrot.
"My idea is to have it survive one drive failure and to have spares standing by for replacement."
Sorry, no good here. UREs (unrecoverable read errors) on large drives mean you're statistically likely to hit a read error as the array reads through the existing data to resilver, which will then hose the entire process. Google phrases like "RAID5 is dead". (The ZFS equivalent of RAID 5 is RAIDZ1 for single parity; it applies there too.) So go RAIDZ2 (or RAID 6), or start doing striping and mirroring.
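The "statistically likely" claim can be sketched numerically (assuming the commonly quoted consumer-drive URE spec of one error per 10^14 bits read and independent errors; real-world rates vary, so treat this as an order-of-magnitude argument):

```python
# Probability of hitting at least one unrecoverable read error (URE)
# while re-reading an entire single-parity array during a rebuild.
def p_ure_during_rebuild(data_tb, ure_rate_bits=1e14):
    bits_to_read = data_tb * 1e12 * 8     # decimal TB -> bits
    p_bit_ok = 1 - 1 / ure_rate_bits      # chance one bit reads fine
    return 1 - p_bit_ok ** bits_to_read   # chance of at least one URE

# Rebuilding a ~30TB single-parity array means re-reading ~30TB:
print(round(p_ure_during_rebuild(30), 2))  # roughly 0.9
```

Under those assumptions a single-parity rebuild of this much data is more likely to hit a URE than not, which is the arithmetic behind "RAID5 is dead."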

As for connectivity, if you have multiple clients, Ethernet is the way to go. 10GbE is expensive, especially the switches. You can set up bonding in 10 minutes and have a collective 4x GigE connection for 500 MB/s aggregate. Mac-Expert is right, your switch needs to support this. $200 gets you an HP 1810-24G v2, which supports LACP (802.3ad), and you're off. Keep in mind this is 4 gigabits per second aggregate, not 4 gigabits per second to one client.
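The aggregate-vs-per-client distinction in that bonding advice, as arithmetic (a sketch; uses decimal units, 1 Gbit/s = 125 MB/s, and ignores protocol overhead):

```python
# An LACP (802.3ad) bond of N gigabit links gives N x 1 Gbit/s in
# aggregate, but each client conversation is hashed onto ONE link,
# so a single client never sees more than one link's worth.
GIGE_MBYTES_PER_S = 1000 / 8  # 1 Gbit/s = 125 MB/s

def bond_throughput(links):
    aggregate = links * GIGE_MBYTES_PER_S  # total across all clients
    per_client = GIGE_MBYTES_PER_S         # one flow stays on one link
    return aggregate, per_client

agg, per = bond_throughput(4)
print(f"4x GigE bond: {agg:.0f} MB/s aggregate, {per:.0f} MB/s per client")
```

So a 4x bond helps when half a dozen servers hit the NAS at once, but any single server still tops out around 125 MB/s.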

I was trying to avoid mentioning hardware, but oops, too late. FWIW, I have a rock-solid Supermicro motherboard and chassis. I don't use SATA expanders (but then again I only have 12 drives), but I do use IBM M1015 cards reflashed to IT mode. I don't use any drives from any manufacturer's power-saving/green product lines. In the past, WD Reds; now HGST drives.

Further reading would include /r/datahoarders and /r/homelab.
posted by Brian Puccio at 10:31 AM on October 9, 2014

Feel free to stop by over in the FreeNAS Community Forums if you'd like to check out what other people have done and what would work well for a ZFS based solution. 30TB works out to 11 4TB drives in a RAIDZ3, making a 12-drive chassis an attractive option, and there's a lot of safety and resiliency with three parity drives.
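Checking that arithmetic (a sketch; it ignores ZFS metadata overhead and the TB/TiB gap, both of which shave off a bit more in practice):

```python
# A RAIDZ vdev dedicates `parity` drives' worth of space to parity,
# so an 11-drive RAIDZ3 vdev of 4TB disks leaves 8 data drives.
def raidz_usable_tb(drives, drive_tb, parity):
    return (drives - parity) * drive_tb

print(raidz_usable_tb(11, 4, parity=3))  # 32 TB raw data capacity
```

That 32TB of raw data space is what clears the 30TB target, and the 12th bay in the suggested chassis is free for a spare.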

obDisclosure: as a moderator and frequent poster, I may be biased.
posted by jgreco at 11:03 AM on October 9, 2014

Response by poster: Thank you, you have been very helpful and guided me away from trouble. I will be looking into what was posted in the next few days.
posted by CautionToTheWind at 5:51 AM on October 10, 2014

This thread is closed to new comments.