Growable Linux hardware RAID?
March 1, 2008 2:36 PM

How can I make a growable hardware RAID-5 Linux system? Is LVM the right choice, or am I setting myself up for trouble? Lots of geeky details follow.

I'm setting up a home file server / project box. Consistent with my budget, I'm using only the finest, most sophisticated hardware … from 1998.

What I have is a Dell PowerEdge 2300. It has a PCI hardware SCSI RAID card, called the "PERC2/SC" (known to people outside Dell as the "AMI MegaRAID 466"), attached to a six-slot SCA-II drive bay. Right now I have four 74GB SCSI disks in the bay set up as a RAID-5. I want to have the option of adding more drives later on, bringing it up to 6x74GB in a RAID-5.
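
For a sense of scale, here is a rough sketch of the capacity arithmetic, assuming the usual RAID-5 rule that one disk's worth of space goes to parity. The function name is made up for illustration, and the figures are the nominal 74GB sizes, so what the controller actually reports will be a little lower.

```shell
# Usable RAID-5 capacity: (number of disks - 1) * size of one disk.
# raid5_usable_gb is a hypothetical helper; sizes are nominal GB.
raid5_usable_gb() {
    disks=$1
    size_gb=$2
    echo $(( (disks - 1) * size_gb ))
}

echo "4 x 74GB: $(raid5_usable_gb 4 74) GB usable"   # current array
echo "6 x 74GB: $(raid5_usable_gb 6 74) GB usable"   # fully populated bay
```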

The card's BIOS configuration utility supports expanding the array volume by adding disks later on -- that's not a problem. But what I'm concerned about is the filesystem that I might build on top of that volume. Obviously I don't want to hose all my data when I install an additional drive and expand the array.

I've been doing some reading and it seems like LVM might be part of the answer, so to that end I installed Ubuntu Server, creating a small ext3 /boot partition and giving the rest of the free space to LVM as a physical volume. In fdisk, this looks basically like this (block numbers omitted):

Disk /dev/sda: 219.8 GB, 26724 cylinders

Device      Boot  Start    End  System
/dev/sda1             1  26274  Extended
/dev/sda5   *         1     31  Linux
/dev/sda6            32  26724  Linux LVM

I have this gut feeling that this won't work when I expand the array, and the "disk" that the LVM PV is sitting on (/dev/sda) suddenly increases in size. The fact that I can't find much information on growing an LVM physical volume isn't reassuring, either.

So, is this really the way to go? And if not, what's the best option? Should I forget LVM and just put the filesystem right down on the RAID volume directly? (I'm planning on using JFS for low CPU usage and easy online resizing, but I'm open to other suggestions.) Should I eliminate the /boot partition and turn the whole RAID volume into an LVM PV somehow?
posted by Kadin2048 to Technology (10 answers total)
Best answer: Interesting. I'm building a very similar system right now. No, literally, right now. My constraints were fault-tolerance and the ability to grow the array with zero downtime. Yeah, it's serving my XBMC. I'm a nerd.

Anyway, here's my solution:

I've got all the drives in a RAID 5EE array, to which I can add drives hot and grow the device live (I do have to force Linux to rescan the SCSI devices when I do that, however).

Using pvcreate, I've written the PV signature directly to disk - no partition table. I'm booting off of another device, so it's not a problem. Basically, I can't figure out a way to resize a partition without bringing the disk down. Using pvresize, I can grow the PVs at will. On that, I've created my VG and LV. Pretty straightforward.

I went with XFS since it was designed to allow filesystem expansion and is well-supported in Linux. xfs_growfs is the command you'll need.
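
A minimal sketch of that grow sequence, assuming a whole-disk PV. The device, volume group, logical volume, and mount point names (sdb, vg0, data, /srv/data) are hypothetical, and the `run` wrapper is only there so the script prints the commands by default instead of touching real disks.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, with a backup in hand) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names: disk sdb, VG vg0, LV data, XFS mounted at /srv/data.
grow_whole_disk_pv() {
    disk=$1 vg=$2 lv=$3 mnt=$4
    # Ask the kernel to re-read the (now larger) size of the SCSI device.
    run sh -c "echo 1 > /sys/block/$disk/device/rescan"
    # Grow the PV to fill the device; no partition table to edit.
    run pvresize "/dev/$disk"
    # Hand all newly free extents in the VG to the LV.
    run lvextend -l +100%FREE "/dev/$vg/$lv"
    # Grow the mounted XFS filesystem to fill the LV.
    run xfs_growfs "$mnt"
}

grow_whole_disk_pv sdb vg0 data /srv/data
```

With the PV written straight to the disk, there is no partition to resize, which is the step that would otherwise force downtime.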

I've tested all of this live, and it works a treat. The only part I haven't tested is dropping disk-level encryption on top of it using LUKS, but it's still a work in progress. I also haven't found a decent NAS appliance setup (à la FreeNAS, OpenFiler, etc.) that allows me to resize disks hot, so I'm probably going to end up rolling a DIY solution.

I wouldn't try to boot from LVM - if I were you, I'd mirror the boot volume, install Linux on that with ext3, and do file storage on the RAID 5 array.
posted by TheNewWazoo at 3:19 PM on March 1, 2008

Just in case you haven't already seen it: Drobo.
posted by adamrice at 3:21 PM on March 1, 2008

Correction: I can grow my LVM setup with a combination of pvresize, vgextend and lvresize, but it's very easy to do (relatively speaking, of course).
posted by TheNewWazoo at 3:21 PM on March 1, 2008

Best answer: I'm a little old school when it comes to things like this, and my relatively limited experience in the area (I am in the business) tells me that these sorts of things are nothing but trouble. I've seen RAID arrays go belly up in two ways. One is never rebooting, and then when you eventually do reboot, more than one drive can't spin up. The other is growing arrays and filesystems, even with enterprise hardware like you have.

It'll cost you $100 to buy an IDE drive large enough to back everything up to, recreate the new array, and copy the data back over. It might not be geek-approved, but it'll do the job.

I don't trust LVM, but I don't have the knowledge to justify that opinion. I've seen wonky data corruption issues with machines that have used it.
posted by gjc at 5:19 PM on March 1, 2008

Response by poster: Great suggestions so far.

After I submitted the question, I was chatting about LVM with one of the infrastructure people I work with, and he suggested that when I wanted to grow the RAID array, I could just grow it, and then create an additional, new LVM PV in the additional cylinders that would be added to /dev/sda. (I'd then be able to add that PV to the same volume group and extend the LV that I'm currently using, or make it into a separate one.) If anyone wants to comment on whether they think that's a good or bad idea, I'd be interested in opinions.
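
A rough sketch of that suggestion, under some assumptions: the new space has already been turned into a partition of type 8e (Linux LVM) with fdisk and the kernel has re-read the partition table, the partition shows up as /dev/sda2, and the volume group and logical volume are called vg0 and data. All of those names are hypothetical. In LVM terms the new PV joins the volume group, and the LV is then extended into the new extents. The `run` wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names: new partition /dev/sda2, VG vg0, LV data.
add_new_partition_pv() {
    part=$1 vg=$2 lv=$3
    run pvcreate "$part"                      # label the new partition as a PV
    run vgextend "$vg" "$part"                # add the PV to the existing VG
    run lvextend -l +100%FREE "/dev/$vg/$lv"  # grow the LV into the new space
    # The filesystem on the LV still has to be grown with its own tool.
}

add_new_partition_pv /dev/sda2 vg0 data
```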

gjc: I'll definitely be making a backup onto some sort of large/cheap drive of the whole array before I go to do any resizing. I'm not necessarily opposed to doing things the fast-and-ugly way if the clean-and-elegant way is much harder. :)
posted by Kadin2048 at 6:02 PM on March 1, 2008

Careful with XFS, it's got something of a history when it comes to nasty bugs. No experience with JFS, but... well, ext3's still the most popular fs for a reason.

And yes, keep backups. Not just before resizing, always; daily incrementals to online mass storage and an offline one you swap with weekly, or whatever. You probably have a greater risk of data loss from spack-deletes than disk failure or a resize going horribly wrong, and RAID absolutely will not help with that.
posted by Freaky at 7:13 PM on March 1, 2008

We use EVMS with OCFS2 at work and do multi-read-write to a fibre channel array. It works pretty darned well.
posted by SpecialK at 7:40 PM on March 1, 2008

Best answer: I use reiserfs on my LVM2 logical volumes because resize_reiserfs can grow a filesystem without unmounting it. I've tried this, and it works.
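
Each filesystem mentioned in this thread has its own grow command, run after the logical volume itself has been extended. The sketch below just prints the command for a given filesystem type. The function name and the example device and mount point are hypothetical, and whether a grow is safe while mounted depends on the kernel and tool versions in use.

```shell
# Print the grow command for a filesystem type (hypothetical helper).
# $1 = filesystem type, $2 = block device, $3 = mount point.
grow_fs_command() {
    fstype=$1 dev=$2 mnt=$3
    case "$fstype" in
        reiserfs) echo "resize_reiserfs $dev" ;;           # no -s: fill the device
        xfs)      echo "xfs_growfs $mnt" ;;                # works on the mount point
        jfs)      echo "mount -o remount,resize $mnt" ;;   # JFS grows via remount
        ext3)     echo "resize2fs $dev" ;;                 # no size: fill the device
        *)        echo "unknown filesystem: $fstype" >&2; return 1 ;;
    esac
}

grow_fs_command reiserfs /dev/vg0/home /home
```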

I have no experience with RAID 5, but if the effect of adding a disk and telling the controller to grow the array is that you just see the same /dev/sda only bigger, it seems to me that simply creating another partition in the new space, running pvcreate on it, and adding it to your existing volume group should work just fine.

I haven't played with pvresize, but a cursory scan of the man page suggests to me that it's more an offline thing.
posted by flabdablet at 5:10 AM on March 2, 2008

Best answer: I have assembled such a system for my use, and it's been working perfectly for the last year and a half or so. I used four 250GB drives. I had to design it to grow because I did not have all the drives available at the start. I started with 2 drives, configured it to assume 1 missing drive, and it gave me 500GB (minus /boot and overhead) of space with no redundancy (1 missing drive). I loaded it up to free the other drives, added a 3rd drive to the RAID, and it rebuilt itself, now with redundancy. Added the 4th drive and it rebuilt itself with 250GB more storage space. Added that space to the LVM, resized my big partition, and it just worked. So from my experience this is quite solid.
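
The answer doesn't name the tool, so this is only a sketch, assuming Linux software RAID (md) managed with mdadm and reshape support in the running kernel. The device names are hypothetical (the second partition on each of four drives), and the `run` wrapper prints the commands unless DRY_RUN=0 is set.

```shell
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical devices: /dev/sd[abcd]2, array /dev/md0.
grow_md_raid5() {
    # Create a 3-device RAID-5 with one slot deliberately missing:
    # two drives' worth of space, no redundancy yet.
    run mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 missing
    # Third drive fills the missing slot; the array rebuilds to full redundancy.
    run mdadm /dev/md0 --add /dev/sdc2
    # Fourth drive joins as a spare, then the array is reshaped to use it.
    run mdadm /dev/md0 --add /dev/sdd2
    run mdadm --grow /dev/md0 --raid-devices=4
}

grow_md_raid5
```

Once the reshape finishes, /dev/md0 is larger, and the extra space still has to be handed to LVM and the filesystem as separate steps.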

Snapped out 3 drives of the RAID, assembled them in another computer, booted it from a Linux live CD, and I could mount my RAID. Sweet. After those initial tests I never actually had to touch it, as it gave no trouble.

This large blob of growing storage space is managed with LVM, so it can easily be chopped up into partitions. I really only needed one partition. I use reiserfs for performance and because live resize operations are considered stable on it. I've had it resize my live root partition, and it's a funny tingling feeling.

So on your drives I suggest 2 partitions:
1) /boot: 200MB should fit a few kernels if you want/need to play
2) a partition with the rest of the space, to be added to the RAID

Then make a RAID device using the 2nd partition of each drive. Then on the RAID device, make a big partition that uses all the space. Then use LVM on the created RAID partition to split it up as you see fit in the future. When your RAID device grows, you can make a new partition in its new free space and add it to LVM to use the space.

I have this setup running an old version of Ubuntu called Hoary.
posted by CautionToTheWind at 8:46 AM on March 2, 2008

I like my three partitions per drive setup with separate boot, swap and LVM partitions (in that order) because:
  • I don't know of a boot loader that understands LVM, so /boot has to be a standard partition
  • I like giving each distro its own /boot partition, so its auto-update stuff Just Works
  • I can easily pick which distro autostarts by leaving an appropriately configured Smart Boot Manager floppy disk loaded
  • I like having my swap space at the fast end of the disk drives (usually the transfer rate on outer tracks is significantly higher than that on inner tracks)
  • My swap partitions are all labelled, and I just add fstab entries with those labels as I add more drives - the kernel can figure out which drive it wants to use instant-by-instant if they're all in the swap pool
But it sounds like the RAID controllers are just mapping stuff all over the place anyway, so trying to optimize swap by using low LBAs probably wouldn't achieve much with RAID.
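
A small sketch of the labelled-swap idea, assuming labels set with `mkswap -L` and equal swap priorities, which is what lets the kernel spread pages across all the swap areas. The helper name and the labels swap0 and swap1 are hypothetical. The script only prints fstab lines and changes nothing on disk.

```shell
# Print an /etc/fstab line for a labelled swap partition (hypothetical helper).
# $1 = swap label, $2 = swap priority.
swap_fstab_line() {
    label=$1 prio=$2
    printf 'LABEL=%s\tnone\tswap\tsw,pri=%s\t0\t0\n' "$label" "$prio"
}

# The labels would be set beforehand, as root, with something like:
#   mkswap -L swap0 /dev/sda2
#   mkswap -L swap1 /dev/sdb2
swap_fstab_line swap0 1
swap_fstab_line swap1 1
```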
posted by flabdablet at 9:05 PM on March 3, 2008
