Lost RAID controller -- EOTW?
April 27, 2006 7:26 AM   Subscribe

Some RAID questions about what happens if you lose not a disk but the controller.

I need to build a largish (~1 TB) file server with the drives configured as raid 5. I see lots of discussion out there about how raid 5 helps you recover from a lost HD. But that's only one end of things. If your raid is hardware-based, what happens if you lose your controller card? If you have a spare controller of the same model, will it recognize the drives as an already-existing RAID array? What if you can't obtain the exact same model of controller? Alternatively, what happens to your software-based (Linux or XP) raid 5 array if your boot drive crashes and you have to reinstall the OS? Can you recover your array?
posted by jfuller to Computers & Internet (11 answers total)
 
If you have a spare controller of the same model, will it recognize the drives as an already-existing RAID array?

That's the idea, but it's up to the individual card manufacturer to implement it. And cards don't typically fail by shutting off; instead, they fail by quietly screwing up your data for weeks before you finally notice something's a bit odd.

In other words, you still need backups. (RAID cards have a hard time surviving fires and floods, too.)

Alternatively, what happens to your software-based (Linux or XP) raid 5 array if your boot drive crashes and you have to reinstall the OS? Can you recover your array?

If by "boot drive" you mean "root filesystem" -- those are as important as your data, put them on RAID as well. The only thing in Linux that won't work on software RAID is /boot, since the bootloader needs to read it; I get around that by just having multiple /boot partitions with identical content.
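The moving part in that trick is just keeping the /boot copies in sync after every kernel update. A toy sketch of the sync step, using throwaway directories in /tmp to stand in for the real /boot partitions (paths are made up for illustration):

```shell
# Stand-ins for two small non-RAID /boot partitions, one per disk.
mkdir -p /tmp/bootdemo/boot /tmp/bootdemo/boot2
echo "vmlinuz-2.6.16" > /tmp/bootdemo/boot/vmlinuz

# After installing a new kernel into the first /boot,
# mirror its contents to the second copy.
cp -a /tmp/bootdemo/boot/. /tmp/bootdemo/boot2/

# Both copies are now identical, so either disk can boot the box.
diff -r /tmp/bootdemo/boot /tmp/bootdemo/boot2 && echo "in sync"
```

In real use you'd also install the bootloader on each disk so the BIOS can boot from whichever one survives.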

But the biggest advantage of software RAID is that the disks will be recognized anywhere. I could take a disk out of my raid-1 mirror at home, put it (alone) in a box at work, tell Linux that it's half a raid-1 mirror, and get at the data fine (and build another half of a mirror from it, if I wanted).
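A hedged sketch of that move with mdadm (device names are examples; this needs root and the actual disk attached, so it's an illustration, not something to paste blindly):

```shell
# The pulled disk shows up as /dev/hdb on the work box; its partition
# still carries the RAID-1 superblock written at home.
mdadm --assemble /dev/md0 /dev/hdb1 --run   # --run starts the array degraded
mount /dev/md0 /mnt                          # the data is readable as-is

# To build the other half of the mirror, add a blank partition
# of the same size and let it resync:
mdadm --add /dev/md0 /dev/hdc1
```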

I haven't used Windows software RAID.

Incidentally, be sure to put swap on RAID too. Some people advise not doing this, but I don't understand why -- if a disk fails, you want to keep running, and if that disk is your swap disk and it's not redundant, you won't keep running.
posted by mendel at 7:31 AM on April 27, 2006


I suffered from a RAID card failure once... I thought, "okay, I'll just order the same card and will still be able to access my data"; but that didn't work out. While the cards had the same manufacturer and product name, used the same controller chip and appeared to be identical in design, there must have been some difference (firmware? secondary chips used?) that made the new card unable or unwilling to recognize my existing RAID.

Fortunately, I had full backups of all my critical data; so the only thing I lost was some time.
posted by ckemp at 8:23 AM on April 27, 2006


YMMV, but if you have a duplicate card/controller, you should be able to recover just fine. After the new card is installed, you need to set it up exactly the same way as you did the previous card. (You did document that configuration, right?)

Many RAID setup utilities will ask you to initialize the drives after the RAID config is complete. DO NOT DO THIS. Just save the config without erasing/initializing the drives and it should then recognize the logical drive(s) on the RAID.

This is possible because the RAID config is not stored on the drives, but in non-volatile memory (NVRAM) on the card itself.
posted by pmbuko at 8:43 AM on April 27, 2006


NT tells you to write a drive-configuration floppy that can tell a fresh OS install how to mount the software RAID array. I believe some documentation somewhere says it can still reconstruct the array without the floppy, but that it takes a very long time.

I don't know how more recent versions of windows handle it.
posted by Chuckles at 10:31 AM on April 27, 2006


FYI, Windows Server 2k3 disk manager is used to manage Windows software RAID. It's as simple as right-clicking the disk, and doing a short "import" operation. It's been a while since my MCSE training, but I think it stores a unique sid on each member of the set, and also a sequence number, so any 2k3 installation can recognize that they "go together."

I regularly upgrade my (RAID-0) home server, take out the entire (IDE) chain of drives, transfer to a new computer running 2k3, and I have my logical drive back in under 5 minutes.
posted by chota at 1:04 PM on April 27, 2006


Alternatively, what happens to your software-based (Linux or XP) raid 5 array if your boot drive crashes and you have to reinstall the OS? Can you recover your array?

Linux? Hell yeah. Main reason to use soft RAID: you're not stuck scrounging for antique hardware. I believe the more recent RAID toolsets (mdadm, for example) set things up so the RAID kernel module will assemble, check and mount the array for you based on the on-disk superblocks and tags, instead of a config file.
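For illustration, the recovery after a reinstall might look like this (assuming mdadm; device names are examples, and these commands need root on a box with the actual array members attached):

```shell
# Scan all disks for RAID superblocks and assemble any arrays found;
# no config file from the old install is needed.
mdadm --assemble --scan

# Or name the members explicitly:
mdadm --assemble /dev/md0 /dev/hda1 /dev/hdb1 /dev/hdc1

# Optionally record the discovered arrays for the new install:
mdadm --detail --scan >> /etc/mdadm.conf
```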

(YMMV - It's been a while since I did this.)
posted by Orb2069 at 1:57 PM on April 27, 2006


Had this happen with an out-of-warranty Compaq/HP DL380 a couple of years ago. The RAID controller just died. Fortunately we were able to find some identical RAID controllers in town (at a reasonable price, too!) and the system came back up flawlessly. There was no previous data corruption.

Perhaps buying two of the same controller at the same time is a wise thing to do.
posted by lhauser at 1:57 PM on April 27, 2006


RAID controllers vary tremendously in their firmware sophistication as well as hardware capabilities, and the RAID level and type of volume have something to do with how easily you'll get running again after a RAID card failure. I use various versions of Adaptec and LSI MegaRAID controllers, which have onboard cache and battery backup, and I highly recommend an onboard battery if you want card-level swap-and-recover capability.

To ensure the integrity of RAID volumes under all conditions, you need to be sure data from any onboard cache is actually flushed completely to disk, under all operating and shutdown conditions, and that is what an onboard battery attempts to guarantee. In theory, if your data hasn't made it to disk in a shutdown, it should still be in cache memory for up to 72 hours, so you have time to fix your machine, or to move your controller and RAID array to a working machine and complete a normal shutdown safely. So that is issue #1 in the reliable RAID world.

Next issue is to be sure that your volume configuration is stored both on the disk array and in NVRAM on the controller card, and that your system can read and run from either configuration location. Typically, this will be stored as something like an MBR record on each disk of the array, which a new replacement card can then read back if you have to change out a bad controller, as long as you have the requisite number of surviving disk members of a redundant array.

Worst that ever happened to me was losing 2 disks, a controller card, the main logic board and the UPS of a RAID 50 server array simultaneously in a lightning storm. We replaced the logic board the following day, along with disk spares and the controller card from shelf spares, and were up and running, rebuilding the arrays in the background, about 6 minutes after laying down the screwdriver.
posted by paulsc at 4:59 PM on April 27, 2006


Incidentally, be sure to put swap on RAID too. Some people advise not doing this, but I don't understand why

Going to swap's a speed killer/disk hog as it is. Swapping to /dev/md? makes it orders of magnitude worse. It used to also mung things pretty bad in the 2.6 kernels IIRC. Dunno if that's been fixed.

It's also unnecessary if you want a RAID0 swap - All you have to do is make a swap partition on each drive, and set them up with the same priority in fstab - The kernel will alternate between the drives, giving you almost all the benefit and still running at a reasonable speed.
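For illustration, an /etc/fstab along those lines might look like this (device names are made up; pri= sets the swap priority, and equal priorities make the kernel interleave between the partitions):

```
# Two swap partitions at equal priority: the kernel stripes
# swap pages across both disks, RAID-0 style.
/dev/hda2  none  swap  sw,pri=1  0  0
/dev/hdc2  none  swap  sw,pri=1  0  0
```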

-- if a disk fails, you want to keep running, and if that disk is your swap disk and it's not redundant, you won't keep running.

The last I heard, most HDC drivers wouldn't stay up after a bunch of drive read errors mung things, anyway. If you need 99.999999% availability, you really ought to bite the bullet and get a hardware RAID system with hotplug and all the goodies. SoftRAID is about preserving the data, not uptime.
posted by Orb2069 at 6:34 PM on April 27, 2006


We had an Areca ARC-1260 controller running a 16-disk RAID 6 Open-E iSCSI array fail on us. We breathed a huge sigh of relief when we installed a new ARC-1260 and everything ran exactly as before.
posted by I EAT TAPAS at 8:38 PM on April 27, 2006


If you have a spare controller of the same model, will it recognize the drives as an already-existing RAID array?

Good controllers will do this.

Compaq controllers are very good at this: They immediately recognize a RAID array, even if it was created by a different Compaq RAID controller.

I've pulled 5 disks from one Compaq server, put them into another Compaq server (different model, different controller) and could immediately use it as if nothing happened.

They also notice when you swap the drives around and go "oh, you reordered the drives, okay, I've updated my configuration".
posted by bloo at 4:16 AM on April 28, 2006

