JBODs gone bad
May 21, 2006 8:25 PM

How do I replace a failing disk in a JBOD array set up using the Windows XP Pro dynamic disk system?

I need a large amount of unpartitioned space (about 1.5TB) to store my scanned photos (8x10 film, so each full scan at 32-bit color is well over a gig, meh). I turned to a six-disk JBOD array created using the Windows XP Pro "dynamic disk" system to fulfill my needs.

One of the reasons I chose this path is that in case of a disk failure you supposedly only lose the data on a single disk rather than the whole array. Well, one of my disks is starting to fail (but is still readable, so the data is fully recoverable at this point), so I'm looking for instructions on how to go about replacing said disk without losing the data on the rest of the array.

My Google skills seem to be failing me at the moment, as my searches have only turned up sites talking about the dynamic disk system in general, but no real clear instructions on how to replace a failed disk without losing the rest of the data, as promised.

Anyone happen to have a link handy that could help me out?

Thanks much,
Jeremy
posted by Jezztek to Computers & Internet (8 answers total)
 
Worst case, you download a Knoppix Live CD and dd one disk to the other. I'm not sure how the volume management would tolerate a disk change, but that's a pretty damn close replica that should preserve labels and other metadata.
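
For the curious, here's roughly what that copy amounts to at the block level -- a minimal Python sketch, not a replacement for dd itself. The device paths are placeholders, and on real hardware dd's conv=noerror,sync options do the same skip-and-pad dance:

```python
# Sketch of a dd-style raw copy that survives bad sectors.
# Device paths below are placeholders -- double-check which disk
# is which before pointing anything like this at real drives.
BLOCK = 1024 * 1024  # copy in 1 MiB chunks

def clone_disk(src_path, dst_path, block=BLOCK):
    """Copy src to dst block by block, zero-filling unreadable
    blocks so the destination keeps the same layout (this is what
    dd's conv=noerror,sync option does)."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        pos = 0
        while True:
            try:
                chunk = src.read(block)
            except OSError:
                # Bad sector: skip past the block, substitute zeros.
                chunk = b"\x00" * block
                src.seek(pos + block)
            if not chunk:
                break
            dst.write(chunk)
            pos += len(chunk)

# clone_disk("/dev/sda", "/dev/sdb")  # hypothetical device names
```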
posted by kcm at 8:37 PM on May 21, 2006


JBOD = terrible for attempting to recover data. If you're lucky, the files won't be fragmented across the drives...

Two things. First, go straight to the Knoppix disc and dd the failing disk to another disk.

Then, go find a BYOD (bring-your-own-drives) RAID 5 NAS... IBM just released a new one; check Tom's Networking for some reviews. Buy some 750s, and you'll have 2.25TB with fault-redundant disks. Then, when a disk fails, you just unplug the bad disk and plug in a new one.

This will cost a pretty penny, but look at it as an investment in never losing data, and in never having to spend another minute on your storage hardware.
posted by hatsix at 9:29 PM on May 21, 2006


The Infrant ReadyNAS NV is a pretty nice small-biz option, as well.
posted by kcm at 9:36 PM on May 21, 2006


A lot is going to depend on whether you kept your individual disks as separate volumes in your JBOD, or whether you spanned a dynamic volume across several dynamic disks to create a single larger logical volume. If you did the former, you basically just add a new drive, set it up as a dynamic disk, copy over the files from the failing disk, and then delete and remove the failing disk.

If you did the latter, replacing a single drive isn't so easy. We'd have to know the particulars to give you a reasonable procedure to try. For many possible arrangements of a 6-disk array, the procedure is going to involve restoring from backup after replacing the failing drive, so you should make a current backup before the failing disk goes out completely. Acronis makes some good disk-cloning utilities that may be helpful.
posted by paulsc at 9:50 PM on May 21, 2006


Response by poster: Yeah, I have a volume spanned across all 6 disks.

So basically I'll have to back up all 1.5TB, replace the single failing 250GB drive (wiping all the data on the old spanned volume), and then rebuild the array?

Which means I'll have to buy or borrow enough storage to do a full backup (I'd been considering my negatives my backup system), which is a bit more of a chore than I was hoping for, but I guess them's the breaks.

Thanks much for the info, and thanks for the heads up on Knoppix.
posted by Jezztek at 1:06 AM on May 22, 2006


Back it up now. Grab a few large drives from somewhere and get that data off. If that drive fails, getting the data off the rest of the drives becomes much harder.

(Sorry. That was important.)

Now, for the future. Spanning is deadly -- it gives you all the risks of striping, but none of the benefits. Drives are cheap. (No, really, they are.) Data isn't. If you really need to spin terabytes of disk, you need a real way to spin them. Which means you need RAID, of some flavor.

Mirrors are safe. Stripes are fast. Striping plus mirroring is safe and fast, but you have to buy 2TB worth of disk to get 1TB worth of storage. If performance isn't critical, striping with parity means you only lose one disk's worth of space. So, if you spin six 250GB drives, you get 1.25TB of storage that can lose one of those drives and still be readable.
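
To put numbers on that, here's a quick sketch using the drive counts from this thread:

```python
def usable_tb(drives, size_gb, layout):
    """Usable space for the layouts above, in TB (decimal)."""
    data_drives = {
        "span": drives,        # JBOD span: every byte usable, no safety
        "stripe": drives,      # RAID 0: same space, same risk, more speed
        "mirror": drives / 2,  # RAID 1 pairs: half the space
        "parity": drives - 1,  # RAID 5: one drive's worth goes to parity
    }[layout]
    return data_drives * size_gb / 1000

for layout in ("span", "stripe", "mirror", "parity"):
    print(layout, usable_tb(6, 250, layout), "TB")
# span/stripe 1.5, mirror 0.75, parity 1.25 -- the 1.25TB figure above
```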

If you're using IDE or cheap SATA drives, you'll want RAID 5 *and* a hot spare, and probably a cold spare as well. IDE drives just aren't built to take server work. Having said that, it's cheaper to buy desktop SATAs and keep extra spares handy than it is to buy server-grade SATA drives, and definitely cheaper than SCSI/SAS/Fibre Channel drives and host adapters. Given that a 6-drive RAID 5 will be much faster than a 6-drive JBOD span, if the performance before was acceptable, the performance of RAID 5 will be fine, and investing in production-class drives won't make much sense, as long as you keep spares handy.

Finally: one fire kills everything, so you really do need a way to get the data somewhere safe. Alas, there's no good cheap answer. Writeable DVDs and a backup program aren't expensive, but they're slow, and you'll need lots of blanks (1.5TB is over 300 single-layer discs). Tapes that can handle 1TB aren't cheap, but they are much faster.

The easy answer for a 1TB array is a removable hard disk or two and a backup program. How much capacity you need depends on how compressible your data is. Unfortunately, you're doing photo work, and photo data really isn't very compressible (in the lossless sense that computer backups use). So a 1TB external array makes sense, but you need to back it up, then move the array somewhere else.

Since you're doing photo work, another way to back up is incrementally -- you finish a few images, dump them to a writeable DVD, and put that in (at least) a fire safe somewhere else in the house.

The other advantage of that method is you might not need to spin 1.5TB, if you store the images mostly on DVD.
posted by eriko at 4:40 AM on May 22, 2006


Jezztek, in case eriko's explanation didn't take... your setup is the absolute worst possible way to do what you're doing.

When you have 6 drives and you just span a large volume across them, if ANY drive fails, you lose everything. So if you're spanning across six drives, you're taking six times the risk, and you're risking six times as much data. This is an incredibly bad thing to do, as you're discovering.

RAID comes in multiple flavors. The simplest is RAID 0, where you alternate blocks between two or more drives, with no redundancy. This is similar to what you're doing (and may in fact be EXACTLY what you're doing). The only plus here is speed... by spreading the I/O across multiple spindles, the array can often run faster. But it's risky... you're running X times the chance of failure, and when any drive fails, you lose all the data on all the drives.
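
Concretely, striping just rotates blocks across the drives, which is why one dead drive takes every file with it. A toy sketch:

```python
def stripe_location(logical_block, num_drives):
    """Map a logical block to (drive, block-on-that-drive) in a
    simple RAID 0 stripe: blocks rotate across the drives."""
    return logical_block % num_drives, logical_block // num_drives

# Eight consecutive blocks across two drives land 0,1,0,1,...
print([stripe_location(b, 2) for b in range(8)])
# Lose either drive and every other block vanishes, so effectively
# every large file is gone -- hence "all the data on all the drives".
```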

RAID 1 is simply copying the same data onto two drives running in parallel. This can sometimes be faster for reading, if you have a clever controller that multiplexes read requests, but it will usually be a little slower for writing, since all data must be written twice. You trade away 50% of your storage space, but you can lose either drive and be fine.

RAID 0+1 is a botched, bad idea, in which you stripe data across two or more drives, and then mirror the striped volume to another striped volume of the same size. Once one drive fails, a second failure in the other stripe set kills the entire array, and you lose 50% of your space. It's the worst of all possible worlds.

RAID 10 is *almost* the same thing, but not quite. In this case, you mirror pairs of drives, and then stripe data on top of the mirrors. It sounds like the same thing as RAID 0+1, but it's much more fault tolerant. The only way your volume will fail is if both drives in the same mirror fail. So, while you still lose 50% of your space, you have very good redundancy... there's a good chance you can lose multiple drives and stay up.
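
You can count the difference directly. With six drives, only two-drive failures that land on the same mirrored pair kill RAID 10, while in RAID 0+1 any pair of failures spanning both stripe sets is fatal (the groupings below are an assumption for illustration):

```python
from itertools import combinations

DRIVES = range(6)
MIRRORS = [(0, 1), (2, 3), (4, 5)]    # RAID 10: three mirrored pairs
STRIPE_SETS = [(0, 1, 2), (3, 4, 5)]  # RAID 0+1: two mirrored stripe sets

def raid10_dies(failed):
    # Fatal only if both halves of some mirror are gone.
    return any(a in failed and b in failed for a, b in MIRRORS)

def raid01_dies(failed):
    # Fatal once each stripe set has lost at least one drive.
    return all(any(d in failed for d in s) for s in STRIPE_SETS)

pairs = [set(p) for p in combinations(DRIVES, 2)]
print("RAID 10 fatal pairs:", sum(map(raid10_dies, pairs)), "of", len(pairs))
print("RAID 0+1 fatal pairs:", sum(map(raid01_dies, pairs)), "of", len(pairs))
# RAID 10: 3 of 15 two-drive failures are fatal; RAID 0+1: 9 of 15.
```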

RAID 4 is a different approach, where the controller dedicates one drive to parity data. If you have six drives, five of the drives are used for data and one is used for parity. Any one drive in this system can fail, and the array will stay up. This works that sixth drive very hard, though, because whenever any data changes, the parity drive is written to. That drive takes a beating, and tends to fail much sooner than it should, so RAID 4 isn't often used.

RAID 5 is almost the same thing, but the parity data is spread across all the drives, so it evens out the wear. This is used very, very often.

Both RAID 4 and RAID 5 are slow on writes, because every time a sector changes, the other blocks in that stripe must be read, new parity must be computed, and then the parity must be written out. This is a lot of I/O, and slows writes a great deal. Reading, on the other hand, is VERY fast.
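
The parity in question is just an XOR across the stripe, which is also why a dead drive can be rebuilt from the survivors. A toy sketch with five tiny "drives":

```python
def xor_parity(blocks):
    """XOR equal-sized blocks together -- the parity RAID 4/5 store."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk1", b"disk2", b"disk3", b"disk4", b"disk5"]
parity = xor_parity(data)

# Pretend disk 3 died: XOR the survivors with the parity block
# and the missing data falls right back out.
survivors = data[:2] + data[3:]
assert xor_parity(survivors + [parity]) == b"disk3"
```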

RAID 6 is the new hotness... it uses two parity blocks per stripe, so any two drives can fail without killing the array. It's a bit slower on writes than RAID 4/5, and reads are about the same speed.

All redundant forms of RAID will normally let you attach extra drives and mark them as 'hot spares'... if you have a good controller, at least. When a drive fails, the controller will immediately replace it with a hot spare and rebuild automatically. You then take out the failed drive and replace it with a new one... it becomes the new hot spare. The 'cold spares' eriko is referring to just means having an extra drive or two on hand, in a drawer, ready to immediately stand in for a failed drive.

Which approach to take? It's a tradeoff based on cost versus space versus speed versus redundancy. Hopefully, this will have given you enough info to make a good decision.

Right now, you need a backup, pronto. Danger, Will Robinson!
posted by Malor at 5:58 AM on May 22, 2006


Response by poster: Yeah, it appears I was sadly misinformed about the whole "one of the benefits of a JBOD array, as compared to RAID 0, is that if one drive fails you only lose the data on that one drive" bit.

But luckily, at this point my data is all totally readable (the drive has just been making an occasional clicking sound followed by a brief system pause -- a sure sign something is going wrong, but nothing that'll keep me from transferring the data over).

My new backup disks are in the mail, and I guess I'll be sticking to my laptop for the next few days.

Thanks much everyone.
posted by Jezztek at 10:27 AM on May 22, 2006

