Help me restore a RAID5 array?
January 20, 2013 12:55 PM
I have a software RAID5 array in Ubuntu. According to the SMART status, all the devices are okay (one has "a few bad sectors," but the health is green). I have 5 drives, but I can't get the RAID to assemble! I'm about to try an mdadm create as outlined here, but it looks risky, and I was hoping to see if there's something simple I'm missing.
I have five 2TB drives. They were named "sd[abcde]1" but for some reason are now named "sd[abcef]1". I don't know where "d" went or why it is now "f" (or possibly another letter). This might be part of the problem. I also have three drives on one controller and two drives on another controller. I don't think that should be an issue, but thought I'd include it for thoroughness. Here's what I try to run:
mdadm -A /dev/md0 /dev/sd{a,b,c,e,f}1
mdadm: /dev/md0 assembled from 3 drives - not enough to start the array.

Then I ran:
mdadm --examine /dev/sd*1

And received the following output:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep 5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun May 20 21:30:36 2012
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : beba01c0 - correct
         Events : 65642
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice   State
this     1       8       33        1        active sync   /dev/sdc1
   0     0       8       17        0        active sync   /dev/sdb1
   1     1       8       33        1        active sync   /dev/sdc1
   2     2       0        0        2        faulty removed
   3     3       8        1        3        active sync   /dev/sda1
   4     4       8       65        4        active sync   /dev/sde1

/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep 5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Update Time : Fri Feb 10 18:55:25 2012
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be35dd2b - correct
         Events : 58402
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice   State
this     2       8       49        2        active sync   /dev/sdd1
   0     0       8       17        0        active sync   /dev/sdb1
   1     1       8       33        1        active sync   /dev/sdc1
   2     2       8       49        2        active sync   /dev/sdd1
   3     3       8        1        3        active sync   /dev/sda1
   4     4       8       65        4        active sync   /dev/sde1

/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep 5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba28c8 - correct
         Events : 65646
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice   State
this     4       8       65        4        active sync   /dev/sde1
   0     0       8       17        0        active sync   /dev/sdb1
   1     1       0        0        1        faulty removed
   2     2       0        0        2        faulty removed
   3     3       8        1        3        active sync   /dev/sda1
   4     4       8       65        4        active sync   /dev/sde1

mdadm: No md superblock detected on /dev/sdd1.

/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep 5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba2886 - correct
         Events : 65646
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice   State
this     3       8        1        3        active sync   /dev/sda1
   0     0       8       17        0        active sync   /dev/sdb1
   1     1       0        0        1        faulty removed
   2     2       0        0        2        faulty removed
   3     3       8        1        3        active sync   /dev/sda1
   4     4       8       65        4        active sync   /dev/sde1

/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep 5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba2890 - correct
         Events : 65646
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice   State
this     0       8       17        0        active sync   /dev/sdb1
   0     0       8       17        0        active sync   /dev/sdb1
   1     1       0        0        1        faulty removed
   2     2       0        0        2        faulty removed
   3     3       8        1        3        active sync   /dev/sda1
   4     4       8       65        4        active sync   /dev/sde1

Sorry for the length on that. Any ideas on what to do next?
I read this guide, but it is a bit over my head, and it seems to indicate that the original order of the disks is important. I think I might have jacked with it enough that I lost that information, or maybe not? Given that the drives themselves report as healthy, I really think this is a weird software issue.
mdadm --create --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 missing /dev/sde1

Like this, without line breaks?

Also: how do I do this in read-only mode?
I think what happened is my OS is on a thumb drive, and when I was moving the computer it was taken out and put back into a different slot. This caused all the drives to be reported to the OS in the wrong order.
posted by geoff. at 2:52 PM on January 20, 2013
It looks like I can do mdadm --create --readonly, so I may have answered my first question.
posted by geoff. at 2:59 PM on January 20, 2013
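[Editor's note: putting these two ideas together, and reading the original slot numbers off the "this" lines in the --examine output above (which suggest f=0, a=1, b=2 failed, e=3, c=4), a read-only recreate could look like the sketch below. The device order is an inference, not a certainty, and --create rewrites the md superblocks even with --readonly, so this remains a last resort. The DRY_RUN guard is added so the command only prints.]

```shell
# DRY_RUN=echo makes this only print the command; clear it to actually run.
# CAUTION: --create rewrites the md superblocks even with --readonly.
# Slot order inferred from the "this" lines above: f=0, a=1, b=2 (failed), e=3, c=4.
DRY_RUN=echo
$DRY_RUN mdadm --create /dev/md0 --readonly --assume-clean \
    --metadata=0.90 --level=5 --raid-devices=5 --chunk=64 --layout=left-symmetric \
    /dev/sdf1 /dev/sda1 missing /dev/sde1 /dev/sdc1
```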
That looks right, but I'm not sure about the order.
I was looking through the man page for mdadm; there is a read-only option. But I've never used it.
I would also mount the array as read-only until you are sure about the order.
posted by gjc at 3:32 PM on January 20, 2013
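[Editor's note: a minimal sketch of the read-only approach gjc describes. The device list and mount point are assumptions; the DRY_RUN guard makes the commands only print.]

```shell
DRY_RUN=echo  # clear this to actually run the commands
# Assemble without letting md write anything, then mount the filesystem read-only.
$DRY_RUN mdadm --assemble --readonly /dev/md0 /dev/sda1 /dev/sdc1 /dev/sde1 /dev/sdf1
$DRY_RUN mount -o ro /dev/md0 /mnt/raid
```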
So I did this, and I keep getting "unable to mount" errors when I try to mount. Do I just keep trying different permutations until I can get something to mount?
posted by geoff. at 4:12 PM on January 20, 2013
Try

mdadm --assemble --scan

to see if mdadm can automatically detect the array.

Otherwise, force it to assemble the array:

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

posted by titantoppler at 4:51 PM on January 20, 2013
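[Editor's note: for what "force" means concretely, mdadm's --assemble has a --force flag that tells it to tolerate modest event-count mismatches between members, and it is usually worth trying before any --create. A sketch, with an assumed device list and a DRY_RUN guard so it only prints:]

```shell
DRY_RUN=echo  # clear this to actually run
# --force lets mdadm assemble despite out-of-date event counts on some members.
$DRY_RUN mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdc1 /dev/sde1 /dev/sdf1
```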
It won't auto assemble because the metadata is inconsistent between the drives.
Yes, try each permutation until it works.
posted by gjc at 5:59 PM on January 20, 2013
Ah! I was searching "bash permutations" when I came across the article I linked, which has a Perl script to test this. I didn't want to test 5! combinations by hand.
posted by geoff. at 6:56 PM on January 20, 2013
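[Editor's note: for anyone without that Perl script handy, generating the 5! = 120 orderings is a few lines of bash; the output can then be fed into a read-only create one line at a time. The partition names below are assumptions, and the failed slot is represented by the literal word "missing".]

```shell
#!/usr/bin/env bash
# Recursively print every ordering of the given arguments, one per line.
perms() {
  local prefix="$1"; shift
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "${prefix# }"   # strip the leading space
    return
  fi
  local i
  for ((i = 1; i <= $#; i++)); do
    # Remaining items are everything except the i-th one.
    local rest=("${@:1:i-1}" "${@:i+1}")
    perms "$prefix ${!i}" "${rest[@]}"
  done
}

# Four surviving partitions plus a "missing" placeholder: 120 orderings.
perms "" sda1 sdc1 sde1 sdf1 missing
```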
Oh no! I accidentally ran --create without the missing parameter when testing the script. I think I destroyed the array.
posted by geoff. at 8:54 PM on January 20, 2013
Then it's time to erase it all, build it fresh, and restore the content from your backup drives.
posted by flabdablet at 2:02 AM on January 21, 2013
Looking at your examine report, the drives think they are different drives than they used to be: a thinks it is c, b thinks it is d, c thinks it is e, e thinks it is a, and f thinks it is b.
You also have different versions of the superblock: some of the drives think two drives are bad, some think one is bad, and one thinks they are all good.
So I'm thinking the drives did in fact get reordered. Could be because the failing drive caused the controller to stall on POST and the drives got reported to the OS in the wrong order.
Meanwhile, one of the drives (the original sdd, I think) developed bad sectors or completely dropped out of the array. Then when the machine rebooted, it tried to auto assemble with the reordered drives and tried to use the bad/defunct one as the good one and got completely confused.
What I would do is pull the current sdb (that thinks it was sdd in the array). That's very likely the one that originally went bad and is probably the one with bad sectors. Double check your logs to make sure.
So after that, you'll have to do the create thing listed in the guide you mention. It looks correct to me. I would do it readonly until you are sure about the correct order.
Once you get it going, save the contents of your mdadm --examine command and also the results of smartctl -i /dev/sd*1, so that you know the serial number of each drive in the array and which letter it has. Then if this happens again, you'll know the correct order based on serial number.
I think there is a way to create/assemble raid devices using disk labels instead of physical positions. This might be something to try, since your system seems to like to reorder drives.
posted by gjc at 2:30 PM on January 20, 2013
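[Editor's note: one way to act on gjc's record-keeping advice plus the stable-identifier idea: md arrays carry a UUID (visible in the --examine output above), and --assemble can select by it regardless of how the drive letters shuffle. A sketch with a DRY_RUN guard so the commands only print; the UUID is the one from this thread.]

```shell
DRY_RUN=echo  # clear this to actually run
# Save the superblock details and the drive serial number somewhere off the array.
$DRY_RUN mdadm --examine /dev/sda1
$DRY_RUN smartctl -i /dev/sda
# Drive letters can shuffle between boots, but the array UUID is stable:
$DRY_RUN mdadm --assemble --scan --uuid=1ef7f14c:daf5731e:26644315:e48bd8ad
```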