Help me restore a RAID5 array?
January 20, 2013 12:55 PM

I have a software RAID5 array in Ubuntu. According to the SMART status, all the devices are okay (one has "a few bad sectors," but the health is green). I have 5 drives, but I can't get the array to assemble! I'm about to try mdadm --create as outlined here, but it looks risky, and I was hoping to see if there's something simple I'm missing.

I have five 2TB drives. They were named "sd[abcde]1" but for some reason are now named "sd[abcef]1". I don't know where "d" went or why it is now "f" (or possibly another letter). This might be part of the problem. I also have three drives on one controller and two drives on another. I don't think that should be an issue, but thought I'd include it for thoroughness. Here's what I try to run:
mdadm -A /dev/md0 /dev/sd{a,b,c,e,f}1
mdadm: /dev/md0 assembled from 3 drives - not enough to start the array.
Then I ran:
mdadm --examine /dev/sd*1
And receive the following output:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep  5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun May 20 21:30:36 2012
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : beba01c0 - correct
         Events : 65642

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       0        0        2      faulty removed
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep  5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Fri Feb 10 18:55:25 2012
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : be35dd2b - correct
         Events : 58402

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep  5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba28c8 - correct
         Events : 65646

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       65        4      active sync   /dev/sde1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
mdadm: No md superblock detected on /dev/sdd1.
/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep  5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba2886 - correct
         Events : 65646

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8        1        3      active sync   /dev/sda1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ef7f14c:daf5731e:26644315:e48bd8ad
  Creation Time : Sun Sep  5 21:10:15 2010
     Raid Level : raid5
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon May 21 00:16:09 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : beba2890 - correct
         Events : 65646

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
Sorry for the length on that. Any ideas on what to do next? I read this guide, but it is a bit over my head. It seems to indicate that the original order of the disks is important, and I think I might have jacked with things enough that I lost that information, or maybe not? Given that the drives themselves report as healthy, I really think this is a weird software issue.
posted by geoff. to Computers & Internet (11 answers total)
This config used to work, right? Did you make any changes prior to this happening?

In your --examine report, the drives think they are different drives than they used to be: a thinks it is c, b thinks it is d, c thinks it is e, e thinks it is a, and f thinks it is b.

The superblocks also disagree with each other: some of the drives think two drives are bad, some think one is bad, and one thinks they are all good.
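
One way to compare the superblocks side by side is to reduce each --examine block to its slot number and event count. This is a minimal sketch, assuming the 0.90 superblock layout shown in the question (newer metadata versions print different field names); summarize_examine is a made-up helper name, not part of mdadm.

```shell
# Summarize `mdadm --examine` output (0.90 superblock format only).
# Reads the report on stdin and prints one line per device:
#   <device> slot=<RaidDevice> events=<Events>
summarize_examine() {
    awk '
        /^\/dev\/[^ ]+:$/ { dev = $1; sub(/:$/, "", dev) }    # "/dev/sda1:" header
        $1 == "Events"    { events[dev] = $3 }                  # "Events : 65642"
        $1 == "this"      { slot[dev] = $5; order[++n] = dev }  # RaidDevice column
        END {
            for (i = 1; i <= n; i++)
                printf "%s slot=%s events=%s\n", order[i], slot[order[i]], events[order[i]]
        }
    '
}
```

Usage would be something like: mdadm --examine /dev/sd[abcef]1 | summarize_examine. For the report posted above, that gives slot 0 = sdf1, slot 1 = sda1, slot 2 = sdb1, slot 3 = sde1 and slot 4 = sdc1, with event counts of 65646 on sdc1, sde1 and sdf1, 65642 on sda1, and 58402 on sdb1. The member with the lowest event count fell out of the array first and holds the stalest data.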

So I'm thinking the drives did in fact get reordered. Could be because the failing drive caused the controller to stall on POST and the drives got reported to the OS in the wrong order.

Meanwhile, one of the drives (the original sdd, I think) developed bad sectors or completely dropped out of the array. Then when the machine rebooted, it tried to auto assemble with the reordered drives and tried to use the bad/defunct one as the good one and got completely confused.

What I would do is pull the current sdb (that thinks it was sdd in the array). That's very likely the one that originally went bad and is probably the one with bad sectors. Double check your logs to make sure.

So after that, you'll have to do the create thing listed in the guide you mention. It looks correct to me. I would do it readonly until you are sure about the correct order.

Once you get it going, save the output of your mdadm --examine command and also the output of smartctl -i for each drive (for example, smartctl -i /dev/sda), so that you know the serial number of each drive in the array and which letter it has. If this happens again, you'll know the correct order based on serial number.
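
The record-keeping step could be scripted roughly as follows. This is only a sketch: it assumes smartmontools is installed, that it runs as root, and that smartctl -i prints a "Serial Number:" line for these drives. extract_serial and list_serials are hypothetical helper names.

```shell
# Pull the serial out of `smartctl -i` output supplied on stdin.
extract_serial() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Print "<disk> <serial>" for each whole disk passed as an argument.
# Pass whole disks (/dev/sda), not partitions (/dev/sda1).
list_serials() {
    for disk in "$@"; do
        printf '%s %s\n' "$disk" "$(smartctl -i "$disk" | extract_serial)"
    done
}
```

For example, list_serials /dev/sd[abcef] > raid-serials.txt, kept next to a saved copy of the mdadm --examine output and stored somewhere other than the array itself.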

I think there is a way to assemble RAID devices by identifier (the array UUID stored in each superblock) instead of by device name. This might be something to try, since your system seems to like to reorder drives.
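
As a sketch of the identifier-based approach: mdadm.conf accepts ARRAY lines that name an array by UUID, so assembly does not depend on which letter each drive gets. The helper below (array_line, a made-up name) builds such a line from 0.90-format --examine output; the /dev/md0 default is an assumption taken from the question.

```shell
# Build an mdadm.conf ARRAY line from `mdadm --examine` output on stdin.
# Optional first argument: the md device name (defaults to /dev/md0).
# Matches the "UUID : ..." field of the 0.90 format shown in the question.
array_line() {
    awk -v md="${1:-/dev/md0}" '
        !seen && $1 == "UUID" { printf "ARRAY %s UUID=%s\n", md, $3; seen = 1 }
    '
}
```

Something like mdadm --examine /dev/sda1 | array_line would print a line to add to /etc/mdadm/mdadm.conf on Ubuntu, after which mdadm --assemble --scan looks for members by UUID. mdadm --examine --scan should produce similar lines on its own.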
posted by gjc at 2:30 PM on January 20, 2013


mdadm --create --assume-clean --level=5 --raid-devices=5 
/dev/md0 
/dev/sda1 
/dev/sdb1 
/dev/sdc1 
missing 
/dev/sde1
Like this, without line breaks?

1. How do I do this in read-only mode?

I think what happened is my OS is on a thumb drive, and when I was moving the computer it was taken out and put back into a different slot. This caused all the drives to be reported to the OS in the wrong order.
posted by geoff. at 2:52 PM on January 20, 2013


It looks like I can do mdadm --create --readonly, so I may have answered my first question.
posted by geoff. at 2:59 PM on January 20, 2013


That looks right, but I'm not sure about the order.

I was looking through the man page for mdadm, and there is a read-only option (--readonly), but I've never used it.

I would also mount the array as read-only until you are sure about the order.
posted by gjc at 3:32 PM on January 20, 2013


So I did this and I keep getting "unable to mount" errors when I try to mount. Do I just keep trying different permutations until I can get something to mount?
posted by geoff. at 4:12 PM on January 20, 2013


Try mdadm --assemble --scan to see if mdadm can automatically detect the array.

Otherwise, force it to assemble the array with --force, using the current device names: mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdf1
posted by titantoppler at 4:51 PM on January 20, 2013


It won't auto assemble because the metadata is inconsistent between the drives.

Yes, try each permutation until it works.
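
To keep that manageable, the candidate commands can be generated rather than typed. This is a dry-run sketch only: it prints commands and never runs them. The metadata version, chunk size and layout are copied from the --examine output in the question, since mdadm's current defaults may differ. candidate_commands is a hypothetical helper name.

```shell
# Print (never run) one candidate `mdadm --create` command per ordering.
# Arguments: the member partitions still believed good. The literal word
# "missing" is added as a placeholder for the dropped drive.
candidate_commands() {
    printf '%s\n' "$@" missing | awk '
        { item[NR] = $0 }
        function permute(k,    i, tmp, line) {
            if (k > n) {
                line = ""
                for (i = 1; i <= n; i++) line = line " " item[i]
                print "mdadm --create /dev/md0 --assume-clean --readonly" \
                      " --metadata=0.90 --level=5 --chunk=64" \
                      " --layout=left-symmetric --raid-devices=" n line
                return
            }
            for (i = k; i <= n; i++) {
                tmp = item[k]; item[k] = item[i]; item[i] = tmp   # choose item i for slot k
                permute(k + 1)
                tmp = item[k]; item[k] = item[i]; item[i] = tmp   # undo the swap
            }
        }
        END { n = NR; permute(1) }
    '
}
```

Four partitions plus "missing" give 5! = 120 lines, so candidate_commands /dev/sda1 /dev/sdc1 /dev/sde1 /dev/sdf1 | less lets each one be reviewed before anything is run. As far as I know, --create writes new superblocks even with --readonly, so each line needs checking for the "missing" placeholder before it is pasted. The slot numbers in the old superblocks may narrow the list to a single ordering.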
posted by gjc at 5:59 PM on January 20, 2013


Ack, this is going to be tedious.
posted by geoff. at 6:51 PM on January 20, 2013


Ah! I was searching "permutations bash" when I came across the article I listed, which has a Perl script to test this. I didn't want to test all 5! (120) combinations by hand.
posted by geoff. at 6:56 PM on January 20, 2013


Oh no! I accidentally did --create without a "missing" parameter when testing the script. I think I destroyed the array.
posted by geoff. at 8:54 PM on January 20, 2013


Then it's time to erase it all, build it fresh and restore the content from your backup drives.
posted by flabdablet at 2:02 AM on January 21, 2013

