Linux RAID subsystem development
* 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Jeff Johnson @ 2011-08-08  2:37 UTC
  To: linux-raid

Greetings,

I have a 20-drive raid-10 that has been running well for over a year. 
After the most recent system boot the raid will not assemble. 
/var/log/messages shows that all of the drives are "non-fresh". 
Examining the drives shows that the raid partitions are present, the 
superblocks contain valid data, and the Event counts are equal across 
the data drives. The spare drives have a different Event count.

I am reluctant to use the --force switch with --assemble until I 
understand the problem better. There is very important data on this 
volume and, to my knowledge, it is not backed up. I do not know how the 
machine was shut down prior to this boot.
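
(For reference, the forced variant would be something like 'mdadm 
--assemble --force /dev/md3 /dev/sd[c-v]1' - a sketch only; it marks 
out-of-date members as current, which is exactly the step I want to 
understand before running it.)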

With all drives marked "non-fresh" I can't start a partial array and then 
re-add the remaining drives. I've unraveled some pretty messed-up md 
configs and recovered the underlying filesystem before, but this one has 
me at a loss.
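
Normally (just a sketch, with purely illustrative device names) the flow 
would be something like:

  mdadm --assemble --run /dev/md3 /dev/sd[c-t]1    (start degraded from the fresh members)
  mdadm /dev/md3 --re-add /dev/sdu1 /dev/sdv1      (then re-add the stale ones)

but here every member is flagged non-fresh, so there is nothing to start from.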

Any advice is greatly appreciated!

--Jeff

Below is the config file and output from mdadm examine commands:

/* Config file */
ARRAY /dev/md3 level=raid10 num-devices=20
   UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4
   spares=4
   devices=/dev/sdz1,/dev/sdy1,/dev/sdx1,/dev/sdw1,/dev/sdv1,/dev/sdu1,/dev/sdt1,/dev/sds1,/dev/sdr1,/dev/sdq1,/dev/sdp1,/dev/sdo1,/dev/sdn1,/dev/sdm1,/dev/sdl1,/dev/sdk1,/dev/sdj1,/dev/sdi1,/dev/sdh1,/dev/sdg1,/dev/sdf1,/dev/sde1,/dev/sdd1,/dev/sdc1

/* mdadm -E /dev/sd[cdefghijklmnopqrstuvwxyz]1 | grep Event */
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 90
          Events : 92
          Events : 92
          Events : 92
          Events : 92

/* mdadm -E /dev/sdc1 */
/dev/sdc1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : e17a29e8:ec6bce5c:f13d343c:cfba4dc4
   Creation Time : Fri Sep 24 12:06:37 2010
      Raid Level : raid10
   Used Dev Size : 99924096 (95.30 GiB 102.32 GB)
      Array Size : 999240960 (952.95 GiB 1023.22 GB)
    Raid Devices : 20
   Total Devices : 24
Preferred Minor : 3

     Update Time : Sat Aug  6 05:54:37 2011
           State : clean
  Active Devices : 20
Working Devices : 24
  Failed Devices : 0
   Spare Devices : 4
        Checksum : d8d97049 - correct
          Events : 90

          Layout : near=2
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

    0     0       8       33        0      active sync   /dev/sdc1
    1     1       8       49        1      active sync   /dev/sdd1
    2     2       8       65        2      active sync   /dev/sde1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8       97        4      active sync   /dev/sdg1
    5     5       8      113        5      active sync   /dev/sdh1
    6     6       8      129        6      active sync   /dev/sdi1
    7     7       8      145        7      active sync   /dev/sdj1
    8     8       8      161        8      active sync   /dev/sdk1
    9     9       8      177        9      active sync   /dev/sdl1
   10    10       8      193       10      active sync   /dev/sdm1
   11    11       8      209       11      active sync   /dev/sdn1
   12    12       8      225       12      active sync   /dev/sdo1
   13    13       8      241       13      active sync   /dev/sdp1
   14    14      65        1       14      active sync   /dev/sdq1
   15    15      65       17       15      active sync   /dev/sdr1
   16    16      65       33       16      active sync   /dev/sds1
   17    17      65       49       17      active sync   /dev/sdt1
   18    18      65       65       18      active sync   /dev/sdu1
   19    19      65       81       19      active sync   /dev/sdv1
   20    20      65      145       20      spare   /dev/sdz1
   21    21      65      129       21      spare   /dev/sdy1
   22    22      65      113       22      spare   /dev/sdx1
   23    23      65       97       23      spare   /dev/sdw1

-- 
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.johnson "at" aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: NeilBrown @ 2011-08-08  2:56 UTC
  To: Jeff Johnson; +Cc: linux-raid

On Sun, 07 Aug 2011 19:37:04 -0700 Jeff Johnson
<jeff.johnson@aeoncomputing.com> wrote:

> Greetings,
> 
> I have a 20-drive raid-10 that has been running well for over a year. 
> After the most recent system boot the raid will not assemble. 
> /var/log/messages shows that all of the drives are "non-fresh". 
> Examining the drives shows that the raid partitions are present, the 
> superblocks contain valid data, and the Event counts are equal across 
> the data drives. The spare drives have a different Event count.
> 
> I am reluctant to use the --force switch with --assemble until I 
> understand the problem better. There is very important data on this 
> volume and, to my knowledge, it is not backed up. I do not know how the 
> machine was shut down prior to this boot.
> 
> With all drives marked "non-fresh" I can't start a partial array and then 
> re-add the remaining drives. I've unraveled some pretty messed-up md 
> configs and recovered the underlying filesystem before, but this one has 
> me at a loss.
> 
> Any advice is greatly appreciated!
> 
> --Jeff
> 
> Below is the config file and output from mdadm examine commands:
> 
> /* Config file */
> ARRAY /dev/md3 level=raid10 num-devices=20
>    UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4
>    spares=4
>    devices=/dev/sdz1,/dev/sdy1,/dev/sdx1,/dev/sdw1,/dev/sdv1,/dev/sdu1,/dev/sdt1,/dev/sds1,/dev/sdr1,/dev/sdq1,/dev/sdp1,/dev/sdo1,/dev/sdn1,/dev/sdm1,/dev/sdl1,/dev/sdk1,/dev/sdj1,/dev/sdi1,/dev/sdh1,/dev/sdg1,/dev/sdf1,/dev/sde1,/dev/sdd1,/dev/sdc1

You really don't want that 'devices=' clause in there.  Device names can
change...
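
For example (just a sketch), an entry keyed only on the uuid is enough;
mdadm will then find the members by scanning superblocks, whatever names
the kernel happens to assign:

 ARRAY /dev/md3 UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4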


> 
> /* mdadm -E /dev/sd[cdefghijklmnopqrstuvwxyz]1 | grep Event */
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 90
>           Events : 92
>           Events : 92
>           Events : 92
>           Events : 92

So the spares are '92' and the others are '90'.  That is weird...

However you should be able to assemble the array by simply listing all the
non-spare devices:

 mdadm -A /dev/md3 /dev/sd[c-v]1
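
and then (again, just a sketch) sanity-check the result before re-adding
the spares:

 cat /proc/mdstat
 mdadm -D /dev/md3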

NeilBrown



> 
> /* mdadm -E /dev/sdc1 */
> /dev/sdc1:
>            Magic : a92b4efc
>          Version : 0.90.00
>             UUID : e17a29e8:ec6bce5c:f13d343c:cfba4dc4
>    Creation Time : Fri Sep 24 12:06:37 2010
>       Raid Level : raid10
>    Used Dev Size : 99924096 (95.30 GiB 102.32 GB)
>       Array Size : 999240960 (952.95 GiB 1023.22 GB)
>     Raid Devices : 20
>    Total Devices : 24
> Preferred Minor : 3
> 
>      Update Time : Sat Aug  6 05:54:37 2011
>            State : clean
>   Active Devices : 20
> Working Devices : 24
>   Failed Devices : 0
>    Spare Devices : 4
>         Checksum : d8d97049 - correct
>           Events : 90
> 
>           Layout : near=2
>       Chunk Size : 128K
> 
>        Number   Major   Minor   RaidDevice State
> this     0       8       33        0      active sync   /dev/sdc1
> 
>     0     0       8       33        0      active sync   /dev/sdc1
>     1     1       8       49        1      active sync   /dev/sdd1
>     2     2       8       65        2      active sync   /dev/sde1
>     3     3       8       81        3      active sync   /dev/sdf1
>     4     4       8       97        4      active sync   /dev/sdg1
>     5     5       8      113        5      active sync   /dev/sdh1
>     6     6       8      129        6      active sync   /dev/sdi1
>     7     7       8      145        7      active sync   /dev/sdj1
>     8     8       8      161        8      active sync   /dev/sdk1
>     9     9       8      177        9      active sync   /dev/sdl1
>    10    10       8      193       10      active sync   /dev/sdm1
>    11    11       8      209       11      active sync   /dev/sdn1
>    12    12       8      225       12      active sync   /dev/sdo1
>    13    13       8      241       13      active sync   /dev/sdp1
>    14    14      65        1       14      active sync   /dev/sdq1
>    15    15      65       17       15      active sync   /dev/sdr1
>    16    16      65       33       16      active sync   /dev/sds1
>    17    17      65       49       17      active sync   /dev/sdt1
>    18    18      65       65       18      active sync   /dev/sdu1
>    19    19      65       81       19      active sync   /dev/sdv1
>    20    20      65      145       20      spare   /dev/sdz1
>    21    21      65      129       21      spare   /dev/sdy1
>    22    22      65      113       22      spare   /dev/sdx1
>    23    23      65       97       23      spare   /dev/sdw1
> 



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Jeff Johnson @ 2011-08-08  4:32 UTC
  To: linux-raid

I am now able (thanks to Neil's suggestion) to manually assemble the 
/dev/md3 raid10 volume using:

mdadm -A /dev/md3 /dev/sd[cdefghijklmnopqrstuv]1

and then manually add the spares back with: mdadm --add /dev/md3 
/dev/sd[wxyz]1
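
After that, the data can be checked without risking writes (a sketch, 
assuming a filesystem sits directly on /dev/md3):

fsck -n /dev/md3             (read-only filesystem check)
mount -o ro /dev/md3 /mnt    (or just mount read-only and look around)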

The data is intact, phew! I am still unable to start the raid using a 
config file. I gracefully stopped the raid using 'mdadm --stop /dev/md3' 
and then tried 'mdadm -A /dev/md3 -c /etc/mdadm.conf.mdt', and it failed 
to start.

I recreated the config file using 'mdadm --examine --scan > 
/etc/mdadm.conf'. Then I stopped /dev/md3 and tried to assemble it again 
using 'mdadm -A /dev/md3', and again it failed to assemble and start.

It is good that I can start the raid manually, but it isn't supposed to 
work like that. Any idea why assembling from a config file would fail? 
Here is the latest version of the config file line (made with 'mdadm 
--examine --scan'):

ARRAY /dev/md3 level=raid10 num-devices=20 metadata=0.90 spares=4 
UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4

--Jeff

On Sun, Aug 7, 2011 at 7:56 PM, NeilBrown <neilb@suse.de> wrote:

    On Sun, 07 Aug 2011 19:37:04 -0700 Jeff Johnson
    <jeff.johnson@aeoncomputing.com> wrote:

    >  Greetings,
    >
    >  I have a 20-drive raid-10 that has been running well for over a year.
    >  After the most recent system boot the raid will not assemble.
    >  /var/log/messages shows that all of the drives are "non-fresh".

    --snip--

    You really don't want that 'devices=' clause in there.  Device names can
    change...

    --snip--

    >            Events : 90
    >            Events : 90
    >            Events : 92
    >            Events : 92
    >            Events : 92
    >            Events : 92

    So the spares are '92' and the others are '90'.  That is weird...

    However you should be able to assemble the array by simply listing
    all the non-spare devices:

      mdadm -A /dev/md3 /dev/sd[c-v]1

    NeilBrown


--
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.johnson "at" aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845


4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Joe Landman @ 2011-08-08  4:40 UTC
  To: Jeff Johnson; +Cc: linux-raid

On 08/08/2011 12:32 AM, Jeff Johnson wrote:

> It is good that I can start the raid manually, but it isn't supposed to work
> like that. Any idea why assembling from a config file would fail? Here
> is the latest version of the config file line (made with 'mdadm --examine
> --scan'):

Jeff,

   You might need to update the raid superblock summaries (the recorded
counts of active, working, failed, and spare devices) during the manual
assemble:

   mdadm --assemble --update=summaries /dev/md3 /dev/sd[c-v]1

   Also, you can simplify the below a bit to the following:

>
> ARRAY /dev/md3 level=raid10 num-devices=20 metadata=0.90 spares=4
> UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4

ARRAY /dev/md3 UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4
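
You can test that minimal entry in isolation first, e.g. (a sketch; the
temp path is arbitrary, and with no DEVICE line mdadm defaults to
scanning all partitions):

   echo 'ARRAY /dev/md3 UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4' > /tmp/mdadm.conf.test
   mdadm -A /dev/md3 -c /tmp/mdadm.conf.test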

>
> --Jeff
>
> On Sun, Aug 7, 2011 at 7:56 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 07 Aug 2011 19:37:04 -0700 Jeff Johnson
> <jeff.johnson@aeoncomputing.com> wrote:
>
>  > Greetings,
>  >
>  > I have a 20-drive raid-10 that has been running well for over a year.
>  > After the most recent system boot the raid will not assemble.
>  > /var/log/messages shows that all of the drives are "non-fresh".
>
> --snip--
>
> You really don't want that 'devices=' clause in there. Device names can
> change...
> --snip--
>
>  > Events : 90
>  > Events : 90
>  > Events : 92
>  > Events : 92
>  > Events : 92
>  > Events : 92
>
> So the spares are '92' and the others are '90'. That is weird...
>
> However you should be able to assemble the array by simply listing
> all the non-spare devices:
>
> mdadm -A /dev/md3 /dev/sd[c-v]1
>
> NeilBrown
>
>
> --
> ------------------------------
> Jeff Johnson
> Manager
> Aeon Computing
>
> jeff.johnson "at" aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x101   f: 858-412-3845
>
>
> 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
>



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Jeff Johnson @ 2011-08-08  4:54 UTC
  To: linux-raid

Joe,

The raid still won't assemble via config file:

mdadm --assemble --update=summaries /dev/md3 /dev/sd[c-v]1
mdadm --add /dev/md3 /dev/sd[wxyz]1   (spares)
mdadm --examine --scan | grep md3 > /etc/mdadm.conf.mdt_new
mdadm --stop /dev/md3
mdadm -vv --assemble /dev/md3 -c /etc/mdadm.conf.mdt_new

Output:
mdadm: looking for devices for /dev/md3
mdadm: no recogniseable superblock on /dev/sdz2
mdadm: /dev/sdz2 has wrong uuid.
mdadm: /dev/sdz has wrong uuid.
mdadm: no RAID superblock on /dev/sdy2
mdadm: /dev/sdy2 has wrong uuid.
mdadm: /dev/sdy has wrong uuid.
mdadm: no RAID superblock on /dev/sdx2
mdadm: /dev/sdx2 has wrong uuid.
mdadm: /dev/sdx has wrong uuid.
mdadm: no RAID superblock on /dev/sdw2
mdadm: /dev/sdw2 has wrong uuid.
mdadm: /dev/sdw has wrong uuid.
mdadm: no RAID superblock on /dev/sdv2
mdadm: /dev/sdv2 has wrong uuid.
mdadm: /dev/sdv has wrong uuid.
mdadm: no RAID superblock on /dev/sdu2
mdadm: /dev/sdu2 has wrong uuid.
mdadm: /dev/sdu has wrong uuid.
mdadm: no RAID superblock on /dev/sdt2
mdadm: /dev/sdt2 has wrong uuid.
mdadm: /dev/sdt has wrong uuid.
mdadm: no RAID superblock on /dev/sds2
mdadm: /dev/sds2 has wrong uuid.
mdadm: /dev/sds has wrong uuid.
mdadm: no RAID superblock on /dev/sdr2
mdadm: /dev/sdr2 has wrong uuid.
mdadm: /dev/sdr has wrong uuid.
mdadm: no RAID superblock on /dev/sdq2
mdadm: /dev/sdq2 has wrong uuid.
mdadm: /dev/sdq has wrong uuid.
mdadm: no RAID superblock on /dev/sdp2
mdadm: /dev/sdp2 has wrong uuid.
mdadm: /dev/sdp has wrong uuid.
mdadm: no RAID superblock on /dev/sdo2
mdadm: /dev/sdo2 has wrong uuid.
mdadm: /dev/sdo has wrong uuid.
mdadm: no RAID superblock on /dev/sdn2
mdadm: /dev/sdn2 has wrong uuid.
mdadm: /dev/sdn has wrong uuid.
mdadm: no RAID superblock on /dev/sdm2
mdadm: /dev/sdm2 has wrong uuid.
mdadm: /dev/sdm has wrong uuid.
mdadm: no RAID superblock on /dev/sdl2
mdadm: /dev/sdl2 has wrong uuid.
mdadm: /dev/sdl has wrong uuid.
mdadm: no RAID superblock on /dev/sdk2
mdadm: /dev/sdk2 has wrong uuid.
mdadm: /dev/sdk has wrong uuid.
mdadm: no RAID superblock on /dev/sdj2
mdadm: /dev/sdj2 has wrong uuid.
mdadm: /dev/sdj has wrong uuid.
mdadm: no RAID superblock on /dev/sdi2
mdadm: /dev/sdi2 has wrong uuid.
mdadm: /dev/sdi has wrong uuid.
mdadm: no RAID superblock on /dev/sdh2
mdadm: /dev/sdh2 has wrong uuid.
mdadm: /dev/sdh has wrong uuid.
mdadm: no RAID superblock on /dev/sdg2
mdadm: /dev/sdg2 has wrong uuid.
mdadm: /dev/sdg has wrong uuid.
mdadm: no RAID superblock on /dev/sdf2
mdadm: /dev/sdf2 has wrong uuid.
mdadm: /dev/sdf has wrong uuid.
mdadm: no RAID superblock on /dev/sde2
mdadm: /dev/sde2 has wrong uuid.
mdadm: /dev/sde has wrong uuid.
mdadm: no RAID superblock on /dev/sdd2
mdadm: /dev/sdd2 has wrong uuid.
mdadm: /dev/sdd has wrong uuid.
mdadm: no RAID superblock on /dev/sdc2
mdadm: /dev/sdc2 has wrong uuid.
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/disk/by-uuid/55389e74-b43e-4a6b-97c5-573fcd91a4b7: Device or resource busy
mdadm: /dev/disk/by-uuid/55389e74-b43e-4a6b-97c5-573fcd91a4b7 has wrong uuid.
mdadm: cannot open device /dev/disk/by-uuid/ab90577f-7c58-4e91-95e4-25025cf01790: Device or resource busy
mdadm: /dev/disk/by-uuid/ab90577f-7c58-4e91-95e4-25025cf01790 has wrong uuid.
mdadm: cannot open device /dev/root: Device or resource busy
mdadm: /dev/root has wrong uuid.
mdadm: cannot open device /dev/sdb3: Device or resource busy
mdadm: /dev/sdb3 has wrong uuid.
mdadm: cannot open device /dev/sdb2: Device or resource busy
mdadm: /dev/sdb2 has wrong uuid.
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda3: Device or resource busy
mdadm: /dev/sda3 has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: /dev/sdz1 is identified as a member of /dev/md3, slot 20.
mdadm: /dev/sdy1 is identified as a member of /dev/md3, slot 21.
mdadm: /dev/sdx1 is identified as a member of /dev/md3, slot 22.
mdadm: /dev/sdw1 is identified as a member of /dev/md3, slot 23.
mdadm: /dev/sdv1 is identified as a member of /dev/md3, slot 19.
mdadm: /dev/sdu1 is identified as a member of /dev/md3, slot 18.
mdadm: /dev/sdt1 is identified as a member of /dev/md3, slot 17.
mdadm: /dev/sds1 is identified as a member of /dev/md3, slot 16.
mdadm: /dev/sdr1 is identified as a member of /dev/md3, slot 15.
mdadm: /dev/sdq1 is identified as a member of /dev/md3, slot 14.
mdadm: /dev/sdp1 is identified as a member of /dev/md3, slot 13.
mdadm: /dev/sdo1 is identified as a member of /dev/md3, slot 12.
mdadm: /dev/sdn1 is identified as a member of /dev/md3, slot 11.
mdadm: /dev/sdm1 is identified as a member of /dev/md3, slot 10.
mdadm: /dev/sdl1 is identified as a member of /dev/md3, slot 9.
mdadm: /dev/sdk1 is identified as a member of /dev/md3, slot 8.
mdadm: /dev/sdj1 is identified as a member of /dev/md3, slot 7.
mdadm: /dev/sdi1 is identified as a member of /dev/md3, slot 6.
mdadm: /dev/sdh1 is identified as a member of /dev/md3, slot 5.
mdadm: /dev/sdg1 is identified as a member of /dev/md3, slot 4.
mdadm: /dev/sdf1 is identified as a member of /dev/md3, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md3, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md3, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md3, slot 0.
mdadm: No suitable drives found for /dev/md3


Maybe '--update=uuid' ??

--Jeff


On 8/7/11 9:40 PM, Joe Landman wrote:
> On 08/08/2011 12:32 AM, Jeff Johnson wrote:
>
>> It is good that I can start the raid manually, but it isn't supposed to work
>> like that. Any idea why assembling from a config file would fail? Here
>> is the latest version of the config file line (made with 'mdadm --examine
>> --scan'):
>
> Jeff,
>
>   You might need to update the raid superblock summaries (the recorded
> counts of active, working, failed, and spare devices) during the manual
> assemble:
>
>   mdadm --assemble --update=summaries /dev/md3 /dev/sd[c-v]1
>
>   Also, you can simplify the below a bit to the following:
>
>>
>> ARRAY /dev/md3 level=raid10 num-devices=20 metadata=0.90 spares=4
>> UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4
>
> ARRAY /dev/md3 UUID=e17a29e8:ec6bce5c:f13d343c:cfba4dc4
>

-- 
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.johnson "at" aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845


4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Joe Landman @ 2011-08-08  5:04 UTC
  To: Jeff Johnson; +Cc: linux-raid

On 08/08/2011 12:54 AM, Jeff Johnson wrote:
> Joe,
>
> The raid still won't assemble via config file:

[...]

> mdadm: /dev/sdz1 is identified as a member of /dev/md3, slot 20.
> mdadm: /dev/sdy1 is identified as a member of /dev/md3, slot 21.
> mdadm: /dev/sdx1 is identified as a member of /dev/md3, slot 22.
> mdadm: /dev/sdw1 is identified as a member of /dev/md3, slot 23.
> mdadm: /dev/sdv1 is identified as a member of /dev/md3, slot 19.
> mdadm: /dev/sdu1 is identified as a member of /dev/md3, slot 18.
> mdadm: /dev/sdt1 is identified as a member of /dev/md3, slot 17.
> mdadm: /dev/sds1 is identified as a member of /dev/md3, slot 16.
> mdadm: /dev/sdr1 is identified as a member of /dev/md3, slot 15.
> mdadm: /dev/sdq1 is identified as a member of /dev/md3, slot 14.
> mdadm: /dev/sdp1 is identified as a member of /dev/md3, slot 13.
> mdadm: /dev/sdo1 is identified as a member of /dev/md3, slot 12.
> mdadm: /dev/sdn1 is identified as a member of /dev/md3, slot 11.
> mdadm: /dev/sdm1 is identified as a member of /dev/md3, slot 10.
> mdadm: /dev/sdl1 is identified as a member of /dev/md3, slot 9.
> mdadm: /dev/sdk1 is identified as a member of /dev/md3, slot 8.
> mdadm: /dev/sdj1 is identified as a member of /dev/md3, slot 7.
> mdadm: /dev/sdi1 is identified as a member of /dev/md3, slot 6.
> mdadm: /dev/sdh1 is identified as a member of /dev/md3, slot 5.
> mdadm: /dev/sdg1 is identified as a member of /dev/md3, slot 4.
> mdadm: /dev/sdf1 is identified as a member of /dev/md3, slot 3.
> mdadm: /dev/sde1 is identified as a member of /dev/md3, slot 2.
> mdadm: /dev/sdd1 is identified as a member of /dev/md3, slot 1.
> mdadm: /dev/sdc1 is identified as a member of /dev/md3, slot 0.
> mdadm: No suitable drives found for /dev/md3
>
>
> Maybe '--update=uuid' ??

It looks like it correctly finds /dev/sd[c-z]1 as elements of /dev/md3.

Which mdadm are you using?

	mdadm -V

and which kernel?

Try the UUID update, and let us know if it helps.  Also if your mdadm is 
old (2.6.x), try updating to 3.1.x.
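
The uuid update would look something like this (a sketch, reusing the
device list from the earlier manual assemble):

	mdadm --stop /dev/md3
	mdadm --assemble --update=uuid /dev/md3 /dev/sd[c-v]1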

FWIW: we've found problems in the past with CentOS 5.4 to 5.5 kernels 
with MD arrays.  Oftentimes our only real solution was to update the 
full OS on the boot drives.  This applies to the distro-specific 
kernels; with our own kernels we don't run into this issue.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Jeff Johnson @ 2011-08-08  5:55 UTC
  To: linux-raid

Joe / et al.,

The '--assemble --update=uuid' appears to have done the trick. It is 
weird, because the uuid in the config file matched the uuid of the raid 
volume shown by 'mdadm -D /dev/md3' and the uuid on each of the drives 
shown by 'mdadm -E /dev/sdc1'.

The '--update=summaries' did not work. Assigning a new random uuid 
appears to have repaired whatever bit in the superblock was mucked up.
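
One side effect worth noting: '--update=uuid' with no explicit --uuid= 
writes a new random uuid into the superblocks, so the ARRAY line in the 
config file has to be regenerated to match, e.g. as before:

mdadm --examine --scan | grep md3 > /etc/mdadm.conf.mdt_new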

Strange...

Joe, thanks for your help. Find me at SC11, I'm buying you beers.

--Jeff



On 8/7/11 10:04 PM, Joe Landman wrote:
>> Maybe '--update=uuid' ??
>
>
> It looks like it correctly finds /dev/sd[c-z]1 as elements of /dev/md3
>
> Which mdadm are you using?
>
>     mdadm -V
>
> and which kernel?
>
> Try the UUID update, and let us know if it helps.  Also if your mdadm 
> is old (2.6.x), try updating to 3.1.x.
>
> FWIW: we've found problems in the past with CentOS 5.4 to 5.5 kernels 
> with MD arrays.  Oftentimes our only real solution was to update the 
> full OS on the boot drives.  This applies to the distro-specific 
> kernels; with our own kernels we don't run into this issue.
>
>


-- 
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.johnson "at" aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117



* Re: 20 drive raid-10, CentOS5.5, after reboot assemble fails - all drives "non-fresh"
From: Joe Landman @ 2011-08-08 16:17 UTC
  To: Jeff Johnson; +Cc: linux-raid

On 08/08/2011 01:55 AM, Jeff Johnson wrote:
> Joe / et-al,
>
> The '--assemble --update=uuid' appears to have done the trick. It is
> weird, because the uuid in the config file matched the uuid of the raid
> volume shown by 'mdadm -D /dev/md3' and the uuid on each of the drives
> shown by 'mdadm -E /dev/sdc1'.

Interesting.

>
> The '--update=summaries' did not work. Assigning a new random uuid
> appears to have repaired whatever bit in the superblock was mucked up.
>
> Strange...
>
> Joe, thanks for your help. Find me at SC11, I'm buying you beers.
>

:)  see you there




-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

