* Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
@ 2012-09-30 9:21 EJ
2012-09-30 9:30 ` EJ Vincent
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: EJ @ 2012-09-30 9:21 UTC (permalink / raw)
To: linux-raid
Greetings,
I hope that I'm posting this in the right place, if not my apologies.
Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
stock version of mdadm--unfortunately I have no idea which version it was.
Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
to my array. The array itself is a nine (9) disk raid6 managed by mdadm.
I'm not sure whether this is pertinent, but getting 12.04 LTS to boot at all was an
exercise in patience. There seemed to be a race condition between the array's disks
initializing and 12.04's udev: boot would constantly drop me to a busybox shell and
try to degrade the known-working array.
Eventually, I had to go into /usr/share/initramfs-tools/scripts/mdadm-functions
and add "exit 1" to both degraded_arrays() and mountroot_fail() so that my
system could at the very least boot. I fear that the constant rebooting and
12.04's aggressive initramfs scripting have somehow damaged my array.
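For reference, the change amounted to something like this (reconstructed from memory
as a sketch; the stock function bodies still follow the added line, so don't take this
as the file's exact contents):

degraded_arrays()
{
        exit 1  # added: bail out before the degraded-array handling can run
}

mountroot_fail()
{
        exit 1  # added: likewise skip the failure handler that kept firing
}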
OK, back to the array itself. Here is some raw command output:
# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
/dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
mdadm: superblock on /dev/sdc1 doesn't match others - assembly aborted
I also tried # mdadm --auto-detect and found this in dmesg:
[ 676.998212] md: Autodetecting RAID arrays.
[ 676.998426] md: invalid raid superblock magic on sdc1
[ 676.998458] md: sdc1 does not have a valid v0.90 superblock, not importing!
[ 676.998870] md: invalid raid superblock magic on sde1
[ 676.998911] md: sde1 does not have a valid v0.90 superblock, not importing!
[ 676.999474] md: invalid raid superblock magic on sdb1
[ 676.999495] md: sdb1 does not have a valid v0.90 superblock, not importing!
[ 676.999703] md: invalid raid superblock magic on sdd1
[ 676.999732] md: sdd1 does not have a valid v0.90 superblock, not importing!
[ 677.000137] md: invalid raid superblock magic on sdf1
[ 677.000163] md: sdf1 does not have a valid v0.90 superblock, not importing!
[ 677.000566] md: invalid raid superblock magic on sdg1
[ 677.000586] md: sdg1 does not have a valid v0.90 superblock, not importing!
[ 677.000940] md: invalid raid superblock magic on sdh1
[ 677.000960] md: sdh1 does not have a valid v0.90 superblock, not importing!
[ 677.001356] md: invalid raid superblock magic on sdi1
[ 677.001375] md: sdi1 does not have a valid v0.90 superblock, not importing!
[ 677.001841] md: invalid raid superblock magic on sdj1
[ 677.001871] md: sdj1 does not have a valid v0.90 superblock, not importing!
[ 677.001933] md: Scanned 9 and added 0 devices.
[ 677.001938] md: autorun ...
[ 677.001941] md: ... autorun DONE.
Here are the disks themselves:
# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
Update Time : Sun Sep 30 04:34:27 2012
Checksum : 760485cb - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAA ('A' == active, '.' == missing)
# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 7e955e4e - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
Update Time : Sun Sep 30 07:26:43 2012
Checksum : cab36055 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 4941c455 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6190765b:200ff748:d50a75e3:597405c4
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 37446270 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7d707598:a8881376:531ae0c6:aac82909
Update Time : Sun Sep 30 07:26:43 2012
Checksum : c9ef1fe9 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 584d5c61 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
Update Time : Sun Sep 30 04:34:27 2012
Checksum : 22b9429c - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAA ('A' == active, '.' == missing)
# mdadm -E /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
Update Time : Sun Sep 30 04:34:27 2012
Checksum : a9748cf3 - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAA ('A' == active, '.' == missing)
I find it odd that the RAID level for some of the disks registers as
"-unknown-" and that their device roles have shifted to "spare".
Current system:
Linux ruby 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
x86_64 x86_64 GNU/Linux
Mdadm version:
mdadm - v3.2.3 - 23rd December 2011
I hope I've provided enough information. I would be more than happy to elaborate
or provide additional data if need be. Again, this array was functioning
normally up until a few hours ago. Am I able to salvage my data?
Thank you.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
@ 2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2 siblings, 0 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 9:30 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 5:21 AM, EJ wrote:
> Greetings,
>
> I hope that I'm posting this in the right place, if not my apologies.
>
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.
>
> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
> to my array. The array itself is a nine (9) disk raid6 managed by mdadm.
>
> I'm not sure this is pertinent information, but trying to get 12.04 LTS to boot
> was an exercise in patience. There was some sort of race condition possibly
> happening between the disks of the array initializing and 12.04's udev. It would
> constantly drop me to a busybox shell, trying to degrade the known-working
> array.
>
> Eventually, I had to go into /usr/share/initramfs-tools/scripts/mdadm-functions
> and type "exit 1" into both degraded_arrays() and mountroot_fail() so that my
> system could at the very least boot. I fear that the constant rebooting and
> 12.04's aggressive initramfs scripting has somehow damaged my array.
>
> Ok back to the array itself, here's some raw command data:
>
> # mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
> /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
> mdadm: superblock on /dev/sdc1 doesn't match others - assembly aborted
>
> I also tried # mdadm --auto-detect and found this in dmesg:
>
> [ 676.998212] md: Autodetecting RAID arrays.
> [ 676.998426] md: invalid raid superblock magic on sdc1
> [ 676.998458] md: sdc1 does not have a valid v0.90 superblock, not importing!
> [ 676.998870] md: invalid raid superblock magic on sde1
> [ 676.998911] md: sde1 does not have a valid v0.90 superblock, not importing!
> [ 676.999474] md: invalid raid superblock magic on sdb1
> [ 676.999495] md: sdb1 does not have a valid v0.90 superblock, not importing!
> [ 676.999703] md: invalid raid superblock magic on sdd1
> [ 676.999732] md: sdd1 does not have a valid v0.90 superblock, not importing!
> [ 677.000137] md: invalid raid superblock magic on sdf1
> [ 677.000163] md: sdf1 does not have a valid v0.90 superblock, not importing!
> [ 677.000566] md: invalid raid superblock magic on sdg1
> [ 677.000586] md: sdg1 does not have a valid v0.90 superblock, not importing!
> [ 677.000940] md: invalid raid superblock magic on sdh1
> [ 677.000960] md: sdh1 does not have a valid v0.90 superblock, not importing!
> [ 677.001356] md: invalid raid superblock magic on sdi1
> [ 677.001375] md: sdi1 does not have a valid v0.90 superblock, not importing!
> [ 677.001841] md: invalid raid superblock magic on sdj1
> [ 677.001871] md: sdj1 does not have a valid v0.90 superblock, not importing!
> [ 677.001933] md: Scanned 9 and added 0 devices.
> [ 677.001938] md: autorun ...
> [ 677.001941] md: ... autorun DONE.
>
> Here are the disks themselves:
>
> # mdadm -E /dev/sdb1
> /dev/sdb1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : 760485cb - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 5
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdc1
> /dev/sdc1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 7e955e4e - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdd1
> /dev/sdd1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : cab36055 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sde1
> /dev/sde1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 4941c455 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdf1
> /dev/sdf1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 6190765b:200ff748:d50a75e3:597405c4
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 37446270 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdg1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 7d707598:a8881376:531ae0c6:aac82909
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : c9ef1fe9 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdh1
> /dev/sdh1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 584d5c61 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdi1
> /dev/sdi1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : 22b9429c - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 8
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdj1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : a9748cf3 - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 7
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> I find it odd that the raid levels for some of the disks would register as
> "unknown" and that their device roles would be shifted to "spare".
>
> Current system:
>
> Linux ruby 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
> x86_64 x86_64 GNU/Linux
>
> Mdadm version:
>
> mdadm - v3.2.3 - 23rd December 2011
>
> I hope I've provided enough information. I would be more than happy to elaborate
> or provide additional data if need be. Again, this array was functioning
> normally up until a few hours ago. Am I able to salvage my data?
>
> Thank you.
>
> -EJ
>
Hello again. A quick follow-up: I've rebooted the server, and
/proc/mdstat now looks like this:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md6 : inactive sdh1[8](S) sdf1[4](S) sdg1[11](S) sde1[6](S) sdc1[1](S)
sdd1[0](S)
11721080016 blocks super 1.2
$ mdadm -D /dev/md6
mdadm: md device /dev/md6 does not appear to be active.
Although I'm still not sure how to proceed, I thought it best to
share this information with the list.
Thanks again,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
@ 2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2 siblings, 0 replies; 17+ messages in thread
From: Jan Ceuleers @ 2012-09-30 9:44 UTC (permalink / raw)
To: EJ; +Cc: linux-raid
On 09/30/2012 11:21 AM, EJ wrote:
> Greetings,
>
> I hope that I'm posting this in the right place, if not my apologies.
>
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.
If your 10.04 installation was up-to-date before you upgraded, you were
running the following version of mdadm:
mdadm - v2.6.7.1 - 15th October 2008
HTH, Jan
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
@ 2012-09-30 10:04 ` Mikael Abrahamsson
2012-09-30 19:20 ` EJ Vincent
2 siblings, 1 reply; 17+ messages in thread
From: Mikael Abrahamsson @ 2012-09-30 10:04 UTC (permalink / raw)
To: EJ; +Cc: linux-raid
On Sun, 30 Sep 2012, EJ wrote:
> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
> access to my array. The array itself is a nine (9) disk raid6 managed by
> mdadm.
What version of kernel for 12.04 were you running?
If you didn't upgrade your kernel, you might have been hit by the bug
described in:
<http://neil.brown.name/blog/20120615073245>
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 10:04 ` Mikael Abrahamsson
@ 2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy
0 siblings, 2 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 19:20 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
> On Sun, 30 Sep 2012, EJ wrote:
>
>> Fast forward to now, I've upgraded the system to 12.04 LTS and have
>> lost access to my array. The array itself is a nine (9) disk raid6
>> managed by mdadm.
>
> What version of kernel for 12.04 were you running?
>
> If you didn't upgrade your kernel, you might have been hit by the bug
> described in:
>
> <http://neil.brown.name/blog/20120615073245>
>
Hello,
I'm running the stock version of Ubuntu 12.04.0, using kernel
3.2.0-23-generic.
That link looks interesting. I'm not sure I triggered the bug exactly as
Mr. Neil Brown describes it, but I definitely have the symptoms: some (though not
all) of the disks show a RAID level of "-unknown-" and appear to be
spares.
I'm hesitant to re-create the array (using mdadm) because, according
to that blog post, the order of devices is important for RAID-6, and with
this being a 9-disk array and no record of the device order in my logs or
my memory, I have no idea what the proper order might be.
I do know that 1) I was using metadata version 1.2, 2) the array was not
degraded and subsequently 3) no disks were missing.
Am I over-estimating the importance of the order and should I proceed with
the re-creation, or should I wait for Neil himself to weigh in on the problem?
Thanks for all the responses, much appreciated.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:20 ` EJ Vincent
@ 2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:25 ` EJ Vincent
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy
1 sibling, 1 reply; 17+ messages in thread
From: Mathias Burén @ 2012-09-30 19:22 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
On 30 September 2012 20:20, EJ Vincent <ej@ejane.org> wrote:
> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>
>> On Sun, 30 Sep 2012, EJ wrote:
>>
>>> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
>>> access to my array. The array itself is a nine (9) disk raid6 managed by
>>> mdadm.
>>
>>
>> What version of kernel for 12.04 were you running?
>>
>> If you didn't upgrade your kernel, you might have been hit by the bug
>> described in:
>>
>> <http://neil.brown.name/blog/20120615073245>
>>
>
> Hello,
>
> I'm running the stock version of Ubuntu 12.04.0, using kernel
> 3.2.0-23-generic.
>
> That link looks interesting-- I'm not sure if I triggered the bug how Mr.
> Neil Brown describes it, but I definitely have symptoms on some (not all)
> the disks of RAID level "-unknown-" and devices appearing to be spares.
>
> I'm hesitant to re-create the array again (using mdadm) because according to
> that blog post, for RAID-6, the order of devices are important, and with
> this being a 9 disk array and no record of device order in logs or from my
> own memory, I have no idea what the proper order might be.
>
> I do know that 1) I was using metadata version 1.2, 2) the array was not
> degraded and subsequently 3) no disks were missing.
>
> Am I over-estimating the importance of the order and should proceed with the
> re-creation, or perhaps wait for Neil himself to weigh in the problem?
>
> Thanks for all the responses, much appreciated.
>
> -EJ
>
Can't you just boot off an older Ubuntu USB, install mdadm and scan /
assemble, see the device order?
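I mean roughly something like this from the live environment (just a sketch; the
second line prints whatever slot/role each member's superblock still records):

mdadm --examine --scan --verbose
for d in /dev/sd[b-j]1; do mdadm -E "$d" | grep -iE 'role|slot'; done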
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:22 ` Mathias Burén
@ 2012-09-30 19:25 ` EJ Vincent
2012-09-30 20:28 ` Phil Turmel
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 19:25 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 3:22 PM, Mathias Burén wrote:
> On 30 September 2012 20:20, EJ Vincent <ej@ejane.org> wrote:
>> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>> On Sun, 30 Sep 2012, EJ wrote:
>>>
>>>> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
>>>> access to my array. The array itself is a nine (9) disk raid6 managed by
>>>> mdadm.
>>>
>>> What version of kernel for 12.04 were you running?
>>>
>>> If you didn't upgrade your kernel, you might have been hit by the bug
>>> described in:
>>>
>>> <http://neil.brown.name/blog/20120615073245>
>>>
>> Hello,
>>
>> I'm running the stock version of Ubuntu 12.04.0, using kernel
>> 3.2.0-23-generic.
>>
>> That link looks interesting-- I'm not sure if I triggered the bug how Mr.
>> Neil Brown describes it, but I definitely have symptoms on some (not all)
>> the disks of RAID level "-unknown-" and devices appearing to be spares.
>>
>> I'm hesitant to re-create the array again (using mdadm) because according to
>> that blog post, for RAID-6, the order of devices are important, and with
>> this being a 9 disk array and no record of device order in logs or from my
>> own memory, I have no idea what the proper order might be.
>>
>> I do know that 1) I was using metadata version 1.2, 2) the array was not
>> degraded and subsequently 3) no disks were missing.
>>
>> Am I over-estimating the importance of the order and should proceed with the
>> re-creation, or perhaps wait for Neil himself to weigh in the problem?
>>
>> Thanks for all the responses, much appreciated.
>>
>> -EJ
>>
> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
> assemble, see the device order?
Hi Mathias,
I'm under the impression that the damage to the metadata has already been
done by 12.04, making recovery from an older version of Ubuntu
(10.04) impossible. Is this line of thinking flawed?
Thanks,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
@ 2012-09-30 19:50 ` Chris Murphy
1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2012-09-30 19:50 UTC (permalink / raw)
To: Linux RAID
On Sep 30, 2012, at 1:20 PM, EJ Vincent wrote:
> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>
>> <http://neil.brown.name/blog/20120615073245>
> 3.2.0-23-generic.
That kernel is inside the range affected by the bug, although I'm not sure whether that particular build actually has it. The symptoms match up, as you say.
>
> That link looks interesting-- I'm not sure if I triggered the bug how Mr. Neil Brown describes it, but I definitely have symptoms on some (not all) the disks of RAID level "-unknown-" and devices appearing to be spares.
>
> I'm hesitant to re-create the array again (using mdadm) because according to that blog post, for RAID-6, the order of devices are important, and with this being a 9 disk array and no record of device order in logs or from my own memory, I have no idea what the proper order might be.
You'll have to iterate, it sounds like, if you have nothing else to go on. Faster to iterate and try again than to blow away the RAID and restore from backup.
> I do know that 1) I was using metadata version 1.2, 2) the array was not degraded and subsequently 3) no disks were missing.
>
> Am I over-estimating the importance of the order and should proceed with the re-creation, or perhaps wait for Neil himself to weigh in the problem?
Either. Just make sure you're using --assume-clean and don't mount it, to prevent either a resync or changes to the file system. The "echo check" (read-only scrub) test described in the blog entry will rather clearly tell you whether you got the order of the disks correct. The blog entry is pretty detailed. The question I have remaining is whether there's some way for you to cheat and improve your chances of getting the disk order correct.
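As a rough sketch of one iteration (md0 is just a placeholder for whatever you call the trial array):

# after a trial --create --assume-clean, do NOT mount the result
echo check > /sys/block/md0/md/sync_action      # start a read-only scrub
cat /proc/mdstat                                # watch progress
cat /sys/block/md0/md/mismatch_cnt              # a wrong order shows up as a huge mismatch count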
Chris Murphy
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:25 ` EJ Vincent
@ 2012-09-30 20:28 ` Phil Turmel
2012-09-30 23:23 ` EJ Vincent
0 siblings, 1 reply; 17+ messages in thread
From: Phil Turmel @ 2012-09-30 20:28 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
On 09/30/2012 03:25 PM, EJ Vincent wrote:
> On 9/30/2012 3:22 PM, Mathias Burén wrote:
>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
>> assemble, see the device order?
>
> Hi Mathias,
>
> I'm under the impression that damage to the metadata has already been
> done by 12.04, making a recovery from an older version of Ubuntu
> (10.04), impossible. Is this line of thinking, flawed?
Your impression is correct. Permanent damage to the metadata was done.
You *must* re-create your array.
However, you *cannot* use your new version of mdadm, as it will get the
data offset wrong. Your first report showed a data offset of 272.
Newer versions of mdadm default to 2048. You *must* perform all of your
"mdadm --create --assume-clean" permutations with 10.04.
Do you have *any* dmesg output from the old system? Or dmesg from the
very first boot under 12.04? That might have enough information to
shorten your search.
In the future, you should record your setup by saving the output of
"mdadm -D" on each array, "mdadm -E" on each member device, and the
output of "ls -l /dev/disk/by-id/"
Or try my documentation script "lsdrv". [1]
HTH,
Phil
[1] http://github.com/pturmel/lsdrv
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 20:28 ` Phil Turmel
@ 2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
2012-10-02 2:15 ` NeilBrown
0 siblings, 2 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 23:23 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid
On 9/30/2012 4:28 PM, Phil Turmel wrote:
> On 09/30/2012 03:25 PM, EJ Vincent wrote:
>> On 9/30/2012 3:22 PM, Mathias Burén wrote:
>>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
>>> assemble, see the device order?
>> Hi Mathias,
>>
>> I'm under the impression that damage to the metadata has already been
>> done by 12.04, making a recovery from an older version of Ubuntu
>> (10.04), impossible. Is this line of thinking, flawed?
> Your impression is correct. Permanent damage to the metadata was done.
> You *must* re-create your array.
>
> However, you *cannot* use your new version of mdadm, as it will get the
> data offset wrong. Your first report showed a data offset of 272.
> Newer versions of mdadm default to 2048. You *must* perform all of your
> "mdadm --create --assume-clean" permutations with 10.04.
>
> Do you have *any* dmesg output from the old system? Or dmesg from the
> very first boot under 12.04? That might have enough information to
> shorten your search.
>
> In the future, you should record your setup by saving the output of
> "mdadm -D" on each array, "mdadm -E" on each member device, and the
> output of "ls -l /dev/disk/by-id/"
>
> Or try my documentation script "lsdrv". [1]
>
> HTH,
>
> Phil
>
> [1] http://github.com/pturmel/lsdrv
>
Hi Phil,
Unfortunately I don't have any dmesg log from the old system or the
first boot under 12.04.
Getting my system to boot at all under 12.04 was chaotic enough, with
the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
ravaging my array and then dropping me to a busybox shell over and over
again. I didn't think to record the very first error.
Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and
/dev/sdj1 don't have the RAID level "-unknown-", nor are they
labeled as spares. They are, in fact, labeled clean and appear
*different* from the others.
Could these disks still contain my metadata from 10.04? I recall that during
my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
that I could drop a SATA CD/DVD-RW drive into the slot.
I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
having to do permutations-- 9! (factorial) would mean 362,880
combinations. *gasp*
Many thanks for all your comments and insights.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 23:23 ` EJ Vincent
@ 2012-10-01 12:40 ` Phil Turmel
2012-10-01 17:14 ` EJ Vincent
2012-10-02 2:15 ` NeilBrown
1 sibling, 1 reply; 17+ messages in thread
From: Phil Turmel @ 2012-10-01 12:40 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
Hi EJ,
On 09/30/2012 07:23 PM, EJ Vincent wrote:
> On 9/30/2012 4:28 PM, Phil Turmel wrote:
>> Do you have *any* dmesg output from the old system? Or dmesg from the
>> very first boot under 12.04? That might have enough information to
>> shorten your search.
>>
>> In the future, you should record your setup by saving the output of
>> "mdadm -D" on each array, "mdadm -E" on each member device, and the
>> output of "ls -l /dev/disk/by-id/"
>>
>> Or try my documentation script "lsdrv". [1]
>>
>> HTH,
>>
>> Phil
>>
>> [1] http://github.com/pturmel/lsdrv
>
> Hi Phil,
>
> Unfortunately I don't have any dmesg log from the old system or the
> first boot under 12.04.
>
> Getting my system to boot at all under 12.04 was chaotic enough, with
> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
> ravaging my array and then dropping me to a busybox shell over and over
> again. I didn't think to record the very first error.
I'm not prepared to condemn the 12.04 initramfs--I really don't think it
is a factor in this crisis. The critical part is the degraded reboot bug.
> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
> labeled as spares. They are in fact, labeled clean and appear
> *different* from the others.
>
> Could these disks still contain my metadata from 10.04? I recall during
> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
> that I could drop in a SATA CD/DVDRW into the slot.
Leaving disks unpowered sounds like a key factor in your crisis. Raid6
can't operate with more than two missing, and won't assemble if any disk
disappears between shutdown and the next boot. (Must be forced.)
So your array would only partially assemble under 12.04 due to
deliberately missing drives, then you rebooted with a kernel that has a
problem with that scenario.
The disks very likely do have useful metadata, but no disk has all of
it. It might reduce the permutations you need to try. If you share
more information about your system layout, some educated first guesses
might be possible, too. The output of "mdadm -E" for every drive, and
lsdrv for an overview.
> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
> having to do permutations-- 9! (factorial) would mean 362,880
> combinations. *gasp*
Phil
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-01 12:40 ` Phil Turmel
@ 2012-10-01 17:14 ` EJ Vincent
0 siblings, 0 replies; 17+ messages in thread
From: EJ Vincent @ 2012-10-01 17:14 UTC (permalink / raw)
To: linux-raid
On 10/1/2012 8:40 AM, Phil Turmel wrote:
> Hi EJ,
>
> On 09/30/2012 07:23 PM, EJ Vincent wrote:
>> On 9/30/2012 4:28 PM, Phil Turmel wrote:
>>> Do you have *any* dmesg output from the old system? Or dmesg from the
>>> very first boot under 12.04? That might have enough information to
>>> shorten your search.
>>>
>>> In the future, you should record your setup by saving the output of
>>> "mdadm -D" on each array, "mdadm -E" on each member device, and the
>>> output of "ls -l /dev/disk/by-id/"
>>>
>>> Or try my documentation script "lsdrv". [1]
>>>
>>> HTH,
>>>
>>> Phil
>>>
>>> [1] http://github.com/pturmel/lsdrv
>> Hi Phil,
>>
>> Unfortunately I don't have any dmesg log from the old system or the
>> first boot under 12.04.
>>
>> Getting my system to boot at all under 12.04 was chaotic enough, with
>> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
>> ravaging my array and then dropping me to a busybox shell over and over
>> again. I didn't think to record the very first error.
> I'm not prepared to condemn the 12.04 initramfs--I really don't think it
> is a factor in this crisis. The critical part is the degraded reboot bug.
>
>> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
>> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
>> labeled as spares. They are in fact, labeled clean and appear
>> *different* from the others.
>>
>> Could these disks still contain my metadata from 10.04? I recall during
>> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
>> that I could drop in a SATA CD/DVDRW into the slot.
> Leaving disks unpowered sounds like a key factor in your crisis. Raid6
> can't operate with more than two missing, and won't assemble if any disk
> disappears between shutdown and the next boot. (Must be forced.)
>
> So your array would only partially assemble under 12.04 due to
> deliberately missing drives, then you rebooted with a kernel that has a
> problem with that scenario.
>
> The disks very likely do have useful metadata, but no disk has all of
> it. It might reduce the permutations you need to try. If you share
> more information about your system layout, some educated first guesses
> might be possible, too. The output of "mdadm -E" for every drive, and
> lsdrv for an overview.
>
>> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
>> having to do permutations-- 9! (factorial) would mean 362,880
>> combinations. *gasp*
> Phil
>
Hi Phil,
Here's the information you requested.
The server has 10 disks: a dedicated 500GB disk for the operating system
(which Ubuntu 10.04.4 has labeled /dev/sdd), and 9 x 2TB disks
(/dev/sd[a,b,c,e,f,g,h,i,j]):
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdh: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdi: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
The devices are spread amongst an on-board SATA controller, MCP78S
GeForce AHCI, and two SiI 3124 PCI-X SATA controllers.
The layout is as follows: 5 disks are attached to the on-board
controller, 3 attached to one SiI 3124 controller, and 2 attached to the
other SiI 3124 controller.
I've loaded your lsdrv script, here are the results:
PCI [pata_amd] 00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce
8200] IDE (rev a1)
scsi 0:x:x:x [Empty]
scsi 1:x:x:x [Empty]
PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI
3124 PCI-X Serial ATA Controller (rev 02)
scsi 2:0:0:0 ATA ST2000DL003-9VT1
sda 1.82t [8:0] Empty/Unknown
sda1 1.82t [8:1] Empty/Unknown
scsi 5:0:0:0 ATA ST2000DL003-9VT1
sdb 1.82t [8:16] Empty/Unknown
sdb1 1.82t [8:17] Empty/Unknown
scsi 7:0:0:0 ATA ST2000DL003-9VT1
sdc 1.82t [8:32] Empty/Unknown
sdc1 1.82t [8:33] Empty/Unknown
scsi 9:x:x:x [Empty]
PCI [ahci] 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce
8200] AHCI Controller (rev a2)
scsi 3:0:0:0 ATA WDC WD5000AAKS-2
sdd 465.76g [8:48] Empty/Unknown
sdd1 237.00m [8:49] Empty/Unknown
Mounted as /dev/sdd1 @ /boot
sdd2 3.73g [8:50] Empty/Unknown
sdd3 23.28g [8:51] Empty/Unknown
Mounted as /dev/disk/by-uuid/65a128d3-3e2e-487a-a36b-11cbe5530429 @ /
sdd4 438.52g [8:52] Empty/Unknown
scsi 4:0:0:0 ATA ST2000DL003-9VT1
sde 1.82t [8:64] Empty/Unknown
sde1 1.82t [8:65] Empty/Unknown
scsi 6:0:0:0 ATA ST32000542AS
sdf 1.82t [8:80] Empty/Unknown
sdf1 1.82t [8:81] Empty/Unknown
scsi 8:0:0:0 ATA ST32000542AS
sdg 1.82t [8:96] Empty/Unknown
sdg1 1.82t [8:97] Empty/Unknown
scsi 10:0:0:0 ATA ST2000DL003-9VT1
sdh 1.82t [8:112] Empty/Unknown
sdh1 1.82t [8:113] Empty/Unknown
scsi 11:x:x:x [Empty]
PCI [sata_sil24] 08:04.0 RAID bus controller: Silicon Image, Inc. SiI
3124 PCI-X Serial ATA Controller (rev 02)
scsi 12:0:0:0 ATA ST2000DL003-9VT1
sdi 1.82t [8:128] Empty/Unknown
sdi1 1.82t [8:129] Empty/Unknown
scsi 13:0:0:0 ATA ST2000DL003-9VT1
sdj 1.82t [8:144] Empty/Unknown
sdj1 1.82t [8:145] Empty/Unknown
scsi 14:x:x:x [Empty]
scsi 15:x:x:x [Empty]
Here is what mdadm -E looks like for each member of the array, now under
Ubuntu 10.04.4:
# mdadm -E /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6190765b:200ff748:d50a75e3:597405c4
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 37454049 - correct
Events : 1
Array Slot : 4 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7d707598:a8881376:531ae0c6:aac82909
Update Time : Sun Sep 30 19:13:16 2012
Checksum : c9effdc2 - correct
Events : 1
Array Slot : 11 (empty, empty, failed, failed, empty, failed,
empty, failed, empty, failed, failed, empty, failed... <shortened for
readability>)
Array State : 378 failed
# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
Update Time : Sun Sep 30 00:34:27 2012
Checksum : 760485cb - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 7 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuUuuu 3 failed
# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 584e3a3a - correct
Events : 1
Array Slot : 8 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 7e963c27 - correct
Events : 1
Array Slot : 1 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
Update Time : Sun Sep 30 19:13:16 2012
Checksum : cab43e2e - correct
Events : 1
Array Slot : 0 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 4942a22e - correct
Events : 1
Array Slot : 6 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
Update Time : Sun Sep 30 00:34:27 2012
Checksum : 22b9429c - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 10 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuuuuU 3 failed
# mdadm -E /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
Update Time : Sun Sep 30 00:34:27 2012
Checksum : a9748cf3 - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 9 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuuuUu 3 failed
I'd be happy to also supply a dump of 'lshw', which I believe is similar
to 'lsdrv', if that would be useful to you. The system is back on
10.04.4 LTS, and is using mdadm version 2.6.7.1.
Thanks for your continued input and assistance. Much appreciated.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
@ 2012-10-02 2:15 ` NeilBrown
2012-10-02 3:53 ` EJ Vincent
1 sibling, 1 reply; 17+ messages in thread
From: NeilBrown @ 2012-10-02 2:15 UTC (permalink / raw)
To: EJ Vincent; +Cc: Phil Turmel, linux-raid
On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@ejane.org> wrote:
> On 9/30/2012 4:28 PM, Phil Turmel wrote:
> > On 09/30/2012 03:25 PM, EJ Vincent wrote:
> >> On 9/30/2012 3:22 PM, Mathias Burén wrote:
> >>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
> >>> assemble, see the device order?
> >> Hi Mathias,
> >>
> >> I'm under the impression that damage to the metadata has already been
> >> done by 12.04, making a recovery from an older version of Ubuntu
> >> (10.04), impossible. Is this line of thinking, flawed?
> > Your impression is correct. Permanent damage to the metadata was done.
> > You *must* re-create your array.
> >
> > However, you *cannot* use your new version of mdadm, as it will get the
> > data offset wrong. Your first report showed a data offset of 272.
> > Newer versions of mdadm default to 2048. You *must* perform all of your
> > "mdadm --create --assume-clean" permutations with 10.04.
> >
> > Do you have *any* dmesg output from the old system? Or dmesg from the
> > very first boot under 12.04? That might have enough information to
> > shorten your search.
> >
> > In the future, you should record your setup by saving the output of
> > "mdadm -D" on each array, "mdadm -E" on each member device, and the
> > output of "ls -l /dev/disk/by-id/"
> >
> > Or try my documentation script "lsdrv". [1]
> >
> > HTH,
> >
> > Phil
> >
> > [1] http://github.com/pturmel/lsdrv
> >
>
> Hi Phil,
>
> Unfortunately I don't have any dmesg log from the old system or the
> first boot under 12.04.
>
> Getting my system to boot at all under 12.04 was chaotic enough, with
> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
> ravaging my array and then dropping me to a busybox shell over and over
> again. I didn't think to record the very first error.
>
> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
> labeled as spares. They are in fact, labeled clean and appear
> *different* from the others.
>
> Could these disks still contain my metadata from 10.04? I recall during
> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
> that I could drop in a SATA CD/DVDRW into the slot.
>
> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
> having to do permutations-- 9! (factorial) would mean 362,880
> combinations. *gasp*
You might be able to avoid the 9! combinations, which could take a while ...
4 days if you could test one per second.
Try this:
for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
skip=4256 | od -D | head -n1; done
This reads the 'dev_number' field out of the metadata on each device.
This should not have been corrupted by the bug.
You might want some other pattern in place of "/dev/sd?1" - it needs to match
all the devices in your array.
Then on one of the devices which doesn't have corrupted metadata, run
dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d
where $COUNT is one more than the largest number that was reported in the
"dev_number" values reported above.
Now for each device, take the dev_number that was reported and use it as an
index into the list of numbers produced by the second command; the number at
that index is the role of the device in the array, i.e. its position in the
list.
So after making an array of 5 'loop' devices in a non-obvious order, and
failing a device and re-adding it:
# for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/loop0 0000000 3
/dev/loop1 0000000 4
/dev/loop2 0000000 1
/dev/loop3 0000000 0
/dev/loop4 0000000 5
and
# dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d
0000000 0 1 65534 3 4 2
0000014
So /dev/loop0 has dev_number '3'; entry 3 in the list is '3', so it is device 3.
/dev/loop1 has dev_number '4'; entry 4 is '4', so it is device 4.
/dev/loop4 has dev_number '5'; entry 5 is '2', so it is device 2.
And so on.
So we can reconstruct the order of devices:
/dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1
Note that a '65534' entry in the list means there is no device with that
dev_number, i.e. no device is number '2', and looking at the list of
dev_numbers above confirms that.
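(A rough bash sketch of that lookup, untested here and assuming the same
v1.2 metadata offsets as the dd commands above, using the loop devices
from this example:)

# role table, read once from a member whose metadata is intact
roles=( $(dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -An -d) )
# print each device's dev_number and the role (position) it maps to
for i in /dev/loop[01234]; do
    n=$(dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -An -D | tr -d ' ')
    echo "$i dev_number=$n role=${roles[$n]}"
done

Sorting that output by role gives the device order directly.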
You should be able to perform the same steps to recover the correct order to
try creating the array.
NeilBrown
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-02 2:15 ` NeilBrown
@ 2012-10-02 3:53 ` EJ Vincent
2012-10-02 5:04 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-10-02 3:53 UTC (permalink / raw)
To: NeilBrown; +Cc: Phil Turmel, linux-raid
On 10/1/2012 10:15 PM, NeilBrown wrote:
> [...]
>
> Try this:
>
> for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
>     skip=4256 | od -D | head -n1; done
>
> [...]
>
> Then on one of the devices which doesn't have corrupted metadata, run
>
> dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d
>
> [...]
>
Hi Neil,
Thank you so much for taking the time to help me through this.
Here's what I've come up with, per your instructions:
/dev/sda1 0000000 4
/dev/sdb1 0000000 11
/dev/sdc1 0000000 7
/dev/sde1 0000000 8
/dev/sdf1 0000000 1
/dev/sdg1 0000000 0
/dev/sdh1 0000000 6
/dev/sdi1 0000000 10
/dev/sdj1 0000000 9
dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
0000000 0 1 65534 65534 2 65534 4 5
0000020 6 7 8 3
0000030
Mind doing a sanity check for me?
Based on the above information, one such possible device order is:
/dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
/dev/sdc1 /dev/sde1
where * represents the three unknown devices marked by 65534?
Once I have your blessing, would I then proceed to:
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
--metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
/dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
and this is non-destructive, so I can attempt different orders?
Again, thank you for the help.
Best wishes,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-02 3:53 ` EJ Vincent
@ 2012-10-02 5:04 ` NeilBrown
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2012-10-02 5:04 UTC (permalink / raw)
To: EJ Vincent; +Cc: Phil Turmel, linux-raid
On Mon, 01 Oct 2012 23:53:08 -0400 EJ Vincent <ej@ejane.org> wrote:
> [...]
>
> Hi Neil,
>
> Thank you so much for taking the time to help me through this.
>
> Here's what I've come up with, per your instructions:
>
> /dev/sda1 0000000 4
> /dev/sdb1 0000000 11
> /dev/sdc1 0000000 7
> /dev/sde1 0000000 8
> /dev/sdf1 0000000 1
> /dev/sdg1 0000000 0
> /dev/sdh1 0000000 6
> /dev/sdi1 0000000 10
> /dev/sdj1 0000000 9
>
> dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
> 0000000 0 1 65534 65534 2 65534 4 5
> 0000020 6 7 8 3
> 0000030
>
> Mind doing a sanity check for me?
>
> Based on the above information, one such possible device order is:
>
> /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
> /dev/sdc1 /dev/sde1
>
> where * represents the three unknown devices marked by 65534?
Nope. The 65534 entries should never come into it.
sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1
e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in
position 8.
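Spelled out for all nine devices, using the dev_number values and the role
table quoted above:

  sdg1: dev_number  0 -> entry  0 = 0, position 0
  sdf1: dev_number  1 -> entry  1 = 1, position 1
  sda1: dev_number  4 -> entry  4 = 2, position 2
  sdb1: dev_number 11 -> entry 11 = 3, position 3
  sdh1: dev_number  6 -> entry  6 = 4, position 4
  sdc1: dev_number  7 -> entry  7 = 5, position 5
  sde1: dev_number  8 -> entry  8 = 6, position 6
  sdj1: dev_number  9 -> entry  9 = 7, position 7
  sdi1: dev_number 10 -> entry 10 = 8, position 8

The 65534 entries (indices 2, 3 and 5) are simply dev_numbers that no
longer exist.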
>
> Once I have your blessing, would I then proceed to:
>
> mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
> --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
> /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
>
> and this is non-destructive, so I can attempt different orders?
Yes. Well, it destroys the metadata so make sure you have a copy of the "-E"
for each device, and it wouldn't hurt to run that second 'dd' command on
every device and keep that just in case.
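For the record, a rough sketch of that bookkeeping, plus the create command
EJ proposed with the order corrected as above. The backup path is only an
illustration, the device pattern needs adjusting to match only the array
members, and the create must be run with the 10.04 mdadm so the 272-sector
data offset is preserved:

# illustration only: keep a copy of -E output and the raw role table per member
mkdir -p /root/md-backup
for i in /dev/sd?1; do
    b=$(basename $i)
    mdadm -E $i > /root/md-backup/$b.examine
    dd 2> /dev/null if=$i bs=2 count=12 skip=2176 | od -d > /root/md-backup/$b.roles
done

# EJ's proposed create, with the corrected device order
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 \
    --metadata=1.2 --chunk=512 \
    /dev/sdg1 /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdh1 \
    /dev/sdc1 /dev/sde1 /dev/sdj1 /dev/sdi1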
NeilBrown
>
> Again, thank you for the help.
>
> Best wishes,
>
> -EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED]
2012-10-02 5:04 ` NeilBrown
@ 2012-10-02 8:34 ` EJ Vincent
2012-10-02 12:18 ` Phil Turmel
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-10-02 8:34 UTC (permalink / raw)
To: NeilBrown; +Cc: Phil Turmel, linux-raid
On 10/2/2012 1:04 AM, NeilBrown wrote:
> [...]
>
>> Based on the above information, one such possible device order is:
>>
>> /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
>> /dev/sdc1 /dev/sde1
>>
>> where * represents the three unknown devices marked by 65534?
> Nope. The 65534 entries should never come into it.
>
> sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1
>
> e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in
> position 8.
>
>> Once I have your blessing, would I then proceed to:
>>
>> mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
>> --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
>> /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
>>
>> and this is non-destructive, so I can attempt different orders?
> Yes. Well, it destroys the metadata so make sure you have a copy of the "-E"
> for each device, and it wouldn't hurt to run that second 'dd' command on
> every device and keep that just in case.
>
> NeilBrown
>
>> Again, thank you for the help.
>>
>> Best wishes,
>>
>> -EJ
Neil,
I've successfully re-created the array using the corrected device order
you specified.
For documentation purposes:
I immediately started an 'xfs_check', but due to the size of the
filesystem it quickly (in under 90 seconds) consumed all available memory
on the server (16GB). I instead used 'xfs_repair -n', which ran for
about one minute before returning me to a shell (no errors reported):
(-n No modify mode. Specifies that xfs_repair should not modify the
filesystem but should only scan the filesystem and indicate what repairs
would have been made.)
I then set the sync_action under /sys/block/md0/md/ to 'check' and also
increased the stripe_cache_size to something less modest: 4096, up from
256. I'm monitoring /sys/block/md0/md/mismatch_cnt with tail -f and so
far it has stayed at 0, a good sign for sure. I'm well on my way to a
complete recovery (about 25% checked as of this writing).
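For the record, those steps amount to roughly the following, assuming the
filesystem sits directly on /dev/md0 as in this thread:

xfs_repair -n /dev/md0                            # read-only scan; reports, never modifies
echo check > /sys/block/md0/md/sync_action        # start a redundancy check
echo 4096 > /sys/block/md0/md/stripe_cache_size   # up from the default 256
cat /sys/block/md0/md/mismatch_cnt                # re-check periodically; should stay 0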
I want to thank you again Neil (and the rest of the linux-raid mailing
list) for the absolutely flawless and expert support you've provided.
Best wishes,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED]
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
@ 2012-10-02 12:18 ` Phil Turmel
0 siblings, 0 replies; 17+ messages in thread
From: Phil Turmel @ 2012-10-02 12:18 UTC (permalink / raw)
To: EJ Vincent; +Cc: NeilBrown, linux-raid
On 10/02/2012 04:34 AM, EJ Vincent wrote:
> Neil,
>
> I've successfully re-created the array using the corrected device order
> you specified.
Great news. I'm tucking Neil's procedure away in my toolbox...
Phil
Thread overview: 17+ messages
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:25 ` EJ Vincent
2012-09-30 20:28 ` Phil Turmel
2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
2012-10-01 17:14 ` EJ Vincent
2012-10-02 2:15 ` NeilBrown
2012-10-02 3:53 ` EJ Vincent
2012-10-02 5:04 ` NeilBrown
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
2012-10-02 12:18 ` Phil Turmel
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy