* Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
@ 2012-09-30 9:21 EJ
2012-09-30 9:30 ` EJ Vincent
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: EJ @ 2012-09-30 9:21 UTC (permalink / raw)
To: linux-raid
Greetings,
I hope that I'm posting this in the right place, if not my apologies.
Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
stock version of mdadm--unfortunately I have no idea which version it was.
Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
to my array. The array itself is a nine (9) disk raid6 managed by mdadm.
I'm not sure whether this is pertinent, but getting 12.04 LTS to boot at all was an
exercise in patience. There seemed to be a race condition between the array's disks
initializing and 12.04's udev: boot would constantly drop me to a busybox shell and
try to degrade the known-working array.
Eventually, I had to go into /usr/share/initramfs-tools/scripts/mdadm-functions
and add "exit 1" to both degraded_arrays() and mountroot_fail() so that my
system could at the very least boot. I fear that the constant rebooting and
12.04's aggressive initramfs scripting have somehow damaged my array.
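For reference, the change amounted to something like this (reconstructed from memory
as a sketch; the stock function bodies still follow the added line, so don't take this
as the file's exact contents):

degraded_arrays()
{
        exit 1  # added: bail out before the degraded-array handling can run
}

mountroot_fail()
{
        exit 1  # added: likewise skip the failure handler that kept firing
}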
OK, back to the array itself. Here is some raw command output:
# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
/dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
mdadm: superblock on /dev/sdc1 doesn't match others - assembly aborted
I also tried # mdadm --auto-detect and found this in dmesg:
[ 676.998212] md: Autodetecting RAID arrays.
[ 676.998426] md: invalid raid superblock magic on sdc1
[ 676.998458] md: sdc1 does not have a valid v0.90 superblock, not importing!
[ 676.998870] md: invalid raid superblock magic on sde1
[ 676.998911] md: sde1 does not have a valid v0.90 superblock, not importing!
[ 676.999474] md: invalid raid superblock magic on sdb1
[ 676.999495] md: sdb1 does not have a valid v0.90 superblock, not importing!
[ 676.999703] md: invalid raid superblock magic on sdd1
[ 676.999732] md: sdd1 does not have a valid v0.90 superblock, not importing!
[ 677.000137] md: invalid raid superblock magic on sdf1
[ 677.000163] md: sdf1 does not have a valid v0.90 superblock, not importing!
[ 677.000566] md: invalid raid superblock magic on sdg1
[ 677.000586] md: sdg1 does not have a valid v0.90 superblock, not importing!
[ 677.000940] md: invalid raid superblock magic on sdh1
[ 677.000960] md: sdh1 does not have a valid v0.90 superblock, not importing!
[ 677.001356] md: invalid raid superblock magic on sdi1
[ 677.001375] md: sdi1 does not have a valid v0.90 superblock, not importing!
[ 677.001841] md: invalid raid superblock magic on sdj1
[ 677.001871] md: sdj1 does not have a valid v0.90 superblock, not importing!
[ 677.001933] md: Scanned 9 and added 0 devices.
[ 677.001938] md: autorun ...
[ 677.001941] md: ... autorun DONE.
Here are the disks themselves:
# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
Update Time : Sun Sep 30 04:34:27 2012
Checksum : 760485cb - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAA ('A' == active, '.' == missing)
# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 7e955e4e - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
Update Time : Sun Sep 30 07:26:43 2012
Checksum : cab36055 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 4941c455 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6190765b:200ff748:d50a75e3:597405c4
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 37446270 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7d707598:a8881376:531ae0c6:aac82909
Update Time : Sun Sep 30 07:26:43 2012
Checksum : c9ef1fe9 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
Update Time : Sun Sep 30 07:26:43 2012
Checksum : 584d5c61 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
# mdadm -E /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
Update Time : Sun Sep 30 04:34:27 2012
Checksum : 22b9429c - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAA ('A' == active, '.' == missing)
# mdadm -E /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 19:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
Update Time : Sun Sep 30 04:34:27 2012
Checksum : a9748cf3 - correct
Events : 2474296
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAA ('A' == active, '.' == missing)
I find it odd that the RAID level for some of the disks registers as
"-unknown-" and that their device roles have shifted to "spare".
Current system:
Linux ruby 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
x86_64 x86_64 GNU/Linux
Mdadm version:
mdadm - v3.2.3 - 23rd December 2011
I hope I've provided enough information. I would be more than happy to elaborate
or provide additional data if need be. Again, this array was functioning
normally up until a few hours ago. Am I able to salvage my data?
Thank you.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
@ 2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2 siblings, 0 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 9:30 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 5:21 AM, EJ wrote:
> Greetings,
>
> I hope that I'm posting this in the right place, if not my apologies.
>
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.
>
> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
> to my array. The array itself is a nine (9) disk raid6 managed by mdadm.
>
> I'm not sure this is pertinent information, but trying to get 12.04 LTS to boot
> was an exercise in patience. There was some sort of race condition possibly
> happening between the disks of the array initializing and 12.04's udev. It would
> constantly drop me to a busybox shell, trying to degrade the known-working
> array.
>
> Eventually, I had to go into /usr/share/initramfs-tools/scripts/mdadm-functions
> and type "exit 1" into both degraded_arrays() and mountroot_fail() so that my
> system could at the very least boot. I fear that the constant rebooting and
> 12.04's aggressive initramfs scripting has somehow damaged my array.
>
> Ok back to the array itself, here's some raw command data:
>
> # mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
> /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
> mdadm: superblock on /dev/sdc1 doesn't match others - assembly aborted
>
> I also tried # mdadm --auto-detect and found this in dmesg:
>
> [ 676.998212] md: Autodetecting RAID arrays.
> [ 676.998426] md: invalid raid superblock magic on sdc1
> [ 676.998458] md: sdc1 does not have a valid v0.90 superblock, not importing!
> [ 676.998870] md: invalid raid superblock magic on sde1
> [ 676.998911] md: sde1 does not have a valid v0.90 superblock, not importing!
> [ 676.999474] md: invalid raid superblock magic on sdb1
> [ 676.999495] md: sdb1 does not have a valid v0.90 superblock, not importing!
> [ 676.999703] md: invalid raid superblock magic on sdd1
> [ 676.999732] md: sdd1 does not have a valid v0.90 superblock, not importing!
> [ 677.000137] md: invalid raid superblock magic on sdf1
> [ 677.000163] md: sdf1 does not have a valid v0.90 superblock, not importing!
> [ 677.000566] md: invalid raid superblock magic on sdg1
> [ 677.000586] md: sdg1 does not have a valid v0.90 superblock, not importing!
> [ 677.000940] md: invalid raid superblock magic on sdh1
> [ 677.000960] md: sdh1 does not have a valid v0.90 superblock, not importing!
> [ 677.001356] md: invalid raid superblock magic on sdi1
> [ 677.001375] md: sdi1 does not have a valid v0.90 superblock, not importing!
> [ 677.001841] md: invalid raid superblock magic on sdj1
> [ 677.001871] md: sdj1 does not have a valid v0.90 superblock, not importing!
> [ 677.001933] md: Scanned 9 and added 0 devices.
> [ 677.001938] md: autorun ...
> [ 677.001941] md: ... autorun DONE.
>
> Here are the disks themselves:
>
> # mdadm -E /dev/sdb1
> /dev/sdb1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : 760485cb - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 5
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdc1
> /dev/sdc1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 7e955e4e - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdd1
> /dev/sdd1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : cab36055 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sde1
> /dev/sde1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 4941c455 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdf1
> /dev/sdf1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 6190765b:200ff748:d50a75e3:597405c4
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 37446270 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdg1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 7d707598:a8881376:531ae0c6:aac82909
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : c9ef1fe9 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdh1
> /dev/sdh1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
>
> Update Time : Sun Sep 30 07:26:43 2012
> Checksum : 584d5c61 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdi1
> /dev/sdi1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : 22b9429c - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 8
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdj1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
> Name : ruby:6 (local to host ruby)
> Creation Time : Mon Apr 11 19:40:25 2011
> Raid Level : raid6
> Raid Devices : 9
>
> Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
> Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
>
> Update Time : Sun Sep 30 04:34:27 2012
> Checksum : a9748cf3 - correct
> Events : 2474296
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : Active device 7
> Array State : AAAAAAAAA ('A' == active, '.' == missing)
>
> I find it odd that the raid levels for some of the disks would register as
> "unknown" and that their device roles would be shifted to "spare".
>
> Current system:
>
> Linux ruby 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
> x86_64 x86_64 GNU/Linux
>
> Mdadm version:
>
> mdadm - v3.2.3 - 23rd December 2011
>
> I hope I've provided enough information. I would be more than happy to elaborate
> or provide additional data if need be. Again, this array was functioning
> normally up until a few hours ago. Am I able to salvage my data?
>
> Thank you.
>
> -EJ
>
Hello again. A quick follow-up: I've rebooted the server, and
/proc/mdstat now looks like this:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md6 : inactive sdh1[8](S) sdf1[4](S) sdg1[11](S) sde1[6](S) sdc1[1](S)
sdd1[0](S)
11721080016 blocks super 1.2
$ mdadm -D /dev/md6
mdadm: md device /dev/md6 does not appear to be active.
Although I'm still not sure how to proceed, I thought it best to
share this information with the list.
Thanks again,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
@ 2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2 siblings, 0 replies; 17+ messages in thread
From: Jan Ceuleers @ 2012-09-30 9:44 UTC (permalink / raw)
To: EJ; +Cc: linux-raid
On 09/30/2012 11:21 AM, EJ wrote:
> Greetings,
>
> I hope that I'm posting this in the right place, if not my apologies.
>
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.
If your 10.04 installation was up-to-date before you upgraded, you were
running the following version of mdadm:
mdadm - v2.6.7.1 - 15th October 2008
HTH, Jan
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
@ 2012-09-30 10:04 ` Mikael Abrahamsson
2012-09-30 19:20 ` EJ Vincent
2 siblings, 1 reply; 17+ messages in thread
From: Mikael Abrahamsson @ 2012-09-30 10:04 UTC (permalink / raw)
To: EJ; +Cc: linux-raid
On Sun, 30 Sep 2012, EJ wrote:
> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
> access to my array. The array itself is a nine (9) disk raid6 managed by
> mdadm.
What version of kernel for 12.04 were you running?
If you didn't upgrade your kernel, you might have been hit by the bug
described in:
<http://neil.brown.name/blog/20120615073245>
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 10:04 ` Mikael Abrahamsson
@ 2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy
0 siblings, 2 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 19:20 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
> On Sun, 30 Sep 2012, EJ wrote:
>
>> Fast forward to now, I've upgraded the system to 12.04 LTS and have
>> lost access to my array. The array itself is a nine (9) disk raid6
>> managed by mdadm.
>
> What version of kernel for 12.04 were you running?
>
> If you didn't upgrade your kernel, you might have been hit by the bug
> described in:
>
> <http://neil.brown.name/blog/20120615073245>
>
Hello,
I'm running the stock version of Ubuntu 12.04.0, using kernel
3.2.0-23-generic.
That link looks interesting. I'm not sure I triggered the bug exactly as
Mr. Neil Brown describes it, but I definitely have the symptoms: some (though not
all) of the disks show a RAID level of "-unknown-" and appear to be
spares.
I'm hesitant to re-create the array (using mdadm) because, according
to that blog post, the order of devices is important for RAID-6, and with
this being a 9-disk array and no record of the device order in my logs or
my memory, I have no idea what the proper order might be.
I do know that 1) I was using metadata version 1.2, 2) the array was not
degraded and subsequently 3) no disks were missing.
Am I over-estimating the importance of the order and should I proceed with
the re-creation, or should I wait for Neil himself to weigh in on the problem?
Thanks for all the responses, much appreciated.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:20 ` EJ Vincent
@ 2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:25 ` EJ Vincent
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy
1 sibling, 1 reply; 17+ messages in thread
From: Mathias Burén @ 2012-09-30 19:22 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
On 30 September 2012 20:20, EJ Vincent <ej@ejane.org> wrote:
> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>
>> On Sun, 30 Sep 2012, EJ wrote:
>>
>>> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
>>> access to my array. The array itself is a nine (9) disk raid6 managed by
>>> mdadm.
>>
>>
>> What version of kernel for 12.04 were you running?
>>
>> If you didn't upgrade your kernel, you might have been hit by the bug
>> described in:
>>
>> <http://neil.brown.name/blog/20120615073245>
>>
>
> Hello,
>
> I'm running the stock version of Ubuntu 12.04.0, using kernel
> 3.2.0-23-generic.
>
> That link looks interesting-- I'm not sure if I triggered the bug how Mr.
> Neil Brown describes it, but I definitely have symptoms on some (not all)
> the disks of RAID level "-unknown-" and devices appearing to be spares.
>
> I'm hesitant to re-create the array again (using mdadm) because according to
> that blog post, for RAID-6, the order of devices are important, and with
> this being a 9 disk array and no record of device order in logs or from my
> own memory, I have no idea what the proper order might be.
>
> I do know that 1) I was using metadata version 1.2, 2) the array was not
> degraded and subsequently 3) no disks were missing.
>
> Am I over-estimating the importance of the order and should proceed with the
> re-creation, or perhaps wait for Neil himself to weigh in the problem?
>
> Thanks for all the responses, much appreciated.
>
> -EJ
>
Can't you just boot off an older Ubuntu USB, install mdadm and scan /
assemble, see the device order?
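I mean roughly something like this from the live environment (just a sketch; the
second line prints whatever slot/role each member's superblock still records):

mdadm --examine --scan --verbose
for d in /dev/sd[b-j]1; do mdadm -E "$d" | grep -iE 'role|slot'; done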
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:22 ` Mathias Burén
@ 2012-09-30 19:25 ` EJ Vincent
2012-09-30 20:28 ` Phil Turmel
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 19:25 UTC (permalink / raw)
To: linux-raid
On 9/30/2012 3:22 PM, Mathias Burén wrote:
> On 30 September 2012 20:20, EJ Vincent <ej@ejane.org> wrote:
>> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>> On Sun, 30 Sep 2012, EJ wrote:
>>>
>>>> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
>>>> access to my array. The array itself is a nine (9) disk raid6 managed by
>>>> mdadm.
>>>
>>> What version of kernel for 12.04 were you running?
>>>
>>> If you didn't upgrade your kernel, you might have been hit by the bug
>>> described in:
>>>
>>> <http://neil.brown.name/blog/20120615073245>
>>>
>> Hello,
>>
>> I'm running the stock version of Ubuntu 12.04.0, using kernel
>> 3.2.0-23-generic.
>>
>> That link looks interesting-- I'm not sure if I triggered the bug how Mr.
>> Neil Brown describes it, but I definitely have symptoms on some (not all)
>> the disks of RAID level "-unknown-" and devices appearing to be spares.
>>
>> I'm hesitant to re-create the array again (using mdadm) because according to
>> that blog post, for RAID-6, the order of devices are important, and with
>> this being a 9 disk array and no record of device order in logs or from my
>> own memory, I have no idea what the proper order might be.
>>
>> I do know that 1) I was using metadata version 1.2, 2) the array was not
>> degraded and subsequently 3) no disks were missing.
>>
>> Am I over-estimating the importance of the order and should proceed with the
>> re-creation, or perhaps wait for Neil himself to weigh in the problem?
>>
>> Thanks for all the responses, much appreciated.
>>
>> -EJ
>>
> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
> assemble, see the device order?
Hi Mathias,
I'm under the impression that the damage to the metadata has already been
done by 12.04, making recovery from an older version of Ubuntu
(10.04) impossible. Is this line of thinking flawed?
Thanks,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
@ 2012-09-30 19:50 ` Chris Murphy
1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2012-09-30 19:50 UTC (permalink / raw)
To: Linux RAID
On Sep 30, 2012, at 1:20 PM, EJ Vincent wrote:
> On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
>>
>> <http://neil.brown.name/blog/20120615073245>
> 3.2.0-23-generic.
That kernel is inside the range affected by the bug, although I'm not sure whether that particular build actually has it. The symptoms match up, as you say.
>
> That link looks interesting-- I'm not sure if I triggered the bug how Mr. Neil Brown describes it, but I definitely have symptoms on some (not all) the disks of RAID level "-unknown-" and devices appearing to be spares.
>
> I'm hesitant to re-create the array again (using mdadm) because according to that blog post, for RAID-6, the order of devices are important, and with this being a 9 disk array and no record of device order in logs or from my own memory, I have no idea what the proper order might be.
You'll have to iterate, it sounds like, if you have nothing else to go on. Faster to iterate and try again than to blow away the RAID and restore from backup.
> I do know that 1) I was using metadata version 1.2, 2) the array was not degraded and subsequently 3) no disks were missing.
>
> Am I over-estimating the importance of the order and should proceed with the re-creation, or perhaps wait for Neil himself to weigh in the problem?
Either. Just make sure you're using --assume-clean and don't mount it, to prevent either a resync or changes to the file system. The "echo check" (read-only scrub) test described in the blog entry will rather clearly tell you whether you got the order of the disks correct. The blog entry is pretty detailed. The question I have remaining is whether there's some way for you to cheat and improve your chances of getting the disk order correct.
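As a rough sketch of one iteration (md0 is just a placeholder for whatever you call the trial array):

# after a trial --create --assume-clean, do NOT mount the result
echo check > /sys/block/md0/md/sync_action      # start a read-only scrub
cat /proc/mdstat                                # watch progress
cat /sys/block/md0/md/mismatch_cnt              # a wrong order shows up as a huge mismatch count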
Chris Murphy
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 19:25 ` EJ Vincent
@ 2012-09-30 20:28 ` Phil Turmel
2012-09-30 23:23 ` EJ Vincent
0 siblings, 1 reply; 17+ messages in thread
From: Phil Turmel @ 2012-09-30 20:28 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
On 09/30/2012 03:25 PM, EJ Vincent wrote:
> On 9/30/2012 3:22 PM, Mathias Burén wrote:
>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
>> assemble, see the device order?
>
> Hi Mathias,
>
> I'm under the impression that damage to the metadata has already been
> done by 12.04, making a recovery from an older version of Ubuntu
> (10.04), impossible. Is this line of thinking, flawed?
Your impression is correct. Permanent damage to the metadata was done.
You *must* re-create your array.
However, you *cannot* use your new version of mdadm, as it will get the
data offset wrong. Your first report showed a data offset of 272.
Newer versions of mdadm default to 2048. You *must* perform all of your
"mdadm --create --assume-clean" permutations with 10.04.
Do you have *any* dmesg output from the old system? Or dmesg from the
very first boot under 12.04? That might have enough information to
shorten your search.
In the future, you should record your setup by saving the output of
"mdadm -D" on each array, "mdadm -E" on each member device, and the
output of "ls -l /dev/disk/by-id/"
Or try my documentation script "lsdrv". [1]
HTH,
Phil
[1] http://github.com/pturmel/lsdrv
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 20:28 ` Phil Turmel
@ 2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
2012-10-02 2:15 ` NeilBrown
0 siblings, 2 replies; 17+ messages in thread
From: EJ Vincent @ 2012-09-30 23:23 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid
On 9/30/2012 4:28 PM, Phil Turmel wrote:
> On 09/30/2012 03:25 PM, EJ Vincent wrote:
>> On 9/30/2012 3:22 PM, Mathias Burén wrote:
>>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
>>> assemble, see the device order?
>> Hi Mathias,
>>
>> I'm under the impression that damage to the metadata has already been
>> done by 12.04, making a recovery from an older version of Ubuntu
>> (10.04), impossible. Is this line of thinking, flawed?
> Your impression is correct. Permanent damage to the metadata was done.
> You *must* re-create your array.
>
> However, you *cannot* use your new version of mdadm, as it will get the
> data offset wrong. Your first report showed a data offset of 272.
> Newer versions of mdadm default to 2048. You *must* perform all of your
> "mdadm --create --assume-clean" permutations with 10.04.
>
> Do you have *any* dmesg output from the old system? Or dmesg from the
> very first boot under 12.04? That might have enough information to
> shorten your search.
>
> In the future, you should record your setup by saving the output of
> "mdadm -D" on each array, "mdadm -E" on each member device, and the
> output of "ls -l /dev/disk/by-id/"
>
> Or try my documentation script "lsdrv". [1]
>
> HTH,
>
> Phil
>
> [1] http://github.com/pturmel/lsdrv
>
Hi Phil,
Unfortunately I don't have any dmesg log from the old system or the
first boot under 12.04.
Getting my system to boot at all under 12.04 was chaotic enough, with
the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
ravaging my array and then dropping me to a busybox shell over and over
again. I didn't think to record the very first error.
Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and
/dev/sdj1 don't have the RAID level "-unknown-", nor are they
labeled as spares. They are, in fact, labeled clean and appear
*different* from the others.
Could these disks still contain my metadata from 10.04? I recall that during
my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
that I could drop a SATA CD/DVD-RW drive into the slot.
I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
having to do permutations-- 9! (factorial) would mean 362,880
combinations. *gasp*
Many thanks for all your comments and insights.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 23:23 ` EJ Vincent
@ 2012-10-01 12:40 ` Phil Turmel
2012-10-01 17:14 ` EJ Vincent
2012-10-02 2:15 ` NeilBrown
1 sibling, 1 reply; 17+ messages in thread
From: Phil Turmel @ 2012-10-01 12:40 UTC (permalink / raw)
To: EJ Vincent; +Cc: linux-raid
Hi EJ,
On 09/30/2012 07:23 PM, EJ Vincent wrote:
> On 9/30/2012 4:28 PM, Phil Turmel wrote:
>> Do you have *any* dmesg output from the old system? Or dmesg from the
>> very first boot under 12.04? That might have enough information to
>> shorten your search.
>>
>> In the future, you should record your setup by saving the output of
>> "mdadm -D" on each array, "mdadm -E" on each member device, and the
>> output of "ls -l /dev/disk/by-id/"
>>
>> Or try my documentation script "lsdrv". [1]
>>
>> HTH,
>>
>> Phil
>>
>> [1] http://github.com/pturmel/lsdrv
>
> Hi Phil,
>
> Unfortunately I don't have any dmesg log from the old system or the
> first boot under 12.04.
>
> Getting my system to boot at all under 12.04 was chaotic enough, with
> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
> ravaging my array and then dropping me to a busybox shell over and over
> again. I didn't think to record the very first error.
I'm not prepared to condemn the 12.04 initramfs--I really don't think it
is a factor in this crisis. The critical part is the degraded reboot bug.
> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
> labeled as spares. They are in fact, labeled clean and appear
> *different* from the others.
>
> Could these disks still contain my metadata from 10.04? I recall during
> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
> that I could drop in a SATA CD/DVDRW into the slot.
Leaving disks unpowered sounds like a key factor in your crisis. Raid6
can't operate with more than two missing, and won't assemble if any disk
disappears between shutdown and the next boot. (Must be forced.)
So your array would only partially assemble under 12.04 due to
deliberately missing drives, then you rebooted with a kernel that has a
problem with that scenario.
The disks very likely do have useful metadata, but no disk has all of
it. It might reduce the permutations you need to try. If you share
more information about your system layout, some educated first guesses
might be possible, too. The output of "mdadm -E" for every drive, and
lsdrv for an overview.
> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
> having to do permutations-- 9! (factorial) would mean 362,880
> combinations. *gasp*
Phil
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-01 12:40 ` Phil Turmel
@ 2012-10-01 17:14 ` EJ Vincent
0 siblings, 0 replies; 17+ messages in thread
From: EJ Vincent @ 2012-10-01 17:14 UTC (permalink / raw)
To: linux-raid
On 10/1/2012 8:40 AM, Phil Turmel wrote:
> Hi EJ,
>
> On 09/30/2012 07:23 PM, EJ Vincent wrote:
>> On 9/30/2012 4:28 PM, Phil Turmel wrote:
>>> Do you have *any* dmesg output from the old system? Or dmesg from the
>>> very first boot under 12.04? That might have enough information to
>>> shorten your search.
>>>
>>> In the future, you should record your setup by saving the output of
>>> "mdadm -D" on each array, "mdadm -E" on each member device, and the
>>> output of "ls -l /dev/disk/by-id/"
>>>
>>> Or try my documentation script "lsdrv". [1]
>>>
>>> HTH,
>>>
>>> Phil
>>>
>>> [1] http://github.com/pturmel/lsdrv
>> Hi Phil,
>>
>> Unfortunately I don't have any dmesg log from the old system or the
>> first boot under 12.04.
>>
>> Getting my system to boot at all under 12.04 was chaotic enough, with
>> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
>> ravaging my array and then dropping me to a busybox shell over and over
>> again. I didn't think to record the very first error.
> I'm not prepared to condemn the 12.04 initramfs--I really don't think it
> is a factor in this crisis. The critical part is the degraded reboot bug.
>
>> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
>> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
>> labeled as spares. They are in fact, labeled clean and appear
>> *different* from the others.
>>
>> Could these disks still contain my metadata from 10.04? I recall during
>> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
>> that I could drop in a SATA CD/DVDRW into the slot.
> Leaving disks unpowered sounds like a key factor in your crisis. Raid6
> can't operate with more than two missing, and won't assemble if any disk
> disappears between shutdown and the next boot. (Must be forced.)
>
> So your array would only partially assemble under 12.04 due to
> deliberately missing drives, then you rebooted with a kernel that has a
> problem with that scenario.
>
> The disks very likely do have useful metadata, but no disk has all of
> it. It might reduce the permutations you need to try. If you share
> more information about your system layout, some educated first guesses
> might be possible, too. The output of "mdadm -E" for every drive, and
> lsdrv for an overview.
>
>> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
>> having to do permutations-- 9! (factorial) would mean 362,880
>> combinations. *gasp*
> Phil
>
Hi Phil,
Here's the information you requested.
The server has 10 disks: a dedicated 500GB disk for the operating system
(which Ubuntu 10.04.4 has labeled /dev/sdd), and 9 x 2TB disks
(/dev/sd[a,b,c,e,f,g,h,i,j]):
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdh: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdi: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
The devices are spread amongst an on-board SATA controller, MCP78S
GeForce AHCI, and two SiI 3124 PCI-X SATA controllers.
The layout is as follows: 5 disks are attached to the on-board
controller, 3 attached to one SiI 3124 controller, and 2 attached to the
other SiI 3124 controller.
I've loaded your lsdrv script, here are the results:
PCI [pata_amd] 00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce
8200] IDE (rev a1)
scsi 0:x:x:x [Empty]
scsi 1:x:x:x [Empty]
PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI
3124 PCI-X Serial ATA Controller (rev 02)
scsi 2:0:0:0 ATA ST2000DL003-9VT1
sda 1.82t [8:0] Empty/Unknown
sda1 1.82t [8:1] Empty/Unknown
scsi 5:0:0:0 ATA ST2000DL003-9VT1
sdb 1.82t [8:16] Empty/Unknown
sdb1 1.82t [8:17] Empty/Unknown
scsi 7:0:0:0 ATA ST2000DL003-9VT1
sdc 1.82t [8:32] Empty/Unknown
sdc1 1.82t [8:33] Empty/Unknown
scsi 9:x:x:x [Empty]
PCI [ahci] 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce
8200] AHCI Controller (rev a2)
scsi 3:0:0:0 ATA WDC WD5000AAKS-2
sdd 465.76g [8:48] Empty/Unknown
sdd1 237.00m [8:49] Empty/Unknown
Mounted as /dev/sdd1 @ /boot
sdd2 3.73g [8:50] Empty/Unknown
sdd3 23.28g [8:51] Empty/Unknown
Mounted as /dev/disk/by-uuid/65a128d3-3e2e-487a-a36b-11cbe5530429 @ /
sdd4 438.52g [8:52] Empty/Unknown
scsi 4:0:0:0 ATA ST2000DL003-9VT1
sde 1.82t [8:64] Empty/Unknown
sde1 1.82t [8:65] Empty/Unknown
scsi 6:0:0:0 ATA ST32000542AS
sdf 1.82t [8:80] Empty/Unknown
sdf1 1.82t [8:81] Empty/Unknown
scsi 8:0:0:0 ATA ST32000542AS
sdg 1.82t [8:96] Empty/Unknown
sdg1 1.82t [8:97] Empty/Unknown
scsi 10:0:0:0 ATA ST2000DL003-9VT1
sdh 1.82t [8:112] Empty/Unknown
sdh1 1.82t [8:113] Empty/Unknown
scsi 11:x:x:x [Empty]
PCI [sata_sil24] 08:04.0 RAID bus controller: Silicon Image, Inc. SiI
3124 PCI-X Serial ATA Controller (rev 02)
scsi 12:0:0:0 ATA ST2000DL003-9VT1
sdi 1.82t [8:128] Empty/Unknown
sdi1 1.82t [8:129] Empty/Unknown
scsi 13:0:0:0 ATA ST2000DL003-9VT1
sdj 1.82t [8:144] Empty/Unknown
sdj1 1.82t [8:145] Empty/Unknown
scsi 14:x:x:x [Empty]
scsi 15:x:x:x [Empty]
Here is what mdadm -E looks like for each member of the array, now under
Ubuntu 10.04.4:
# mdadm -E /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6190765b:200ff748:d50a75e3:597405c4
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 37454049 - correct
Events : 1
Array Slot : 4 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7d707598:a8881376:531ae0c6:aac82909
Update Time : Sun Sep 30 19:13:16 2012
Checksum : c9effdc2 - correct
Events : 1
Array Slot : 11 (empty, empty, failed, failed, empty, failed,
empty, failed, empty, failed, failed, empty, failed... <shortened for
readability>)
Array State : 378 failed
# mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
Update Time : Sun Sep 30 00:34:27 2012
Checksum : 760485cb - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 7 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuUuuu 3 failed
# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 584e3a3a - correct
Events : 1
Array Slot : 8 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 7e963c27 - correct
Events : 1
Array Slot : 1 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
Update Time : Sun Sep 30 19:13:16 2012
Checksum : cab43e2e - correct
Events : 1
Array Slot : 0 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdh1
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
Update Time : Sun Sep 30 19:13:16 2012
Checksum : 4942a22e - correct
Events : 1
Array Slot : 6 (empty, empty, failed, failed, empty, failed, empty,
failed, empty, failed, failed, empty, failed... <shortened for readability>)
Array State : 378 failed
# mdadm -E /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
Update Time : Sun Sep 30 00:34:27 2012
Checksum : 22b9429c - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 10 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuuuuU 3 failed
# mdadm -E /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
Name : ruby:6 (local to host ruby)
Creation Time : Mon Apr 11 15:40:25 2011
Raid Level : raid6
Raid Devices : 9
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
Update Time : Sun Sep 30 00:34:27 2012
Checksum : a9748cf3 - correct
Events : 2474296
Chunk Size : 512K
Array Slot : 9 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
Array State : uuuuuuuUu 3 failed
I'd be happy to also supply a dump of 'lshw', which I believe is similar
to 'lsdrv', if that would be useful to you. The system is back on
10.04.4 LTS, and is using mdadm version 2.6.7.1.
Thanks for your continued input and assistance. Much appreciated.
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
@ 2012-10-02 2:15 ` NeilBrown
2012-10-02 3:53 ` EJ Vincent
1 sibling, 1 reply; 17+ messages in thread
From: NeilBrown @ 2012-10-02 2:15 UTC (permalink / raw)
To: EJ Vincent; +Cc: Phil Turmel, linux-raid
On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@ejane.org> wrote:
> On 9/30/2012 4:28 PM, Phil Turmel wrote:
> > On 09/30/2012 03:25 PM, EJ Vincent wrote:
> >> On 9/30/2012 3:22 PM, Mathias Burén wrote:
> >>> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
> >>> assemble, see the device order?
> >> Hi Mathias,
> >>
> >> I'm under the impression that damage to the metadata has already been
> >> done by 12.04, making a recovery from an older version of Ubuntu
> >> (10.04), impossible. Is this line of thinking, flawed?
> > Your impression is correct. Permanent damage to the metadata was done.
> > You *must* re-create your array.
> >
> > However, you *cannot* use your new version of mdadm, as it will get the
> > data offset wrong. Your first report showed a data offset of 272.
> > Newer versions of mdadm default to 2048. You *must* perform all of your
> > "mdadm --create --assume-clean" permutations with 10.04.
> >
> > Do you have *any* dmesg output from the old system? Or dmesg from the
> > very first boot under 12.04? That might have enough information to
> > shorten your search.
> >
> > In the future, you should record your setup by saving the output of
> > "mdadm -D" on each array, "mdadm -E" on each member device, and the
> > output of "ls -l /dev/disk/by-id/"
> >
> > Or try my documentation script "lsdrv". [1]
> >
> > HTH,
> >
> > Phil
> >
> > [1] http://github.com/pturmel/lsdrv
> >
>
> Hi Phil,
>
> Unfortunately I don't have any dmesg log from the old system or the
> first boot under 12.04.
>
> Getting my system to boot at all under 12.04 was chaotic enough, with
> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
> ravaging my array and then dropping me to a busybox shell over and over
> again. I didn't think to record the very first error.
>
> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and
> /dev/sdj1 don't have the Raid level "-unknown-", neither are they
> labeled as spares. They are in fact, labeled clean and appear
> *different* from the others.
>
> Could these disks still contain my metadata from 10.04? I recall during
> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
> that I could drop in a SATA CD/DVDRW into the slot.
>
> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
> having to do permutations-- 9! (factorial) would mean 362,880
> combinations. *gasp*
You might be able to avoid the 9! combinations, which could take a while ...
4 days if you could test one per second.
Try this:
for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
skip=4256 | od -D | head -n1; done
This reads the 'dev_number' field out of the metadata on each device.
This should not have been corrupted by the bug.
You might want some other pattern in place of "/dev/sd?1" - it needs to match
all the devices in your array.
Then on one of the devices which doesn't have corrupted metadata, run
dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d
where $COUNT is one more than the largest number that was reported in the
"dev_number" values reported above.
Now for each device, take the dev_number that was reported and use it as an
index into the list of numbers produced by the second command; the number at
that index is the role of the device in the array, i.e. its position in the
list.
So after making an array of 5 'loop' devices in a non-obvious order, and
failing a device and re-adding it:
# for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/loop0 0000000 3
/dev/loop1 0000000 4
/dev/loop2 0000000 1
/dev/loop3 0000000 0
/dev/loop4 0000000 5
and
# dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d
0000000 0 1 65534 3 4 2
0000014
So /dev/loop0 has dev_number '3'; entry 3 in the list is '3', so it is device 3.
/dev/loop1 has dev_number '4'; entry 4 is '4', so it is device 4.
/dev/loop4 has dev_number '5'; entry 5 is '2', so it is device 2.
And so on.
So we can reconstruct the order of devices:
/dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1
Note that a '65534' entry in the list means there is no device with that
dev_number, i.e. no device is number '2', and looking at the list of
dev_numbers above confirms that.
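(A rough bash sketch of that lookup, untested here and assuming the same
v1.2 metadata offsets as the dd commands above, using the loop devices
from this example:)

# role table, read once from a member whose metadata is intact
roles=( $(dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -An -d) )
# print each device's dev_number and the role (position) it maps to
for i in /dev/loop[01234]; do
    n=$(dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -An -D | tr -d ' ')
    echo "$i dev_number=$n role=${roles[$n]}"
done

Sorting that output by role gives the device order directly.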
You should be able to perform the same steps to recover the correct order to
try creating the array.
NeilBrown
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-02 2:15 ` NeilBrown
@ 2012-10-02 3:53 ` EJ Vincent
2012-10-02 5:04 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-10-02 3:53 UTC (permalink / raw)
To: NeilBrown; +Cc: Phil Turmel, linux-raid
On 10/1/2012 10:15 PM, NeilBrown wrote:
> [...]
>
> Try this:
>
> for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
>     skip=4256 | od -D | head -n1; done
>
> [...]
>
> Then on one of the devices which doesn't have corrupted metadata, run
>
> dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d
>
> [...]
>
Hi Neil,
Thank you so much for taking the time to help me through this.
Here's what I've come up with, per your instructions:
/dev/sda1 0000000 4
/dev/sdb1 0000000 11
/dev/sdc1 0000000 7
/dev/sde1 0000000 8
/dev/sdf1 0000000 1
/dev/sdg1 0000000 0
/dev/sdh1 0000000 6
/dev/sdi1 0000000 10
/dev/sdj1 0000000 9
dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
0000000 0 1 65534 65534 2 65534 4 5
0000020 6 7 8 3
0000030
Mind doing a sanity check for me?
Based on the above information, one such possible device order is:
/dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
/dev/sdc1 /dev/sde1
where * represents the three unknown devices marked by 65534?
Once I have your blessing, would I then proceed to:
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
--metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
/dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
and this is non-destructive, so I can attempt different orders?
Again, thank you for the help.
Best wishes,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
2012-10-02 3:53 ` EJ Vincent
@ 2012-10-02 5:04 ` NeilBrown
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2012-10-02 5:04 UTC (permalink / raw)
To: EJ Vincent; +Cc: Phil Turmel, linux-raid
On Mon, 01 Oct 2012 23:53:08 -0400 EJ Vincent <ej@ejane.org> wrote:
> [...]
>
> Hi Neil,
>
> Thank you so much for taking the time to help me through this.
>
> Here's what I've come up with, per your instructions:
>
> /dev/sda1 0000000 4
> /dev/sdb1 0000000 11
> /dev/sdc1 0000000 7
> /dev/sde1 0000000 8
> /dev/sdf1 0000000 1
> /dev/sdg1 0000000 0
> /dev/sdh1 0000000 6
> /dev/sdi1 0000000 10
> /dev/sdj1 0000000 9
>
> dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
> 0000000 0 1 65534 65534 2 65534 4 5
> 0000020 6 7 8 3
> 0000030
>
> Mind doing a sanity check for me?
>
> Based on the above information, one such possible device order is:
>
> /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
> /dev/sdc1 /dev/sde1
>
> where * represents the three unknown devices marked by 65534?
Nope. The 65534 entries should never come into it.
sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1
e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in
position 8.
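Spelled out for all nine devices, using the dev_number values and the role
table quoted above:

  sdg1: dev_number  0 -> entry  0 = 0, position 0
  sdf1: dev_number  1 -> entry  1 = 1, position 1
  sda1: dev_number  4 -> entry  4 = 2, position 2
  sdb1: dev_number 11 -> entry 11 = 3, position 3
  sdh1: dev_number  6 -> entry  6 = 4, position 4
  sdc1: dev_number  7 -> entry  7 = 5, position 5
  sde1: dev_number  8 -> entry  8 = 6, position 6
  sdj1: dev_number  9 -> entry  9 = 7, position 7
  sdi1: dev_number 10 -> entry 10 = 8, position 8

The 65534 entries (indices 2, 3 and 5) are simply dev_numbers that no
longer exist.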
>
> Once I have your blessing, would I then proceed to:
>
> mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
> --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
> /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
>
> and this is non-destructive, so I can attempt different orders?
Yes. Well, it destroys the metadata so make sure you have a copy of the "-E"
for each device, and it wouldn't hurt to run that second 'dd' command on
every device and keep that just in case.
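For the record, a rough sketch of that bookkeeping, plus the create command
EJ proposed with the order corrected as above. The backup path is only an
illustration, the device pattern needs adjusting to match only the array
members, and the create must be run with the 10.04 mdadm so the 272-sector
data offset is preserved:

# illustration only: keep a copy of -E output and the raw role table per member
mkdir -p /root/md-backup
for i in /dev/sd?1; do
    b=$(basename $i)
    mdadm -E $i > /root/md-backup/$b.examine
    dd 2> /dev/null if=$i bs=2 count=12 skip=2176 | od -d > /root/md-backup/$b.roles
done

# EJ's proposed create, with the corrected device order
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 \
    --metadata=1.2 --chunk=512 \
    /dev/sdg1 /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdh1 \
    /dev/sdc1 /dev/sde1 /dev/sdj1 /dev/sdi1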
NeilBrown
>
> Again, thank you for the help.
>
> Best wishes,
>
> -EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED]
2012-10-02 5:04 ` NeilBrown
@ 2012-10-02 8:34 ` EJ Vincent
2012-10-02 12:18 ` Phil Turmel
0 siblings, 1 reply; 17+ messages in thread
From: EJ Vincent @ 2012-10-02 8:34 UTC (permalink / raw)
To: NeilBrown; +Cc: Phil Turmel, linux-raid
On 10/2/2012 1:04 AM, NeilBrown wrote:
> [...]
>
>> Based on the above information, one such possible device order is:
>>
>> /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
>> /dev/sdc1 /dev/sde1
>>
>> where * represents the three unknown devices marked by 65534?
> Nope. The 65534 entries should never come into it.
>
> sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1
>
> e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in
> position 8.
>
>> Once I have your blessing, would I then proceed to:
>>
>> mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
>> --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
>> /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
>>
>> and this is non-destructive, so I can attempt different orders?
> Yes. Well, it destroys the metadata so make sure you have a copy of the "-E"
> for each device, and it wouldn't hurt to run that second 'dd' command on
> every device and keep that just in case.
>
> NeilBrown
>
>> Again, thank you for the help.
>>
>> Best wishes,
>>
>> -EJ
Neil,
I've successfully re-created the array using the corrected device order
you specified.
For documentation purposes:
I immediately started an 'xfs_check', but due to the size of the
filesystem it quickly (in under 90 seconds) consumed all available memory
on the server (16GB). I instead used 'xfs_repair -n', which ran for
about one minute before returning me to a shell (no errors reported):
(-n No modify mode. Specifies that xfs_repair should not modify the
filesystem but should only scan the filesystem and indicate what repairs
would have been made.)
I then set the sync_action under /sys/block/md0/md/ to 'check' and also
increased the stripe_cache_size to something less modest: 4096, up from
256. I'm monitoring /sys/block/md0/md/mismatch_cnt with tail -f and so
far it has stayed at 0, a good sign for sure. I'm well on my way to a
complete recovery (about 25% checked as of this writing).
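For the record, those steps amount to roughly the following, assuming the
filesystem sits directly on /dev/md0 as in this thread:

xfs_repair -n /dev/md0                            # read-only scan; reports, never modifies
echo check > /sys/block/md0/md/sync_action        # start a redundancy check
echo 4096 > /sys/block/md0/md/stripe_cache_size   # up from the default 256
cat /sys/block/md0/md/mismatch_cnt                # re-check periodically; should stay 0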
I want to thank you again Neil (and the rest of the linux-raid mailing
list) for the absolutely flawless and expert support you've provided.
Best wishes,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED]
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
@ 2012-10-02 12:18 ` Phil Turmel
0 siblings, 0 replies; 17+ messages in thread
From: Phil Turmel @ 2012-10-02 12:18 UTC (permalink / raw)
To: EJ Vincent; +Cc: NeilBrown, linux-raid
On 10/02/2012 04:34 AM, EJ Vincent wrote:
> Neil,
>
> I've successfully re-created the array using the corrected device order
> you specified.
Great news. I'm tucking Neil's procedure away in my toolbox...
Phil
Thread overview: 17+ messages
2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ
2012-09-30 9:30 ` EJ Vincent
2012-09-30 9:44 ` Jan Ceuleers
2012-09-30 10:04 ` Mikael Abrahamsson
2012-09-30 19:20 ` EJ Vincent
2012-09-30 19:22 ` Mathias Burén
2012-09-30 19:25 ` EJ Vincent
2012-09-30 20:28 ` Phil Turmel
2012-09-30 23:23 ` EJ Vincent
2012-10-01 12:40 ` Phil Turmel
2012-10-01 17:14 ` EJ Vincent
2012-10-02 2:15 ` NeilBrown
2012-10-02 3:53 ` EJ Vincent
2012-10-02 5:04 ` NeilBrown
2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent
2012-10-02 12:18 ` Phil Turmel
2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy