* Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ @ 2012-09-30  9:21 UTC
To: linux-raid

Greetings,

I hope that I'm posting this in the right place; if not, my apologies.

Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
stock version of mdadm--unfortunately I have no idea which version it was.

Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
to my array. The array itself is a nine (9) disk raid6 managed by mdadm.

I'm not sure this is pertinent information, but getting 12.04 LTS to boot was an
exercise in patience. There was some sort of race condition, possibly between the
disks of the array initializing and 12.04's udev. It would constantly drop me to a
busybox shell while trying to degrade the known-working array.

Eventually, I had to go into /usr/share/initramfs-tools/scripts/mdadm-functions
and add "exit 1" to both degraded_arrays() and mountroot_fail() so that my
system could at the very least boot (a sketch of that change follows the dmesg
output below). I fear that the constant rebooting and 12.04's aggressive initramfs
scripting has somehow damaged my array.

Ok, back to the array itself; here's some raw command output:

# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
mdadm: superblock on /dev/sdc1 doesn't match others - assembly aborted

I also tried # mdadm --auto-detect and found this in dmesg:

[ 676.998212] md: Autodetecting RAID arrays.
[ 676.998426] md: invalid raid superblock magic on sdc1
[ 676.998458] md: sdc1 does not have a valid v0.90 superblock, not importing!
[ 676.998870] md: invalid raid superblock magic on sde1
[ 676.998911] md: sde1 does not have a valid v0.90 superblock, not importing!
[ 676.999474] md: invalid raid superblock magic on sdb1
[ 676.999495] md: sdb1 does not have a valid v0.90 superblock, not importing!
[ 676.999703] md: invalid raid superblock magic on sdd1
[ 676.999732] md: sdd1 does not have a valid v0.90 superblock, not importing!
[ 677.000137] md: invalid raid superblock magic on sdf1
[ 677.000163] md: sdf1 does not have a valid v0.90 superblock, not importing!
[ 677.000566] md: invalid raid superblock magic on sdg1
[ 677.000586] md: sdg1 does not have a valid v0.90 superblock, not importing!
[ 677.000940] md: invalid raid superblock magic on sdh1
[ 677.000960] md: sdh1 does not have a valid v0.90 superblock, not importing!
[ 677.001356] md: invalid raid superblock magic on sdi1
[ 677.001375] md: sdi1 does not have a valid v0.90 superblock, not importing!
[ 677.001841] md: invalid raid superblock magic on sdj1
[ 677.001871] md: sdj1 does not have a valid v0.90 superblock, not importing!
[ 677.001933] md: Scanned 9 and added 0 devices.
[ 677.001938] md: autorun ...
[ 677.001941] md: ... autorun DONE.
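(A minimal sketch of the initramfs workaround described above. It only illustrates
where the "exit 1" lines go, assuming the stock Ubuntu 12.04 function names named
in the message; the original Ubuntu function bodies are not reproduced here and
are assumed to remain unchanged below the added lines.)

  # /usr/share/initramfs-tools/scripts/mdadm-functions (excerpt)
  degraded_arrays()
  {
      exit 1   # added: report failure so the initramfs never force-starts a degraded array
      # ... original Ubuntu code continues unchanged ...
  }

  mountroot_fail()
  {
      exit 1   # added: abort the failure hook instead of degrading the array
      # ... original Ubuntu code continues unchanged ...
  }

  # The change only takes effect after rebuilding the initramfs:
  # update-initramfs -u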
Here are the disks themselves:

# mdadm -E /dev/sdb1
/dev/sdb1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
    Update Time : Sun Sep 30 04:34:27 2012
    Checksum : 760485cb - correct
    Events : 2474296
    Layout : left-symmetric
    Chunk Size : 512K
    Device Role : Active device 5
    Array State : AAAAAAAAA ('A' == active, '.' == missing)

# mdadm -E /dev/sdc1
/dev/sdc1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : 7e955e4e - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sdd1
/dev/sdd1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : cab36055 - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sde1
/dev/sde1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : 4941c455 - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sdf1
/dev/sdf1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 6190765b:200ff748:d50a75e3:597405c4
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : 37446270 - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sdg1
/dev/sdg1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 7d707598:a8881376:531ae0c6:aac82909
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : c9ef1fe9 - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sdh1
/dev/sdh1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
    Update Time : Sun Sep 30 07:26:43 2012
    Checksum : 584d5c61 - correct
    Events : 1
    Device Role : spare
    Array State : ('A' == active, '.' == missing)

# mdadm -E /dev/sdi1
/dev/sdi1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
    Update Time : Sun Sep 30 04:34:27 2012
    Checksum : 22b9429c - correct
    Events : 2474296
    Layout : left-symmetric
    Chunk Size : 512K
    Device Role : Active device 8
    Array State : AAAAAAAAA ('A' == active, '.' == missing)

# mdadm -E /dev/sdj1
/dev/sdj1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 19:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
    Update Time : Sun Sep 30 04:34:27 2012
    Checksum : a9748cf3 - correct
    Events : 2474296
    Layout : left-symmetric
    Chunk Size : 512K
    Device Role : Active device 7
    Array State : AAAAAAAAA ('A' == active, '.' == missing)

I find it odd that the raid levels for some of the disks would register as
"unknown" and that their device roles would be shifted to "spare".

Current system:

Linux ruby 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Mdadm version:

mdadm - v3.2.3 - 23rd December 2011

I hope I've provided enough information. I would be more than happy to elaborate
or provide additional data if need be. Again, this array was functioning normally
up until a few hours ago. Am I able to salvage my data?

Thank you.

-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-09-30  9:30 UTC
To: linux-raid

On 9/30/2012 5:21 AM, EJ wrote:
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.
>
> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost access
> to my array. The array itself is a nine (9) disk raid6 managed by mdadm.
> [...]
Hello again, a quick follow-up: I've rebooted the server and /proc/mdstat now looks like this:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md6 : inactive sdh1[8](S) sdf1[4](S) sdg1[11](S) sde1[6](S) sdc1[1](S) sdd1[0](S)
      11721080016 blocks super 1.2

$ mdadm -D /dev/md6
mdadm: md device /dev/md6 does not appear to be active.

I'm still not sure how to proceed, but I thought it best to pass this information along to the list.

Thanks again,
-EJ
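(One note on the output above: the partially assembled, inactive md6 keeps its
member devices busy, so it normally has to be stopped before any further assemble
or create attempt can use them. Stopping an inactive array only releases the
members; it does not write to them.)

  # Release the members held by the half-assembled array (non-destructive)
  mdadm --stop /dev/md6
  cat /proc/mdstat     # md6 should no longer appear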
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: Jan Ceuleers @ 2012-09-30  9:44 UTC
To: EJ; +Cc: linux-raid

On 09/30/2012 11:21 AM, EJ wrote:
> Up until several hours ago, my system was running Ubuntu 10.04 LTS, using the
> stock version of mdadm--unfortunately I have no idea which version it was.

If your 10.04 installation was up-to-date before you upgraded, you were running
the following version of mdadm:

mdadm - v2.6.7.1 - 15th October 2008

HTH, Jan
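(If it matters later exactly which mdadm the old install had, the old root
filesystem's package logs usually still record it. A small sketch; the log paths
and rotation suffixes below are typical but vary by release.)

  # dpkg's log on the old root filesystem records installed package versions
  grep mdadm /var/log/dpkg.log /var/log/dpkg.log.1 2>/dev/null
  # On a running system, the installed and candidate versions:
  mdadm --version
  apt-cache policy mdadm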
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: Mikael Abrahamsson @ 2012-09-30 10:04 UTC
To: EJ; +Cc: linux-raid

On Sun, 30 Sep 2012, EJ wrote:

> Fast forward to now, I've upgraded the system to 12.04 LTS and have lost
> access to my array. The array itself is a nine (9) disk raid6 managed by
> mdadm.

What version of kernel for 12.04 were you running?

If you didn't upgrade your kernel, you might have been hit by the bug described in:

<http://neil.brown.name/blog/20120615073245>

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-09-30 19:20 UTC
To: linux-raid

On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote:
> What version of kernel for 12.04 were you running?
>
> If you didn't upgrade your kernel, you might have been hit by the bug
> described in:
>
> <http://neil.brown.name/blog/20120615073245>

Hello,

I'm running the stock version of Ubuntu 12.04.0, using kernel 3.2.0-23-generic.

That link looks interesting. I'm not sure I triggered the bug exactly as Mr. Neil
Brown describes it, but I definitely have the symptoms: some (not all) of the
disks show a RAID level of "-unknown-" and appear to have become spares.

I'm hesitant to re-create the array (using mdadm) because, according to that blog
post, the order of devices is important for RAID-6, and with this being a 9-disk
array and no record of the device order in my logs or my memory, I have no idea
what the proper order might be.

I do know that 1) I was using metadata version 1.2, 2) the array was not degraded,
and consequently 3) no disks were missing.

Am I over-estimating the importance of the order and should I proceed with the
re-creation, or should I perhaps wait for Neil himself to weigh in on the problem?

Thanks for all the responses, much appreciated.

-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: Mathias Burén @ 2012-09-30 19:22 UTC
To: EJ Vincent; +Cc: linux-raid

On 30 September 2012 20:20, EJ Vincent <ej@ejane.org> wrote:
> I'm hesitant to re-create the array (using mdadm) because, according to that
> blog post, the order of devices is important for RAID-6, and with this being
> a 9-disk array and no record of the device order in my logs or my memory,
> I have no idea what the proper order might be.
> [...]

Can't you just boot off an older Ubuntu USB, install mdadm and scan / assemble,
see the device order?
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-09-30 19:25 UTC
To: linux-raid

On 9/30/2012 3:22 PM, Mathias Burén wrote:
> Can't you just boot off an older Ubuntu USB, install mdadm and scan /
> assemble, see the device order?

Hi Mathias,

I'm under the impression that the damage to the metadata has already been done by
12.04, making a recovery from an older version of Ubuntu (10.04) impossible. Is
this line of thinking flawed?

Thanks,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: Phil Turmel @ 2012-09-30 20:28 UTC
To: EJ Vincent; +Cc: linux-raid

On 09/30/2012 03:25 PM, EJ Vincent wrote:
> I'm under the impression that the damage to the metadata has already been
> done by 12.04, making a recovery from an older version of Ubuntu (10.04)
> impossible. Is this line of thinking flawed?

Your impression is correct. Permanent damage to the metadata was done.
You *must* re-create your array.

However, you *cannot* use your new version of mdadm, as it will get the data
offset wrong. Your first report showed a data offset of 272. Newer versions of
mdadm default to 2048. You *must* perform all of your
"mdadm --create --assume-clean" permutations with 10.04.

Do you have *any* dmesg output from the old system? Or dmesg from the very first
boot under 12.04? That might have enough information to shorten your search.

In the future, you should record your setup by saving the output of "mdadm -D" on
each array, "mdadm -E" on each member device, and the output of
"ls -l /dev/disk/by-id/".

Or try my documentation script "lsdrv". [1]

HTH,

Phil

[1] http://github.com/pturmel/lsdrv
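(Phil's "record your setup" advice is easy to script ahead of time. A rough sketch
along those lines; the output directory and device patterns are placeholders to
adapt, and lsdrv is the separate script linked above.)

  #!/bin/sh
  # Save array and member metadata so a future rebuild has something to go on.
  out=/root/raid-notes-$(date +%Y%m%d)
  mkdir -p "$out"
  cat /proc/mdstat > "$out/mdstat.txt"
  for md in /dev/md?*; do
      [ -b "$md" ] && mdadm -D "$md" > "$out/detail-$(basename "$md").txt" 2>&1
  done
  for part in /dev/sd[a-z][0-9]; do
      [ -b "$part" ] && mdadm -E "$part" > "$out/examine-$(basename "$part").txt" 2>&1
  done
  ls -l /dev/disk/by-id/ > "$out/disk-by-id.txt"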
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-09-30 23:23 UTC
To: Phil Turmel; +Cc: linux-raid

On 9/30/2012 4:28 PM, Phil Turmel wrote:
> Do you have *any* dmesg output from the old system? Or dmesg from the very
> first boot under 12.04? That might have enough information to shorten your
> search.
> [...]

Hi Phil,

Unfortunately I don't have any dmesg log from the old system or the first boot
under 12.04.

Getting my system to boot at all under 12.04 was chaotic enough, with the
overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions ravaging my
array and then dropping me to a busybox shell over and over again. I didn't think
to record the very first error.

Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and /dev/sdj1 don't
have the Raid level "-unknown-", nor are they labeled as spares. They are in fact
labeled clean and appear *different* from the others.

Could these disks still contain my metadata from 10.04? I recall that during my
installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so that I could
drop a SATA CD/DVDRW into the slot.

I am downloading 10.04.4 LTS and will be ready to use it soon. I fear having to
do permutations-- 9! (factorial) would mean 362,880 combinations. *gasp*

Many thanks for all your comments and insights.

-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: Phil Turmel @ 2012-10-01 12:40 UTC
To: EJ Vincent; +Cc: linux-raid

Hi EJ,

On 09/30/2012 07:23 PM, EJ Vincent wrote:
> Getting my system to boot at all under 12.04 was chaotic enough, with the
> overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions ravaging
> my array and then dropping me to a busybox shell over and over again. I didn't
> think to record the very first error.

I'm not prepared to condemn the 12.04 initramfs--I really don't think it is a
factor in this crisis. The critical part is the degraded reboot bug.

> Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and /dev/sdj1 don't
> have the Raid level "-unknown-", nor are they labeled as spares. They are in
> fact labeled clean and appear *different* from the others.
>
> Could these disks still contain my metadata from 10.04? I recall that during
> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so that
> I could drop a SATA CD/DVDRW into the slot.

Leaving disks unpowered sounds like a key factor in your crisis. Raid6 can't
operate with more than two missing, and won't assemble if any disk disappears
between shutdown and the next boot. (Must be forced.)

So your array would only partially assemble under 12.04 due to deliberately
missing drives, then you rebooted with a kernel that has a problem with that
scenario.

The disks very likely do have useful metadata, but no disk has all of it. It
might reduce the permutations you need to try. If you share more information
about your system layout, some educated first guesses might be possible, too.
The output of "mdadm -E" for every drive, and lsdrv for an overview.

> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear having
> to do permutations-- 9! (factorial) would mean 362,880 combinations. *gasp*

Phil
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-10-01 17:14 UTC
To: linux-raid

On 10/1/2012 8:40 AM, Phil Turmel wrote:
> The disks very likely do have useful metadata, but no disk has all of it. It
> might reduce the permutations you need to try. If you share more information
> about your system layout, some educated first guesses might be possible, too.
> The output of "mdadm -E" for every drive, and lsdrv for an overview.
> [...]
Hi Phil,

Here's the information you requested. The server has 10 disks: a dedicated 500GB
disk for the operating system (which Ubuntu 10.04.4 has labeled /dev/sdd), and
9 x 2TB disks (/dev/sd[a,b,c,e,f,g,h,i,j]):

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdh: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdi: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes

The devices are spread amongst an on-board SATA controller (MCP78S GeForce AHCI)
and two SiI 3124 PCI-X SATA controllers. The layout is as follows: 5 disks are
attached to the on-board controller, 3 attached to one SiI 3124 controller, and
2 attached to the other SiI 3124 controller.

I've loaded your lsdrv script; here are the results:

PCI [pata_amd] 00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1)
 scsi 0:x:x:x [Empty]
 scsi 1:x:x:x [Empty]
PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
 scsi 2:0:0:0 ATA ST2000DL003-9VT1
  sda 1.82t [8:0] Empty/Unknown
   sda1 1.82t [8:1] Empty/Unknown
 scsi 5:0:0:0 ATA ST2000DL003-9VT1
  sdb 1.82t [8:16] Empty/Unknown
   sdb1 1.82t [8:17] Empty/Unknown
 scsi 7:0:0:0 ATA ST2000DL003-9VT1
  sdc 1.82t [8:32] Empty/Unknown
   sdc1 1.82t [8:33] Empty/Unknown
 scsi 9:x:x:x [Empty]
PCI [ahci] 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2)
 scsi 3:0:0:0 ATA WDC WD5000AAKS-2
  sdd 465.76g [8:48] Empty/Unknown
   sdd1 237.00m [8:49] Empty/Unknown Mounted as /dev/sdd1 @ /boot
   sdd2 3.73g [8:50] Empty/Unknown
   sdd3 23.28g [8:51] Empty/Unknown Mounted as /dev/disk/by-uuid/65a128d3-3e2e-487a-a36b-11cbe5530429 @ /
   sdd4 438.52g [8:52] Empty/Unknown
 scsi 4:0:0:0 ATA ST2000DL003-9VT1
  sde 1.82t [8:64] Empty/Unknown
   sde1 1.82t [8:65] Empty/Unknown
 scsi 6:0:0:0 ATA ST32000542AS
  sdf 1.82t [8:80] Empty/Unknown
   sdf1 1.82t [8:81] Empty/Unknown
 scsi 8:0:0:0 ATA ST32000542AS
  sdg 1.82t [8:96] Empty/Unknown
   sdg1 1.82t [8:97] Empty/Unknown
 scsi 10:0:0:0 ATA ST2000DL003-9VT1
  sdh 1.82t [8:112] Empty/Unknown
   sdh1 1.82t [8:113] Empty/Unknown
 scsi 11:x:x:x [Empty]
PCI [sata_sil24] 08:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
 scsi 12:0:0:0 ATA ST2000DL003-9VT1
  sdi 1.82t [8:128] Empty/Unknown
   sdi1 1.82t [8:129] Empty/Unknown
 scsi 13:0:0:0 ATA ST2000DL003-9VT1
  sdj 1.82t [8:144] Empty/Unknown
   sdj1 1.82t [8:145] Empty/Unknown
 scsi 14:x:x:x [Empty]
 scsi 15:x:x:x [Empty]

Here is what mdadm -E looks like for each member of the array, now under Ubuntu
10.04.4:

# mdadm -E /dev/sda1
/dev/sda1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 6190765b:200ff748:d50a75e3:597405c4
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : 37454049 - correct
    Events : 1
    Array Slot : 4 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdb1
/dev/sdb1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 7d707598:a8881376:531ae0c6:aac82909
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : c9effdc2 - correct
    Events : 1
    Array Slot : 11 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdc1
/dev/sdc1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a
    Update Time : Sun Sep 30 00:34:27 2012
    Checksum : 760485cb - correct
    Events : 2474296
    Chunk Size : 512K
    Array Slot : 7 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
    Array State : uuuuuUuuu 3 failed

# mdadm -E /dev/sde1
/dev/sde1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : 584e3a3a - correct
    Events : 1
    Array Slot : 8 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdf1
/dev/sdf1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : 7e963c27 - correct
    Events : 1
    Array Slot : 1 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdg1
/dev/sdg1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : cab43e2e - correct
    Events : 1
    Array Slot : 0 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdh1
/dev/sdh1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : -unknown-
    Raid Devices : 0
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : active
    Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d
    Update Time : Sun Sep 30 19:13:16 2012
    Checksum : 4942a22e - correct
    Events : 1
    Array Slot : 6 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
    Array State : 378 failed

# mdadm -E /dev/sdi1
/dev/sdi1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb
    Update Time : Sun Sep 30 00:34:27 2012
    Checksum : 22b9429c - correct
    Events : 2474296
    Chunk Size : 512K
    Array Slot : 10 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
    Array State : uuuuuuuuU 3 failed

# mdadm -E /dev/sdj1
/dev/sdj1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
    Name : ruby:6 (local to host ruby)
    Creation Time : Mon Apr 11 15:40:25 2011
    Raid Level : raid6
    Raid Devices : 9
    Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
    Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
    Super Offset : 8 sectors
    State : clean
    Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d
    Update Time : Sun Sep 30 00:34:27 2012
    Checksum : a9748cf3 - correct
    Events : 2474296
    Chunk Size : 512K
    Array Slot : 9 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
    Array State : uuuuuuuUu 3 failed

I'd be happy to also supply a dump of 'lshw', which I believe is similar to
'lsdrv', if that would be useful to you.

The system is back on 10.04.4 LTS and is using mdadm version 2.6.7.1.

Thanks for your continued input and assistance. Much appreciated.

-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: NeilBrown @ 2012-10-02  2:15 UTC
To: EJ Vincent; +Cc: Phil Turmel, linux-raid

On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@ejane.org> wrote:
> Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and /dev/sdj1 don't
> have the Raid level "-unknown-", nor are they labeled as spares. They are in
> fact labeled clean and appear *different* from the others.
> [...]
> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear having
> to do permutations-- 9! (factorial) would mean 362,880 combinations. *gasp*

You might be able to avoid the 9! combinations, which could take a while ...
4 days if you could test one per second.

Try this:

for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
   skip=4256 | od -D | head -n1; done

This reads the 'dev_number' field out of the metadata on each device. This
should not have been corrupted by the bug. You might want some other pattern in
place of "/dev/sd?1" - it needs to match all the devices in your array.

Then on one of the devices which doesn't have corrupted metadata, run

dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d

where $COUNT is one more than the largest number that was reported in the
"dev_number" values reported above.

Now for each device, take the dev_number that was reported and use it as an
index into the list of numbers produced by the second command; that number is
the role of the device in the array, i.e. its position in the list.

So after making an array of 5 'loop' devices in a non-obvious order, and failing
a device and re-adding it:

# for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/loop0 0000000 3
/dev/loop1 0000000 4
/dev/loop2 0000000 1
/dev/loop3 0000000 0
/dev/loop4 0000000 5

and

# dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d
0000000      0      1  65534      3      4      2
0000014

So /dev/loop0 has dev_number '3'. Look for entry '3' in the list and get '3'.
/dev/loop1 has dev_number '4', so it is device 4.
/dev/loop4 has dev_number '5', so it is device 2.
etc.

So we can reconstruct the order of devices:

/dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1

Note that '65534' in the list means there is no device with that dev_number,
i.e. no device is number '2', and looking at the list confirms that.

You should be able to perform the same steps to recover the correct order to
try creating the array.

NeilBrown
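(Neil's two reads are easy to wrap in a loop. A rough sketch that just prints the
raw numbers for every member: the device pattern and the choice of "good"
superblock are placeholders, the byte offsets are taken directly from Neil's
commands above, and count should be one more than the largest dev_number printed.)

  # Print each member's dev_number field from the v1.2 superblock
  for i in /dev/sd[abcefghij]1; do
      printf '%-12s dev_number: ' "$i"
      dd if="$i" bs=1 count=4 skip=4256 2>/dev/null | od -An -D
  done

  # Dump the role table from one member whose metadata still looks intact.
  # Index = dev_number, value = slot in the array, 65534 = unused entry.
  dd if=/dev/sdc1 bs=2 count=12 skip=2176 2>/dev/null | od -An -d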
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.
From: EJ Vincent @ 2012-10-02  3:53 UTC
To: NeilBrown; +Cc: Phil Turmel, linux-raid

On 10/1/2012 10:15 PM, NeilBrown wrote:
> You might be able to avoid the 9! combinations, which could take a while ...
> 4 days if you could test one per second.
> [...]
> You should be able to perform the same steps to recover the correct order to
> try creating the array.

Hi Neil,

Thank you so much for taking the time to help me through this. Here's what I've
come up with, per your instructions:

/dev/sda1 0000000 4
/dev/sdb1 0000000 11
/dev/sdc1 0000000 7
/dev/sde1 0000000 8
/dev/sdf1 0000000 1
/dev/sdg1 0000000 0
/dev/sdh1 0000000 6
/dev/sdi1 0000000 10
/dev/sdj1 0000000 9

dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
0000000      0      1  65534  65534      2  65534      4      5
0000020      6      7      8      3
0000030

Mind doing a sanity check for me? Based on the above information, one such
possible device order is:

/dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1

where * represents the three unknown devices marked by 65534?

Once I have your blessing, would I then proceed to:

mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1 /dev/sdi1 /dev/sda1 /dev/sdj1 /dev/sdh1 /dev/sdc1 /dev/sde1

and this is non-destructive, so I can attempt different orders?

Again, thank you for the help.

Best wishes,
-EJ
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. 2012-10-02 3:53 ` EJ Vincent @ 2012-10-02 5:04 ` NeilBrown 2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent 0 siblings, 1 reply; 17+ messages in thread From: NeilBrown @ 2012-10-02 5:04 UTC (permalink / raw) To: EJ Vincent; +Cc: Phil Turmel, linux-raid [-- Attachment #1: Type: text/plain, Size: 6953 bytes --] On Mon, 01 Oct 2012 23:53:08 -0400 EJ Vincent <ej@ejane.org> wrote: > On 10/1/2012 10:15 PM, NeilBrown wrote: > > On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@ejane.org> wrote: > > > >> On 9/30/2012 4:28 PM, Phil Turmel wrote: > >>> On 09/30/2012 03:25 PM, EJ Vincent wrote: > >>>> On 9/30/2012 3:22 PM, Mathias Burén wrote: > >>>>> Can't you just boot off an older Ubuntu USB, install mdadm and scan / > >>>>> assemble, see the device order? > >>>> Hi Mathias, > >>>> > >>>> I'm under the impression that damage to the metadata has already been > >>>> done by 12.04, making a recovery from an older version of Ubuntu > >>>> (10.04), impossible. Is this line of thinking, flawed? > >>> Your impression is correct. Permanent damage to the metadata was done. > >>> You *must* re-create your array. > >>> > >>> However, you *cannot* use your new version of mdadm, as it will get the > >>> data offset wrong. Your first report showed a data offset of 272. > >>> Newer versions of mdadm default to 2048. You *must* perform all of your > >>> "mdadm --create --assume-clean" permutations with 10.04. > >>> > >>> Do you have *any* dmesg output from the old system? Or dmesg from the > >>> very first boot under 12.04? That might have enough information to > >>> shorten your search. > >>> > >>> In the future, you should record your setup by saving the output of > >>> "mdadm -D" on each array, "mdadm -E" on each member device, and the > >>> output of "ls -l /dev/disk/by-id/" > >>> > >>> Or try my documentation script "lsdrv". [1] > >>> > >>> HTH, > >>> > >>> Phil > >>> > >>> [1] http://github.com/pturmel/lsdrv > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> Hi Phil, > >> > >> Unfortunately I don't have any dmesg log from the old system or the > >> first boot under 12.04. > >> > >> Getting my system to boot at all under 12.04 was chaotic enough, with > >> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions > >> ravaging my array and then dropping me to a busybox shell over and over > >> again. I didn't think to record the very first error. > >> > >> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and > >> /dev/sdj1 don't have the Raid level "-unknown-", neither are they > >> labeled as spares. They are in fact, labeled clean and appear > >> *different* from the others. > >> > >> Could these disks still contain my metadata from 10.04? I recall during > >> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so > >> that I could drop in a SATA CD/DVDRW into the slot. > >> > >> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear > >> having to do permutations-- 9! (factorial) would mean 362,880 > >> combinations. *gasp* > > You might be able to avoid the 9! combinations, which could take a while ... > > 4 days if you could test one per second. 
> > > > Try this: > > > > for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \ > > skip=4256 | od -D | head -n1; done > > > > This reads that 'dev_number' fields out of the metadata on each device. > > This should not have been corrupted by the bug. > > You might want some other pattern in place of "/dev/sd?1" - it needs to match > > all the devices in your array. > > > > Then on one of the devices which doesn't have corrupted metadata, run > > > > dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d > > > > where $COUNT is one more than the largest number that was reported in the > > "dev_number" values reported above. > > > > Now for each device, take the dev_number that was reported, use that as an > > index into the list of numbers produced by the second command, and that > > number if the role of the device in the array. i.e. it's position in the > > list. > > > > So after making an array of 5 'loop' devices in a non-obvious order, and > > failing a device and re-adding it: > > > > # for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done > > /dev/loop0 0000000 3 > > /dev/loop1 0000000 4 > > /dev/loop2 0000000 1 > > /dev/loop3 0000000 0 > > /dev/loop4 0000000 5 > > > > and > > > > # dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d > > 0000000 0 1 65534 3 4 2 > > 0000014 > > > > So /dev/loop0 has dev_number '3'. Look for entry '3' in the list and get '3' > > /dev/loop1 has 'dev_number' 4, so is device 4 > > /dev/loop4 has dev_number '5', so is device 2 > > etc > > So we can reconstruct the order of devices: > > > > /dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1 > > > > Note the '65534' in the list means that there is no device with that > > dev_number. i.e. no device is number '2', and looking at the list confirms > > that. > > > > You should be able to perform the same steps to recover the correct order to > > try creating the array. > > > > NeilBrown > > > > > Hi Neil, > > Thank you so much for taking the time to help me through this. > > Here's what I've come up with, per your instructions: > > /dev/sda1 0000000 4 > /dev/sdb1 0000000 11 > /dev/sdc1 0000000 7 > /dev/sde1 0000000 8 > /dev/sdf1 0000000 1 > /dev/sdg1 0000000 0 > /dev/sdh1 0000000 6 > /dev/sdi1 0000000 10 > /dev/sdj1 0000000 9 > > dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d > 0000000 0 1 65534 65534 2 65534 4 5 > 0000020 6 7 8 3 > 0000030 > > Mind doing a sanity check for me? > > Based on the above information, one such possible device order is: > > /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1 > /dev/sdc1 /dev/sde1 > > where * represents the three unknown devices marked by 65534? Nope. The 65534 entries should never come into it. sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1 e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in position 8. > > Once I have your blessing, would I then proceed to: > > mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 > --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* > /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1 > > and this is non-destructive, so I can attempt different orders? Yes. Well, it destroys the metadata so make sure you have a copy of the "-E" for each device, and it wouldn't hurt to run that second 'dd' command on every device and keep that just in case. NeilBrown > > Again, thank you for the help. 
> > Best wishes, > > -EJ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
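Working Neil's mapping through the numbers EJ posted (role table from sdc1: 0 1 65534 65534 2 65534 4 5 6 7 8 3, indexed by dev_number) gives:

role 0: sdg1 (dev_number 0)
role 1: sdf1 (dev_number 1)
role 2: sda1 (dev_number 4)
role 3: sdb1 (dev_number 11)
role 4: sdh1 (dev_number 6)
role 5: sdc1 (dev_number 7)
role 6: sde1 (dev_number 8)
role 7: sdj1 (dev_number 9)
role 8: sdi1 (dev_number 10)

which is exactly the order Neil quotes. The re-create would then look like the command below, run with the old 10.04 mdadm so the 272-sector data offset is preserved, as Phil warned earlier. This is a sketch for readers following along; the device names are as they appeared on EJ's system at the time and can change across reboots.

mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 \
      --metadata=1.2 --chunk=512 \
      /dev/sdg1 /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdh1 \
      /dev/sdc1 /dev/sde1 /dev/sdj1 /dev/sdi1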
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] 2012-10-02 5:04 ` NeilBrown @ 2012-10-02 8:34 ` EJ Vincent 2012-10-02 12:18 ` Phil Turmel 0 siblings, 1 reply; 17+ messages in thread From: EJ Vincent @ 2012-10-02 8:34 UTC (permalink / raw) To: NeilBrown; +Cc: Phil Turmel, linux-raid On 10/2/2012 1:04 AM, NeilBrown wrote: > On Mon, 01 Oct 2012 23:53:08 -0400 EJ Vincent <ej@ejane.org> wrote: > >> On 10/1/2012 10:15 PM, NeilBrown wrote: >>> On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@ejane.org> wrote: >>> >>>> On 9/30/2012 4:28 PM, Phil Turmel wrote: >>>>> On 09/30/2012 03:25 PM, EJ Vincent wrote: >>>>>> On 9/30/2012 3:22 PM, Mathias Burén wrote: >>>>>>> Can't you just boot off an older Ubuntu USB, install mdadm and scan / >>>>>>> assemble, see the device order? >>>>>> Hi Mathias, >>>>>> >>>>>> I'm under the impression that damage to the metadata has already been >>>>>> done by 12.04, making a recovery from an older version of Ubuntu >>>>>> (10.04), impossible. Is this line of thinking, flawed? >>>>> Your impression is correct. Permanent damage to the metadata was done. >>>>> You *must* re-create your array. >>>>> >>>>> However, you *cannot* use your new version of mdadm, as it will get the >>>>> data offset wrong. Your first report showed a data offset of 272. >>>>> Newer versions of mdadm default to 2048. You *must* perform all of your >>>>> "mdadm --create --assume-clean" permutations with 10.04. >>>>> >>>>> Do you have *any* dmesg output from the old system? Or dmesg from the >>>>> very first boot under 12.04? That might have enough information to >>>>> shorten your search. >>>>> >>>>> In the future, you should record your setup by saving the output of >>>>> "mdadm -D" on each array, "mdadm -E" on each member device, and the >>>>> output of "ls -l /dev/disk/by-id/" >>>>> >>>>> Or try my documentation script "lsdrv". [1] >>>>> >>>>> HTH, >>>>> >>>>> Phil >>>>> >>>>> [1] http://github.com/pturmel/lsdrv >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> Hi Phil, >>>> >>>> Unfortunately I don't have any dmesg log from the old system or the >>>> first boot under 12.04. >>>> >>>> Getting my system to boot at all under 12.04 was chaotic enough, with >>>> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions >>>> ravaging my array and then dropping me to a busybox shell over and over >>>> again. I didn't think to record the very first error. >>>> >>>> Here's an observation of mine, disks: /dev/sdb1, /dev/sdi1, and >>>> /dev/sdj1 don't have the Raid level "-unknown-", neither are they >>>> labeled as spares. They are in fact, labeled clean and appear >>>> *different* from the others. >>>> >>>> Could these disks still contain my metadata from 10.04? I recall during >>>> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so >>>> that I could drop in a SATA CD/DVDRW into the slot. >>>> >>>> I am downloading 10.04.4 LTS and will be ready to use it soon. I fear >>>> having to do permutations-- 9! (factorial) would mean 362,880 >>>> combinations. *gasp* >>> You might be able to avoid the 9! combinations, which could take a while ... >>> 4 days if you could test one per second. 
>>> >>> Try this: >>> >>> for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \ >>> skip=4256 | od -D | head -n1; done >>> >>> This reads that 'dev_number' fields out of the metadata on each device. >>> This should not have been corrupted by the bug. >>> You might want some other pattern in place of "/dev/sd?1" - it needs to match >>> all the devices in your array. >>> >>> Then on one of the devices which doesn't have corrupted metadata, run >>> >>> dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d >>> >>> where $COUNT is one more than the largest number that was reported in the >>> "dev_number" values reported above. >>> >>> Now for each device, take the dev_number that was reported, use that as an >>> index into the list of numbers produced by the second command, and that >>> number if the role of the device in the array. i.e. it's position in the >>> list. >>> >>> So after making an array of 5 'loop' devices in a non-obvious order, and >>> failing a device and re-adding it: >>> >>> # for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done >>> /dev/loop0 0000000 3 >>> /dev/loop1 0000000 4 >>> /dev/loop2 0000000 1 >>> /dev/loop3 0000000 0 >>> /dev/loop4 0000000 5 >>> >>> and >>> >>> # dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d >>> 0000000 0 1 65534 3 4 2 >>> 0000014 >>> >>> So /dev/loop0 has dev_number '3'. Look for entry '3' in the list and get '3' >>> /dev/loop1 has 'dev_number' 4, so is device 4 >>> /dev/loop4 has dev_number '5', so is device 2 >>> etc >>> So we can reconstruct the order of devices: >>> >>> /dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1 >>> >>> Note the '65534' in the list means that there is no device with that >>> dev_number. i.e. no device is number '2', and looking at the list confirms >>> that. >>> >>> You should be able to perform the same steps to recover the correct order to >>> try creating the array. >>> >>> NeilBrown >>> >> >> Hi Neil, >> >> Thank you so much for taking the time to help me through this. >> >> Here's what I've come up with, per your instructions: >> >> /dev/sda1 0000000 4 >> /dev/sdb1 0000000 11 >> /dev/sdc1 0000000 7 >> /dev/sde1 0000000 8 >> /dev/sdf1 0000000 1 >> /dev/sdg1 0000000 0 >> /dev/sdh1 0000000 6 >> /dev/sdi1 0000000 10 >> /dev/sdj1 0000000 9 >> >> dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d >> 0000000 0 1 65534 65534 2 65534 4 5 >> 0000020 6 7 8 3 >> 0000030 >> >> Mind doing a sanity check for me? >> >> Based on the above information, one such possible device order is: >> >> /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1 >> /dev/sdc1 /dev/sde1 >> >> where * represents the three unknown devices marked by 65534? > Nope. The 65534 entries should never come into it. > > sdg1 sdf1 sda1 sdb1 sdh1 sdc1 sde1 sdj1 sdi1 > > e.g. sdi1 is device '10'. Entry 10 in the array is 8, so sdi1 goes in > position 8. > >> Once I have your blessing, would I then proceed to: >> >> mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 >> --metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* >> /dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1 >> >> and this is non-destructive, so I can attempt different orders? > Yes. Well, it destroys the metadata so make sure you have a copy of the "-E" > for each device, and it wouldn't hurt to run that second 'dd' command on > every device and keep that just in case. > > NeilBrown > >> Again, thank you for the help. 
>> >> Best wishes, >> >> -EJ Neil, I've successfully re-created the array using the corrected device order you specified. For the purpose of documenting, I immediately started an 'xfs_check', but due to the size of the filesystem, it quickly (under 90 seconds) consumed all available memory on the server (16GB). I instead used 'xfs_repair -n', which ran for about one minute before returning me to a shell (no errors reported): (-n No modify mode. Specifies that xfs_repair should not modify the filesystem but should only scan the filesystem and indicate what repairs would have been made.) I then set the sync_action under /sys/block/md0/md/ to 'check' and also increased the stripe_cache_size to something not so modest, 4096 up from 256. I'm monitoring /sys/block/md0/md/mismatch_cnt using tail -f and so far it has been stuck at 0, a good sign for sure. I'm well on my way to a complete recovery (about 25% checked as of writing this). I want to thank you again Neil (and the rest of the linux-raid mailing list) for the absolutely flawless and expert support you've provided. Best wishes, -EJ -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
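For reference, the verification EJ describes corresponds roughly to the following; a sketch assuming the array assembled as /dev/md0 and carries an XFS filesystem, not a transcript of EJ's exact commands:

xfs_repair -n /dev/md0                            # no-modify scan of the filesystem only
echo 4096 > /sys/block/md0/md/stripe_cache_size   # raise the stripe cache from the default 256
echo check > /sys/block/md0/md/sync_action        # start a read-only parity scrub
watch cat /sys/block/md0/md/mismatch_cnt          # should stay at 0 if the order is right
cat /proc/mdstat                                  # shows the scrub's progress

A mismatch_cnt that stays at zero through the whole check is strong evidence the device order, and hence the data, is intact.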
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] 2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent @ 2012-10-02 12:18 ` Phil Turmel 0 siblings, 0 replies; 17+ messages in thread From: Phil Turmel @ 2012-10-02 12:18 UTC (permalink / raw) To: EJ Vincent; +Cc: NeilBrown, linux-raid On 10/02/2012 04:34 AM, EJ Vincent wrote: > Neil, > > I've successfully re-created the array using the corrected device order > you specified. Great news. I'm tucking Neil's procedure away in my toolbox... Phil ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6. 2012-09-30 19:20 ` EJ Vincent 2012-09-30 19:22 ` Mathias Burén @ 2012-09-30 19:50 ` Chris Murphy 1 sibling, 0 replies; 17+ messages in thread From: Chris Murphy @ 2012-09-30 19:50 UTC (permalink / raw) To: Linux RAID On Sep 30, 2012, at 1:20 PM, EJ Vincent wrote: > On 9/30/2012 6:04 AM, Mikael Abrahamsson wrote: >> >> <http://neil.brown.name/blog/20120615073245> > 3.2.0-23-generic. That kernel is inside the range that had the bug, although I'm not sure if that kernel actually has the bug. The symptoms match up as you say. > > That link looks interesting-- I'm not sure if I triggered the bug how Mr. Neil Brown describes it, but I definitely have symptoms on some (not all) the disks of RAID level "-unknown-" and devices appearing to be spares. > > I'm hesitant to re-create the array again (using mdadm) because according to that blog post, for RAID-6, the order of devices are important, and with this being a 9 disk array and no record of device order in logs or from my own memory, I have no idea what the proper order might be. You'll have to iterate, it sounds like, if you have nothing else to go on. Faster to iterate and try again than to blow away the RAID and restore from backup. > I do know that 1) I was using metadata version 1.2, 2) the array was not degraded and subsequently 3) no disks were missing. > > Am I over-estimating the importance of the order and should proceed with the re-creation, or perhaps wait for Neil himself to weigh in the problem? Either. Just make sure you're using --assume-clean and don't mount it, to prevent either resync or changes to the file system. The echo > check (read only scrub) test described in the blog entry will rather clearly tell you if you get the order of the disks correct. The blog entry is pretty detailed. The question I have remaining is whether there's some way for you to cheat and have a better chance at getting the disk order correct. Chris Murphy ^ permalink raw reply [flat|nested] 17+ messages in thread
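In shell terms, the iterate-and-scrub approach Chris describes is roughly the sequence below. This is a sketch only, largely superseded by Neil's dev_number method later in the thread; ORDER is just one candidate permutation of the nine members, and the create must be done with an mdadm whose default data offset matches the original array (here the 10.04 one).

ORDER="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1"  # one guess
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9 \
      --metadata=1.2 --chunk=512 $ORDER
echo check > /sys/block/md0/md/sync_action     # read-only scrub, nothing is rewritten
sleep 60; cat /sys/block/md0/md/mismatch_cnt   # a rapidly climbing count suggests a wrong order
echo idle > /sys/block/md0/md/sync_action      # abort the scrub
mdadm --stop /dev/md0                          # tear down and try the next permutation
# never mount the filesystem read-write while guessing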
end of thread, other threads:[~2012-10-02 12:18 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-30 9:21 Upgrade from Ubuntu 10.04 to 12.04 broken raid6 EJ 2012-09-30 9:30 ` EJ Vincent 2012-09-30 9:44 ` Jan Ceuleers 2012-09-30 10:04 ` Mikael Abrahamsson 2012-09-30 19:20 ` EJ Vincent 2012-09-30 19:22 ` Mathias Burén 2012-09-30 19:25 ` EJ Vincent 2012-09-30 20:28 ` Phil Turmel 2012-09-30 23:23 ` EJ Vincent 2012-10-01 12:40 ` Phil Turmel 2012-10-01 17:14 ` EJ Vincent 2012-10-02 2:15 ` NeilBrown 2012-10-02 3:53 ` EJ Vincent 2012-10-02 5:04 ` NeilBrown 2012-10-02 8:34 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6. [SOLVED] EJ Vincent 2012-10-02 12:18 ` Phil Turmel 2012-09-30 19:50 ` Upgrade from Ubuntu 10.04 to 12.04 broken raid6 Chris Murphy