* RAID 6 (containing LUKS dm-crypt) recovery help.
From: xar @ 2014-11-07 5:46 UTC
To: linux-raid
Greetings,
I have a RAID 6 (which contains a LUKS container) that I'm hoping to get
some help/insight in recovering. The server experienced some sort of
hardware event that resulted in a mandatory restart of the server.
For the record, after the server completed the restart, the array looked
like this, "all spares":
md6 : inactive sdl1[7](S) sdh1[13](S) sdg1[14](S) sdk1[11](S)
sdj1[10](S) sdi1[6](S) sdd1[2](S) sdf1[8](S) sdb1[12](S) sde1[3](S)
sdc1[15](S) 21488638704 blocks super 1.2
The server in question is Ubuntu ("Precise") 12.04.5 LTS with mdadm
version 3.2.5-1ubuntu0 installed.
The mdadm array has the following characteristics:
RAID level: 6
Chunk size: 256k
Version: 1.2
Number of devices: 11
All attempts to assemble the array continued to result in the "all
spare" condition (output above). Thinking that the metadata had been
corrupted somehow, I set out to recreate the array.
The following are the dev_number fields from the metadata, before I
attempted to recreate the array:
# for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/sdb1 0000000 12
/dev/sdc1 0000000 15
/dev/sdd1 0000000 2
/dev/sde1 0000000 3
/dev/sdf1 0000000 8
/dev/sdg1 0000000 14
/dev/sdh1 0000000 13
/dev/sdi1 0000000 6
/dev/sdj1 0000000 10
/dev/sdk1 0000000 11
/dev/sdl1 0000000 7
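A slightly more self-documenting variant of that loop, offered as a sketch: it names the field offsets instead of hard-coding 4256, and checks the superblock magic first. It assumes 1.2 metadata with the superblock 8 sectors (4 KiB) into each member, which matches the "Super Offset : 8 sectors" reported by --examine later in this thread. It also assumes the field offsets of struct mdp_superblock_1 from the kernel's md_p.h (magic at 0, dev_number at 160, events at 200) and a little-endian host, because od decodes integers in host byte order. The helper name sb_field is hypothetical.

```sh
#!/bin/sh
# Sketch: dump a few fields of an md 1.2 superblock from each member partition.
# Assumptions: superblock at byte 4096 (Super Offset 8 sectors), offsets from
# struct mdp_superblock_1, little-endian host (od decodes in host byte order).
SB=4096

# Hypothetical helper: sb_field FILE OFFSET TYPE prints one unsigned field.
# TYPE is an od type: u4 (32-bit) or u8 (64-bit); "${3#u}" turns it into
# the byte count od should read (u4 -> 4, u8 -> 8).
sb_field() {
    od -An -t"$3" -j $((SB + $2)) -N "${3#u}" "$1" | tr -d ' \n'
}

for dev in /dev/sd?1; do
    [ -b "$dev" ] || continue                # skip if the glob matched nothing
    magic=$(od -An -tx4 -j "$SB" -N 4 "$dev" | tr -d ' \n')
    [ "$magic" = a92b4efc ] || { echo "$dev: no 1.2 superblock"; continue; }
    echo "$dev dev_number=$(sb_field "$dev" 160 u4) events=$(sb_field "$dev" 200 u8)"
done
```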
I used the following to extract the index position of each device on a
device I suspected wasn't corrupted (for the record, they all returned
the same data):
# dd 2> /dev/null if=/dev/sdc1 bs=2 count=6 skip=2176 | od -d
0000000 65534 65534 2 65534 65534 65534 4 5
0000020 65534 65534 7 8 0 9 65534
0000036
As you can see, there's already a visible mismatch between the
dev_number values and the listed indexes. For instance, /dev/sdc1
returned a device number of 15, but the list has only 15 entries
(positions 0 through 14), so there is no position 15.
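In the 1.2 format, dev_roles is a table of 16-bit entries indexed by dev_number (so a member's slot is dev_roles[dev_number]), and it is max_dev entries long. The sketch below decodes that table with the kernel's special values spelled out: 0xffff (65535) means spare and 0xfffe (65534) means faulty. It makes the same assumptions as before (superblock at 4 KiB, little-endian host) plus the mdp_superblock_1 offsets max_dev at 220 and dev_roles at 256, and it uses GNU od's -w option. decode_roles is a hypothetical name.

```sh
#!/bin/sh
# Sketch: decode the dev_roles table of an md 1.2 superblock.
# Assumptions: superblock at byte 4096, max_dev at superblock+220,
# dev_roles (16-bit entries) at superblock+256, little-endian host, GNU od.
SB=4096

decode_roles() {
    max_dev=$(od -An -tu4 -j $((SB + 220)) -N 4 "$1" | tr -d ' \n')
    [ -n "$max_dev" ] || return 1            # file too short to hold a superblock
    # -v: never collapse repeated entries into '*'; -w2: one 16-bit entry per line
    od -An -v -tu2 -w2 -j $((SB + 256)) -N $((max_dev * 2)) "$1" |
    {
        i=0
        while read -r role; do
            case $role in
                65535) label=spare ;;
                65534) label=faulty ;;
                *)     label="active device $role" ;;
            esac
            echo "dev_number $i: $label"
            i=$((i + 1))
        done
    }
}

# Example: decode_roles /dev/sdc1
```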
From the log history, I pulled the last known working "layout", circa
July of this year:
# mdadm -D /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)
Raid Devices : 11
Total Devices : 10
Persistence : Superblock is persistent
Update Time : Sat Jun 21 21:13:45 2014
State : clean, degraded
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
Name : server:6 (local to host server)
UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Events : 390345
Number Major Minor RaidDevice State
12 8 17 0 active sync /dev/sdb1
3 8 65 1 active sync /dev/sde1
2 8 49 2 active sync /dev/sdd1
8 8 81 3 active sync /dev/sdf1
6 8 129 4 active sync /dev/sdi1
7 8 177 5 active sync /dev/sdl1
6 0 0 6 removed
10 8 145 7 active sync /dev/sdj1
11 8 161 8 active sync /dev/sdk1
13 8 113 9 active sync /dev/sdh1
14 8 97 10 active sync /dev/sdg1
The dev_numbers and index position information in conjunction with the
historic data (directly above) seemed to indicate that the proper
recreation order and command would be the following:
# mdadm --create /dev/md6 --assume-clean --level=6 --raid-devices=11 \
    --metadata=1.2 --chunk=256 /dev/sdb1 /dev/sde1 /dev/sdd1 /dev/sdf1 \
    /dev/sdi1 /dev/sdl1 /dev/sdc1 /dev/sdj1 /dev/sdk1 /dev/sdh1 /dev/sdg1
I ran the command above.
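The ordering step can be made mechanical, as a sketch. The awk helper below (order_from_detail, a hypothetical name) reads a saved "mdadm -D" listing like the July one above and prints the members in RaidDevice order. For a removed slot, it writes mdadm's "missing" keyword, which --create accepts in place of a device. It assumes the device table has the column layout shown above (Number, Major, Minor, RaidDevice, State, device). Note that the poster filled slot 6 with /dev/sdc1 based on the dev_roles data, not from this table.

```sh
#!/bin/sh
# Sketch: turn the device table of a saved `mdadm -D` listing into a
# space-separated member list in RaidDevice order ("missing" for removed slots).
# Assumption: columns are Number Major Minor RaidDevice State... [device].
order_from_detail() {
    awk '
        $5 == "active"  { dev[$4] = $NF }
        $5 == "removed" { dev[$4] = "missing" }
        $5 == "active" || $5 == "removed" { if ($4 + 1 > n) n = $4 + 1 }
        END {
            for (i = 0; i < n; i++)
                printf "%s%s", (i ? " " : ""), dev[i]
            print ""
        }
    ' "$1"
}

# Example: order_from_detail md6-detail-july.txt   (file name is hypothetical)
```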
Here is the current output of "lsdrv" (by 'pturmel' on GitHub), which I
noticed several people on this list have found relevant:
PCI [ahci] 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
├scsi 0:0:0:0 ATA ST31500541AS {6XW03WTD}
│└sda 1.36t [8:0] Partitioned (dos)
│ ├sda1 109.79m [8:1] ext4 {07f99e8c-95d2-483d-9850-05f04820c3f6}
│ │└Mounted as /dev/sda1 @ /boot
│ ├sda2 2.01g [8:2] swap {d137430d-815a-4c45-a394-9bece3aa7136}
│ ├sda3 7.01g [8:3] ext4 {8db73200-8d9d-4991-9802-b13f1550a9d9}
│ │└Mounted as /dev/disk/by-uuid/8db73200-8d9d-4991-9802-b13f1550a9d9 @ /
│ └sda4 1.36t [8:4] Empty/Unknown
│  └dm-0 1.36t [252:0] xfs {db7ddb53-080c-45ba-ab4d-e45d35eb451c}
│   └Mounted as /dev/mapper/enc @ /encrypted
├scsi 1:0:0:0 ATA ST2000DL003-9VT1 {5YD4VZLV}
│└sdb 1.82t [8:16] Partitioned (gpt)
│ └sdb1 1.82t [8:17] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 2:0:0:0 ATA ST2000DM001-1CH1 {Z1E8GNFQ}
│└sdc 1.82t [8:32] Partitioned (gpt)
│ └sdc1 1.82t [8:33] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 3:0:0:0 ATA ST2000DL003-9VT1 {5YD2PZM3}
│└sdd 1.82t [8:48] Partitioned (gpt)
│ └sdd1 1.82t [8:49] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 4:0:0:0 ATA ST2000DL003-9VT1 {5YD2J0XD}
│└sde 1.82t [8:64] Partitioned (gpt)
│ └sde1 1.82t [8:65] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
└scsi 5:0:0:0 ATA ST2000DL003-9VT1 {5YD3XE9M}
 └sdf 1.82t [8:80] Partitioned (gpt)
  └sdf1 1.82t [8:81] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
PCI [sata_sil24] 04:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
├scsi 6:0:0:0 ATA ST2000DL003-9VT1 {5YD6JW2L}
│└sdg 1.82t [8:96] Partitioned (gpt)
│ └sdg1 1.82t [8:97] MD raid6 (11) inactive 'server:6' {65daae65-118b-896a-6205-0f2c4dacb4de}
├scsi 7:0:0:0 ATA ST2000DL003-9VT1 {6YD05E5Y}
│└sdh 1.82t [8:112] Partitioned (gpt)
│ └sdh1 1.82t [8:113] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 8:x:x:x [Empty]
└scsi 9:x:x:x [Empty]
PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
├scsi 10:0:0:0 ATA ST32000542AS {5XW1PVCZ}
│└sdi 1.82t [8:128] Partitioned (gpt)
│ └sdi1 1.82t [8:129] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 11:0:0:0 ATA ST2000DL003-9VT1 {5YD2SND2}
│└sdj 1.82t [8:144] Partitioned (gpt)
│ └sdj1 1.82t [8:145] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 12:0:0:0 ATA ST2000DL003-9VT1 {5YD4JTZP}
│└sdk 1.82t [8:160] Partitioned (gpt)
│ └sdk1 1.82t [8:161] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
└scsi 13:0:0:0 ATA ST32000542AS {5XW1KAEA}
 └sdl 1.82t [8:176] Partitioned (gpt)
  └sdl1 1.82t [8:177] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
├loop7 0.00k [7:7] Empty/Unknown
├ram0 64.00m [1:0] Empty/Unknown
├ram1 64.00m [1:1] Empty/Unknown
├ram2 64.00m [1:2] Empty/Unknown
├ram3 64.00m [1:3] Empty/Unknown
├ram4 64.00m [1:4] Empty/Unknown
├ram5 64.00m [1:5] Empty/Unknown
├ram6 64.00m [1:6] Empty/Unknown
├ram7 64.00m [1:7] Empty/Unknown
├ram8 64.00m [1:8] Empty/Unknown
├ram9 64.00m [1:9] Empty/Unknown
├ram10 64.00m [1:10] Empty/Unknown
├ram11 64.00m [1:11] Empty/Unknown
├ram12 64.00m [1:12] Empty/Unknown
├ram13 64.00m [1:13] Empty/Unknown
├ram14 64.00m [1:14] Empty/Unknown
└ram15 64.00m [1:15] Empty/Unknown
Following the recreation, the array now looks like this:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md6 : active raid6 sdh1[9] sdk1[8] sdj1[7] sdc1[6] sdl1[5] sdi1[4]
sdf1[3] sdd1[2] sde1[1] sdb1[0]
17580439296 blocks super 1.2 level 6, 256k chunk, algorithm 2
[11/10] [UUUUUUUUUUU]
# mdadm -D /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Fri Nov 7 05:40:16 2014
Raid Level : raid6
Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)
Raid Devices : 11
Total Devices : 11
Persistence : Superblock is persistent
Update Time : Fri Nov 7 05:40:16 2014
State : clean
Active Devices : 11
Working Devices : 11
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
Name : server:6 (local to host server)
UUID : b306872f:5ef902a8:76f5e233:f220f4d4
Events : 0
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 65 1 active sync /dev/sde1
2 8 49 2 active sync /dev/sdd1
3 8 81 3 active sync /dev/sdf1
4 8 129 4 active sync /dev/sdi1
5 8 177 5 active sync /dev/sdl1
6 8 33 6 active sync /dev/sdc1
7 8 145 7 active sync /dev/sdj1
8 8 161 8 active sync /dev/sdk1
9 8 113 9 active sync /dev/sdh1
10 8 97 10 active sync /dev/sdg1
This array should contain a LUKS container; however, it's missing. If I
hexdump the first 16 lines, the LUKS header is completely absent:
# cryptsetup luksOpen /dev/md6 luks
Device /dev/md6 is not a valid LUKS device.
# hexdump -C /dev/md6 | head -n16
00000000  0b 37 89 e0 66 96 7a d4  6c 5b 57 09 a5 8d 6a c5  |.7..f.z.l[W...j.|
00000010  a7 65 20 6e f0 db 74 db  03 d8 e9 2b 39 05 37 a4  |.e n..t....+9.7.|
00000020  cb 25 d7 7b fd cf b5 b4  12 ad e2 24 24 de 66 42  |.%.{.......$$.fB|
00000030  61 a2 1b ea 8b 5c 04 38  7e 5e 61 11 3d ba 99 35  |a....\.8~^a.=..5|
00000040  b7 e9 e6 76 72 18 d2 d5  bd cd 1b ed 59 15 fb 83  |...vr.......Y...|
00000050  bc 57 94 85 31 c1 3e af  51 f1 25 50 db 57 d3 cd  |.W..1.>.Q.%P.W..|
00000060  69 d5 31 23 df 01 ef 03  e3 92 66 c6 1f 38 3f 57  |i.1#......f..8?W|
00000070  67 20 38 8c c2 ec 25 dc  59 42 b4 5d 9d 9e c1 79  |g 8...%.YB.]...y|
00000080  4a f5 e1 ad f8 08 16 d5  37 3f f6 83 62 f2 6f f5  |J.......7?..b.o.|
00000090  53 95 4f 69 ce 7c ba 4c  86 ef a1 1c 04 d7 b3 17  |S.Oi.|.L........|
000000a0  cd ea 5f 25 56 a4 0d 6f  64 e9 51 b5 71 b3 18 7f  |.._%V..od.Q.q...|
000000b0  46 e7 8b ab 08 ae f5 ed  65 0d 8f 3e 8b 03 25 5c  |F.......e..>..%\|
000000c0  bb 50 dc e6 31 33 4a 88  8e 22 20 72 f0 11 71 d0  |.P..13J.." r..q.|
000000d0  59 c7 9d 20 f8 e2 f0 f8  75 5f ea 4a 57 d7 d7 9e  |Y.. ....u_.JW...|
000000e0  c8 05 85 9d d7 cf c9 ab  53 de 11 6f bf d4 e3 b2  |........S..o....|
000000f0  f6 5e 1e 46 5c 16 ae 46  a3 b5 9b f4 b9 ff ca 0c  |.^.F\..F........|
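To check for the header without reading a whole hexdump, you can test the magic bytes directly. The LUKS1 on-disk format begins with the 6-byte magic "LUKS" followed by 0xBA 0xBE (hex 4c 55 4b 53 ba be), whereas the dump above starts with 0b 37 89 e0. The helper below, has_luks_magic, is a hypothetical sketch that tests for that magic at a given byte offset (default 0). For cryptsetup's own, more thorough check, use "cryptsetup isLuks DEVICE".

```sh
#!/bin/sh
# Sketch: succeed if the 6-byte LUKS magic ("LUKS" 0xBA 0xBE) is present at
# byte OFFSET (default 0) of a device or image file; read-only.
has_luks_magic() {
    magic=$(od -An -tx1 -j "${2:-0}" -N 6 "$1" | tr -d ' \n')
    [ "$magic" = 4c554b53babe ]
}

# Example:
# has_luks_magic /dev/md6 && echo "LUKS magic at offset 0" || echo "no LUKS magic at offset 0"
```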
Is the "mdadm --create" operation that I issued, incorrect? Have I done
anything in error?
Unfortunately, I do not have a backup of the LUKS header, as I've
personally never encountered a situation like this, nor was I privy to
the knowledge that LUKS headers should be backed up at all.
If relevant, I did observe that /dev/sdb1 and /dev/sdg1 were showing
"Offline_Uncorrectable" errors via SMART, but nothing that I could
imagine would have contributed to this current predicament.
Is my data gone? Any and all insight is extremely welcome and appreciated.
Warm regards,
-xar
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RAID 6 (containing LUKS dm-crypt) recovery help.
From: Peter Grandi @ 2014-11-07 10:24 UTC
To: Linux RAID
> [ ... ] The server experienced some sort of hardware event
> that resulted in a mandatory restart of the server.
Details would be helpful, because when some problem happens the
standard advice is "reload from backups". If you want to
shortcut that to a mostly complete recovery, context matters in
figuring out how to do it, and how safely.
> [ ... ] completed the restart, the array looked like this,
> "all spares":
> md6 :
What happened to the other MD sets on the same server, if any?
Any damage? Because if those suffered no damage, there is the
possibility that the disk rack backplane holding the members of
'md6' got damaged, or the specific host adapter; and that the MD
set content is entirely undamaged and the funny stuff being read
is a transmission problem.
> inactive sdl1[7](S) sdh1[13](S) sdg1[14](S) sdk1[11](S)
> sdj1[10](S) sdi1[6](S) sdd1[2](S) sdf1[8](S) sdb1[12](S)
> sde1[3](S) sdc1[15](S) 21488638704 blocks super 1.2
"Clever" people hide as many details as possible, and go to such
lengths as to actually remove vital information, for example what
literally follows "super 1.2" here. Because actual quotes are
too "insipid" and paraphrases are more "challenging":
> The mdadm array has the following characteristics: RAID level:
> 6 Chunk size: 256k Version: 1.2 Number of devices: 11
How do you know? Is this part of your records or from actual
output of 'mdadm --examine'?
But assuming the above is somewhat reliable there is an
"interesting" situation: in "21488638704 blocks" the number
21,488,638,704 is not a whole multiple of 9:
$ factor 21488638704
21488638704: 2 2 2 2 3 13 1801 19121
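The same arithmetic can be re-checked with shell integer arithmetic, as a small sketch. It assumes a shell with 64-bit arithmetic, which is the norm for bash and dash on 64-bit hosts. The factorization contains only a single factor of 3, so the number cannot be a multiple of 9.

```sh
#!/bin/sh
# Re-check the quoted factorization and the "not a multiple of 9" claim.
blocks=21488638704
echo "remainder mod 9: $((blocks % 9))"                              # 6
echo "product of factors: $((2 * 2 * 2 * 2 * 3 * 13 * 1801 * 19121))" # 21488638704
```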
> All attempts to assemble the array continued to result in the "all
> spare" condition (output above). Thinking that the metadata had been
> corrupted somehow,
Apparently without ever trying 'mdadm --detail /dev/md6' or
'mdadm --examine /dev/sd...' as per:
https://raid.wiki.kernel.org/index.php/RAID_Recovery
> I set out to recreate the array.
Quite "brave":
https://raid.wiki.kernel.org/index.php/RAID_Recovery
«Restore array by recreating (after multiple device failure)
Recreating should be considered a *last* resort, only to be
used when everything else fails.
People getting this wrong is one of the primary reasons people
lose data. It is very commonly used way too early in the fault
finding process. You have been warned!»
> The following is the dev_number fields from the metadata,
> before I attempted to recreate the array: for i in /dev/sd?1;
> do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4
> skip=4256 | od -D | head -n1; done: I used the following to
> extract the index position of each device on a device I
> suspected wasn't corrupted (for the record, they all returned
> the same data): [ ... ]
It is very "astute" indeed to use 'dd' instead of 'mdadm
--examine'. For example, it "encourages" people who might want
to help to spend some extra time checking your offsets, which
"teaches" them.
[ ... ]
> Number Major Minor RaidDevice State
> 12 8 17 0 active sync /dev/sdb1
> 3 8 65 1 active sync /dev/sde1
> 2 8 49 2 active sync /dev/sdd1
> 8 8 81 3 active sync /dev/sdf1
> 6 8 129 4 active sync /dev/sdi1
> 7 8 177 5 active sync /dev/sdl1
> 6 0 0 6 removed
> 10 8 145 7 active sync /dev/sdj1
> 11 8 161 8 active sync /dev/sdk1
> 13 8 113 9 active sync /dev/sdh1
> 14 8 97 10 active sync /dev/sdg1
> The dev_numbers and index position information in conjunction
> with the historic data (directly above) seemed to indicate
> that the proper recreation order and command would be the
> following:
> mdadm --create /dev/md6 --assume-clean --level=6
> --raid-devices=11 --metadata=1.2 --chunk=256 /dev/sdb1
> /dev/sde1 /dev/sdd1 /dev/sdf1 /dev/sdi1 /dev/sdl1 /dev/sdc1
> /dev/sdj1 /dev/sdk1 /dev/sdh1 /dev/sdg1
The main consequence of the above is that the original MD member
metadata blocks are no longer available unless something like
this has been done:
https://raid.wiki.kernel.org/index.php/RAID_Recovery
«Preserving RAID superblock information
One of the most useful things to do first, when trying to
recover a broken RAID array, is to preserve the information
reported in the RAID superblocks on each device at the time
the array went down (and before you start trying to recreate
the array). Something like
mdadm --examine /dev/sd[bcdefghijklmn]1 >> raid.status»
If you went to the lengths to write 'dd' expressions, you might
as well have saved the output of '--examine'. Perhaps you did,
but if you did not attach that output to your request for help
it would be rather "stunning".
[ ... ]
> Is the "mdadm --create" operation that I issued, incorrect?
> Have I done anything in error?
There is something strange: what you report being the output of
'--detail' from July:
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)
and the output of '--detail' for the re-created:
Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)
Neither number matches; both are *slightly* different. In
particular, it is rather strange that the "Used Dev Size" is
different. How is that possible? Have the disks shrunk a little
in the meantime? :-)
It is intriguing that the difference between 1953512192 and
1953382144 is 1024*127KiB or 1024*254 sectors.
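Those figures can be reproduced with shell arithmetic, as a sketch. For RAID 6, usable capacity is (raid_devices - 2) x the per-member "Used Dev Size", and both values are in KiB in mdadm -D output. Both --detail outputs are consistent with that rule, so the whole array-size change comes from the smaller per-member size.

```sh
#!/bin/sh
# RAID 6 stores two members' worth of parity per stripe, so
# array size = (raid_devices - 2) * per-member Used Dev Size (KiB).
raid_devices=11
old_used_kib=1953512192   # July --detail
new_used_kib=1953382144   # after the --create
echo "old array size: $(( (raid_devices - 2) * old_used_kib )) KiB"   # 17581609728
echo "new array size: $(( (raid_devices - 2) * new_used_kib )) KiB"   # 17580439296
diff_kib=$(( old_used_kib - new_used_kib ))
# 130048 KiB = 127 MiB = 260096 sectors (1024*254 sectors)
echo "per-member difference: $diff_kib KiB = $(( diff_kib / 1024 )) MiB = $(( diff_kib * 2 )) sectors"
```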
Also, I have noticed that the MD set is composed of disks of 3
different models (ST2000DL003-9VT1, ST2000DM001-1CH1,
ST32000542AS)...
> Is my data gone? Any and all insight are extremly welcomed and
> appreciated.
Whether your data is gone depends on what kind of hardware issue
you have had, and on the consequences of the "brave" '--create'
above, but also on how the MD set was set up, e.g. with members of
slightly different sizes. The inconsistencies in the reported
numbers are "confusing".
* Re: RAID 6 (containing LUKS dm-crypt) recovery help.
From: xar @ 2014-11-07 11:40 UTC
To: Peter Grandi, Linux RAID
Hello Peter,
Thank you very much for your thorough and responsive reply. I will do my
best to clarify where possible.
>> [ ... ] The server experienced some sort of hardware event
>> that resulted in a mandatory restart of the server.
> Details would be helpful: because if some problem happens the
> standard advice is "reload from backups". If you want to
> shortcut that to mostly-recovery context matters to figuring out
> how and how safely.
Regarding the nature of the hardware event, unfortunately details are in
short supply: the server became unresponsive over the console when
attempting to connect via SSH, prompting a restart of the server. I
don't believe there was evidence of a power drop or loss. No server or
kernel logs are available for review.
>> [ ... ] completed the restart, the array looked like this,
>> "all spares":
>> md6 :
> What happened to the other MD sets on the same server, if any?
> Any damage? Because if those suffered no damage, there is the
> possibility that the disk rack backplane holding the members of
> 'md6' got damaged, or the specific host adapter; and that the MD
> set content is entirely undamaged and the funny stuff being read
> is a transmission problem.
"md6" is the only MD set on the server; it is named that way because
it is RAID level 6. Sorry for any confusion.
>> The mdadm array has the following characteristics: RAID level:
>> 6 Chunk size: 256k Version: 1.2 Number of devices: 11
> How do you know? Is this part of your records or from actual
> output of 'mdadm --examine'?
>> All attempts to assemble the array continued to result in the "all
>> spare" condition (output above). Thinking that the metadata had been
>> corrupted somehow,
> Apparently without ever trying 'mdadm --detail /dev/md6' or
> 'mdadm --examine /dev/sd...' as per:
> If you went to the lengths to write 'dd' expressions, you might
> as well have saved the output of '--examine'. Perhaps you did,
> but if you did not attach that output to your request for help
> it would be rather "stunning".
Yes, I saved the -E/--examine information, "just in case". :-)
Before performing a re-create of the array, I did, in fact, print the
contents (-E, --examine) of the metadata stored on each device:
# cat mdadm.e.bak
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 12a56302:5b436263:1b841be2:fccd07ed
Update Time : Fri Nov 7 00:37:26 2014
Checksum : d7063845 - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 0
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 0416e499:16488db2:5473119d:1a0c8141
Update Time : Sun Nov 2 12:24:42 2014
Checksum : cd22e98b - correct
Events : 667122
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 6
Array State : A.A.AAAAAA. ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 56f35811:d62afc50:a893a3af:10f01367
Update Time : Fri Nov 7 00:37:26 2014
Checksum : 1b299f9b - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 2
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 63f4d908:16f38b7f:ebd9a1d7:0f186e56
Update Time : Sun Nov 2 10:23:32 2014
Checksum : 5896c904 - correct
Events : 667118
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 1
Array State : AAAAAAAAAA. ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ee4ac68b:2152463c:b0d72a12:4da24489
Update Time : Sun Nov 2 10:23:32 2014
Checksum : 59d06a2 - correct
Events : 667118
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 3
Array State : AAAAAAAAAA. ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 72ee0230:51b42c7a:3327c930:302be14e
Update Time : Sun Nov 2 08:35:01 2014
Checksum : cbfacb4a - correct
Events : 667100
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 10
Array State : .AAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 429cfff7:ecadc967:40f73261:bef9656e
Update Time : Fri Nov 7 00:37:26 2014
Checksum : d17f38ee - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 9
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6dea792a:f1117c0c:ac16951c:a8b61783
Update Time : Fri Nov 7 00:37:26 2014
Checksum : 78bfc76c - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 4
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 4b37d852:2236e8e6:15c52c77:4214f7de
Update Time : Fri Nov 7 00:37:26 2014
Checksum : 32014484 - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 7
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : aa149905:9cd207c4:4bb4c244:3f502348
Update Time : Fri Nov 7 00:37:26 2014
Checksum : f8a3e98f - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 8
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Name : server:6 (local to host server)
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Raid Devices : 11
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 59a2393b:27209cc2:1f6fa576:5ed6e2a7
Update Time : Fri Nov 7 00:37:26 2014
Checksum : be7b7d99 - correct
Events : 667126
Layout : left-symmetric
Chunk Size : 256K
Device Role : Active device 5
Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
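The Array State string in each block above is indexed by device role, and mdadm prints its legend beside it. The minimal Python sketch below decodes that string. The function name decode_array_state is hypothetical, and the sketch handles only the 'A' and '.' flags that appear in this output.

```python
def decode_array_state(state):
    """Split an mdadm --examine 'Array State' string into active and missing roles.

    Character i describes device role i; per the legend mdadm prints next to
    it, 'A' means active and '.' means missing.
    """
    active, missing = [], []
    for role, flag in enumerate(state):
        if flag == "A":
            active.append(role)
        elif flag == ".":
            missing.append(role)
        else:
            # Flags not shown in this output are rejected rather than guessed at.
            raise ValueError(f"unexpected flag {flag!r} at role {role}")
    return active, missing


if __name__ == "__main__":
    # Array State recorded by the members with Events 667126 (Fri Nov 7).
    active, missing = decode_array_state("A.A.AA.AAA.")
    print("active roles: ", active)
    print("missing roles:", missing)
    # RAID 6 with 11 raid devices needs at least 11 - 2 = 9 members present.
    print("at least 9 active:", len(active) >= 11 - 2)
```

For the members last updated on Nov 7 (Events 667126), this decoding lists roles 1, 3, 6 and 10 as missing. RAID 6 can run with at most two members missing. The decoding reflects only what those superblocks recorded at their last update, not the current condition of the disks.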
> There is something strange: what you report being the output of
> '--detail' from July:
>
> Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
> Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)
>
> and the output of '--detail' for the re-created:
>
> Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
> Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)
>
> Both numbers don't match. They are *slightly* different. In
> particular it is rather strange that the "Used Dev Size" is
> different. How is that possible? Have the disks shrunk a little
> in the meantime?
Peter, that is an excellent observation! Indeed, the -E/--examine data
above confirms that some disks have a data offset of 272 sectors, while
most others have a data offset of 2048 sectors, for example:
/dev/sdj1:
Data Offset : 272 sectors
Super Offset : 8 sectors
/dev/sdk1:
Data Offset : 2048 sectors
Super Offset : 8 sectors
The current breakdown:
# grep "272 sectors" mdadm.e.bak | wc -l
2
# grep "2048 sectors" mdadm.e.bak | wc -l
9
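The grep counts above give totals but not device names. The sketch below ties each Data Offset to its device. It assumes that the saved file (mdadm.e.bak) holds --examine blocks in the format shown earlier. Both function names are hypothetical.

```python
import re
from collections import defaultdict


def offsets_by_device(examine_text):
    """Map each '/dev/...:' block of saved --examine output to its Data Offset in sectors."""
    result = {}
    device = None
    for line in examine_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"(/dev/\S+):", stripped)
        if header:
            device = header.group(1)
            continue
        offset = re.match(r"Data Offset\s*:\s*(\d+) sectors", stripped)
        if offset and device is not None:
            result[device] = int(offset.group(1))
    return result


def group_by_offset(offsets):
    """Invert {device: offset} into {offset: [devices]}, sorted for stable output."""
    groups = defaultdict(list)
    for dev, off in offsets.items():
        groups[off].append(dev)
    return {off: sorted(devs) for off, devs in sorted(groups.items())}


if __name__ == "__main__":
    # Two excerpts copied from the --examine output above; in practice,
    # pass the full contents of the saved file instead.
    sample = """/dev/sdj1:
          Data Offset : 272 sectors
         Super Offset : 8 sectors
/dev/sdk1:
          Data Offset : 2048 sectors
         Super Offset : 8 sectors
"""
    for off, devs in group_by_offset(offsets_by_device(sample)).items():
        print(f"{off} sectors: {' '.join(devs)}")
```

When run on the full mdadm.e.bak, the output should show the two 272-sector members on one line and the other nine members on the other.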
Therefore, based on the backup -E/--examine data, two of the 11 disks
have a data offset of 272 sectors, while the remaining nine use 2048.
Could this explain the discrepancy you observed?
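Both sets of --detail figures are internally consistent with RAID 6 on 11 members, where nine members' worth of space holds data. The whole difference therefore lies in the per-member size. The minimal sketch below shows that arithmetic, using sizes in KiB as --detail reports them.

```python
RAID_DEVICES = 11
PARITY_DEVICES = 2  # RAID 6 stores two parity chunks per stripe


def raid6_array_kib(used_dev_kib, raid_devices=RAID_DEVICES):
    """Usable RAID 6 capacity: each member gives used_dev_kib, two members' worth is parity."""
    return (raid_devices - PARITY_DEVICES) * used_dev_kib


if __name__ == "__main__":
    # (Used Dev Size, Array Size) pairs in KiB, quoted from the two --detail outputs.
    detail = {
        "July (original)": (1953512192, 17581609728),
        "re-created": (1953382144, 17580439296),
    }
    for label, (used, array) in detail.items():
        print(label, "consistent:", raid6_array_kib(used) == array)
    # Per-member shortfall of the re-created array, in KiB and in 512-byte sectors.
    shortfall_kib = detail["July (original)"][0] - detail["re-created"][0]
    print("per-member shortfall:", shortfall_kib, "KiB =", shortfall_kib * 2, "sectors")
```

The shortfall comes to 130048 KiB (127 MiB) per member. If the partitions did not change size, one plausible reading is that the re-created superblocks reserve that much more space per member, for example through a larger data offset. The outputs above do not state this directly.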
For the record, every disk is partitioned with a GUID Partition Table
(GPT), and all partitions are identical in size (measured in sectors),
regardless of the Seagate HDD model.
Here is a sample of the partition data:
# parted /dev/sdb unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdc unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdd unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sde unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdf unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdg unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdh unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s ntfs primary raid
# parted /dev/sdi unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdj unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdk unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
# parted /dev/sdl unit s print | grep -A1 Number
Number Start End Size File system Name Flags
1 2048s 3907028991s 3907026944s primary raid
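All partitions are 3907026944 sectors, so the Avail Dev Size values in the --examine output follow directly from each member's data offset. The small sketch below performs that subtraction. It assumes that the data area runs from the data offset to the end of the partition, which is the case for 1.2 superblocks because they sit near the start of the device.

```python
PARTITION_SECTORS = 3907026944  # size of every member partition, from parted
USED_DEV_SECTORS = 3907024384   # Used Dev Size common to all members, from --examine


def avail_sectors(partition_sectors, data_offset_sectors):
    """Space left for array data when it starts data_offset_sectors into the partition."""
    # With 1.2 metadata the superblock sits near the start (Super Offset : 8 sectors),
    # so the data area runs from the data offset to the end of the partition.
    return partition_sectors - data_offset_sectors


if __name__ == "__main__":
    for offset in (2048, 272):
        avail = avail_sectors(PARTITION_SECTORS, offset)
        print(f"offset {offset}: avail {avail}, fits Used Dev Size: {avail >= USED_DEV_SECTORS}")
```

This reproduces two values from the --examine output: an Avail Dev Size of 3907024896 sectors for the 2048-sector members and 3907026672 sectors for /dev/sdj1. Both values exceed the common Used Dev Size of 3907024384 sectors.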
My only explanation is that this offset discrepancy may have something
to do with the age of the array, which was originally created in 2011.
This server was originally running Ubuntu 10.04 LTS (I believe) before
eventually being upgraded to 12.04 LTS; it has been running healthily
on 12.04 LTS for several years without issues.
If memory serves, the older version of mdadm that shipped with 10.04 LTS
did a number of things differently regarding the location of the
superblock(s), offset(s), etc., but I can't say for sure. Did older mdadm
builds on 12.04 LTS ever use offsets of 272, rather than 2048?
Perhaps Neil could comment? :-)
I do hope that supplying the -E/--examine information will be useful to
you all. What's the next step?
Thank you for all your efforts and for your keen eyes.
-xar
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RAID 6 (containing LUKS dm-crypt) recovery help.
From: Peter Grandi @ 2014-11-07 14:22 UTC
To: Linux RAID
[ ... ]
>> But assuming the above is somewhat reliable there is an
>> "interesting" situation: in "21488638704 blocks" the number
>> 21,488,638,704 is not a whole multiple of 9:
>>
>> $ factor 21488638704
>> 21488638704: 2 2 2 2 3 13 1801 19121
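For reference, the 21488638704-block total in the inactive /proc/mdstat line equals the sum of all eleven members' Avail Dev Size, converted from 512-byte sectors to 1 KiB blocks. The sketch below performs that check. It assumes that the second 272-sector member reports the same Avail Dev Size as /dev/sdj1, since its partition is the same size.

```python
# Avail Dev Size in 512-byte sectors -> number of members with that size:
# nine members with a 2048-sector data offset, two with a 272-sector offset.
AVAIL_SECTORS = {3907024896: 9, 3907026672: 2}


def inactive_total_kib(avail_counts):
    """Sum of member sizes in 1 KiB blocks (two 512-byte sectors per block)."""
    return sum(sectors // 2 * count for sectors, count in avail_counts.items())


if __name__ == "__main__":
    total = inactive_total_kib(AVAIL_SECTORS)
    print(total, "divisible by 9:", total % 9 == 0)
```

That sum counts all eleven members, parity included, and mixes two member sizes. A whole multiple of 9 is therefore not expected for this figure.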
[ ... ]
>> If you went to the lengths to write 'dd' expressions, you
>> might as well have saved the output of '--examine'. Perhaps
>> you did, but if you did not attach that output to your
>> request for help it would be rather "stunning".
[ ... ]
>> Both numbers don't match. They are *slightly* different. In
>> particular it is rather strange that the "Used Dev Size" is
>> different. How is that possible? Have the disks shrunk a
>> little in the meantime? :-)
> Yes, I saved the -E/--examine information, "just in case". :-)
But without sending it with your request for help: rather
"stunning".
> [ ... ] some disks have a 272 offset, while most others have a
> 2048 offset, [ ... ] Did older mdadm builds on 12.04 LTS ever
> use offsets of 272, rather than 2048?
Given what '--examine' reports, exactly where this came from does
not matter much. What matters is this note (in a well-written
page!):
https://raid.wiki.kernel.org/index.php/RAID_Recovery
«Recreating an array
When an array is created, superblocks are written to the drive
and according to the defaults of mdadm, a certain area of the
drive is now considered "data area".
The data areas (that might or might not be correct) are not
written to, *provided* the array is created in degraded mode;
that is with a 'missing' device.
If the wrong superblock version is chosen, wrong data offset
(internal default value which has changed over time in mdadm),
chunk size (also value that has changed over time), then the
data area will not match what was previously on the drives.
The md superblock might have overwritten part of your data.
Use with caution!»
You can try to recreate the MD set superblocks specifying the
right data offset and member size for each member, as given by
the '--examine' outputs. Google knows how...
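A re-creation that reuses the original geometry and per-member data offsets could be sketched as below. This is a hedged illustration, not a tested recipe. It only builds the argument list and never runs mdadm. It assumes a newer mdadm (3.3 or later) whose man page documents --data-offset=variable with a per-device 'device:offset' suffix in KiB. The mdadm 3.2.5 on this Ubuntu 12.04 server may not support that option, so check `man mdadm` for the version actually used. The function name build_create_cmd is hypothetical. Roles left out of the mapping become 'missing', so the array is created degraded, as the wiki page recommends.

```python
# Geometry from the --examine output above (assumed unchanged since creation).
RAID_DEVICES = 11
CHUNK_KIB = 256
USED_DEV_KIB = 3907024384 // 2  # Used Dev Size is reported in 512-byte sectors


def build_create_cmd(md_device, members):
    """Return an mdadm --create argv reusing each member's role and data offset.

    members maps role -> (device, data_offset_sectors). Roles not present are
    passed as 'missing', so the array is created degraded.
    """
    absent = [role for role in range(RAID_DEVICES) if role not in members]
    if len(absent) > 2:
        # RAID 6 cannot start with more than two members absent.
        raise ValueError(f"too many missing roles for RAID 6: {absent}")
    argv = [
        "mdadm", "--create", md_device, "--assume-clean",
        "--metadata=1.2", "--level=6", f"--raid-devices={RAID_DEVICES}",
        f"--chunk={CHUNK_KIB}", "--layout=left-symmetric",
        f"--size={USED_DEV_KIB}", "--data-offset=variable",
    ]
    for role in range(RAID_DEVICES):
        if role not in members:
            argv.append("missing")
            continue
        device, offset_sectors = members[role]
        if offset_sectors % 2:
            raise ValueError(f"{device}: offset {offset_sectors} is not a whole KiB")
        # Assumed syntax: offset appended after a colon, in KiB (mdadm's default unit).
        argv.append(f"{device}:{offset_sectors // 2}")
    return argv
```

Fill members from every --examine block, for example {7: ('/dev/sdj1', 272), 8: ('/dev/sdk1', 2048), ...}. Print the result with shlex.join() and check it against the saved --examine output before running anything.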
* Re: RAID 6 (containing LUKS dm-crypt) recovery help.
From: Peter Grandi @ 2014-11-12 11:01 UTC
To: Linux RAID
> You can try to recreate the MD set superblocks specifying the
> right data offset and member size for each member, as given by
> the '--examine' outputs. Google knows how...
Just in case, this thread describes a very similar situation:
http://www.spinics.net/lists/raid/msg39353.html