From: Phil Turmel <philip@turmel.org>
To: "\"Großkreutz, Julian\"" <Julian.Grosskreutz@med.uni-jena.de>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Cc: "neilb@suse.de" <neilb@suse.de>
Subject: Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock
Date: Sat, 11 Jan 2014 12:47:33 -0500
Message-ID: <52D183B5.3060006@turmel.org>
In-Reply-To: <1389422546.11328.15.camel@achilles.aeskuladis.de>
Hi Julian,
Very good report. I think we can help.
On 01/11/2014 01:42 AM, Großkreutz, Julian wrote:
> Dear all, dear Neil (thanks for pointing me to this list),
>
> I am in desperate need of help. mdadm is fantastic work, and I have
> relied on mdadm for years to run very stable server systems, never had
> major problems I could not solve.
>
> This time it's different:
>
> On a Centos 6.x (can't remember) initially in 2012:
>
> parted to create GPT partitions on 5 Seagate drives 3TB each
>
> Model: ATA ST3000DM001-9YN1 (scsi)
> Disk /dev/sda: 5860533168s # sd[bcde] identical
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number Start End Size File system Name Flags
> 1 2048s 1953791s 1951744s ext4 boot
> 2 1955840s 5860532223s 5858576384s primary raid
Ok.
Please also show the partition tables for the /dev/sd[fgh].
> I used an unknown mdadm version including unknown offset parameters for
> 4k alignment to create
>
> /dev/sd[abcde]1 as /dev/md0 raid 1 for booting (1 GB)
> /dev/sd[abcde]2 as /dev/md1 raid 6 for data (9 TB) lvm physical drive
>
> Later added 3 more 3T identical Seagate drives with identical partition
> layout, but later firmware.
>
> Using likely a different newer version of mdadm I expanded RAID 6 by 2
> drives and added 1 spare.
>
> /dev/md1 was at 15 TB gross, 13 TB usable, expanded pv
>
> Ran fine
Ok. Your report below has some evidence suggesting you created the
larger array from scratch instead of using --grow. Do you remember?
> Then I moved the 8 disks to a new server with an hba and backplane,
> array did not start because mdadm did not find the superblocks on the
> original 5 devices /dev/sd[abcde]2. Moving the disks back to the old
> server the error did not vanish. Using a centos 6.3 livecd, I got the
> following:
>
> [root@livecd ~]# mdadm -Evvvvs /dev/sd[abcdefgh]2
> mdadm: No md superblock detected on /dev/sda2.
> mdadm: No md superblock detected on /dev/sdb2.
> mdadm: No md superblock detected on /dev/sdc2.
> mdadm: No md superblock detected on /dev/sdd2.
> mdadm: No md superblock detected on /dev/sde2.
>
> /dev/sdf2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
Note this creation time... would have been 2012 if you had used --grow.
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
This used dev size is very odd. The unused space after the data area is
1155584 sectors (>500MiB).
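The numbers in the -E output can be cross-checked with plain shell
arithmetic. A minimal sketch, using only values quoted in this report
(the variable names are made up for illustration, and all sizes are
assumed to be in 512-byte sectors):

```shell
# All values are in 512-byte sectors, copied from the report.
part_size=5858576384     # size of partition 2 from the parted output
data_offset=262144       # Data Offset from mdadm -E
avail=5858314240         # Avail Dev Size
used=5857158656          # Used Dev Size
raid_devices=7           # Raid Devices (raid6: two devices' worth is parity)

# Avail Dev Size should equal the partition size minus the data offset.
echo "avail check: $((part_size - data_offset)) (reported $avail)"

# Space left unused after the data area, in sectors and whole MiB.
unused=$((avail - used))
echo "unused: $unused sectors, $((unused * 512 / 1048576)) MiB"

# Array Size should equal the data-bearing device count times Used Dev Size.
array_size=$(( (raid_devices - 2) * used ))
echo "array size: $array_size sectors"
```

If the three results agree with the reported values, the superblocks on
sd[fgh]2 are at least internally consistent.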
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : ee921c43 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 5
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
> /dev/sdg2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : a1e1e51b:d8912985:e51207a9:1d718292
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : 4ef01fe9 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 6
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
> /dev/sdh2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : a1330e97 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : spare
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
>
> I suspect that the superblock of the original 5 devices is at a
> different location, possibly because they were created with a different
> mdadm version, i.e. at the end of the partitions. Booting the drives
> with the hba in IT (non-raid) mode on the new server may have introduced
> an initialization on the first five drives at the end of the partitions
> because I can hexdump something with "EFI PART" in the last 64 kb in all
> 8 partitions used for the raid 6, which may not have affected the 3
> added drives which show metadata 1.2.
The "EFI PART" is part of the backup copy of the GPT. All the drives in
a working array will have the same metadata version (superblock
location) even if the data offsets are different.
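To make the link between metadata version and superblock location
concrete, here is a rough sketch of where each format would put its
superblock on a partition of this size. The formulas are assumptions
based on my understanding of the md on-disk layout (0.90: the last
64 KiB-aligned 64 KiB block; 1.0: at least 8 KiB from the end,
4 KiB-aligned; 1.1: at the start; 1.2: 4 KiB from the start), so
verify them before relying on the results. sb_offset is a hypothetical
helper name.

```shell
# sb_offset VERSION SIZE_IN_SECTORS
# Print the expected superblock position, in 512-byte sectors from the
# start of the device, for the given metadata version (assumed formulas).
sb_offset() {
    case "$1" in
        0.90) echo $(( ($2 & ~127) - 128 )) ;;  # last 64 KiB-aligned 64 KiB block
        1.0)  echo $(( ($2 - 16) & ~7 )) ;;     # >= 8 KiB from end, 4 KiB-aligned
        1.1)  echo 0 ;;                         # very start of the device
        1.2)  echo 8 ;;                         # 4 KiB from the start
        *)    echo "unknown metadata version: $1" >&2; return 1 ;;
    esac
}

size=5858576384   # partition 2 size in sectors, from the parted output
for v in 0.90 1.0 1.1 1.2; do
    echo "$v: sector $(sb_offset "$v" "$size")"
done
```

Each candidate sector can then be inspected read-only, for example with
dd if=/dev/sda2 skip=SECTOR count=8 | hexdump -C.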
I would suggest hexdumping entire devices looking for the MD superblock
magic value, which will always be at the start of a 4k-aligned block.
Show (will take a long time, even with the big block size):
for x in /dev/sd[a-e]2 ; do echo -e "\nDevice $x" ; dd if=$x bs=1M |hexdump -C |grep "000  fc 4e 2b a9" ; done
For any candidates found, hexdump the whole 4k block for us.
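If hexdump is not available on the live CD, the same search can be
approximated with od from coreutils. A minimal sketch, where
scan_md_magic is a hypothetical helper name. It relies on od printing
hex offsets followed by a single space, so a 4k-aligned block shows up
as an offset ending in 000:

```shell
# scan_md_magic FILE
# List the 16-byte lines that sit at a 4 KiB-aligned offset and start with
# the md magic a92b4efc as stored on disk (little-endian: fc 4e 2b a9).
scan_md_magic() {
    # -A x: hex offsets, -t x1: one hex byte per column.
    # od only reads its input; nothing is written to the device.
    od -A x -t x1 "$1" | grep '^[0-9a-f]*000 fc 4e 2b a9'
}

# Example for the five original members (slow on 3 TB devices):
# for x in /dev/sd[a-e]2 ; do echo "Device $x" ; scan_md_magic "$x" ; done
```

The offset printed at the start of each matching line is the byte
position of the candidate superblock, in hex.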
> If any of You can help me sort this I would greatly appreciate it. I
> guess I need the mdadm version where I can set the data offset
> differently for each device, but it doesn't compile with an error in
> sha1.c:
>
> sha1.h:29:22: Fehler: ansidecl.h: Datei oder Verzeichnis nicht gefunden
> (didn't find ansidecl.h, error in German)
You probably need some *-dev packages. I don't use the RHEL platform,
so I'm not sure what you'd need. In the ubuntu world, it'd be the
"build-essential" meta-package.
> What would be the best way to proceed? There is critical data on this
> raid, not fully backed up.
>
> (UPD'T)
>
> Thanks for getting back.
>
> Yes, it's bad, I know, also tweaking without keeping exact records of
> versions and offsets.
>
> I am, however, rather sure that nothing was written to the disks when I
> plugged them into the NEW server, unless starting up a live cd causes an
> automatic assemble attempt with an update to the superblocks. That I
> cannot exclude.
>
> What I did so far w/o writing to the disks
>
> get non-00 data at the beginning of sda2:
>
> dd if=/dev/sda skip=1955840 bs=512 count=10 | hexdump -C | grep [^00]
FWIW, you could have combined "if=/dev/sda skip=1955840" into
"if=/dev/sda2" . . . :-)
> gives me
>
> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00001000 1e b5 54 51 20 4c 56 4d 32 20 78 5b 35 41 25 72 |..TQ LVM2 x[5A%r|
> 00001010 30 4e 2a 3e 01 00 00 00 00 10 00 00 00 00 00 00 |0N*>............|
> 00001020 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> 00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00001200 76 67 5f 6e 65 64 69 67 73 30 32 20 7b 0a 69 64 |vg_nedigs02 {.id|
> 00001210 20 3d 20 22 32 4c 62 48 71 64 2d 72 67 42 74 2d | = "2LbHqd-rgBt-|
> 00001220 45 4a 75 31 2d 32 52 36 31 2d 41 35 7a 74 2d 6e |EJu1-2R61-A5zt-n|
> 00001230 49 58 53 2d 66 79 4f 36 33 73 22 0a 73 65 71 6e |IXS-fyO63s".seqn|
> 00001240 6f 20 3d 20 37 0a 66 6f 72 6d 61 74 20 3d 20 22 |o = 7.format = "|
> 00001250 6c 76 6d 32 22 20 23 20 69 6e 66 6f 72 6d 61 74 |lvm2" # informat|
> (cont'd)
This implies that /dev/sda2 is the first device in a raid5/6 that uses
metadata 0.90 or 1.0. You've found the LVM PV signature, which starts at
4k into a PV. Theoretically, this could be a stray, abandoned signature
from the original array, with the real LVM signature at the 262144
offset. Show:
dd if=/dev/sda2 skip=262144 count=16 |hexdump -C
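The comparison can also be scripted. A minimal sketch, assuming the
16-byte text " LVM2 x[5A%r0N*>" seen at 0x1004 in the dump of sda2 is
the marker to look for. lvm_mda_at is a hypothetical helper name, and
it only reads from the device:

```shell
# lvm_mda_at FILE START_SECTOR
# Succeed if the text " LVM2 x[5A%r0N*>" appears 4 KiB (8 sectors) past
# START_SECTOR, after a 4-byte leading field, as in the dump of sda2.
lvm_mda_at() {
    sig=$(dd if="$1" bs=512 skip=$(( $2 + 8 )) count=1 2>/dev/null |
          dd bs=1 skip=4 count=16 2>/dev/null | tr -d '\0')
    [ "$sig" = ' LVM2 x[5A%r0N*>' ]
}

# Example: test both the start of the partition and the 1.2 data offset.
# lvm_mda_at /dev/sda2 0      && echo "signature at start of partition"
# lvm_mda_at /dev/sda2 262144 && echo "signature at data offset 262144"
```

A hit at 0 but not at 262144 would be consistent with end-of-device
metadata. A hit at both would need a closer look at which copy is live.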
>
> but on /dev/sdb
>
> 00000000 5f 80 00 00 5f 80 01 00 5f 80 02 00 5f 80 03 00 |_..._..._..._...|
> 00000010 5f 80 04 00 5f 80 0c 00 5f 80 0d 00 00 00 00 00 |_..._..._.......|
> 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00001000 60 80 00 00 60 80 01 00 60 80 02 00 60 80 03 00 |`...`...`...`...|
> 00001010 60 80 04 00 60 80 0c 00 60 80 0d 00 00 00 00 00 |`...`...`.......|
> 00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00001400
>
> so my initial guess that the data may start at 00001000 did not pan out.
No, but with parity raid scattering data amongst the participating
devices, the report on /dev/sdb2 is expected.
> Does anybody have an idea of how to reliably identify an mdadm
> superblock in a hexdump of the drive ?
Above.
> And second, have I got my numbers right ? In parted I see the block
> count, and when I multiply 512 (not 4096!) with the total count I get 3
> TB, so I think I have to use bs=512 in dd to get the partition
> boundaries correct.
dd uses bs=512 as the default. And it can access the partitions directly.
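A small sketch of that point: with no bs= given, dd counts skip= and
count= in 512-byte units, which matches the sector numbers parted
prints. read_sectors is a hypothetical helper name:

```shell
# read_sectors FILE START COUNT
# Write COUNT 512-byte sectors of FILE, starting at sector START, to stdout.
read_sectors() {
    dd if="$1" skip="$2" count="$3" 2>/dev/null   # bs defaults to 512
}

# Byte offset of partition 2 on the whole disk, from the parted start sector.
part_start=1955840
part_start_bytes=$((part_start * 512))
echo "partition 2 starts at byte $part_start_bytes"

# These two reads should return the same ten sectors:
# read_sectors /dev/sda  1955840 10 | hexdump -C
# read_sectors /dev/sda2 0       10 | hexdump -C
```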
> As for the last state: one drive was set faulty, apparently, but the
> spare had not been integrated. I may have gotten caught in a bug
> described by Neil Brown, where on shutdown disks were wrongly reported,
> and subsequently superblock information was overwritten.
Possible. If so, you may not find any superblocks with the grep above.
> I don't have NAS/SAN storage space to make identical copies of 5x3 TB,
> but maybe I should buy 5 more disks and do a dd mirror so I have a
> backup of the current state.
We can do some more non-destructive investigation first.
Regards,
Phil