From: Phil Turmel <philip@turmel.org>
To: "\"Großkreutz, Julian\"" <Julian.Grosskreutz@med.uni-jena.de>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Cc: "neilb@suse.de" <neilb@suse.de>
Subject: Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock
Date: Sat, 11 Jan 2014 12:47:33 -0500 [thread overview]
Message-ID: <52D183B5.3060006@turmel.org>
In-Reply-To: <1389422546.11328.15.camel@achilles.aeskuladis.de>
Hi Julian,
Very good report. I think we can help.
On 01/11/2014 01:42 AM, Großkreutz, Julian wrote:
> Dear all, dear Neil (thanks for pointing me to this list),
>
> I am in desperate need of help. mdadm is fantastic work, and I have
> relied on mdadm for years to run very stable server systems, never had
> major problems I could not solve.
>
> This time it's different:
>
> On a Centos 6.x (can't remember) initially in 2012:
>
> parted to create GPT partitions on 5 Seagate drives 3TB each
>
> Model: ATA ST3000DM001-9YN1 (scsi)
> Disk /dev/sda: 5860533168s # sd[bcde] identical
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number Start End Size File system Name Flags
> 1 2048s 1953791s 1951744s ext4 boot
> 2 1955840s 5860532223s 5858576384s primary raid
Ok.
Please also show the partition tables for /dev/sd[fgh].
> I used an unknown mdadm version including unknown offset parameters for
> 4k alignment to create
>
> /dev/sd[abcde]1 as /dev/md0 raid 1 for booting (1 GB)
> /dev/sd[abcde]2 as /dev/md1 raid 6 for data (9 TB) lvm physical drive
>
> Later added 3 more 3T identical Seagate drives with identical partition
> layout, but later firmware.
>
> Using likely a different newer version of mdadm I expanded RAID 6 by 2
> drives and added 1 spare.
>
> /dev/md1 was at 15 TB gross, 13 TB usable, expanded pv
>
> Ran fine
Ok. The output below suggests you created the larger array from scratch
instead of using --grow. Do you remember?
> Then I moved the 8 disks to a new server with an hba and backplane,
> array did not start because mdadm did not find the superblocks on the
> original 5 devices /dev/sd[abcde]2. Moving the disks back to the old
> server the error did not vanish. Using a centos 6.3 livecd, I got the
> following:
>
> [root@livecd ~]# mdadm -Evvvvs /dev/sd[abcdefgh]2
> mdadm: No md superblock detected on /dev/sda2.
> mdadm: No md superblock detected on /dev/sdb2.
> mdadm: No md superblock detected on /dev/sdc2.
> mdadm: No md superblock detected on /dev/sdd2.
> mdadm: No md superblock detected on /dev/sde2.
>
> /dev/sdf2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
Note this creation time... would have been 2012 if you had used --grow.
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
This used dev size is very odd. The unused space after the data area is
5858314240 - 5857158656 = 1155584 sectors (about 564 MiB).
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : ee921c43 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 5
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
> /dev/sdg2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : a1e1e51b:d8912985:e51207a9:1d718292
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : 4ef01fe9 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 6
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
> /dev/sdh2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
> Name : 1
> Creation Time : Wed Jul 31 18:24:38 2013
> Raid Level : raid6
> Raid Devices : 7
>
> Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
> Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
> Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
> Data Offset : 262144 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1
>
> Update Time : Mon Dec 16 01:16:26 2013
> Checksum : a1330e97 - correct
> Events : 327
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : spare
> Array State : A.AAAAA ('A' == active, '.' == missing)
>
>
> I suspect that the superblock of the original 5 devices is at a
> different location, possibly because they were created with a different
> mdadm version, i.e. at the end of the partitions. Booting the drives
> with the HBA in IT (non-raid) mode on the new server may have introduced
> an initialization on the first five drives at the end of the partitions
> because I can hexdump something with "EFI PART" in the last 64 kb in all
> 8 partitions used for the raid 6, which may not have affected the 3
> added drives which show metadata 1.2.
The "EFI PART" is part of the backup copy of the GPT. All the drives in
a working array will have the same metadata version (superblock
location) even if the data offsets are different.
I would suggest hexdumping entire devices looking for the MD superblock
magic value, which will always be at the start of a 4k-aligned block.
Show (will take a long time, even with the big block size):
for x in /dev/sd[a-e]2 ; do echo -e "\nDevice $x" ; \
  dd if=$x bs=1M | hexdump -C | grep -E "000 +fc 4e 2b a9" ; done
For any candidates found, hexdump the whole 4k block for us.
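For example (device and offset here are made up, purely to illustrate):
if the grep shows a hit on a line starting with offset 1b2c4000, dump
that 4k block with:

dd if=/dev/sda2 skip=$(( 0x1b2c4000 / 512 )) count=8 | hexdump -C

The 0x prefix lets the shell turn hexdump's hex offset into the
512-byte sector count that dd's skip= expects.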
> If any of You can help me sort this I would greatly appreciate it. I
> guess I need the mdadm version where I can set the data offset
> differently for each device, but it doesn't compile with an error in
> sha1.c:
>
> sha1.h:29:22: Fehler: ansidecl.h: Datei oder Verzeichnis nicht gefunden
> (didn't find ansidecl.h, error in German)
You probably need some *-devel packages. I don't use the RHEL platform,
so I'm not sure what you'd need. In the Ubuntu world, it'd be the
"build-essential" meta-package.
> What would be the best way to proceed? There is critical data on this
> raid, not fully backed up.
>
> (UPD'T)
>
> Thanks for getting back.
>
> Yes, it's bad, I know, also tweaking without keeping exact records of
> versions and offsets.
>
> I am, however, rather sure that nothing was written to the disks when I
> plugged them into the NEW server, unless starting up a live cd causes an
> automatic assemble attempt with an update to the superblocks. That I
> cannot exclude.
>
> What I did so far w/o writing to the disks
>
> get non-00 data at the beginning of sda2:
>
> dd if=/dev/sda skip=1955840 bs=512 count=10 | hexdump -C | grep [^00]
FWIW, you could have combined "if=/dev/sda skip=1955840" into
"if=/dev/sda2" . . . :-)
> gives me
>
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001000  1e b5 54 51 20 4c 56 4d  32 20 78 5b 35 41 25 72  |..TQ LVM2 x[5A%r|
> 00001010  30 4e 2a 3e 01 00 00 00  00 10 00 00 00 00 00 00  |0N*>............|
> 00001020  00 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001200  76 67 5f 6e 65 64 69 67  73 30 32 20 7b 0a 69 64  |vg_nedigs02 {.id|
> 00001210  20 3d 20 22 32 4c 62 48  71 64 2d 72 67 42 74 2d  | = "2LbHqd-rgBt-|
> 00001220  45 4a 75 31 2d 32 52 36  31 2d 41 35 7a 74 2d 6e  |EJu1-2R61-A5zt-n|
> 00001230  49 58 53 2d 66 79 4f 36  33 73 22 0a 73 65 71 6e  |IXS-fyO63s".seqn|
> 00001240  6f 20 3d 20 37 0a 66 6f  72 6d 61 74 20 3d 20 22  |o = 7.format = "|
> 00001250  6c 76 6d 32 22 20 23 20  69 6e 66 6f 72 6d 61 74  |lvm2" # informat|
> (cont'd)
This implies that /dev/sda2 is the first device in a raid5/6 that uses
metadata 0.9 or 1.0. You've found the LVM PV signature, which starts at
4k into a PV. Theoretically, this could be a stray, abandoned signature
from the original array, with the real LVM signature at the 262144
offset. Show:
dd if=/dev/sda2 skip=262144 count=16 |hexdump -C
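Also, if I remember the layouts correctly, 0.90 superblocks live in the
last 64-128 KiB of the device and 1.0 superblocks in the last ~8-12 KiB,
so a much quicker targeted check than the full scan above is to look at
just the last 128 KiB of each original partition. Untested sketch,
adjust the device list as needed:

for x in /dev/sd[a-e]2 ; do
  echo -e "\nDevice $x"
  dd if=$x skip=$(( $(blockdev --getsz $x) - 256 )) count=256 2>/dev/null \
    | hexdump -C | grep "fc 4e 2b a9"
done

(blockdev --getsz reports the size in 512-byte sectors, so skipping all
but the last 256 sectors reads exactly the last 128 KiB.)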
>
> but on /dev/sdb
>
> 00000000  5f 80 00 00 5f 80 01 00  5f 80 02 00 5f 80 03 00  |_..._..._..._...|
> 00000010  5f 80 04 00 5f 80 0c 00  5f 80 0d 00 00 00 00 00  |_..._..._.......|
> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001000  60 80 00 00 60 80 01 00  60 80 02 00 60 80 03 00  |`...`...`...`...|
> 00001010  60 80 04 00 60 80 0c 00  60 80 0d 00 00 00 00 00  |`...`...`.......|
> 00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001400
>
> so my initial guess that the data may start at 00001000 did not pan out.
No, but with parity raid scattering data amongst the participating
devices, the report on /dev/sdb2 is expected.
> Does anybody have an idea of how to reliably identify an mdadm
> superblock in a hexdump of the drive?
Above.
> And second, have I got my numbers right? In parted I see the block
> count, and when I multiply 512 (not 4096!) by the total count I get 3
> TB, so I think I have to use bs=512 in dd to get the partition
> boundaries correct.
dd uses bs=512 as the default. And it can access the partitions directly.
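And if you want to double-check those numbers, the kernel's own idea of
the partition size (in 512-byte sectors) is easy to query, e.g.:

blockdev --getsz /dev/sda2
cat /sys/block/sda/sda2/size

Both should agree with parted's sector count for partition 2.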
> As for the last state: one drive was set faulty, apparently, but the
> spare had not been integrated. I may have gotten caught in a bug
> described by Neil Brown, where on shutdown disks were wrongly reported,
> and subsequently superblock information was overwritten.
Possible. If so, you may not find any superblocks with the grep above.
> I don't have NAS/SAN storage space to make identical copies of 5x3 TB,
> but maybe I should buy 5 more disks and do a dd mirror so I have a
> backup of the current state.
We can do some more non-destructive investigation first.
Regards,
Phil
Thread overview (11+ messages):
2014-01-11 6:42 mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock Großkreutz, Julian
2014-01-11 17:47 ` Phil Turmel [this message]
[not found] ` <1389632980.11328.104.camel@achilles.aeskuladis.de>
2014-01-13 18:42 ` Phil Turmel
2014-01-13 20:11 ` Chris Murphy
2014-01-14 10:31 ` Großkreutz, Julian
2014-01-14 13:14 ` Phil Turmel
2014-01-14 14:00 ` AW: " Großkreutz, Julian
2014-01-14 17:47 ` Wilson Jonathan
2014-01-14 18:43 ` Phil Turmel
2014-01-15 12:50 ` Wilson Jonathan
2014-01-15 13:35 ` Phil Turmel