Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock

From: Phil Turmel <philip@turmel.org>
To: "\"Großkreutz, Julian\"" <Julian.Grosskreutz@med.uni-jena.de>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Cc: "neilb@suse.de" <neilb@suse.de>
Subject: Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock
Date: Sat, 11 Jan 2014 12:47:33 -0500
Message-ID: <52D183B5.3060006@turmel.org>
In-Reply-To: <1389422546.11328.15.camel@achilles.aeskuladis.de>

Hi Julian,

Very good report.  I think we can help.

On 01/11/2014 01:42 AM, Großkreutz, Julian wrote:
> Dear all, dear Neil (thanks for pointing me to this list),
> 
> I am in desperate need of help. mdadm is fantastic work, and I have
> relied on mdadm for years to run very stable server systems, never had
> major problems I could not solve.
> 
> This time it's different:
> 
> On a Centos 6.x (can't remember) initially in 2012:
> 
> parted to create GPT partitions on 5 Seagate drives 3TB each
> 
> Model: ATA ST3000DM001-9YN1 (scsi)
> Disk /dev/sda: 5860533168s  # sd[bcde] identical
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
> 
> Number  Start     End          Size         File system  Name     Flags
> 1      2048s     1953791s     1951744s     ext4                  boot
> 2      1955840s  5860532223s  5858576384s               primary  raid

Ok.

Please also show the partition tables for /dev/sd[fgh].

> I used an unknown mdadm version including unknown offset parameters for
> 4k alignment to create
> 
> /dev/sd[abcde]1 as /dev/md0 raid 1 for booting (1 GB)
> /dev/sd[abcde]2 as /dev/md1 raid 6 for data (9 TB) lvm physical drive
> 
> Later added 3 more 3T identical Seagate drives with identical partition
> layout, but later firmware.
> 
> Using likely a different newer version of mdadm I expanded RAID 6 by 2
> drives and added 1 spare.
> 
> /dev/md1 was at 15 TB gross, 13 TB usable, expanded pv
> 
> Ran fine

Ok.  Your report below has some evidence suggesting you created the
larger array from scratch instead of using --grow.  Do you remember?

> Then I moved the 8 disks to a new server with an hba and backplane,
> array did not start because mdadm did not find the superblocks on the
> original 5 devices /dev/sd[abcde]2. Moving the disks back to the old
> server the error did not vanish. Using a centos 6.3 livecd, I got the
> following:
> 
> [root@livecd ~]# mdadm -Evvvvs /dev/sd[abcdefgh]2
> mdadm: No md superblock detected on /dev/sda2.
> mdadm: No md superblock detected on /dev/sdb2.
> mdadm: No md superblock detected on /dev/sdc2.
> mdadm: No md superblock detected on /dev/sdd2.
> mdadm: No md superblock detected on /dev/sde2.
> 
> /dev/sdf2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
>                Name : 1
>       Creation Time : Wed Jul 31 18:24:38 2013

Note this creation time...  would have been 2012 if you had used --grow.

>          Raid Level : raid6
>        Raid Devices : 7
> 
>      Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
>          Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
>       Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)

This used dev size is very odd.  The unused space after the data area is
1155584 sectors (>500MiB).
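
As a sanity check on those numbers, here is a small sketch in plain sh
arithmetic.  The variable names are only illustrative; the values are
copied from the parted and mdadm -E output in this thread.  It shows
that the available size is exactly the partition size minus the data
offset, and where the 1155584 figure comes from:

```shell
#!/bin/sh
# Values copied from the parted and "mdadm -E" output quoted in this thread.
# All sizes are in 512-byte sectors; the variable names are illustrative.
part_sectors=5858576384   # size of partition 2 (parted)
data_offset=262144        # Data Offset
avail=5858314240          # Avail Dev Size
used=5857158656           # Used Dev Size
raid_devices=7            # Raid Devices (raid6 spends two devices on parity)

from_partition=$(( $part_sectors - $data_offset ))
unused=$(( $avail - $used ))
unused_mib=$(( $unused * 512 / 1048576 ))
array=$(( $used * ($raid_devices - 2) ))

echo "partition minus data offset: $from_partition"    # 5858314240 = Avail Dev Size
echo "unused tail: $unused sectors (~$unused_mib MiB)"  # 1155584 sectors (~564 MiB)
echo "used size x 5 data devices:  $array"              # 29285793280 = Array Size
```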

>         Data Offset : 262144 sectors
>        Super Offset : 8 sectors
>               State : active
>         Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d
> 
>         Update Time : Mon Dec 16 01:16:26 2013
>            Checksum : ee921c43 - correct
>              Events : 327
> 
>              Layout : left-symmetric
>          Chunk Size : 256K
> 
>         Device Role : Active device 5
>         Array State : A.AAAAA ('A' == active, '.' == missing)
> 
> /dev/sdg2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
>                Name : 1
>       Creation Time : Wed Jul 31 18:24:38 2013
>          Raid Level : raid6
>        Raid Devices : 7
> 
>      Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
>          Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
>       Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
>         Data Offset : 262144 sectors
>        Super Offset : 8 sectors
>               State : active
>         Device UUID : a1e1e51b:d8912985:e51207a9:1d718292
> 
>         Update Time : Mon Dec 16 01:16:26 2013
>            Checksum : 4ef01fe9 - correct
>              Events : 327
> 
>              Layout : left-symmetric
>          Chunk Size : 256K
> 
>         Device Role : Active device 6
>         Array State : A.AAAAA ('A' == active, '.' == missing)
> 
> /dev/sdh2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
>                Name : 1
>       Creation Time : Wed Jul 31 18:24:38 2013
>          Raid Level : raid6
>        Raid Devices : 7
> 
>      Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
>          Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
>       Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
>         Data Offset : 262144 sectors
>        Super Offset : 8 sectors
>               State : active
>         Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1
> 
>         Update Time : Mon Dec 16 01:16:26 2013
>            Checksum : a1330e97 - correct
>              Events : 327
> 
>              Layout : left-symmetric
>          Chunk Size : 256K
> 
>         Device Role : spare
>         Array State : A.AAAAA ('A' == active, '.' == missing)
> 
> 
> I suspect that the superblock of the original 5 devices is at a
> different location, possibly because they were created with a different
> mdadm version, i.e. at the end of the partitions. Booting the drives
> with the hba in IT (non-raid) mode on the new server may have introduced
> an initialization on the first five drives at the end of the partitions,
> because I can hexdump something with "EFI PART" in the last 64 kb of all
> 8 partitions used for the raid 6. That may not have affected the 3
> added drives, which show metadata 1.2.

The "EFI PART" is part of the backup copy of the GPT.  All the drives in
a working array will have the same metadata version (superblock
location) even if the data offsets are different.

I would suggest hexdumping entire devices looking for the MD superblock
magic value, which will always be at the start of a 4k-aligned block.

Show (will take a long time, even with the big block size):

for x in /dev/sd[a-e]2 ; do
  echo -e "\nDevice $x"
  dd if=$x bs=1M | hexdump -C | grep "000  fc 4e 2b a9"
done

For any candidates found, hexdump the whole 4k block for us.
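
While the full scan runs, it may be worth checking just the spots where
md normally keeps its superblock.  What follows is a sketch only, based
on my understanding of the kernel's placement rules: 0.90 sits 64k below
the 64k-aligned end of the device, 1.0 sits at least 8k from the end on
a 4k boundary, 1.1 is at the very start, and 1.2 is 4k from the start.
The function names md_sb_candidates and md_has_magic are made up for
this example, and the magic check assumes the little-endian byte order
seen on x86.  Everything here only reads from the device.

```shell
#!/bin/sh
# Print the expected superblock start sector for each metadata version,
# given the member device size in 512-byte sectors (hypothetical helper).
md_sb_candidates() {
    # 0.90: size rounded down to 64 KiB (128 sectors), minus 64 KiB
    echo "0.90 $(( ($1 & ~127) - 128 ))"
    # 1.0: 8 KiB (16 sectors) before the end, rounded down to 4 KiB
    echo "1.0 $(( ($1 - 16) & ~7 ))"
    # 1.1: start of the device; 1.2: 4 KiB (8 sectors) from the start
    echo "1.1 0"
    echo "1.2 8"
}

# Succeed if the first 4 bytes at the given sector of the given device or
# file are fc 4e 2b a9, i.e. magic a92b4efc as stored little-endian.
md_has_magic() {
    magic=$(dd if="$1" bs=512 skip="$2" count=1 2>/dev/null |
            od -An -tx1 -N4 | tr -d ' \n')
    [ "$magic" = "fc4e2ba9" ]
}

# Example with the partition size from the parted listing in this thread:
md_sb_candidates 5858576384
```

On the real disks, something along these lines would apply it to one
member (blockdev --getsz reports the size in 512-byte sectors):

  size=$(blockdev --getsz /dev/sda2)
  md_sb_candidates "$size" | while read ver sec ; do
    md_has_magic /dev/sda2 "$sec" && echo "$ver candidate at sector $sec"
  done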

> If any of you can help me sort this out I would greatly appreciate it. I
> guess I need the mdadm version where I can set the data offset
> differently for each device, but it doesn't compile, failing with an
> error in sha1.c:
> 
> sha1.h:29:22: Fehler: ansidecl.h: Datei oder Verzeichnis nicht gefunden
> (didn't find ansidecl.h, error in German)

You probably need some *-dev packages.  I don't use the RHEL platform,
so I'm not sure what you'd need.  In the ubuntu world, it'd be the
"build-essential" meta-package.

> What would be the best way to proceed? There is critical data on this
> raid, not fully backed up.
> 
> (UPD'T)
> 
> Thanks for getting back.
> 
> Yes, it's bad, I know, also tweaking without keeping exact records of
> versions and offsets.
> 
> I am, however, rather sure that nothing was written to the disks when I
> plugged them into the NEW server, unless starting up a live cd causes an
> automatic assemble attempt with an update to the superblocks. That I
> cannot exclude.
> 
> What I did so far w/o writing to the disks
> 
> get non-00 data at the beginning of sda2:
> 
> dd if=/dev/sda skip=1955840 bs=512 count=10 | hexdump -C | grep [^00]

FWIW, you could have combined "if=/dev/sda skip=1955840" into
"if=/dev/sda2" . . . :-)

> gives me
> 
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001000  1e b5 54 51 20 4c 56 4d  32 20 78 5b 35 41 25 72  |..TQ LVM2 x[5A%r|
> 00001010  30 4e 2a 3e 01 00 00 00  00 10 00 00 00 00 00 00  |0N*>............|
> 00001020  00 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001200  76 67 5f 6e 65 64 69 67  73 30 32 20 7b 0a 69 64  |vg_nedigs02 {.id|
> 00001210  20 3d 20 22 32 4c 62 48  71 64 2d 72 67 42 74 2d  | = "2LbHqd-rgBt-|
> 00001220  45 4a 75 31 2d 32 52 36  31 2d 41 35 7a 74 2d 6e  |EJu1-2R61-A5zt-n|
> 00001230  49 58 53 2d 66 79 4f 36  33 73 22 0a 73 65 71 6e  |IXS-fyO63s".seqn|
> 00001240  6f 20 3d 20 37 0a 66 6f  72 6d 61 74 20 3d 20 22  |o = 7.format = "|
> 00001250  6c 76 6d 32 22 20 23 20  69 6e 66 6f 72 6d 61 74  |lvm2" # informat|
> (cont'd)

This implies that /dev/sda2 is the first device in a raid5/6 that uses
metadata 0.90 or 1.0.  You've found the LVM PV signature, which starts at
4k into a PV.  Theoretically, this could be a stray, abandoned signature
from the original array, with the real LVM signature at the 262144
offset.  Show:

dd if=/dev/sda2 skip=262144 count=16 |hexdump -C

> 
> but on /dev/sdb
> 
> 00000000  5f 80 00 00 5f 80 01 00  5f 80 02 00 5f 80 03 00  |_..._..._..._...|
> 00000010  5f 80 04 00 5f 80 0c 00  5f 80 0d 00 00 00 00 00  |_..._..._.......|
> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001000  60 80 00 00 60 80 01 00  60 80 02 00 60 80 03 00  |`...`...`...`...|
> 00001010  60 80 04 00 60 80 0c 00  60 80 0d 00 00 00 00 00  |`...`...`.......|
> 00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001400
> 
> so my initial guess that the data may start at 00001000 did not pan out.

No, but with parity raid scattering data amongst the participating
devices, the report on /dev/sdb2 is expected.

> Does anybody have an idea of how to reliably identify an mdadm
> superblock in a hexdump of the drive ?

Above.

> And second, have I got my numbers right? In parted I see the block
> count, and when I multiply 512 (not 4096!) with the total count I get 3
> TB, so I think I have to use bs=512 in dd to get the partition
> boundaries correct.

dd uses bs=512 as the default.  And it can access the partitions directly.
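
A quick sketch of that arithmetic, again in plain sh with illustrative
variable names and the sector counts copied from the parted output
above.  parted's "s" unit is the 512-byte logical sector, so the same
numbers can go straight into dd's skip= and count= at the default block
size:

```shell
#!/bin/sh
# Sector counts copied from the parted listing quoted in this thread.
disk_sectors=5860533168   # "Disk /dev/sda"
p2_start=1955840          # start of partition 2
sector=512                # logical sector size, also dd's default bs

disk_bytes=$(( $disk_sectors * $sector ))
p2_start_bytes=$(( $p2_start * $sector ))
# 8 logical sectors make up one 4096-byte physical sector
p2_misalign=$(( $p2_start % 8 ))

echo "disk size: $disk_bytes bytes"                     # 3000592982016, about 3.0 TB
echo "partition 2 starts at byte $p2_start_bytes"       # 1001390080
echo "offset within a 4k block: $p2_misalign sectors"   # 0, so 4k-aligned
```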

> As for the last state: one drive was set faulty, apparently, but the
> spare had not been integrated. I may have gotten caught in a bug
> described by Neil Brown, where on shutdown disks were wrongly reported,
> and subsequently superblock information was overwritten.

Possible.  If so, you may not find any superblocks with the grep above.

> I don't have NAS/SAN storage space to make identical copies of 5x3 TB,
> but maybe I should buy 5 more disks and do a dd mirror so I have a
> backup of the current state.

We can do some more non-destructive investigation first.

Regards,

Phil
