From: NeilBrown <neilb@suse.de>
To: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Recovering from two raid superblocks on the same disks
Date: Mon, 28 May 2012 21:40:03 +1000
Message-ID: <20120528214003.6269b535@notabene.brown>
In-Reply-To: <4FC325EF.4020103@aeoncomputing.com>
On Mon, 28 May 2012 00:14:55 -0700 Jeff Johnson
<jeff.johnson@aeoncomputing.com> wrote:
> Greetings,
>
> I am looking at a very unusual situation and trying to successfully
> recover 1TB of very critical data.
>
> The md RAID in question is a 12-drive RAID-10 shared between two
> identical nodes via a shared SAS link. Originally the 12 drives were
> configured as two six-drive RAID-10 volumes using the entire disk devices
> (no partitions on the member drives). That configuration was later
> scrapped in favor of a single 12-drive RAID-10; in the new configuration
> a single partition was created on each disk and that partition, rather
> than the whole disk, was used as the RAID member device (sdb1 vs sdb).
>
> One of the systems still had the old two-volume mdadm.conf file left in
> /etc. Due to a power outage both systems went down and then rebooted.
> When the system with the old mdadm.conf file came up, md referenced that
> file, saw the intact old superblocks at the beginning of the drives, and
> started assembling and resyncing the two six-drive RAID-10 volumes. The
> resync got to 40% before it was stopped.
>
> The other system managed to enumerate the drives and see the partition
> maps prior to the other node assembling the old superblock config. I can
> still see the newer md superblocks that start on the partition boundary
> rather than the beginning of the physical drive.
>
> It appears that md's overwrite protection was effectively circumvented:
> the old superblocks matched the old mdadm.conf file, and md never saw the
> conflicting superblocks at the partition boundaries.
>
> Both configurations, old and new, were RAID-10. It appears that the
> errant resync of the old configuration didn't corrupt the newer RAID
> config, since the drives were allocated in the same order and the same
> drives were paired as mirrors in both the old and new configs. I am
> guessing that because the striping layer is RAID-0 there was no stripe
> parity to check or rebuild, which kept the data on the drives from being
> corrupted. This is conjecture on my part.
>
> Old config:
> RAID-10, /dev/md0, /dev/sd[bcdefg]
> RAID-10, /dev/md1, /dev/sd[hijklm]
>
> New config:
> RAID-10, /dev/md0, /dev/sd[bcdefghijklm]1
>
> It appears that the old superblock remained in the ~17KB gap between the
> physical start of the disk and the start of partition 1, whereas the new
> superblock was written relative to the partition.
>
> I was able to still see the partitions on the other node. I was able to
> read the new config superblocks from 11 of the 12 drives. UUIDs, state,
> all seem to be correct.
>
> Three questions:
>
> 1) Has anyone seen a situation like this before?
I haven't.
> 2) Is it possible that since the mirrored pairs were allocated in the
> same order that the data was not overwritten?
Certainly possible.
> 3) What is the best way to assemble and run a 12-drive RAID-10 with
> member drive 0 (sdb1) seemingly blank (no superblock)?
It would be good to work out exactly why sdb1 is blank, as knowing that might
provide a useful insight into the overall situation. However, it probably
isn't critical.
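For example (just a sketch, using the device names from your layout), comparing
the partition with the whole disk should show what, if anything, is still there:

  mdadm --examine /dev/sdb1                  # expect no superblock to be found
  mdadm --examine --metadata=0.90 /dev/sdb   # expect the old six-drive array, if anything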
The --assemble command you list below should be perfectly safe and allow
read access without risking any corruption.
If you
  echo 1 > /sys/module/md_mod/parameters/start_ro
then it will be even safer (if that is possible). It will certainly not write
anything until you write to the array yourself.
You can then 'fsck -n', 'mount -o ro' and copy any super-critical files before
proceeding.
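Roughly, the read-only sequence I have in mind looks like this (the mount
point is only an example):

  echo 1 > /sys/module/md_mod/parameters/start_ro
  mdadm --assemble /dev/md0 --run \
        --uuid=852267e0095a343cf4f590ad3333cb43 /dev/sd[bcdefghijklm]1
  fsck -n /dev/md0              # report-only, makes no changes
  mount -o ro /dev/md0 /mnt     # read-only mount; copy the critical files from here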
I would then probably
  echo check > /sys/block/md0/md/sync_action
just to see if everything is ok (low mismatch count expected).
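You can watch the check and read the result from sysfs:

  cat /sys/block/md0/md/sync_action      # "check" while running, "idle" when finished
  cat /sys/block/md0/md/mismatch_cnt     # ideally 0, or at least very small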
I also recommend removing the old superblocks.
  mdadm --zero-superblock --metadata=0.90 /dev/sdc
will look for a 0.90 superblock on sdc and, if it finds one, erase it.
You should first double-check with
  mdadm --examine --metadata=0.90 /dev/sdc
to ensure that it is the one you want to remove
(without the --metadata=0.90 it will look for other metadata, and you might
not want it to do that without checking first).
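If each drive checks out, a rough sketch for all twelve (sd[b-m] assumes the
current device naming - confirm the list and every superblock before zeroing
anything):

  for d in /dev/sd[b-m]; do
      echo "== $d =="
      mdadm --examine --metadata=0.90 $d    # must show one of the OLD six-drive arrays,
  done                                      # not the new 12-drive UUID
  # only once every device has been confirmed:
  for d in /dev/sd[b-m]; do
      mdadm --zero-superblock --metadata=0.90 $d
  done
  # if mdadm refuses because a partition is busy, stop the new array first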
Good luck,
NeilBrown
>
> The current state of the 12-drive volume is: (note: sdb1 has no
> superblock but the drive is physically fine)
>
> /dev/sdc1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 852267e0:095a343c:f4f590ad:3333cb43
> Creation Time : Tue Feb 14 18:56:08 2012
> Raid Level : raid10
> Used Dev Size : 586059136 (558.91 GiB 600.12 GB)
> Array Size : 3516354816 (3353.46 GiB 3600.75 GB)
> Raid Devices : 12
> Total Devices : 12
> Preferred Minor : 0
>
> Update Time : Sat May 26 12:05:11 2012
> State : clean
> Active Devices : 12
> Working Devices : 12
> Failed Devices : 0
> Spare Devices : 0
> Checksum : 21bca4ce - correct
> Events : 26
>
> Layout : near=2
> Chunk Size : 32K
>
>       Number   Major   Minor   RaidDevice State
>  this     1       8       33        1      active sync   /dev/sdc1
>
>     0     0       8       17        0      active sync
>     1     1       8       33        1      active sync   /dev/sdc1
>     2     2       8       49        2      active sync   /dev/sdd1
>     3     3       8       65        3      active sync   /dev/sde1
>     4     4       8       81        4      active sync   /dev/sdf1
>     5     5       8       97        5      active sync   /dev/sdg1
>     6     6       8      113        6      active sync   /dev/sdh1
>     7     7       8      129        7      active sync   /dev/sdi1
>     8     8       8      145        8      active sync   /dev/sdj1
>     9     9       8      161        9      active sync   /dev/sdk1
>    10    10       8      177       10      active sync   /dev/sdl1
>    11    11       8      193       11      active sync   /dev/sdm1
>
> I could just run 'mdadm -A --uuid=852267e0095a343cf4f590ad3333cb43
> /dev/sd[bcdefghijklm]1 --run' but I feel better seeking advice and
> consensus before doing anything.
>
> I have never seen a situation like this before. It seems like there
> might be one correct way to get the data back and many ways of losing
> the data for good. Any advice or feedback is greatly appreciated!
>
> --Jeff
>