From: NeilBrown
Subject: Re: Recovering from two raid superblocks on the same disks
Date: Mon, 28 May 2012 21:40:03 +1000
To: Jeff Johnson
Cc: linux-raid@vger.kernel.org

On Mon, 28 May 2012 00:14:55 -0700 Jeff Johnson wrote:

> Greetings,
>
> I am looking at a very unusual situation and trying to successfully
> recover 1TB of very critical data.
>
> The md raid in question is a 12-drive RAID-10 sitting between two
> identical nodes via a shared SAS link. Originally the 12 drives were
> configured as two six-drive RAID-10 volumes using the entire disk
> device (no partitions on member drives). That configuration was later
> scrapped in favor of a single 12-drive RAID-10, but in this
> configuration a single partition was created and the partition was
> used as the RAID member device instead of the entire disk (sdb1 vs
> sdb).
>
> One of the systems had the old two six-drive RAID-10 mdadm.conf file
> left in /etc. Due to a power outage both systems went down and then
> rebooted. When one system (the one with the old mdadm.conf file) came
> up, md referenced the file, saw the intact old superblocks at the
> beginning of the drive, and started an assemble and resync of those
> two six-drive RAID-10 volumes. The resync process got to 40% before
> it was stopped.
>
> The other system managed to enumerate the drives and see the
> partition maps prior to the other node assembling the old superblock
> config. I can still see the newer md superblocks that start on the
> partition boundary rather than at the beginning of the physical
> drive.
>
> It appears that md's overwrite protection was in a way circumvented
> by the old superblocks matching the old mdadm.conf file and not
> seeing conflicting superblocks at the beginning of the partition
> boundaries.
>
> Both versions, old and new, were RAID-10. It appears that the errant
> resync of the old configuration didn't corrupt the newer RAID config,
> since the drives were allocated in the same order and the same drives
> were paired (mirrors) in both old and new configs. I am guessing that
> since the striping method was RAID-0, the absence of stripe parity to
> check kept the data on the drives from being corrupted. This is
> conjecture on my part.
>
> Old config:
> RAID-10, /dev/md0, /dev/sd[bcdefg]
> RAID-10, /dev/md1, /dev/sd[hijklm]
>
> New config:
> RAID-10, /dev/md0, /dev/sd[bcdefghijklm]1
>
> It appears that the old superblock remained in that ~17KB gap between
> the physical start of the disk and the start boundary of partition 1,
> where the new superblock was written.
>
> I was able to still see the partitions on the other node. I was able
> to read the new config superblocks from 11 of the 12 drives. UUIDs,
> state, all seem to be correct.
>
> Three questions:
>
> 1) Has anyone seen a situation like this before?

I haven't.

> 2) Is it possible that since the mirrored pairs were allocated in the
> same order that the data was not overwritten?

Certainly possible.

> 3) What is the best way to assemble and run a 12-drive RAID-10 with
> member drive 0 (sdb1) seemingly blank (no superblock)?

It would be good to work out exactly why sdb1 is blank, as knowing that
might provide a useful insight into the overall situation. However it
probably isn't critical.

The --assemble command you list below should be perfectly safe and
allow read access without risking any corruption. If you
  echo 1 > /sys/module/md_mod/parameters/start_ro
then it will be even safer (if that is possible). It will certainly not
write anything until you write to the array yourself.

You can then 'fsck -n', 'mount -o ro' and copy any super-critical files
before proceeding.
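Spelled out, that sequence might look something like the following.
This is only a rough sketch, not something I have run here: the UUID is
taken from your --examine output below, /mnt/recovery is just a
placeholder mount point, and it assumes the filesystem sits directly on
/dev/md0 as your fsck/mount plan implies.

  # keep newly started arrays read-only until something writes to them
  echo 1 > /sys/module/md_mod/parameters/start_ro

  # assemble from the partition superblocks; sdb1 has no superblock
  # and will simply be left out
  mdadm -A /dev/md0 --uuid=852267e0:095a343c:f4f590ad:3333cb43 \
        /dev/sd[bcdefghijklm]1 --run

  # read-only sanity checks before copying anything off
  fsck -n /dev/md0
  mount -o ro /dev/md0 /mnt/recovery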
I would then probably
  echo check > /sys/block/md0/md/sync_action
just to see if everything is ok (low mismatch count expected).

I also recommend removing the old superblocks.
  mdadm --zero /dev/sdc --metadata=0.90
will look for a 0.90 superblock on sdc and, if it finds one, erase it.
You should first double check with
  mdadm --examine --metadata=0.90 /dev/sdc
to ensure that is the one you want to remove (without the
--metadata=0.90 it will look for other metadata, and you might not want
it to do that without you checking first).
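If it helps, a non-destructive way to survey what each drive currently
carries before zeroing anything is a loop like the one below. Again
this is only a sketch, using the device names from your report; nothing
in it writes to the disks.

  for d in /dev/sd[bcdefghijklm]; do
      echo "== $d (whole device)"
      mdadm --examine --metadata=0.90 $d
      echo "== ${d}1 (partition)"
      mdadm --examine ${d}1
  done

Only once it is clear which superblock each report corresponds to would
I run the mdadm --zero step above, one drive at a time.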
Good luck,
NeilBrown

> The current state of the 12-drive volume is: (note: sdb1 has no
> superblock but the drive is physically fine)
>
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 852267e0:095a343c:f4f590ad:3333cb43
>   Creation Time : Tue Feb 14 18:56:08 2012
>      Raid Level : raid10
>   Used Dev Size : 586059136 (558.91 GiB 600.12 GB)
>      Array Size : 3516354816 (3353.46 GiB 3600.75 GB)
>    Raid Devices : 12
>   Total Devices : 12
> Preferred Minor : 0
>
>     Update Time : Sat May 26 12:05:11 2012
>           State : clean
>  Active Devices : 12
> Working Devices : 12
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 21bca4ce - correct
>          Events : 26
>
>          Layout : near=2
>      Chunk Size : 32K
>
>       Number   Major   Minor   RaidDevice State
> this     1       8       33        1      active sync   /dev/sdc1
>
>    0     0       8       17        0      active sync
>    1     1       8       33        1      active sync   /dev/sdc1
>    2     2       8       49        2      active sync   /dev/sdd1
>    3     3       8       65        3      active sync   /dev/sde1
>    4     4       8       81        4      active sync   /dev/sdf1
>    5     5       8       97        5      active sync   /dev/sdg1
>    6     6       8      113        6      active sync   /dev/sdh1
>    7     7       8      129        7      active sync   /dev/sdi1
>    8     8       8      145        8      active sync   /dev/sdj1
>    9     9       8      161        9      active sync   /dev/sdk1
>   10    10       8      177       10      active sync   /dev/sdl1
>   11    11       8      193       11      active sync   /dev/sdm1
>
> I could just run 'mdadm -A --uuid=852267e0095a343cf4f590ad3333cb43
> /dev/sd[bcdefghijklm]1 --run' but I feel better seeking advice and
> consensus before doing anything.
>
> I have never seen a situation like this before. It seems like there
> might be one correct way to get the data back and many ways of losing
> the data for good. Any advice or feedback is greatly appreciated!
>
> --Jeff