Message-ID: <5211A543.9040603@kodachi.com>
Date: Mon, 19 Aug 2013 00:55:31 -0400
From: Flynn
To: linux-lvm@redhat.com
Subject: [linux-lvm] PV that's present marked as missing?
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development

I have a fairly complex LVM2/mdadm setup that I'm in the middle of
turning into a simpler setup. I made a mistake along the way, though,
and have landed in a confusing place. This is kind of long, and I
apologize for that -- trying to describe completely how I got here.

The complex setup I started with:

    /dev/md5 is a RAID5 of /dev/sd{b,d,e,f}5
    /dev/md6 is a RAID5 of /dev/sd{b,d,e,f}6
    etc on up to /dev/md14
    /dev/md99 is a RAID1 of /dev/sdg and /dev/sdh

/dev/md{5-14} plus /dev/md99 are all assembled into a volume group
(creatively called vglinux), which has three logical volumes. Only one,
lvstore, is relevant: the other two are getting destroyed as part of the
simplification.

The goal is to end with a RAID6 of /dev/sd{b,d,e,f,g,h}, and no
multiple-partition madness (it's there from the days of old, when mdadm
couldn't reshape arrays). The next step was to free up /dev/sdf,
starting with

    pvmove /dev/md5
    reshape md5 as a RAID5 of /dev/sd{b,d,e}5 (freeing /dev/sdf5)
    lather, rinse, and repeat for the other mds.

The VG has plenty of free space for this; it's slow, but that's OK.

The problem: while md{5,6,7} went fine, I botched the pvmove for md8 and
ended up starting to reshape the array _before the pvmove happened._
Specifically, I did all of these:

    mdadm --grow /dev/md8 --array-size 292730880   # it was 439489920
    pvresize /dev/md8
    mdadm --grow /dev/md8 --raid-devices 3 --backup-file ~/backup

_without_ having moved data off. Once I figured out what was going on, I
did

    umount (all the filesystems in the VG)
    vgchange -a n vglinux
    mdadm --stop /dev/md8

which halted the reshape about 5% of the way done. Then (with some help
from NeilBrown and a buncha experiments with loopback devices) I used
the most recent mdadm snapshot to revert the reshape.

    mdadm --assemble --update=revert-reshape /dev/md8 /dev/sd{b,d,e,f}8

NOTE WELL: I KNOW THAT THIS HAS DESTROYED SOME DATA. That's not the
question. [ :) ] There will be damage, yes, I know that, and I should be
able to detect that and correct it.

At this point /dev/md8 is back to 4 devices, array-size 439489920, and
can be started. Next step is to fsck lvstore to get a handle on the
damage before proceeding -- but vgchange -a y vglinux doesn't start
lvstore:

    # vgchange -a y vglinux
      Incorrect metadata area header checksum
      Refusing activation of partial LV lvstore. Use --partial to override.
      2 logical volume(s) in volume group "vglinux" now active

(The two LVs that it did start are the irrelevant ones.)

So things are confusing:

First, it'd be awesome to know where exactly that "incorrect metadata
area header checksum" is coming from.
Maybe, y'know, a device to look at, or some further hint of where to
start tracking things down? [ :) ]

Second, if I look in /etc/lvm/archive for vglinux's latest, I find this
bit buried in there:

    pv2 {
        id = "4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc"
        device = "/dev/md8"     # Hint only

        status = ["ALLOCATABLE"]
        flags = ["MISSING"]
        dev_size = 878979840    # 419.13 Gigabytes
        pe_start = 384
        pe_count = 107297       # 419.129 Gigabytes
    }

which seems to be why it's complaining about 'partial LV lvstore'. But,
uh, 4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc _is_ the UUID of /dev/md8:

    # pvs -o +uuid --unit=4m
      Incorrect metadata area header checksum
      Unable to find "/dev/sdb5" in volume group "vglinux"
      PV         VG      Fmt  Attr PSize      PFree      PV UUID
      /dev/md10  vglinux lvm2 a-   107297.00U        0U  LO5KoK-1AjU-iXb0-fkLo-lUKR-Yo9P-wDZQPP
      /dev/md11  vglinux lvm2 a-   107297.00U        0U  gBGcjz-DmIb-pAj9-CWnb-jopW-Wd19-iIs1ur
      /dev/md125 vglinux lvm2 a-   107297.00U   8607.00U 5JlNTx-yT14-271r-NMAm-a17W-FKe4-pXoOW4
      /dev/md13  vglinux lvm2 a-   107297.00U        0U  MJlTQO-lCyE-bP80-FlvE-m1nM-DD2x-qhlIQK
      /dev/md14  vglinux lvm2 a-   107297.00U        0U  XDpA1D-kxbq-SEck-ozTl-rP4Y-bMws-MBwNNf
      /dev/md5           lvm2 a-    71467.50U  71467.50U 39oFQs-9tlf-ywT4-YgtX-nfcm-rAEq-pAPsdR
      /dev/md6   vglinux lvm2 a-    71531.00U  35856.00U ufKOpM-02YG-12rJ-mt1r-DbEm-xoJu-onzEtr
      /dev/md7   vglinux lvm2 a-    71531.00U  71531.00U NpAKLQ-4Irn-wDA4-0ZDI-ydW6-eY9n-rDp50e
      /dev/md8   vglinux lvm2 a-   107297.00U        0U  4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc
      /dev/md9   vglinux lvm2 a-   107297.00U        0U  hRmTMN-Mx17-uUEX-rF1Z-hQ1J-8iDd-S7S2t7
      /dev/md99  vglinux lvm2 a-   357667.00U 178748.00U jUgxoF-mvwR-6C8A-wzjP-K0Xu-MPf8-XewqUE

Finally, note that "Unable to find /dev/sdb5 in vglinux" complaint, and
note that /dev/md5 is _not_ listed as part of vglinux. md5 shouldn't be
part of vglinux right now, and sdb5 has never been a PV on its own (it's
only ever been a part of the md5 PV). WTFO? As it happens, I didn't
actually reshape /dev/md5: after the pvmove, I shredded the md and
recreated it instead.
I suppose it's possible that I forgot to vgreduce before doing that?

Googling and reading indicates that I need to clear that MISSING flag,
and that vgcfgrestore is the only tool for that job -- but editing that
archive file to remove the MISSING flag and trying vgcfgrestore with
that doesn't work:

    # vgcfgrestore --debug --verbose --test --file wtfvglinux vglinux
      Test mode: Metadata will NOT be updated.
      Incorrect metadata area header checksum
      Incorrect metadata area header checksum
      Restore failed.
        Test mode: Wiping internal cache
        Wiping internal VG cache

So, at this point, some guidance would be most welcome. (Also note that
before I did the revert-reshape, I dd'd /dev/sd{b,d,e,f}8 to spare
partitions as a backup. It may be relevant that there are two copies of
the metadata for md8's devices?)

Thanks very much,
Flynn
--
The trick is to keep breathing. (Garbage, from _Version 2.0_)
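
For reference, a quick sanity check on the sizes quoted above. This is
only a sketch: it assumes dev_size and pe_start in the archive are
counted in 512-byte sectors, that mdadm's array-size is in KiB, and that
the VG uses 4 MiB extents (which matches the 107297.00U that
pvs --unit=4m prints for /dev/md8). The variable names are made up for
illustration.

```shell
# Values copied from the archive entry for pv2 and from the mdadm commands.
dev_size_sectors=878979840   # dev_size from /etc/lvm/archive (assumed 512-byte sectors)
pe_start_sectors=384         # pe_start from the same entry (assumed sectors)
array_size_kib=439489920     # original mdadm array-size (assumed KiB)
extent_sectors=8192          # assumed 4 MiB extent = 8192 sectors

# Two 512-byte sectors per KiB.
dev_size_kib=$((dev_size_sectors / 2))

# Whole extents that fit after the metadata area at the start of the PV.
extents=$(( (dev_size_sectors - pe_start_sectors) / extent_sectors ))

echo "dev_size in KiB: $dev_size_kib (array-size was $array_size_kib)"
echo "whole extents:   $extents"
```

Under those assumptions the archived dev_size comes out equal to the
original array-size, and the extent count equals the archived pe_count
of 107297, so the archive describes md8 at its pre-shrink size.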