* [linux-lvm] PV that's present marked as missing?
@ 2013-08-19  4:55 Flynn
From: Flynn @ 2013-08-19  4:55 UTC (permalink / raw)
  To: linux-lvm

I have a fairly complex LVM2/mdadm setup that I'm in the middle of 
turning into a simpler setup.  I made a mistake along the way, though, 
and have landed in a confusing place.

This is kind of long, and I apologize for that -- trying to describe 
completely how I got here.  The complex setup I started with:

/dev/md5 is a RAID5 of /dev/sd{b,d,e,f}5
/dev/md6 is a RAID5 of /dev/sd{b,d,e,f}6
etc on up to /dev/md14
/dev/md99 is a RAID1 of /dev/sdg and /dev/sdh

/dev/md{5-14} plus /dev/md99 are all assembled into a volume group 
(creatively called vglinux), which has three logical volumes.  Only one, 
lvstore, is relevant: the other two are getting destroyed as part of the 
simplification.

The goal is to end with a RAID6 of /dev/sd{b,d,e,f,g,h}, and no 
multiple-partition madness (it's there from the days of old, when mdadm 
couldn't reshape arrays).  The next step was to free up /dev/sdf, 
starting with

     pvmove /dev/md5
     reshape md5 as a RAID5 of /dev/sd{b,d,e}5 (freeing /dev/sdf5)
     lather, rinse, and repeat for the other mds.

The VG has plenty of free space for this; it's slow, but that's OK.
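
The plan above only works if each PV is completely empty before its array is shrunk. A minimal sketch of a guard for that, assuming pvs is asked for just the name, size, and free columns in one common unit; pv_is_empty is a hypothetical helper name, not an LVM command:

```shell
# pv_is_empty DEVICE
# Reads "name size free" rows on stdin, for example from
#   pvs --noheadings --units 4m -o pv_name,pv_size,pv_free
# and succeeds only if DEVICE is listed and its free space equals its size.
# ASSUMPTION: size and free are printed in the same unit, so comparing the
# leading numbers is enough.
pv_is_empty() {
    awk -v dev="$1" '
        $1 == dev { found = 1; empty = ($2 + 0 == $3 + 0) }
        END { if (found && empty) exit 0; exit 1 }
    '
}

# Example with two rows in the shape pvs would print them:
printf '%s\n' '/dev/md7 71531.00U 71531.00U' '/dev/md8 107297.00U 0U' |
    pv_is_empty /dev/md8 || echo "/dev/md8 still holds data: pvmove first"
```

Run before every mdadm --grow, a check like this would have refused to go ahead on a PV with no free extents.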

The problem: while md{5,6,7} went fine, I botched the pvmove for md8 and 
ended up starting to reshape the array _before the pvmove happened._ 
Specifically, I did all of these:

mdadm --grow /dev/md8 --array-size 292730880 # it was 439489920
pvresize /dev/md8
mdadm --grow /dev/md8 --raid-devices 3 --backup-file ~/backup

_without_ having moved data off.  Once I figured out what was going on, 
I did

umount (all the filesystems in the VG)
vgchange -a n vglinux
mdadm --stop /dev/md8

which halted the reshape about 5% of the way done.  Then (with some help 
from NeilBrown and a buncha experiments with loopback devices) I used 
the most recent mdadm snapshot to revert the reshape.

mdadm --assemble --update=revert-reshape /dev/md8 /dev/sd{b,d,e,f}8

NOTE WELL: I KNOW THAT THIS HAS DESTROYED SOME DATA.  That's not the 
question.  [ :) ]  There will be damage, yes, I know that, and I should 
be able to detect that and correct it.

At this point /dev/md8 is back to 4 devices, array-size 439489920, and 
can be started.  Next step is to fsck lvstore to get a handle on the 
damage before proceeding -- but vgchange -a y vglinux doesn't start lvstore:

# vgchange  -a y vglinux
   Incorrect metadata area header checksum
   Refusing activation of partial LV lvstore. Use --partial to override.
   2 logical volume(s) in volume group "vglinux" now active

(The two LVs that it did start are the irrelevant ones.)

So things are confusing:

First, it'd be awesome to know where exactly that "incorrect metadata area 
header checksum" is coming from.  Maybe, y'know, a device to look at, or 
some further hint of where to start tracking things down?  [ :) ]

Second, if I look in /etc/lvm/archive for vglinux's latest, I find this 
bit buried in there:

     pv2 {
         id = "4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc"
         device = "/dev/md8"     # Hint only

         status = ["ALLOCATABLE"]
         flags = ["MISSING"]
         dev_size = 878979840    # 419.13 Gigabytes
         pe_start = 384
         pe_count = 107297       # 419.129 Gigabytes
     }

which seems to be why it's complaining about 'partial LV lvstore'.  But, 
uh, 4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc _is_ the UUID of /dev/md8:

# pvs -o +uuid --unit=4m
   Incorrect metadata area header checksum
   Unable to find "/dev/sdb5" in volume group "vglinux"
   PV         VG      Fmt  Attr PSize      PFree      PV UUID
   /dev/md10  vglinux lvm2 a-   107297.00U         0U LO5KoK-1AjU-iXb0-fkLo-lUKR-Yo9P-wDZQPP
   /dev/md11  vglinux lvm2 a-   107297.00U         0U gBGcjz-DmIb-pAj9-CWnb-jopW-Wd19-iIs1ur
   /dev/md125 vglinux lvm2 a-   107297.00U   8607.00U 5JlNTx-yT14-271r-NMAm-a17W-FKe4-pXoOW4
   /dev/md13  vglinux lvm2 a-   107297.00U         0U MJlTQO-lCyE-bP80-FlvE-m1nM-DD2x-qhlIQK
   /dev/md14  vglinux lvm2 a-   107297.00U         0U XDpA1D-kxbq-SEck-ozTl-rP4Y-bMws-MBwNNf
   /dev/md5           lvm2 a-    71467.50U  71467.50U 39oFQs-9tlf-ywT4-YgtX-nfcm-rAEq-pAPsdR
   /dev/md6   vglinux lvm2 a-    71531.00U  35856.00U ufKOpM-02YG-12rJ-mt1r-DbEm-xoJu-onzEtr
   /dev/md7   vglinux lvm2 a-    71531.00U  71531.00U NpAKLQ-4Irn-wDA4-0ZDI-ydW6-eY9n-rDp50e
   /dev/md8   vglinux lvm2 a-   107297.00U         0U 4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc
   /dev/md9   vglinux lvm2 a-   107297.00U         0U hRmTMN-Mx17-uUEX-rF1Z-hQ1J-8iDd-S7S2t7
   /dev/md99  vglinux lvm2 a-   357667.00U 178748.00U jUgxoF-mvwR-6C8A-wzjP-K0Xu-MPf8-XewqUE
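
The sizes for /dev/md8 in the archive entry and in the pvs output are at least consistent with the array being back at its original size. A quick check in shell arithmetic, assuming dev_size and pe_start are counted in 512-byte sectors, mdadm's --array-size is in KiB, and the extent size is 4 MiB (the extent size is not in the excerpt; it is inferred from pvs --unit=4m reporting 107297.00U for this PV):

```shell
dev_size=878979840      # sectors, from the pv2 entry in the archive file
pe_start=384            # sectors, from the same entry
extent_sectors=8192     # ASSUMPTION: 4 MiB extents = 8192 sectors of 512 bytes

array_kib=$((dev_size / 2))                             # sectors -> KiB
extents=$(( (dev_size - pe_start) / extent_sectors ))   # whole extents that fit

echo "array size: ${array_kib} KiB"   # compare with the original 439489920
echo "extents:    ${extents}"         # compare with pe_count = 107297
```

Both values agree with the numbers quoted above, so a size mismatch alone does not seem to explain the MISSING flag.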

Finally, note that "Unable to find /dev/sdb5 in vglinux" complaint, and 
note that /dev/md5 is _not_ listed as part of vglinux.  md5 shouldn't be 
part of vglinux right now, and sdb5 has never been a PV on its own (it's 
only ever been a part of the md5 PV).  WTFO?  As it happens, I didn't 
actually reshape /dev/md5: after the pvmove, I shredded the md and 
recreated it instead.  I suppose it's possible that I forgot to vgreduce 
before doing that?

Googling and reading indicates that I need to clear that MISSING flag, 
and that vgcfgrestore is the only tool for that job -- but editing that 
archive file to remove the MISSING flag and trying vgcfgrestore with 
that doesn't work:

# vgcfgrestore --debug --verbose --test --file wtfvglinux vglinux
   Test mode: Metadata will NOT be updated.
   Incorrect metadata area header checksum
   Incorrect metadata area header checksum
   Restore failed.
     Test mode: Wiping internal cache
     Wiping internal VG cache

so, at this point, some guidance would be most welcome.
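
One way to make that edit reproducible is sketched below. clear_missing_flag is a hypothetical helper; it assumes the flag appears exactly as in the pv2 excerpt above, and that an empty list (flags = []) is what the metadata should carry for a PV with no flags set:

```shell
# clear_missing_flag: filter a metadata backup on stdin, writing to stdout.
# Only lines whose flags list is exactly ["MISSING"] are changed; everything
# else passes through untouched.
clear_missing_flag() {
    sed 's/flags = \["MISSING"]/flags = []/'
}

# Usage (paths are placeholders; work on a copy, never the archive itself):
#   clear_missing_flag < /etc/lvm/archive/<latest vglinux file> > wtfvglinux
#   vgcfgrestore --test --file wtfvglinux vglinux
```

Filtering into a separate file keeps the original archive intact for comparison if the restore keeps failing.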

(Also note that before I did the revert-reshape, I dd'd 
/dev/sd{b,d,e,f}8 to spare partitions as a backup.  It may be relevant 
that there are two copies of the metadata for md8's devices?)

Thanks very much,
  Flynn

--
The trick is to keep breathing.              (Garbage, from _Version 2.0_)
