From: Miles Fidelman <mfidelman@traversetechnologies.com>
To: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu
Date: Sun, 05 Apr 2009 17:12:10 -0400
Message-ID: <49D91EAA.4080508@traversetechnologies.com>
In-Reply-To: <49DD2D4E-9D47-47D1-BB70-C85DE4D9C9AB@engineyard.com>
Jayson,
This is VERY helpful. Thanks!
Miles
Jayson Vantuyl wrote:
> Miles,
>
> It seems like what's probably happened is that LVM detected the raw
> device instead of the MD device at some point early in the boot
> process. This may be because the MD detection happened after LVM
> setup. I'm unsure if it's possible for LVM to "steal" the device from MD.
>
> Depending on your distribution, this may require different things to
> fix. If the data is important, stop worrying about downtime. If
> downtime is really important, build a second machine, get it working
> right, and transfer the data. Being in a hurry and attempting to
> "optimize" the recovery process is a really good way to lose the data.
>
> Assuming that you're going to try to fix this setup, I'd start out
> with a backup. This is critical. Everybody always says to do a backup.
> Nobody ever does it. Really, do one. Get an S3 account, use an S3
> backup utility. There's just not an excuse these days. Your data is
> one-MD-mistake away from oblivion.
>
> So, right now MD should have sda/sdb but only has sda. sdb is now
> newer than sda and may have important data if this server stores
> anything like that. The challenge is that, according to MD, sda is
> newer. Since MD isn't handling writes to sdb, it won't be updating its
> metadata to know that it's newer. There are three options that I can
> think of, the first two of them ugly. Pick one of:
>
> 1. Destroy the MD. Create a new one with the same UUID and sdb3 as the
> source. (which you listed, the UUID part can trip you up)
> 2. Sync the updated data from sdb3 onto md2. Wipe sdb3. Add it back
> into md2. (might be less downtime depending on data size, doesn't nuke MD)
> 3. Build another machine. Get it working right. Transfer data with
> Rsync. (least downtime, most expensive)
>
> In the first two cases, this only sets you up for it to break again.
> The core problem is figuring out what happened during boot. In a
> perfect world, you would just tell LVM to only consider MD devices.
> That's not hard, but it's complicated by the fact that you have LVM on
> /. This means that the configuration that's used is likely not the
> version on / but a copy of it that is made when you set up your boot
> ramdisk (a.k.a. initrd, or possibly an initramfs). Even if we get LVM
> locked down to use just MDs and get that config used at boot time,
> there's the possibility that the MD won't get assembled (since it
> already may not have been when LVM was first activated) and the system
> won't boot. Again, fraught with peril.
>
> If you want to fix the MD, first steps will be using a rescue LiveCD
> to boot up and do all of this. With that LiveCD, you can also adjust
> the LVM configuration and update the initrd (or whatever is used for
> boot). You may need to chroot into the system and/or trick the initrd
> into seeing the right devices. I don't really think I can walk you
> through this via an e-mail.
>
> The LVM part is pretty easy. Just set a filter line (you only get one,
> so disable any other filter lines) in <root of system>/etc/lvm/lvm.conf to:
>
>> filter = [ "a|^/dev/md.*$|", "r/.*/" ]
>
> That will prevent you from using anything but the MD.
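A rough sketch of how the filter line above is evaluated, as I understand the lvm.conf rules: each pattern is "a" (accept) or "r" (reject) followed by a regex wrapped in a delimiter character, the first pattern that matches a device path decides, and a device that matches nothing is accepted. The helper names parse_filter and device_allowed are made up for illustration, and Python's re module only approximates LVM's own regex matching.

```python
import re


def parse_filter(patterns):
    """Split each lvm.conf-style pattern into (accept?, compiled regex)."""
    rules = []
    for p in patterns:
        # First character is the action, second is the delimiter,
        # and the pattern must end with that same delimiter.
        action, delim = p[0], p[1]
        if action not in "ar" or len(p) < 3 or p[-1] != delim:
            raise ValueError(f"malformed filter pattern: {p!r}")
        rules.append((action == "a", re.compile(p[2:-1])))
    return rules


def device_allowed(rules, path):
    """First matching rule decides; a device matching no rule is accepted."""
    for accept, regex in rules:
        if regex.search(path):
            return accept
    return True


# The filter from the message, applied to the devices named in this thread.
rules = parse_filter(["a|^/dev/md.*$|", "r/.*/"])
for dev in ["/dev/md2", "/dev/sda3", "/dev/sdb3"]:
    print(dev, "accepted" if device_allowed(rules, dev) else "rejected")
```

Under these assumptions /dev/md2 is accepted while /dev/sda3 and /dev/sdb3 are rejected, so the raw RAID members would never be considered as PVs.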
>
> How you update the initrd with this information depends on distro (and
> distro version). It's usually either some invocation of "mkinitrd" or
> some script that wraps it. It will get the LVM configuration available
> at boot-time. This *MIGHT* sort out the MD problem. It might not. If
> it doesn't, I'm not sure where to tell you to start. If mdadm is being
> used by your initrd, you'll need to tweak its configuration. If it's
> relying on MD autodetection, you might have turned that off in your
> kernel. If you have an IDE controller that takes too long to
> initialize, that can also cause this sort of thing (although that's
> REALLY unlikely these days).
>
> I hope that some of this helps. Although, it will be hard for anyone
> to give you really solid advice without a little more insight into why
> the MD isn't getting assembled prior to LVM's scan.
>
> On Apr 5, 2009, at 10:05 AM, Miles Fidelman wrote:
>
>> Hello again Folks,
>>
>> So.. I'm getting closer to fixing this messed up machine.
>>
>> Where things stand:
>>
>> I have root defined as an LVM2 LV, that should use /dev/md2 as its PV.
>> /dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3 and
>> /dev/sdc3
>>
>> Instead, LVM is reporting: "Found duplicate PV
>> 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
>> and the /dev/md2 is reporting itself as inactive (cat /proc/mdstat)
>> and active,degraded (mdadm --detail)
>>
>> ---
>> I'm guessing that, during boot:
>>
>> - the raid array failed to start
>> - LVM found both copies of the PV, and picked one (/dev/sdb3)
>> - everything then came up and my server is humming away
>>
>> but: the md array can't rebuild because the most current device in it
>> is already in use
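The duplicate-PV message quoted above fits that guess. If the RAID superblock sits at the end of each member (true of the older 0.90 metadata format; which format this array uses is an assumption here), then every RAID1 member begins with the same LVM label, and a scan that runs while md2 is not assembled sees one PV UUID on several devices. A minimal sketch of that check, with a hypothetical duplicate_pvs helper and a hand-written scan table built from the device names and UUID in the message:

```python
from collections import defaultdict


def duplicate_pvs(scan):
    """Map each PV UUID seen on more than one device to those devices."""
    by_uuid = defaultdict(list)
    for device, uuid in scan.items():
        by_uuid[uuid].append(device)
    return {u: sorted(devs) for u, devs in by_uuid.items() if len(devs) > 1}


# Hypothetical scan result: what a device scan might report when the
# RAID1 members are visible on their own and md2 is not assembled.
scan = {
    "/dev/sda3": "2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF",
    "/dev/sdb3": "2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF",
}

for uuid, devices in duplicate_pvs(scan).items():
    print(f"duplicate PV {uuid}: {', '.join(devices)}")
```

Once only /dev/md2 is visible to the scan, each UUID maps to a single device and the ambiguity goes away.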
>>
>> so... I'm looking for the right sequence of events, with the minimum
>> downtime to:
>>
>> 1. stop changes to /dev/sdb3 (actually, to / - which complicates things)
>> 2. rebuild the RAID1 array, making sure to use /dev/sdb3 as the
>> starting point for current data
>> 3. restart in such a way that LVM finds /dev/md2 as the right PV
>> instead of one of its components
>>
>> Each of these is just tricky enough that I'm sure there are lots of
>> gotchas to watch out for.
>>
>> So.. any suggestions?
>>
>> Thanks very much,
>>
>> Miles Fidelman
>>
>>
>>
>>
>> _______________________________________________
>> linux-lvm mailing list
>> linux-lvm@redhat.com <mailto:linux-lvm@redhat.com>
>> https://www.redhat.com/mailman/listinfo/linux-lvm
>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
> --
> Jayson Vantuyl
> Founder and Architect
> *Engine Yard <http://www.engineyard.com>*
> jvantuyl@engineyard.com <mailto:jvantuyl@engineyard.com>
> 1 866 518 9275 ext 204
> IRC (freenode): kagato
>
--
Miles R. Fidelman, Director of Government Programs
Traverse Technologies
145 Tremont Street, 3rd Floor
Boston, MA 02111
mfidelman@traversetechnologies.com
857-362-8314
www.traversetechnologies.com