From: Miles Fidelman <mfidelman@traversetechnologies.com>
To: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu
Date: Mon, 06 Apr 2009 10:17:58 -0400
Message-ID: <49DA0F16.7000007@traversetechnologies.com>
In-Reply-To: <49DD2D4E-9D47-47D1-BB70-C85DE4D9C9AB@engineyard.com>
Hi Jayson,
Thanks for all the detailed information yesterday. I've done some more
digging into my system, and I wonder if you'd be willing to comment on
what I found, and the recovery procedure I'm considering.
Quick summary of situation:
- machine comes up, but LVM builds / on top of /dev/sdb3 instead of
/dev/md2 of which /dev/sdb3 is a part
- looks like md2 isn't starting, so I need to fix it (presumably
offline, using a LiveCD), then reboot and get LVM to use the mirror device
What's confusing is that the RAID isn't starting at boot time, and
different tools report different status for it. So first, I have
to get the RAID working again and make sure it has the up-to-date data.
Here are some more details, broken into four sections: RAID, LVM, boot
process, and recovery procedure. The RAID section opens with a summary,
followed by the command listings; the other sections are much
shorter :-)
Comments on the recovery procedure, please!
---------- re. the RAID array --------
RE. the raid array:
summary:
- /proc/mdstat thinks the array is inactive, containing sdb3 and sdd3
- mdadm thinks it's active, degraded, also containing sdb3 and sdd3
(mdadm -D /dev/md2)
- looking at superblocks, mdadm seems to think it's active, degraded
(mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3)
-- containing sda3, only (mdadm -E /dev/sda3)
-- containing sda3, with sdb3 spare (mdadm -E /dev/sdb3)
-- containing sda3 and sdb3, with sdc3 spare (mdadm -E /dev/sdc3) - with
the same Magic #, different UUID from above
-- no superblock on /dev/sdd3 (mdadm -E /dev/sdd3)
details:
more /proc/mdstat:
md2 : inactive sdd3[0] sdb3[2]
195318016 blocks
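For reference, a minimal sketch of how this state could be flagged from a script: the function below reads /proc/mdstat-style text on stdin and prints the name of each array reported as inactive. The name inactive_arrays is made up for illustration.

```shell
# Print the name of every md array that /proc/mdstat reports as inactive.
# Reads mdstat-format text on stdin; inactive_arrays is a hypothetical helper name.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage on a live system:
#   inactive_arrays < /proc/mdstat
```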
<looking at RAID>
mdadm -D /dev/md2:
/dev/md2:
Version : 00.90.01
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Device Size : 97659008 (93.13 GiB 100.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Apr 3 10:06:41 2009
State : active, degraded
Active Devices : 0
Working Devices : 2
Failed Devices : 0
Spare Devices : 2
    Number   Major   Minor   RaidDevice State
       0       8       51        0      spare rebuilding   /dev/sdd3
       1       0        0        -      removed
       2       8       19        -      spare   /dev/sdb3
<looking at component devices>
server1:/etc/lvm# mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
/dev/sda3:
Magic : a92b4efc
Version : 00.90.00
UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Update Time : Fri Apr 3 22:40:39 2009
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Checksum : 71d21f34 - correct
Events : 0.114704240
      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       0        0        1      faulty removed
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Update Time : Fri Apr 3 10:06:41 2009
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Checksum : 71d1d1fa - correct
Events : 0.114716950
      Number   Major   Minor   RaidDevice State
this     2       8       19        2      spare   /dev/sdb3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       0        0        1      faulty removed
   2     2       8       19        2      spare   /dev/sdb3
/dev/sdc3:
Magic : a92b4efc
Version : 00.90.00
UUID : 635fb32e:6a83a5be:12735af4:74016e66
Creation Time : Wed Jul 2 12:48:36 2008
Raid Level : raid1
Raid Devices : 2
Total Devices : 3
Preferred Minor : 2
Update Time : Fri Apr 3 06:42:50 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 95973481 - correct
Events : 0.26
      Number   Major   Minor   RaidDevice State
this     2       8       35        2      spare   /dev/sdc3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      spare   /dev/sdc3
mdadm: No super block found on /dev/sdd3 (Expected magic a92b4efc, got
00000000)
<looking at devices with --scan>
server1:/etc/lvm# mdadm -E --scan /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=635fb32e:6a83a5be:12735af4:74016e66
devices=/dev/sdc3
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=3a32acee:8a132ab9:545792a8:0df49d99
devices=/dev/sda3,/dev/sdb3
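Since the superblocks disagree, it may help to line up the UUID and event counter of each component side by side. In the listings above, sdb3 carries the higher event counter (0.114716950 vs. 0.114704240 on sda3) even though sda3 has the later update time, so which member holds the newest data needs checking rather than assuming. A minimal sketch, where summarize_superblocks is a hypothetical helper that parses saved or piped mdadm -E output:

```shell
# Condense `mdadm -E` output to one line per component: device, UUID, events.
# Reads the mdadm -E text on stdin; summarize_superblocks is a made-up name.
summarize_superblocks() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }  # "/dev/sda3:" starts a record
        $1 == "UUID"    { uuid = $3 }                      # "UUID : <value>"
        $1 == "Events"  { print dev, uuid, $3 }            # "Events : <value>" ends it
    '
}

# Usage on a live system:
#   mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 | summarize_superblocks
```

Components with no superblock (like /dev/sdd3 here) produce no line at all, which is itself worth noting.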
-------- re. LVM ---------
/etc/lvm/lvm.conf contains the line:
md_component_detection = 0
I expect that setting it to 1 would tell LVM to skip partitions that are
md components and use the assembled RAID device instead.
Also, /etc/lvm/backup/rootvolume contains:
pv0 {
id = "2ppSS2-q0kO-3t0t-uf8t-6S19-qY3y-pWBOxF"
device = "/dev/md2" # Hint only
which suggests that if the RAID is running, lvm will do the right thing
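As documented for lvm.conf, md_component_detection = 1 makes LVM ignore devices that carry an md superblock, so the PV should then be picked up through /dev/md2 rather than a raw partition. A small sketch for confirming which value is actually in effect; md_detection_setting is a hypothetical helper name, and it simply reports the last uncommented assignment in the file:

```shell
# Print the md_component_detection value (0 or 1) from lvm.conf text on stdin.
# Commented-out lines are ignored; if the option appears twice, the last one wins.
# md_detection_setting is a made-up helper name for illustration.
md_detection_setting() {
    sed -n 's/^[[:space:]]*md_component_detection[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' |
        tail -n 1
}

# Usage on a live system:
#   md_detection_setting < /etc/lvm/lvm.conf
```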
---------- re. boot process ------------
looks like detailed events are:
- MBR loads grub
- grub knows about md and lvm, mounts read-only
-- kernel /vmlinuz-2.6.8-3-686 root=/dev/mapper/rootvolume-rootlv
ro mem=4
- during main boot md comes up first, then lvm
-- from rcS.d/S25mdadm-raid: if not already running ... mdadm -A -s -a
---- I'm guessing this fails for /dev/md2
-- from rcS.d/S26lvm:
-- creates lvm device
-- creates dm device
-- does a vgscan
---- which is where this happens:
Found duplicate PV 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3
not /dev/sda3
Found volume group "backupvolume" using metadata type lvm2
Found volume group "rootvolume" using metadata type lvm2
-- does a vgchange -a y
---- which looks like it's picking up on sdb3
-- I'm guessing that if the mirror were active, and based on /dev/sdb3
- lvm would pick that up as the volume group
** is this where setting md_component_detection = 1 would be helpful?
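The duplicate-PV message is the clearest sign that LVM is seeing the raw partitions. A minimal sketch for pulling those messages out of saved vgscan output, assuming each message sits on one line (it is only wrapped in this mail); duplicate_pvs is a hypothetical helper name:

```shell
# Extract "PV-UUID chosen-device ignored-device" from vgscan duplicate-PV messages.
# Reads vgscan output on stdin; duplicate_pvs is a made-up name for illustration.
duplicate_pvs() {
    sed -n 's/.*Found duplicate PV \([^:]*\): using \([^ ]*\) not \([^ ]*\).*/\1 \2 \3/p'
}

# Usage on a live system:
#   vgscan 2>&1 | duplicate_pvs
```

If the chosen device is a /dev/sd* partition rather than /dev/md2, LVM is still running on a single mirror half.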
------------ recovery procedure ------------
here's what I'm thinking of doing - comments please!
1. turn logging on in lvm.conf, reboot, examine logs to confirm above
guesses (or find out what's really happening)
-- based on the logging, maybe set md_component_detection = 1 in lvm.conf
2. shutdown, boot from LiveCD (I'm using systemrescuecd - great tool by
the way)
3. backup /dev/sdb3 using partimage (just in case!)
4. try to fix /dev/md2
if it's not running - start it, with only /dev/sdb3; then add in other
devices
- mdadm -A /dev/md2 --add /dev/sdb3 --run (**is this the right way to do
this?**)
- add each other device back (mdadm /dev/md2 -a /dev/sda3; mdadm /dev/md2
-a /dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
if it's running:
- fail all except /dev/sdb3 (mdadm /dev/md2 -f /dev/sda3; mdadm /dev/md2
-f /dev/sdd3)
- remove all except /dev/sdb3 (mdadm /dev/md2 -r /dev/sda3; mdadm /dev/md2
-r /dev/sdd3)
- add each device back (mdadm /dev/md2 -a /dev/sda3; mdadm /dev/md2
-a /dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
question: do I need to update mdadm.conf?
question: do I need to do anything to get rid of the superblock containing
a different UUID?
5. reboot the system
- it may just come up
- if it comes up and lvm is still operating off a single partition,
repeat the above, but first add a filter to lvm.conf (wash, rinse,
repeat as necessary)
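The "not running" branch of step 4 could be rehearsed as a dry run before touching the disks. This is only a sketch under stated assumptions, not a verified procedure: it assumes /dev/sdb3 holds the good copy and that /dev/sda3 and /dev/sdd3 are the devices to re-add, neither of which is confirmed above. The mdadm options used (--stop, --assemble --run, --add, --grow --raid-devices) are standard; the names run, recover_md2 and RUN are made up for illustration. Note also that sdb3's superblock lists it as a spare, so mdadm may refuse to start the array from it alone.

```shell
# Dry-run sketch of step 4, "not running" branch. Commands are only printed
# unless RUN=1 is set in the environment.
# ASSUMPTIONS: the first argument is the member with good data; the remaining
# arguments are members to re-add and resync from it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

recover_md2() {
    good=$1; shift
    run mdadm --stop /dev/md2                      # clear the inactive array
    run mdadm --assemble --run /dev/md2 "$good"    # start degraded on one member
    for dev in "$@"; do
        run mdadm /dev/md2 --add "$dev"            # re-add a member for resync
    done
    run mdadm --grow /dev/md2 --raid-devices=3     # three active devices, as planned
}

recover_md2 /dev/sdb3 /dev/sda3 /dev/sdd3
```

Printing first makes it easy to review the exact command order, and to take the partimage backup from step 3, before setting RUN=1.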
*** does this seem like a reasonable game plan? ***
Thanks again for your help!
Miles