* [linux-lvm] progress, but... - re. fixing LVM/md snafu
@ 2009-04-05 17:05 ` Miles Fidelman
  0 siblings, 0 replies; 6+ messages in thread
From: Miles Fidelman @ 2009-04-05 17:05 UTC (permalink / raw)
  To: debian-user, linux-raid, linux-lvm

Hello again Folks,

So.. I'm getting closer to fixing this messed up machine.

Where things stand:

I have root defined as an LVM2 LV, that should use /dev/md2 as its PV.
/dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3 and 
/dev/sdc3

Instead, LVM is reporting: "Found duplicate PV 
2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
and the /dev/md2 is reporting itself as inactive (cat /proc/mdstat) and 
active,degraded (mdadm --detail)

---
I'm guessing that, during boot:

- the raid array failed to start
- LVM found both copies of the PV, and picked one (/dev/sdb3)
- everything then came up and my server is humming away

but: the md array can't rebuild because the most current device in it is 
already in use

so...  I'm looking for the right sequence of events, with the minimum 
downtime to:

1. stop changes to /dev/sdb3 (actually, to / - which complicates things)
2. rebuild the RAID1 array, making sure to use /dev/sdb3 as the starting 
point for current data
3. restart in such a way that LVM finds /dev/md2 as the right PV
instead of one of its components

Each of these is just tricky enough that I'm sure there are lots of 
gotchas to watch out for.

So.. any suggestions?

Thanks very much,

Miles Fidelman

* Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu
  2009-04-05 17:05 ` Miles Fidelman
  (?)
@ 2009-04-05 18:32 ` Jayson Vantuyl
  2009-04-05 21:12   ` Miles Fidelman
  2009-04-06 14:17   ` Miles Fidelman
  -1 siblings, 2 replies; 6+ messages in thread
From: Jayson Vantuyl @ 2009-04-05 18:32 UTC (permalink / raw)
  To: LVM general discussion and development


Miles,

It seems like what's probably happened is that LVM detected the raw  
device instead of the MD device at some point early in the boot  
process.  This may be because the MD detection happened after LVM  
setup.  I'm unsure if it's possible for LVM to "steal" the device from  
MD.

Depending on your distribution, this may require different things to  
fix.  Stop worrying about downtime.  If the data is important, just  
don't worry about downtime.  If downtime is really important, build a  
second machine, get it working right, and transfer the data.  Being in  
a hurry and attempting to "optimize" the recovery process is a really  
good way to lose the data.

Assuming that you're going to try to fix this setup, I'd start out  
with a backup.  This is critical.  Everybody always says to do a  
backup.  Nobody ever does it.  Really, do one.  Get an S3 account, use  
an S3 backup utility.  There's just not an excuse these days.  Your  
data is one-MD-mistake away from oblivion.

So, right now MD should have sda/sdb but only has sda.  sdb is now  
newer than sda and may have important data if this server stores  
anything like that.  The challenge is that, according to MD, sda is  
newer.  Since MD isn't handling writes to sdb, it won't be updating  
its metadata to know that it's newer.  There are three options that I
can think of, none of them pretty.  Pick one of:

1.  Destroy the MD.  Create a new one with the same UUID and sdb3 as  
the source. (which you listed, the UUID part can trip you up)
2.  Sync the updated data from sdb3 onto md2.  Wipe sdb3.  Add it back  
into md2. (might be less downtime depending on data size, doesn't nuke  
MD)
3.  Build another machine.  Get it working right.  Transfer data with  
Rsync. (least downtime, most expensive)

In the first two cases, this only sets you up for it to break again.   
The core problem is figuring out what happened during boot.  In a  
perfect world, you would just tell LVM to only consider MD devices.   
That's not hard, but it's complicated by the fact that you have LVM  
on /.  This means that the configuration that's used is likely not the  
version on / but a copy of it that is made when you set up your boot  
ramdisk (a.k.a. initrd, or possibly an initramfs).  Even if we get LVM  
locked down to use just MDs and get that config used at boot time,
there's the possibility that the MD won't get assembled (since it  
already may not have been when LVM was first activated) and the system  
won't boot.  Again, fraught with peril.

If you want to fix the MD, first steps will be using a rescue LiveCD  
to boot up and do all of this.  With that LiveCD, you can also adjust  
the LVM configuration and update the initrd (or whatever is used for  
boot).  You may need to chroot into the system and/or trick the initrd  
into seeing the right devices.  I don't really think I can walk you  
through this via an e-mail.

The LVM part is pretty easy.  Just set a filter line (you only get
one, so disable any other filter lines) in
<root of system>/etc/lvm/lvm.conf to:

> filter = [ "a|^/dev/md.*$|", "r/.*/" ]


That will prevent you from using anything but the MD.
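
As a rough illustration of what that filter does, here is a minimal shell
sketch that mimics its first-match logic on individual path names.  The
helper name lvm_filter_verdict and the sample paths are made up for the
example, and it only approximates LVM's pattern matching with grep -E; how
LVM treats a device that is reachable under several path names is not
modeled here.

```shell
#!/bin/sh
# Hypothetical helper (not part of LVM): report whether a single device
# path would be accepted by the filter
#     [ "a|^/dev/md.*$|", "r/.*/" ]
# Patterns are tried in order and the first match decides.
lvm_filter_verdict() {
    # $1 = device path
    if printf '%s\n' "$1" | grep -Eq '^/dev/md.*$'; then
        echo accept   # matched the "a|^/dev/md.*$|" pattern
    else
        echo reject   # fell through to the catch-all "r/.*/" pattern
    fi
}

# Illustrative paths only; substitute the ones on your own system.
for dev in /dev/md2 /dev/sda3 /dev/sdb3 /dev/mapper/example-lv; do
    printf '%-28s %s\n' "$dev" "$(lvm_filter_verdict "$dev")"
done
```

With this filter only the /dev/md* names come back as "accept", which is
the point: the raw partitions never get scanned for PV labels.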

How you update the initrd with this information depends on distro (and
distro version).  It's usually either some invocation of "mkinitrd"
or some script that wraps it.  It will get the LVM configuration  
available at boot-time.  This *MIGHT* sort out the MD problem.  It  
might not.  If it doesn't, I'm not sure where to tell you to start.   
If mdadm is being used by your initrd, you'll need to tweak its  
configuration.  If it's relying on MD autodetection, you might have  
turned that off in your kernel.  If you have an IDE controller that  
takes too long to initialize, that can also cause this sort of thing  
(although that's REALLY unlikely these days).
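
The LiveCD steps described above might look roughly like the following,
shown as a dry run that only prints the commands.  The device name
/dev/mapper/VG-LV, the mount point /mnt/root, and the use of
update-initramfs are all assumptions rather than details of this system;
older Debian releases ship mkinitrd instead, and a separate /boot
partition would need to be mounted as well.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after checking every path against your own system.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = root LV device (hypothetical name)
# $2 = mount point on the rescue system (hypothetical path)
rebuild_initrd_in_chroot() {
    root_dev=$1
    mnt=$2
    run vgchange -a y                 # activate volume groups (MD must be up first)
    run mount "$root_dev" "$mnt"
    run mount --bind /dev "$mnt/dev"
    run mount --bind /proc "$mnt/proc"
    run mount --bind /sys "$mnt/sys"
    # Assumption: initramfs-tools is installed inside the chroot.
    run chroot "$mnt" update-initramfs -u
}

rebuild_initrd_in_chroot /dev/mapper/VG-LV /mnt/root
```

Printing first and executing later is deliberate: it lets you read the
exact command sequence before anything touches the disks.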

I hope that some of this helps.  Although, it will be hard for anyone  
to give you really solid advice without a little more insight into why  
the MD isn't getting assembled prior to LVM's scan.

-- 
Jayson Vantuyl
Founder and Architect
Engine Yard
jvantuyl@engineyard.com
1 866 518 9275 ext 204
IRC (freenode): kagato



* Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu
  2009-04-05 18:32 ` [linux-lvm] " Jayson Vantuyl
@ 2009-04-05 21:12   ` Miles Fidelman
  2009-04-06 14:17   ` Miles Fidelman
  1 sibling, 0 replies; 6+ messages in thread
From: Miles Fidelman @ 2009-04-05 21:12 UTC (permalink / raw)
  To: LVM general discussion and development

Jayson,

This is VERY helpful. Thanks!

Miles


-- 
Miles R. Fidelman, Director of Government Programs
Traverse Technologies 
145 Tremont Street, 3rd Floor
Boston, MA  02111
mfidelman@traversetechnologies.com
857-362-8314
www.traversetechnologies.com


* Re: progress, but... - re. fixing LVM/md snafu
  2009-04-05 17:05 ` Miles Fidelman
  (?)
  (?)
@ 2009-04-05 21:44 ` Goswin von Brederlow
  -1 siblings, 0 replies; 6+ messages in thread
From: Goswin von Brederlow @ 2009-04-05 21:44 UTC (permalink / raw)
  To: Miles Fidelman; +Cc: debian-user, linux-raid, linux-lvm

Miles Fidelman <mfidelman@traversetechnologies.com> writes:

> Hello again Folks,
>
> So.. I'm getting closer to fixing this messed up machine.
>
> Where things stand:
>
> I have root defined as an LVM2 LV, that should use /dev/md2 as it's PV.
> /dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3 and
> /dev/sdc3
>
> Instead, LVM is reporting: "Found duplicate PV
> 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
> and the /dev/md2 is reporting itself as inactive (cat /proc/mdstat)
> and active,degraded (mdadm --detail)

So either you didn't tell lvm.conf to ignore RAID component devices, or
the detection fails. Worst case, exclude sd?3 manually.

After that a reboot should fix it.

Regards,
        Goswin


* Re: [linux-lvm] progress, but... - re. fixing LVM/md snafu
  2009-04-05 18:32 ` [linux-lvm] " Jayson Vantuyl
  2009-04-05 21:12   ` Miles Fidelman
@ 2009-04-06 14:17   ` Miles Fidelman
  1 sibling, 0 replies; 6+ messages in thread
From: Miles Fidelman @ 2009-04-06 14:17 UTC (permalink / raw)
  To: LVM general discussion and development

Hi Jayson,

Thanks for all the detailed information yesterday.  I've done some more 
digging into my system, and I wonder if you'd be willing to comment on 
what I found, and the recovery procedure I'm considering.

Quick summary of situation:
- machine comes up, but LVM builds / on top of /dev/sdb3 instead of 
/dev/md2 of which /dev/sdb3 is a part
- looks like md2 isn't starting, so I need to fix it (presumably 
offline, using a LiveCD), then reboot and get LVM to use the mirror device

What's confusing is that the raid isn't starting at boot time, yet it
shows a different status depending on which tool I use.  So first, I have
to get the raid working again and make sure it has the up-to-date data.

Here are some more details, broken into four sections: RAID, LVM, boot 
process, recovery procedure - the RAID section has a summary at the 
front, followed by details of command listings, the other sections are 
much shorter :-):

Comments on the recovery procedure, please!

---------- re. the RAID array --------
RE. the raid array:

summary:
- /proc/mdstat thinks the array is inactive, containing sdb3 and sdd3

- mdadm thinks it's active, degraded, also containing sdb3 and sdd3 
(mdadm -D /dev/md2)

- looking at superblocks, mdadm seems to think it's active, degraded 
(mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3)
-- containing sda3, only (mdadm -E /dev/sda3)
-- containing sda3, with sdb3 spare (mdadm -E /dev/sdb3)
-- containing sda3 and sdb3, with sdc3 spare (mdadm -E /dev/sdc3) - with 
the same Magic #, different UUID from above
-- no superblock on /dev/sdd3 (mdadm -E /dev/sdd3)

details:
more /proc/mdstat:
md2 : inactive sdd3[0] sdb3[2]
     195318016 blocks

<looking at RAID>
mdadm -D /dev/md2:
/dev/md2:
       Version : 00.90.01
 Creation Time : Thu Jul 20 06:15:18 2006
    Raid Level : raid1
   Device Size : 97659008 (93.13 GiB 100.00 GB)
  Raid Devices : 2
 Total Devices : 2
Preferred Minor : 2
   Persistence : Superblock is persistent

   Update Time : Fri Apr  3 10:06:41 2009
         State : active, degraded
Active Devices : 0
Working Devices : 2
Failed Devices : 0
 Spare Devices : 2

   Number   Major   Minor   RaidDevice State
      0       8       51        0      spare rebuilding   /dev/sdd3
      1       0        0        -      removed

      2       8       19        -      spare   /dev/sdb3

<looking at component devices>
server1:/etc/lvm# mdadm -E  /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
/dev/sda3:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 3a32acee:8a132ab9:545792a8:0df49d99
 Creation Time : Thu Jul 20 06:15:18 2006
    Raid Level : raid1
  Raid Devices : 2
 Total Devices : 1
Preferred Minor : 2

   Update Time : Fri Apr  3 22:40:39 2009
         State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
 Spare Devices : 0
      Checksum : 71d21f34 - correct
        Events : 0.114704240


     Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

  0     0       8        3        0      active sync   /dev/sda3
  1     1       0        0        1      faulty removed
/dev/sdb3:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 3a32acee:8a132ab9:545792a8:0df49d99
 Creation Time : Thu Jul 20 06:15:18 2006
    Raid Level : raid1
  Raid Devices : 2
 Total Devices : 2
Preferred Minor : 2

   Update Time : Fri Apr  3 10:06:41 2009
         State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
 Spare Devices : 1
      Checksum : 71d1d1fa - correct
        Events : 0.114716950


     Number   Major   Minor   RaidDevice State
this     2       8       19        2      spare   /dev/sdb3

  0     0       8        3        0      active sync   /dev/sda3
  1     1       0        0        1      faulty removed
  2     2       8       19        2      spare   /dev/sdb3
/dev/sdc3:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 635fb32e:6a83a5be:12735af4:74016e66
 Creation Time : Wed Jul  2 12:48:36 2008
    Raid Level : raid1
  Raid Devices : 2
 Total Devices : 3
Preferred Minor : 2

   Update Time : Fri Apr  3 06:42:50 2009
         State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
 Spare Devices : 1
      Checksum : 95973481 - correct
        Events : 0.26


     Number   Major   Minor   RaidDevice State
this     2       8       35        2      spare   /dev/sdc3

  0     0       8        3        0      active sync   /dev/sda3
  1     1       8       19        1      active sync   /dev/sdb3
  2     2       8       35        2      spare   /dev/sdc3
mdadm: No super block found on /dev/sdd3 (Expected magic a92b4efc, got 
00000000)

<looking at devices with --scan>
server1:/etc/lvm# mdadm -E  --scan /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
ARRAY /dev/md2 level=raid1 num-devices=2 
UUID=635fb32e:6a83a5be:12735af4:74016e66
  devices=/dev/sdc3
ARRAY /dev/md2 level=raid1 num-devices=2 
UUID=3a32acee:8a132ab9:545792a8:0df49d99
  devices=/dev/sda3,/dev/sdb3
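
To compare the members side by side, a small awk filter over the
"mdadm -E" output can pull out just the event counter and update time for
each device.  This is only a sketch: the function name
summarize_superblocks is made up, and it assumes the 0.90-style layout
shown in the listing above.  In that listing, sdb3 carries the higher
event count even though sda3 has the later update time.

```shell
#!/bin/sh
# Read `mdadm -E` output on stdin and print one line per member:
#     <device> <events> <update time>
summarize_superblocks() {
    awk '
        /^\/dev\/[^ ]+:$/ { dev = $1; sub(/:$/, "", dev) }          # "/dev/sda3:" header
        /Update Time :/   { sub(/^.*Update Time : /, ""); when = $0 }
        /Events :/        { print dev, $3, when }
    '
}

# Demo on values copied from the listing above.
summarize_superblocks <<'EOF'
/dev/sda3:
    Update Time : Fri Apr  3 22:40:39 2009
         Events : 0.114704240
/dev/sdb3:
    Update Time : Fri Apr  3 10:06:41 2009
         Events : 0.114716950
EOF
```

On a live system the same function could be fed directly, for example
with "mdadm -E /dev/sda3 /dev/sdb3 | summarize_superblocks".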

-------- re. LVM ---------

/etc/lvm/lvm.conf contains the line:
md_component_detection = 0

I expect that if I set it to 1 that would tell LVM to look for RAIDs first.

Also, /etc/lvm/backup/rootvolume contains:
pv0 {
           id = "2ppSS2-q0kO-3t0t-uf8t-6S19-qY3y-pWBOxF"
           device = "/dev/md2"    # Hint only

which suggests that if the RAID is running, lvm will do the right thing
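
For checking the setting without opening an editor, a short sed one-liner
is enough.  This is a sketch under assumptions: the function name is
invented, and the path /etc/lvm/lvm.conf is the usual location but may
differ.  As far as I understand the lvm.conf documentation,
md_component_detection = 1 makes LVM skip devices that carry an md
superblock, rather than making it look for RAIDs first.

```shell
#!/bin/sh
# Print the active value of md_component_detection from an lvm.conf-style
# file.  Commented-out lines are ignored; if the option appears more than
# once, the last occurrence is reported.
# $1 = path to the config file
md_component_detection_value() {
    sed -n 's/^[[:space:]]*md_component_detection[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" |
        tail -n 1
}

# Only query the real file if it is present and readable.
if [ -r /etc/lvm/lvm.conf ]; then
    md_component_detection_value /etc/lvm/lvm.conf
fi
```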

---------- re. boot process ------------
looks like detailed events are:

- MBR loads grub

- grub knows about md and lvm, mounts read-only
-- kernel        /vmlinuz-2.6.8-3-686 root=/dev/mapper/rootvolume-rootlv 
ro mem=4

- during main boot md comes up first, then lvm
-- from rcS.d/S25mdadm-raid: if not already running ... mdadm -A -s -a
---- I'm guessing this fails for /dev/md2

-- from rcS.d/S26lvm:
-- creates lvm device
-- creates dm device
-- does a vgscan
---- which is where this happens:
 Found duplicate PV 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 
not /dev/sda3
 Found volume group "backupvolume" using metadata type lvm2
 Found volume group "rootvolume" using metadata type lvm2
-- does a vgchange -a y
---- which looks like it's picking up on sdb3

--  I'm guessing that if the mirror were active, and based on /dev/sdb3 
- lvm would pick that up as the volume group
** is this where setting md_component_detection = 1 would be helpful?

------------ recovery procedure ------------

here's what I'm thinking of doing - comments please!

1. turn logging on in lvm.conf, reboot, examine logs to confirm above 
guesses (or find out what's really happening)
-- based on the logging, maybe set md_component_detection = 1 in lvm.conf

2. shutdown, boot from LiveCD (I'm using systemrescuecd - great tool by 
the way)

3. backup /dev/sdb3 using partimage (just in case!)

4. try to fix /dev/md2

if it's not running - start it, with only /dev/sdb3; then add in other 
devices
-  A /dev/md2 --add /dev/sdb3 --run  (**is this the right way to do 
this?**)
- add each device back (mdadm -a /dev/sda3; mdadm -a /dev/sdb3; mdadm -a 
/dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2

if it's running:
- fail all except /dev/sdb3 (mdadm -f /dev/sda3; mdadm -f /dev/sdb3; 
mdadm -f /dev/sdd3)
- remove all except /dev/sdb3 (mdadm -r /dev/sda3; mdadm -r /dev/sdb3; 
mdadm -r /dev/sdd3)
- add each device back (mdadm -a /dev/sda3; mdadm -a /dev/sdb3; mdadm -a 
/dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
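
One detail on syntax: mdadm's manage-mode options (--fail, --remove,
--add) take the array as the first argument, followed by the member
device.  The sketch below spells the "if it's running" steps out that way
as a dry run that only prints commands.  The function names are invented,
and the choice to keep /dev/sdb3 and cycle /dev/sda3 and /dev/sdd3 simply
follows the list above; it has not been tested against this array.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Fail and remove each listed member; members not listed are left alone.
# $1 = array, remaining arguments = members to drop
drop_members() {
    array=$1
    shift
    for member in "$@"; do
        run mdadm "$array" --fail "$member"
        run mdadm "$array" --remove "$member"
    done
}

# Add members back so they resync from whatever is still in the array.
# $1 = array, remaining arguments = members to add
add_members() {
    array=$1
    shift
    for member in "$@"; do
        run mdadm "$array" --add "$member"
    done
}

# /dev/sdb3 is deliberately absent from both lists so it stays in place.
drop_members /dev/md2 /dev/sda3 /dev/sdd3
add_members /dev/md2 /dev/sda3 /dev/sdd3
run mdadm --grow /dev/md2 --raid-devices=3
```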

question: do I need to update mdadm.conf?
question: do I need to do anything to get rid of the superblock containing
a different UUID?

5. reboot the system

- it may just come up

- if it comes up and lvm is still operating off a single partition, 
repeat the above, but first add a filter to lvm.conf (wash, rinse, 
repeat as necessary)

*** does this seem like a reasonable game plan? ***

Thanks again for  your help!

Miles
