linux-lvm.redhat.com archive mirror
* [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
@ 2012-02-19  2:27 Spelic
  2012-02-19 10:00 ` [linux-lvm] [dm-devel] " Zdenek Kabelac
  2012-02-20 13:51 ` [linux-lvm] " Mike Snitzer
  0 siblings, 2 replies; 7+ messages in thread
From: Spelic @ 2012-02-19  2:27 UTC
  To: device-mapper development, linux-lvm

Hello lists,

Do you have any information about a bug in Linux v3.0.3 where an LVM 
snapshot makes a mess at a (clean!) reboot?

Symptoms: these messages appear at boot:
     [   15.668799] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
     [   15.668934] device-mapper: ioctl: error adding target to table
     [   19.388627] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
     [   19.388786] device-mapper: ioctl: error adding target to table


and then the origin volume and the snapshot come up inactive:
         lvVM_TP1_d1 vgVM   owc-i- 500.00g
         ...
         tp1d1-snap1 vgVM   swi-i- 600.00g lvVM_TP1_d1 100.00      (*)
(other volumes without a snapshot are active and working)

(*) Please note that the snapshot usage shown is WRONG: it should be 
4.56%, not 100%.

At this point I did:

# lvchange --refresh vgVM/tp1d1-snap1
Couldn't find snapshot origin uuid LVM-WUPTe8bqp25OSeRsFcLpC228A6U0r84T22tfFj4EkWbuB6pP5UDTA7nVRfGSCZW7-real.
# lvs
... *everything hangs* ..!!

It hangs in DM code (too bad I lost the stack trace, sorry).
I think the ssh session was stuck in uninterruptible sleep. There was no 
kernel panic, and I could log in again; however, the DM devices were 
hung so badly that AFAIR I had to force a reboot without syncing, or the 
shutdown process would not complete.


After the reboot the lvs output is unchanged: the two LVM devices 
(origin and snapshot) are still inactive.

This time I try a refresh on the *origin*:

# lvchange --refresh vgVM/lvVM_TP1_d1
(no output)
#

and magically everything starts working!
I can run lvs, dmsetup table is fully populated, etc.
The snapshot usage shown by lvs is back to the correct value, 4.56%.

Then I reboot (cleanly!) again to check that the problem is solved now...
Surprise!! The problems are back. The two devices, origin and snapshot, 
are inactive again.

This time I think I have learned the lesson, and I refresh *the origin* again
(I am SURE I used the origin, I triple-checked that; I gave *exactly* 
the same command as the previous time)

# lvchange --refresh vgVM/lvVM_TP1_d1

Surprise!! Everything hangs!!

Like before, there is no kernel panic, but the ssh session hangs and DM is 
unresponsive, so I had to force a reboot without sync or it would not 
complete.


After the reboot the devices are inactive again.

At this point I am really fed up with LVM snapshots and I fear for our 
data, so I removed the snapshot with lvremove (I don't remember whether I 
had to run lvchange --refresh on the origin before lvremove).

As soon as I removed the snapshot, everything started working flawlessly.


I am very worried about this bug...
We need snapshots at work to perform live backups, but in this 
situation I don't know whether I am risking more by using snapshots or by 
not performing backups.
Do you have any information on this bug, e.g. has it been fixed since 
3.0.3?

Thank you
Sp

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] [dm-devel] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-19  2:27 [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3 Spelic
@ 2012-02-19 10:00 ` Zdenek Kabelac
  2012-02-20 13:51 ` [linux-lvm] " Mike Snitzer
  1 sibling, 0 replies; 7+ messages in thread
From: Zdenek Kabelac @ 2012-02-19 10:00 UTC
  Cc: linux-lvm, Spelic

On 19.2.2012 03:27, Spelic wrote:
> Hello lists,
> 
> Do you have any information about a bug in linux v3.0.3, of LVM snapshot
> making a mess at (clean!) reboot?

Well, unfortunately your report is missing a lot of basic information.

So please open a proper lvm2 bug at bugzilla.redhat.com.
If you are not a Fedora user, just pick Fedora as the system.

We need to know your OS name and version,
and the version of the lvm2 tools package.

Preferably also a metadata backup (vgcfgbackup),

and a '-vvvv' trace from the hanging command.
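A minimal sketch of collecting the items above in one go, assuming it is run as root on the affected host. The function name collect_lvm_report, the output file names and the default VG/LV names (taken from this thread) are illustrative assumptions, not part of lvm2:

```shell
# Hypothetical helper (not an lvm2 tool): gathers the information requested
# for the bug report into one directory. Assumes root privileges.
collect_lvm_report() {
    outdir=$1
    vg=${2:-vgVM}          # default VG/LV names are the ones from this thread
    lv=${3:-lvVM_TP1_d1}
    mkdir -p "$outdir" || return 1

    # OS name/version and the running kernel
    uname -a > "$outdir/uname.txt"
    for f in /etc/lsb-release /etc/os-release; do
        if [ -r "$f" ]; then cat "$f" >> "$outdir/os-release.txt"; fi
    done

    # lvm2 tools and device-mapper library versions
    lvm version > "$outdir/lvm-version.txt" 2>&1 || true

    # Metadata backup of the affected VG, written to a file of our choice
    vgcfgbackup -f "$outdir/$vg.vgcfg" "$vg" > "$outdir/vgcfgbackup.log" 2>&1 || true

    # Verbose trace of the activation command; -vvvv output goes to stderr
    lvchange -ay -vvvv "$vg/$lv" > "$outdir/lvchange-vvvv.log" 2>&1 || true
}
```

Usage would be e.g. collect_lvm_report /root/lvm-report, then attaching the resulting files to the bug. If the activation hangs as described in this thread, the lvchange step hangs too, but its log should already contain the trace written up to that point.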

> 
> Symptoms are: message at boot:
>     [   15.668799] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
>     [   15.668934] device-mapper: ioctl: error adding target to table
>     [   19.388627] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
>     [   19.388786] device-mapper: ioctl: error adding target to table
> 

I guess your activation command (vgchange -ay) should be giving you some error
messages; run it with -vvvv and attach the output to the bugzilla.

(Just lvchange -ay -vvvv vgVM/lvVM_TP1_d1)

> and then the volume origin and snapshot come out inactive
>         lvVM_TP1_d1 vgVM   owc-i- 500.00g
>         ...
>         tp1d1-snap1 vgVM   swi-i- 600.00g lvVM_TP1_d1 100.00      (*)
> (other volumes not having snapshot are active and working)
> 
> (*) please note the size occupied in the snapshot is WRONG, it should be 4.56%
> and not 100%.
> 
> At this point I did:
> 
> # lvchange --refresh vgVM/tp1d1-snap1
> Couldn't find snapshot origin uuid LVM-WUPTe8bqp25OSeRsFcLpC228A6U0r84T22tfFj4EkWbuB6pP5UDTA7nVRfGSCZW7-real.
> # lvs
> ... *everything hangs* ..!!

Note: you cannot manipulate individual snapshots. With old (non-thin)
snapshots you always work with all the snapshots and their origin at the same time.
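As a sketch of what this means for scripts, assuming the lvs 'origin' report field (empty for LVs that are not snapshots), a hypothetical helper can map a snapshot name to its origin so that LV-level commands such as lvchange --refresh are issued against the origin. It only encodes the note above; it does not work around the hang reported in this thread:

```shell
# Hypothetical helper: print the VG/LV that commands should target.
# For an old-style (non-thin) snapshot that is its origin, since the origin
# and all of its snapshots are handled together; otherwise the LV itself.
refresh_target() {
    vg=$1
    lv=$2
    # --noheadings drops the column title; strip the padding lvs adds
    origin=$(lvs --noheadings -o origin "$vg/$lv" 2>/dev/null | tr -d '[:space:]')
    if [ -n "$origin" ]; then
        echo "$vg/$origin"
    else
        echo "$vg/$lv"
    fi
}
```

For example, lvchange --refresh "$(refresh_target vgVM tp1d1-snap1)" would target vgVM/lvVM_TP1_d1 rather than the snapshot.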

Zdenek


* Re: [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-19  2:27 [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3 Spelic
  2012-02-19 10:00 ` [linux-lvm] [dm-devel] " Zdenek Kabelac
@ 2012-02-20 13:51 ` Mike Snitzer
  2012-02-20 15:09   ` Spelic
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Snitzer @ 2012-02-20 13:51 UTC
  To: device-mapper development; +Cc: linux-lvm

On Sat, Feb 18 2012 at  9:27pm -0500,
Spelic <spelic@shiftmail.org> wrote:

> Hello lists,
> 
> Do you have any information about a bug in linux v3.0.3, of LVM
> snapshot making a mess at (clean!) reboot?
> 
> Symptoms are: message at boot:
>     [   15.668799] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
>     [   15.668934] device-mapper: ioctl: error adding target to table
>     [   19.388627] device-mapper: table: 252:3: snapshot: Snapshot cow pairing for exception table handover failed
>     [   19.388786] device-mapper: ioctl: error adding target to table
> 
> [...]
>
> Do you have any information on this bug, e.g. has this been fixed
> since 3.0.3?

I've never seen this.

Which distro are you using?

The "Snapshot cow pairing for exception table handover failed" is the
error path most commonly associated with the snapshot-merge feature.
Are you using snapshot-merge for the root LV (e.g. lvconvert --merge ...)?

Mike


* Re: [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-20 13:51 ` [linux-lvm] " Mike Snitzer
@ 2012-02-20 15:09   ` Spelic
  2012-02-20 17:17     ` Zdenek Kabelac
  0 siblings, 1 reply; 7+ messages in thread
From: Spelic @ 2012-02-20 15:09 UTC
  To: Mike Snitzer; +Cc: device-mapper development, Zdenek Kabelac, linux-lvm

On 02/20/12 14:51, Mike Snitzer wrote:
> I've never seen this. Which distro are you using?

Ubuntu 11.04 64-bit, but with a vanilla 3.0.3 kernel

Ubuntu's lvm:
# lvm version
   LVM version:     2.02.66(2) (2010-05-20)
   Library version: 1.02.48 (2010-05-20)

> The "Snapshot cow pairing for exception table handover failed" is the
> error path most commonly associated with the snapshot-merge feature.
> Are you using snapshot-merge for the root LV (e.g. lvconvert --merge
> ...)?

Absolutely not: I have never merged a snapshot in my life, and the root 
filesystem of the host was *not* on LVM anyway.

Here is some more info:

The host machine had 2 PVs and 2 VGs (one PV each); only one LV had a 
snapshot, and only one. More specifically:

PV1 (an MD RAID) --> VG1 --> 5 x LVs used as virtual machine disks; of 
these, only one LV had a snapshot (just one). No other snapshots. None of 
these volumes were mounted on the host. This is where the problem happened.
PV2 (an MD RAID) --> VG2 --> 1 x LV; I don't remember whether it was 
mounted on the host at that time or used as a disk for a virtual machine, 
but in any case it was not the root filesystem of the host. No snapshots here.

Root filesystem of the host was definitely *not* on LVM.

Unfortunately the machine where it happened is a production machine and 
it's too dangerous to try to reproduce the problem there; this weekend I 
tried to reproduce it elsewhere, but it didn't happen. I had a stack trace 
of the DM hang but I lost it, stupid me...


* Re: [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-20 15:09   ` Spelic
@ 2012-02-20 17:17     ` Zdenek Kabelac
  2012-02-21 10:22       ` Spelic
  0 siblings, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2012-02-20 17:17 UTC
  To: device-mapper development; +Cc: linux-lvm, Mike Snitzer, Spelic

On 20.2.2012 16:09, Spelic wrote:
> On 02/20/12 14:51, Mike Snitzer wrote:
>> I've never seen this. Which distro are you using?
> 
> Ubuntu 11.04  64-bit but with vanilla 3.0.3 kernel
> 
> Ubuntu's lvm:
> # lvm version
>   LVM version:     2.02.66(2) (2010-05-20)
>   Library version: 1.02.48 (2010-05-20)
> 

So I'd guess you might be a 'victim' of Debian's home-brewed udev rules for lvm2,
which unfortunately were not coordinated with upstream (neither udev nor lvm).

If you really need a 'quick' solution, try building the CVS version
(it shouldn't be very hard). The second thing to try may be a more recent
Debian lvm2 build, which has merged most of the udev rules from upstream
(though there are still some deviations that might be a source of trouble).

If you want support for your version of the OS and lvm2, you may try
opening a bug in the Ubuntu bug tracker.

> [...]
> 
> Unfortunately the machine where it happened is a production machine and it's
> too dangerous to try to reproduce it there; this weekend I tried to reproduce
> it elsewhere but clearly it didn't happen. I had a stack trace of the DM hang
> but I lost it, stupid me...

It's always best to have a reproducer on a different machine, so you can test
the solution you want to deploy on the production machine.

Note: the upstream code contains a test suite (subdirectory test) which you may
try to run in your environment to see whether all the tests pass.

Zdenek


* Re: [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-20 17:17     ` Zdenek Kabelac
@ 2012-02-21 10:22       ` Spelic
  2012-02-21 11:08         ` Zdenek Kabelac
  0 siblings, 1 reply; 7+ messages in thread
From: Spelic @ 2012-02-21 10:22 UTC
  To: Zdenek Kabelac; +Cc: device-mapper development, Mike Snitzer, linux-lvm

On 02/20/12 18:17, Zdenek Kabelac wrote:
> On 20.2.2012 16:09, Spelic wrote:
>> On 02/20/12 14:51, Mike Snitzer wrote:
>>> I've never seen this. Which distro are you using?
>> Ubuntu 11.04  64-bit but with vanilla 3.0.3 kernel
>>
>> Ubuntu's lvm:
>> # lvm version
>>    LVM version:     2.02.66(2) (2010-05-20)
>>    Library version: 1.02.48 (2010-05-20)
>>
> So I'd guess you might be a 'victim' of Debian home-brew udev rules for lvm2,
> which were unfortunately not consulted with upstream (nor udev, nor lvm).

Indeed I have had a few bad experiences with Ubuntu's udev rules, but 
never like this...
I don't think a wrong udev rule can hang the whole machine with such 
symptoms; do you really think so?

- lvs and/or "lvchange --refresh ..." processes hanging forever in 
kernel code (AFAIR it was in the DM region; that's the /proc/pid/stack 
trace I unfortunately lost)

- "sync" cannot complete either, so I can only force a reboot without flushing
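Since the kernel stack trace of the hung processes was lost, here is a minimal sketch of capturing it the next time, assuming root access and a kernel that provides /proc/PID/stack (as the 3.0 kernel used here does). The function names are hypothetical; the sketch lists processes in uninterruptible sleep (state D) and appends each one's wait channel and kernel stack to a file:

```shell
# Hypothetical helper: read "pid stat" lines (as printed by
# ps -eo pid=,stat=) and print the PIDs whose state starts with D.
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: append name, wait channel and kernel stack of every
# D-state process to the given file. Reading /proc/PID/stack needs root.
dump_hung_stacks() {
    out=$1
    : > "$out"
    for pid in $(ps -eo pid=,stat= | d_state_pids); do
        {
            echo "=== pid $pid ($(cat "/proc/$pid/comm" 2>/dev/null))"
            cat "/proc/$pid/wchan" 2>/dev/null; echo   # wchan has no newline
            cat "/proc/$pid/stack" 2>/dev/null
        } >> "$out"
    done
}
```

Because sync itself was hanging, the file should go to a filesystem that does not sit on the stuck DM devices, and ideally be copied off the machine before a forced reboot. On kernels built with Magic SysRq support, echo w > /proc/sysrq-trigger (as root) instead makes the kernel log all blocked tasks, which can then be read with dmesg.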


* Re: [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3
  2012-02-21 10:22       ` Spelic
@ 2012-02-21 11:08         ` Zdenek Kabelac
  0 siblings, 0 replies; 7+ messages in thread
From: Zdenek Kabelac @ 2012-02-21 11:08 UTC
  Cc: Zdenek Kabelac, linux-lvm, Spelic

On 21.2.2012 11:22, Spelic wrote:
> On 02/20/12 18:17, Zdenek Kabelac wrote:
>> On 20.2.2012 16:09, Spelic wrote:
>>> On 02/20/12 14:51, Mike Snitzer wrote:
>>>> I've never seen this. Which distro are you using?
>>> Ubuntu 11.04  64-bit but with vanilla 3.0.3 kernel
>>>
>>> Ubuntu's lvm:
>>> # lvm version
>>>    LVM version:     2.02.66(2) (2010-05-20)
>>>    Library version: 1.02.48 (2010-05-20)
>>>
>> So I'd guess you might be a 'victim' of Debian home-brew udev rules for lvm2,
>> which were unfortunately not consulted with upstream (nor udev, nor lvm).
> 
> Indeed I have had a few bad experiences with Ubuntu's udev rules, but never
> like this...
> I don't think a wrong udev rule can hangup the whole machine with such
> symptoms, you really think so?
> 
> - lvs and/or "lvchange --refresh ..." processes hanging forever in kernel code
> (and AFAIR it was in DM region, that's the /proc/pid/stack trace which I
> unfortunately lost)

lvs might just be waiting for the unfinished lvchange process (the lock holder).

You could probably look at a gdb backtrace to see what it is waiting for
(or just use lvs -vvvv).

There is probably an easy way to detect udev-related problems: check for
udev cookies waiting to be completed.

'dmsetup udevcookies'

And you may manually force their completion with:

'dmsetup udevcomplete_all'

to see whether that makes lvchange continue.
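The checks suggested above can be wrapped in a small sketch, assuming root privileges; check_udev_cookies and backtrace_of are hypothetical names. The second helper takes a userspace backtrace of a stuck lvs or lvchange with gdb; note that gdb most likely cannot attach to a process stuck in uninterruptible sleep (state D) inside the kernel, so it is mainly useful when the command is merely waiting for a lock:

```shell
# Hypothetical helper: list udev cookies (semaphores) that device-mapper is
# still waiting on; with --force, complete all of them so that a command
# blocked on udev synchronization can continue.
check_udev_cookies() {
    echo "Pending udev cookies:"
    dmsetup udevcookies
    if [ "${1:-}" = "--force" ]; then
        # dmsetup may ask for confirmation before destroying the semaphores
        dmsetup udevcomplete_all
    fi
}

# Hypothetical helper: print backtraces of all threads of the given PID
# (e.g. a hanging lvs) and detach gdb again.
backtrace_of() {
    gdb -p "$1" -batch -ex 'thread apply all bt' 2>&1
}
```

Run check_udev_cookies first and look at the list; only then consider check_udev_cookies --force.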

> 
> - "sync" also cannot complete, so I can only do force reboot without flush
> 

We should probably keep this on the linux-lvm list only, as it's not DM related.

Zdenek


end of thread, other threads:[~2012-02-21 11:08 UTC | newest]

Thread overview: 7+ messages
2012-02-19  2:27 [linux-lvm] DM / LVM hangs if snapshot present on kernel v3.0.3 Spelic
2012-02-19 10:00 ` [linux-lvm] [dm-devel] " Zdenek Kabelac
2012-02-20 13:51 ` [linux-lvm] " Mike Snitzer
2012-02-20 15:09   ` Spelic
2012-02-20 17:17     ` Zdenek Kabelac
2012-02-21 10:22       ` Spelic
2012-02-21 11:08         ` Zdenek Kabelac
