Distributed Replicated Block Device (DRBD) development
From: Urban Loesch <bind@enas.net>
To: Frank Steinborn <steinex@nognu.de>
Cc: drbd-dev@lists.linbit.com, 659762@bugs.debian.org
Subject: Re: [Drbd-dev] Bug#659762: lvm2: LVM commands freeze after snapshot delete fails
Date: Tue, 04 Mar 2014 14:19:15 -0000
Message-ID: <51F29992.2000305@enas.net>
In-Reply-To: <CANhLx2MO6aDsiTMmt-O4Fx8mmaF8QvNpC8WOBAT9-kfKEaQmVQ@mail.gmail.com>

Hi,

We had the same problem with Debian Wheezy, LVM2 and DRBD.
However, it does not seem to be DRBD related. It looks like a problem between LVM and udevd.

See:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549691

Stopping udevd before taking the snapshot and starting it again after removing
the snapshot solved the problem for us. It's only a workaround, but it works.
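
A minimal sketch of that sequence follows. It makes several assumptions that
are not from our setup notes: the init script is assumed to be named "udev"
(as on stock Wheezy), the volume names and the 40G snapshot size are taken
from the lvs output quoted below, and run, DRY_RUN and
snapshot_with_udev_stopped are hypothetical helper names. Treat it as an
illustration, not a tested procedure.

```shell
# Sketch: keep udevd stopped for as long as the snapshot exists.
# Assumptions: init script "udev"; names and size are only defaults.
VG=${VG:-vg0}
LV=${LV:-lv0}
SNAP=${SNAP:-lv0-snap}
SNAP_SIZE=${SNAP_SIZE:-40G}

# Hypothetical helper: print the command when DRY_RUN=1, else execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: "$@" is the command to run while the snapshot
# exists, for example a backup job.
snapshot_with_udev_stopped() {
    run service udev stop || return 1
    status=0
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || status=$?
    if [ "$status" -eq 0 ]; then
        run "$@" || status=$?
        run lvremove --force "/dev/$VG/$SNAP" || status=$?
    fi
    # Start udev again even if an earlier step failed.
    run service udev start || status=$?
    return "$status"
}
```

Setting DRY_RUN=1 before calling snapshot_with_udev_stopped only prints the
commands, which is a cheap way to check the order before touching a
production volume.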

Regards
Urban


On 26.07.2013 17:14, Frank Steinborn wrote:
> Hi,
>
> we are a bit further in debugging this. We installed a Dell PowerEdge R620 (same hardware as used in our DRBD cluster where this problem happens). As
> no one in this thread brought DRBD into play, I didn't expect any interaction with it related to this bug. However, we were not able to reproduce it with
> just LVM2 (e.g. configure LV, do IO in LV, remove LV, hang).
>
> So we installed a second machine and put DRBD on top of the LVs. And voilà: when we create a snapshot of the LV that DRBD sits on and then remove
> this snapshot, the removal fails about 1/3 of the time.
>
> Some facts:
>
> root@drbd-primary:~# lvremove --force /dev/vg0/lv0-snap
>    Unable to deactivate open vg0-lv0--snap-cow (254:3)
>    Failed to resume lv0-snap.
>    libdevmapper exiting with 1 device(s) still suspended.
>
> After this, "dmsetup info" gives the following output:
>
> <<< snip >>>
>
> Name:              vg0-lv0--snap
> State:             ACTIVE
> Read Ahead:        256
> Tables present:    LIVE
> Open count:        0
> Event number:      0
> Major, minor:      254, 1
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ
>
> Name:              vg0-lv0-real
> State:             ACTIVE
> Read Ahead:        0
> Tables present:    LIVE
> Open count:        1
> Event number:      0
> Major, minor:      254, 2
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j-real
>
> Name:              vg0-lv0
> State:             SUSPENDED
> Read Ahead:        256
> Tables present:    LIVE & INACTIVE
> Open count:        2
> Event number:      0
> Major, minor:      254, 0
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j
>
> Name:              vg0-lv0--snap-cow
> State:             ACTIVE
> Read Ahead:        0
> Tables present:    LIVE
> Open count:        0
> Event number:      0
> Major, minor:      254, 3
> Number of targets: 1
> UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ-cow
>
> <<< snap >>>
>
> As you can see, the real LV with DRBD on top is now in state SUSPENDED. This makes the cluster non-functional, as IO operations stall on both
> the primary and the secondary node until someone runs "dmsetup resume /dev/vg0/lv0".
>
> Another interesting issue we've seen: after doing "dmsetup resume /dev/vg0/lv0", lv0-snap doesn't appear to be a snapshot anymore, given the output of
> lvs (lv0-snap has no origin anymore):
>
>    LV       VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
>    lv0      vg0  -wi-ao-- 200.00g
>    lv0-snap vg0  -wi-a---  40.00g
>
>
> Some miscellaneous notes:
> * It _feels_ like it only happens when the snapshot is at least around 50-60% full.
> * We can trigger something like this even without DRBD. When triggered, however, the LV never ends up in SUSPENDED state and a second try of
> lvremove always succeeds.
>
> That's all we have so far. I already had a private conversation with waldi@debian.org on this and we will (probably) provide
> him remote access to this system as soon as we have the setup reachable from the outside.
>
> Please let me know if I can provide any more information to get this fixed. I put drbd-dev in cc, maybe someone over there has an idea on this?
>
> @drbd-dev: system is debian wheezy, w/ drbd 8.3.11, lvm2 2.02.95.
>
> Thanks,
> Frank
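
The recovery step quoted above (spotting the SUSPENDED device in "dmsetup
info" and resuming it) can be sketched as follows. The function names
suspended_devices and resume_suspended are hypothetical, and the parser
assumes the "Name:" / "State:" layout shown in the quoted output. Resuming
every suspended device blindly may not be appropriate on all systems, so
this is only an illustration.

```shell
# Hypothetical helper: read "dmsetup info" style text on stdin and print
# the name of every device whose State is SUSPENDED.
suspended_devices() {
    awk '
        $1 == "Name:"  { name = $2 }
        $1 == "State:" && $2 == "SUSPENDED" { print name }
    '
}

# Hypothetical helper: resume each suspended device by its dm name.
resume_suspended() {
    dmsetup info | suspended_devices | while read -r dev; do
        echo "resuming $dev"
        dmsetup resume "$dev"
    done
}
```

Running "dmsetup info | suspended_devices" on its own only lists the
affected devices, which is the safer first step.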

