* [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
@ 2013-07-30 15:18 Christian Seiler
2013-07-31 8:50 ` Peter Rajnoha
0 siblings, 1 reply; 8+ messages in thread
From: Christian Seiler @ 2013-07-30 15:18 UTC (permalink / raw)
To: linux-lvm; +Cc: Christian Seiler
Many people have reported that lvremove may fail spuriously, even with
the -f option. This has been identified to be due to a combination of
LVM, udev and possibly other utilities, in that somehow udev rules may
be triggered (either by lvremove itself or by external utilities) that
cause small helper programs to open the LV for a short time to retrieve
information about its contents, which in turn causes the volume to be
in use in the precise moment that lvremove tries to remove it. In the
past, two main changes were made to solve this problem:
- LVM2 will try to remove the logical volume several times if it is
just opened by a program and not mounted, so that if a udev rule
just happened to be triggered at the same time, the retry will
hopefully take place once udev processing is over and the removal
can succeed.
- LVM2 tries really hard not to trigger those udev rules by itself,
by not opening the devices R/W during initialization.
This commit is a follow-up on the first change. If the last snapshot of
a volume is removed, the -real subvolume is not removed via the
codepath that has the retry logic, but by just doing a cleanup of
unused volumes. This means that spurious failures may still occur,
although not as often.
This patch adds the same retry logic to the cleanup codepath in order
to make it even more unlikely that such a spurious error may occur.
Signed-off-by: Christian Seiler <christian@iwakd.de>
---
lib/activate/dev_manager.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/lib/activate/dev_manager.c b/lib/activate/dev_manager.c
index 6a68653..1708946 100644
--- a/lib/activate/dev_manager.c
+++ b/lib/activate/dev_manager.c
@@ -2614,6 +2614,8 @@ static int _tree_action(struct dev_manager *dm, struct logical_volume *lv,
/* Only process nodes with uuid of "LVM-" plus VG id. */
switch(action) {
case CLEAN:
+ if (retry_deactivation())
+ dm_tree_retry_remove(root);
/* Deactivate any unused non-toplevel nodes */
if (!_clean_tree(dm, root, laopts->origin_only ? dlid : NULL))
goto_out;
--
1.7.2.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-07-30 15:18 [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots Christian Seiler
@ 2013-07-31 8:50 ` Peter Rajnoha
2013-07-31 11:31 ` Christian Seiler
0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2013-07-31 8:50 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Christian Seiler
On 07/30/2013 05:18 PM, Christian Seiler wrote:
> This commit is a follow-up on the first change. If the last snapshot of
> a volume is removed, the -real subvolume is not removed via the
> codepath that has the retry logic, but by just doing a cleanup of
> unused volumes. This means that spurious failures may still occur,
> although not as often.
These subvolumes should already be properly marked with flags
we attach to them during their creation - these are then sent
encoded in uevent and udev rules are skipped based on it,
including skipping the WATCH udev rule.
>
> This patch adds the same retry logic to the cleanup codepath in order
> to make it even more unlikely that such a spurious error may occur.
>
..but yes, we could probably add this patch to make sure something
like that does not happen at all.
But still it would be better to look why those spurious events
are triggered for non-top-level volumes (it's possible the
OPTIONS:="nowatch" in some case, I'll check...).
Peter
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-07-31 8:50 ` Peter Rajnoha
@ 2013-07-31 11:31 ` Christian Seiler
2013-07-31 12:10 ` Peter Rajnoha
0 siblings, 1 reply; 8+ messages in thread
From: Christian Seiler @ 2013-07-31 11:31 UTC (permalink / raw)
To: linux-lvm
Hi,
>> This commit is a follow-up on the first change. If the last snapshot
>> of
>> a volume is removed, the -real subvolume is not removed via the
>> codepath that has the retry logic, but by just doing a cleanup of
>> unused volumes. This means that spurious failures may still occur,
>> although not as often.
>
> These subvolumes should already be properly marked with flags
> we attach to them during their creation - these are then sent
> encoded in uevent and udev rules are skipped based on it,
> including skipping the WATCH udev rule.
Then perhaps it's not udev? I stumbled across the problem myself with
an old version of LVM (it failed like 95% of the time if udisks was
installed, without it failed less often but still quite a bit), then
after a lot of research I found out that this was supposedly fixed in
a current version, I upgraded and then it became rarer but still
occured in around 5% of cases (I had run a script that would create
a snapshot, mount it, umount it and remove the snapshot, all done in
a loop that only stopped on failure). After a *lot* of debugging and
applying the patch I sent here, I couldn't reproduce a single failure
anymore with 2000 tries. (I stopped afterwards, that was good enough
for me.)
> ..but yes, we could probably add this patch to make sure something
> like that does not happen at all.
I'd appreciate it, thanks.
> But still it would be better to look why those spurious events
> are triggered for non-top-level volumes (it's possible the
> OPTIONS:="nowatch" in some case, I'll check...).
I can try to see if I can reproduce it later this week with udevd
running with --debug to see what is going on.
Christian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-07-31 11:31 ` Christian Seiler
@ 2013-07-31 12:10 ` Peter Rajnoha
2013-07-31 17:27 ` Christian Seiler
0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2013-07-31 12:10 UTC (permalink / raw)
To: Christian Seiler; +Cc: LVM general discussion and development
On 07/31/2013 01:31 PM, Christian Seiler wrote:
>> But still it would be better to look why those spurious events
>> are triggered for non-top-level volumes (it's possible the
>> OPTIONS:="nowatch" in some case, I'll check...).
>
> I can try to see if I can reproduce it later this week with udevd
> running with --debug to see what is going on.
>
OK, thanks.
One more question - what's the distro you're using? Some distros
change udev rules or they add completely different ones from upstream
(for example I know that Debian changes upstream LVM udev rules).
Or have you tested this with upstream (or Fedora) LVM version?
Peter
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-07-31 12:10 ` Peter Rajnoha
@ 2013-07-31 17:27 ` Christian Seiler
2013-08-01 9:19 ` Christian Seiler
0 siblings, 1 reply; 8+ messages in thread
From: Christian Seiler @ 2013-07-31 17:27 UTC (permalink / raw)
To: Peter Rajnoha; +Cc: LVM general discussion and development
Hi there,
> One more question - what's the distro you're using? Some distros
> change udev rules or they add completely different ones from upstream
> (for example I know that Debian changes upstream LVM udev rules).
The computer in question is running Debian Squeeze (oldstable) (where
the LVM version was the one that was too old; a system upgrade to Wheezy
is planned but will take a while) and then I compiled the LVM version
that comes with Wheezy (stable) where at least from what I gathered from
the commit mailing archives here both fixes I mentioned in the original
mail are included.
FYI: These are the LVM udev rools that are in the LVM2 package that's
currently installed.
------------------------------------------------------------------
cat /lib/udev/rules.d/56-lvm.rules
# Udev rules for LVM.
# See /usr/share/doc/lvm2/README.udev for further information.
ACTION!="add|change", GOTO="lvm_end"
ENV{DM_UDEV_RULES}=="", GOTO="lvm_end"
ENV{DM_UUID}!="LVM-?*", GOTO="lvm_end"
# Use DM name and split it up into its VG/LV/layer constituents.
IMPORT{program}="/sbin/dmsetup splitname --nameprefixes --noheadings
--rows $env{DM_NAME}"
ENV{DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG}!="1", GOTO="lvm_end"
ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}="1"
ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}="1"
OPTIONS:="nowatch"
LABEL="lvm_end"
cat /lib/udev/rules.d/60-persistent-storage-lvm.rules
# Udev rules for LVM.
# See /usr/share/doc/lvm2/README.udev for further information.
ACTION!="add|change", GOTO="persistent_storage_lvm_end"
ENV{DM_UDEV_RULES}=="", GOTO="persistent_storage_lvm_end"
ENV{DM_UUID}!="LVM-?*", GOTO="persistent_storage_lvm_end"
ENV{DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG}=="1",
GOTO="persistent_storage_lvm_end"
ENV{DM_VG_NAME}=="?*", ENV{DM_LV_NAME}=="?*", ENV{DM_LV_LAYER}=="",
SYMLINK+="$env{DM_VG_NAME}/$env{DM_LV_NAME}"
LABEL="persistent_storage_lvm_end"
------------------------------------------------------------------
> Or have you tested this with upstream (or Fedora) LVM version?
I will do so once I get the chance. But since it is very likely that the
specific environment here might also contribute to the likelyhood of the
failures occuring, I'll do so not by trying it on a clean Fedora install
but by compiling the current upstream git on the Debian computer in
question.
Christian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-07-31 17:27 ` Christian Seiler
@ 2013-08-01 9:19 ` Christian Seiler
2013-08-01 10:16 ` Zdenek Kabelac
0 siblings, 1 reply; 8+ messages in thread
From: Christian Seiler @ 2013-08-01 9:19 UTC (permalink / raw)
To: linux-lvm
Hi again,
so I tried to use the current git version of LVM to see if I could
reproduce the problem, and I couldn't, even with Debian's udev rules.
Apparently, there is still a difference between the version that comes
with Debian Wheezy and the current git tree with respect to this
problem. (I actually thought the relevant patches were included, but
apparently I was wrong.) On the other hand, I'm too lazy to bisect the
entire commit history to see what exactly was the change that fixed it.
Nevertheless, I think it might be a good idea to still apply my patch
(perhaps with a slightly different commit message) just to provide an
additional layer of robustness, especially since it fixes the problem
in a previous version.
Christian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-08-01 9:19 ` Christian Seiler
@ 2013-08-01 10:16 ` Zdenek Kabelac
2013-08-01 10:53 ` Christian Seiler
0 siblings, 1 reply; 8+ messages in thread
From: Zdenek Kabelac @ 2013-08-01 10:16 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Christian Seiler
Dne 1.8.2013 11:19, Christian Seiler napsal(a):
> Hi again,
>
> so I tried to use the current git version of LVM to see if I could
> reproduce the problem, and I couldn't, even with Debian's udev rules.
The problem with Debian is it's shipped with non-upstream patches which are
changing whole udev synchronization - and unfortunately they are not correct
and Debian maintainer is somehow nonresponsive here.
(At least the latest unstable package have been using those udev monitoring
extentsions).
> Nevertheless, I think it might be a good idea to still apply my patch
Nope. They would mask problems - the code should already handle your
case at different level.
Anyway - before reporting any udev synchronization problem - always check with
upstream git and udev rules first - since the Debian version is usually
significantly changed in this area....
Zdenek
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots
2013-08-01 10:16 ` Zdenek Kabelac
@ 2013-08-01 10:53 ` Christian Seiler
0 siblings, 0 replies; 8+ messages in thread
From: Christian Seiler @ 2013-08-01 10:53 UTC (permalink / raw)
To: linux-lvm
Hi there,
>> Nevertheless, I think it might be a good idea to still apply my
>> patch
>
> Nope. They would mask problems - the code should already handle your
> case at different level.
I would like to submit to you that even though lvremove itself might
not trigger the udev rules that conflict with it, some other program
might by accident at the same, which could still cause problems.
Or, to turn your argument around: if you really think that performing a
retry on the -real device when deactivating is not a good idea, then
conversely one should probably remove the retry logic also from the
main device. (Which btw. appears to work, I couldn't reproduce a
failure if I comment out the same lines in the 'case DEACTIVATE' that
are already there in the current git.)
That said, personally I believe that increasing the robustness of a
piece of software is generally a good thing; not failing unless
absolutely necessary seems to be a sound general policy to me. As for
masking the problem: simply printing a warning message in case of the
retry would be sufficient in my eyes to still notice if the
synchronization logic fails.
On the other hand, my main interest is that it works, which appears to
be the case with current git, so take my 2¢ or leave them at your
discretion; I'll now start pestering the Debian bugtracker to get the
issue fixed there.
> Anyway - before reporting any udev synchronization problem - always
> check with upstream git and udev rules first - since the Debian
> version is usually significantly changed in this area....
I didn't know that the Debian patches were that invasive, sorry.
Thanks,
Christian
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-08-01 10:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-30 15:18 [linux-lvm] [PATCH] lvremove: perform retry logic also on -real subvolume for snapshots Christian Seiler
2013-07-31 8:50 ` Peter Rajnoha
2013-07-31 11:31 ` Christian Seiler
2013-07-31 12:10 ` Peter Rajnoha
2013-07-31 17:27 ` Christian Seiler
2013-08-01 9:19 ` Christian Seiler
2013-08-01 10:16 ` Zdenek Kabelac
2013-08-01 10:53 ` Christian Seiler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).