* [linux-lvm] Found: workaround for crash on snapshot removal, and hopefully a good clue to the underlying bug
@ 2005-12-07 19:34 James G. Sack (jim)
From: James G. Sack (jim) @ 2005-12-07 19:34 UTC (permalink / raw)
To: LVM LIST linux-lvm@redhat.com; +Cc: Alasdair G Kergon
Hooray!
I think I've found a definitive clue to a crash during lvremove of a
snapshot. I have a reliably repeatable failure test and a workaround
that seems to be passing.
Here's the regression test:
--------------------------
1. arrange to have some continuous i/o on an lvm volume
I do it with a simple shell loop that copies a 1GB file to another name
and then back (essentially: 'while :;do cp abcd wxyz;cp wxyz abcd;done')
2. while that's running, start a snapshot create/remove loop
Such as 'while :;do lvcreate -snSnap -L10G LVorigin;
lvremove -f /dev/VG/Snap;done'
My experience is that a system crash always occurs upon executing the
lvremove call. The first one!
(On my most recent experiments, the system is locking hard,
although earlier I was able to see a kcopyd oops and the
keyboard scrollback worked.)
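For anyone wanting to repeat the test, the two loops above can be sketched as a small script. This is only an illustration, not the exact script I ran: VG, LVorigin, Snap, abcd and wxyz are the placeholder names from the steps above, the VG/LVorigin form of the origin argument is an assumption, and the DRY_RUN switch and the function names run, io_pass and snap_cycle are hypothetical additions so the commands can be inspected before being run for real.

```shell
#!/bin/sh
# Sketch of the regression test loops. All names are placeholders.
# DRY_RUN is an assumption added for this sketch: when it is 1 (the
# default) commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: one pass of the copy loop (run it from a directory on the
# origin volume that holds the 1GB file abcd).
io_pass() {
    run cp abcd wxyz
    run cp wxyz abcd
}

# Step 2: one snapshot create/remove cycle.
# The VG/LVorigin spelling of the origin is an assumption.
snap_cycle() {
    run lvcreate -s -n Snap -L10G VG/LVorigin
    run lvremove -f /dev/VG/Snap
}
```

With DRY_RUN=0, as root, on a machine you can afford to crash, the test would be 'while :; do io_pass; done &' followed by 'while :; do snap_cycle; done'.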
Here's the workaround
---------------------
In the snap-cycle test, surround the lvremove command with suspend/resume:
dmsetup suspend VG-LVorigin
lvremove -f /dev/VG/Snap
dmsetup resume VG-LVorigin
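The same three commands can be wrapped in a shell function so that the origin is resumed even when lvremove fails. This is a sketch under assumptions, not a tested fix: the function name remove_snapshot_suspended, its two arguments, and the DRY_RUN switch are all hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch of the suspend/lvremove/resume workaround as a function.
# DRY_RUN is an assumption added for this sketch: when it is 1 (the
# default) commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: device-mapper name of the origin (e.g. VG-LVorigin)
# $2: device path of the snapshot to remove
remove_snapshot_suspended() {
    origin_dm=$1
    snap_dev=$2

    # Suspend the origin first; give up if that fails.
    run dmsetup suspend "$origin_dm" || return 1

    # Remove the snapshot and remember the exit status.
    run lvremove -f "$snap_dev"
    status=$?

    # Always resume, so a failed lvremove does not leave the origin
    # suspended.
    run dmsetup resume "$origin_dm" || return 1

    return "$status"
}
```

In the snap-cycle loop, the bare lvremove call would then become something like 'remove_snapshot_suspended VG-LVorigin /dev/VG/Snap' with DRY_RUN=0.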
I am currently testing this workaround on a patched 2.6.14-1.1637_FC4
kernel
(using 4 patches suggested by agk on Tue, 15 Nov 2005 22:33:58 +0000)
<excerpt from that prior message>
---------------------------------
> > The kcopyd.c BUG at line 145 is triggered by the first lvremove
> > following start of the i/o (copy loop).
Try some kernel patches.
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/
in particular these four:
dm-snapshot-bio_list-fix.patch
dm-snapshot-metadata-reading-separation.patch
dm-snapshot-load-metadata-on-creation.patch
dm-ioctl-reduce-pf-memalloc-usage.patch
</excerpt>
==> BUT I suspect the lvremove problem is independent of those patches,
as I was getting the same symptom before putting in the suspend/resume.
I thought I had tried suspend/resume previously and found that they were
unnecessary because the create automatically performed a suspend/resume
-- so my current workaround is the result of a desperation-experiment of
applying the suspend/resume wrapper ONLY to the lvremove step.
==> SO MAYBE this current success points to a bug in the lvremove code,
eh?
I plan on repeating my test on a vanilla kernel. In the meantime, I hope
someone can look at the lvremove code (agk?..).
Regards,
..jim