From: "James G. Sack (jim)" <jsack@tandbergdatacorp.com>
To: "LVM LIST linux-lvm@redhat.com" <linux-lvm@redhat.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Subject: [linux-lvm] Found: workaround for crash on snapshot removal, and hopefully a good clue to the underlying bug
Date: Wed, 07 Dec 2005 11:34:05 -0800
Message-ID: <1133984046.4964.43.camel@jgs4.ino.pvt>

Hooray! 

I think I've found a definitive clue to a crash during lvremove of a
snapshot. I have a test that reliably reproduces the failure, and a
workaround that has so far passed that test.

Here's the regression test:
--------------------------

1. Arrange to have some continuous i/o on an lvm volume.
 I do it with a simple shell loop that copies a 1GB file to another name
 and then back (essentially: 'while :;do cp abcd wxyz;cp wxyz abcd;done')

2. While that's running, start a snapshot create/remove loop,
 such as 'while :;do lvcreate -snSnap -L10G LVorigin;
  lvremove -f /dev/VG/Snap;done'
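
The two loops above can be sketched as a pair of shell functions. This is
only an illustration, not the exact script used: the names VG, ORIGIN and
SNAP, the run helper, and the DRY_RUN switch are assumptions added here,
and the origin is written as VG/LV where the one-liner above says just
LVorigin. DRY_RUN defaults to 1 so that the commands are printed rather
than executed, since the real thing is expected to take the machine down.

```shell
#!/bin/sh
# Placeholder names (assumptions); override them in the environment.
VG=${VG:-VG}
ORIGIN=${ORIGIN:-LVorigin}
SNAP=${SNAP:-Snap}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One pass of the i/o load: copy a large file to another name and back.
# Run this from a directory on the filesystem that lives on the origin LV.
io_pass() {
    run cp abcd wxyz &&
    run cp wxyz abcd
}

# One pass of the snapshot cycle: create a 10G snapshot, then remove it.
snap_pass() {
    run lvcreate -s -n "$SNAP" -L 10G "$VG/$ORIGIN" &&
    run lvremove -f "/dev/$VG/$SNAP"
}
```

To reproduce for real, set DRY_RUN=0 and run 'while :; do io_pass; done'
in one shell and 'while :; do snap_pass; done' in another.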

My experience is that a system crash always occurs upon executing the
lvremove call. The first one! 

  (In my most recent experiments, the system is locking hard, 
   although earlier I was able to see a kcopyd oops and the 
   keyboard scrollback worked.)


Here's the workaround
---------------------

In the snap-cycle test, surround the lvremove command with suspend/resume:
  dmsetup suspend VG-LVorigin
  lvremove -f /dev/VG/Snap
  dmsetup resume VG-LVorigin
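
As a rough sketch, the wrapper could be packaged as a function that always
resumes the origin, even when lvremove fails, so the origin is not left
suspended. The function name safe_snap_remove, the run helper, the DRY_RUN
switch and the VG/ORIGIN/SNAP variables are assumptions added for
illustration. The device-mapper name is built as VG-LV, which assumes that
neither name contains a hyphen (hyphens inside a name are doubled in the
device-mapper name).

```shell
#!/bin/sh
# Placeholder names (assumptions); override them in the environment.
VG=${VG:-VG}
ORIGIN=${ORIGIN:-LVorigin}
SNAP=${SNAP:-Snap}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the snapshot while the origin device is suspended.
safe_snap_remove() {
    # If the suspend itself fails, do not attempt the removal.
    run dmsetup suspend "$VG-$ORIGIN" || return 1

    run lvremove -f "/dev/$VG/$SNAP"
    rc=$?

    # Resume unconditionally; report a resume failure if there is one.
    run dmsetup resume "$VG-$ORIGIN" || rc=$?
    return "$rc"
}
```

With DRY_RUN=1 (the default here) the function only prints the three
commands in order; set DRY_RUN=0 to execute them.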

I am currently testing this workaround on a patched 2.6.14-1.1637_FC4
kernel 
  (using 4 patches suggested by agk on Tue, 15 Nov 2005 22:33:58 +0000)

<excerpt from that prior message>
---------------------------------
> > The kcopyd.c BUG at line 145 is triggered by the first lvremove
> > following start of the i/o (copy loop).

Try some kernel patches.

  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/

in particular these four:

  dm-snapshot-bio_list-fix.patch
  dm-snapshot-metadata-reading-separation.patch
  dm-snapshot-load-metadata-on-creation.patch
  dm-ioctl-reduce-pf-memalloc-usage.patch
</excerpt>
  

==> BUT I suspect the lvremove problem is independent of those patches,
as I was getting the same symptom before putting in the suspend/resume.


I thought I had tried suspend/resume previously and found that they were
unnecessary because the create automatically performed a suspend/resume
-- so my current workaround is the result of a desperation-experiment of
applying the suspend/resume wrapper ONLY to the lvremove step. 

==> SO MAYBE this current success points to a bug in the lvremove code,
eh?


I plan on repeating my test on a vanilla kernel. In the meantime, I hope
someone can look at the lvremove code (agk?..).

Regards,
..jim
