linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Clements <paul.clements@steeleye.com>
To: linux-raid@vger.kernel.org, neilb@suse.de
Subject: [BUG] raid1: barrier retry does not work correctly with write-behind
Date: Thu, 03 Aug 2006 11:44:06 -0400	[thread overview]
Message-ID: <44D219C6.3090505@steeleye.com> (raw)

Description:
-----------
When a filesystem sends a barrier write down to raid1, raid1 tries to 
pass the write down to its component devices. However, if one or more of 
the devices return EOPNOTSUPP, it means that they do not support 
barriers, and raid1 must strip the barrier out of the write and retry 
it. Now, this all works fine in most cases. However, in a raid1 mirror 
with write-behind enabled, we end up getting a corrupt bio sent down for 
the write retry. The bug is in the following code (raid1.c, line 1409):

} else if (test_bit(R1BIO_BarrierRetry, &r1_bio->state)) {
      /* some requests in the r1bio were BIO_RW_BARRIER
       * requests which failed with -EOPNOTSUPP.  Hohumm..
       * Better resubmit without the barrier.
       * We know which devices to resubmit for, because
       * all others have had their bios[] entry cleared.
       * We already have a nr_pending reference on these rdevs.
       */
      int i;
      clear_bit(R1BIO_BarrierRetry, &r1_bio->state);
      clear_bit(R1BIO_Barrier, &r1_bio->state);
      for (i=0; i < conf->raid_disks; i++)
          if (r1_bio->bios[i])
              atomic_inc(&r1_bio->remaining);
      for (i=0; i < conf->raid_disks; i++)
          if (r1_bio->bios[i]) {
              struct bio_vec *bvec;
              int j;

              bio = bio_clone(r1_bio->master_bio, GFP_NOIO);


The r1_bio->master_bio may already have had end_io() called and been 
freed by the time we bio_clone() it. This results in an invalid bio 
being sent down to one (or more) of the raid1 components. In my testing, 
the original 4K barrier write ended up being a zero length write when it 
was retried.

Solution:
--------
Can we just bio_alloc instead of cloning? I think we have all the sector 
and offset data in the r1_bio. We could then bio_add_page() the pages 
into the new bio, possibly? Or could we just use the original bio that 
failed, being careful to clean up fields (I think we used to do this but 
Neil changed the behavior because of some problems).

--
Paul

             reply	other threads:[~2006-08-03 15:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-03 15:44 Paul Clements [this message]
2006-08-04  1:47 ` [BUG] raid1: barrier retry does not work correctly with write-behind Neil Brown
2006-08-04  3:09   ` Paul Clements
2006-08-04  3:35     ` Paul Clements

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44D219C6.3090505@steeleye.com \
    --to=paul.clements@steeleye.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).