From: NeilBrown <neilb@suse.de>
To: Paul Clements <paul.clements@us.sios.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [BUG 2.6.32] md/raid1: barrier disabling does not work correctly in all cases
Date: Fri, 21 Jan 2011 07:25:15 +1100 [thread overview]
Message-ID: <20110121072515.07dc2b74@notabene.brown> (raw)
In-Reply-To: <AANLkTikGsB=ZdA-TpjC+npqW-TM9NjwZkWpjWtUEzswn@mail.gmail.com>
On Thu, 20 Jan 2011 11:52:32 -0500 Paul Clements <paul.clements@us.sios.com>
wrote:
> I'm having a problem with the 2.6.32 kernel (RHEL 6, actually) and the
> barrier disabling code in raid1.
>
> The gist of the problem is that when an fsync is done directly to
> /dev/md0 (fsck.ext4 performs an fsync), the kernel
> (blkdev_issue_flush, specifically) translates this into a zero-length
> write with-barrier-attached. raid1 of course sends this down to its
> component devices, which then return EOPNOTSUPP (in my case, a xen
> block device, which doesn't support barriers). raid1 then strips the
> BIO_RW_BARRIER off and retries the write.
>
> Apparently, a zero-length write (bio) without barrier makes some/all
> block drivers very unhappy (IDE, cciss, and xen-blkfront have been
> tested and all failed on this). I think these zero-length writes must
> normally get filtered out and discarded by filesystem/block layer
> before drivers ever see them, but in this particular case, where md is
> submitting the write directly to the underlying component driver, it
> doesn't get filtered and causes -EIO to be returned (the disk gets
> marked failed, the mirror breaks, chaos ensues...)
>
> I can do an obvious hack, which is to assume all the time that
> barriers don't work. That is, mark mddev->barriers_work as 0 or just
> disable that check:
>
> --- drivers/md/raid1.c.PRISTINE 2011-01-14 14:51:56.959809669 -0500
> +++ drivers/md/raid1.c 2011-01-20 10:17:11.001701186 -0500
> @@ -819,6 +833,7 @@ static int make_request(mddev_t *mddev,
> finish_wait(&conf->wait_barrier, &w);
> }
> - if (unlikely(!mddev->barriers_work &&
> + if ((
> bio_rw_flagged(bio, BIO_RW_BARRIER))) {
> if (rw == WRITE)
> md_write_end(mddev);
>
> That fixes things, but does Neil or someone else have an idea how best
> to fix this? Could we specifically just not retry a zero-length
> barrier write? I think that would fix it, but is kind of a hack too.
>
> I know barriers are being pulled out of the kernel, so this isn't a
> problem for recent kernels, but as 2.6.32 is a long-term support
> kernel, this may be something that others run into and will want
> fixed.
Normally the first thing that md/raid1 writes to a member device is the
metadata. This is written with a barrier write if possible. If that fails
then barriers_work is cleared, so all barrier writes from the filesystem
(empty or otherwise) will be rejected.
As you are getting an error here I assume that you are using non-persistent
metadata - correct?
Nonetheless, I think the correct fix is to add a special case for retrying a
zero-length.
Something like this maybe?
NeilBrown
Index: linux-2.6.32.orig/drivers/md/raid1.c
===================================================================
--- linux-2.6.32.orig.orig/drivers/md/raid1.c 2011-01-21 07:23:59.000000000 +1100
+++ linux-2.6.32.orig/drivers/md/raid1.c 2011-01-21 07:24:40.498354896 +1100
@@ -236,14 +236,18 @@ static void raid_end_bio_io(r1bio_t *r1_
/* if nobody has done the final endio yet, do it now */
if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
+ int err = 0;
PRINTK(KERN_DEBUG "raid1: sync end %s on sectors %llu-%llu\n",
(bio_data_dir(bio) == WRITE) ? "write" : "read",
(unsigned long long) bio->bi_sector,
(unsigned long long) bio->bi_sector +
(bio->bi_size >> 9) - 1);
- bio_endio(bio,
- test_bit(R1BIO_Uptodate, &r1_bio->state) ? 0 : -EIO);
+ if (test_bit(R1BIO_BarrierRetry, &r1_bio->state))
+ err = -EOPNOTSUPP;
+ else if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
+ err = -EIO;
+ bio_endio(bio, err);
}
free_r1bio(r1_bio);
}
@@ -1607,6 +1611,11 @@ static void raid1d(mddev_t *mddev)
*/
int i;
const bool do_sync = bio_rw_flagged(r1_bio->master_bio, BIO_RW_SYNCIO);
+ if (r1_bio->master_bio->bi_size == 0) {
+ /* cannot retry empty barriers, just fail */
+ raid_end_bio_io(r1_bio);
+ continue;
+ }
clear_bit(R1BIO_BarrierRetry, &r1_bio->state);
clear_bit(R1BIO_Barrier, &r1_bio->state);
for (i=0; i < conf->raid_disks; i++)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2011-01-20 20:25 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <AANLkTimp7eGLu5UEr-wH9MVSGrQWA3PKMakW74_48gk7@mail.gmail.com>
2011-01-20 16:52 ` [BUG 2.6.32] md/raid1: barrier disabling does not work correctly in all cases Paul Clements
2011-01-20 20:25 ` NeilBrown [this message]
2011-01-20 20:44 ` Paul Clements
2011-01-26 13:55 ` Paul Clements
2011-02-01 20:45 ` Paul Clements
2011-02-02 0:32 ` NeilBrown
2011-02-02 1:04 ` Paul Clements
2011-02-24 21:04 ` Paul Clements
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110121072515.07dc2b74@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
--cc=paul.clements@us.sios.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.