From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely.
Date: Wed, 27 May 2015 10:36:56 +1000 [thread overview]
Message-ID: <20150527103656.6591e6f3@notabene.brown> (raw)
In-Reply-To: <20150527001005.GA106894@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 6431 bytes --]
On Tue, 26 May 2015 17:10:05 -0700 Shaohua Li <shli@kernel.org> wrote:
> On Wed, May 27, 2015 at 09:34:51AM +1000, NeilBrown wrote:
> > On Wed, 27 May 2015 08:35:32 +1000 NeilBrown <neilb@suse.de> wrote:
> >
> > > Could you please review and possibly test the patch below?
> > >
> >
> > well... that patch had a fairly obvious double-lock bug.
> > Try this one.
> > (oh, just saw your email that you spotted the lock bug :-)
> >
> >
> > NeilBrown
> >
> > From: NeilBrown <neilb@suse.de>
> > Date: Wed, 27 May 2015 08:43:45 +1000
> > Subject: [PATCH] md/raid5: close race between STRIPE_BIT_DELAY and batching.
> >
> > The first time a write is added to a stripe, we need to set the
> > bitmap bits (if a bitmap is active).
> > While doing that the stripe is not locked and other writes could
> > be added and then the stripe could be added to a batch.
> > Once it has entered the batch it is too large to set STRIPE_BIT_DELAY
> > as the batch head has taken over when the stripe will be written.
> >
> > We cannot hold the spinlock while adding the bitmap bit,
> > so introduce a new stripe_head flag 'STRIPE_BITMAP_PENDING' which
> > indicates that adding to the bitmap is pending. This prevents
> > the stripe from being added to a batch.
> >
> > Only the first thread to add a write to a stripe can set this bit,
> > so it is safe for it to clear it again when it is done.
> >
> > Reported-by: Shaohua Li <shli@kernel.org>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 73b5376dad3b..dae587ecdf71 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -760,6 +760,7 @@ static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2)
> > static bool stripe_can_batch(struct stripe_head *sh)
> > {
> > return test_bit(STRIPE_BATCH_READY, &sh->state) &&
> > + !test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
> > is_full_stripe_write(sh);
> > }
> >
> > @@ -3007,14 +3008,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
> > pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
> > (unsigned long long)(*bip)->bi_iter.bi_sector,
> > (unsigned long long)sh->sector, dd_idx);
> > - spin_unlock_irq(&sh->stripe_lock);
> >
> > if (conf->mddev->bitmap && firstwrite) {
> > + /* Cannot hold spinlock over bitmap_startwrite,
> > + * but must ensure this isn't added to a batch until
> > + * we have added to the bitmap and set bm_seq.
> > + * So set STRIPE_BITMAP_PENDING to prevent
> > + * batching.
> > + * Only the first thread to add a write to a stripe
> > + * can set this bit, so we "own" it.
> > + */
> > + WARN_ON(test_bit(STRIPE_BITMAP_PENDING, &sh->state));
> I keep hitting this. the firstwrite is set for every device.
>
I wonder why I'm not.... maybe because I'm using /dev/loop devices and they
behave a bit differently.
Of course 'firstwrite' is not per-stripe, it is per-device-in-a-stripe.
So it will be set for every device.
Which is rather pointless really, we only need to set the bitmap once for the
whole stripe. Maybe it was easier the other way.
Could you try this?
Thanks!
NeilBrown
From: NeilBrown <neilb@suse.de>
Date: Wed, 27 May 2015 08:43:45 +1000
Subject: [PATCH] md/raid5: close race between STRIPE_BIT_DELAY and batching.
When we add a write to a stripe we need to make sure the bitmap
bit is set. While doing that the stripe is not locked so it could
be added to a batch after which further changes to STRIPE_BIT_DELAY
and ->bm_seq are ineffective.
So we need to hold off adding to a stripe until bitmap_startwrite has
completed at least once, and we need to avoid further changes to
STRIPE_BIT_DELAY once the stripe has been added to a batch.
If a bitmap_startwrite() completes after the stripe was added to a
batch, it will not have set the bit, only incremented a counter, so no
extra delay of the stripe is needed.
Reported-by: Shaohua Li <shli@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 73b5376dad3b..5d5ce97c2210 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -760,6 +760,7 @@ static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2)
static bool stripe_can_batch(struct stripe_head *sh)
{
return test_bit(STRIPE_BATCH_READY, &sh->state) &&
+ !test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
is_full_stripe_write(sh);
}
@@ -3007,14 +3008,32 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
(unsigned long long)(*bip)->bi_iter.bi_sector,
(unsigned long long)sh->sector, dd_idx);
- spin_unlock_irq(&sh->stripe_lock);
if (conf->mddev->bitmap && firstwrite) {
+ /* Cannot hold spinlock over bitmap_startwrite,
+ * but must ensure this isn't added to a batch until
+ * we have added to the bitmap and set bm_seq.
+ * So set STRIPE_BITMAP_PENDING to prevent
+ * batching.
+ * If multiple add_stripe_bio() calls race here they
+ * much all set STRIPE_BITMAP_PENDING. So only the first one
+ * to complete "bitmap_startwrite" gets to set
+ * STRIPE_BIT_DELAY. This is important as once a stripe
+ * is added to a batch, STRIPE_BIT_DELAY cannot be changed
+ * any more.
+ */
+ set_bit(STRIPE_BITMAP_PENDING, &sh->state);
+ spin_unlock_irq(&sh->stripe_lock);
bitmap_startwrite(conf->mddev->bitmap, sh->sector,
STRIPE_SECTORS, 0);
- sh->bm_seq = conf->seq_flush+1;
- set_bit(STRIPE_BIT_DELAY, &sh->state);
+ spin_lock_irq(&sh->stripe_lock);
+ clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
+ if (!sh->batch_head) {
+ sh->bm_seq = conf->seq_flush+1;
+ set_bit(STRIPE_BIT_DELAY, &sh->state);
+ }
}
+ spin_unlock_irq(&sh->stripe_lock);
if (stripe_can_batch(sh))
stripe_add_to_batch_list(conf, sh);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index d7b2bc8b756f..02c3bf8fbfe7 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -337,6 +337,9 @@ enum {
STRIPE_ON_RELEASE_LIST,
STRIPE_BATCH_READY,
STRIPE_BATCH_ERR,
+ STRIPE_BITMAP_PENDING, /* Being added to bitmap, don't add
+ * to batch yet.
+ */
};
#define STRIPE_EXPAND_SYNC_FLAGS \
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
next prev parent reply other threads:[~2015-05-27 0:36 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-22 5:30 [PATCH 0/8] Fixes for md/raid5 stripe batching code NeilBrown
2015-05-22 5:30 ` [PATCH 8/8] md/raid5: break stripe-batches when the array has failed NeilBrown
2015-05-22 5:30 ` [PATCH 3/8] md/raid5: remove condition test from check_break_stripe_batch_list NeilBrown
2015-05-22 5:30 ` [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely NeilBrown
2015-05-22 23:44 ` Shaohua Li
2015-05-23 0:26 ` NeilBrown
2015-05-26 18:16 ` Shaohua Li
2015-05-26 22:35 ` NeilBrown
2015-05-26 23:21 ` Shaohua Li
2015-05-26 23:34 ` NeilBrown
2015-05-27 0:10 ` Shaohua Li
2015-05-27 0:36 ` NeilBrown [this message]
2015-05-22 5:30 ` [PATCH 6/8] md/raid5: be more selective about distributing flags across batch NeilBrown
2015-05-22 5:30 ` [PATCH 1/8] md/raid5: ensure whole batch is delayed for all required bitmap updates NeilBrown
2015-05-22 5:30 ` [PATCH 7/8] md/raid5: call break_stripe_batch_list from handle_stripe_clean_event NeilBrown
2015-05-22 5:30 ` [PATCH 4/8] md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list NeilBrown
2015-05-22 5:30 ` [PATCH 5/8] md/raid5: add handle_flags arg to break_stripe_batch_list NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150527103656.6591e6f3@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).