All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely.
Date: Wed, 27 May 2015 09:34:51 +1000	[thread overview]
Message-ID: <20150527093451.0c83be44@notabene.brown> (raw)
In-Reply-To: <20150527083532.32486e08@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 3272 bytes --]

On Wed, 27 May 2015 08:35:32 +1000 NeilBrown <neilb@suse.de> wrote:

> Could you please review and possibly test the patch below?
> 

well... that patch had a fairly obvious double-lock bug.
Try this one.
(oh, just saw your email that you spotted the lock bug :-)


NeilBrown

From: NeilBrown <neilb@suse.de>
Date: Wed, 27 May 2015 08:43:45 +1000
Subject: [PATCH] md/raid5: close race between STRIPE_BIT_DELAY and batching.

The first time a write is added to a stripe, we need to set the
bitmap bits (if a bitmap is active).
While doing that the stripe is not locked and other writes could
be added and then the stripe could be added to a batch.
Once it has entered the batch it is too large to set STRIPE_BIT_DELAY
as the batch head has taken over when the stripe will be written.

We cannot hold the spinlock while adding the bitmap bit,
so introduce a new stripe_head flag 'STRIPE_BITMAP_PENDING' which
indicates that adding to the bitmap is pending.  This prevents
the stripe from being added to a batch.

Only the first thread to add a write to a stripe can set this bit,
so it is safe for it to clear it again when it is done.

Reported-by: Shaohua Li <shli@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 73b5376dad3b..dae587ecdf71 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -760,6 +760,7 @@ static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2)
 static bool stripe_can_batch(struct stripe_head *sh)
 {
 	return test_bit(STRIPE_BATCH_READY, &sh->state) &&
+		!test_bit(STRIPE_BITMAP_PENDING, &sh->state) &&
 		is_full_stripe_write(sh);
 }
 
@@ -3007,14 +3008,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
 		(unsigned long long)(*bip)->bi_iter.bi_sector,
 		(unsigned long long)sh->sector, dd_idx);
-	spin_unlock_irq(&sh->stripe_lock);
 
 	if (conf->mddev->bitmap && firstwrite) {
+		/* Cannot hold spinlock over bitmap_startwrite,
+		 * but must ensure this isn't added to a batch until
+		 * we have added to the bitmap and set bm_seq.
+		 * So set STRIPE_BITMAP_PENDING to prevent
+		 * batching.
+		 * Only the first thread to add a write to a stripe
+		 * can set this bit, so we "own" it.
+		 */
+		WARN_ON(test_bit(STRIPE_BITMAP_PENDING, &sh->state));
+		set_bit(STRIPE_BITMAP_PENDING, &sh->state);
+		spin_unlock_irq(&sh->stripe_lock);
 		bitmap_startwrite(conf->mddev->bitmap, sh->sector,
 				  STRIPE_SECTORS, 0);
+		spin_lock_irq(&sh->stripe_lock);
+		clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
 		sh->bm_seq = conf->seq_flush+1;
 		set_bit(STRIPE_BIT_DELAY, &sh->state);
 	}
+	spin_unlock_irq(&sh->stripe_lock);
 
 	if (stripe_can_batch(sh))
 		stripe_add_to_batch_list(conf, sh);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index d7b2bc8b756f..02c3bf8fbfe7 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -337,6 +337,9 @@ enum {
 	STRIPE_ON_RELEASE_LIST,
 	STRIPE_BATCH_READY,
 	STRIPE_BATCH_ERR,
+	STRIPE_BITMAP_PENDING,	/* Being added to bitmap, don't add
+				 * to batch yet.
+				 */
 };
 
 #define STRIPE_EXPAND_SYNC_FLAGS \

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

  parent reply	other threads:[~2015-05-26 23:34 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-22  5:30 [PATCH 0/8] Fixes for md/raid5 stripe batching code NeilBrown
2015-05-22  5:30 ` [PATCH 4/8] md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list NeilBrown
2015-05-22  5:30 ` [PATCH 8/8] md/raid5: break stripe-batches when the array has failed NeilBrown
2015-05-22  5:30 ` [PATCH 5/8] md/raid5: add handle_flags arg to break_stripe_batch_list NeilBrown
2015-05-22  5:30 ` [PATCH 7/8] md/raid5: call break_stripe_batch_list from handle_stripe_clean_event NeilBrown
2015-05-22  5:30 ` [PATCH 6/8] md/raid5: be more selective about distributing flags across batch NeilBrown
2015-05-22  5:30 ` [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely NeilBrown
2015-05-22 23:44   ` Shaohua Li
2015-05-23  0:26     ` NeilBrown
2015-05-26 18:16       ` Shaohua Li
2015-05-26 22:35         ` NeilBrown
2015-05-26 23:21           ` Shaohua Li
2015-05-26 23:34           ` NeilBrown [this message]
2015-05-27  0:10             ` Shaohua Li
2015-05-27  0:36               ` NeilBrown
2015-05-22  5:30 ` [PATCH 1/8] md/raid5: ensure whole batch is delayed for all required bitmap updates NeilBrown
2015-05-22  5:30 ` [PATCH 3/8] md/raid5: remove condition test from check_break_stripe_batch_list NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150527093451.0c83be44@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.