From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely.
Date: Tue, 26 May 2015 11:16:47 -0700 [thread overview]
Message-ID: <20150526181647.GA38853@kernel.org> (raw)
In-Reply-To: <20150523102640.20be3fca@notabene.brown>
On Sat, May 23, 2015 at 10:26:40AM +1000, NeilBrown wrote:
> On Fri, 22 May 2015 16:44:02 -0700 Shaohua Li <shli@kernel.org> wrote:
>
> > On Fri, May 22, 2015 at 03:30:58PM +1000, NeilBrown wrote:
> > > If a stripe is a member of a batch, but not the head, it must
> > > not be handled separately from the rest of the batch.
> > >
> > > 'clear_batch_ready()' handles this requirement to some
> > > extent but not completely. If a member is passed to handle_stripe()
> > > a second time it returns '0' indicating the stripe can be handled,
> > > which is wrong.
> > > So add an extra test.
> > >
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > > drivers/md/raid5.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index c3ccefbd4fe7..9a803b735848 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -4192,9 +4192,13 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
> > >
> > > static int clear_batch_ready(struct stripe_head *sh)
> > > {
> > > + /* Return '1' if this is a member of batch, or
> > > + * '0' if it is a lone stripe or a head which can now be
> > > + * handled.
> > > + */
> > > struct stripe_head *tmp;
> > > if (!test_and_clear_bit(STRIPE_BATCH_READY, &sh->state))
> > > - return 0;
> > > + return (sh->batch_head && sh->batch_head != sh);
> > > spin_lock(&sh->stripe_lock);
> > > if (!sh->batch_head) {
> > > spin_unlock(&sh->stripe_lock);
> >
> > which case can this happen in?
>
> It definitely happens as I had reliable problems until I added this fix.
> 'retry_aligned_read()' can call handle_stripe() on any stripe at any time,
> but I doubt that would apply. I might try putting a warn-on there and see if
> it provides any hints.
>
> >
> > Patches look good. But I'm not in Fusionio any more, so can't check the
> > performance in big raid array with fast flash cards. I'm doing some tests here.
> > I hit a warning in break_stripe_batch_list, STRIPE_BIT_DELAY is set in the
> > stripe state. I'm checking the reason, but if you have thoughts I can try
> > immediately, please let me know.
>
> I got STRIPE_BIT_DELAY a few times. That was the main reason for
>
> md/raid5: ensure whole batch is delayed for all required bitmap updates.
>
> and they went away after I got that patch right.
>
> Maybe there is a race in there..
>
> If you can reproduce it, maybe WARN whenever STRIPE_BIT_DELAY gets set on a
> stripe with ->batch_head.
Ok, there is a race in add_stripe_bio(). We unlocked the stripe_lock to set the
BIT_DELAY. After the unlock, the stripe might be added to a batch,
stripe_add_to_batch_list didn't clear the bit. Holding the lock in
add_stripe_bio and checking ->batch_head again when we set the bit should fix
the issue.
And STRIPE_ON_UNPLUG_LIST and STRIPE_ON_RELEASE_LIST are set is legit in
break_stripe_batch_list(), they should be removed from the WARN_ON_ONCE().
Thanks,
Shaohua
next prev parent reply other threads:[~2015-05-26 18:16 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-22 5:30 [PATCH 0/8] Fixes for md/raid5 stripe batching code NeilBrown
2015-05-22 5:30 ` [PATCH 3/8] md/raid5: remove condition test from check_break_stripe_batch_list NeilBrown
2015-05-22 5:30 ` [PATCH 1/8] md/raid5: ensure whole batch is delayed for all required bitmap updates NeilBrown
2015-05-22 5:30 ` [PATCH 6/8] md/raid5: be more selective about distributing flags across batch NeilBrown
2015-05-22 5:30 ` [PATCH 2/8] md/raid5: Ensure a batch member is not handled prematurely NeilBrown
2015-05-22 23:44 ` Shaohua Li
2015-05-23 0:26 ` NeilBrown
2015-05-26 18:16 ` Shaohua Li [this message]
2015-05-26 22:35 ` NeilBrown
2015-05-26 23:21 ` Shaohua Li
2015-05-26 23:34 ` NeilBrown
2015-05-27 0:10 ` Shaohua Li
2015-05-27 0:36 ` NeilBrown
2015-05-22 5:30 ` [PATCH 7/8] md/raid5: call break_stripe_batch_list from handle_stripe_clean_event NeilBrown
2015-05-22 5:30 ` [PATCH 5/8] md/raid5: add handle_flags arg to break_stripe_batch_list NeilBrown
2015-05-22 5:30 ` [PATCH 8/8] md/raid5: break stripe-batches when the array has failed NeilBrown
2015-05-22 5:30 ` [PATCH 4/8] md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150526181647.GA38853@kernel.org \
--to=shli@kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.