From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>, yuanhan.liu@linux.intel.com
Cc: linux-raid@vger.kernel.org, artur.paszkiewicz@intel.com
Subject: Re: live lock regression in raid5 reshape
Date: Fri, 26 Feb 2016 09:01:52 +1100
Message-ID: <87d1rkla9b.fsf@notabene.neil.brown.name>
In-Reply-To: <20160225190712.GA2390@kernel.org>
On Fri, Feb 26 2016, Shaohua Li wrote:
> Hi,
>
> I hit a live lock in reshape test, which is introduced by:
>
> e9e4c377e2f563892c50d1d093dd55c7d518fc3d ("md/raid5: per hash value and exclusive wait_for_stripe")
>
> The problem is that get_active_stripe() waits on conf->wait_for_stripe[hash]. Assume
> hash is 0. My test releases stripes in this order:
> - release all stripes with hash 0
> - get_active_stripe() still sleeps, since active_stripes > max_nr_stripes * 3 / 4
> - release all stripes with hash other than 0; active_stripes becomes 0
> - get_active_stripe() still sleeps, since nobody wakes up wait_for_stripe[0]
>
> The system livelocks. The problem is that active_stripes isn't a per-hash count.
> Reverting the patch makes the livelock go away.
>
> I haven't come up with a solution yet other than reverting the patch. Making
> active_stripes per-hash is a candidate, but I'm not sure whether there is a
> thundering-herd problem, because each hash will have fewer stripes. On the
> other hand, I'm wondering if the patch still makes sense now: the commit log
> says the issue happens with a limited number of stripes, but the stripe count
> is now increased automatically.
>
->active_stripes does seem to be the core of the problem here.
The purpose of the comparison with max_nr_stripes*3/4 was to encourage
requests to be handled in large batches rather than dribbling out one at
a time. That should encourage the creation of full stripe writes. I
think it does (or at least: did) help but we know it isn't perfect.
There might be a better way.
If two threads are each writing full stripes of data, we would prefer
that one of them allocate a full set of stripe_heads while the other gets
nothing for a little while, rather than each getting half the number of
stripe_heads it needs.
Possibly we could impose this restriction only on the first
stripe_head in a stripe (i.e. the start of a chunk). That should have
much the same effect but wouldn't cause the problem you are seeing.
Certainly backing this out is simplest (particularly if you want to send
it to -stable). I suspect it would be best to ultimately keep the
hashed wait queues if we can avoid the livelock.
thanks,
NeilBrown