linux-raid.vger.kernel.org archive mirror
* live lock regression in raid5 reshape
@ 2016-02-25 19:07 Shaohua Li
  2016-02-25 22:01 ` NeilBrown
  0 siblings, 1 reply; 2+ messages in thread
From: Shaohua Li @ 2016-02-25 19:07 UTC
  To: yuanhan.liu, neilb; +Cc: linux-raid, artur.paszkiewicz

Hi,

I hit a live lock in a reshape test, which was introduced by:

e9e4c377e2f563892c50d1d093dd55c7d518fc3d ("md/raid5: per hash value and exclusive wait_for_stripe")

The problem is that get_active_stripe waits on conf->wait_for_stripe[hash].
Assume hash is 0. My test releases stripes in this order:
- release all stripes with hash 0
- get_active_stripe still sleeps since active_stripes > max_nr_stripes * 3 / 4
- release all stripes with hash other than 0. active_stripes becomes 0
- get_active_stripe still sleeps, since nobody wakes up wait_for_stripe[0]

The system live locks. The problem is that active_stripes is a global count,
not a per-hash count, while the wakeups are per-hash. Reverting the patch makes
the live lock go away.
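
The sequence above can be reproduced with a small user-space model. This is
only a sketch, not kernel code: the class, its fields, and the constants
NR_HASH and MAX_NR_STRIPES are illustrative assumptions, and the wait condition
is reduced to the single check described in this mail.

```python
# Toy model of the lost wakeup; NOT kernel code.
# NR_HASH and MAX_NR_STRIPES are assumed, illustrative values.
NR_HASH = 8
MAX_NR_STRIPES = 256


class StripeCacheModel:
    def __init__(self, waiter_hash=0):
        # Every stripe starts active, spread evenly over the hash buckets.
        self.active_per_hash = [MAX_NR_STRIPES // NR_HASH] * NR_HASH
        self.active_stripes = MAX_NR_STRIPES  # one global counter
        self.waiter_hash = waiter_hash        # queue the waiter sleeps on
        self.waiter_asleep = True

    def can_proceed(self):
        # Simplified wait condition: below 3/4 of the cache in use.
        return self.active_stripes < MAX_NR_STRIPES * 3 // 4

    def release_all(self, h):
        # Releasing stripes lowers the global counter ...
        self.active_stripes -= self.active_per_hash[h]
        self.active_per_hash[h] = 0
        # ... but only the queue of the released hash is woken up.
        if h == self.waiter_hash and self.waiter_asleep:
            if self.can_proceed():
                self.waiter_asleep = False


def run_scenario():
    m = StripeCacheModel(waiter_hash=0)
    m.release_all(0)             # woken, but count is still too high
    for h in range(1, NR_HASH):  # nobody wakes queue 0 from here on
        m.release_all(h)
    return m


if __name__ == "__main__":
    m = run_scenario()
    print(m.active_stripes, m.can_proceed(), m.waiter_asleep)
```

At the end of the run the wait condition is satisfied, yet the waiter is still
asleep, because the only wakeup it ever received arrived while the global count
was still above the threshold.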

I haven't come up with a solution yet other than reverting the patch. Making
active_stripes per-hash is a candidate, but I'm not sure whether there would be
a thundering herd problem, because each hash will have fewer stripes. On the
other hand, I'm wondering if the patch still makes sense. The commit log says
the issue happens with a limited number of stripes, but the stripe count is now
increased automatically.
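
For illustration, the per-hash candidate can be sketched in the same kind of
toy model. Everything here is an assumption made for the sketch (the constants,
the even split of the limit across buckets, the function name); it is not a
proposed or tested kernel change.

```python
# Toy model of per-hash accounting; NOT kernel code.
# NR_HASH, MAX_NR_STRIPES and the even per-bucket split are assumptions.
NR_HASH = 8
MAX_NR_STRIPES = 256
PER_HASH_MAX = MAX_NR_STRIPES // NR_HASH


def per_hash_scenario(waiter_hash=0):
    # Each bucket keeps its own active count instead of one global counter.
    active = [PER_HASH_MAX] * NR_HASH
    asleep = True
    for h in range(NR_HASH):
        active[h] = 0  # release every stripe in bucket h
        # The wakeup and the condition now refer to the same bucket, so the
        # release that wakes the waiter is also the one that satisfies it.
        if h == waiter_hash and asleep and active[h] < PER_HASH_MAX * 3 // 4:
            asleep = False
    return asleep, active


if __name__ == "__main__":
    print(per_hash_scenario(waiter_hash=0))
```

In this model the waiter always wakes, because the counter it checks and the
queue it sleeps on belong to the same hash. The per-bucket threshold is much
smaller than the global one, though, which is where the concern about waiters
being woken more often comes from.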

Yuanhan, could you please check whether performance changes with the patch
reverted on the latest kernel?

Thanks,
Shaohua
