From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, djbw@fb.com
Subject: Re: [patch 3/3] raid5: relieve lock contention in get_active_stripe()
Date: Wed, 4 Sep 2013 16:41:32 +1000 [thread overview]
Message-ID: <20130904164132.177701e0@notabene.brown> (raw)
In-Reply-To: <20130903070228.GA25041@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 3766 bytes --]
On Tue, 3 Sep 2013 15:02:28 +0800 Shaohua Li <shli@kernel.org> wrote:
> On Tue, Sep 03, 2013 at 04:08:58PM +1000, NeilBrown wrote:
> > On Wed, 28 Aug 2013 14:39:53 +0800 Shaohua Li <shli@kernel.org> wrote:
> >
> > > On Wed, Aug 28, 2013 at 02:32:52PM +1000, NeilBrown wrote:
> > > > On Tue, 27 Aug 2013 16:53:30 +0800 Shaohua Li <shli@kernel.org> wrote:
> > > >
> > > > > On Tue, Aug 27, 2013 at 01:17:52PM +1000, NeilBrown wrote:
> > > >
> > > > >
> > > > > > Then get_active_stripe wouldn't need to worry about device_lock at all and
> > > > > > would only need to get the hash lock for the particular sector. That should
> > > > > > make it a lot simpler.
> > > > >
> > > > > did you mean get_active_stripe() doesn't need device_lock for any code path?
> > > > > How could it be safe? device_lock still protects something like handle_list,
> > > > > delayed_list, which release_stripe() will use while a get_active_stripe can run
> > > > > concurrently.
> > > >
> > > > Yes you will still need device_lock to protect list_del_init(&sh->lru),
> > > > as well as the hash lock.
> > > > Do you need device_lock anywhere else in there?
> > >
> > > That's what I mean. So I need get both device_lock and hash_lock. To not
> > > deadlock, I need release hash_lock and relock device_lock/hash_lock. Since I
> > > release lock, I need recheck if I can find the stripe in hash again. So the
> > > seqcount locking doesn't simplify things here. I thought the seqlock only fixes
> > > one race. Did I miss anything?
> >
> > Can you order the locks so that you take the hash_lock first, then the
> > device_lock? That would be a lot simpler.
>
> Looks impossible. For example, in handle_active_stripes() we release several
> stripes, we can't take hash_lock first.
"impossible" just takes a little longer :-)
do_release_stripe gets called with only device_lock held. It gets passed an
(initially) empty list_head too.
If it wants to add the stripe to an inactive list it puts it on the given
list_head instead.
release_stripe(), after calling do_release_stripe() calls some function to
grab the appropriate hash_lock for each stripe in the list_head and add it
to that inactive list.
release_stripe_list() might collect some stripes from from __release_stripe
that need to go on an inactive list. It arranges for them to be put on the
right list, with the right lock, next time device_lock is dropped. That
might be in handle_active_stripes()
activate_bit_delay might similarly collect stripes, which are handled the
same way as those collected by release_stripe_list.
etc.
i.e. the hash_locks protect the various inactive lists. device_lock protects
all the others. If we need to add something to an inactive list while
holding device_lock we delay until device_lock can be dropped.
>
> > > I saw your tree only has seqcount_write lock in one place, but there are still
> > > other places which changing quiesce, degraded. I thought we still need lock all
> > > locks like what I did.
> >
> > Can you be specific? I thought I had convinced my self that I covered
> > everything that was necessary, but I might have missed something.
>
> For example, raid5_quiesce() will change quiesce which get_active_stripe() will
> use. So my point is get_active_stripe() still need get device_lock. Appears you
> agree get_active_stripe() need get device_lock. Maybe I confused your
> comments.
raid5_quiesce might reasonably take all of the hash_locks and then the
device_lock - it is expected to be a rare event and can afford to be heavy
handed.
get_active_stripe() should only take device_lock for list_del_init(&sh->lru).
What else have I missed?
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-09-04 6:41 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-12 2:24 [patch 0/3] raid5: relieve lock contention of get_active_stripe() Shaohua Li
2013-08-12 2:24 ` [patch 1/3] raid5: rename stripe_hash() Shaohua Li
2013-08-12 2:24 ` [patch 2/3] wait: add wait_event_cmd() Shaohua Li
2013-08-12 2:24 ` [patch 3/3] raid5: relieve lock contention in get_active_stripe() Shaohua Li
2013-08-27 3:17 ` NeilBrown
2013-08-27 8:53 ` Shaohua Li
2013-08-28 4:32 ` NeilBrown
2013-08-28 6:39 ` Shaohua Li
2013-09-03 6:08 ` NeilBrown
2013-09-03 7:02 ` Shaohua Li
2013-09-04 6:41 ` NeilBrown [this message]
2013-09-05 5:40 ` Shaohua Li
2013-09-05 6:29 ` NeilBrown
2013-09-05 9:18 ` Shaohua Li
2013-09-09 4:33 ` Shaohua Li
2013-09-10 1:13 ` NeilBrown
2013-09-10 2:35 ` Shaohua Li
2013-09-10 4:06 ` NeilBrown
2013-09-10 4:24 ` Shaohua Li
2013-09-10 5:20 ` NeilBrown
2013-09-10 6:59 ` Shaohua Li
2013-09-10 7:28 ` NeilBrown
2013-09-10 7:37 ` Shaohua Li
2013-09-11 1:34 ` NeilBrown
2013-09-12 1:55 ` Shaohua Li
2013-09-12 5:38 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130904164132.177701e0@notabene.brown \
--to=neilb@suse.de \
--cc=djbw@fb.com \
--cc=linux-raid@vger.kernel.org \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.