From: J. Bruce Fields <bfields@fieldses.org>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked
Date: Wed, 5 Jun 2013 08:59:06 -0400
Message-ID: <20130605125906.GC24193@fieldses.org>
In-Reply-To: <20130605083859.72c855cd@corrin.poochiereds.net>
On Wed, Jun 05, 2013 at 08:38:59AM -0400, Jeff Layton wrote:
> On Wed, 5 Jun 2013 08:24:32 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Wed, Jun 05, 2013 at 07:38:22AM -0400, Jeff Layton wrote:
> > > On Tue, 4 Jun 2013 17:58:39 -0400
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > >
> > > > On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote:
> > > > > Currently, when there is a lot of lock contention the kernel spends an
> > > > > inordinate amount of time taking blocked locks off of the global
> > > > > blocked_list and then putting them right back on again. When all of this
> > > > > code was protected by a single lock, then it didn't matter much, but now
> > > > > it means a lot of file_lock_lock thrashing.
> > > > >
> > > > > Optimize this a bit by deferring the removal from the blocked_list until
> > > > > we're either applying or cancelling the lock. By doing this, and using a
> > > > > lockless list_empty check, we can avoid taking the file_lock_lock in
> > > > > many cases.
> > > > >
> > > > > Because the fl_link check is lockless, we must ensure that only the task
> > > > > that "owns" the request manipulates the fl_link. Also, with this change,
> > > > > it's possible that we'll see an entry on the blocked_list that has a
> > > > > NULL fl_next pointer. In that event, just ignore it and continue walking
> > > > > the list.
> > > >
> > > > OK, that sounds safe as in it shouldn't crash, but does the deadlock
> > > > detection still work, or can it miss loops?
> > > >
> > > > Those locks that are temporarily NULL would previously not have been on
> > > > the list at all, OK, but... I'm having trouble reasoning about how this
> > > > works now.
> > > >
> > > > > Previously a single lock was held uninterrupted across
> > > > posix_locks_deadlock and locks_insert_block() which guaranteed we
> > > > shouldn't be adding a loop, is that still true?
> > > >
> > > > --b.
> > > >
> > >
> > > I had thought it was when I originally looked at this, but now that I
> > > consider it again I think you may be correct and that there are possible
> > > races here. Since we might end up reblocking behind a different lock
> > > without taking the global spinlock we could flip to blocking behind a
> > > different lock such that a loop is created if you had a complex (>2)
> > > chain of locks.
> > >
> > > I think I'm going to have to drop this approach and instead make it so
> > > that the deadlock detection and insertion into the global blocker
> > > list/hash are atomic.
> >
> > Right. Once you drop the lock you can no longer be sure that what you
> > learned about the file-lock graph stays true.
> >
> > > Ditto for locks_wake_up_blocks on posix locks and
> > > taking the entries off the list/hash.
> >
> > Here I'm not sure what you mean.
> >
>
> Basically, I mean that rather than setting the fl_next pointer to NULL
> while holding only the inode lock and then ignoring those locks in the
> deadlock detection code, we should additionally take the global lock in
> locks_wake_up_blocks too and take the blocked locks off the global list
> and the i_flock list at the same time.

OK, thanks, got it. I have a hard time thinking about that.... But yes
it bothers me that the deadlock detection code could see an out-of-date
value of fl_next, and I can't convince myself that this wouldn't result
in false positives or false negatives.

> That actually might not be completely necessary, but it'll make the
> logic clearer and easier to understand and probably won't hurt
> performance too much. Again, I'll need to do some perf testing to be
> sure.

OK!

--b.
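
[Illustration, not part of the original message.] The point agreed on above, that the deadlock check and the insertion into the blocked list have to happen under one lock, can be sketched in userspace C. This is only a toy model under stated assumptions: `struct waiter`, `graph_lock`, `would_deadlock()`, `block_on()` and `unblock()` are hypothetical names, not the kernel's; `next` stands in for `fl_next`; and each waiter has a single outgoing edge, which is simpler than the owner-based walk the real deadlock detection does.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked lock request (not struct file_lock). */
struct waiter {
	struct waiter *next;	/* request we are blocked behind, or NULL */
};

/* Models one global lock covering the whole waits-for graph. */
static pthread_mutex_t graph_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Caller must hold graph_lock.
 * Returns 1 if blocking w behind blocker would close a cycle.
 * The walk terminates because block_on() never inserts a cycle.
 */
static int would_deadlock(const struct waiter *w, const struct waiter *blocker)
{
	const struct waiter *p;

	for (p = blocker; p != NULL; p = p->next)
		if (p == w)
			return 1;
	return 0;
}

/*
 * Check and insert inside one critical section, so nothing learned
 * about the graph during the check can go stale before the insert.
 */
static int block_on(struct waiter *w, struct waiter *blocker)
{
	int ret = 0;

	pthread_mutex_lock(&graph_lock);
	if (would_deadlock(w, blocker))
		ret = -EDEADLK;
	else
		w->next = blocker;
	pthread_mutex_unlock(&graph_lock);
	return ret;
}

/* Removal also takes the global lock, so walkers never see a stale edge. */
static void unblock(struct waiter *w)
{
	pthread_mutex_lock(&graph_lock);
	w->next = NULL;
	pthread_mutex_unlock(&graph_lock);
}
```

With three waiters a, b and c, block_on(&a, &b) and block_on(&b, &c) succeed, and block_on(&c, &a) is refused with -EDEADLK because the walk from a reaches c. If the check and the store to `next` were done under different locks, another task could re-point an edge between the two steps, which is the >2 chain race described earlier in the thread.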