Re: performance delta after VFS i_mutex=>i_rwsem conversion

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	"Chen, Tim C" <tim.c.chen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Jason Low <jason.low2@hp.com>,
	Michel Lespinasse <walken@google.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Waiman Long <waiman.long@hp.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Mon, 6 Jun 2016 23:07:53 +0100	[thread overview]
Message-ID: <20160606220753.GG14480@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFw6oTECJ-GvYwkz4+RBgBoKiwPXfygOCZzX1V=KUEMG-g@mail.gmail.com>

On Mon, Jun 06, 2016 at 02:46:44PM -0700, Linus Torvalds wrote:

> Let's look at smaller changes first.
> 
> In particular, why the f*ck do we take "next->d_lock" at all, much less twice?

See upthread re "low-hanging fruit".

> Do we really need that? Both of them seem bogus:
> 
>  - the first one only protects the "simple_positive()" test.
> 
>     That seems stupid. We hold the parent d_lock _and_ we hold the
> parent inode lock for reading, how the hell is simple_positive() going
> to change? Yeah, yeah, *maybe* we could catch a lookup that just
> happens to add the inode field in process, but it's not a race we even
> care about.

Can't happen - it's ramfs and lookups there never end up adding a positive
entry.  So I don't believe that READ_ONCE() or anything of that sort would
be needed.  All transitions from negative to positive happen under exclusive
lock on parent, which gives us all barriers we need.  Transitions from
hashed positive to negative or unhashed also happen only under the same
exclusive lock on parent, which takes care of going in other direction.

> If we see the inode being non-NULL, it will now *stay*
> non-NULL, and we already depend on that (that "d_inode(next)" is then
> done without the lock held.

Like I said - it's stable.

>  - the second one only protects "list_move()" of the cursor. But since
> it's the child list, the "next->d_lock" thing ends up being
> irrelevant. It's the parent dentry lock we need to hold, nothing else.
> Not the "next" one.
> 
> so I don't see the point of half the d_lock games we play.

Yes.

> And the thing about spinlock contention: having *nested* spinlocks be
> contented turns contention into an exponential thing. I really suspect
> that if we can just remove the nested spinlock, the dentry->d_lock
> contention will go down by a huge amount, because then you completely
> remove the "wait on lock while holding another lock" thing, which is
> what tends to really suck.

True in general, but here we really do a lot under that ->d_lock - all
list traversals are under it.  So I suspect that contention on nested
lock is not an issue in that particular load.  It's certainly a separate
commit, so we'll see how much does it give on its own, but I doubt that
it'll be anywhere near enough.

next prev parent reply	other threads:[~2016-06-06 22:08 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 20:00 performance delta after VFS i_mutex=>i_rwsem conversion Dave Hansen
2016-06-06 20:46 ` Linus Torvalds
2016-06-06 21:13   ` Waiman Long
2016-06-06 21:20     ` Linus Torvalds
2016-06-07  3:22       ` Valdis.Kletnieks
2016-06-07 15:22         ` Waiman Long
2016-06-08  8:58     ` Ingo Molnar
2016-06-09 10:25       ` Ingo Molnar
2016-06-09 18:14         ` Dave Hansen
2016-06-09 20:10           ` Chen, Tim C
2016-06-06 21:15   ` Al Viro
2016-06-06 21:46     ` Linus Torvalds
2016-06-06 22:07       ` Al Viro [this message]
2016-06-06 23:50         ` Linus Torvalds
2016-06-06 23:59           ` Linus Torvalds
2016-06-07  0:29             ` Linus Torvalds
2016-06-07  0:40           ` Al Viro
2016-06-07  0:44             ` Al Viro
2016-06-07  0:58             ` Al Viro
2016-06-07  0:58             ` Linus Torvalds
2016-06-07  1:19               ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160606220753.GG14480@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=dave.hansen@intel.com \
    --cc=dbueso@suse.de \
    --cc=jason.low2@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hp.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox