Re: performance delta after VFS i_mutex=>i_rwsem conversion

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	"Chen, Tim C" <tim.c.chen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Jason Low <jason.low2@hp.com>,
	Michel Lespinasse <walken@google.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Waiman Long <waiman.long@hp.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Mon, 6 Jun 2016 22:15:23 +0100
Message-ID: <20160606211522.GF14480@ZenIV.linux.org.uk>
In-Reply-To: <CA+55aFxH_7wjo_BgUPK5iomWedE2=DaUZVX-yruHOWEk7OTiHQ@mail.gmail.com>

On Mon, Jun 06, 2016 at 01:46:23PM -0700, Linus Torvalds wrote:

> So my gut feel is that we do want to have the same heuristics for
> rwsems and mutexes (well, modulo possible actual semantic differences
> due to the whole shared-vs-exclusive issues).
> 
> And I also suspect that the mutexes have gotten a lot more performance
> tuning done on them, so it's likely the correct thing to try to make
> the rwsem match the mutex code rather than the other way around.
> 
> I think we had Jason and Davidlohr do mutex work last year, let's see
> if they agree on that "yes, the mutex case is the likely more tuned
> case" feeling.
> 
> The fact that your performance improves when you do that obviously
> then also validates the assumption that the mutex spinning is the
> better optimized one.

FWIW, there's another fun issue on ramfs - dcache_readdir() is doing an
obscene amount of grabbing/releasing ->d_lock and once you take the external
serialization out, parallel getdents load hits contention on *that*.
In spades.  And unlike mutex (or rwsem exclusive), contention on ->d_lock
chews a lot of cycles.  The root cause is the use of cursors - we not only
move them more than we ought to (we do that on each entry reported, rather
than once before return from dcache_readdir()), we can't traverse the real
list entries (which remain nice and stable; another low-hanging fruit is
pointless grabbing ->d_lock on those) without ->d_lock on parent.

I think I have a kinda-sorta solution, but it has a problem.  What I want
to do is
	* list_move() only once per dcache_readdir()
	* ->d_lock taken for that and only for that.
	* list_move() itself surrounded with write_seqcount_{begin,end} on
some seqcount
	* traversal to the next real entry done under rcu_read_lock in a
seqretry loop.

The only problem is where to put that seqcount (unsigned int, really).
->i_dir_seq is an obvious candidate, but that'll need careful profiling
on getdents/lookup mixes...
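
The four steps listed above can be sketched in userspace C roughly as
follows.  This is not kernel code and not the eventual patch: struct dir,
struct node, cursor_move_after(), next_real_entry() and demo_readdir() are
hypothetical names, a C11 atomic_flag stands in for ->d_lock, and a plain
atomic counter stands in for the seqcount.  There is no RCU here, so the
lockless walk is only safe as a single-threaded illustration of the
structure; real concurrent readers would need rcu_read_lock() and RCU-safe
pointer accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, shaped like the kernel's list_head. */
struct node {
	struct node *next, *prev;
	bool is_cursor;		/* cursors are skipped by readers */
	int id;
};

/* Hypothetical directory: list head, lock stand-in, sequence counter. */
struct dir {
	struct node head;
	atomic_flag lock;	/* stand-in for the parent's ->d_lock */
	atomic_uint seq;	/* stand-in for the seqcount */
};

static void dir_init(struct dir *d)
{
	d->head.next = d->head.prev = &d->head;
	d->head.is_cursor = false;
	d->head.id = -1;
	atomic_flag_clear(&d->lock);
	atomic_init(&d->seq, 0);
}

static void node_add_tail(struct dir *d, struct node *n)
{
	n->prev = d->head.prev;
	n->next = &d->head;
	d->head.prev->next = n;
	d->head.prev = n;
}

/* Writer side: one cursor move, under the lock, bracketed by seq bumps. */
static void cursor_move_after(struct dir *d, struct node *cursor,
			      struct node *pos)
{
	assert(cursor->is_cursor && cursor != pos);
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;				/* spin on the "->d_lock" */
	atomic_fetch_add(&d->seq, 1);		/* begin: count becomes odd */
	cursor->prev->next = cursor->next;	/* unlink the cursor ... */
	cursor->next->prev = cursor->prev;
	cursor->next = pos->next;		/* ... and relink it after pos */
	cursor->prev = pos;
	pos->next->prev = cursor;
	pos->next = cursor;
	atomic_fetch_add(&d->seq, 1);		/* end: count becomes even */
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Reader side: next real entry after 'from', found without the lock. */
static struct node *next_real_entry(struct dir *d, struct node *from)
{
	struct node *p;
	unsigned start;

	do {
		/* wait out a writer (odd count), then snapshot the count */
		while ((start = atomic_load(&d->seq)) & 1)
			;
		for (p = from->next; p != &d->head && p->is_cursor; p = p->next)
			;
	} while (atomic_load(&d->seq) != start);  /* retry if a move raced */

	return p == &d->head ? NULL : p;
}

/*
 * One readdir-like call: lockless traversal of up to 'max' entries, then
 * a single cursor move at the end.  Returns the number of entries seen.
 */
static int demo_readdir(struct dir *d, struct node *cursor, int *out, int max)
{
	struct node *p = cursor, *last = NULL;
	int n = 0;

	while (n < max && (p = next_real_entry(d, p)) != NULL) {
		out[n++] = p->id;
		last = p;
	}
	if (last)
		cursor_move_after(d, cursor, last);	/* once per call */
	return n;
}
```

In this sketch the counter advances by two per demo_readdir() call that
reports at least one entry, no matter how many entries that call returns,
which is the point of moving the cursor once before return rather than on
each entry.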


