From: Dave Hansen <dave.hansen@intel.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Andi Kleen <ak@linux.intel.com>, Christoph Lameter <cl@linux.com>
Subject: [bisected] pre-3.16 regression on open() scalability
Date: Fri, 13 Jun 2014 13:04:28 -0700
Message-ID: <539B594C.8070004@intel.com>

Hi Paul,
I'm seeing a regression when comparing 3.15 to Linus's current tree.
I'm using Anton Blanchard's will-it-scale "open1" test which creates a
bunch of processes and does open()/close() in a tight loop:
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
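For reference, the guts of that test look roughly like this (a simplified
sketch, not the actual will-it-scale code; the real harness also counts
iterations per worker):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Roughly what each forked worker does; error handling omitted. */
static void open_close_loop(void)
{
	char path[] = "/tmp/openbench.XXXXXX";	/* per-process temp file */
	int fd = mkstemp(path);

	close(fd);

	for (;;) {
		fd = open(path, O_RDWR);  /* allocates a struct file from slab */
		close(fd);                /* final fput() frees it via call_rcu() -> file_free_rcu() */
	}
}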
At about 50 cores' worth of processes, 3.15 and the pre-3.16 code start
to diverge, with 3.15 scaling better:
http://sr71.net/~dave/intel/3.16-open1regression-0.png
Some profiles point to a big increase in contention inside slub.c's
get_partial_node() (the allocation side of the slub code) causing the
regression. That particular open() test is known to do a lot of slab
operations. But, the odd part is that the slub code hasn't been touched
much.
So, I bisected it down to this:
> commit ac1bea85781e9004da9b3e8a4b097c18492d857c
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Sun Mar 16 21:36:25 2014 -0700
>
> sched,rcu: Make cond_resched() report RCU quiescent states
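As far as I can tell, the commit makes cond_resched() keep a per-CPU
count of invocations and, once that count crosses RCU_COND_RESCHED_LIM,
report a quiescent state for the CPU rather than waiting for a real
context switch or the scheduler tick. Something of this shape (my
sketch of the idea, not the literal patch):

#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/smp.h>

#define RCU_COND_RESCHED_LIM	256	/* stock threshold */

static DEFINE_PER_CPU(int, cond_resched_count);

/*
 * Conceptually run from cond_resched(): every RCU_COND_RESCHED_LIM-th
 * call tells RCU that this CPU has passed through a quiescent state.
 */
static void rcu_cond_resched_sketch(void)
{
	if (this_cpu_inc_return(cond_resched_count) >= RCU_COND_RESCHED_LIM) {
		this_cpu_write(cond_resched_count, 0);
		rcu_note_context_switch(smp_processor_id());	/* report the QS */
	}
}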
Specifically, if I raise RCU_COND_RESCHED_LIM, things get back to their
3.15 levels.
Could the additional RCU quiescent states be causing us to do RCU frees
more often than we were before, and to get less benefit from the lock
batching that RCU normally provides?
The top RCU functions in the profiles are as follows:
> 3.15.0-xxx: 2.58% open1_processes [kernel.kallsyms] [k] file_free_rcu
> 3.15.0-xxx: 2.45% open1_processes [kernel.kallsyms] [k] __d_lookup_rcu
> 3.15.0-xxx: 2.41% open1_processes [kernel.kallsyms] [k] rcu_process_callbacks
> 3.15.0-xxx: 1.87% open1_processes [kernel.kallsyms] [k] __call_rcu.constprop.10
> 3.16.0-rc0: 2.68% open1_processes [kernel.kallsyms] [k] rcu_process_callbacks
> 3.16.0-rc0: 2.68% open1_processes [kernel.kallsyms] [k] file_free_rcu
> 3.16.0-rc0: 1.55% open1_processes [kernel.kallsyms] [k] __call_rcu.constprop.10
> 3.16.0-rc0: 1.28% open1_processes [kernel.kallsyms] [k] __d_lookup_rcu
With everything else equal, we'd expect to see all of these _higher_ in
the profiles on the faster kernel (3.15), since it has more RCU work to do.
But, they're all _roughly_ the same. __d_lookup_rcu went up in the
profile on the fast one (3.15) probably because there _were_ more
lookups happening there.
rcu_process_callbacks makes me suspicious. It went up slightly
(probably in the noise), but it _should_ have dropped due to there being
less RCU work to do.
This supports the theory that there are more callbacks happening than
before, causing more slab lock contention, which is the actual trigger
for the performance drop.
I also hacked in an interface to make RCU_COND_RESCHED_LIM a tunable.
Making it huge instantly makes my test go fast, and dropping it to 256
instantly makes it slow. Some brief toying with it shows that
RCU_COND_RESCHED_LIM has to be about 100,000 before performance gets
back to where it was before.
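The hack itself isn't interesting; something along these lines
(illustrative names only, not my actual patch) is enough to twiddle the
threshold while the benchmark runs:

#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/types.h>

/* Turn the compile-time constant into a variable and expose it via
 * debugfs so it can be changed at runtime. */
u32 rcu_cond_resched_lim = 256;

static int __init rcu_cond_resched_lim_debugfs(void)
{
	debugfs_create_u32("rcu_cond_resched_lim", 0644, NULL,
			   &rcu_cond_resched_lim);
	return 0;
}
late_initcall(rcu_cond_resched_lim_debugfs);

Writing a new value to /sys/kernel/debug/rcu_cond_resched_lim then flips
the test between the slow and fast behavior without a rebuild.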
Thread overview: 49+ messages
2014-06-13 20:04 Dave Hansen [this message]
2014-06-13 22:45 ` [bisected] pre-3.16 regression on open() scalability Paul E. McKenney
2014-06-13 23:35 ` Dave Hansen
2014-06-14 2:03 ` Paul E. McKenney
2014-06-17 23:10 ` Dave Hansen
2014-06-18 0:00 ` Josh Triplett
2014-06-18 0:15 ` Andi Kleen
2014-06-18 1:04 ` Paul E. McKenney
2014-06-18 2:27 ` Andi Kleen
2014-06-18 4:47 ` Paul E. McKenney
2014-06-18 12:40 ` Andi Kleen
2014-06-18 12:56 ` Paul E. McKenney
2014-06-18 14:29 ` Christoph Lameter
2014-06-18 0:18 ` Paul E. McKenney
2014-06-18 6:33 ` Dave Hansen
2014-06-18 12:58 ` Paul E. McKenney
2014-06-18 17:36 ` Dave Hansen
2014-06-18 20:30 ` Paul E. McKenney
2014-06-18 23:51 ` Paul E. McKenney
2014-06-19 1:42 ` Andi Kleen
2014-06-19 2:13 ` Paul E. McKenney
2014-06-19 2:29 ` Paul E. McKenney
2014-06-19 2:50 ` Mike Galbraith
2014-06-19 4:19 ` Paul E. McKenney
2014-06-19 3:38 ` Andi Kleen
2014-06-19 4:19 ` Paul E. McKenney
2014-06-19 5:24 ` Mike Galbraith
2014-06-19 18:14 ` Paul E. McKenney
2014-06-19 4:52 ` Eric Dumazet
2014-06-19 5:23 ` Paul E. McKenney
2014-06-19 14:42 ` Christoph Lameter
2014-06-19 18:09 ` Paul E. McKenney
2014-06-19 20:31 ` Christoph Lameter
2014-06-19 20:42 ` Paul E. McKenney
2014-06-19 20:50 ` Andi Kleen
2014-06-19 21:03 ` Paul E. McKenney
2014-06-19 21:13 ` Christoph Lameter
2014-06-19 21:16 ` Christoph Lameter
2014-06-19 21:32 ` josh
2014-06-19 23:07 ` Paul E. McKenney
2014-06-20 15:20 ` Christoph Lameter
2014-06-20 15:38 ` Paul E. McKenney
2014-06-20 16:07 ` Christoph Lameter
2014-06-20 16:30 ` Paul E. McKenney
2014-06-20 17:39 ` Dave Hansen
2014-06-20 18:15 ` Paul E. McKenney
2014-06-18 21:48 ` Paul E. McKenney
2014-06-18 22:03 ` Dave Hansen
2014-06-18 22:52 ` Paul E. McKenney