From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Andi Kleen <ak@linux.intel.com>, Christoph Lameter <cl@linux.com>
Subject: Re: [bisected] pre-3.16 regression on open() scalability
Date: Fri, 13 Jun 2014 15:45:19 -0700
Message-ID: <20140613224519.GV4581@linux.vnet.ibm.com>
In-Reply-To: <539B594C.8070004@intel.com>

On Fri, Jun 13, 2014 at 01:04:28PM -0700, Dave Hansen wrote:
> Hi Paul,
>
> I'm seeing a regression when comparing 3.15 to Linus's current tree.
> I'm using Anton Blanchard's will-it-scale "open1" test which creates a
> bunch of processes and does open()/close() in a tight loop:
>
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
>
> At about 50 cores worth of processes, 3.15 and the pre-3.16 code start
> to diverge, with 3.15 scaling better:
>
> http://sr71.net/~dave/intel/3.16-open1regression-0.png
>
> Some profiles point to a big increase in contention inside slub.c's
> get_partial_node() (the allocation side of the slub code) causing the
> regression. That particular open() test is known to do a lot of slab
> operations. But, the odd part is that the slub code hasn't been touched
> much.
>
> So, I bisected it down to this:
>
> > commit ac1bea85781e9004da9b3e8a4b097c18492d857c
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date: Sun Mar 16 21:36:25 2014 -0700
> >
> > sched,rcu: Make cond_resched() report RCU quiescent states
>
> Specifically, if I raise RCU_COND_RESCHED_LIM, things get back to their
> 3.15 levels.
>
> Could the additional RCU quiescent states be causing us to be doing more
> RCU frees than we were before, and getting less benefit from the lock
> batching that RCU normally provides?

Quite possibly. One way to check would be to use the debugfs files
rcu/*/rcugp, which give a count of grace periods since boot for each
RCU flavor. Here "*" is rcu_preempt for CONFIG_PREEMPT and rcu_sched
for !CONFIG_PREEMPT.

Another possibility is that someone is invoking cond_resched() in an
incredibly tight loop.

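One way to act on the rcugp suggestion above is to sample the counter
before and after a benchmark run on each kernel and compare the deltas.
The sketch below is a minimal user-space helper, not something posted in
this thread: the function names are hypothetical, and the
"completed=<n>" field layout is an assumption about the file's format
that should be checked against the running kernel.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Extract the number following "completed=" from one line of rcugp output.
 * ASSUMPTION: the file contains a "completed=<n>" field; verify this on the
 * kernel under test. Returns 0 on success, -1 if the field is missing.
 */
int parse_completed(const char *line, long *out)
{
    const char *p = strstr(line, "completed=");
    char *end;
    long value;

    if (!p)
        return -1;
    p += strlen("completed=");
    value = strtol(p, &end, 10);
    if (end == p)
        return -1;              /* no digits after the '=' */
    *out = value;
    return 0;
}

/* Read the first line of the rcugp file at `path` and parse it. */
int read_completed(const char *path, long *out)
{
    char buf[256];
    char *line;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? parse_completed(line, out) : -1;
}

/*
 * Sample the counter, sleep for `secs` seconds, sample again, and store
 * the difference in *delta. Returns 0 on success, -1 on any read error.
 */
int grace_periods_during(const char *path, unsigned int secs, long *delta)
{
    long before, after;

    if (read_completed(path, &before) != 0)
        return -1;
    sleep(secs);
    if (read_completed(path, &after) != 0)
        return -1;
    *delta = after - before;
    return 0;
}
```

For example, assuming debugfs is mounted at /sys/kernel/debug, calling
grace_periods_during("/sys/kernel/debug/rcu/rcu_sched/rcugp", 10, &n)
while open1 runs on 3.15 and again on the pre-3.16 kernel would show
whether the newer kernel completes noticeably more grace periods in the
same interval. Reading debugfs normally requires root.
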
> The top RCU functions in the profiles are as follows:
>
> > 3.15.0-xxx: 2.58% open1_processes [kernel.kallsyms] [k] file_free_rcu
> > 3.15.0-xxx: 2.45% open1_processes [kernel.kallsyms] [k] __d_lookup_rcu
> > 3.15.0-xxx: 2.41% open1_processes [kernel.kallsyms] [k] rcu_process_callbacks
> > 3.15.0-xxx: 1.87% open1_processes [kernel.kallsyms] [k] __call_rcu.constprop.10
>
> > 3.16.0-rc0: 2.68% open1_processes [kernel.kallsyms] [k] rcu_process_callbacks
> > 3.16.0-rc0: 2.68% open1_processes [kernel.kallsyms] [k] file_free_rcu
> > 3.16.0-rc0: 1.55% open1_processes [kernel.kallsyms] [k] __call_rcu.constprop.10
> > 3.16.0-rc0: 1.28% open1_processes [kernel.kallsyms] [k] __d_lookup_rcu
>
> With everything else equal, we'd expect to see all of these _higher_ in
> the profiles on the faster kernel (3.15) since it has more RCU work to do.
>
> But, they're all _roughly_ the same. __d_lookup_rcu went up in the
> profile on the fast one (3.15) probably because there _were_ more
> lookups happening there.
>
> rcu_process_callbacks makes me suspicious. It went up slightly
> (probably in the noise), but it _should_ have dropped due to there being
> less RCU work to do.
>
> This supports the theory that there are more callbacks happening than
> before, causing more slab lock contention, which is the actual trigger
> for the performance drop.
>
> I also hacked in an interface to make RCU_COND_RESCHED_LIM a tunable.
> Making it huge instantly makes my test go fast, and dropping it to 256
> instantly makes it slow. Some brief toying with it shows that
> RCU_COND_RESCHED_LIM has to be about 100,000 before performance gets
> back to where it was before.

That is way bigger than I would expect. My bet is that someone is
invoking cond_resched() in a 10s-of-nanoseconds tight loop.

But please feel free to send along your patch, CCing LKML. Longer
term, I probably need to take a more algorithmic approach, but what
you have will be useful to benchmarkers until then.

Thanx, Paul
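
The tight-loop bet above can be sanity-checked with rough arithmetic.
The sketch below assumes a simplified model in which a CPU reports one
quiescent state per RCU_COND_RESCHED_LIM calls to cond_resched(). The
30 ns per-call period is an illustrative stand-in for "10s of
nanoseconds", not a measurement from this thread, and the function
names are hypothetical.

```c
#include <stdio.h>

/*
 * Simplified model (an assumption, not the kernel code): one quiescent
 * state is reported each time a CPU has made `limit` cond_resched() calls.
 * Returns the time between reports in microseconds.
 */
double qs_interval_us(unsigned long limit, double call_period_ns)
{
    return (double)limit * call_period_ns / 1000.0;
}

/* Print the modeled interval for one limit at an assumed per-call period. */
void print_interval(unsigned long limit, double call_period_ns)
{
    printf("limit=%lu at %.0f ns/call: one report every %.2f us\n",
           limit, call_period_ns, qs_interval_us(limit, call_period_ns));
}
```

Under these assumptions, a limit of 256 gives a report roughly every
7.7 microseconds per CPU, while a limit of 100,000 stretches that to
about 3 milliseconds. That would be consistent with the tunable needing
to be very large before the effect fades.
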