From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org, mingo@elte.hu,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
niv@us.ibm.com, tglx@linutronix.de, rostedt@goodmis.org,
Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
edumazet@google.com, darren@dvhart.com, fweisbec@gmail.com,
sbw@mit.edu
Subject: Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ
Date: Thu, 16 May 2013 06:22:10 -0700
Message-ID: <20130516132210.GV4442@linux.vnet.ibm.com>
In-Reply-To: <20130516094519.GJ19669@dyad.programming.kicks-ass.net>
On Thu, May 16, 2013 at 11:45:19AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 10:31:42AM -0700, Paul E. McKenney wrote:
> > On Wed, May 15, 2013 at 11:02:34AM +0200, Peter Zijlstra wrote:
>
> > > Earlier you said that improving EQS behaviour was expensive in that it
> > > would require taking (global) locks or somesuch.
> > >
> > > Would it not be possible to have the cpu performing a FQS finish this
> > > work; that way the first FQS would be a little slow, but after that no
> > > FQS would be needed anymore, right? Since we'd no longer require the
> > > other CPUs to end a grace period.
> >
> > It is not just the first FQS that would be slow, it would also be slow
> > the next time that this CPU transitioned from idle to non-idle, which
> > is when this work would need to be undone.
>
> Hurm, yes I suppose that is true. If you've saved more on FQS cost it might be
> worth it for the throughput people though.
But the NO_HZ_PERIODIC and NO_HZ_IDLE throughput people will have their
CPUs non-idle, which means that they are reporting their quiescent states
and the FQS scan just isn't happening. The NO_HZ_FULL throughput people
will have their RCU GP kthreads pinned to the timekeeping CPU, and therefore
won't care much about the overhead of the FQS scan.
> But somehow I imagined making a CPU part of the GP would be easier than taking
> it out. After all, taking it out is dangerous and careful work; one must not
> accidentally execute a callback or otherwise end a GP before its time.
>
> When entering the GP cycle there is no such concern, the CPU state is clean
> after all.
But that would increase the overhead of GP initialization. Right now,
GP initialization touches only the leaf rcu_node structures, of which
there is by default one per 16 CPUs (configurable up to one per 64
CPUs, which is what really big systems do). So on busy mixed-workload
systems, this approach increases GP initialization overhead for no
good reason -- and on systems running these sorts of workloads, there
usually aren't "sacrificial lamb" timekeeping CPUs whose utilization
doesn't matter.
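
To put numbers on that, here is an illustrative userspace sketch of the
arithmetic only (not kernel code; the 4096-CPU system size is just an
example):

	#include <stdio.h>

	int main(void)
	{
		int nr_cpus = 4096;	/* example system size */
		int fanout_leaf = 16;	/* CONFIG_RCU_FANOUT_LEAF default */

		/* One leaf rcu_node per fanout_leaf CPUs, rounded up. */
		int nr_leaves = (nr_cpus + fanout_leaf - 1) / fanout_leaf;

		printf("%d CPUs, %d per leaf -> %d leaf rcu_node structures\n",
		       nr_cpus, fanout_leaf, nr_leaves);	/* 256 */

		/* At fanout_leaf = 64, those same 4096 CPUs need only 64. */
		return 0;
	}

Touching every CPU at GP initialization instead of every leaf would
multiply that per-GP work by the fanout.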
> > Furthermore, in this approach, RCU would still need to scan all the CPUs
> > to see if any did the first part of the transition to idle. And if we
> > have to scan either way, why not keep the idle-nonidle transitions cheap
> > and continue to rely on the scan? Here are the rationales I can think
> > of and what I am thinking in terms of doing instead:
> >
> > 1. The scan could become a scalability bottleneck. There is one
> > way to handle this today, and one possible future change. The way
> > to handle this today is to increase rcutree.jiffies_till_first_fqs,
> > for example, the SGI guys set it to 20 or thereabouts. If this
> > becomes problematic, I could easily create multiple kthreads to
> > carry out the FQS scan in parallel for large systems.
>
> *groan* whoever thought all this SMP nonsense was worth it again? :-)
NR_CPUS=0!!! It is the only way! ;-)
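
More seriously, to put a concrete face on the knob in #1 above:
assuming the usual module-parameter plumbing for rcutree, the setting
the SGI guys use would be the boot-line fragment

	rcutree.jiffies_till_first_fqs=20

or, if the parameter is exported writable in sysfs (I believe it is,
but check your kernel), a runtime write to
/sys/module/rcutree/parameters/jiffies_till_first_fqs.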
> > 2. Someone could demonstrate that RCU's grace periods were significantly
> > delaying boot. There are several ways of dealing with this:
>
> Surely there's also non-boot cases where most of the machine is 'idle' and
> we're running into FQS? Esp. now with that userspace NO_HZ stuff from Frederic.
Yep, but as noted above, the NO_HZ_FULL case will be running the RCU
GP kthreads on the timekeeping CPUs, where they aren't running worker
threads. In the general-purpose workload case, the CPUs are busy and
doing a wide variety of things, so that with high probability each
CPU checks in before the three-jiffies FQS scan has a chance to get
kicked off.
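
(For reference, "three jiffies" is what the HZ-driven delay from this
patch works out to at HZ=1000. A sketch of the calculation, with the
thresholds from memory and therefore to be treated as illustrative:)

	#include <stdio.h>

	#define HZ 1000		/* example tick rate */

	/*
	 * Delay before the first FQS scan, in jiffies, scaled with HZ
	 * so that the wall-clock delay stays roughly constant as the
	 * tick rate grows.
	 */
	#define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))

	int main(void)
	{
		/* HZ=100 or 250 -> 1 jiffy, HZ=300..500 -> 2, HZ=1000 -> 3. */
		printf("HZ=%d: first FQS after %d jiffies (~%d ms)\n",
		       HZ, RCU_JIFFIES_TILL_FORCE_QS,
		       RCU_JIFFIES_TILL_FORCE_QS * 1000 / HZ);
		return 0;
	}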
Thanx, Paul