From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Josh Triplett <josh@joshtriplett.org>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, edumazet@google.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu
Subject: Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ
Date: Sat, 13 Apr 2013 12:34:25 -0700
Message-ID: <20130413193425.GY29861@linux.vnet.ibm.com>
In-Reply-To: <20130413181800.GA12096@leaf>

On Sat, Apr 13, 2013 at 11:18:00AM -0700, Josh Triplett wrote:
> On Fri, Apr 12, 2013 at 11:38:04PM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 12, 2013 at 04:54:02PM -0700, Josh Triplett wrote:
> > > On Fri, Apr 12, 2013 at 04:19:13PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Systems with HZ=100 can have slow bootup times due to the default
> > > > three-jiffy delays between quiescent-state forcing attempts.  This
> > > > commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
> > > > on the value of HZ.  However, this would break very large systems that
> > > > require more time between quiescent-state forcing attempts.  This
> > > > commit therefore also ups the default delay by one jiffy for each
> > > > 256 CPUs that might be on the system (based off of nr_cpu_ids at
> > > > runtime, -not- NR_CPUS at build time).
> > > > 
> > > > Reported-by: Paul Mackerras <paulus@au1.ibm.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > Something seems very wrong if RCU regularly hits the fqs code during
> > > boot; feels like there's some more straightforward solution we're
> > > missing.  What causes these CPUs to fall under RCU's scrutiny during
> > > boot yet not actually hit the RCU codepaths naturally?
> > 
> > The problem is that they are running HZ=100, so that RCU will often
> > take 30-60 milliseconds per grace period.  At that point, you only
> > need 16-30 grace periods to chew up a full second, so it is not all
> > that hard to eat up the additional 8-12 seconds of boot time that
> > they were seeing.  IIRC, UP boot was costing them 4 seconds.
> > 
> > For HZ=1000, this would translate to 800ms to 1.2s, which is nowhere
> > near as annoying.
> 
> That raises two questions, though.  First, who calls synchronize_rcu()
> repeatedly during boot, and could they call call_rcu() instead to avoid
> blocking for an RCU grace period?  Second, why does RCU need 3-6 jiffies
> to resolve a grace period during boot?  That suggests that RCU doesn't
> actually resolve a grace period until the force-quiescent-state
> machinery kicks in, meaning that the normal quiescent-state mechanism
> didn't work.

Indeed, converting synchronize_rcu() to call_rcu() might also be
helpful.  The reason that RCU often does not resolve grace periods until
force_quiescent_state() runs is that, during boot, it is often the case
that all but one CPU is idle.  RCU tries hard to avoid waking up idle
CPUs, so it must instead scan them to see that they are idle.  Scanning
is relatively expensive, so there is reason to wait before doing it.

One thing that could be done would be to scan immediately during boot,
and then back off once boot has completed.  Of course, RCU has no idea
when boot has completed, but one way to get this effect is to boot
with rcutree.jiffies_till_first_fqs=0, and then use sysfs to set it
to 3 once boot has completed.

> > > Also, a comment below.
> > > 
> > > > --- a/kernel/rcutree.h
> > > > +++ b/kernel/rcutree.h
> > > > @@ -342,7 +342,17 @@ struct rcu_data {
> > > >  #define RCU_FORCE_QS		3	/* Need to force quiescent state. */
> > > >  #define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
> > > >  
> > > > -#define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for rsp->jiffies_force_qs */
> > > > +#if HZ > 500
> > > > +#define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for jiffies_till_first_fqs */
> > > > +#elif HZ > 250
> > > > +#define RCU_JIFFIES_TILL_FORCE_QS	 2
> > > > +#else
> > > > +#define RCU_JIFFIES_TILL_FORCE_QS	 1
> > > > +#endif
> > > 
> > > This seems like it really wants to use a duration calculated directly
> > > from HZ; perhaps (HZ/100)?
> > 
> > Very possibly to the direct calculation, but HZ/100 would get 10 ticks
> > delay at HZ=1000, which is too high -- the value of 3 ticks for HZ=1000
> > works well.  But I could do something like this:
> > 
> > #define RCU_JIFFIES_TILL_FORCE_QS (((HZ + 199) / 300) + ((HZ + 199) / 300 ? 0 : 1))
> > 
> > Or maybe a bit better:
> > 
> > #define RCU_JTFQS_SE ((HZ + 199) / 300)
> > #define RCU_JIFFIES_TILL_FORCE_QS (RCU_JTFQS_SE + (RCU_JTFQS_SE ? 0 : 1))
> > 
> > This would come reasonably close to the values shown above.  Would
> > this work for you?
> 
> I'd argue that if you need something that complex, you should just
> explicitly write it as a step function:
> 
> #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))

Yeah, I couldn't resist handling HZ>1000, but that doesn't sound all
that likely.  I will use your suggested approach.

							Thanx, Paul

