Re: A few questions and issues with dynticks, NOHZ and powertop

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dominik Brodowski <linux@dominikbrodowski.net>,
	Alan Stern <stern@rowland.harvard.edu>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Dmitry Torokhov <dtor@mail.ru>
Subject: Re: A few questions and issues with dynticks, NOHZ and powertop
Date: Mon, 5 Apr 2010 15:31:30 -0700
Message-ID: <20100405223130.GM2525@linux.vnet.ibm.com>
In-Reply-To: <20100405221123.GA1903@isilmar.linta.de>

On Tue, Apr 06, 2010 at 12:11:23AM +0200, Dominik Brodowski wrote:
> On Mon, Apr 05, 2010 at 02:38:52PM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 05, 2010 at 11:03:40PM +0200, Dominik Brodowski wrote:
> > > Paul,
> > > 
> > > I really appreciate your reply -- thanks! I've done some more testing in
> > > the meantime:
> > > 
> > > On Sun, Apr 04, 2010 at 01:47:25PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Apr 04, 2010 at 06:39:24PM +0200, Dominik Brodowski wrote:
> > > > > On Sun, Apr 04, 2010 at 11:17:37AM -0400, Alan Stern wrote:
> > > > > > On Sun, 4 Apr 2010, Dominik Brodowski wrote:
> > > > > > 
> > > > > > > Booting a SMP-capable kernel with "nosmp", or manually offlining one CPU
> > > > > > > (or -- though I haven't tested it -- booting a SMP-capable kernel on a
> > > > > > > > system with merely one CPU) means that up to about half of the calls to
> > > > > > > tick_nohz_stop_sched_tick() are aborted due to rcu_needs_cpu(). This is
> > > > > > > quite strange to me: AFAIK, RCU is an excellent tool for SMP, but not really
> > > > > > > needed for UP?
> > > > > > 
> > > > > > I can't answer the real question here, not knowing enough about the RCU
> > > > > > implementation.  However, your impression is wrong: RCU very definitely
> > > > > > _is_ useful and needed on UP systems.  It coordinates among processes
> > > > > > (and interrupt handlers) as well as among processors.
> > > > > 
> > > > > Okay, but still: can't this be sped up by much on UP (especially if
> > > > > CONFIG_RCU_FAST_NO_HZ is set), so that we can go to sleep right away?
> > > > 
> > > > One situation that will prevent CONFIG_RCU_FAST_NO_HZ from putting the
> > > > machine to sleep right away is if there is an RCU callback posted that
> > > > spawns another RCU callback, and so on.  CONFIG_RCU_FAST_NO_HZ will handle
> > > > one callback that spawns another, but it gives up if the second callback
> > > > spawns a third.
> > > 
> > > Will the remaining callbacks be executed immediately afterwards (due to a
> > > need_resched() etc.), or only after the next tick?
> > 
> > Only after the next tick.  To see why, imagine an RCU callback that
> > re-registers itself -- which is a perfectly legal thing to do.  The
> > only thing that will happen if we run through grace periods faster is
> > that we will have more invocations of that same callback to deal with.
> > 
> > So we try for a bit, and if that doesn't get rid of all of the callbacks,
> > we hold off until the next jiffy.
> > 
> > > > Might this be what is happening to you?
> > > > 
> > > > If so, would you be willing to patch your kernel?  RCU_NEEDS_CPU_FLUSHES
> > > > is currently set to 5, and might be set to (say) 8.  This is defined
> > > > in kernel/rcutree_plugin.h, near line 990.
> > > 
> > > Applied the patch by Lai Jiangshan, and tested 5 and 8:
> > > 
> > > 5:	  Wakeups-from-idle: 33.4		(hrtimer_sched_timer: 78 %)
> > > 		34% of calls to tick_nohz_stop_sched_tick fail due to
> > > 			rcu_needs_cpu()
> > > 8:	  Wakeups-from-idle: 36.5		(hrtimer_sched_timer: 83 %)
> > > 		37% of calls to tick_nohz_stop_sched_tick fail due to
> > > 			rcu_needs_cpu()
> > 
> > I don't recall your posting wakeups-from-idle for the original -- did
> > we get improvement?  You did say "roughly 50%", but...
> 
> Actually, no. I'd say the 5-to-8 change has no significant effect at all;
> for the patch by Lai Jiangshan, I'd need to re-run the test.
> 
> > OK, I see what is happening...
> > 
> > What happens in the CONFIG_RCU_FAST_NO_HZ case is as follows:
> > 
> > o	Check to see if the holdoff period is in effect, and if so,
> > 	just check to see if RCU needs the CPU for later processing
> > 	without attempting to accelerate grace periods.
> > 
> > o	Check to see if there is some other non-dyntick-idle CPU.
> > 	If there is, reset holdoff state and just check to see if
> > 	RCU needs the CPU for later processing without attempting to
> > 	accelerate grace periods.
> > 
> > o	Check for initialization and hitting the RCU_NEEDS_CPU_FLUSHES
> > 	limit, again doing the "just check" thing if we hit the limit.
> > 
> > o	For each of RCU-sched and RCU-bh, note a quiescent state
> > 	and force the grace-period machinery, noting in each case
> > 	whether or not there are callbacks left to invoke.
> > 
> > o	If there are callbacks left to invoke, raise RCU_SOFTIRQ.
> > 	This softirq will process the callbacks.  (Why not just invoke
> > 	the softirq function directly?	Because lockdep yells at you
> > 	and I do not believe that this is a false positive.)
> > 
> > o	If there are callbacks left to invoke, tell the caller that
> > 	this CPU cannot yet enter dyntick-idle state.
> > 
> > But if we told the caller that this CPU cannot yet enter dyntick-idle
> > state, then we also raised RCU_SOFTIRQ.  Once the softirq returns, we
> > should once again try to enter dyntick-idle state.
> > 
> > So a significant fraction of calls to rcu_needs_cpu() saying "no" does
> > not necessarily mean that we are taking significant time to get the
> > grace periods and callbacks out of the way.  The funny loop involving
> > softirq is required due to locking-design issues.
> > 
> > Or are you seeing significant delays between successive calls to
> > rcu_needs_cpu() on your setup?
> 
> Will check this, but all the data I'm seeing points to rcu_needs_cpu() not
> leading to additional wakeups. It might just be wrong reports by powertop,
> after all, for the UP case.

OK, for all I know, powertop might need some adjustment to allow for
the presence of CONFIG_RCU_FAST_NO_HZ.
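
For reference, the CONFIG_RCU_FAST_NO_HZ steps quoted above can be
modeled roughly as follows.  This is a user-space toy sketch, not the
code in kernel/rcutree_plugin.h: struct sketch_state, the sketch_*()
helpers, and the respawn counter are hypothetical names made up for
illustration, and the quiescent-state and grace-period work is reduced
to a pending-callback check.

```c
#include <stdbool.h>

#define RCU_NEEDS_CPU_FLUSHES 5	/* value mentioned in this thread */
#define NR_FLAVORS 2		/* stand-ins for RCU-sched and RCU-bh */

/* Hypothetical, simplified model of the state involved; not kernel code. */
struct sketch_state {
	bool holdoff;		/* holdoff in effect until the next jiffy */
	int flushes;		/* remaining attempts, -1 = not initialized */
	bool other_cpu_busy;	/* some other CPU is not in dyntick-idle */
	int cbs[NR_FLAVORS];	/* callbacks queued per flavor */
	bool softirq_raised;	/* models raising RCU_SOFTIRQ */
};

/* "Just check": does RCU still need this CPU for later processing? */
static bool sketch_cbs_pending(const struct sketch_state *s)
{
	for (int i = 0; i < NR_FLAVORS; i++)
		if (s->cbs[i] > 0)
			return true;
	return false;
}

/* Returns true if the CPU cannot yet enter dyntick-idle state. */
static bool sketch_rcu_needs_cpu(struct sketch_state *s)
{
	if (s->holdoff)			/* holdoff period: just check */
		return sketch_cbs_pending(s);

	if (s->other_cpu_busy) {	/* other CPU awake: reset, just check */
		s->flushes = -1;
		return sketch_cbs_pending(s);
	}

	if (s->flushes == -1) {		/* initialization */
		s->flushes = RCU_NEEDS_CPU_FLUSHES;
	} else if (--s->flushes <= 0) {	/* hit the limit: start holdoff */
		s->holdoff = true;
		return sketch_cbs_pending(s);
	}

	/*
	 * The real code notes a quiescent state and forces the grace-period
	 * machinery for each flavor here; the model only records whether
	 * callbacks are left to invoke.
	 */
	if (!sketch_cbs_pending(s))
		return false;

	s->softirq_raised = true;	/* the softirq will invoke them */
	return true;			/* cannot enter dyntick-idle yet */
}

/* Models the softirq: invoke everything queued; one callback re-posts
 * itself for as long as the respawn budget lasts. */
static void sketch_softirq(struct sketch_state *s, int *respawns)
{
	for (int i = 0; i < NR_FLAVORS; i++)
		s->cbs[i] = 0;
	if (*respawns > 0) {
		s->cbs[0] = 1;
		(*respawns)--;
	}
}

/* Number of sketch_rcu_needs_cpu() calls until the CPU may sleep,
 * or -1 if the holdoff kicked in while callbacks were still pending. */
static int sketch_try_idle(int initial_cbs, int respawns)
{
	struct sketch_state s = { .flushes = -1, .cbs = { initial_cbs, 0 } };
	int calls = 0;

	for (;;) {
		calls++;
		if (!sketch_rcu_needs_cpu(&s))
			return calls;
		if (s.holdoff)
			return -1;
		if (s.softirq_raised) {
			s.softirq_raised = false;
			sketch_softirq(&s, &respawns);
		}
	}
}
```

In this model, sketch_try_idle(3, 0) gets to sleep on the second call,
whereas a callback that keeps re-registering itself, as in
sketch_try_idle(1, 100), exhausts RCU_NEEDS_CPU_FLUSHES and returns -1,
which corresponds to holding off until the next jiffy.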

>                             Quoting my original mail:
> 
> > 5) powertop and hrtimer_start_range_ns (tick_sched_timer) on a SMP kernel
> > booted with "nosmp":
> > 
> > Wakeups-from-idle per second :  9.9     interval: 15.0s
> > ...
> >   48.5% (  9.4)     <kernel core> : hrtimer_start (tick_sched_timer) 
> >   26.1% (  5.1)     <kernel core> : cursor_timer_handler (cursor_timer_handle
> >   20.6% (  4.0)     <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
> >    1.0% (  0.2)     <kernel core> : arm_supers_timer (sync_supers_timer_fn)
> >    0.7% (  0.1)       <interrupt> : ata_piix 
> >    ...
> > 
> > According to http://www.linuxpowertop.org , the count in the brackets is
> > how many wakeups per second were caused by one source. Adding all _except_
> >   48.5% (  9.4)     <kernel core> : hrtimer_start (tick_sched_timer)
> > up leads to the 9.9.

OK, so you further instrumented the hrtimer_sched_timer (or was it
tick_sched_timer?) to find the number that you were attributing to
rcu_needs_cpu()?
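
As a rough cross-check of the quoted powertop figures, the
tick_sched_timer row alone (48.5% at 9.4 wakeups/s) implies a total
across all rows, and subtracting that row should land near the 9.9 in
the header.  The sketch below does only this arithmetic; the function
names are made up for illustration, and it assumes the percentages are
shares of the sum over all listed sources.

```c
/* Total wakeups/s implied by one row: rate / (share / 100). */
static double implied_total(double share_percent, double wakeups_per_sec)
{
	return wakeups_per_sec / (share_percent / 100.0);
}

/* Wakeups/s left once the tick_sched_timer row is excluded. */
static double wakeups_excluding_tick(void)
{
	const double tick_share = 48.5;	/* percent, from the quoted output */
	const double tick_rate = 9.4;	/* wakeups/s, from the quoted output */

	return implied_total(tick_share, tick_rate) - tick_rate;
}
```

This comes out at roughly 10 wakeups/s, which is consistent with the
header figure excluding the tick_sched_timer row.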

> Back to your mail:
> 
> > > tick_nohz_stop_sched_tick() doesn't fail in this case because of
> > > rcu_needs_cpu(). However, the improvements are hardly recognizable:
> > > 
> > > TINY_RCU: Wakeups-from-idle: 33.9		(hrtimer_sched_timer: 53 %)
> > 
> > TINY_RCU is set up to automatically do CONFIG_RCU_FAST_NO_HZ, and do
> > the same softirq dance, or that is the theory, anyway.  Again, are you
> > seeing significant delays between successive calls to rcu_needs_cpu()?
> 
> Actually, rcu_needs_cpu() is statically defined to return 0 on TINY_RCU in
> include/linux/rcutiny.h .

Exactly!  ;-)

							Thanx, Paul
