From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, josh@joshtriplett.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com
Subject: Re: dyntick-idle CPU and node's qsmask
Date: Sun, 11 Nov 2018 13:04:08 -0800
Message-ID: <20181111210408.GA85122@google.com>
In-Reply-To: <20181111183618.GY4170@linux.ibm.com>

On Sun, Nov 11, 2018 at 10:36:18AM -0800, Paul E. McKenney wrote:
[..]
> > > > > CPU will with high probability report its own quiescent state before three
> > > > > jiffies pass, in which case the cache misses on the rcu_data structures
> > > > > would be wasted motion.
> > > > 
> > > > If all the CPUs are busy and reporting their QS themselves, then I think the
> > > > qsmask is likely 0 so then rcu_implicit_dynticks_qs (called from
> > > > force_qs_rnp) wouldn't be called and so there would be no cache misses on
> > > > rcu_data right?
> > > 
> > > Yes, but assuming that all CPUs report their quiescent states before
> > > the first call to rcu_gp_fqs().  One exception is when some CPU is
> > > looping in the kernel for many milliseconds without passing through a
> > > quiescent state.  This is because for recent kernels, cond_resched()
> > > is not a quiescent state until the grace period is something like 100
> > > milliseconds old.  (For older kernels, cond_resched() was never an RCU
> > > quiescent state unless it actually scheduled.)
> > > 
> > > Why wait 100 milliseconds?  Because otherwise the increase in
> > > cond_resched() overhead shows up all too well, causing the 0day test robot
> > > to complain bitterly.  Besides, I would expect that in the common case,
> > > CPUs would be executing usermode code.
> > 
> > Makes sense. I was also wondering about this other thing you mentioned about
> > waiting for 3 jiffies before reporting the idle CPU's quiescent state. Does
> > that mean that even if a single CPU is dyntick-idle for a long period of
> > time, then the minimum grace period duration would be at least 3 jiffies? In
> > our mobile embedded devices, a jiffy is 3.33ms (HZ=300) to keep power
> > consumption low. Not that I'm saying it's an issue or anything (since IIUC if
> > someone wants shorter grace periods, they should just use expedited GPs), but
> > it sounds like it would be a shorter GP if we just set the qsmask early on
> > somehow and we can manage the overhead of doing so.
> 
> First, there is some autotuning of the delay based on HZ:
> 
> #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> 
> So at HZ=300, you should be seeing a two-jiffy delay rather than the
> usual HZ=1000 three-jiffy delay.  Of course, this means that the delay
> is 6.67ms rather than the usual 3ms, but the theory is that lower HZ
> rates often mean slower instruction execution and thus a desire for
> lower RCU overhead.  There is further autotuning based on number of
> CPUs, but this does not kick in until you have 256 CPUs on your system,
> and I bet that smartphones aren't there yet.  Nevertheless, check out
> RCU_JIFFIES_FQS_DIV for more info on this.

Got it. I agree with that heuristic.
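
To double-check the arithmetic, here is a small sketch that evaluates the
same expression as RCU_JIFFIES_TILL_FORCE_QS for a few HZ values and converts
the result to microseconds. The function names are made up for illustration,
HZ is passed as a parameter instead of being a build-time constant, and the
further RCU_JIFFIES_FQS_DIV adjustment for large CPU counts is left out.

```c
#include <stdio.h>

/*
 * Same expression as the quoted RCU_JIFFIES_TILL_FORCE_QS macro, with HZ
 * turned into a parameter so that several tick rates can be compared.
 * (Hypothetical helper, not a kernel function.)
 */
static int jiffies_till_force_qs(int hz)
{
	return 1 + (hz > 250) + (hz > 500);
}

/* Delay before the first force-quiescent-state scan, in microseconds. */
static long fqs_delay_us(int hz)
{
	return (long)jiffies_till_force_qs(hz) * 1000000L / hz;
}

/* Print the autotuned delay for one HZ value. */
static void print_fqs_delay(int hz)
{
	printf("HZ=%d: %d jiffies, %ld us\n",
	       hz, jiffies_till_force_qs(hz), fqs_delay_us(hz));
}
```

With integer division, fqs_delay_us(300) comes out as 6666 and
fqs_delay_us(1000) as 3000, matching the 6.67ms and 3ms figures above.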

> But you can always override this autotuning using the following kernel
> boot parameters:
> 
> rcutree.jiffies_till_first_fqs
> rcutree.jiffies_till_next_fqs
> 
> You can even set the first one to zero if you want the effect of pre-scanning
> for idle CPUs.  ;-)
> 
> The second must be set to one or greater.
> 
> Both are capped at one second (HZ).

Got it. Thanks a lot for the explanations.
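
Just to restate the limits as I understand them: the first value may be zero,
the second must be at least one, and both are capped at HZ jiffies. The
sketch below only models those three rules. The struct and function names are
hypothetical, and this is not how the kernel itself implements the checks.

```c
/* Hypothetical container for the two tunables, in jiffies. */
struct fqs_tuning {
	unsigned long first;	/* models rcutree.jiffies_till_first_fqs */
	unsigned long next;	/* models rcutree.jiffies_till_next_fqs */
};

/*
 * Apply the limits described above (illustration only):
 *  - first: zero is allowed, capped at hz (one second)
 *  - next:  raised to one if zero, capped at hz (one second)
 */
static struct fqs_tuning clamp_fqs_tuning(unsigned long first,
					  unsigned long next,
					  unsigned long hz)
{
	struct fqs_tuning t;

	t.first = first > hz ? hz : first;
	t.next = next < 1 ? 1 : (next > hz ? hz : next);
	return t;
}
```

So a request of first=0, next=0 at HZ=300 would end up as first=0, next=1,
and anything above 300 jiffies would be pulled back to 300.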

> > > > Anyway it was just an idea that popped up when I was going through traces :)
> > > > Thanks for the discussion and happy to discuss further or try out anything.
> > > 
> > > Either way, I do appreciate your going through this.  People have found
> > > RCU bugs this way, one of which involved RCU uselessly calling a particular
> > > function twice in quick succession.  ;-)
> >  
> > Thanks.  It is my pleasure and happy to help :) I'll keep digging into it.
> 
> Looking forward to further questions and patches.  ;-)

Will do! thanks,

 - Joel


Thread overview: 14+ messages
2018-11-10 21:46 dyntick-idle CPU and node's qsmask Joel Fernandes
2018-11-10 23:04 ` Paul E. McKenney
2018-11-11  3:09   ` Joel Fernandes
2018-11-11  4:22     ` Paul E. McKenney
2018-11-11 18:09       ` Joel Fernandes
2018-11-11 18:36         ` Paul E. McKenney
2018-11-11 21:04           ` Joel Fernandes [this message]
2018-11-20 20:42           ` Joel Fernandes
2018-11-20 22:28             ` Paul E. McKenney
2018-11-20 22:34               ` Paul E. McKenney
2018-11-21  2:06               ` Joel Fernandes
2018-11-21  2:41                 ` Paul E. McKenney
2018-11-21  4:37                   ` Joel Fernandes
2018-11-21 14:39                     ` Paul E. McKenney
