From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, josh@joshtriplett.org,
rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
jiangshanlai@gmail.com
Subject: Re: dyntick-idle CPU and node's qsmask
Date: Sun, 11 Nov 2018 13:04:08 -0800
Message-ID: <20181111210408.GA85122@google.com>
In-Reply-To: <20181111183618.GY4170@linux.ibm.com>
On Sun, Nov 11, 2018 at 10:36:18AM -0800, Paul E. McKenney wrote:
[..]
> > > > > CPU will with high probability report its own quiescent state before three
> > > > > jiffies pass, in which case the cache misses on the rcu_data structures
> > > > > would be wasted motion.
> > > >
> > > > If all the CPUs are busy and reporting their QS themselves, then I think the
> > > > qsmask is likely 0 so then rcu_implicit_dynticks_qs (called from
> > > > force_qs_rnp) wouldn't be called and so there would be no cache misses on
> > > > rcu_data, right?
> > >
> > > Yes, but assuming that all CPUs report their quiescent states before
> > > the first call to rcu_gp_fqs(). One exception is when some CPU is
> > > looping in the kernel for many milliseconds without passing through a
> > > quiescent state. This is because for recent kernels, cond_resched()
> > > is not a quiescent state until the grace period is something like 100
> > > milliseconds old. (For older kernels, cond_resched() was never an RCU
> > > quiescent state unless it actually scheduled.)
> > >
> > > Why wait 100 milliseconds? Because otherwise the increase in
> > > cond_resched() overhead shows up all too well, causing the 0day test robot
> > > to complain bitterly. Besides, I would expect that in the common case,
> > > CPUs would be executing usermode code.
> >
> > Makes sense. I was also wondering about this other thing you mentioned about
> > waiting for 3 jiffies before reporting the idle CPU's quiescent state. Does
> > that mean that even if a single CPU is dyntick-idle for a long period of
> > time, then the minimum grace period duration would be at least 3 jiffies? In
> > our mobile embedded devices, jiffies is set to 3.33ms (HZ=300) to keep power
> > consumption low. Not that I'm saying it's an issue or anything (since IIUC if
> > someone wants shorter grace periods, they should just use expedited GPs), but
> > it sounds like it would be a shorter GP if we just set the qsmask early on
> > somehow and we can manage the overhead of doing so.
>
> First, there is some autotuning of the delay based on HZ:
>
> #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
>
> So at HZ=300, you should be seeing a two-jiffy delay rather than the
> usual HZ=1000 three-jiffy delay. Of course, this means that the delay
> is 6.67ms rather than the usual 3ms, but the theory is that lower HZ
> rates often mean slower instruction execution and thus a desire for
> lower RCU overhead. There is further autotuning based on number of
> CPUs, but this does not kick in until you have 256 CPUs on your system,
> and I bet that smartphones aren't there yet. Nevertheless, check out
> RCU_JIFFIES_FQS_DIV for more info on this.
Got it. I agree with that heuristic.
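
For my own notes, here is a small sketch of the arithmetic above. It only
restates the quoted macro with HZ passed as an argument so that different tick
rates can be compared. The function names are made up for illustration and are
not kernel APIs, and the sketch ignores the extra CPU-count autotuning
(RCU_JIFFIES_FQS_DIV) mentioned above.

```c
/* Same expression as the quoted RCU_JIFFIES_TILL_FORCE_QS, but with HZ as a
 * runtime argument (hypothetical helper, not a kernel function). */
static unsigned long jiffies_till_force_qs(unsigned long hz)
{
	return 1 + (hz > 250) + (hz > 500);
}

/* Convert a jiffy count to microseconds for a given HZ, rounding down
 * (hypothetical helper). */
static unsigned long jiffies_to_usec(unsigned long j, unsigned long hz)
{
	return j * 1000000UL / hz;
}
```

With these, jiffies_till_force_qs(300) gives 2 jiffies, which
jiffies_to_usec(2, 300) turns into roughly 6.67ms, while HZ=1000 gives
3 jiffies, or 3ms, matching the numbers above.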
> But you can always override this autotuning using the following kernel
> boot parameters:
>
> rcutree.jiffies_till_first_fqs
> rcutree.jiffies_till_next_fqs
>
> You can even set the first one to zero if you want the effect of pre-scanning
> for idle CPUs. ;-)
>
> The second must be set to one or greater.
>
> Both are capped at one second (HZ).
Got it. Thanks a lot for the explanations.
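
To check my understanding of those limits, a minimal sketch of the rules as you
describe them (first value may be zero, second must be at least one, both
capped at HZ). This models the description only and is not the kernel's actual
parameter-handling code; the function names are hypothetical.

```c
/* Limit for rcutree.jiffies_till_first_fqs as described above:
 * zero is allowed, and the value is capped at HZ (one second). */
static unsigned long clamp_first_fqs(unsigned long requested, unsigned long hz)
{
	return requested > hz ? hz : requested;
}

/* Limit for rcutree.jiffies_till_next_fqs as described above:
 * must be at least one jiffy, and is capped at HZ (one second). */
static unsigned long clamp_next_fqs(unsigned long requested, unsigned long hz)
{
	if (requested < 1)
		return 1;
	return requested > hz ? hz : requested;
}
```

So at HZ=300, booting with rcutree.jiffies_till_first_fqs=0 and
rcutree.jiffies_till_next_fqs=1 would be within the limits, and anything above
300 would be treated as 300 under this model.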
> > > > Anyway it was just an idea that popped up when I was going through traces :)
> > > > Thanks for the discussion and happy to discuss further or try out anything.
> > >
> > > Either way, I do appreciate your going through this. People have found
> > > RCU bugs this way, one of which involved RCU uselessly calling a particular
> > > function twice in quick succession. ;-)
> >
> > Thanks. It is my pleasure and happy to help :) I'll keep digging into it.
>
> Looking forward to further questions and patches. ;-)
Will do! Thanks,
- Joel