Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Lai Jiangshan <laijs@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
Date: Wed, 24 Nov 2010 10:20:51 -0800	[thread overview]
Message-ID: <20101124182051.GF2207@linux.vnet.ibm.com> (raw)
In-Reply-To: <AANLkTikbZUO6MbSC=uWYp4JRCdHPHrr5Z2_=to1YsC1X@mail.gmail.com>

On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote:
> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Wed, Nov 24, 2010 at 04:45:11PM +0100, Frederic Weisbecker wrote:
> >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> >> > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote:
> >> CPU 1, the one that was idle :-D
> >>
> >> So CPU 1 rdp did catch up with node and state for its completed field.
> >> But not its pgnum yet.
> >
> > OK, I will need to take a closer look at the rdp->gpnum setting.
> 
> Ok, do you want me to resend the patch with the changelog changed accordingly
> to our discussion or?

Please!

> >> >> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period()
> >> >> >> > -> note_new_gpnum() and then we end up a requested quiescent state while
> >> >> >> > every grace periods are completed.
> >> >> >>
> >> >> >> Sorry I should have described that in the changelogs but my ideas
> >> >> >> weren't as clear as they
> >> >> >> are now (at least I feel they are, doesn't mean they actually are ;)
> >> >> >> Chasing these RCU bugs for too much hours has toasted my brain..
> >> >> >
> >> >> > Welcome to my world!!!  But keep in mind that an extra timer tick
> >> >> > or two is much preferable to a potential hang!  And you only get
> >> >> > the extra timer tick if there was some other reason that the
> >> >> > CPU came out of nohz mode, correct?
> >> >>
> >> >> Yeah, either because of a timer, hrtimer, or a reschedule.
> >> >> But you still generate a spurious softirq in this scheme.
> >> >
> >> > Eliminating spurious softirqs is a good thing, but let's review the
> >> > overall priorities (probably missing a few, but this should give you
> >> > an overall view):
> >> >
> >> > 1.      No too-short grace periods!!!
> >>
> >> Oh, I did not know that. Is it to avoid too much
> >> short series of callbacks execution or?
> >
> > To avoid breaking RCU's fundamental guarantee.  ;-)
> >
> > It is easy to create bugs where a CPU thinks that it needs to respond
> > to the grace period based on a prior quiescent state, but unknown to
> > it that grace period has finished and a new one has started.  This
> > CPU might then wrongly report a quiescent state against the new grace
> > period.  Needless to say, this is very very very very very very bad.
> 
> Aaah ok. I thought you were talking about the time of a correct grace period,
> not a bug that would make it seem shorter than it actually is.
> 
> Ok. Of course, priority number 1.

;-)

> >> > 7.      Let CPUs that can do so go to sleep immediately (as opposed to
> >> >        waiting a few milliseconds).
> >> >
> >> > 8.      Avoid spurious softirqs.
> >>
> >> Avoiding spurious softirqs bring us back to 7, even though it's not a matter
> >> of ms but rather us.
> >
> > Microseconds should not be a problem as long as they are rare.  Milliseconds
> > are a problem for the battery-powered folks as well as for people who care
> > about OS jitter.
> 
> Right.
> 
> 
> >> >> Two in fact: one because of the rnp->completed != rsp->completed condition
> >> >> in rcu_pending(), another one because when we update the pgnum, we
> >> >> always start chasing QS, regardless of the last GP beeing completed or not.
> >> >
> >> > OK -- is this an example of #8 above, or is it really #7?  I am absolutely
> >> > not worried about a pair of back-to-back softirqs, and I don't believe
> >> > that you should be, either.  ;-)
> >>
> >> This seems to be 8 and 7.
> >>
> >> A pair of back-to-back softirqs is no huge deal, but still time
> >> consuming, spurious, etc...
> >> if we can easily spott they are useless, why not get rid of them. At
> >> least we can easily
> >> and reliably avoid the second one from note_new_gpnum().
> >
> > As long as I can prove to myself that the patch that gets rid of them
> > does not cause other problems.  ;-)
> 
> :)
> 
> >> >> > Which is why checking the rnp fields makes more sense to me, actually.
> >> >> > Acquiring rnp->lock is much less painful than pinning down the rsp state.
> >> >>
> >> >> Right.
> >> >>
> >> >> Another thing, we  already have the (rnp->gpnum) != rdp->gpnu check in
> >> >> rcu_pending(),
> >> >> why also checking (rnp->completed) != rdp->completed) ?
> >> >
> >> > Because if (rnp->completed != rdp->completed), we might need to process
> >> > some callbacks, either advancing them or invoking them.
> >>
> >> The rdp->nxttail in rcu is still an obscur part for me, so I just believe you :)
> >
> > It is just the place that the pending rcu_head callbacks are stored.
> 
> Yeah. I mean, I need to read how the code manages the different queues.
> But __rcu_process_gp_end() seems to sum it up quite well.

For advancing callbacks, that is the one!  For invocation of callbacks,
see rcu_do_batch().

> >> > By the way, have you introduced a config variable for your HPC dynticks
> >> > changes?
> >>
> >> I will, because I'll have to touch some fast path and I would prefer to do
> >> that compile-conditional.
> >>
> >> But for now I don't care and experiment drafts :)
> >
> > Well, if any of the softirq/tick streamlining in RCU is risky, it needs
> > to go under that same #ifdef!  ;-)
> 
> Sure, if something needs a reorg or a change even in the dyntick-hpc
> off-case code,
> I'll propose that to you, like I did for these two patches.
> 
> But I'll try to keep the invasive changes that are only for
> dyntick-hpc only under that config.

Sounds good!

> >> I'm quite close to something that seems to work well BTW, those series of
> >> softirq were my latest problem as it made rcu_pending() returning 1 for
> >> a good while and I coudn't stop the tick. After the 2nd patch it should be
> >> fine now, I hope.
> >
> > By "couldn't stop the tick" you mean "the tick went on a few times more
> > than you like" or do you really mean "couldn't ever stop the tick"?
> > My guess is the former.  If there were problems like the latter, Arjan
> > would probably have beat me up about it long ago!
> 
> It's more like couldn't ever stop the tick. But that doesn't concern mainline.
> This is because I have a hook that prevents the tick from beeing stopped
> until rcu_pending() == 0.

That would certainly change behavior!!!  Why did you need to do that?

Ah, because force_quiescent_state() has not yet been taught about
dyntick-HPC, got it...

> In mainline it doesn't prevent the CPU from going nohz idle though, because
> the softirq is armed from the tick. Once the softirq is processed, the CPU
> can go to sleep. On the next timer tick it would again raise the softirq and
> then could again go to sleep, etc..

You lost me on this one.  If the CPU goes to sleep (AKA enters dyntick-idle
mode, right?), then there wouldn't be a next timer tick, right?

> I still have a trace of that, with my rcu_pending() hook, in
> dyntick-hpc, that kept
> returning 1 during at least 100 seconds and on each tick.
> I did not go really further into this from my code as I immediately
> switched to tip:master
> to check if the problem came from my code or not.
> And then I discovered that rcu_pending() indeed kept returning 1 for some while
> in mainline (don't remember how much could be "some while" though), I
> saw all these
> spurious rcu softirq at each ticks caused by rcu_pending() and for
> random time slices:
> probably between a wake up from idle and the next grace period, if my
> theory is right, and I
> think that happened likely with bh flavour probably because it's
> subject to less grace periods.
> 
> And this is what the second patch fixes in mainline and that also
> seems to fix my issue in
> dyntick-hpc.
> 
> Probably it happened more easily on dynctick-hpc as I was calling
> rcu_pending() after
> calling rcu_enter_nohz() (some buggy part of mine).

OK, but that is why dyntick-idle is governed by rcu_needs_cpu() rather
than rcu_pending().  But yes, need to upgrade force_quiescent_state().

One hacky way to do that would be to replace smp_send_reschedule() with
an smp_call_function_single() that invoked something like the following
on the target CPU:

	static void rcu_poke_cpu(void *info)
	{
		raise_softirq(RCU_SOFTIRQ);
	}

So rcu_implicit_offline_qs() does something like the following in place
of the smp_send_reschedule():

	smp_call_function_single(rdp->cpu, rcu_poke_cpu, NULL, 0);

The call to set_need_resched() can remain as is.

Of course, a mainline version would need to be a bit more discerning,
but this should do work just fine for your experimental use.

This should allow you to revert back to rcu_needs_cpu().

Or am I missing something here?

> >> >  Longer term, __rcu_pending() for your HPC dynticks case
> >> > could check for the current CPU having any callbacks before the call to
> >> > check_cpu_stalls(), as in rcu_needs_cpu_quick_check().  That way, the
> >> > CPUs with callbacks would drive the RCU core state machine.
> >>
> >> Ah, because currently every CPUs calling rcu_pending() can be pulled into
> >> handling the state machine because of this check, right?
> >>
> >>         /* Has an RCU GP gone long enough to send resched IPIs &c? */
> >>       if (rcu_gp_in_progress(rsp) &&
> >>           ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies)) {
> >>               rdp->n_rp_need_fqs++;
> >>               return 1;
> >>
> >> So the goal would be to let this job to CPUs that have callbacks, right?
> >
> > In the dynticks-HPC case, yes.
> >
> >> > We don't
> >> > necessarily want this in the common case because it can increase
> >> > grace-period latency
> >>
> >> Because it makes less CPUs to handle the grace period state machine?
> >
> > Not quite.  Here I am not worried about the dyntick-HPC case, but rather
> > the usual more-runnable-tasks-than-CPUs case.  Here, CPUs are context
> > switching frequently, so if they report their own quiescent states (even
> > when they don't happen to have any callbacks queued) the grace period will
> > complete more quickly.  After all, force_quiescent_state() doesn't even
> > get started until the third tick following the start of the grace period.
> 
> Ah, I see what you mean. So you would suggest to even ignore those
> explicit QS report when in dynticj-hpc mode for CPUs that don't have callbacks?
> 
> Why not keeping them?

My belief is that people needing dyntick-HPC are OK with RCU grace periods
taking a few jiffies longer than they might otherwise.  Besides, when you
are running dyntick-HPC, you aren't context switching much, so keeping
the tick doesn't buy you as much reduction in grace-period latency.

> >> > but it could be very useful in the HPC-dyntick
> >> > case -- eliminate any number of sources of spurious ticks with low
> >> > risk to the high-priority RCU properties.  There are some corresponding
> >> > changes required on the force_quiescent_state() path, but these are
> >> > reasonably straightforward.
> >>
> >> Ok, I may have a try at it (if I completely understand what you suggest :-)
> >> But first I'll try to get that dyntick mode with the current state, which seems
> >> to be enough for the basic desired functionality.
> >
> > If the two changes you have posted thus far are all you need, this does
> > sound like a good starting point.
> 
> For now yeah, it seems to work well. I may run into further surpises though :)

Perish the thought!!!  ;-)

							Thanx, Paul

next prev parent reply	other threads:[~2010-11-24 18:20 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-24  0:31 [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker
2010-11-24  0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker
2010-11-24  0:58   ` Paul E. McKenney
2010-11-24  2:29     ` Frederic Weisbecker
2010-11-24  2:33       ` Frederic Weisbecker
2010-11-24  6:22         ` Paul E. McKenney
2010-11-24 13:48           ` Frederic Weisbecker
2010-11-24 14:42             ` Paul E. McKenney
2010-11-24 15:45               ` Frederic Weisbecker
2010-11-24 16:15                 ` Paul E. McKenney
2010-11-24 17:38                   ` Frederic Weisbecker
2010-11-24 18:20                     ` Paul E. McKenney [this message]
2010-11-24 20:22                       ` Paul E. McKenney
2010-11-24 20:45                         ` Frederic Weisbecker
2010-11-24 21:19                           ` Paul E. McKenney
2010-11-24 21:50                             ` Frederic Weisbecker
2010-11-24 22:42                       ` Frederic Weisbecker
2010-11-25 14:56                         ` Paul E. McKenney
2010-11-26 14:06                           ` Frederic Weisbecker
2010-11-29 23:06                             ` Paul E. McKenney
2010-11-24  0:31 ` [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote Frederic Weisbecker
2010-11-24  1:03   ` Paul E. McKenney
2010-11-25  3:42 ` [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Lai Jiangshan
2010-11-25  7:38   ` Frederic Weisbecker
2010-11-25  8:35     ` Lai Jiangshan
2010-11-25  9:27       ` Frederic Weisbecker
2010-11-25 14:58         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101124182051.GF2207@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=fweisbec@gmail.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.