Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs
Date: Thu, 15 Aug 2019 11:39:37 -0700	[thread overview]
Message-ID: <20190815183937.GK28441@linux.ibm.com> (raw)
In-Reply-To: <20190815181500.GC12078@google.com>

On Thu, Aug 15, 2019 at 02:15:00PM -0400, Joel Fernandes wrote:
> On Thu, Aug 15, 2019 at 10:23:51AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 15, 2019 at 11:07:35AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 14, 2019 at 03:05:16PM -0700, Paul E. McKenney wrote:
> > > [snip]
> > > > > > If so, perhaps that monitoring could periodically invoke an RCU function
> > > > > > that I provide for deciding when to turn the tick on.  We would also need
> > > > > > to work out how to turn the tick off in a timely fashion once the CPU got
> > > > > > out of kernel mode, perhaps in rcu_user_enter() or rcu_nmi_exit_common().
> > > > > > 
> > > > > > If this would be called only every second or so, the separate grace-period
> > > > > > checking is still needed for its shorter timespan, though.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > Do you want me to test the below patch to see if it fixes the issue with my
> > > > > other test case (where I had a nohz full CPU holding up a grace period).
> > > > 
> > > > Please!
> > > 
> > > I tried the patch below, but it did not seem to make a difference to the
> > > issue I was seeing. My test tree is here in case you can spot anything I did
> > > not do right: https://github.com/joelagnel/linux-kernel/commits/rcu/nohz-test
> > > The main patch is here:
> > > https://github.com/joelagnel/linux-kernel/commit/4dc282b559d918a0be826936f997db0bdad7abb3
> > 
> > That is more aggressive that rcutorture's rcu_torture_fwd_prog_nr(), so
> > I am guessing that I need to up rcu_torture_fwd_prog_nr()'s game.  I am
> > currently testing that.
> > 
> > > On the trace output, I grep something like: egrep "(rcu_perf|cpu 3|3d)". I
> > > see a few ticks after 300ms, but then there are no more ticks and just a
> > > periodic resched_cpu() from rcu_implicit_dynticks_qs():
> > > 
> > > [   19.534107] rcu_perf-165    12.... 2276436us : rcu_perf_writer: Start of rcuperf test
> > > [   19.557968] rcu_pree-10      0d..1 2287973us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   20.136222] rcu_perf-165     3d.h. 2591894us : rcu_sched_clock_irq: sched-tick
> > > [   20.137185] rcu_perf-165     3d.h2 2591906us : rcu_sched_clock_irq: sched-tick
> > > [   20.138149] rcu_perf-165     3d.h. 2591911us : rcu_sched_clock_irq: sched-tick
> > > [   20.139106] rcu_perf-165     3d.h. 2591915us : rcu_sched_clock_irq: sched-tick
> [snip]
> > > [   20.147797] rcu_perf-165     3d.h. 2591953us : rcu_sched_clock_irq: sched-tick
> > > [   20.148759] rcu_perf-165     3d.h. 2591957us : rcu_sched_clock_irq: sched-tick
> > > [   20.151655] rcu_pree-10      0d..1 2591979us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   20.732938] rcu_pree-10      0d..1 2895960us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> [snip]
> > > [   26.566100] rcu_pree-10      0d..1 5935982us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   27.144497] rcu_pree-10      0d..1 6239973us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   27.192661] rcu_perf-165     3d.h. 6276923us : rcu_sched_clock_irq: sched-tick
> > > [   27.705789] rcu_pree-10      0d..1 6541901us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   28.292155] rcu_pree-10      0d..1 6845974us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   28.874049] rcu_pree-10      0d..1 7149972us : rcu_implicit_dynticks_qs: Sending urgent resched to cpu 3
> > > [   29.112646] rcu_perf-165     3.... 7275951us : rcu_perf_writer: End of rcuperf test
> > 
> > That would be due to my own stupidity.  I forgot to clear ->rcu_forced_tick
> > in rcu_disable_tick_upon_qs() inside the "if" statement.  This of course
> > prevents rcu_nmi_exit_common() from ever re-enabling it.
> > 
> > Excellent catch!  Thank you for testing this!!!
> 
> Ah I missed it too. Happy to help! I tried setting it as below but getting
> same results:
> 
> +/*
> + * If the scheduler-clock interrupt was enabled on a nohz_full CPU
> + * in order to get to a quiescent state, disable it.
> + */
> +void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> +{
> +       if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick)
> +               tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> +       rdp->rcu_forced_tick = false;

I put this inside the "if" statement, though I would not expect that to
change behavior in this case.

Does your test case still avoid turning on the tick more than once?  Or
is it turning on the tick each time the grace period gets too long, but
without the tick managing to end the grace periods?

> +}
> +
> 
> > > [snip]
> > > > > >  	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > > > > +		rcu_disable_tick_upon_qs(rdp);
> > > > > >  		/* Report QS -after- changing ->qsmaskinitnext! */
> > > > > >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > > > > 
> > > > > Just curious about the existing code. If a CPU is just starting up (after
> > > > > bringing it online), how can RCU be waiting on it? I thought RCU would not be
> > > > > watching offline CPUs.
> > > > 
> > > > Well, neither grace periods nor CPU-hotplug operations are atomic,
> > > > and each can take significant time to complete.
> > > > 
> > > > So suppose we have a large system with multiple leaf rcu_node structures
> > > > (not that 17 CPUs is all that many these days, but please bear with me).
> > > > Suppose just after a new grace period initializes a given leaf rcu_node
> > > > structure, one of its CPUs goes offline (yes, that CPU would have to
> > > > have waited on a grace period, but that might have been the previous
> > > > grace period).  But before the FQS scan notices that RCU is waiting on
> > > > an offline CPU, the CPU comes back online.
> > > > 
> > > > That situation is exactly what the above code is intended to handle.
> > > 
> > > That makes sense!
> > > 
> > > > Without that code, RCU can give false-positive splats at various points
> > > > in its processing.  ("Wait!  How can a task be blocked waiting on a
> > > > grace period that hasn't even started yet???")
> > > 
> > > I did not fully understand the question in brackets though, a task can be on
> > > a different CPU though which has nothing to do with the CPU that's going
> > > offline/online so it could totally be waiting on a grace period right?
> > > 
> > > Also waiting on a grace period that hasn't even started is totally possible:
> > > 
> > >      GP1         GP2
> > > |<--------->|<-------->|
> > >      ^                 ^
> > >      |                 |____  task gets unblocked
> > > task blocks
> > > on synchronize_rcu
> > > but is waiting on
> > > GP2 which hasn't started
> > > 
> > > Or did I misunderstand the question?
> > 
> > There is a ->gp_tasks field in the leaf rcu_node structures that
> > references a list of tasks blocking the current grace period.  When there
> > is no grace period in progress (as is the case from the end of GP1 to
> > the beginning of GP2, the RCU code expects ->gp_tasks to be NULL.
> > Without the curiosity code you pointed out above, ->gp_tasks could
> > in fact end up being non-NULL when no grace period was in progress.
> > 
> > And did end up being non-NULL from time to time, initially every few
> > hundred hours of a particular rcutorture scenario.
> 
> Oh ok! I will think more about it. I am not yet able to connect the gp_tasks
> being non-NULL to the CPU going offline/online scenario though. Maybe I
> should delete this code, run an experiment and trace for this condition
> (gp_tasks != NULL)?

Or you could dig through the git logs for this code change.

> I love it how you found these issues by heavy testing and fixed them.

Me, I would have rather foreseen them and avoided them in the first place,
but I agree that it is better for rcutorture to find them than for some
hapless user somewhere to be inconvenienced by them.  ;-)

							Thanx, Paul

next prev parent reply	other threads:[~2019-08-15 18:40 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 15:14 [PATCH tip/core/rcu 0/14] No-CBs bypass addition for v5.4 Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 01/14] rcu/nocb: Atomic ->len field in rcu_segcblist structure Paul E. McKenney
2019-08-04 14:50   ` Peter Zijlstra
2019-08-04 14:52     ` Peter Zijlstra
2019-08-04 18:45       ` Paul E. McKenney
2019-08-04 18:42     ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing Paul E. McKenney
2019-08-07  0:03   ` Joel Fernandes
2019-08-07  0:16     ` Joel Fernandes
2019-08-07  0:35     ` Paul E. McKenney
2019-08-07  0:40       ` Steven Rostedt
2019-08-07  1:17         ` Paul E. McKenney
2019-08-07  1:24           ` Steven Rostedt
2019-08-07  3:47             ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 03/14] rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 04/14] rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 05/14] rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 06/14] rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 07/14] rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 08/14] rcu/nocb: Reduce __call_rcu_nocb_wake() " Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 09/14] rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 10/14] rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 11/14] EXP nohz: Add TICK_DEP_BIT_RCU Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 12/14] rcu/nohz: Force on tick when invoking lots of callbacks Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 13/14] rcutorture: Force on tick for readers and callback flooders Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs Paul E. McKenney
2019-08-04 14:43   ` Peter Zijlstra
2019-08-04 14:48     ` Peter Zijlstra
2019-08-04 18:41       ` Paul E. McKenney
2019-08-04 20:24         ` Paul E. McKenney
2019-08-05  4:19           ` Paul E. McKenney
2019-08-05  8:07             ` Peter Zijlstra
2019-08-05 14:47               ` Paul E. McKenney
2019-08-05  8:05         ` Peter Zijlstra
2019-08-05 14:54           ` Paul E. McKenney
2019-08-05 15:50             ` Peter Zijlstra
2019-08-05 17:48               ` Paul E. McKenney
2019-08-06 18:08                 ` Paul E. McKenney
2019-08-07 21:41                   ` Paul E. McKenney
2019-08-08 20:35                     ` Paul E. McKenney
2019-08-08 21:30                       ` Paul E. McKenney
2019-08-09 16:51                         ` Paul E. McKenney
2019-08-09 18:07                           ` Joel Fernandes
2019-08-09 18:39                             ` Paul E. McKenney
2019-08-12 21:02   ` Frederic Weisbecker
2019-08-12 23:23     ` Paul E. McKenney
2019-08-13  1:33       ` Joel Fernandes
2019-08-13 12:30       ` Frederic Weisbecker
2019-08-13 14:48         ` Paul E. McKenney
2019-08-14 17:55           ` Joel Fernandes
2019-08-14 22:05             ` Paul E. McKenney
2019-08-15 15:07               ` Joel Fernandes
2019-08-15 17:23                 ` Paul E. McKenney
2019-08-15 18:15                   ` Joel Fernandes
2019-08-15 18:39                     ` Paul E. McKenney [this message]
2019-08-15 19:42                       ` Joel Fernandes
2019-08-13 21:06       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190815183937.GK28441@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).