Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")

From: "Paul E. McKenney" <paulmck@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Ingo Molnar <mingo@redhat.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Petr Mladek <pmladek@suse.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Yafang Shao <laoar.shao@gmail.com>
Subject: Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
Date: Thu, 5 Sep 2019 09:43:29 -0700
Message-ID: <20190905164329.GT4125@linux.ibm.com>
In-Reply-To: <20190905153620.GG26466@google.com>

On Thu, Sep 05, 2019 at 11:36:20AM -0400, Joel Fernandes wrote:
> On Wed, Sep 04, 2019 at 04:13:08PM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 04, 2019 at 09:54:20AM -0400, Joel Fernandes wrote:
> > > On Wed, Sep 04, 2019 at 03:12:10AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 04, 2019 at 12:59:10AM -0400, Joel Fernandes wrote:
> > > > > On Tue, Sep 03, 2019 at 01:02:49PM -0700, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> > > > If this task gets delayed betweentimes, rcu_implicit_dynticks_qs() would
> > > > fail to set .rcu_need_heavy_qs because it saw it already being set,
> > > > even though the corresponding ->dynticks update had already happened.
> > > > (It might be a new grace period, given that the old grace period might
> > > > have ended courtesy of the atomic_add_return().)
> > > 
> > > Makes sense and I agree.
> > > 
> > > Also, I would really appreciate if you can correct the nits in the above
> > > patch we're reviewing, and apply them (if you can).
> > > I think, there are only 2 changes left:
> > > - rename special to seq.
> > > - reorder the rcu_need_heavy_qs write.
> > > 
> > >  On a related point, when I was working on the NOHZ_FULL testing I noticed a
> > >  weird issue where rcu_urgent_qs was reset but rcu_need_heavy_qs was still
> > >  set indefinitely. I am a bit afraid our hints are not being cleared
> > >  appropriately and I believe I fixed a similar issue a few months ago. I
> > >  would rather have them cleared once they are no longer needed.  What do you
> > >  think about the below patch? I did not submit it yet because I was working
> > >  on other patches. 
> > > 
> > > ---8<-----------------------
> > > 
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > Subject: [RFC] rcu/tree: Reset CPU hints when reporting a quiescent state
> > > 
> > > While tracing, I am seeing cases where need_heavy_qs is still set even
> > > though urgent_qs was cleared, after a quiescent state is reported. One
> > > such case is when the softirq reports that a CPU has passed quiescent
> > > state.
> > > 
> > > Previously in 671a63517cf9 ("rcu: Avoid unnecessary softirq when system
> > > is idle"), I had fixed a bug where core_needs_qs was not being cleared.
> > > I worry we keep running into similar situations. Let us just add a
> > > function to clear hints and call it from all relevant places to make the
> > > code more robust and avoid such stale hints which could in theory at
> > > least, cause false hints after the quiescent state was already reported.
> > > 
> > > Tested overnight with rcutorture running for 60 minutes on all
> > > configurations of RCU.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 17 ++++++++++++++++-
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > Excellent point!  But how about if we combine it with the existing
> > disabling of the scheduler tick, perhaps something like the following?
> > 
> > Note that the FQS clearing can come from some other CPU, hence the added
> > {READ,WRITE}_ONCE() calls.  The call is moved down in rcu_report_qs_rdp()
> > because something would have had to clear the bit to prevent execution
> > from getting there, and I believe that the other bit-clearing events
> > have calls to rcu_disable_urgency_upon_qs().  (But I easily could have
> > missed something!)
> 
> Is there any harm just clearing it earlier in rcu_report_qs_rdp()? If no,
> then let us just play it safe and do it that way (clear earlier in
> rcu_report_qs_rdp())?

Maybe...

But given that missing a path does not cause a major failure (a too-short
grace period, for example), I am more inclined to find the paths and
fix them as needed.  Especially given that my being unaware of some path
to a quiescent state would likely be hiding a serious bug.

> > I am OK leaving RCU urgency set on offline CPUs, hence clearing things
> > at online time.
> 
> Got it, probably this point can be added to the commit message.
> 
> Added more comments below but otherwise it looks good to me:
> 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 68ebf0eb64c8..2b74b6c94086 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -827,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> >  		incby = 1;
> >  	} else if (tick_nohz_full_cpu(rdp->cpu) &&
> >  		   rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
> > -		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > +		   READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
> >  		rdp->rcu_forced_tick = true;
> >  		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> >  	}
> > @@ -892,11 +892,15 @@ void rcu_irq_enter_irqson(void)
> >  }
> >  
> >  /*
> > - * If the scheduler-clock interrupt was enabled on a nohz_full CPU
> > - * in order to get to a quiescent state, disable it.
> > + * If any sort of urgency was applied to the current CPU (for example,
> > + * the scheduler-clock interrupt was enabled on a nohz_full CPU) in order
> > + * to get to a quiescent state, disable it.
> >   */
> > -void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> > +void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> >  {
> > +	WRITE_ONCE(rdp->core_needs_qs, false);
> > +	WRITE_ONCE(rdp->rcu_urgent_qs, false);
> > +	WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
> 
> Better to put a comment here saying _ONCE is needed to avoid data-races with
> the FQS loop? Just so if anyone thinks why we are using _ONCE().

Good point.  I added a "// WRITE_ONCE() for FQS".
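
For illustration only, a minimal userspace sketch of the hint-clearing part
of the helper with that comment in place.  The *_sketch names and the
volatile-cast stand-ins for READ_ONCE()/WRITE_ONCE() are assumptions made so
that the fragment builds outside the kernel with GCC or Clang; the
tick-dependency handling from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros (assumes GCC/Clang __typeof__). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down model of struct rcu_data: only the hint fields. */
struct rcu_data_sketch {
	bool core_needs_qs;
	bool rcu_urgent_qs;
	bool rcu_need_heavy_qs;
};

/*
 * Clear every urgency hint for a CPU once its quiescent state has been
 * reported.  The caller may be a different CPU (the FQS loop), so the
 * stores are marked.
 */
static void disable_urgency_upon_qs_sketch(struct rcu_data_sketch *rdp)
{
	assert(rdp != NULL);
	// WRITE_ONCE() for FQS
	WRITE_ONCE(rdp->core_needs_qs, false);
	WRITE_ONCE(rdp->rcu_urgent_qs, false);
	WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
}
```

A reader on the owning CPU would pair this with READ_ONCE(), as the
rcu_pending() and rcu_check_quiescent_state() hunks in the patch do.
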

> And I am guessing the __this_cpu_read(rcu_data.core_needs_qs) in
> rcu_flavor_sched_clock_irq() implies READ_ONCE() so no need READ_ONCE()
> there right?

On x86 it is assembly, so yes there.  Not so much on other architectures,
though.  ;-)  See raw_cpu_generic_read().
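
To make the distinction concrete, a minimal userspace sketch, assuming that
the generic per-CPU read boils down to a plain C dereference (as
raw_cpu_generic_read() appeared to do at the time of this thread).  The
*_sketch names are hypothetical, a single global stands in for the per-CPU
variable, and READ_ONCE() is a volatile-cast stand-in for the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel macro (assumes GCC/Clang __typeof__). */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a per-CPU flag; a real kernel has one per CPU. */
static bool core_needs_qs_sketch;

/*
 * Models a generic (non-x86) per-CPU read: a plain load, which the
 * compiler is free to fuse with neighbouring loads, hoist, or repeat.
 */
static bool generic_read_sketch(void)
{
	return core_needs_qs_sketch;
}

/* Marked read: the volatile access forces exactly one load per call. */
static bool marked_read_sketch(void)
{
	return READ_ONCE(core_needs_qs_sketch);
}

/* With no concurrent writer, the two kinds of read must agree. */
static void check_reads_agree_sketch(void)
{
	assert(generic_read_sketch() == marked_read_sketch());
}
```

The two functions return the same value when nothing else writes the flag.
The difference only matters when another CPU can store to it concurrently,
which is the FQS case discussed above.
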

> >  	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
> >  		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> >  		rdp->rcu_forced_tick = false;
> > @@ -1417,7 +1421,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> >  		need_gp = !!(rnp->qsmask & rdp->grpmask);
> >  		rdp->cpu_no_qs.b.norm = need_gp;
> > -		rdp->core_needs_qs = need_gp;
> > +		WRITE_ONCE(rdp->core_needs_qs, need_gp);
> >  		zero_cpu_stall_ticks(rdp);
> >  	}
> >  	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> > @@ -1987,7 +1991,6 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> >  		return;
> >  	}
> >  	mask = rdp->grpmask;
> > -	rdp->core_needs_qs = false;
> >  	if ((rnp->qsmask & mask) == 0) {
> >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  	} else {
> > @@ -1998,7 +2001,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> >  		if (!offloaded)
> >  			needwake = rcu_accelerate_cbs(rnp, rdp);
> >  
> > -		rcu_disable_tick_upon_qs(rdp);
> > +		rcu_disable_urgency_upon_qs(rdp);
> >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >  		/* ^^^ Released rnp->lock */
> >  		if (needwake)
> > @@ -2022,7 +2025,7 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
> >  	 * Does this CPU still need to do its part for current grace period?
> >  	 * If no, return and let the other CPUs do their part as well.
> >  	 */
> > -	if (!rdp->core_needs_qs)
> > +	if (!READ_ONCE(rdp->core_needs_qs))
> >  		return;
> >  
> >  	/*
> > @@ -2316,7 +2319,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> >  				rdp = per_cpu_ptr(&rcu_data, cpu);
> >  				if (f(rdp)) {
> >  					mask |= bit;
> > -					rcu_disable_tick_upon_qs(rdp);
> > +					rcu_disable_urgency_upon_qs(rdp);
> >  				}
> >  			}
> >  		}
> > @@ -3004,7 +3007,7 @@ static int rcu_pending(void)
> >  		return 0;
> >  
> >  	/* Is the RCU core waiting for a quiescent state from this CPU? */
> > -	if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm)
> > +	if (READ_ONCE(rdp->core_needs_qs) && !rdp->cpu_no_qs.b.norm)
> >  		return 1;
> >  
> >  	/* Does this CPU have callbacks ready to invoke? */
> > @@ -3244,7 +3247,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	rdp->gp_seq = rnp->gp_seq;
> >  	rdp->gp_seq_needed = rnp->gp_seq;
> >  	rdp->cpu_no_qs.b.norm = true;
> > -	rdp->core_needs_qs = false;
> 
> How about calling the new hint-clearing function here as well? Just for
> robustness and consistency purposes?

This and the next function are both called during a CPU-hotplug online
operation, so there is little robustness or consistency to be had by
doing it twice.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> >  	rdp->rcu_iw_pending = false;
> >  	rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
> >  	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > @@ -3359,7 +3361,7 @@ void rcu_cpu_starting(unsigned int cpu)
> >  	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> >  	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> >  	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > -		rcu_disable_tick_upon_qs(rdp);
> > +		rcu_disable_urgency_upon_qs(rdp);
> >  		/* Report QS -after- changing ->qsmaskinitnext! */
> >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> >  	} else {
