From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>, rcu <rcu@vger.kernel.org>
Subject: Re: [PATCH 3/3] rcu/exp: Remove needless CPU up quiescent state report
Date: Sat, 15 Feb 2025 23:23:45 +0100
Message-ID: <Z7ET8S4HKqSPubQY@pavilion.home>
In-Reply-To: <610596cf-9836-473f-bcdc-15c69b7e0cd4@paulmck-laptop>

On Sat, Feb 15, 2025 at 02:38:04AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 14, 2025 at 01:10:52PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 14, 2025 at 01:01:56AM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote:
> > > > A CPU coming online checks for an ongoing grace period and reports
> > > > a quiescent state accordingly if needed. This special treatment, which
> > > > shortcuts the expedited IPI, originated as an optimization in the
> > > > following commit:
> > > > 
> > > > 	338b0f760e84 ("rcu: Better hotplug handling for synchronize_sched_expedited()")
> > > > 
> > > > The point is to avoid an IPI while waiting for a CPU to become online
> > > > or failing to become offline.
> > > > 
> > > > However this is pointless and even error prone for several reasons:
> > > > 
> > > > * If the CPU has been seen offline in the first round scanning offline
> > > >   and idle CPUs, no IPI is even tried and the quiescent state is
> > > >   reported on behalf of the CPU.
> > > > 
> > > > * This means that if the IPI fails, the CPU just became offline. So
> > > >   it's unlikely to become online right away, unless the cpu hotplug
> > > >   operation failed and rolled back, which is a rare event that can
> > > >   wait a jiffy for a new IPI to be issued.
> 
> But the expedited grace period might be preempted for an arbitrarily
> long period, especially if a hypervisor is in play.  And we do drop
> that lock midway through...

Well, then that delays the expedited grace period as a whole anyway...

> > > > For all those reasons, remove this optimization, which doesn't look
> > > > worth keeping around.
> > > 
> > > Thank you for digging into this!
> > > 
> > > When I ran tests that removed the call to sync_sched_exp_online_cleanup()
> > > a few months ago, I got grace-period hangs [1].  Has something changed
> > > to make this safe?
> > 
> > Hmm, but was it before or after "rcu: Fix get_state_synchronize_rcu_full()
> > GP-start detection" ?
> 
> Before.  There was also some buggy debug code in play.  Also, to get the
> failure, it was necessary to make TREE03 disable preemption, as stock
> TREE03 has an empty sync_sched_exp_online_cleanup() function.
> 
> I am rerunning the test with a WARN_ON_ONCE() after the early exit from
> the sync_sched_exp_online_cleanup().  Of course, lack of a failure does
> not necessarily indicate

Cool, thanks!

> 
> > And if after, do we know why?
> 
> Here are some (possibly bogus) possibilities that came to mind:
> 
> 1.	There is some coming-online race that deprives the incoming
> 	CPU of an IPI, but nevertheless marks that CPU as blocking the
> 	current grace period.

Arguably there is a tiny window between rcutree_report_cpu_starting()
and set_cpu_online() that could make ->qsmaskinitnext visible before
cpu_online() and therefore delay the IPI a bit. But I don't expect
more than a jiffy to fill up the gap. And if that's relevant, note that
only !PREEMPT_RCU is then "fixed" by sync_sched_exp_online_cleanup() here.
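
The window described above can be pictured with a small userspace toy
model. This is only a sketch under stated assumptions, not kernel code:
toy_cpu, toy_send_ipi() and toy_retries_until_ipi() are hypothetical
names, and the one-jiffy retry stands in for the expedited grace period
re-issuing a failed IPI to a CPU whose bit is already set in the node's
mask.

```c
#include <stdbool.h>

/* Toy model of one incoming CPU; the field name is illustrative only. */
struct toy_cpu {
	bool online;	/* set by the modelled set_cpu_online() step */
};

/* Model of an IPI attempt: it only succeeds once the CPU is marked online. */
static bool toy_send_ipi(const struct toy_cpu *cpu)
{
	return cpu->online;
}

/*
 * The CPU is assumed to be visible in ->qsmaskinitnext from jiffy 0 and
 * to become online `window` jiffies later. Return how many one-jiffy
 * retries are needed before the IPI lands.
 */
static int toy_retries_until_ipi(int window)
{
	struct toy_cpu cpu = { .online = false };
	int jiffies = 0;

	for (;;) {
		cpu.online = (jiffies >= window);
		if (toy_send_ipi(&cpu))
			return jiffies;
		/* Mask bit set but IPI failed: wait a jiffy and retry. */
		jiffies++;
	}
}
```

In this model the delay seen by the grace period equals the width of the
window, so a window of at most one jiffy costs at most one retry.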

> 
> 2.	Some strange scenario involves the CPU going offline for just a
> 	little bit, so that the IPI gets wasted on the outgoing CPU due to
> 	neither of the "if" conditions in rcu_exp_handler() being true.
> 	The outgoing CPU just says "I need a QS", then leaves and
> 	comes back.  (The expedited grace period doesn't retry because
> 	it believes that it already sent that IPI.)

I don't think this is possible. Once the CPU enters CPUHP_TEARDOWN_CPU with
stop_machine, no more IPIs can be issued. The remaining ones are executed
at CPUHP_AP_SMPCFD_DYING, still in stop_machine. So this is the last call
for rcu_exp_handler() execution. And this last call has to be followed
by rcu_note_context_switch() between stop_machine and the final schedule to
idle. And that rcu_note_context_switch() must report the rdp exp context
switch.

One easy way to assert that is:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 86935fe00397..40d6090a33f5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4347,6 +4347,12 @@ void rcutree_report_cpu_dead(void)
 	 * may introduce a new READ-side while it is actually off the QS masks.
 	 */
 	lockdep_assert_irqs_disabled();
+	/*
+	 * CPUHP_AP_SMPCFD_DYING was the last call for rcu_exp_handler() execution.
+	 * The requested QS must have been reported on the last context switch
+	 * from stop machine to idle.
+	 */
+	WARN_ON_ONCE(rdp->cpu_no_qs.b.exp);
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);
 
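
For illustration only, the invariant behind that WARN_ON_ONCE() can be
modelled in userspace. The toy_* names below are hypothetical and merely
mirror the ordering argued above: last rcu_exp_handler() run, then the
context switch to idle, then rcutree_report_cpu_dead().

```c
#include <stdbool.h>

/* Toy stand-in for the per-CPU state; not the kernel's struct rcu_data. */
struct toy_rdp {
	bool exp_qs_needed;	/* plays the role of rdp->cpu_no_qs.b.exp */
};

/* Last handler run on the outgoing CPU: it only notes "I need a QS". */
static void toy_exp_handler(struct toy_rdp *rdp)
{
	rdp->exp_qs_needed = true;
}

/* Context switch from stop machine to idle: reports the pending QS. */
static void toy_note_context_switch(struct toy_rdp *rdp)
{
	rdp->exp_qs_needed = false;
}

/* Returns true if the modelled WARN_ON_ONCE() would stay silent. */
static bool toy_report_cpu_dead_ok(const struct toy_rdp *rdp)
{
	return !rdp->exp_qs_needed;
}

/* Run the offline sequence, optionally skipping the context switch. */
static bool toy_offline_sequence(bool with_context_switch)
{
	struct toy_rdp rdp = { .exp_qs_needed = false };

	toy_exp_handler(&rdp);
	if (with_context_switch)
		toy_note_context_switch(&rdp);
	return toy_report_cpu_dead_ok(&rdp);
}
```

The model stays silent only when the context switch happens between the
handler and the dead report, which is the ordering the assertion relies on.
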
> 
> 3.	Your ideas here!  ;-)

:-)

> 
> 							Thanx, Paul


Thread overview: 25+ messages
2025-02-13 23:25 [PATCH 0/3] rcu/exp updates Frederic Weisbecker
2025-02-13 23:25 ` [PATCH 1/3] rcu/exp: Protect against early QS report Frederic Weisbecker
2025-02-14  9:10   ` Paul E. McKenney
2025-03-13 16:40     ` Frederic Weisbecker
2025-03-13 17:04       ` Paul E. McKenney
2025-02-13 23:25 ` [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock Frederic Weisbecker
2025-02-25 21:59   ` Joel Fernandes
2025-02-26  0:08     ` Paul E. McKenney
2025-02-26 12:52     ` Frederic Weisbecker
2025-02-26 15:04       ` Paul E. McKenney
2025-02-26 15:26         ` Joel Fernandes
2025-02-26 15:34           ` Frederic Weisbecker
2025-02-13 23:25 ` [PATCH 3/3] rcu/exp: Remove needless CPU up quiescent state report Frederic Weisbecker
2025-02-14  9:01   ` Paul E. McKenney
2025-02-14 12:10     ` Frederic Weisbecker
2025-02-15 10:38       ` Paul E. McKenney
2025-02-15 22:23         ` Frederic Weisbecker [this message]
2025-02-19 14:58           ` Paul E. McKenney
2025-02-19 15:55             ` Paul E. McKenney
2025-02-21 15:31               ` Frederic Weisbecker
2025-02-21 15:52             ` Frederic Weisbecker
2025-02-26  0:00               ` Paul E. McKenney
2025-03-03 20:10   ` Paul E. McKenney
2025-03-14 14:39     ` Frederic Weisbecker
2025-03-18 17:07       ` Paul E. McKenney
