From: Joel Fernandes <joelagnelf@nvidia.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: linux-kernel@vger.kernel.org,
"Paul E. McKenney" <paulmck@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
rcu@vger.kernel.org
Subject: Re: [PATCH RFC] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path
Date: Thu, 25 Dec 2025 21:19:36 -0500
Message-ID: <20251226021936.GA739018@joelbox2>
In-Reply-To: <aU24p7he1T63Qeke@pavilion.home>

Hi Frederic,

On Thu, Dec 25, 2025 at 11:20:23PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 25, 2025 at 02:44:50AM -0500, Joel Fernandes wrote:
> > The WakeOvfIsDeferred code path in __call_rcu_nocb_wake() attempts to
> > wake rcuog when the callback count exceeds qhimark and callbacks aren't
> > done with their GP (newly queued or awaiting GP). However, a lot of
> > testing proves this wake is always redundant or useless.
> >
> > In the flooding case, rcuog is always waiting for a GP to finish. So
> > waking up the rcuog thread is pointless. The timer wakeup adds overhead,
> > rcuog simply wakes up and goes back to sleep achieving nothing.
> >
> > This path also adds a full memory barrier, and additional timer expiry
> > modifications unnecessarily.
> >
> > The root cause is that WakeOvfIsDeferred fires when
> > !rcu_segcblist_ready_cbs() (GP not complete), but waking rcuog cannot
> > accelerate GP completion.
> >
> > This commit therefore removes this path, while also adding some rdp
> > counters to ensure we don't have lost wake ups.
>
> There should be two patches: one that removes the useless path and the
> other that adds the debugging.

Sure, I will split this into two patches.

> > Tested with rcutorture scenarios: TREE01, TREE05, TREE08 (all NOCB
> > configurations) - all pass. Also stress tested using a kernel module
> > that floods call_rcu() to trigger the overload conditions and made the
> > observations confirming the findings.
> >
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> Cool! Just a few comments:
>
> > @@ -549,24 +546,26 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> > lazy_len = READ_ONCE(rdp->lazy_len);
> > if (was_alldone) {
> > rdp->qlen_last_fqs_check = len;
> > + rdp->nocb_gp_wake_attempt = true;
> > + rcu_nocb_unlock(rdp);
> > // Only lazy CBs in bypass list
> > if (lazy_len && bypass_len == lazy_len) {
> > - rcu_nocb_unlock(rdp);
> > wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
> > TPS("WakeLazy"));
> > } else if (!irqs_disabled_flags(flags)) {
> > /* ... if queue was empty ... */
> > - rcu_nocb_unlock(rdp);
> > wake_nocb_gp(rdp, false);
> > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > TPS("WakeEmpty"));
> > } else {
> > - rcu_nocb_unlock(rdp);
> > wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
> > TPS("WakeEmptyIsDeferred"));
> > }
> > +
> > + return;
> > } else if (len > rdp->qlen_last_fqs_check + qhimark) {
> > - /* ... or if many callbacks queued. */
> > + /* Callback overload condition. */
> > + WARN_ON_ONCE(!rdp->nocb_gp_wake_attempt && !rdp->nocb_gp_serving);
>
> With this test, the point of ->nocb_gp_serving is unclear given that both
> states are cleared in the same place but ->nocb_gp_serving is set later by
> the gp kthread. ->nocb_gp_serving implies ->nocb_gp_wake_attempt so the above
> test is the same as WARN_ON_ONCE(!rdp->nocb_gp_wake_attempt).
>
> In fact ->nocb_gp_wake_attempt alone probably makes sense?

Ah, true. I got a bit paranoid about false-positive warnings, hence the extra
variable. However, on further analysis I realized that the nocb lock already
prevents potential false positives. So yes, I will just use the single
variable. Thanks.
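
As an aside, the equivalence pointed out above can be checked with a small
userspace sketch. This is not kernel code: struct nocb_flags and the helper
names are hypothetical stand-ins for ->nocb_gp_wake_attempt and
->nocb_gp_serving, and the only assumption is the invariant stated in the
review, namely that ->nocb_gp_serving implies ->nocb_gp_wake_attempt.

```c
#include <stdbool.h>

/* Hypothetical model of the two debug flags; not the kernel's rcu_data. */
struct nocb_flags {
	bool wake_attempt;	/* models ->nocb_gp_wake_attempt */
	bool serving;		/* models ->nocb_gp_serving */
};

/* Assumed invariant from the review: serving implies wake_attempt. */
static bool state_is_reachable(struct nocb_flags f)
{
	return !f.serving || f.wake_attempt;
}

/* Condition as written in the RFC: warn only if both flags are clear. */
static bool warn_two_flags(struct nocb_flags f)
{
	return !f.wake_attempt && !f.serving;
}

/* Simplified condition suggested in the review. */
static bool warn_one_flag(struct nocb_flags f)
{
	return !f.wake_attempt;
}

/* Count reachable states on which the two conditions disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int a = 0; a <= 1; a++) {
		for (int s = 0; s <= 1; s++) {
			struct nocb_flags f = { .wake_attempt = a, .serving = s };

			if (!state_is_reachable(f))
				continue;	/* excluded by the invariant */
			if (warn_two_flags(f) != warn_one_flag(f))
				mismatches++;
		}
	}
	return mismatches;
}
```

Calling count_mismatches() from a test main() is expected to return 0: the
only state where the two conditions differ is serving set with wake_attempt
clear, which the invariant rules out.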
>
> > rdp->qlen_last_fqs_check = len;
> > j = jiffies;
> > if (j != rdp->nocb_gp_adv_time &&
> > @@ -575,21 +574,10 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> > rcu_advance_cbs_nowake(rdp->mynode, rdp);
> > rdp->nocb_gp_adv_time = j;
> > }
> > - smp_mb(); /* Enqueue before timer_pending(). */
>
> You need to remove the pairing smp_mb__after_spin_lock() in
> do_nocb_deferred_wakeup_timer().

Ah, will do. Thanks!

 - Joel