From: Joel Fernandes <joelagnelf@nvidia.com>
To: paulmck@kernel.org
Cc: linux-kernel@vger.kernel.org, Boqun Feng <boqun.feng@gmail.com>,
rcu@vger.kernel.org, Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>
Subject: Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling
Date: Thu, 22 Jan 2026 20:29:41 -0500
Message-ID: <013102c7-a1c8-4f13-8dd9-5803e4d69aaf@nvidia.com>
In-Reply-To: <EBEF016B-721C-4A54-98E3-4B8BE6AA4C21@nvidia.com>
On Jan 22, 2026, at 7:18 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> On Thu, Jan 22, 2026 at 06:43:31PM -0500, Joel Fernandes wrote:
>> On Thu, Jan 22, 2026 at 01:55:11PM -0800, Paul E. McKenney wrote:
>>> On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
>>>> - } else if (len > rdp->qlen_last_fqs_check + qhimark) {
>>>> - /* ... or if many callbacks queued. */
>>>> - rdp->qlen_last_fqs_check = len;
>>>> - j = jiffies;
>>>> - if (j != rdp->nocb_gp_adv_time &&
>>>> - rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
>>> This places in cur_gp_seq not the grace period for the current callback
>>> (which would be unlikely to have finished), but rather the grace period
>>> for the oldest callback that has not yet been marked as done. And that
>>> callback started some time ago, and thus might well have finished.
>>> So while this code might not have been executed in your tests, it is
>>> definitely not a logical contradiction.
>>> Or am I missing something subtle here?
>>
>> You're right that it's not a logical contradiction - I was imprecise.
>> rcu_segcblist_nextgp() returns the GP for the oldest pending callback,
>> which could indeed have completed.
>>
>> However, the question becomes: under what scenario do we need to advance
>> here? If that GP completed, rcuog should have already advanced those
>> callbacks. The only way this code path can execute is if rcuog is starved
>> and not running to advance them, right?
>
> That is one way. The other way is if the RCU grace-period gets delayed
> (perhaps by vCPU preemption) between the time that it updates the
> leaf rcu_node structure's ->gp_seq field and the time that it invokes
> rcu_nocb_gp_cleanup().
I see the window you're describing. In rcu_gp_cleanup(), for each leaf node:
WRITE_ONCE(rnp->gp_seq, new_gp_seq); // GP appears complete
...
raw_spin_unlock_irq_rcu_node(rnp);
/* vCPU preemption */
rcu_nocb_gp_cleanup(sq); // wakes rcuog
So yes, in this window, the call_rcu() CPU could see the updated gp_seq
and have rcu_seq_done() return true for the now-completed GP.
However, even in this window, advancing callbacks doesn't help:
1. We advance callbacks from WAIT to DONE state
2. But rcuog is still sleeping, waiting for GP kthread to wake it
3. rcuoc is still sleeping, waiting for rcuog to wake it
4. Callbacks sit in DONE state but nobody invokes them
So the critical path is unchanged:
swake_up_all() -> rcuog -> rcuoc -> invoke.
I guess this is the redundancy argument - the window exists, but
exploiting it provides no meaningful benefit AFAICS.
>
>> But as Frederic pointed out, even if rcuog is starved, advancing here
>> doesn't help - rcuog must still run anyway to wake the callback thread.
>> We're just duplicating work it will do when it finally gets to run.
>
> So maybe we don't want that first patch after all? ;-)
Do you mean we want the first patch so that it can remove the code that
we don't want?
>
>> The extensive testing (300K callback floods, hours of rcutorture) showing
>> zero hits confirms this window is practically unreachable. I can update the
>> commit message to remove the "logical contradiction" claim and focus on the
>> redundancy argument instead.
>
> That would definitely be good!
Thanks. I will focus on this argument, then. I will resend with a better
patch description in the morning.
>
>> Would that address your concern?
>
> Your point about the rcuoc kthread needing to be awakened is a good one.
> I am still concerned about flooding on busy systems, especially if the
> busy component is an underlying hypervisor, but we might need a more
> principled approach for that situation.
Hmm, true. There is also the case where any of the kthreads on the
callback's critical path could be preempted by the hypervisor, which
would be just as problematic, to your point about needing a more
principled approach. I guess we did not want the reader-side
vCPU-preemption workarounds either, for similar reasons.
One trick I found, irrespective of virtualization, is that rcu_nocb_poll
can make grace periods complete faster. I think this could help overload
situations by retiring callbacks sooner rather than later. I can
experiment with this idea in the future; I was considering a dynamic
trigger that enables polling mode under overload. I guess there is only
one way to find out how well this will work, but initial testing does
look promising. :-D
--
Joel Fernandes