From: Joel Fernandes <joelagnelf@nvidia.com>
To: Boqun Feng <boqun@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: paulmck@kernel.org, frederic@kernel.org, neeraj.iitr10@gmail.com,
urezki@gmail.com, boqun.feng@gmail.com, rcu@vger.kernel.org,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Tejun Heo <tj@kernel.org>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Andrea Righi <arighi@nvidia.com>
Subject: Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT
Date: Thu, 19 Mar 2026 14:42:56 -0400
Message-ID: <c520958e-78e1-41ce-b675-4a560305a206@nvidia.com>
In-Reply-To: <abw2ECYirX1tTwV9@tardis.local>
On 3/19/2026 1:44 PM, Boqun Feng wrote:
> On Thu, Mar 19, 2026 at 06:02:44PM +0100, Sebastian Andrzej Siewior wrote:
>> On 2026-03-19 09:48:16 [-0700], Boqun Feng wrote:
>>> I agree it's not RCU's fault ;-)
>>
>> I never claimed it is anyone's fault. I just see that BPF should be able
>> to do things which kgdb would not be allowed to.
>>
>>> I guess it'll be difficult to restrict BPF, however maybe BPF can call
>>> call_srcu() in irq_work instead? Or a more systematic defer mechanism
>>> that allows BPF to defer any lock holding functions to a different
>>> context. (We have a similar issue that BPF cannot call kfree_rcu() in
>>> some cases IIRC).
>>>
>>> But we need to fix this in v7.0, so this short-term fix is still needed.
>>
>> I would prefer something substantial before we rush to get a quick fix
>> and move on.
>>
>
> The quick fix here is really "restore the previous behavior of
> call_rcu_tasks_trace() in call_srcu()", and the future work will

Unfortunately, reverting c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in
terms of SRCU-fast") is tricky since the original body of the RCU Tasks Trace
code has been deleted. Perhaps we should have added an easier escape hatch,
lesson learnt :)

> naturally happen: if the extra irq_work layer turns out to cause issues
> for other SRCU users, then we need to fix them as well. Otherwise, there
> is no real need to avoid the extra irq_work hop. So I *think* it's OK
> ;-)
>
> Cleaning up all the ad-hoc irq_work usages in BPF is another thing,
> which can happen if we learn about all the cases and have a good design.
>
>> If we could get that irq_work() part only for BPF where it is required
>> then it would be already a step forward.
>>
>
> I'm happy to include that (i.e. using Qiang's suggestion) if Joel also
> agrees.

Sure, I am OK with this sort of short-term fix, but I worry that it still does
not fix the issues due to the tasks-trace conversion. In particular, it doesn't
fix the issue Andrea reported AFAICS, because there is a dependency on
pool->lock? See:
https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/

That happens precisely because of the queue_delayed_work() happening from the
SRCU tasks-trace specific BPF, right?

This looks something like this, due to combination of SRCU, scheduler and WQ:

  srcu_usage.lock -> pool->lock -> pi_lock -> rq->__lock
        ^                                         |
        |                                         |
        +------------ DEADLOCK CYCLE -------------+

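To make the cycle above concrete, here is a toy userspace model of a lock-order graph, loosely in the spirit of what lockdep tracks. It is only a sketch, not kernel code: the enum names, `held_before[][]`, `reachable()` and `add_lock_dependency()` are all hypothetical, and the final rq->__lock -> srcu_usage.lock edge assumes BPF invoking call_srcu() from a tracepoint under the rq lock, as discussed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock classes, named after the chain in the diagram. */
enum lock_class {
	SRCU_USAGE_LOCK,
	POOL_LOCK,
	PI_LOCK,
	RQ_LOCK,
	NR_LOCK_CLASSES
};

/* held_before[a][b] is true once b has been acquired while a was held. */
static bool held_before[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++) {
		if (held_before[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "inner acquired while outer is held". Returns false, without
 * recording the edge, if the new edge would close a cycle.
 */
static bool add_lock_dependency(enum lock_class outer, enum lock_class inner)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	if (reachable(inner, outer, seen))
		return false;
	held_before[outer][inner] = true;
	return true;
}
```

Feeding it the three forward edges (srcu_usage.lock -> pool->lock, pool->lock -> pi_lock, pi_lock -> rq->__lock) succeeds, and the attempt to add rq->__lock -> srcu_usage.lock is the one that gets rejected, which is the edge that closes the cycle drawn above.
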
>> Long term it would be nice if we could avoid calling this while locks
>> are held. I think call_rcu() can't be used under rq/pi lock, but timers
>> should be fine.
>>
>> Is this rq/pi locking originating from "regular" BPF code or sched_ext?
>>
>
> I think if you have any tracepoint (including traceable functions) under
> rq/pi locking, then potentially BPF can call call_srcu() there.
>
> The root cause of the issues is that BPF is actually like an NMI unless
> the code is noinstr (there is a rabbit hole about BPF calling
> call_srcu() while it's instrumenting call_srcu() itself). And the right
> way to solve all the issues is to have a general defer mechanism for
> BPF.

Will that really solve the above-mentioned issue that Andrea reported, though?
+Andrea, +Steve as well.
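
For reference, the "general defer mechanism" idea from the quoted text can be pictured with a toy, single-threaded userspace model. This is only a sketch under stated assumptions: `struct deferred_call`, `defer_call()`, `run_deferred_calls()` and `count_call()` are hypothetical names, not kernel APIs. A real kernel version would need an atomic list (such as llist) and something like irq_work to trigger the drain, and the model says nothing about whether the pool->lock dependency goes away.

```c
#include <stddef.h>

/* Hypothetical record for one deferred call; not a kernel structure. */
struct deferred_call {
	void (*func)(void *arg);
	void *arg;
	struct deferred_call *next;
};

/* Pending list head; single-threaded toy, so no atomics here. */
static struct deferred_call *pending_head;

/*
 * Restricted context (think: BPF under rq/pi lock): only link the
 * record into the pending list, never call anything that takes locks.
 */
static void defer_call(struct deferred_call *dc,
		       void (*func)(void *arg), void *arg)
{
	dc->func = func;
	dc->arg = arg;
	dc->next = pending_head;
	pending_head = dc;
}

/*
 * Safe context: detach the whole list, then run each queued function.
 * Returns the number of calls that were run.
 */
static int run_deferred_calls(void)
{
	struct deferred_call *dc = pending_head;
	int ran = 0;

	pending_head = NULL;
	while (dc) {
		struct deferred_call *next = dc->next;

		dc->func(dc->arg);
		dc = next;
		ran++;
	}
	return ran;
}

/* Example callback: bumps the int counter that 'arg' points to. */
static void count_call(void *arg)
{
	(*(int *)arg)++;
}
```

The point of the split is that the lock-taking work (here, whatever `func` does) only ever runs from `run_deferred_calls()`, in a context chosen by the caller, rather than from wherever the request was made.
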
thanks,
--
Joel Fernandes