From: Boqun Feng <boqun@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Joel Fernandes <joelagnelf@nvidia.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
frederic@kernel.org, neeraj.iitr10@gmail.com, urezki@gmail.com,
boqun.feng@gmail.com, rcu@vger.kernel.org,
Tejun Heo <tj@kernel.org>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>
Subject: Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT
Date: Fri, 20 Mar 2026 09:57:21 -0700
Message-ID: <ab18cUI_4jKA8wcA@tardis.local>
In-Reply-To: <2b3848e9-3b11-41b8-8c44-5de28d4a4433@paulmck-laptop>
On Fri, Mar 20, 2026 at 09:24:15AM -0700, Paul E. McKenney wrote:
[...]
> > > > In an alternative universe, BPF has a defer mechanism, and BPF core
> > > > would just call (for example):
> > > >
> > > > bpf_defer(call_srcu, ...); // <- a lockless defer
> > > >
> > > > so the issue won't happen.
> > >
> > > In theory, this is quite true.
> > >
> > > In practice, unfortunately for keeping this part of RCU as simple as
> > > we might wish, when a BPF program gets attached to some function in
> > > the kernel, it does not know whether or not that function holds a given
> > > scheduler lock. For example, there are any number of utility functions
> > > that can be (and are) called both with and without those scheduler
> > > locks held. Worse yet, it might be attached to a function that is
> > > *never* invoked with a scheduler lock held -- until some out-of-tree
> > > module is loaded. Which means that this module might well be loaded
> > > after BPF has JIT-ed the BPF program.
> > >
> >
> > Hmm.. maybe I failed to make myself clear. I was suggesting we
> > treat BPF as a special context where you cannot do everything: if a
> > call_srcu() is needed, switch it to bpf_defer(). That should give the
> > same result as either 1) call_srcu() deferring itself locklessly or 2) a
> > call_srcu_lockless().
> >
> > Certainly we can make call_srcu() defer locklessly, but if it's only for
> > BPF, that looks like a whack-a-mole approach to me. Say later on we want
> > to use call_hazptr() in BPF for some reason (here's hoping!); then we
> > need to make it defer locklessly as well. Now we have two copies of the
> > lockless logic, in call_srcu() and call_hazptr(), and if there is a third
> > one, we need to do that as well. So where does it end?
>
> Except that by the same line of reasoning, how do the BPF guys figure out
> exactly which function calls they need to defer and under what conditions
> they need to defer them? Keeping in mind that the list of functions and
> corresponding conditions is subject to change as the kernel continues
> to change.

Can't they just defer anything that is deferrable? If it's deferrable,
then what's the actual cost for BPF to defer it?
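To make "defer anything deferrable" concrete, here is a minimal userspace
sketch, assuming C11 atomics stand in for the kernel's llist and
irq_work. struct defer_req, defer_push(), defer_drain() and
add_to_total() are hypothetical names for illustration, not an existing
BPF API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical deferred request: a callback plus its argument. */
struct defer_req {
	struct defer_req *next;
	void (*func)(void *arg);
	void *arg;
};

/* Lock-free list head; the kernel would use a per-CPU llist instead. */
static _Atomic(struct defer_req *) defer_head = NULL;

/*
 * Enqueue without taking any lock, so the caller may hold arbitrary
 * locks. Returns 1 if the list was empty, meaning the caller should
 * kick the drain (roughly what llist_add() + irq_work_queue() do).
 */
static int defer_push(struct defer_req *req)
{
	struct defer_req *old = atomic_load(&defer_head);

	assert(req->func != NULL);
	do {
		req->next = old;
	} while (!atomic_compare_exchange_weak(&defer_head, &old, req));
	return old == NULL;
}

/*
 * Run later from a context known to be safe: detach the whole list
 * with one atomic exchange, then invoke each callback (newest first).
 * Returns the number of callbacks run.
 */
static int defer_drain(void)
{
	struct defer_req *req = atomic_exchange(&defer_head, NULL);
	int n = 0;

	while (req) {
		struct defer_req *next = req->next; /* func() may free req */

		req->func(req->arg);
		req = next;
		n++;
	}
	return n;
}

/* Example callback: adds *arg to a running total. */
static int total;

static void add_to_total(void *arg)
{
	total += *(int *)arg;
}
```

A hook running under unknown locks would only ever call defer_push(); in
this model the cost is one compare-and-swap per request plus one kick
when the list goes from empty to non-empty.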
>
> > The lockless defer request comes from BPF being special; a proper way to
> > deal with it, IMO, would be for BPF to have a general defer mechanism.
> > Whether call_srcu() or call_srcu_lockless() can defer locklessly is
> > orthogonal.
>
> Fair point, and for the general defer mechanism, I hereby nominate
> the irq_work_queue() function. We can use this both for RCU and
> for hazard pointers. The code to make a call_srcu_lockless() and
> call_hazptr_lockless() that includes the relevant checks and that does
> the deferral will not be large, complex, or slow. Especially assuming
> that we consolidate common checks.
>
> > BTW, an example to my point, I think we have a deadlock even with the
> > old call_rcu_tasks_trace(), because at:
> >
> > https://elixir.bootlin.com/linux/v6.19.8/source/kernel/rcu/tasks.h#L384
> >
> > We do a:
> >
> > mod_timer(&rtpcp->lazy_timer, rcu_tasks_lazy_time(rtp));
> >
> > which means call_rcu_tasks_trace() may acquire the timer base lock, and
> > that means if BPF were to trace a point where the timer base lock is held,
> > we may have a deadlock. So now I wonder whether you had any magic to avoid
> > the deadlock pre-7.0 or we are just lucky ;-)
>
> Test it and see! ;-)
>
"Program testing can be used to show the presence of bugs, but never to
show their absence!" ;-)
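
The scenario only bites when a hook happens to run with the timer base
lock already held, which is exactly why testing may never hit it. A
minimal userspace model, assuming a POSIX error-checking mutex stands in
for the timer base lock and queue_cb_locking() stands in for
call_rcu_tasks_trace() reaching mod_timer(); all function names here are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stands in for the timer base lock. ERRORCHECK makes a self-deadlock
 * return EDEADLK instead of hanging, so the model can report it. */
static pthread_mutex_t base_lock;

static void init_base_lock(void)
{
	pthread_mutexattr_t attr;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	ret = pthread_mutex_init(&base_lock, &attr);
	assert(ret == 0);
	(void)ret;
	pthread_mutexattr_destroy(&attr);
}

/* Hypothetical primitive that, like call_rcu_tasks_trace() reaching
 * mod_timer(), takes base_lock internally. */
static int queue_cb_locking(void)
{
	int ret = pthread_mutex_lock(&base_lock);

	if (ret)
		return ret;	/* EDEADLK: this thread already holds it */
	/* ... arm the timer ... */
	pthread_mutex_unlock(&base_lock);
	return 0;
}

/* Hypothetical lockless variant: only records the request; a later
 * handler would arm the timer outside of base_lock. */
static int pending;

static int queue_cb_deferred(void)
{
	pending++;
	return 0;
}

/* A traced function that holds base_lock while a BPF-like hook runs. */
static int traced_function(int (*hook)(void))
{
	int ret;

	pthread_mutex_lock(&base_lock);
	ret = hook();
	pthread_mutex_unlock(&base_lock);
	return ret;
}
```

With a normal mutex, or a spinlock in the kernel, the locking case would
simply hang instead of returning EDEADLK (lockdep would likely flag it);
the deferred variant never touches base_lock from the hook.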
> > See, without a general defer mechanism, we will have a lot of fun
> > auditing all the primitives that BPF may use.
>
> No, *we* only audit the primitives in our subsystem that BPF actually
> uses when BPF starts using them. We let the *other* subsystems worry
> about *their* interactions with BPF.
>
As an RCU maintainer: fine.
As a LOCKING maintainer: I shake my head, because for every primitive that
BPF uses, there could now be a normal version and a _bpf()/_lockless()
version. That could create more maintenance issues, but only time will
tell.

> > > So we really do need to make some variant of call_srcu() that deals
> > > with this.
> > >
> > > We do have some options. First, we could make call_srcu() deal with it
> > > directly, or second, we could create something like call_srcu_lockless()
> > > or call_srcu_nolock() or whatever that can safely be invoked from any
> > > context, including NMI handlers, and that invokes call_srcu() directly
> > > when it determines that it is safe to do so. The advantage of the second
> > > approach is that it avoids incurring the overhead of checking in the
> > > common case.
> >
> > Within the RCU scope, I prefer the second option.
>
> Works for me!
>
> Would you guys like to implement this, or would you prefer that I do so?
>
I don't think I'll have cycles for it soon; I have a big backlog (including
making preempt_count 64-bit on 64-bit x86). But I will send the fix for the
current call_srcu() for v7.0 and work with Joel to get it into Linus' tree.
And I will definitely review call_srcu_lockless() if you beat me to it ;-)

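For reference, a minimal userspace model of the second option, assuming
a thread-local counter stands in for whatever consolidated checks the
real helper ends up using, and a C11 lock-free list plus a counter stand
in for irq_work_queue(). call_srcu_lockless(), model_call_srcu(),
unsafe_depth and deferred_handler() are hypothetical names; this is not
the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal callback head; the real one is struct rcu_head. */
struct cb_head {
	struct cb_head *next;
};

/*
 * Hypothetical stand-in for the consolidated "is it safe to take
 * locks here?" checks; nonzero means the caller may hold locks that
 * call_srcu() needs.
 */
static _Thread_local int unsafe_depth;

static int queued;				/* callbacks handed to model_call_srcu() */
static _Atomic(struct cb_head *) deferred;	/* lockless pending list */
static atomic_int kicks;			/* models irq_work_queue() calls */

/* Stand-in for call_srcu(): it may take locks, so call only when safe. */
static void model_call_srcu(struct cb_head *head)
{
	(void)head;
	queued++;
}

/* Hypothetical call_srcu_lockless(): direct call if safe, else defer. */
static void call_srcu_lockless(struct cb_head *head)
{
	struct cb_head *old;

	if (!unsafe_depth) {
		model_call_srcu(head);	/* common case: only the check */
		return;
	}
	old = atomic_load(&deferred);
	do {
		head->next = old;
	} while (!atomic_compare_exchange_weak(&deferred, &old, head));
	if (!old)
		atomic_fetch_add(&kicks, 1);	/* list was empty: "queue the irq_work" */
}

/* Plays the irq_work handler, which runs later in a safe context. */
static void deferred_handler(void)
{
	struct cb_head *head = atomic_exchange(&deferred, NULL);

	assert(!unsafe_depth);
	while (head) {
		struct cb_head *next = head->next;

		model_call_srcu(head);
		head = next;
	}
}
```

The point of the split is that callers known to be safe keep calling
call_srcu() with no extra overhead, and only the lockless variant pays
for the check and the occasional deferral.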
Regards,
Boqun
> Thanx, Paul
>
> > Regards,
> > Boqun
> >
[..]