From: Boqun Feng <boqun@kernel.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	paulmck@kernel.org, frederic@kernel.org, neeraj.iitr10@gmail.com,
	urezki@gmail.com, boqun.feng@gmail.com, rcu@vger.kernel.org,
	Tejun Heo <tj@kernel.org>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>
Subject: Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT
Date: Thu, 19 Mar 2026 13:14:08 -0700
Message-ID: <abxZEE4SAMkNEleq@tardis.local>
In-Reply-To: <CAP01T75S-NMgB=s_0jqq52xRwe1cQ29DzD33eeqUZHM3cSb=oA@mail.gmail.com>
On Thu, Mar 19, 2026 at 07:41:06PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, 19 Mar 2026 at 18:27, Boqun Feng <boqun@kernel.org> wrote:
> >
> > On Thu, Mar 19, 2026 at 05:59:40PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, 19 Mar 2026 at 17:48, Boqun Feng <boqun@kernel.org> wrote:
> > > >
> > > > On Thu, Mar 19, 2026 at 05:33:50PM +0100, Sebastian Andrzej Siewior wrote:
> > > > > On 2026-03-19 09:27:59 [-0700], Boqun Feng wrote:
> > > > > > On Thu, Mar 19, 2026 at 10:03:15AM +0100, Sebastian Andrzej Siewior wrote:
> > > > > > > Please just use the queue_delayed_work() with a delay >0.
> > > > > > >
> > > > > >
> > > > > > That doesn't work since queue_delayed_work() with a positive delay will
> > > > > > still acquire timer base lock, and we can have BPF instrument with timer
> > > > > > base lock held i.e. calling call_srcu() with timer base lock.
> > > > > >
> > > > > > irq_work on the other hand doesn't use any locking.
> > > > >
> > > > > Could we please restrict BPF somehow so it does roam free? It is
> > > > > absolutely awful to have irq_work() in call_srcu() just because it
> > > > > might acquire locks.
> > > > >
> > > >
> > > > I agree it's not RCU's fault ;-)
> > > >
> > > > I guess it'll be difficult to restrict BPF, however maybe BPF can call
> > > > call_srcu() in irq_work instead? Or a more systematic defer mechanism
> > > > that allows BPF to defer any lock holding functions to a different
> > > > context. (We have a similar issue that BPF cannot call kfree_rcu() in
> > > > some cases IIRC).
> > > >
> > > > But we need to fix this in v7.0, so this short-term fix is still needed.
> > > >
> > >
> > > I don't think this is an option, even longer term. We already do it
> > > when it's incorrect to invoke call_rcu() or any other API in a
> > > specific context (e.g., NMI, where we punt it using irq_work).
> > > However, the case reported in this thread is different. It was an
> > > existing user which worked fine before but got broken now. We were
> > > using call_rcu_tasks_trace() just fine in scx callbacks where rq->lock
> > > is held before, so the conversion underneath to call_srcu() should
> > > continue to remain transparent in this respect.
> > >
> >
> > I'm not sure that's a real argument here, kernel doesn't have a stable
> > internal API, which allows developers to refactor the code into a saner
> > way. There are currently multiple issues that suggest we may need a
> > defer mechanism for BPF core, and if it makes the code more easier to
> > reason about then why not? Think about it like a process that we learn
> > about all the defer patterns that BPF currently needs and wrap them in a
> > nice and maintainable way.
>
> This is all right in theory, but I don't understand how your
> theoretical deferral mechanism for BPF will help here in the case
> we're discussing, or is even appealing.
>
> How do we decide when to defer? Will we annotate all locks that can be
> held by RCU internals to be able to check if they are held (on the
> current cpu, which is non-trivial except by maintaining a held lock
> table, testing the locked bit is too conservative), and then deferring
> the call_srcu() from the caller in BPF? What if you gain new locks? It
> doesn't seem practical to me. Plus it pushes the burden of detection
> and deferral to the caller, making everything more complicated and
> error-prone.
>

My suggestion would be: defer all call_srcu()s that are in BPF core. For
new locks, I think every lock usage in BPF core should be carefully
audited, because that's very similar to NMI. It's not hard to treat BPF
as a different context and let lockdep detect lock misuse, similar to
what we do for interrupts and NMIs.

Basically, if we want to use some synchronization in BPF core:

1. If it's re-entrant safe, then go ahead and use it.

2. If it has a lock, but can be deferred, use the general BPF defer
   mechanism to defer the operation.

3. If it cannot be deferred, it has to change, or we add a new variant
   that supports either 1 or 2.

i.e. a universal solution.

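To make case 2 a bit more concrete, here is a rough single-threaded
userspace model of what "defer the operation" could look like. It is only
a sketch: bpf_defer(), bpf_defer_drain(), model_locked_op() and
model_lock_held are hypothetical names invented for illustration, none of
them exist in the kernel, and a real mechanism would need per-CPU,
NMI-safe queueing instead of a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define BPF_DEFER_MAX 8

typedef void (*bpf_defer_fn)(void *arg);

struct bpf_defer_entry {
	bpf_defer_fn fn;
	void *arg;
};

/* Single-threaded stand-in for a per-CPU, lockless deferral queue. */
static struct bpf_defer_entry bpf_defer_queue[BPF_DEFER_MAX];
static size_t bpf_defer_count;

/* Stand-in for a lock that the instrumented code may already hold. */
static bool model_lock_held;

/* Case 2 operation: it takes a lock, so it must not run re-entrantly. */
static void model_locked_op(void *arg)
{
	int *completed = arg;

	assert(!model_lock_held);	/* re-entry would self-deadlock */
	model_lock_held = true;
	(*completed)++;
	model_lock_held = false;
}

/* Called from the "BPF" context: only records the request, takes no lock. */
static bool bpf_defer(bpf_defer_fn fn, void *arg)
{
	if (bpf_defer_count == BPF_DEFER_MAX)
		return false;	/* the caller must handle a full queue */
	bpf_defer_queue[bpf_defer_count].fn = fn;
	bpf_defer_queue[bpf_defer_count].arg = arg;
	bpf_defer_count++;
	return true;
}

/* Called later from a context where the lock is known not to be held. */
static size_t bpf_defer_drain(void)
{
	size_t i, ran = bpf_defer_count;

	for (i = 0; i < ran; i++)
		bpf_defer_queue[i].fn(bpf_defer_queue[i].arg);
	bpf_defer_count = 0;
	return ran;
}
```

The point of the model is only that the caller never touches the lock:
it records the request while the lock may be held, and the operation runs
once bpf_defer_drain() is invoked from a safe context.
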
> Also, any unconditional deferral in the caller for APIs that can "hold
> locks" to avoid all this is not without its cost.
>
> The implementation of RCU knows and can stay in sync with those
> conditions for when deferral is needed, and hide all that complexity
> from the caller. The cost should definitely be paid by the caller if
> we would break the API's broad contract, e.g., by trying to invoke it
> in NMI which it is not supposed to run in yet, in that case we already
> handle things using irq_work. Anything more complicated than that is
> hard to scale. All of this may also change in the future where we
> support call_rcu_nolock() to make it work everywhere, and only defer
> when we detect reentrancy (in the same or different context).
>

The thing is, lots of the synchronization primitives existed before BPF,
and they were not designed or implemented with "BPF safe" in mind, and
they could be dragged into the BPF core code path if we begin to use them.
For example, irq_work may just "happen to work" here, or there may be a
bug that we are missing. It would be easier and clearer if we designed a
dedicated defer mechanism with BPF core in mind, and then used that for
all the deferrable operations.

Regards,
Boqun

>
>
> >
> > Regards,
> > Boqun
> >
> > > > Regars,
> > > > Boqun
> > > >
> > > > > > Regards,
> > > > > > Boqun
> > > > > >
> > > > > Sebastian