From: Joel Fernandes <joelagnelf@nvidia.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: paulmck@kernel.org, Boqun Feng <boqun@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
frederic@kernel.org, neeraj.iitr10@gmail.com, urezki@gmail.com,
boqun.feng@gmail.com, rcu@vger.kernel.org
Subject: Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT
Date: Wed, 18 Mar 2026 16:25:09 -0400
Message-ID: <7bbc5c57-2a69-43c8-bdc7-806b05c1a60e@nvidia.com>
In-Reply-To: <CAP01T77waxWVZ7Ftscn9PZtUb=MaPZYtYJz36Xhs+2fe3pqP9w@mail.gmail.com>

On 3/18/2026 4:11 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 18 Mar 2026 at 21:04, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>
>> On 3/18/2026 2:42 PM, Paul E. McKenney wrote:
>>> On Wed, Mar 18, 2026 at 08:51:16AM -0700, Boqun Feng wrote:
>>>> On Wed, Mar 18, 2026 at 03:43:05PM +0100, Sebastian Andrzej Siewior wrote:
>>>> [..]
>>>>>>>> way that vanilla RCU's call_rcu_core() function takes an early exit if
>>>>>>>> interrupts are disabled. Of course, vanilla RCU can rely on things like
>>>>>>>> the scheduling-clock interrupt to start any needed grace periods [1],
>>>>>>>> but SRCU will instead need to manually defer this work, perhaps using
>>>>>>>> workqueues or IRQ work.
>>>>>>>>
>>>>>>>> In addition, rcutorture needs to be upgraded to sometimes invoke
>>>>>>>> ->call() with the scheduler pi lock held, but this change is not fixing
>>>>>>>> a regression, so could be deferred. (There is already code in rcutorture
>>>>>>>> that invokes the readers while holding a scheduler pi lock.)
>>>>>>>>
>>>>>>>> Given that RCU for this week through the end of March belongs to you guys,
>>>>>>>> if one of you can get this done by end of day Thursday, London time,
>>>>>>>> very good! Otherwise, I can put something together.
>>>>>>>>
>>>>>>>> Please let me know!
>>>>>>>
>>>>>>> Given that the current locking does allow it and lockdep should have
>>>>>>> complained, I am curious if we could rule that out ;)
>>>>>
>>>>> Your patch just s/spinlock_t/raw_spinlock_t so we get the locking/
>>>>> nesting right. The wakeup problem remains, right?
>>>>> But looking at the code, there is just srcu_funnel_gp_start(). If its
>>>>> srcu_schedule_cbs_sdp() / queue_delayed_work() usage is always delayed
>>>>> then there will be always a timer and never a direct wake up of the
>>>>> worker. Wouldn't that work?
>>>>
>>>> Late to the party, so just make sure I understand the problem. The
>>>> problem is the wakeup in call_srcu() when it's called with scheduler
>>>> lock held, right? If so I think the current code works as what you
>>>> already explain, we defer the wakeup into a workqueue.
>>>
>>> The issue is that call_rcu_tasks() (which is call_srcu() now) is
>>> also invoked with a scheduler pi/rq lock held, which results in a
>>> deadlock cycle. So the srcu_gp_start_if_needed() function's call to
>>> raw_spin_lock_irqsave_sdp_contention() must be deferred to the workqueue
>>> handler, not just the wake-up. And that in turn means that the callback
>>> point also needs to be passed to this handler.
>>>
>>> See this email thread:
>>>
>>> https://lore.kernel.org/all/CAP01T75eKpvw+95NqNWg9P-1+kzVzojpN0NLat+28SF1B9wQQQ@mail.gmail.com/
>>>
>>>> (but Paul, we are not talking about calling call_srcu(), that requires
>>>> some more work to get it work)
>>>
>>> Agreed, splitting srcu_gp_start_if_needed() and using a workqueue if
>>> interrupts were already disabled on entry. Otherwise, directly invoking
>>> the split-out portion of srcu_gp_start_if_needed().
>>>
>>> But we might be talking past each other.
>>>
>>
>> Ah, so it is an ABBA deadlock, not an ABA self-deadlock. I guess this is a
>> different issue from the NMI issue? It is more of an issue of calling
>> the call_srcu() API with scheduler locks held.
>>
>> Something like below I think:
>>
>> CPU A (BPF tracepoint)                   CPU B (concurrent call_srcu)
>> ----------------------------             ------------------------------------
>> [1] holds &rq->__lock
>>                                          [2]
>>                                          -> call_srcu
>>                                          -> srcu_gp_start_if_needed
>>                                          -> srcu_funnel_gp_start
>>                                          -> spin_lock_irqsave_ssp_content...
>>                                          -> holds srcu locks
>>
>> [4] calls call_rcu_tasks_trace()         [5] srcu_funnel_gp_start (cont..)
>>                                              -> queue_delayed_work
>>  -> call_srcu()                              -> __queue_work()
>>  -> srcu_gp_start_if_needed()                -> wake_up_worker()
>>  -> srcu_funnel_gp_start()                   -> try_to_wake_up()
>>  -> spin_lock_irqsave_ssp_contention()       [6] WANTS rq->__lock
>>  -> WANTS srcu locks
>>
>> If I understand this, this looks like an issue that can happen independent
>> of the conversion of the spin locks.
>>
>
> Yes, this is a separate issue, we should make the conversion to raw
> spin locks anyway, but lockdep found this once we applied that fix
> from Paul.
> In sched-ext, we can end up calling call_srcu() while rq->lock is
> held, e.g. from exit_task() -> some bpf map that deletes an element ->
> call_srcu().
> There are other callbacks of course where it can be held, and other
> programs that can run tracing the kernel while it is held.
>
Thanks. I am also wondering why lockdep didn't find this without the
conversion to raw spin locks. An ABBA deadlock should have been detected
either way. Is there some difference in lockdep's ability to find
deadlocks depending on whether a spinlock is raw?
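
To make "should have been detected either way" concrete, here is a toy
user-space model of lock-order tracking. It is only a sketch of the general
idea (record "B taken while A held" edges, complain when a new edge would
close a cycle). It is not lockdep's actual implementation, and struct
order_graph, record_acquire() and the LOCK_* names are made up for
illustration. It does not answer the raw-vs-non-raw question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: these are NOT lockdep's real data structures. */
enum { LOCK_RQ, LOCK_SRCU, NR_LOCKS };

struct order_graph {
	/* edge[a][b] is true if lock b was acquired while lock a was held. */
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquired 'taken' while holding 'held'".
 * Returns false, without recording the edge, if the new dependency would
 * close a cycle, i.e. a potential ABBA (or AA) deadlock.
 */
static bool record_acquire(struct order_graph *g, int held, int taken)
{
	bool seen[NR_LOCKS] = { false };

	assert(held >= 0 && held < NR_LOCKS);
	assert(taken >= 0 && taken < NR_LOCKS);

	if (reachable(g, taken, held, seen))
		return false;
	g->edge[held][taken] = true;
	return true;
}
```

In terms of the diagram in the quoted text: CPU B records an srcu -> rq
dependency (steps [2] to [6]), so when CPU A later tries to take the srcu
locks while holding rq->__lock, record_acquire(&g, LOCK_RQ, LOCK_SRCU)
returns false in this model.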
Anyway, I am applying the raw lock conversion fix and running some more tests.
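
Separately from the raw lock conversion, the split Paul describes in the
quoted text (defer the lock-taking part of srcu_gp_start_if_needed() when
interrupts were already disabled on entry, otherwise invoke it directly)
can be pictured with the user-space sketch below. Everything in it is
hypothetical: struct toy_srcu, the toy_*() helpers and the "cookie" field
are stand-ins, and the real change would hand the deferred part to a
workqueue or irq_work rather than a plain function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SRCU state; not the kernel's srcu_struct. */
struct toy_srcu {
	unsigned long gp_started_for;	/* newest grace-period request started */
	bool deferred_pending;		/* a request is waiting for the worker */
	unsigned long deferred_cookie;	/* newest request stashed for the worker */
};

/* Split-out portion: models the part that would take the SRCU locks. */
static void toy_gp_start_locked(struct toy_srcu *sp, unsigned long cookie)
{
	assert(sp != NULL);
	if (cookie > sp->gp_started_for)
		sp->gp_started_for = cookie;
}

/*
 * Entry point: if the caller already had interrupts disabled (for example
 * because it holds a scheduler lock), only stash the request. Otherwise
 * run the split-out portion directly.
 */
static void toy_gp_start_if_needed(struct toy_srcu *sp, unsigned long cookie,
				   bool irqs_disabled_on_entry)
{
	assert(sp != NULL);
	if (irqs_disabled_on_entry) {
		/* Keep only the newest pending request. */
		if (!sp->deferred_pending || cookie > sp->deferred_cookie)
			sp->deferred_cookie = cookie;
		sp->deferred_pending = true;
		return;
	}
	toy_gp_start_locked(sp, cookie);
}

/* Models the deferred handler, which runs later in a safe context. */
static void toy_deferred_worker(struct toy_srcu *sp)
{
	assert(sp != NULL);
	if (!sp->deferred_pending)
		return;
	sp->deferred_pending = false;
	toy_gp_start_locked(sp, sp->deferred_cookie);
}
```

The point of the split is that the caller holding rq->__lock never reaches
toy_gp_start_locked(), so in this model the srcu locks are never requested
while a scheduler lock is held.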
thanks,
--
Joel Fernandes