From: Joel Fernandes <joelagnelf@nvidia.com>
To: "paulmck@kernel.org" <paulmck@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Steve Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Date: Fri, 12 Dec 2025 23:54:28 +0000
Message-ID: <C9254103-18E1-480F-8009-003EB44F6F2F@nvidia.com>
In-Reply-To: <d863f1ad-477d-4e3f-a0b5-fa9f282a164a@paulmck-laptop>
> On Dec 13, 2025, at 8:10 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Dec 12, 2025 at 09:28:37AM +0000, Joel Fernandes wrote:
>>
>>
>>>> On Dec 12, 2025, at 4:50 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Fri, Dec 12, 2025 at 03:43:07AM +0000, Joel Fernandes wrote:
>>>>
>>>>
>>>>>> On Dec 12, 2025, at 9:47 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>
>>>>> On Fri, Dec 12, 2025 at 09:12:07AM +0900, Joel Fernandes wrote:
>>>>>>
>>>>>>
>>>>>>> On 12/11/2025 3:23 PM, Paul E. McKenney wrote:
>>>>>>> On Thu, Dec 11, 2025 at 08:02:15PM +0000, Joel Fernandes wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Dec 8, 2025, at 1:20 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>>>>>
>>>>>>>>> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
>>>>>>>>> to protect invocation of __DO_TRACE_CALL() means that BPF programs
>>>>>>>>> attached to tracepoints are non-preemptible. This is unhelpful in
>>>>>>>>> real-time systems, whose users apparently wish to use BPF while also
>>>>>>>>> achieving low latencies. (Who knew?)
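As background on the mechanism named above: guard(preempt_notrace)() is one of the kernel's scope-based cleanup helpers (include/linux/cleanup.h), which are built on the compiler's cleanup attribute. Preemption is disabled where the guard is declared and re-enabled automatically when the enclosing scope exits, so an attached probe (such as a BPF program) runs entirely with preemption off. The userspace model below is a simplified sketch of that behavior, not the kernel's actual macro. The names fake_preempt_count, FAKE_PREEMPT_GUARD(), fake_probe() and fake_do_trace() are hypothetical, and the sketch assumes GCC or Clang.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's preemption counter. */
static int fake_preempt_count;

static void fake_preempt_disable(void)
{
	fake_preempt_count++;
}

/* Cleanup handler: invoked automatically when the guard variable leaves scope. */
static void fake_preempt_guard_exit(int *unused)
{
	(void)unused;
	assert(fake_preempt_count > 0); /* catch unbalanced enables */
	fake_preempt_count--;
}

/*
 * Simplified model of guard(preempt_notrace)(): "disable preemption" at the
 * declaration and re-enable it when the enclosing scope is left, by any path.
 * Requires GCC or Clang for __attribute__((cleanup)).
 */
#define FAKE_PREEMPT_GUARD()                                                \
	int fake_preempt_guard_var                                          \
		__attribute__((cleanup(fake_preempt_guard_exit), unused)) = \
		(fake_preempt_disable(), 0)

/* Counter value observed by the "probe" (e.g. an attached BPF program). */
static int seen_in_probe = -1;

static void fake_probe(void)
{
	seen_in_probe = fake_preempt_count;
}

/* Hypothetical model of the tracepoint body before this patch. */
static void fake_do_trace(int cond)
{
	if (cond) {
		FAKE_PREEMPT_GUARD();
		fake_probe(); /* runs with "preemption" disabled */
	} /* fake_preempt_guard_exit() runs here */
}
```

In this model, the probe observes a nonzero counter, and the counter is back at zero once fake_do_trace() returns. That is why a probe running under this guard cannot be preempted on any kernel configuration.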
>>>>>>>>>
>>>>>>>>> One option would be to use preemptible RCU, but this introduces
>>>>>>>>> many opportunities for infinite recursion, which many consider to
>>>>>>>>> be counterproductive, especially given the relatively small stacks
>>>>>>>>> provided by the Linux kernel. These opportunities could be shut down
>>>>>>>>> by sufficiently energetic duplication of code, but this sort of thing
>>>>>>>>> is considered impolite in some circles.
>>>>>>>>>
>>>>>>>>> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
>>>>>>>>> readers than those of preemptible RCU, at least on Paul E. McKenney's
>>>>>>>>> laptop, where task_struct access is more expensive than access to per-CPU
>>>>>>>>> variables. And SRCU-fast provides way faster readers than does SRCU,
>>>>>>>>> courtesy of being able to avoid the read-side use of smp_mb(). Also,
>>>>>>>>> it is quite straightforward to create srcu_read_{,un}lock_fast_notrace()
>>>>>>>>> functions.
>>>>>>>>>
>>>>>>>>> While in the area, SRCU now supports early boot call_srcu(). Therefore,
>>>>>>>>> remove the checks that used to avoid such use from rcu_free_old_probes()
>>>>>>>>> before this commit was applied:
>>>>>>>>>
>>>>>>>>> e53244e2c893 ("tracepoint: Remove SRCU protection")
>>>>>>>>>
>>>>>>>>> The current commit can be thought of as an approximate revert of that
>>>>>>>>> commit, with some compensating additions of preemption disabling.
>>>>>>>>> This preemption disabling uses guard(preempt_notrace)().
>>>>>>>>>
>>>>>>>>> However, Yonghong Song points out that BPF assumes that non-sleepable
>>>>>>>>> BPF programs will remain on the same CPU, which means that migration
>>>>>>>>> must be disabled whenever preemption remains enabled. In addition,
>>>>>>>>> non-RT kernels have performance expectations that would be violated by
>>>>>>>>> allowing the BPF programs to be preempted.
>>>>>>>>>
>>>>>>>>> Therefore, continue to disable preemption in non-RT kernels, and protect
>>>>>>>>> the BPF program with both SRCU and migration disabling for RT kernels,
>>>>>>>>> and even then only if preemption is not already disabled.
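The policy in the paragraph above reduces to a small decision table. The following is a hypothetical userspace model, not the patch's code. The parameter is_rt stands in for IS_ENABLED(CONFIG_PREEMPT_RT), preemptible stands in for the kernel's preemptible() at the tracepoint call site, and the enum and function names are invented for illustration. The model also assumes that every case other than "RT kernel and preemptible" keeps the existing guard(preempt_notrace)() protection.

```c
#include <stdbool.h>

/* Hypothetical labels for the two ways a tracepoint call could be protected. */
enum tp_protection {
	TP_PROTECT_PREEMPT_OFF,  /* guard(preempt_notrace)(): pre-patch behavior */
	TP_PROTECT_SRCU_MIGRATE, /* SRCU-fast reader plus migration disabled */
};

/*
 * Hypothetical model of the policy described in the commit log.
 * is_rt:       stands in for IS_ENABLED(CONFIG_PREEMPT_RT)
 * preemptible: stands in for preemptible() at the call site
 * Assumption: all cases other than "RT and preemptible" keep
 * the existing preemption-disabling protection.
 */
static enum tp_protection tp_choose_protection(bool is_rt, bool preemptible)
{
	if (is_rt && preemptible)
		return TP_PROTECT_SRCU_MIGRATE; /* BPF may be preempted but stays on this CPU */
	return TP_PROTECT_PREEMPT_OFF;          /* non-RT, or preemption already off */
}
```

In this model, only preemptible call sites on PREEMPT_RT kernels change behavior. Non-RT kernels, and RT call sites that already run with preemption disabled, keep the original non-preemptible protection.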
>>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> Is there a reason not to let non-RT kernels also benefit from SRCU-fast for BPF programs attached to tracepoints? This could be a follow-up patch if needed.
>>>>>>>
>>>>>>> Because in some cases the non-RT benefit is suspected to be negative
>>>>>>> due to increasing the probability of preemption in awkward places.
>>>>>>
>>>>>> Since you said "suspected", I am guessing there is no concrete data
>>>>>> substantiating that specifically for BPF programs, but correct me if I
>>>>>> missed something. Assuming you are referring to latency tradeoffs due to
>>>>>> preemption: Android is not PREEMPT_RT, but it is also expected to be low
>>>>>> latency in general. So is this decision the right one for Android as
>>>>>> well, considering that (I heard) it uses BPF? Just an open-ended question.
>>>>>>
>>>>>> There is also the issue of two different paths for PREEMPT_RT versus
>>>>>> non-RT, which complicates the tracing side, so there had better be a
>>>>>> reason for that, I guess.
>>>>>
>>>>> You are advocating a change in behavior for non-RT workloads. Why do
>>>>> you believe that this change would be OK for those workloads?
>>>>
>>>> For the same reasons I gave in my last email. If we are saying SRCU-fast is needed for lower latency, I find it strange that we are leaving out Android, which has low-latency audio use cases, for instance.
>>>
>>> If Android provides numbers showing that it helps them, then it is easy
>>> to provide a Kconfig option that defaults to PREEMPT_RT, but that Android
>>> can override. Right?
>>
>> Sure, but I suspect that Android and others are not going to examine every PREEMPT_RT-specific optimization (not just this one) to see whether it benefits their interactivity use cases. They will simply miss out without knowing it.
>>
>> It might be a good idea (for me) to explore how many such optimizations we take for granted, though. I will look into this on my side. :)
>
> One workload's optimization is another workload's pessimization, in
> part because there are a lot of different measures of performance that
> different workloads care about.
>
> But as a practical matter, this is Steven's decision.
>
> Though if he does change the behavior on non-RT setups, I would thank
> him to remove my name from the commit, or at least record in the commit
> log that I object to changing other workloads' behaviors.

You have a point. I am not saying we should definitely do this, but we should at least consider and explore it.
Thanks.
>
> Thanx, Paul
>
>> thanks,
>>
>> - Joel
>>
>>>
>>> Thanx, Paul
>>>
>>>> Thanks,
>>>>
>>>> - Joel
>>>>
>>>>
>>>>>
>>>>> Thanx, Paul