BPF List
From: Joel Fernandes <joelagnelf@nvidia.com>
To: "paulmck@kernel.org" <paulmck@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Steve Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Date: Fri, 12 Dec 2025 09:28:37 +0000
Message-ID: <7683319A-AB3D-4DF4-8720-9C39E3C683BA@nvidia.com>
In-Reply-To: <83cd4b4d-1eec-47d0-be91-57c915775612@paulmck-laptop>



> On Dec 12, 2025, at 4:50 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Fri, Dec 12, 2025 at 03:43:07AM +0000, Joel Fernandes wrote:
>> 
>> 
>>>> On Dec 12, 2025, at 9:47 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>> 
>>> On Fri, Dec 12, 2025 at 09:12:07AM +0900, Joel Fernandes wrote:
>>>> 
>>>> 
>>>>> On 12/11/2025 3:23 PM, Paul E. McKenney wrote:
>>>>> On Thu, Dec 11, 2025 at 08:02:15PM +0000, Joel Fernandes wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Dec 8, 2025, at 1:20 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>>> 
>>>>>>> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
>>>>>>> to protect invocation of __DO_TRACE_CALL() means that BPF programs
>>>>>>> attached to tracepoints are non-preemptible.  This is unhelpful in
>>>>>>> real-time systems, whose users apparently wish to use BPF while also
>>>>>>> achieving low latencies.  (Who knew?)
>>>>>>> 
>>>>>>> One option would be to use preemptible RCU, but this introduces
>>>>>>> many opportunities for infinite recursion, which many consider to
>>>>>>> be counterproductive, especially given the relatively small stacks
>>>>>>> provided by the Linux kernel.  These opportunities could be shut down
>>>>>>> by sufficiently energetic duplication of code, but this sort of thing
>>>>>>> is considered impolite in some circles.
>>>>>>> 
>>>>>>> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
>>>>>>> readers than those of preemptible RCU, at least on Paul E. McKenney's
>>>>>>> laptop, where task_struct access is more expensive than access to per-CPU
>>>>>>> variables.  And SRCU-fast provides way faster readers than does SRCU,
>>>>>>> courtesy of being able to avoid the read-side use of smp_mb().  Also,
>>>>>>> it is quite straightforward to create srcu_read_{,un}lock_fast_notrace()
>>>>>>> functions.
>>>>>>> 
>>>>>>> While in the area, SRCU now supports early boot call_srcu().  Therefore,
>>>>>>> remove the checks that used to avoid such use from rcu_free_old_probes()
>>>>>>> before this commit was applied:
>>>>>>> 
>>>>>>> e53244e2c893 ("tracepoint: Remove SRCU protection")
>>>>>>> 
>>>>>>> The current commit can be thought of as an approximate revert of that
>>>>>>> commit, with some compensating additions of preemption disabling.
>>>>>>> This preemption disabling uses guard(preempt_notrace)().
>>>>>>> 
>>>>>>> However, Yonghong Song points out that BPF assumes that non-sleepable
>>>>>>> BPF programs will remain on the same CPU, which means that migration
>>>>>>> must be disabled whenever preemption remains enabled.  In addition,
>>>>>>> non-RT kernels have performance expectations that would be violated by
>>>>>>> allowing the BPF programs to be preempted.
>>>>>>> 
>>>>>>> Therefore, continue to disable preemption in non-RT kernels, and protect
>>>>>>> the BPF program with both SRCU and migration disabling for RT kernels,
>>>>>>> and even then only if preemption is not already disabled.
>>>>>> 
>>>>>> Hi Paul,
>>>>>> 
>>>>>> Is there a reason not to let non-RT kernels also benefit from SRCU-fast in tracepoints for BPF? It could be a follow-up patch if needed.
>>>>> 
>>>>> Because in some cases the non-RT benefit is suspected to be negative
>>>>> due to increasing the probability of preemption in awkward places.
>>>> 
>>>> Since you said "suspected", I am guessing that no concrete data has been
>>>> collected to substantiate that specifically for BPF programs, but correct me
>>>> if I missed something. Assuming you are referring to latency tradeoffs due to
>>>> preemption, Android is not PREEMPT_RT but is expected to be low latency in
>>>> general as well. So is this decision the right one for Android as well,
>>>> considering that (I heard) it uses BPF? Just an open-ended question.
>>>> 
>>>> There is also the issue of two different paths for PREEMPT_RT versus otherwise,
>>>> complicating the tracing side, so there had better be a reason for that, I guess.
>>> 
>>> You are advocating a change in behavior for non-RT workloads.  Why do
>>> you believe that this change would be OK for those workloads?
>> 
>> Same reasons I provided in my last email. If we are saying SRCU-fast is required for lower latency, I find it strange that we are leaving out Android, which has low-latency audio use cases, for instance.
> 
> If Android provides numbers showing that it helps them, then it is easy
> to provide a Kconfig option that defaults to PREEMPT_RT, but that Android
> can override.  Right?

Sure, but my suspicion is that Android and others are not going to look into every PREEMPT_RT-specific optimization (not just this one) to see whether it benefits their interactivity use cases. They will simply miss out on it without knowing that they are.

It might be a good idea (for me) to find out how many such optimizations exist that we take for granted. I will look into this on my side. :)

thanks,

 - Joel 

> 
>                            Thanx, Paul
> 
>> Thanks,
>> 
>> - Joel
>> 
>> 
>>> 
>>>                           Thanx, Paul

Thread overview: 20+ messages
2025-12-08  4:20 [PATCH v3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast Paul E. McKenney
2025-12-08  9:43 ` Steven Rostedt
2025-12-08 20:46   ` Paul E. McKenney
2025-12-09  0:38     ` Steven Rostedt
2025-12-09 22:29       ` Paul E. McKenney
2025-12-10  3:11         ` Steven Rostedt
2025-12-11 20:02 ` Joel Fernandes
2025-12-11 20:23   ` Paul E. McKenney
2025-12-12  0:12     ` Joel Fernandes
2025-12-12  0:47       ` Paul E. McKenney
2025-12-12  3:43         ` Joel Fernandes
2025-12-12  7:50           ` Paul E. McKenney
2025-12-12  9:28             ` Joel Fernandes [this message]
2025-12-12 23:10               ` Paul E. McKenney
2025-12-12 23:54                 ` Joel Fernandes
2025-12-13  0:06                   ` Paul E. McKenney
2025-12-13  2:18                     ` Steven Rostedt
2025-12-13  4:19                       ` Mathieu Desnoyers
2025-12-13  6:20                         ` Steven Rostedt
2025-12-12  1:13     ` Steven Rostedt
