From: "Paul E. McKenney" <paulmck@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Boqun Feng <boqun.feng@gmail.com>,
	linux-rt-devel@lists.linux.dev, rcu@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	Frederic Weisbecker <frederic@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Uladzislau Rezki <urezki@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] rcu: Add rcu_read_lock_notrace()
Date: Thu, 17 Jul 2025 08:07:11 -0700
Message-ID: <62d91ce9-22b3-435f-b34a-cc2a65ce3b39@paulmck-laptop>
In-Reply-To: <e8f7829c-51c9-494a-827a-ee471b2e17cd@efficios.com>

On Thu, Jul 17, 2025 at 09:14:41AM -0400, Mathieu Desnoyers wrote:
> On 2025-07-16 18:54, Paul E. McKenney wrote:
> > On Wed, Jul 16, 2025 at 01:35:48PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 16, 2025 at 11:09:22AM -0400, Steven Rostedt wrote:
> > > > On Fri, 11 Jul 2025 10:05:26 -0700
> > > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > > 
> > > > > This trace point will invoke rcu_read_unlock{,_notrace}(), which will
> > > > > note that preemption is disabled.  If rcutree.use_softirq is set and
> > > > > this task is blocking an expedited RCU grace period, it will directly
> > > > > invoke the non-notrace function raise_softirq_irqoff().  Otherwise,
> > > > > it will directly invoke the non-notrace function irq_work_queue_on().
> > > > 
> > > > > Just to clarify some things: a function annotated by "notrace" simply
> > > > > will not have the ftrace hook to that function, but that function may
> > > > > very well have tracing triggered inside of it.
> > > > > 
> > > > > Functions with "_notrace" in their names (like preempt_disable_notrace())
> > > > > should not have any tracing instrumentation (as Mathieu stated)
> > > > > inside of them, so that they can be used in the tracing infrastructure.
> > > > 
> > > > raise_softirq_irqoff() has a tracepoint inside of it. If we have the
> > > > tracing infrastructure call that, and we happen to enable that
> > > > tracepoint, we will have:
> > > > 
> > > >    raise_softirq_irqoff()
> > > >       trace_softirq_raise()
> > > >         [..]
> > > >           raise_softirq_irqoff()
> > > >              trace_softirq_raise()
> > > >                 [..]
> > > >                   Ad infinitum!
> > > > 
> > > > I'm not sure if that's what is being proposed or not, but I just wanted
> > > > to make sure everyone is aware of the above.
> > > 
> > > OK, I *think* I might actually understand the problem.  Maybe.
> > > 
> > > I am sure that the usual suspects will not be shy about correcting any
> > > misapprehensions in the following.  ;-)
> > > 
> > > My guess is that some users of real-time Linux would like to use BPF
> > > programs while still getting decent latencies out of their systems.
> > > (Not something I would have predicted, but then again, I was surprised
> > > some years back to see people with a 4096-CPU system complaining about
> > > 200-microsecond latency blows from RCU.)  And the BPF guys (now CCed)
> > > made some changes some years back to support this, perhaps most notably
> > > replacing some uses of preempt_disable() with migrate_disable().
> > > 
> > > Except that the current __DECLARE_TRACE() macro defeats this work
> > > for tracepoints by disabling preemption across the tracepoint call,
> > > which might well be a BPF program.  So we need to do something to
> > > __DECLARE_TRACE() to get the right sort of protection while still leaving
> > > preemption enabled.
> > > 
> > > One way of attacking this problem is to use preemptible RCU.  The problem
> > > with this is that although one could construct a trace-safe version
> > > of rcu_read_unlock(), these would negate some optimizations that Lai
> > > Jiangshan worked so hard to put in place.  Plus those optimizations
> > > also simplified the code quite a bit.  Which is why I was pushing back
> > > so hard, especially given that I did not realize that real-time systems
> > > would be running BPF programs concurrently with real-time applications.
> > > This meant that I was looking for a functional problem with the current
> > > disabling of preemption, and not finding it.
> > > 
> > > So another way of dealing with this is to use SRCU-fast, which is
> > > like SRCU, but dispenses with the smp_mb() calls and the redundant
> > > read-side array indexing.  Plus it is easy to make _notrace variants
> > > srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace(),
> > > along with the requisite guards.
> > > 
> > > Re-introducing SRCU requires reverting most of e53244e2c893 ("tracepoint:
> > > Remove SRCU protection"), and I have hacked together this and the
> > > prerequisites mentioned in the previous paragraph.
> > > 
> > > These are passing ridiculously light testing, but probably have at
> > > least their share of bugs.
> > > 
> > > But first, do I actually finally understand the problem?
> > 
> > OK, they pass somewhat less ridiculously moderate testing, though I have
> > not yet hit them over the head with the ftrace selftests.
> > 
> > So might as well post them.
> > 
> > Thoughts?
> 
> Your explanation of the problem context fits my understanding.
> 
> Note that I've mostly been pulled into this by Sebastian, who wanted
> to better understand how we could make the tracepoint
> instrumentation work with bpf probes that need to sleep due to
> locking. Hence my original somewhat high-level desiderata.
> 
> I'm glad this seems to be converging towards a concrete solution.
> 
> There are two things I'm wondering:
> 
> 1) Would we want to always use srcu-fast (for both preempt and
>    non-preempt kernels ?), or is there any downside compared to
>    preempt-off rcu ? (e.g. overhead ?)

For kernels built with CONFIG_PREEMPT_DYNAMIC=n and either
CONFIG_PREEMPT_NONE=y or CONFIG_PREEMPT_VOLUNTARY=y, non-preemptible
RCU would be faster.  I did consider this, but decided to keep the
initial patch simple.
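
As a rough illustration of that configuration dependence, here is a
userspace C sketch that picks a protection scheme from three flags standing
in for the Kconfig options named above.  The struct, the enum, and the
function name are hypothetical and are not kernel code.  Falling back to
SRCU-fast for every other configuration is an assumption of the sketch,
and the initial patch does not make this distinction at all.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Kconfig options named above. */
struct model_config {
	bool preempt_dynamic;   /* models CONFIG_PREEMPT_DYNAMIC */
	bool preempt_none;      /* models CONFIG_PREEMPT_NONE */
	bool preempt_voluntary; /* models CONFIG_PREEMPT_VOLUNTARY */
};

/* Hypothetical names for the two read-side protection schemes. */
enum model_protection {
	MODEL_NONPREEMPT_RCU, /* readers run with preemption disabled */
	MODEL_SRCU_FAST,      /* SRCU-fast readers, preemption left enabled */
};

/*
 * Non-preemptible RCU is chosen only when preemption cannot be switched
 * on at boot or run time and the kernel is built non-preemptible.
 * Everything else falls back to SRCU-fast (an assumption of this sketch).
 */
static enum model_protection
model_pick_protection(const struct model_config *c)
{
	if (!c->preempt_dynamic &&
	    (c->preempt_none || c->preempt_voluntary))
		return MODEL_NONPREEMPT_RCU;
	return MODEL_SRCU_FAST;
}
```

In a real kernel this choice would presumably be made at build time with
IS_ENABLED() or #ifdef rather than with a run-time structure.  The structure
is used here only so that the logic can be exercised in userspace.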

>    If the overhead is similar when actually used by tracers
>    (I'm talking about actual workload benchmark and not a
>    microbenchmark), I would tend to err towards simplicity
>    and to minimize the number of configurations to test, and
>    use srcu-fast everywhere.

To this point, I was wondering whether it is still necessary to do the
call_rcu() stage, but left it because that is the safe mistake to make.

I am testing a fifth patch that removes the early-boot deferral of
call_srcu() because call_srcu() now does exactly this deferral internally.

> 2) I think I'm late to the party in reviewing srcu-fast, I'll
>    go have a look :)

Looking forward to seeing what you come up with!

I deferred one further optimization, namely statically classifying
srcu_struct structures as intended for vanilla, _nmisafe(), or _fast()
use, or at least doing so at initialization time.  This would get rid
of the call to srcu_check_read_flavor_force() in srcu_read_lock_fast(),
srcu_read_unlock_fast(), and friends, or at least tuck it under
CONFIG_PROVE_RCU.  On my laptop, this saves an additional 25%, though
that 25% amounts to a big half of a nanosecond.
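
To make the shape of that deferred optimization concrete, here is a
userspace C model of it.  Everything in it is hypothetical: the model_*
names, the single nesting counter standing in for the real per-CPU state,
and the debug_checks flag standing in for CONFIG_PROVE_RCU.  It is a sketch
of the idea, not the SRCU implementation.

```c
#include <stdbool.h>

/* Hypothetical reader flavors, modeled on the three named above. */
enum model_flavor {
	MODEL_FLAVOR_NORMAL,
	MODEL_FLAVOR_NMISAFE,
	MODEL_FLAVOR_FAST,
};

struct model_srcu {
	enum model_flavor flavor; /* recorded once, at initialization */
	long nesting;             /* stand-in for the real per-CPU counters */
	unsigned long checks;     /* how many read-side flavor checks ran */
};

/* Initialization-time classification of the structure. */
static void model_srcu_init(struct model_srcu *s, enum model_flavor f)
{
	s->flavor = f;
	s->nesting = 0;
	s->checks = 0;
}

/* Read-side flavor check: the cost the optimization removes or hides. */
static bool model_check_flavor(struct model_srcu *s, enum model_flavor f)
{
	s->checks++;
	return s->flavor == f;
}

/*
 * Reader entry.  With debug_checks false, the fast path does no flavor
 * check at all.  With it true, a mismatched flavor is rejected.
 */
static bool model_read_lock_fast(struct model_srcu *s, bool debug_checks)
{
	if (debug_checks && !model_check_flavor(s, MODEL_FLAVOR_FAST))
		return false;
	s->nesting++;
	return true;
}

/* Reader exit, paired with a successful model_read_lock_fast(). */
static void model_read_unlock_fast(struct model_srcu *s)
{
	s->nesting--;
}
```

With the flavor fixed by model_srcu_init(), a reader called with
debug_checks false never touches the check, which is the saving in question.
A debug build would still catch a structure initialized for one flavor and
used with another.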

Thoughts?

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
