public inbox for linux-next@vger.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Linux Next Mailing List <linux-next@vger.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree
Date: Fri, 14 Nov 2025 18:02:32 +0100	[thread overview]
Message-ID: <20251114170232.wHJFS35i@linutronix.de> (raw)
In-Reply-To: <20251114114828.5b7d4fe8@gandalf.local.home>

On 2025-11-14 11:48:28 [-0500], Steven Rostedt wrote:
> On Fri, 14 Nov 2025 17:33:30 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > > Where in PREEMPT_RT we do not disable preemption around the tracepoint
> > > callback, but in non RT we do. Instead it uses a srcu and migrate disable.  
> > 
> > I appreciate the effort. I really do. But why can't we have SRCU on both
> > configs?
> 
> I don't know. Is there more overhead with disabling migration than
> disabling preemption?

On the first and last invocation, yes. But we if disabling migration is
not required for SRCU then why doing it?

We had the disabled preemption due to rcu_read_lock_sched() due to
tracepoint requirement which was not spelled out. This appears to be
replaced with srcu_fast(). I just don't see why we need two flavours
here (RT vs !RT) and where migrate_disable() requirement is from.

> > 
> > Also why does tracepoint_guard() need to disable migration? The BPF
> > program already disables migrations (see for instance
> > bpf_prog_run_array()).
> 
> We also would need to audit all tracepoint callbacks, as there may be some
> assumptions about staying on the same CPU.

Sure. Okay. What would I need to grep for in order to audit it?

> > This is true for RT and !RT. So there is no need to do it here.
> > 
> > > The migrate_disable in the syscall tracepoint (which gets called by the
> > > system call version that doesn't disable migration, even in RT), needs to
> > > disable migration so that the accounting that happens in:
> > > 
> > >   trace_event_buffer_reserve()
> > > 
> > > matches what happens when that function gets called by a normal tracepoint
> > > callback.  
> > 
> > buh. But this is something. If we know that the call chain does not
> > disable migration, couldn't we just use a different function? I mean we
> > have tracing_gen_ctx_dec() and tracing_gen_ctx)(). Wouldn't this work
> > for migrate_disable(), too? 
> > Just in case we need it and can not avoid it, see above.
> 
> I thought about that too. It would then create two different
> trace_event_buffer_reserve():
> 
> static __always_inline void *event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 						  struct trace_event_file *trace_file,
> 						  unsigned long len, bool dec)
> {
> 	struct trace_event_call *event_call = trace_file->event_call;
> 
> 	if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
> 	    trace_event_ignore_this_pid(trace_file))
> 		return NULL;
> 
> 	/*
> 	 * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
> 	 * preemption (adding one to the preempt_count). Since we are
> 	 * interested in the preempt_count at the time the tracepoint was
> 	 * hit, we need to subtract one to offset the increment.
> 	 */
> 	fbuffer->trace_ctx = dec ? tracing_gen_ctx_dec() : tracing_gen_ctx();
> 	fbuffer->trace_file = trace_file;
> 
> 	fbuffer->event =
> 		trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
> 						event_call->event.type, len,
> 						fbuffer->trace_ctx);
> 	if (!fbuffer->event)
> 		return NULL;
> 
> 	fbuffer->regs = NULL;
> 	fbuffer->entry = ring_buffer_event_data(fbuffer->event);
> 	return fbuffer->entry;
> }
> 
> void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 				 struct trace_event_file *trace_file,
> 				 unsigned long len)
> {
> 	return event_buffer_reserve(fbuffer, trace_file, len, true);
> }
> 
> void *trace_syscall_event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 					 struct trace_event_file *trace_file,
> 					 unsigned long len)
> {
> 	return event_buffer_reserve(fbuffer, trace_file, len, false);
> }
> 
> Hmm

Yeah. I *think* in the preempt case we always use the one or the other.

So I would prefer this instead of explicitly disable migration so the a
function down in the stack can decrement the counter again.
Ideally, we don't disable migration to begin with.

_If_ the BPF program disables migrations before invocation of its
program then any trace recording that happens within this program
_should_ record the migration counter at that time. Which would be 1 at
the minimum.

> -- Steve

Sebastian

  reply	other threads:[~2025-11-14 17:02 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-14  2:52 linux-next: manual merge of the rcu tree with the ftrace tree Stephen Rothwell
2025-11-14 12:42 ` Steven Rostedt
2025-11-14 13:35   ` Sebastian Andrzej Siewior
2025-11-14 15:46     ` Steven Rostedt
2025-11-14 16:00       ` Sebastian Andrzej Siewior
2025-11-14 16:22         ` Steven Rostedt
2025-11-14 16:33           ` Sebastian Andrzej Siewior
2025-11-14 16:48             ` Steven Rostedt
2025-11-14 17:02               ` Sebastian Andrzej Siewior [this message]
2025-11-14 17:11                 ` Steven Rostedt
2025-11-14 17:00             ` Paul E. McKenney
2025-11-14 17:10               ` Sebastian Andrzej Siewior
2025-11-14 17:25                 ` Paul E. McKenney
2025-11-14 17:40                   ` Steven Rostedt
2025-11-14 17:41                   ` Sebastian Andrzej Siewior
2025-11-14 18:26                     ` Paul E. McKenney
2025-11-14 14:48   ` Frederic Weisbecker
2025-11-14 16:01     ` Steven Rostedt
2025-11-14 17:06     ` Paul E. McKenney
2025-11-14 18:58       ` Paul E. McKenney
2025-11-18 13:05       ` Frederic Weisbecker
2025-11-18 15:04         ` Paul E. McKenney
2025-12-02  0:57           ` Paul E. McKenney
2025-12-07 20:43             ` Paul E. McKenney
2025-12-08  0:17               ` Steven Rostedt
2025-12-08  4:21                 ` Paul E. McKenney
2025-11-14 17:05   ` Paul E. McKenney
2025-11-14 18:31     ` Yonghong Song
2025-11-18  7:35       ` Sebastian Andrzej Siewior
2025-11-18 15:05         ` Paul E. McKenney
2025-11-30 18:49         ` Yonghong Song
2025-11-19  0:38     ` Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2025-11-06  1:24 Stephen Rothwell
2017-05-01  3:18 Stephen Rothwell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251114170232.wHJFS35i@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sfr@canb.auug.org.au \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox