Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* Re: [PATCH v2] selftests/ftrace: Fix trace_marker_raw test on 64K page kernels
From: Tianchen Ding @ 2026-05-29  2:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, linux-kernel,
	linux-trace-kernel, linux-kselftest
In-Reply-To: <20260528091348.71ae3aa3@fedora>



On 5/28/26 9:13 PM, Steven Rostedt wrote:
> On Thu, 28 May 2026 10:24:17 +0800
> Tianchen Ding <dtcccc@linux.alibaba.com> wrote:
> 
>>
>> diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
>> index 8e905d4fe6dd..f68f1901f65f 100644
>> --- a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
>> @@ -43,8 +43,11 @@ write_buffer() {
>>   	id=$1
>>   	size=$2
>>   
>> -	# write the string into the raw marker
>> -	make_str $id $size > trace_marker_raw
>> +	# Pipe through dd to ensure a single atomic write() syscall
>> +	# on architectures with 64K pages, where shell's printf builtin
>> +	# uses stdio buffering which may split the output into multiple
>> +	# writes.
>> +	make_str $id $size | dd of=trace_marker_raw bs=`expr $size + 4` iflag=fullblock
> 
> I was looking at this more, and I'm not comfortable with the hard coded
> 4 above. I rather use the length of the string. Something like:
> 
> 	str=`make_str $id $size`
> 	len=${#str}
> 	echo "$str" | dd of=trace_marker_raw bs=$len iflag=fullblock
> 
> -- Steve
> 

Capturing make_str output into a shell variable doesn't work because make_str 
outputs raw binary that may contain NUL bytes, and shell command substitution 
silently strips them.

However, the val variable inside make_str doesn't hold actual NUL bytes — it 
holds the text of escape sequences (e.g., the literal characters 
\003\000\000\000). The binary conversion only happens at the final printf 
"${val}${data}".

We can take advantage of this by having make_str return the escape-sequence text 
instead of binary, and letting write_buffer handle the conversion:

   make_str() {
         ...
         printf '%s' "${val}${data}"
   }

   write_buffer() {
         id=$1
         size=$2

         str=`make_str $id $size`
         len=$(printf "$str" | wc -c)
         printf "$str" | dd of=trace_marker_raw bs=$len iflag=fullblock
   }

This way str holds only printable escape-sequence text (no NUL), printf "$str" 
converts it to real binary through the pipe, and wc -c measures the true binary 
length.

>>   }
>>   
>>   


^ permalink raw reply

* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
From: Tengda Wu @ 2026-05-29  3:39 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Mathieu Desnoyers, Alexei Starovoitov,
	linux-trace-kernel, linux-kernel
In-Reply-To: <20260526123719.482f07a3843e207e22d95378@kernel.org>

Hi Masami,

thanks for the review and feedback.

On 2026/5/26 11:37, Masami Hiramatsu wrote:
> On Mon, 25 May 2026 21:22:53 +0800
> Tengda Wu <wutengda@huaweicloud.com> wrote:
> 
>> When a task calls schedule() to yield the CPU, its state remains
>> TASK_RUNNING, but its stack is frozen and safe to walk.
>>
>> Replace task_is_running(tsk) with tsk->on_cpu to avoid overly
>> conservative rejections.
> 
> Please see the Sashiko's comment.
> 
> https://sashiko.dev/#/patchset/20260525132253.1889726-1-wutengda%40huaweicloud.com
> 
> When calling Unwind on a task other than the current, IMHO, it is
> the responsibility of the caller of this function to ensure that the
> stack trace of that task is safe.

Agree.

> We also should not use tsk->on_cpu, but should use task_on_cpu(tsk).
> 
> BTW, should task_on_cpu() use READ_ONCE() etc?
> wait_task_inactive() seems a bit fragile.
> 
> Thanks,
> 

It seems that using task_on_cpu() is not necessary here because:

1. It requires an additional 'rq' parameter not available in the rethook context.
2. It just returns p->on_cpu, which is identical to our current use of tsk->on_cpu.

/* file: kernel/sched/sched.h */
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
{
	return p->on_cpu;
}

Given these constraints, staying with tsk->on_cpu seems more straightforward
for the rethook context.

Thanks,
Tengda

>>
>> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>> ---
>>  kernel/trace/rethook.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
>> index 5a8bdf88999a..bd5e5f455e85 100644
>> --- a/kernel/trace/rethook.c
>> +++ b/kernel/trace/rethook.c
>> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>>  	if (WARN_ON_ONCE(!cur))
>>  		return 0;
>>  
>> -	if (tsk != current && task_is_running(tsk))
>> +	if (tsk != current && tsk->on_cpu)
>>  		return 0;
>>  
>>  	do {
>> -- 
>> 2.34.1
>>
>>
> 
> 


^ permalink raw reply

* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Masami Hiramatsu @ 2026-05-29  4:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
	Douglas Raillard, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers, Jiri Olsa
In-Reply-To: <20260521225033.56458336@fedora>

On Thu, 21 May 2026 22:50:33 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> +	if (ctx->flags & TPARG_FL_TEVENT) {
> +		int ret;
> +
> +		ret = parse_trace_event(varname, code, ctx);
> +		if (ret < 0)
> +			return ret;
> +
> +		if (ctx->flags & TPARG_FL_TYPECAST) {
> +			type = ctx->last_struct;
> +			goto found_type;
> +		}
> +		return 0;

Here is a bit complicated but a buggy case.

parse_btf_arg() is not used for eprobe arguments because those
requires '$' prefix. However, only if it has a typecast, this
parse_btf_arg() is called for eprobes.

Thus, we should report a kernel bug if !ctx->flags & TPARG_FL_TYPECAST
here. Something like this:

		if (WARN_ONCE(!(ctx->flags & TPARG_FL_TYPECAST)))
			return -EINVAL;
		type = ctx->last_struct;
		goto found_type;

Thanks,
-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v2 02/12] rv: Fix read_lock scope in per-task DA cleanup
From: Gabriele Monaco @ 2026-05-29  6:08 UTC (permalink / raw)
  To: Nam Cao; +Cc: Wen Yang, linux-kernel, Steven Rostedt, linux-trace-kernel
In-Reply-To: <87ldd3orft.fsf@yellow.woof>

On Thu, 2026-05-28 at 10:43 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > The da_monitor_reset_all() function for per-task monitors takes
> > tasklist_lock while iterating over tasks, then keeps it also while
> > iterating over idle tasks (one per CPU). The latter is not
> > necessary
> > since the lock needs to guard only for_each_process_thread().
> > 
> > Use a scoped_guard for more compact syntax and adjust the scope
> > only
> > where the lock is necessary.
> > 
> > Fixes: 30984ccf31b7f ("rv: Refactor da_monitor to minimise macros")
> > Fixes: 8259cb14a7068 ("rv: Reset per-task monitors also for idle
> > tasks")
> 
> Fixes: tag "indicates that the patch fixes a bug in a previous
> commit". There is no bug here, so I don't think Fixes tags are
> applicable.

Yeah good point, that isn't a real bug.. We're just holding a lock for
a bit too long but there's no harm in that. Will remove the tags.

Thanks,
Gabriele

> 
> > Reviewed-by: Wen Yang <wen.yang@linux.dev>
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> 
> Reviewed-by: Nam Cao <namcao@linutronix.de>


^ permalink raw reply

* Re: [PATCH v2] unwind: Add sframe_(un)register() system calls
From: Heiko Carstens @ 2026-05-29  8:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Jens Remus, Josh Poimboeuf, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
	Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
	Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Vasily Gorbik,
	Thomas Weißschuh
In-Reply-To: <20260528151023.00f5ec4e@gandalf.local.home>

On Thu, May 28, 2026 at 03:10:23PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.

...

> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index 09a7ef04d979..52519e2acdc8 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -398,3 +398,6 @@
>  469	common	file_setattr			sys_file_setattr
>  470	common	listns				sys_listns
>  471	common	rseq_slice_yield		sys_rseq_slice_yield
> +472	common	stacktrace_setup		sys_stacktrace_setup
> +472	common	sframe_register			sys_sframe_register
> +473	common	sframe_unregister		sys_sframe_unregister

What is stacktrace_setup? And why only for s390? Looks like a leftover.

^ permalink raw reply

* Re: [PATCH v2] selftests/ftrace: Fix trace_marker_raw test on 64K page kernels
From: Steven Rostedt @ 2026-05-29 13:15 UTC (permalink / raw)
  To: Tianchen Ding
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, linux-kernel,
	linux-trace-kernel, linux-kselftest
In-Reply-To: <bcb52d91-0440-4e73-86af-997e8b723711@linux.alibaba.com>

On Fri, 29 May 2026 10:59:34 +0800
Tianchen Ding <dtcccc@linux.alibaba.com> wrote:

> We can take advantage of this by having make_str return the escape-sequence text 
> instead of binary, and letting write_buffer handle the conversion:
> 
>    make_str() {
>          ...
>          printf '%s' "${val}${data}"
>    }
> 
>    write_buffer() {
>          id=$1
>          size=$2
> 
>          str=`make_str $id $size`
>          len=$(printf "$str" | wc -c)
>          printf "$str" | dd of=trace_marker_raw bs=$len iflag=fullblock
>    }
> 
> This way str holds only printable escape-sequence text (no NUL), printf "$str" 
> converts it to real binary through the pipe, and wc -c measures the true binary 
> length.

This is quite hacky, but at least it removes the hardcoded assumptions.

OK, you can send a v3 that does that.

Thanks,

-- Steve

^ permalink raw reply

* [GIT PULL] RTLA changes for 7.2
From: Tomas Glozar @ 2026-05-29 13:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Costa Shulyupin, Crystal Wood, LKML, linux-trace-kernel,
	Tomas Glozar

Steven,

The following changes since commit 5200f5f493f79f14bbdc349e402a40dfb32f23c8:

  Linux 7.1-rc4 (2026-05-17 13:59:58 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/tglozar/linux.git tags/rtla-v7.2

for you to fetch changes up to db956bcf8d681b5a01ebe04c79f6a7b29b9934f9:

  rtla: Document tests in README (2026-05-29 09:40:54 +0200)

----------------------------------------------------------------
RTLA patches for v7.2

- Fix discrepancy in --dump-tasks option

Due to a mistake, rtla-timerlat-hist used the CLI syntax "--dump-task"
instead of the documented "--dump-tasks". Change the option to match
both documentation and the other timerlat tool, rtla-timerlat-top.

- Extend coverage of runtime tests

Cover both top and hist tools in all applicable test cases, add tests
for a few uncovered options, and extend checks for some existing tests.

- Add unit tests for actions

rtla's actions feature is implemented in its source file and contains
non-trivial parsing logic. Cover it with unit tests.

- Stop record trace on interrupt

Fix a bug where an interval exists after receiving a signal in which
the main instance is stopped but the record instance is not, leading to
discrepancies in reported results and sometimes rtla hanging.

- Restore continue flag in actions_perform()

Fix a bug where rtla always continues tracing after hitting a threshold
even if the continue action was triggered just once, and add tests
verifying that the flag is reset properly.

- Migrate command line interface to libsubcmd

Replace rtla's argument parsing using getopt_long() with libsubcmd, used
by perf and objtool, to reuse existing code and auto-generate better
help messages. Extensive unit tests are included to detect regressions.

- Add -A/--aligned option to timerlat tools

Add an option to align timerlat threads, based on the recently
introduced TIMERLAT_ALIGN option of the timerlat tracer, together with
unit tests and documentation.

- Document tests in README

Document how to run unit and runtime tests in rtla's README.txt,
including the dependencies needed to run them.

---

Two of the commits:
- 534d9a93dbff2 tools subcmd: support optarg as separate argument
- da62fc3458462 tools subcmd: allow parsing distinct --opt and --no-opt
do minor changes to libsubcmd code needed for rtla. libsubcmd does not
have a MAINTAINERS entry, but generally follows perf commit message
style, so I also followed that instead of the tracing subsystem style.

The tag was built and tested (make && make unit-tests && sudo make
check) on 7.1-rc5 kernel, the same was also done after test-merge into
next-20260528. No new issues were found.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>

----------------------------------------------------------------
Costa Shulyupin (1):
      tools/rtla: Fix --dump-tasks usage in timerlat

Crystal Wood (1):
      rtla: Stop the record trace on interrupt

Tomas Glozar (24):
      rtla/tests: Cover both top and hist tools where possible
      rtla/tests: Add get_workload_pids() helper
      rtla/tests: Check -c/--cpus thread affinity
      rtla/tests: Use negative match when testing --aa-only
      rtla/tests: Extend timerlat top --aa-only coverage
      rtla/tests: Cover all hist options in runtime tests
      rtla/tests: Add runtime test for -H/--house-keeping
      rtla/tests: Add runtime test for -k and -u options
      rtla/tests: Add runtime tests for -C/--cgroup
      rtla/tests: Add unit tests for actions module
      rtla/actions: Restore continue flag in actions_perform()
      rtla/tests: Add unit test for restoring continue flag
      rtla/tests: Run runtime tests in temporary directory
      rtla/tests: Add runtime tests for restoring continue flag
      rtla: Add libsubcmd dependency
      tools subcmd: support optarg as separate argument
      tools subcmd: allow parsing distinct --opt and --no-opt
      rtla: Parse cmdline using libsubcmd
      rtla/tests: Add unit tests for _parse_args() functions
      rtla/tests: Add unit tests for CLI option callbacks
      rtla/timerlat: Add -A/--aligned CLI option
      rtla/tests: Add unit tests for -A/--aligned option
      Documentation/rtla: Add -A/--aligned option
      rtla: Document tests in README

 Documentation/tools/rtla/common_appendix.txt       |   7 +-
 .../tools/rtla/common_timerlat_options.txt         |  11 +
 tools/lib/subcmd/parse-options.c                   |  63 +-
 tools/lib/subcmd/parse-options.h                   |   4 +
 tools/tracing/rtla/.gitignore                      |   3 +
 tools/tracing/rtla/Makefile                        |  66 +-
 tools/tracing/rtla/README.txt                      |  30 +
 tools/tracing/rtla/src/Build                       |   2 +-
 tools/tracing/rtla/src/actions.c                   |   2 +
 tools/tracing/rtla/src/cli.c                       | 539 +++++++++++++++
 tools/tracing/rtla/src/cli.h                       |   9 +
 tools/tracing/rtla/src/cli_p.h                     | 687 ++++++++++++++++++++
 tools/tracing/rtla/src/common.c                    | 128 +---
 tools/tracing/rtla/src/common.h                    |  36 +-
 tools/tracing/rtla/src/osnoise.c                   | 158 ++++-
 tools/tracing/rtla/src/osnoise.h                   |   6 +
 tools/tracing/rtla/src/osnoise_hist.c              | 221 +------
 tools/tracing/rtla/src/osnoise_top.c               | 200 +-----
 tools/tracing/rtla/src/rtla.c                      |  89 ---
 tools/tracing/rtla/src/timerlat.c                  |  29 +-
 tools/tracing/rtla/src/timerlat.h                  |   8 +-
 tools/tracing/rtla/src/timerlat_hist.c             | 317 +--------
 tools/tracing/rtla/src/timerlat_top.c              | 285 +-------
 tools/tracing/rtla/src/utils.c                     |  28 +-
 tools/tracing/rtla/src/utils.h                     |   9 +-
 tools/tracing/rtla/tests/engine.sh                 |  27 +
 tools/tracing/rtla/tests/hwnoise.t                 |   2 +-
 tools/tracing/rtla/tests/osnoise.t                 |  77 ++-
 .../rtla/tests/scripts/check-cgroup-match.sh       |  17 +
 tools/tracing/rtla/tests/scripts/check-cpus.sh     |   9 +
 .../rtla/tests/scripts/check-housekeeping-cpus.sh  |   4 +
 tools/tracing/rtla/tests/scripts/check-priority.sh |   8 +-
 .../tests/scripts/check-user-kernel-threads.sh     |  16 +
 .../rtla/tests/scripts/lib/get_workload_pids.sh    |  11 +
 tools/tracing/rtla/tests/timerlat.t                | 117 ++--
 tools/tracing/rtla/tests/unit/Build                |   8 +-
 tools/tracing/rtla/tests/unit/Makefile.unit        |   6 +-
 tools/tracing/rtla/tests/unit/actions.c            | 393 +++++++++++
 tools/tracing/rtla/tests/unit/cli_opt_callback.c   | 716 ++++++++++++++++++++
 tools/tracing/rtla/tests/unit/cli_params_assert.h  |  68 ++
 tools/tracing/rtla/tests/unit/osnoise_hist_cli.c   | 557 ++++++++++++++++
 tools/tracing/rtla/tests/unit/osnoise_top_cli.c    | 503 ++++++++++++++
 tools/tracing/rtla/tests/unit/timerlat_hist_cli.c  | 722 +++++++++++++++++++++
 tools/tracing/rtla/tests/unit/timerlat_top_cli.c   | 654 +++++++++++++++++++
 tools/tracing/rtla/tests/unit/unit_tests.c         | 120 +---
 tools/tracing/rtla/tests/unit/utils.c              | 106 +++
 46 files changed, 5591 insertions(+), 1487 deletions(-)
 create mode 100644 tools/tracing/rtla/src/cli.c
 create mode 100644 tools/tracing/rtla/src/cli.h
 create mode 100644 tools/tracing/rtla/src/cli_p.h
 delete mode 100644 tools/tracing/rtla/src/rtla.c
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cgroup-match.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cpus.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh
 create mode 100644 tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
 create mode 100644 tools/tracing/rtla/tests/unit/actions.c
 create mode 100644 tools/tracing/rtla/tests/unit/cli_opt_callback.c
 create mode 100644 tools/tracing/rtla/tests/unit/cli_params_assert.h
 create mode 100644 tools/tracing/rtla/tests/unit/osnoise_hist_cli.c
 create mode 100644 tools/tracing/rtla/tests/unit/osnoise_top_cli.c
 create mode 100644 tools/tracing/rtla/tests/unit/timerlat_hist_cli.c
 create mode 100644 tools/tracing/rtla/tests/unit/timerlat_top_cli.c
 create mode 100644 tools/tracing/rtla/tests/unit/utils.c


^ permalink raw reply

* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Steven Rostedt @ 2026-05-29 13:19 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: LKML, Linux trace kernel, Mathieu Desnoyers, Mark Rutland,
	Peter Zijlstra, Namhyung Kim, Takaya Saeki, Douglas Raillard,
	Tom Zanussi, Andrew Morton, Thomas Gleixner, Ian Rogers,
	Jiri Olsa
In-Reply-To: <20260529132508.2e98cab925fdb1fa7be21a9b@kernel.org>

On Fri, 29 May 2026 13:25:08 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Thus, we should report a kernel bug if !ctx->flags & TPARG_FL_TYPECAST
> here. Something like this:
> 
> 		if (WARN_ONCE(!(ctx->flags & TPARG_FL_TYPECAST)))
> 			return -EINVAL;
> 		type = ctx->last_struct;
> 		goto found_type;

OK, will update it in v7.

Thanks,

-- Steve


^ permalink raw reply

* Re: [PATCH v2] unwind: Add sframe_(un)register() system calls
From: Steven Rostedt @ 2026-05-29 13:31 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Jens Remus, Josh Poimboeuf, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
	Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
	Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Vasily Gorbik,
	Thomas Weißschuh
In-Reply-To: <20260529082840.26496A2c-hca@linux.ibm.com>

On Fri, 29 May 2026 10:28:40 +0200
Heiko Carstens <hca@linux.ibm.com> wrote:

> > diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> > index 09a7ef04d979..52519e2acdc8 100644
> > --- a/arch/s390/kernel/syscalls/syscall.tbl
> > +++ b/arch/s390/kernel/syscalls/syscall.tbl
> > @@ -398,3 +398,6 @@
> >  469	common	file_setattr			sys_file_setattr
> >  470	common	listns				sys_listns
> >  471	common	rseq_slice_yield		sys_rseq_slice_yield
> > +472	common	stacktrace_setup		sys_stacktrace_setup
> > +472	common	sframe_register			sys_sframe_register
> > +473	common	sframe_unregister		sys_sframe_unregister  
> 
> What is stacktrace_setup? And why only for s390? Looks like a leftover.

Oops! Yes it's a left over. Thanks for noticing.

-- Steve

^ permalink raw reply

* Re: [GIT PULL] RTLA changes for 7.2
From: Steven Rostedt @ 2026-05-29 13:56 UTC (permalink / raw)
  To: Tomas Glozar; +Cc: Costa Shulyupin, Crystal Wood, LKML, linux-trace-kernel
In-Reply-To: <20260529130643.3080315-1-tglozar@redhat.com>

On Fri, 29 May 2026 15:06:43 +0200
Tomas Glozar <tglozar@redhat.com> wrote:

> - Fix discrepancy in --dump-tasks option
> 
> Due to a mistake, rtla-timerlat-hist used the CLI syntax "--dump-task"
> instead of the documented "--dump-tasks". Change the option to match
> both documentation and the other timerlat tool, rtla-timerlat-top.

Is there any concern that scripts might be using the old option? 

I wonder if you should keep the old option for backward compatibility,
but do not document that it exists.

-- Steve

^ permalink raw reply

* Re: [PATCH v4 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Steven Rostedt @ 2026-05-29 14:14 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Greg Kroah-Hartman,
	Jiri Slaby, konrad.dybcio, linux-kernel, linux-trace-kernel,
	linux-arm-msm, linux-serial, mukesh.savaliya, aniket.randive,
	chandana.chiluveru
In-Reply-To: <20260526-add-tracepoints-for-qcom-geni-serial-v4-1-e94fbaec0232@oss.qualcomm.com>

On Tue, 26 May 2026 23:07:39 +0530
Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:

> +DECLARE_EVENT_CLASS(geni_serial_data,
> +		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> +		    TP_ARGS(dev, buf, len),
> +
> +		    TP_STRUCT__entry(__string(name, dev_name(dev))
> +				     __field(unsigned int, len)
> +				     __dynamic_array(u8, data, len)
> +		    ),
> +
> +		    TP_fast_assign(__assign_str(name);
> +				   __entry->len = len;
> +				   memcpy(__get_dynamic_array(data), buf, len);
> +		    ),
> +
> +		    TP_printk("%s: len=%u data=%s",
> +			      __get_str(name), __entry->len,
> +			      __print_hex(__get_dynamic_array(data), __entry->len))
> +);

No need to save the length of the dynamic array in __entry->len because
it's already saved in the metadata of the dynamic array that is stored
on the buffer. Instead you can have:

DECLARE_EVENT_CLASS(geni_serial_data,
		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
		    TP_ARGS(dev, buf, len),

		    TP_STRUCT__entry(__string(name, dev_name(dev))
				     __dynamic_array(u8, data, len)
		    ),

		    TP_fast_assign(__assign_str(name);
				   memcpy(__get_dynamic_array(data), buf, len);
		    ),

		    TP_printk("%s: len=%u data=%s",
			      __get_str(name), __entry->len,
			      __print_hex(__get_dynamic_array(data),
					__get_dynamic_array_len(data)))
);

That will save you 4 bytes per event on the ring buffer. And a few
cycles not having to store the redundant information.

-- Steve

^ permalink raw reply

* Re: [PATCH] selftests/ftrace: Drop invalid top-level local in test_ownership
From: Steven Rostedt @ 2026-05-29 14:24 UTC (permalink / raw)
  To: CaoRuichuang, Shuah Khan
  Cc: mhiramat, mathieu.desnoyers, shuah, linux-kernel,
	linux-trace-kernel, linux-kselftest
In-Reply-To: <20260513110208.3c14878b@gandalf.local.home>


Shuah,

Ping?

-- Steve


On Wed, 13 May 2026 11:02:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Shuah,
> 
> Can you pull this into your urgent branch?
> 
> Thanks,
> 
> -- Steve
> 
> 
> On Tue, 7 Apr 2026 20:37:27 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Shuah,
> > 
> > Care to take this through your tree. Probably could even add:
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 8b55572e51805 ("tracing/selftests: Add tracefs mount options test")
> > 
> > As well as:
> > 
> > Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > 
> > -- Steve
> > 
> > 
> > On Tue,  7 Apr 2026 18:26:13 +0800
> > CaoRuichuang <create0818@163.com> wrote:
> >   
> > > From: Cao Ruichuang <create0818@163.com>
> > > 
> > > test_ownership.tc is sourced by ftracetest under /bin/sh.
> > > 
> > > The script currently declares mount_point with local at file scope,
> > > which makes /bin/sh abort with "local: not in a function" before the
> > > test can reach the eventfs ownership checks.
> > > 
> > > Replace the top-level local declaration with a normal shell variable so
> > > kernels that support the gid= tracefs mount option can run the test at
> > > all.
> > > 
> > > Signed-off-by: Cao Ruichuang <create0818@163.com>
> > > ---
> > >  tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> > > index e71cc3ad0..6d00d3c0f 100644
> > > --- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> > > +++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> > > @@ -6,7 +6,7 @@
> > >  original_group=`stat -c "%g" .`
> > >  original_owner=`stat -c "%u" .`
> > >  
> > > -local mount_point=$(get_mount_point)
> > > +mount_point=$(get_mount_point)
> > >  
> > >  mount_options=$(get_mnt_options "$mount_point")
> > >      
> >   
> 


^ permalink raw reply

* Re: [RFC PATCH] trace: Introduce a new filter_pred "caller"
From: Steven Rostedt @ 2026-05-29 14:38 UTC (permalink / raw)
  To: Chen Jun
  Cc: Masami Hiramatsu (Google), mathieu.desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260514131901.8c94136f6fede18c608c8a55@kernel.org>


Hi Chen,

Do you plan on sending updates to address the comments that Masami and
I have made?

-- Steve


On Thu, 14 May 2026 13:19:01 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Wed, 13 May 2026 12:40:17 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Tue, 12 May 2026 08:47:50 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> >   
> > > On Fri, 8 May 2026 20:26:23 +0800
> > > Chen Jun <chenjun102@huawei.com> wrote:
> > >   
> > > > Low-level functions have many call paths, and sometimes
> > > > we only care about the calls on a specific call path.
> > > > Add a new filter to filter based on the call stack.
> > > > 
> > > > Usage:
> > > > 1. echo 'caller=="$function_name"' > events/../filter    
> > > 
> > > Thanks for interesting idea :)
> > > 
> > > BTW, we already have "stacktrace". Since this actually checks
> > > stacktrace, not caller, so I think we should reuse it.
> > > Also, I think OP_GLOB is more suitable for this case.
> > > (and more useful)  
> > 
> > Actually, it's not a stack trace, it's a function that is called from other
> > functions. But since "caller" sounds like a direct called function (stack
> > trace of the first instance), I think perhaps it should be "called_within" or
> > something similar. :-/  
> 
> Yeah, what about "callers"?
> 
> > 
> > Also, OP_GLOB can't work because it only works for a single function. At
> > the time of parsing, it finds the function (and should probably error out
> > if there's more than one function with a given name). It then records the
> > start and end address of the function so it only needs to find if one of
> > the entries in the stack trace is between the start and end of the function.  
> 
> Ah, OK. It is just comparing address, not name.
> 
> > 
> > I don't think this is possible with GLOB. We don't want to do a search of
> > the functions when the event is triggered.  
> 
> Agreed.
> 
> Thanks,
> 
> > 
> > -- Steve  
> 
> 


^ permalink raw reply

* Re: [RFC PATCH 05/10] rcu: Enable RCU callbacks to benefit from expedited grace periods
From: Frederic Weisbecker @ 2026-05-29 14:39 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <20260417231203.785172-6-puranjay@kernel.org>

On Fri, Apr 17, 2026 at 04:11:53PM -0700, Puranjay Mohan wrote:
> Currently, RCU callbacks only track normal grace period sequence
> numbers.  This means callbacks must wait for normal grace periods to
> complete even when expedited grace periods have already elapsed.
> 
> This commit uses the full rcu_gp_oldstate structure (which tracks both
> normal and expedited GP sequences) throughout the callback
> infrastructure.
> 
> The rcu_segcblist_advance() function now checks both normal and
> expedited GP completion via poll_state_synchronize_rcu_full(), becoming
> parameterless since it reads the GP state internally.
> rcu_segcblist_accelerate() stores the full GP state (both normal and
> expedited sequences) instead of just the normal sequence.
> 
> The rcu_accelerate_cbs() and rcu_accelerate_cbs_unlocked() functions use
> get_state_synchronize_rcu_full() to capture both GP sequences. The NOCB
> code uses poll_state_synchronize_rcu_full() for advance checks instead
> of comparing only the normal GP sequence.
> 
> srcu_segcblist_advance() become standalone implementations because
> compares SRCU sequences directly (it cannot use
> poll_state_synchronize_rcu_full(), which reads RCU-specific globals).
> srcu_segcblist_accelerate() sets rgos_exp to RCU_GET_STATE_NOT_TRACKED
> so poll_state_synchronize_rcu_full() only compares the rgosp->rgos_norm
> and ignores rgos_exp.
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  kernel/rcu/rcu_segcblist.c | 30 ++++++++++++++++++++++++------
>  kernel/rcu/rcu_segcblist.h |  2 +-
>  kernel/rcu/tree.c          |  9 +++------
>  kernel/rcu/tree_nocb.h     | 33 +++++++++++++++++++++++----------
>  4 files changed, 51 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 00e164db8b74..11174e2be3c2 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -12,6 +12,7 @@
>  #include <linux/kernel.h>
>  #include <linux/types.h>
>  
> +#include "rcu.h"
>  #include "rcu_segcblist.h"
>  
>  /* Initialize simple callback list. */
> @@ -494,9 +495,9 @@ static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
>  
>  /*
>   * Advance the callbacks in the specified rcu_segcblist structure based
> - * on the current value passed in for the grace-period counter.
> + * on the current value of the grace-period counter.
>   */
> -void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
> +void rcu_segcblist_advance(struct rcu_segcblist *rsclp)
>  {
>  	int i;
>  
> @@ -509,7 +510,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *
>  	 * are ready to invoke, and put them into the RCU_DONE_TAIL segment.
>  	 */
>  	for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
> -		if (ULONG_CMP_LT(rgosp->rgos_norm, rsclp->gp_seq_full[i].rgos_norm))
> +		if (!poll_state_synchronize_rcu_full(&rsclp->gp_seq_full[i]))

Ok that should work but it's worth noting two subtle changes:

- The sequence number to be compared against the segment snapshot one is now
  always from the root while it often used to be the one from the leaf node.
  It shouldn't matter I think especially as now the source of the segment
  snapshot is always either rcu_seq_snap() on the state or get_state_synchronize_rcu_full()
  and thus rcu_seq_snap() on the state as well.

- But poll_state_synchronize_rcu_full() does two full memory barriers. Hmm
  probably the one after reading the root state is necessary because we may not
  be holding the root node lock. But is the first smp_mb() necessary here?
  
Thanks.  

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply

* Re: [RFC PATCH 08/10] rcu: Detect expedited grace period completion in rcu_pending()
From: Frederic Weisbecker @ 2026-05-29 14:44 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <20260417231203.785172-9-puranjay@kernel.org>

On Fri, Apr 17, 2026 at 04:11:56PM -0700, Puranjay Mohan wrote:
> rcu_pending() is the gatekeeper that decides whether rcu_core() should
> run on the current CPU's timer tick. Currently it checks if the CPU has
> callbacks ready to invoke or a grace period has completed or started.
> 
> It does not check that an expedited GP has completed. After an expedited
> GP, callbacks remain in RCU_WAIT_TAIL (not yet advanced to
> RCU_DONE_TAIL) and So rcu_core() never runs to advance them.
> 
> Add a check using rcu_segcblist_nextgp() combined with
> poll_state_synchronize_rcu_full() to detect when any pending callbacks'
> grace period has completed.
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  kernel/rcu/tree.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0e43866dc4cd..309273a37b0a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3671,6 +3671,7 @@ EXPORT_SYMBOL_GPL(cond_synchronize_rcu_full);
>  static int rcu_pending(int user)
>  {
>  	bool gp_in_progress;
> +	struct rcu_gp_oldstate gp_state;
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;
>  
> @@ -3701,6 +3702,12 @@ static int rcu_pending(int user)
>  	    rcu_segcblist_ready_cbs(&rdp->cblist))
>  		return 1;
>  
> +	/* Has a GP (normal or expedited) completed for pending callbacks? */
> +	if (!rcu_rdp_is_offloaded(rdp) &&
> +	    rcu_segcblist_nextgp(&rdp->cblist, &gp_state) &&
> +	    poll_state_synchronize_rcu_full(&gp_state))
> +		return 1;

Do we need the overhead of at least one and at worst two full memory
barriers on every ticks that have pending callbacks? I think that there can be
a racy check here, some unordered version of poll_state_synchronize_rcu_full()
perhaps, and leave the ordering duty to rcu_core().

Thanks.

> +
>  	/* Has RCU gone idle with this CPU needing another grace period? */
>  	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
>  	    !rcu_rdp_is_offloaded(rdp) &&
> -- 
> 2.52.0
> 

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply

* PATCH v7] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Steven Rostedt @ 2026-05-29 15:04 UTC (permalink / raw)
  To: LKML, Linux trace kernel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Douglas Raillard, Tom Zanussi,
	Andrew Morton, Thomas Gleixner, Ian Rogers, Jiri Olsa

From: Steven Rostedt <rostedt@goodmis.org>

Add syntax to the parsing of eprobes to be able to typecast a trace event
field that is a pointer to a structure.

Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
dereference.

But for event probes that records a field that happens to be a pointer to
a structure, it cannot dereference these values with BTF naming, but
must use numerical offsets.

For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:

 (gdb) p &((struct sk_buff *)0)->dev
 $1 = (struct net_device **) 0x10
 (gdb) p &((struct net_device *)0)->name
 $2 = (char (*)[16]) 0x118

And then use the raw numbers to dereference:

  # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events

If BTF is in the kernel, then instead, the skbaddr can be typecast to
sk_buff and use the normal dereference logic.

  # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events
  # echo 1 > events/eprobes/xmit/enable
  # cat trace
[..]
    sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..]

Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
to know what they are for.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v6: https://patch.msgid.link/20260521225033.56458336@fedora

- Set ctx->struct_btf to NULL when finished with it in handle_typecast()
  (Sashiko)

- Remove extra unneeded "ret" declaration (Masami Hiramatsu)

- Add a WARN_ON_ONCE() in parse_btf_arg for TEVENT being called without
  TYPECAST being set. (Masami Hiramatsu)

 Documentation/trace/eprobetrace.rst |   4 +
 kernel/trace/trace_probe.c          | 168 +++++++++++++++++++++++-----
 kernel/trace/trace_probe.h          |   7 +-
 3 files changed, 149 insertions(+), 30 deletions(-)

diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index 89b5157cfab8..fe3602540569 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -46,6 +46,10 @@ Synopsis of eprobe_events
 		  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
                   "string", "ustring", "symbol", "symstr" and "bitfield" are
                   supported.
+  (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+                  a pointer to STRUCT and then derference the pointer defined by
+                  ->MEMBER. Note that when this is used, the FIELD name does not
+                  need to be prefixed with a '$'.
 
 Types
 -----
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e0d3a0da26af..9246e9c3d066 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -332,6 +332,25 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
 	return -ENOENT;
 }
 
+static int parse_trace_event(char *arg, struct fetch_insn *code,
+			     struct traceprobe_parse_context *ctx)
+{
+	int ret;
+
+	if (code->data)
+		return -EFAULT;
+	ret = parse_trace_event_arg(arg, code, ctx);
+	if (!ret)
+		return 0;
+	if (strcmp(arg, "comm") == 0 || strcmp(arg, "COMM") == 0) {
+		code->op = FETCH_OP_COMM;
+		return 0;
+	}
+	/* backward compatibility */
+	ctx->offset = 0;
+	return -EINVAL;
+}
+
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 
 static u32 btf_type_int(const struct btf_type *t)
@@ -376,11 +395,17 @@ static bool btf_type_is_char_array(struct btf *btf, const struct btf_type *type)
 		&& BTF_INT_BITS(intdata) == 8;
 }
 
+static struct btf *ctx_btf(struct traceprobe_parse_context *ctx)
+{
+	return ctx->flags & TPARG_FL_TYPECAST ?
+		ctx->struct_btf : ctx->btf;
+}
+
 static int check_prepare_btf_string_fetch(char *typename,
 				struct fetch_insn **pcode,
 				struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = ctx->btf;
+	struct btf *btf = ctx_btf(ctx);
 
 	if (!btf || !ctx->last_type)
 		return 0;
@@ -554,22 +579,29 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 	struct fetch_insn *code = *pcode;
 	const struct btf_member *field;
 	u32 bitoffs, anon_offs;
+	bool is_struct = ctx->flags & TPARG_FL_TYPECAST;
+	struct btf *btf = ctx_btf(ctx);
 	char *next;
 	int is_ptr;
 	s32 tid;
 
 	do {
-		/* Outer loop for solving arrow operator ('->') */
-		if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
-			trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
-			return -EINVAL;
-		}
-		/* Convert a struct pointer type to a struct type */
-		type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
-		if (!type) {
-			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
-			return -EINVAL;
+		if (!is_struct) {
+			/* Outer loop for solving arrow operator ('->') */
+			if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
+				trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
+				return -EINVAL;
+			}
+
+			/* Convert a struct pointer type to a struct type */
+			type = btf_type_skip_modifiers(btf, type->type, &tid);
+			if (!type) {
+				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+				return -EINVAL;
+			}
 		}
+		/* Only the first type can skip being a pointer */
+		is_struct = false;
 
 		bitoffs = 0;
 		do {
@@ -580,7 +612,7 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 				return is_ptr;
 
 			anon_offs = 0;
-			field = btf_find_struct_member(ctx->btf, type, fieldname,
+			field = btf_find_struct_member(btf, type, fieldname,
 						       &anon_offs);
 			if (IS_ERR(field)) {
 				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
@@ -602,7 +634,7 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 				ctx->last_bitsize = 0;
 			}
 
-			type = btf_type_skip_modifiers(ctx->btf, field->type, &tid);
+			type = btf_type_skip_modifiers(btf, field->type, &tid);
 			if (!type) {
 				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
 				return -EINVAL;
@@ -627,6 +659,7 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 	return 0;
 }
 
+
 static int __store_entry_arg(struct trace_probe *tp, int argnum);
 
 static int parse_btf_arg(char *varname,
@@ -640,7 +673,7 @@ static int parse_btf_arg(char *varname,
 	int i, is_ptr, ret;
 	u32 tid;
 
-	if (WARN_ON_ONCE(!ctx->funcname))
+	if (WARN_ON_ONCE(!ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT)))
 		return -EINVAL;
 
 	is_ptr = split_next_field(varname, &field, ctx);
@@ -653,6 +686,16 @@ static int parse_btf_arg(char *varname,
 		return -EOPNOTSUPP;
 	}
 
+	if (ctx->flags & TPARG_FL_TEVENT) {
+		ret = parse_trace_event(varname, code, ctx);
+		if (ret < 0)
+			return ret;
+		if (WARN_ON_ONCE(!(ctx->flags & TPARG_FL_TYPECAST)))
+			return -EINVAL;
+		type = ctx->last_struct;
+		goto found_type;
+	}
+
 	if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
 		code->op = FETCH_OP_RETVAL;
 		/* Check whether the function return type is not void */
@@ -709,6 +752,7 @@ static int parse_btf_arg(char *varname,
 
 found:
 	type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
+found_type:
 	if (!type) {
 		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
 		return -EINVAL;
@@ -727,7 +771,7 @@ static int parse_btf_arg(char *varname,
 static const struct fetch_type *find_fetch_type_from_btf_type(
 					struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = ctx->btf;
+	struct btf *btf = ctx_btf(ctx);
 	const char *typestr = NULL;
 
 	if (btf && ctx->last_type)
@@ -758,7 +802,71 @@ static int parse_btf_bitfield(struct fetch_insn **pcode,
 	return 0;
 }
 
-#else
+static int query_btf_struct(const char *sname, struct traceprobe_parse_context *ctx)
+{
+	int id;
+
+	if (!ctx->struct_btf) {
+		struct btf *btf;
+
+		id = bpf_find_btf_id(sname, BTF_KIND_STRUCT, &btf);
+		if (id < 0)
+			return id;
+		ctx->struct_btf = btf;
+	} else {
+		id = btf_find_by_name_kind(ctx->struct_btf, sname, BTF_KIND_STRUCT);
+		if (id < 0)
+			return id;
+	}
+
+	ctx->last_struct = btf_type_by_id(ctx->struct_btf, id);
+	return 0;
+}
+
+static int handle_typecast(char *arg, struct fetch_insn **pcode,
+			   struct fetch_insn *end,
+			   struct traceprobe_parse_context *ctx)
+{
+	char *tmp;
+	int ret;
+
+	/* Currently this only works for eprobes */
+	if (!(ctx->flags & TPARG_FL_TEVENT)) {
+		trace_probe_log_err(ctx->offset, TYPECAST_NOT_EVENT);
+		return -EINVAL;
+	}
+
+	tmp = strchr(arg, ')');
+	if (!tmp) {
+		trace_probe_log_err(ctx->offset + strlen(arg),
+				    DEREF_OPEN_BRACE);
+		return -EINVAL;
+	}
+	*tmp = '\0';
+	ret = query_btf_struct(arg + 1, ctx);
+	*tmp = ')';
+
+	if (ret < 0) {
+		trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
+		ret = -EINVAL;
+		goto out_put;
+	}
+
+	ctx->flags |= TPARG_FL_TYPECAST;
+	tmp++;
+
+	ctx->offset += tmp - arg;
+	ret = parse_btf_arg(tmp, pcode, end, ctx);
+	ctx->flags &= ~TPARG_FL_TYPECAST;
+	ctx->last_struct = NULL;
+out_put:
+	btf_put(ctx->struct_btf);
+	ctx->struct_btf = NULL;
+	return ret;
+}
+
+#else /* !CONFIG_PROBE_EVENTS_BTF_ARGS */
+
 static void clear_btf_context(struct traceprobe_parse_context *ctx)
 {
 	ctx->btf = NULL;
@@ -794,7 +902,15 @@ static int check_prepare_btf_string_fetch(char *typename,
 	return 0;
 }
 
-#endif
+static int handle_typecast(char *arg, struct fetch_insn **pcode,
+			   struct fetch_insn *end,
+			   struct traceprobe_parse_context *ctx)
+{
+	trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_PROBE_EVENTS_BTF_ARGS */
 
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
 
@@ -953,18 +1069,9 @@ static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
 	int len;
 
 	if (ctx->flags & TPARG_FL_TEVENT) {
-		if (code->data)
-			return -EFAULT;
-		ret = parse_trace_event_arg(arg, code, ctx);
-		if (!ret)
-			return 0;
-		if (strcmp(arg, "comm") == 0 || strcmp(arg, "COMM") == 0) {
-			code->op = FETCH_OP_COMM;
-			return 0;
-		}
-		/* backward compatibility */
-		ctx->offset = 0;
-		goto inval;
+		if (parse_trace_event(arg, code, ctx) < 0)
+			goto inval;
+		return 0;
 	}
 
 	if (str_has_prefix(arg, "retval")) {
@@ -1231,6 +1338,9 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 				code->op = FETCH_OP_IMM;
 		}
 		break;
+	case '(':
+		ret = handle_typecast(arg, pcode, end, ctx);
+		break;
 	default:
 		if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
 			if (!tparg_is_function_entry(ctx->flags) &&
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 262d8707a3df..952e3d7582b8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -394,6 +394,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
  * TPARG_FL_KERNEL and TPARG_FL_USER are also mutually exclusive.
  * TPARG_FL_FPROBE and TPARG_FL_TPOINT are optional but it should be with
  * TPARG_FL_KERNEL.
+ * TPARG_FL_TYPECAST is set if an argument was typecast to a structure.
  */
 #define TPARG_FL_RETURN BIT(0)
 #define TPARG_FL_KERNEL BIT(1)
@@ -402,6 +403,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
 #define TPARG_FL_USER   BIT(4)
 #define TPARG_FL_FPROBE BIT(5)
 #define TPARG_FL_TPOINT BIT(6)
+#define TPARG_FL_TYPECAST BIT(7)
 #define TPARG_FL_LOC_MASK	GENMASK(4, 0)
 
 static inline bool tparg_is_function_entry(unsigned int flags)
@@ -422,7 +424,9 @@ struct traceprobe_parse_context {
 	const struct btf_param *params;	/* Parameter of the function */
 	s32 nr_params;			/* The number of the parameters */
 	struct btf *btf;		/* The BTF to be used */
+	struct btf *struct_btf;		/* The BTF to be used for structs */
 	const struct btf_type *last_type;	/* Saved type */
+	const struct btf_type *last_struct;	/* Saved structure */
 	u32 last_bitoffs;		/* Saved bitoffs */
 	u32 last_bitsize;		/* Saved bitsize */
 	struct trace_probe *tp;
@@ -563,7 +567,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(NEED_STRING_TYPE,	"$comm and immediate-string only accepts string type"),\
 	C(TOO_MANY_ARGS,	"Too many arguments are specified"),	\
 	C(TOO_MANY_EARGS,	"Too many entry arguments specified"),	\
-	C(EVENT_TOO_BIG,	"Event too big (too many fields?)"),
+	C(EVENT_TOO_BIG,	"Event too big (too many fields?)"),  \
+	C(TYPECAST_NOT_EVENT,	"Typecasts are only for eprobe fields"),
 
 #undef C
 #define C(a, b)		TP_ERR_##a
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] tracing: Fix field_var_str allocation errno
From: Steven Rostedt @ 2026-05-29 15:25 UTC (permalink / raw)
  To: Yu Peng
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Tom Zanussi,
	linux-trace-kernel, linux-kernel, Rosen Penev
In-Reply-To: <20260526095022.1330107-1-pengyu@kylinos.cn>

On Tue, 26 May 2026 17:50:22 +0800
Yu Peng <pengyu@kylinos.cn> wrote:

> hist_trigger_elt_data_alloc() returns -EINVAL when the field_var_str
> kcalloc() fails. Return -ENOMEM instead, matching the other allocation
> failures in the function.
> 
> Fixes: c910db943d35 ("tracing: Dynamically allocate the per-elt hist_elt_data array")
> Signed-off-by: Yu Peng <pengyu@kylinos.cn>
> ---
>  kernel/trace/trace_events_hist.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index eb2c2bc8bc3d..17fe13e12a4f 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1680,7 +1680,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
>  	elt_data->field_var_str = kcalloc(n_str, sizeof(char *), GFP_KERNEL);
>  	if (!elt_data->field_var_str) {
>  		hist_elt_data_free(elt_data);
> -		return -EINVAL;
> +		return -ENOMEM;
>  	}
>  	elt_data->n_field_var_str = n_str;
>  

Thanks but this code is made obsolete by this patch that I'm pulling in:

  https://lore.kernel.org/all/20260522214407.18120-1-rosenp@gmail.com/

-- Steve

^ permalink raw reply

* Re: [PATCH] tracing: fix CFI violation in probestub helper
From: Peter Zijlstra @ 2026-05-29 20:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eva Kurchatova, mhiramat, linux-trace-kernel, linux-kernel,
	mathieu.desnoyers, jpoimboe, samitolvanen
In-Reply-To: <20260528164902.1bb985f3@gandalf.local.home>

On Thu, May 28, 2026 at 04:49:02PM -0400, Steven Rostedt wrote:
> On Sun, 24 May 2026 18:43:01 +0300
> Eva Kurchatova <eva.kurchatova@virtuozzo.com> wrote:
> 
> > When multiple callbacks are registered on the same tracepoint, probestub
> > will be indirectly called via traceiter helper.
> > 
> > Pointer to probestub callback resides in __tracepoints section, which is
> > excluded from ENDBR checks in objtool. Pointers to regfunc/unregfunc
> > callbacks reside in extended structure however, which is not affected.
> > 
> > Registering multiple callbacks will result in a #CP exception due to
> > missed ENDBR in __probestub helper on a CFI-enabled machine.
> > 
> > Fix this by adding CFI_NOSEAL annotation to probestub declaration.
> > 
> > Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
> > Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
> 
> Wait! The probestub is not in the __tracepoints section. At least it
> shouldn't be. Are you sure there's not another issue here?
> 
> #define __DEFINE_TRACE_EXT(_name, _ext, proto, args)			\
> 	static const char __tpstrtab_##_name[]				\
> 	__section("__tracepoints_strings") = #_name;			\
> 	extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name);	\
> 	int __traceiter_##_name(void *__data, proto);			\
> 	void __probestub_##_name(void *__data, proto);			\
> 	struct tracepoint __tracepoint_##_name	__used			\
> 	__section("__tracepoints") = {					\
> 
>  Here the structure __tracepoint_##name is in the __tracepoints section.
> 
> 		.name = __tpstrtab_##_name,				\
> 		.key = STATIC_KEY_FALSE_INIT,				\
> 		.static_call_key = &STATIC_CALL_KEY(tp_func_##_name),	\
> 		.static_call_tramp = STATIC_CALL_TRAMP_ADDR(tp_func_##_name), \
> 		.iterator = &__traceiter_##_name,			\
> 		.probestub = &__probestub_##_name,			\

                    ^^^^^^^^^^ this

> 		.funcs = NULL,						\
> 		.ext = _ext,						\
> 	};								\
> 	__TRACEPOINT_ENTRY(_name);					\
> 	int __traceiter_##_name(void *__data, proto)			\
> 	{								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *it_func;						\
> 									\
> 		it_func_ptr =						\
> 			rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
> 		if (it_func_ptr) {					\
> 			do {						\
> 				it_func = READ_ONCE((it_func_ptr)->func); \
> 				__data = (it_func_ptr)->data;		\
> 				((void(*)(void *, proto))(it_func))(__data, args); \
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> 		return 0;						\
> 	}								\
> 	void __probestub_##_name(void *__data, proto)			\
> 	{								\
> 	}
> 
> But above, probestub is just a function defined wherever the tracepoint is
> created.
> 
> In fact, it's just there for fprobes to work. It doesn't get called if you
> add more than one callback to the tracepoint. So your explanation is totally
> bogus.

The only place the function address lives is in that __tracepoint
section. Since that is explicitly excluded by objtool, it figures there
are no actual references to __probestub and the function goes on the
seal list and the kernel explicitly scribbles the ENDBR on boot.

Then, if it ever gets used on an IBT enabled host, *boom*.

I agree it would've perhaps been clearer if there was part of a splat in
the changelog, but the issue is real afaict.

Also, I do think this:

> > @@ -356,6 +357,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> >  	void __probestub_##_name(void *__data, proto)			\
> >  	{								\
> >  	}								\
> > +	CFI_NOSEAL(__probestub_##_name);				\
> >  	DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
> >  
> >  #define DEFINE_TRACE_FN(_name, _reg, _unreg, _proto, _args)		\

could do with a comment, explaining why it wants the NOSEAL.

^ permalink raw reply

* Re: [GIT PULL] RTLA changes for 7.2
From: Tomas Glozar @ 2026-05-29 20:50 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Costa Shulyupin, Crystal Wood, LKML, linux-trace-kernel
In-Reply-To: <20260529095646.73c95f60@fedora>

pá 29. 5. 2026 v 15:57 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal:
>
> On Fri, 29 May 2026 15:06:43 +0200
> Tomas Glozar <tglozar@redhat.com> wrote:
>
> > - Fix discrepancy in --dump-tasks option
> >
> > Due to a mistake, rtla-timerlat-hist used the CLI syntax "--dump-task"
> > instead of the documented "--dump-tasks". Change the option to match
> > both documentation and the other timerlat tool, rtla-timerlat-top.
>
> Is there any concern that scripts might be using the old option?

Good point, I'm not aware of any, but it is possible.

>
> I wonder if you should keep the old option for backward compatibility,
> but do not document that it exists.
>

I did not originally plan on keeping the old option - and the commit
does not keep it - but since libsubcmd allows match on prefixes in
general, it actually works in the tag:

$ ./rtla timerlat hist --help --dump-tasks

Usage: rtla timerlat hist [<options>] [-h|--help]
       --dump-tasks      prints the task running on all CPUs if stop
conditions are met (depends on !--no-aa)

$ sudo ./rtla timerlat hist -c 0 -T 1 --dump-task
# RTLA timerlat histogram
# Time unit is microseconds (us)
# Duration:   0 00:00:01
...
Printing CPU tasks:
   [000]              timerlatu/0:66862
   [001]                         :0
   [002]                         :0
   [003]                         :0
   [004]                         :0
   [005]                         :0
   [006]                   auditd:1000
   [007]                         :0
   [008]                         :0
   [009]                         :0
   [010]                         :0
   [011]                         :0
   [012]                         :0
   [013]                         :0

The only build where it does not work would be in the middle of the
pull request, with the --dump-tasks fix but without the libsubcmd
migration patchset.

Anyway, thanks for the feedback.

Tomas


^ permalink raw reply

* Re: [PATCH] tracing: fix CFI violation in probestub helper
From: Steven Rostedt @ 2026-05-29 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eva Kurchatova, mhiramat, linux-trace-kernel, linux-kernel,
	mathieu.desnoyers, jpoimboe, samitolvanen
In-Reply-To: <20260529200826.GO3493090@noisy.programming.kicks-ass.net>

On Fri, 29 May 2026 22:08:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 28, 2026 at 04:49:02PM -0400, Steven Rostedt wrote:
> > On Sun, 24 May 2026 18:43:01 +0300
> > Eva Kurchatova <eva.kurchatova@virtuozzo.com> wrote:
> >   
> > > When multiple callbacks are registered on the same tracepoint, probestub
> > > will be indirectly called via traceiter helper.
> > > 
> > > Pointer to probestub callback resides in __tracepoints section, which is
> > > excluded from ENDBR checks in objtool. Pointers to regfunc/unregfunc
> > > callbacks reside in extended structure however, which is not affected.
> > > 
> > > Registering multiple callbacks will result in a #CP exception due to
> > > missed ENDBR in __probestub helper on a CFI-enabled machine.
> > > 
> > > Fix this by adding CFI_NOSEAL annotation to probestub declaration.
> > > 
> > > Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
> > > Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>  

> 
> The only place the function address lives is in that __tracepoint
> section. Since that is explicitly excluded by objtool, it figures there
> are no actual references to __probestub and the function goes on the
> seal list and the kernel explicitly scribbles the ENDBR on boot.
> 
> Then, if it ever gets used on an IBT enabled host, *boom*.

That makes much more sense.

> 
> I agree it would've perhaps been clearer if there was part of a splat in
> the changelog, but the issue is real afaict.
> 
> Also, I do think this:
> 
> > > @@ -356,6 +357,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > >  	void __probestub_##_name(void *__data, proto)			\
> > >  	{								\
> > >  	}								\
> > > +	CFI_NOSEAL(__probestub_##_name);				\
> > >  	DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
> > >  
> > >  #define DEFINE_TRACE_FN(_name, _reg, _unreg, _proto, _args)		\  
> 
> could do with a comment, explaining why it wants the NOSEAL.

Yes.

Thus, the above change log is totally incorrect and should be updated to:

   tprobes uses a stub function of the tracepoint to allow fprobes to
   attach to the tracepoint call site and have access to its arguments.
   The stub function is called __probestub_##_name() and is only
   referenced as a pointer in the tracepoint structure so that the
   tprobe can have access to it.

   The issue is that the probstub function is only referenced in the
   __tracepoint section and objtool thinks nothing calls it. Since it
   explicitly excludes the __tracepoint section, objtool thinks there
   are no callers so it puts the probstub function into the seal list
   and then the kernel scrubs its ENDBR on boot.

   This becomes an issue if someone were to use a tprobe which will
   register the probestub as a callback to the tracepoint so that a
   fprobe may attach to it and get access to the arguments. Without the
   ENDBR it will make the kernel go BOOM!


Then have a comment in the patch with:

 void __probestub_##_name(void *__data, proto)			\
 {								\
 }								\
+/*								\
+ * The probestub is only used for tprobes and not referenced	\
+ * anywhere else. This causes objtool to think it's not called	\
+ * at all and will add it to the seal list which will remove	\
+ * the ENDBR causing issues if a tprobe is ever used.		\
+ */								\
+CFI_NOSEAL(__probestub_##_name);				\
 DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);


-- Steve

^ permalink raw reply

* Re: [GIT PULL] RTLA changes for 7.2
From: Steven Rostedt @ 2026-05-30  0:02 UTC (permalink / raw)
  To: Tomas Glozar; +Cc: Costa Shulyupin, Crystal Wood, LKML, linux-trace-kernel
In-Reply-To: <CAP4=nvSi2q=h308Osmrph4YgOkMJE_QmsfSAQcyvi8sxKY2kug@mail.gmail.com>

On Fri, 29 May 2026 22:50:31 +0200
Tomas Glozar <tglozar@redhat.com> wrote:
> 
> The only build where it does not work would be in the middle of the
> pull request, with the --dump-tasks fix but without the libsubcmd
> migration patchset.

Great, then we don't need to worry about it.

-- Steve

^ permalink raw reply

* Re: [GIT PULL] RTLA changes for 7.2
From: Crystal Wood @ 2026-05-30  1:32 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar; +Cc: Costa Shulyupin, LKML, linux-trace-kernel
In-Reply-To: <20260529200229.2c63539a@fedora>

On Fri, 2026-05-29 at 20:02 -0400, Steven Rostedt wrote:
> On Fri, 29 May 2026 22:50:31 +0200
> Tomas Glozar <tglozar@redhat.com> wrote:
> > 
> > The only build where it does not work would be in the middle of the
> > pull request, with the --dump-tasks fix but without the libsubcmd
> > migration patchset.
> 
> Great, then we don't need to worry about it.

...until someone ends up depending on a script that uses some random
prefix that becomes ambiguous in a future update :-P

-Crystal


^ permalink raw reply

* Re: [PATCH v2] tracing/osnoise: Array printk init and cleanup
From: Crystal Wood @ 2026-05-30  3:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, John Kacur, Tomas Glozar, Costa Shulyupin,
	Wander Lairson Costa, sashiko-bot, sashiko-reviews
In-Reply-To: <0c6606c49ab912904f00663f113289f546084925.camel@redhat.com>

On Thu, 2026-05-21 at 18:48 -0500, Crystal Wood wrote:
> On Wed, 2026-05-20 at 16:28 -0400, Steven Rostedt wrote:
> > [ Replying to Sashiko: https://sashiko.dev/?list=org.kernel.vger.linux-trace-kernel#/patchset/20260511223035.1475676-1-crwood%40redhat.com ]
> > 
> > > > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > > > index 75678053b21c5..2be188768ab42 100644
> > > > --- a/kernel/trace/trace_osnoise.c
> > > > +++ b/kernel/trace/trace_osnoise.c
> > > > @@ -83,6 +83,22 @@ struct osnoise_instance {
> > > >  
> > > >  static struct list_head osnoise_instances;
> > > >  
> > > > +static void osnoise_print(const char *fmt, ...)
> > > > +{
> > > > +	struct osnoise_instance *inst;
> > > > +	struct trace_array *tr;
> > > > +	va_list ap;
> > > > +
> > > > +	rcu_read_lock();
> > > > +	list_for_each_entry_rcu(inst, &osnoise_instances, list) {
> > > > +		tr = inst->tr;
> > > > +		va_start(ap, fmt);
> > > > +		trace_array_vprintk(tr, _RET_IP_, fmt, ap);
> > 
> > > Does this code create a use-after-free on the trace array if an instance is
> > > removed concurrently?
> 
> If so, it was already an issue with osnoise_taint(),
> osnoise_stop_tracing(), osnoise_stop_exception(), etc.  Wouldn't be
> surprising, as this file has a number of other synchronization issues as
> well.

It looks like it's actually also an issue in a bunch more places such as
record_osnoise_sample(), timerlat_dump_stack(), etc.

> > > When a user deletes a trace instance via rmdir, the unregister function
> > > removes the instance from the list using list_del_rcu(). However, the removal
> > > routine does not appear to wait for an RCU grace period before freeing the
> > > trace array itself.
> > > Could a concurrent execution of this loop inside the rcu_read_lock() section
> > > still access the unlinked instance, read the freed inst->tr, and pass it to
> > > trace_array_vprintk()? This appears to be an existing issue, but it still
> > > affects the loop here.
> > 
> > Hmm, this is interesting. osnoise keeps track of its own instances via a
> > osnoise_instances list. But it only use kfree_rcu() to free the list
> > descriptor but doesn't take care of the tr being freed before hand!
> > 
> > Something like this could work [not even compiled]
> > 
> > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > index 75678053b21c..bda1e0e0d2e1 100644
> > --- a/kernel/trace/trace_osnoise.c
> > +++ b/kernel/trace/trace_osnoise.c
> > @@ -476,8 +476,11 @@ static void print_osnoise_headers(struct seq_file *s)
> >  										\
> >  	rcu_read_lock();							\
> >  	list_for_each_entry_rcu(inst, &osnoise_instances, list) {		\
> > +		if (trace_array_get(inst->tr) < 0)				\
> > +			continue;						\
> >  		buffer = inst->tr->array_buffer.buffer;				\
> >  		trace_array_printk_buf(buffer, _THIS_IP_, msg);			\
> > +		trace_array_put(inst->tr);					\
> >  	}									\
> >  	rcu_read_unlock();							\
> >  	osnoise_data.tainted = true;						\
> 
> OK, I'll prepare a v3.

Many osnoise_taint() callers, as well as timerlat_dump_stack(), can have
preemption disabled, so the mutex in trace_array_get() won't work.

What is the intended way for a tracer to record to all of its
instances?  I tried looking at other tracers that allow instances, but
it seems that most of them only allow one instance, apart from
trace_function/trace_function_graph that are driven by a callback
mechanism that doesn't fit here, and that made my brain hurt when I
dove into the code to try to figure out how it ensures a valid tr.

We could have osnoise_unregister_instance() set inst->tr = NULL
under a raw lock, and then require users to hold the raw lock when
manipulating inst->tr (which can't go away until ->stop() completes),
skipping any that are NULL.

It's not great that we lose the ability to do things with tr that are
incompatible with a raw lock, but I don't know how to fix that without
something like changing the tr refcount mechanism to allow an atomic
refcount increase on a known-valid-for-now tr.  It looks like all the
current users of this list are OK with a raw lock.

We could go even further and simplify by un-RCUing the list, requiring
that the raw lock be held over traversal -- I'm not thrilled at the
idea of letting userspace create unbounded instances that are traversed
with preemption disabled, but as noted, we already do that in some places.

In any case, this patch is just moving the code around, not introducing
the problem, so I hope that whatever synchronization overhaul this file
requires (which goes well beyond this one issue) can wait for followup
patches.

-Crystal

^ permalink raw reply

* Re: [RFC PATCH] trace: Introduce a new filter_pred "caller"
From: chenjun (AM) @ 2026-05-30 10:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), mathieu.desnoyers@efficios.com,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
In-Reply-To: <20260529103811.37b0e357@fedora>

在 2026/5/29 22:38, Steven Rostedt 写道:
> 
> Hi Chen,
> 
> Do you plan on sending updates to address the comments that Masami and
> I have made?
> 
> -- Steve

Hi，

Sorry, I've been busy with other things lately. I'll release the patch 
v2 next week.

One thing I'd like to confirm is whether to use `called_within` as the 
filter name.

Thanks
-- Chen Jun

> 
> 
> On Thu, 14 May 2026 13:19:01 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
>> On Wed, 13 May 2026 12:40:17 -0400
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>>> On Tue, 12 May 2026 08:47:50 +0900
>>> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>>>    
>>>> On Fri, 8 May 2026 20:26:23 +0800
>>>> Chen Jun <chenjun102@huawei.com> wrote:
>>>>    
>>>>> Low-level functions have many call paths, and sometimes
>>>>> we only care about the calls on a specific call path.
>>>>> Add a new filter to filter based on the call stack.
>>>>>
>>>>> Usage:
>>>>> 1. echo 'caller=="$function_name"' > events/../filter
>>>>
>>>> Thanks for interesting idea :)
>>>>
>>>> BTW, we already have "stacktrace". Since this actually checks
>>>> stacktrace, not caller, so I think we should reuse it.
>>>> Also, I think OP_GLOB is more suitable for this case.
>>>> (and more useful)
>>>
>>> Actually, it's not a stack trace, it's a function that is called from other
>>> functions. But since "caller" sounds like a direct called function (stack
>>> trace of the first instance), I think perhaps it should be "called_within" or
>>> something similar. :-/
>>
>> Yeah, what about "callers"?
>>
>>>
>>> Also, OP_GLOB can't work because it only works for a single function. At
>>> the time of parsing, it finds the function (and should probably error out
>>> if there's more than one function with a given name). It then records the
>>> start and end address of the function so it only needs to find if one of
>>> the entries in the stack trace is between the start and end of the function.
>>
>> Ah, OK. It is just comparing address, not name.
>>
>>>
>>> I don't think this is possible with GLOB. We don't want to do a search of
>>> the functions when the event is triggered.
>>
>> Agreed.
>>
>> Thanks,
>>
>>>
>>> -- Steve
>>
>>
> 
> 
> 


^ permalink raw reply

* Re: [PATCH] tracing/probes: Point the error offset correctly for eprobe argument error
From: Masami Hiramatsu @ 2026-05-30 13:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Shuah Khan, Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
	linux-kselftest
In-Reply-To: <20260528222934.7d0b1881@fedora>

On Thu, 28 May 2026 22:29:34 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 25 May 2026 11:21:14 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Fix to point the error offset correctly for eprobe argument error.
> > In the cleanup commit 1b8b0cd754cd ("tracing/probes: Move event parameter
> > fetching code to common parser"), due to incorrect backward compatibility
> > aimed at conforming to the test specifications, the error location was set
> > to 0 when a non-existent formal parameter was specified for Eprobe.
> > However, this should be corrected in both the test and the implementation
> > to point correct error position.
> > 
> > Fixes: 1b8b0cd754cd ("tracing/probes: Move event parameter fetching code to common parser")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> 

Thanks! Let me pick this to probes/fixes branch.


> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox