Linux Trace Kernel


* Re: [PATCHv3 11/12] selftests/bpf: Add tests for uprobe nop10 red zone clobbering
From: Jiri Olsa @ 2026-05-25 15:44 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
	ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <7773c27d0070aa7544a5a7596aceb8abacb4fbdb20a3aab998f05c39de4f20dc@mail.kernel.org>

On Thu, May 21, 2026 at 01:26:07PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > index 969f4deba9fd..efff0c515184 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > index fda3a298ccfc..8004c9568ffa 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
> > index f00cb52874e0..0ee78fb050a1 100644
> > --- a/tools/testing/selftests/bpf/progs/test_usdt.c
> > +++ b/tools/testing/selftests/bpf/progs/test_usdt.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> > index b359b389f6c0..5e38f8605b02 100644
> > --- a/tools/testing/selftests/bpf/usdt_2.c
> > +++ b/tools/testing/selftests/bpf/usdt_2.c
> > @@ -13,4 +13,17 @@ void usdt_2(void)
> >  	USDT(optimized_attach, usdt_2);
> >  }
> >
> > +static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
> > +static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
> > +static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
> > +
> > +void __attribute__((noinline)) usdt_red_zone_trigger(void)
> > +{
> > +	unsigned long a1 = usdt_red_zone_arg1;
> > +	unsigned long a2 = usdt_red_zone_arg2;
> > +	unsigned long a3 = usdt_red_zone_arg3;
> > +
> > +	USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
> > +}
> 
> Following up on a review comment from sashiko-bot: is this test reliable
> across different compiler versions?

Andrii replied to this one in the original thread:
  https://lore.kernel.org/bpf/CAEf4BzZ-nASpoF+-vLYZ1D8eWRVd5JWAuhrwx8XZNrX7ZPY5_Q@mail.gmail.com/

jirka

> 
> The x86-64 USDT macro uses the "nor" inline assembly constraint, which allows
> the compiler to allocate these operands to CPU registers. Because there is no
> register pressure in usdt_red_zone_trigger(), modern compilers might allocate
> the local variables a1, a2, and a3 directly to registers rather than the
> stack's red zone.
> 
> If the operands are placed in registers, the uprobe CALL optimization
> clobbering [rsp-8] will not corrupt them. This could cause the test to pass
> even on buggy kernels, creating a false positive.
> 
> The author acknowledged this concern but kept the test as-is, noting that
> selftests compile with -O0 and there's also an assembly-based test.
> 
> Would it be safer to explicitly force these operands into the red zone using
> inline assembly constraints, rather than depending on the compiler's register
> allocator?
> 
> Reference:
> https://lore.kernel.org/bpf/20260509021238.AE98FC2BCB0@smtp.kernel.org/
> 
> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26227520868



* Re: [PATCHv3 08/12] selftests/bpf: Change uprobe syscall tests to use nop10
From: Jiri Olsa @ 2026-05-25 15:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <CAEf4BzZn0NvNwuPdh10z+MAdw-9K6CgZ0XnAL0J4XRpQHhFv2w@mail.gmail.com>

On Fri, May 22, 2026 at 11:57:04AM -0700, Andrii Nakryiko wrote:
> On Thu, May 21, 2026 at 5:45 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Optimized uprobes are now on top of 10-byte nop instructions;
> > reflect that in the existing tests.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  .../selftests/bpf/benchs/bench_trigger.c      |  2 +-
> >  .../selftests/bpf/prog_tests/uprobe_syscall.c | 28 ++++++++++---------
> >  tools/testing/selftests/bpf/prog_tests/usdt.c | 25 ++++++++++-------
> >  tools/testing/selftests/bpf/usdt_2.c          |  2 +-
> >  4 files changed, 32 insertions(+), 25 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
> > index 2f22ec61667b..a60b8173cdc4 100644
> > --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
> > +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
> > @@ -398,7 +398,7 @@ static void *uprobe_producer_ret(void *input)
> >  #ifdef __x86_64__
> >  __nocf_check __weak void uprobe_target_nop5(void)
> 
> heh, nop5 -> nop_a_lot ;)
> 
> 
> >  {
> > -       asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
> > +       asm volatile (".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
> >  }
> >
> 
> [...]
> 
> > @@ -420,7 +421,8 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
> >         ASSERT_EQ(skel->bss->executed, executed, "executed");
> >
> >         /* .. and check the trampoline is as expected. */
> > -       call = (struct __arch_relative_insn *) addr;
> > +       ASSERT_OK(memcmp(addr, lea_rsp, 5), "lea_rsp");
> > +       call = (struct __arch_relative_insn *)(addr + 5);
> >         tramp = (void *) (call + 1) + call->raddr;
> >         ASSERT_EQ(call->op, 0xe8, "call");
> >         ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
> > @@ -432,7 +434,7 @@ static void check_detach(void *addr, void *tramp)
> >  {
> >         /* [uprobes_trampoline] stays after detach */
> >         ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
> > -       ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
> > +       ASSERT_OK(memcmp(addr, jmp2B, 2), "jmp2B");
> 
> yeah, not jump anymore?

I mixed up the selftest change. I had it fixed already;
the new version will have both fixes ;-)

thanks,
jirka

> 
> >  }
> >
> >  static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
> 
> [...]
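
The check_attach() hunk above skips a 5-byte prefix and then expects a
relative call (opcode 0xe8) whose target is the trampoline. A minimal
userspace sketch of that address arithmetic follows. decode_call_target()
and the length macros are hypothetical names, and the contents of the
5-byte prefix are not shown in this message, so the sketch treats them
as opaque.

```c
#include <stdint.h>

#define PREFIX_LEN	5	/* bytes compared against lea_rsp in the test */
#define CALL_LEN	5	/* 0xe8 opcode + 32-bit relative offset */

/*
 * Return the absolute target of the call that follows the prefix,
 * or 0 if the byte after the prefix is not a relative call.
 * insn points at a copy of the bytes, insn_addr is their address.
 */
static uintptr_t decode_call_target(const uint8_t *insn, uintptr_t insn_addr)
{
	const uint8_t *call = insn + PREFIX_LEN;
	uint32_t raw;

	if (call[0] != 0xe8)
		return 0;

	/* The offset is stored little-endian and is signed. */
	raw = (uint32_t)call[1] |
	      (uint32_t)call[2] << 8 |
	      (uint32_t)call[3] << 16 |
	      (uint32_t)call[4] << 24;

	/* The offset is relative to the end of the call instruction. */
	return insn_addr + PREFIX_LEN + CALL_LEN +
	       (uintptr_t)(intptr_t)(int32_t)raw;
}
```

This mirrors "tramp = (void *) (call + 1) + call->raddr" from the test,
with the offset decoded byte by byte instead of through a packed struct.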


* [PATCH] tracing: Disable KCOV instrumentation for trace_irqsoff.o
From: Karl Mehltretter @ 2026-05-25 17:04 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Dmitry Vyukov, Andrey Konovalov, Marco Elver,
	kasan-dev, linux-trace-kernel, linux-kernel, Karl Mehltretter

When KCOV runs its boot selftest with whole-kernel instrumentation
enabled, it sets current->kcov_mode to KCOV_MODE_TRACE_PC without
installing a coverage area. Any instrumented code accepted as task-context
coverage in that window dereferences current->kcov_area and crashes.

On ARMv5 Versatile PB with CONFIG_KCOV_SELFTEST=y,
CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y, boot hits a
NULL pointer fault during the selftest:

  kcov: running self test
  Internal error: Oops: 5 [#1] ARM
  PC is at __sanitizer_cov_trace_pc+0x4c/0x90
  Kernel panic - not syncing: Fatal exception

A diagnostic run showed the unwanted coverage comes from the IRQs-off
tracer callbacks reached from ARM IRQ entry before hardirq context is
visible to KCOV:

  __sanitizer_cov_trace_pc from tracer_hardirqs_off+0x18/0x1cc
  tracer_hardirqs_off from trace_hardirqs_off+0x34/0x54
  trace_hardirqs_off from __irq_svc+0x58/0xb0
  __irq_svc from kcov_init+0x7c/0xdc

and similarly through tracer_hardirqs_on().

trace_preemptirq.o is already excluded because this tracing path can run
from early interrupt code and produce coverage unrelated to syscall
inputs. Exclude trace_irqsoff.o as well, instead of requiring users to
turn off CONFIG_KCOV_INSTRUMENT_ALL=y, which is the default whole-kernel
KCOV mode.

With the exclusion in place, the same ARMv5 Versatile PB QEMU test boots
through the KCOV selftest and reaches userspace.

Tested on ARMv5 Versatile PB QEMU with CONFIG_KCOV_SELFTEST=y,
CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y.

Assisted-by: Codex:gpt-5
Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com>
---
 kernel/trace/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 8d3d96e847d8..f934ff586bd4 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -48,9 +48,10 @@ ifdef CONFIG_GCOV_PROFILE_FTRACE
 GCOV_PROFILE := y
 endif

-# Functions in this file could be invoked from early interrupt
-# code and produce random code coverage.
+# Functions in these files can run from IRQ entry before hardirq context
+# is visible to KCOV, and produce coverage unrelated to syscall inputs.
 KCOV_INSTRUMENT_trace_preemptirq.o := n
+KCOV_INSTRUMENT_trace_irqsoff.o := n

 CFLAGS_bpf_trace.o := -I$(src)

-- 
2.39.5 (Apple Git-154)


* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Andrew Morton @ 2026-05-25 19:10 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <2b2cda8c-358a-4a5c-989c-ae42593ef2ea@redhat.com>

On Mon, 25 May 2026 08:15:53 -0600 Nico Pache <npache@redhat.com> wrote:

> Can you please append the following fixup that reverts one of the
> changes requested in V17. The issue with the change is described
> below.

OK.  fyi, what I received was badly mangled: wordwrapping, tabs messed
up, etc.

Here's my reconstruction:


Author: Nico Pache <npache@redhat.com>
Subject: fix potential use-after-free of vma in mthp_collapse()
Date: Mon May 25 07:38:59 2026 -0600

Between v17 and v18, one reviewer (Wei) pointed out that we do not do
the uffd-armed check until deep in the collapse operation.  While not
functionally incorrect, this can lead to unnecessary work.

We optimized this by passing the vma variable to mthp_collapse() and using
the collapse_max_ptes_none() function to check the uffd-armed state,
avoiding the wasted work later in the collapse.

mthp_collapse() is called after mmap_read_unlock(), so the vma pointer can
become stale.  Remove the vma parameter and pass NULL to
collapse_max_ptes_none() instead.

Link: https://lore.kernel.org/2b2cda8c-358a-4a5c-989c-ae42593ef2ea@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
...

 mm/khugepaged.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/mm/khugepaged.c~mm-khugepaged-introduce-mthp-collapse-support-fix
+++ a/mm/khugepaged.c
@@ -1502,9 +1502,9 @@ static unsigned int collapse_mthp_count_
  * If a collapse is permitted, we attempt to collapse the PTE range into a
  * mTHP.
  */
-static int mthp_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, int referenced, int unmapped,
-		struct collapse_control *cc, unsigned long enabled_orders)
+static int mthp_collapse(struct mm_struct *mm, unsigned long address,
+		int referenced, int unmapped, struct collapse_control *cc,
+		unsigned long enabled_orders)
 {
 	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
 	int collapsed = 0, stack_size = 0;
@@ -1524,7 +1524,7 @@ static int mthp_collapse(struct mm_struc
 		if (!test_bit(order, &enabled_orders))
 			goto next_order;
 
-		max_ptes_none = collapse_max_ptes_none(cc, vma, order);
+		max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
 
 		nr_occupied_ptes = collapse_mthp_count_present(cc, offset,
 							       nr_ptes);
@@ -1749,7 +1749,7 @@ out_unmap:
 	if (result == SCAN_SUCCEED) {
 		/* collapse_huge_page expects the lock to be dropped before calling */
 		mmap_read_unlock(mm);
-		nr_collapsed = mthp_collapse(mm, vma, start_addr, referenced,
+		nr_collapsed = mthp_collapse(mm, start_addr, referenced,
 					     unmapped, cc, enabled_orders);
 		/* mmap_lock was released above, set lock_dropped */
 		*lock_dropped = true;
_



* Re: [PATCHv5 bpf-next 06/28] bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
From: Jiri Olsa @ 2026-05-25 20:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: bot+bpf-ci, ast, daniel, andrii, bpf, linux-trace-kernel, kafai,
	eddyz87, songliubraving, yhs, menglong8.dong, rostedt, martin.lau,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <aec7d0Z70WKqC1b4@krava>

On Tue, Apr 21, 2026 at 10:55:19AM +0200, Jiri Olsa wrote:
> On Fri, Apr 17, 2026 at 08:10:25PM +0000, bot+bpf-ci@kernel.org wrote:
> > > There's slight functional change for the unregister path, where we
> > > currently free the image unconditionally even if the detach fails.
> > > The new code keeps the image in place, possibly preventing the crash.
> > 
> > This appears to be fixing a bug where the image was freed even when detach
> > failed. The bug was introduced in commit e21aa341785c ("bpf: Fix fexit
> > trampoline.") which unconditionally freed the trampoline image.

Actually, the paragraph above is not accurate: this change would not
prevent the crash. That needs a separate fix.

jirka


> > 
> > Should this commit include a Fixes tag? The suggested tag would be:
> > 
> > Fixes: e21aa341785c ("bpf: Fix fexit trampoline.")
> 
> right, I think I'll send separate fix for current code,
> to have this fixed earlier
> 
> jirka
> 
> > 
> > 
> > ---
> > AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> > See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> > 
> > CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24583317711
> 


* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Masami Hiramatsu @ 2026-05-26  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
	Douglas Raillard, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers, Jiri Olsa
In-Reply-To: <20260521225033.56458336@fedora>

On Thu, 21 May 2026 22:50:33 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> @@ -640,7 +673,7 @@ static int parse_btf_arg(char *varname,
>  	int i, is_ptr, ret;
>  	u32 tid;
>  
> -	if (WARN_ON_ONCE(!ctx->funcname))
> +	if (WARN_ON_ONCE(!ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT)))
>  		return -EINVAL;
>  
>  	is_ptr = split_next_field(varname, &field, ctx);
> @@ -653,6 +686,20 @@ static int parse_btf_arg(char *varname,
>  		return -EOPNOTSUPP;
>  	}
>  
> +	if (ctx->flags & TPARG_FL_TEVENT) {
> +		int ret;

nit: parse_btf_arg() already declares @ret, so we don't need this.
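
To illustrate why a second declaration is worth dropping, here is a
minimal sketch with made-up function names (parse_shadowed() and
parse_fixed() are not from the patch). The inner declaration creates a
new variable that hides the outer one, so a value stored in it is lost
once the block ends. The patch may well return the inner value directly,
in which case nothing breaks; the sketch only shows the hazard.

```c
/* Inner "ret" shadows the function-scope one. */
static int parse_shadowed(int is_tevent)
{
	int ret = 0;

	if (is_tevent) {
		int ret;	/* new variable, hides the outer ret */

		ret = -22;	/* stand-in for an error from a helper */
		(void)ret;
	}

	return ret;		/* outer ret, never updated */
}

/* Same logic, reusing the existing variable. */
static int parse_fixed(int is_tevent)
{
	int ret = 0;

	if (is_tevent)
		ret = -22;

	return ret;
}
```

Building with -Wshadow makes GCC and Clang report the first form.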

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] rethook: Use tsk->on_cpu to check task execution state
From: Masami Hiramatsu @ 2026-05-26  3:37 UTC (permalink / raw)
  To: Tengda Wu
  Cc: Steven Rostedt, Mathieu Desnoyers, Alexei Starovoitov,
	linux-trace-kernel, linux-kernel
In-Reply-To: <20260525132253.1889726-1-wutengda@huaweicloud.com>

On Mon, 25 May 2026 21:22:53 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:

> When a task calls schedule() to yield the CPU, its state remains
> TASK_RUNNING, but its stack is frozen and safe to walk.
> 
> Replace task_is_running(tsk) with tsk->on_cpu to avoid overly
> conservative rejections.

Please see Sashiko's comment.

https://sashiko.dev/#/patchset/20260525132253.1889726-1-wutengda%40huaweicloud.com

When unwinding a task other than current, IMHO, it is
the responsibility of the caller of this function to ensure that the
stack trace of that task is safe.
Also, we should not use tsk->on_cpu directly, but task_on_cpu(tsk).

BTW, should task_on_cpu() use READ_ONCE() etc.?
wait_task_inactive() seems a bit fragile.
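
As a rough userspace model of the READ_ONCE() question, the sketch below
wraps the flag read in a helper that goes through a volatile access, so
the compiler has to load the flag once at that point rather than reuse or
refetch a cached value. All names ending in _sketch are hypothetical, the
macro is a simplified form of the kernel one, and the helper here takes
only the task pointer; none of this is the scheduler's actual code.

```c
#include <stdbool.h>

/* Simplified model: force a single load through a volatile access. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct task_sketch {
	int on_cpu;	/* nonzero while the task is running on a CPU */
};

static bool task_on_cpu_sketch(struct task_sketch *t)
{
	return READ_ONCE_SKETCH(t->on_cpu) != 0;
}

/*
 * Mirrors the patched condition in rethook_find_ret_addr():
 * refuse to walk another task's stack while it is on a CPU.
 */
static bool stack_walk_allowed_sketch(struct task_sketch *tsk,
				      struct task_sketch *cur)
{
	return tsk == cur || !task_on_cpu_sketch(tsk);
}
```

A single marked load does not make the check race-free: the task can be
scheduled in right after the read, which is why the caller still has to
guarantee the target task stays off the CPU.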

Thanks,

> 
> Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
>  kernel/trace/rethook.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> index 5a8bdf88999a..bd5e5f455e85 100644
> --- a/kernel/trace/rethook.c
> +++ b/kernel/trace/rethook.c
> @@ -250,7 +250,7 @@ unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame
>  	if (WARN_ON_ONCE(!cur))
>  		return 0;
>  
> -	if (tsk != current && task_is_running(tsk))
> +	if (tsk != current && tsk->on_cpu)
>  		return 0;
>  
>  	do {
> -- 
> 2.34.1
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCHv2] trace: allocate fields with elt struct
From: Masami Hiramatsu @ 2026-05-26  4:43 UTC (permalink / raw)
  To: Rosen Penev, Steven Rostedt
  Cc: linux-trace-kernel, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, open list:TRACING
In-Reply-To: <20260522212627.12406-1-rosenp@gmail.com>

Hi,

Sashiko made 2 comments.

https://sashiko.dev/#/patchset/20260522212627.12406-1-rosenp%40gmail.com

On Fri, 22 May 2026 14:26:27 -0700
Rosen Penev <rosenp@gmail.com> wrote:

> > diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> > index 0dd7927df22ac..196a7b0be99df 100644
> > --- a/kernel/trace/tracing_map.c
> > +++ b/kernel/trace/tracing_map.c
> [ ... ]
> > @@ -125,32 +125,32 @@ u64 tracing_map_read_var_once(struct tracing_map_elt *elt, unsigned int i)
> [ ... ]
> >  #define DEFINE_TRACING_MAP_CMP_FN(type)					\
> > -static int tracing_map_cmp_##type(void *val_a, void *val_b)		\
> > +static int tracing_map_cmp_##type(const void *val_a, const void *val_b)	\
> >  {									\
> > -	type a = (type)(*(u64 *)val_a);					\
> > -	type b = (type)(*(u64 *)val_b);					\
> > +	type a = (type)(*(const u64 *)val_a);				\
> > +	type b = (type)(*(const u64 *)val_b);				\
> >  									\
> >  	return (a > b) ? 1 : ((a < b) ? -1 : 0);			\
> >  }
> This is a pre-existing issue, but does unconditionally reading 8 bytes
> via the u64 cast cause unaligned access exceptions on architectures that
> do not support them?
> Additionally, for fields near the end of the dynamically allocated elt->key
> buffer, can this trigger KASAN slab-out-of-bounds reads?
> Also, on big-endian architectures, reading a smaller integer as a 64-bit
> value and casting it down extracts the least-significant bytes rather than
> the correct field value. Could this result in completely incorrect sorting
> for small types?

Steve, it seems this comes from your commit 106f41f5a302 ("tracing: Have
the histogram compare functions convert to u64 first").

I think neither of them is a problem, but could you check it?
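
For anyone checking, here is a userspace sketch of the compare macro
quoted above. It assumes each value occupies a full, aligned 8-byte slot
(which is what the "convert to u64 first" commit title suggests, but I
have not verified it for every field). DEFINE_CMP_FN_SKETCH and the
generated function names are made up for this example.

```c
#include <stdint.h>

/*
 * Read the full 64-bit slot, then narrow to the field type before
 * comparing, as the quoted DEFINE_TRACING_MAP_CMP_FN() does.
 */
#define DEFINE_CMP_FN_SKETCH(name, type)				\
static int cmp_##name(const void *val_a, const void *val_b)		\
{									\
	type a = (type)(*(const uint64_t *)val_a);			\
	type b = (type)(*(const uint64_t *)val_b);			\
									\
	return (a > b) ? 1 : ((a < b) ? -1 : 0);			\
}

DEFINE_CMP_FN_SKETCH(s8, int8_t)
DEFINE_CMP_FN_SKETCH(u8, uint8_t)
```

Under that assumption the narrowing cast operates on a value, not on
bytes in memory, so it yields the same result on little- and big-endian
machines, and the 8-byte read stays inside the slot. The concerns quoted
above would only apply if a field were stored in fewer than 8 bytes.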


> > diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
> > index 99c37eeebc161..ed64136782d80 100644
> > --- a/kernel/trace/tracing_map.h
> > +++ b/kernel/trace/tracing_map.h
> [ ... ]
> > @@ -137,11 +137,11 @@ struct tracing_map_field {
> >  
> >  struct tracing_map_elt {
> >  	struct tracing_map		*map;
> > -	struct tracing_map_field	*fields;
> >  	atomic64_t			*vars;
> >  	bool				*var_set;
> >  	void				*key;
> >  	void				*private_data;
> > +	struct tracing_map_field	fields[];
> >  };
>
> Is it safe to add a flexible array member to struct tracing_map_elt when it
> is embedded by value in the middle of struct track_data?
> kernel/trace/trace_events_hist.c:struct track_data {
>     ...
>     struct tracing_map_elt      elt;
>     ...
>     struct action_data          *action_data;
>     struct hist_trigger_data    *hist_data;
> };
> Embedding a struct with a flexible array member in the middle of another
> struct violates C standard constraints. Does this trigger compiler
> warnings (such as -Wflex-array-member-not-at-end on modern compilers) or
> break bounds computations for FORTIFY_SOURCE?

Yeah, for this reason, this is not acceptable.
To fix this issue, we need to refactor trace_events_hist.c,
because track_data only uses tracing_map_elt as a placeholder
for private_data.
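
To make the constraint concrete, here is a minimal userspace sketch of a
struct with a flexible array member and its allocation. The names
(elt_sketch, field_sketch, elt_alloc_sketch) are hypothetical and much
simpler than the real tracing_map structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct field_sketch {
	int offset;
};

struct elt_sketch {
	void			*private_data;
	size_t			n_fields;
	struct field_sketch	fields[];	/* must be the last member */
};

/* Allocate the header and n trailing fields in one zeroed block. */
static struct elt_sketch *elt_alloc_sketch(size_t n)
{
	struct elt_sketch *e;

	/* Reject sizes that would overflow the allocation size. */
	if (n > (SIZE_MAX - sizeof(*e)) / sizeof(e->fields[0]))
		return NULL;

	e = calloc(1, sizeof(*e) + n * sizeof(e->fields[0]));
	if (e)
		e->n_fields = n;

	return e;
}
```

Since sizeof(struct elt_sketch) does not include the array, a struct
that embeds it by value followed by other members has no room for any
fields; such a user would have to hold a pointer to a separately
allocated element, or keep only the members it needs.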

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v21 8/9] ring-buffer: Show persistent buffer dropped events in trace file
From: Masami Hiramatsu @ 2026-05-26  5:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Ian Rogers
In-Reply-To: <20260522171052.006276604@kernel.org>

On Fri, 22 May 2026 13:09:05 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> @@ -7187,6 +7190,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  		local_set(&reader->entries, 0);
>  		reader->read = 0;
>  		data_page->data = dpage;
> +		if (!missed_events && rb_data_page_commit(dpage) & RB_MISSED_EVENTS)
> +			missed_events = -1;
>  
>  		/*
>  		 * Use the real_end for the data size,
> @@ -7204,10 +7209,12 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  	 * Set a flag in the commit field if we lost events
>  	 */
>  	if (missed_events) {
> -		/* If there is room at the end of the page to save the
> +		/*
> +		 * If there is room at the end of the page to save the
>  		 * missed events, then record it there.
>  		 */
> -		if (buffer->subbuf_size - commit >= sizeof(missed_events)) {
> +		if (missed_events > 0 &&
> +		    buffer->subbuf_size - commit >= sizeof(missed_events)) {
>  			memcpy(&dpage->data[commit], &missed_events,
>  			       sizeof(missed_events));
>  			local_add(RB_MISSED_STORED, &dpage->commit);

After this line, we "add" RB_MISSED_EVENTS instead of setting it.
In this case, doesn't that clear the RB_MISSED_EVENTS bit, because
the commit field already has RB_MISSED_EVENTS set?

			commit += sizeof(missed_events);
		}
		local_add(RB_MISSED_EVENTS, &dpage->commit);
                      ^^^ here.
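
A small userspace sketch of the arithmetic behind this question. It
assumes the flag is bit 31 of a 64-bit commit word; the macro and
function names are made up for the example.

```c
#include <stdint.h>

/* Assumed position of the "missed events" flag in the commit word. */
#define MISSED_EVENTS_BIT	(UINT64_C(1) << 31)

/* What an add does: correct only if the bit is currently clear. */
static uint64_t flag_by_add(uint64_t commit)
{
	return commit + MISSED_EVENTS_BIT;
}

/* What a set does: idempotent, safe if the bit is already set. */
static uint64_t flag_by_or(uint64_t commit)
{
	return commit | MISSED_EVENTS_BIT;
}
```

If the bit is already set, the add carries into bit 32 and leaves bit 31
clear, so the reader would no longer see the flag. With the bit clear,
both forms give the same result.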


Thanks,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v3 2/2] serial: qcom-geni: Add tracepoints for Qualcomm GENI serial driver
From: Praveen Talari @ 2026-05-26  5:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
	Konrad Dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, Mukesh Kumar Savaliya, Aniket Randive,
	chandana.chiluveru, jyothi.seerapu
In-Reply-To: <2026052258-scrooge-friction-fe21@gregkh>

Hi

On 22-05-2026 15:17, Greg Kroah-Hartman wrote:
> On Mon, May 18, 2026 at 11:26:56PM +0530, Praveen Talari wrote:
>> Add tracing to the Qualcomm GENI serial driver to improve runtime
>> observability.
>>
>> Trace hooks are added at key points including termios and clock
>> configuration, manual control get/set, interrupt handling, and data
>> TX/RX paths.
>>
>> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>> ---
>> v2->v3:
>> - Updated commit text(removed example as it was available on cover
>>    letter).
>> ---
>>   drivers/tty/serial/qcom_geni_serial.c | 27 +++++++++++++++++++++++----
>>   1 file changed, 23 insertions(+), 4 deletions(-)
> This patch did not apply to my tree :(
Do you mean these patches do not apply cleanly? If so, I will rebase them
on the linux-next tip.


Thanks,

Praveen Talari



* Re: [PATCH v21 0/9] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu @ 2026-05-26  5:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Ian Rogers
In-Reply-To: <20260522170857.263969486@kernel.org>

On Fri, 22 May 2026 13:08:57 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> This is to make the persistent ring buffer more robust when sub-buffers
> are detected to be corrupted. Instead of invalidating the entire buffer,
> just invalidate the individual sub-buffers.
> 
> I started with Masami's patches and modified some from Sashiko reviews.
> I added a few patches to display the dropped events when the persistent
> ring buffers validation checks found sub-buffers were dropped due to being
> corrupted data.

It seems that Sashiko still marks it "Incompleted".
Maybe we need a base-commit: tag in this cover letter?
I also guess that this series does not use "In-Reply-To:" but
only the "References:" tag in the mail headers, and that
Sashiko's mail header parser missed it.

Thanks,

> 
> Changes since v20: https://lore.kernel.org/all/20260520184938.749337513@kernel.org/
> 
> - squashed the fix for max_loops in rb_iter_peek()
> 
> - Still process reader page if head page fails validation (Sashiko)
> 
> - Removed left over printk() (Masami Hiramatsu)
> 
> 
> Masami Hiramatsu (Google) (6):
>       ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
>       ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
>       ring-buffer: Add persistent ring buffer invalid-page inject test
>       ring-buffer: Show commit numbers in buffer_meta file
>       ring-buffer: Cleanup persistent ring buffer validation
>       ring-buffer: Cleanup buffer_data_page related code
> 
> Steven Rostedt (3):
>       ring-buffer: Have dropped subbuffers be persistent across reboots
>       ring-buffer: Show persistent buffer dropped events in trace file
>       ring-buffer: Show persistent buffer dropped events in trace_pipe file
> 
> ----
>  include/linux/ring_buffer.h |   1 +
>  kernel/trace/Kconfig        |  34 +++
>  kernel/trace/ring_buffer.c  | 543 +++++++++++++++++++++++++++++---------------
>  kernel/trace/trace.c        |   4 +
>  4 files changed, 402 insertions(+), 180 deletions(-)




-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v21 9/9] ring-buffer: Show persistent buffer dropped events in trace_pipe file
From: Masami Hiramatsu @ 2026-05-26  6:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Ian Rogers
In-Reply-To: <20260522171052.156419479@kernel.org>

On Fri, 22 May 2026 13:09:06 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
> 
> When the persistent ring buffer is validated on boot up, if a subbuffer is
> deemed invalid, it resets the buffer and continues. Have the code preserve
> the RB_MISSED_EVENTS flag in the commit portion of the subbuffer header
> and pass that back so that the trace_pipe file can show the missed events
> like the trace file does.
> 
> For example:
> 
>    <...>-1242    [005] d....  4429.120116: page_fault_user: address=0x7ffaebb6e728 ip=0x7ffaeb9d4960 error_code=0x7
>    <...>-1242    [005] .....  4429.120124: mm_page_alloc: page=00000000055254f3 pfn=0x1373bd order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP
>    <...>-1242    [005] d..2.  4429.120132: tlb_flush: pages:1 reason:local MM shootdown (3)
> CPU:5 [LOST EVENTS]
>    <...>-1242    [005] d....  4429.120661: page_fault_user: address=0x55ba7c2d0944 ip=0x55ba7c20cd02 error_code=0x7
>    <...>-1242    [005] .....  4429.120669: mm_page_alloc: page=0000000005a02500 pfn=0x12b6e4 order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP
>    <...>-1242    [005] d..2.  4429.120680: tlb_flush: pages:1 reason:local MM shootdown (3)
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> Changes since v20: https://patch.msgid.link/20260520185018.470465795@kernel.org
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> - Removed left over printk() (Masami Hiramatsu)
> 
>  kernel/trace/ring_buffer.c | 56 +++++++++++++++++++++++---------------
>  1 file changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 988915f035c7..910f6b3adf74 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5801,6 +5801,7 @@ __rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	struct buffer_page *reader = NULL;
>  	unsigned long overwrite;
>  	unsigned long flags;
> +	int missed_events = 0;
>  	int nr_loops = 0;
>  	bool ret;
>  
> @@ -5901,6 +5902,9 @@ __rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	if (!ret)
>  		goto spin;
>  
> +	if (rb_page_commit(reader) & RB_MISSED_EVENTS)
> +		missed_events = -1;
> +
>  	if (cpu_buffer->ring_meta)
>  		rb_update_meta_reader(cpu_buffer, reader);
>  
> @@ -5965,6 +5969,8 @@ __rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	 */
>  	smp_rmb();
>  
> +	if (!cpu_buffer->lost_events)
> +		cpu_buffer->lost_events = missed_events;
>  
>  	return reader;
>  }
> @@ -7066,6 +7072,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  	struct buffer_page *reader;
>  	long missed_events;
>  	unsigned int commit;
> +	unsigned int size;
>  	unsigned int read;
>  	u64 save_timestamp;
>  	bool force_memcpy;
> @@ -7101,7 +7108,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  	event = rb_reader_event(cpu_buffer);
>  
>  	read = reader->read;
> -	commit = rb_page_size(reader);
> +	commit = rb_page_commit(reader);
> +	size = rb_page_size(reader);
>  
>  	/* Check if any events were dropped */
>  	missed_events = cpu_buffer->lost_events;
> @@ -7115,13 +7123,14 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  	 * we must copy the data from the page to the buffer.
>  	 * Otherwise, we can simply swap the page with the one passed in.
>  	 */
> -	if (read || (len < (commit - read)) ||
> +	if (read || (len < (size - read)) ||
>  	    cpu_buffer->reader_page == cpu_buffer->commit_page ||
>  	    force_memcpy) {
>  		struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
>  		unsigned int rpos = read;
>  		unsigned int pos = 0;
> -		unsigned int size;
> +		unsigned int event_size;
> +		unsigned int flags = 0;
>  
>  		/*
>  		 * If a full page is expected, this can still be returned
> @@ -7130,19 +7139,22 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  		 * the reader page.
>  		 */
>  		if (full &&
> -		    (!read || (len < (commit - read)) ||
> +		    (!read || (len < (size - read)) ||
>  		     cpu_buffer->reader_page == cpu_buffer->commit_page))
>  			return -1;
>  
> -		if (len > (commit - read))
> -			len = (commit - read);
> +		if (len > (size - read))
> +			len = (size - read);
>  
>  		/* Always keep the time extend and data together */
> -		size = rb_event_ts_length(event);
> +		event_size = rb_event_ts_length(event);
>  
> -		if (len < size)
> +		if (len < event_size)
>  			return -1;
>  
> +		if (commit & RB_MISSED_EVENTS)
> +			flags = RB_MISSED_EVENTS;
> +
>  		/* save the current timestamp, since the user will need it */
>  		save_timestamp = cpu_buffer->read_stamp;
>  
> @@ -7154,25 +7166,25 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  			 * one or two events.
>  			 * We have already ensured there's enough space if this
>  			 * is a time extend. */
> -			size = rb_event_length(event);
> -			memcpy(dpage->data + pos, rpage->data + rpos, size);
> +			event_size = rb_event_length(event);
> +			memcpy(dpage->data + pos, rpage->data + rpos, event_size);
>  
> -			len -= size;
> +			len -= event_size;
>  
>  			rb_advance_reader(cpu_buffer);
>  			rpos = reader->read;
> -			pos += size;
> +			pos += event_size;
>  
> -			if (rpos >= commit)
> +			if (rpos >= size)
>  				break;
>  
>  			event = rb_reader_event(cpu_buffer);
>  			/* Always keep the time extend and data together */
> -			size = rb_event_ts_length(event);
> -		} while (len >= size);
> +			event_size = rb_event_ts_length(event);
> +		} while (len >= event_size);
>  
>  		/* update dpage */
> -		local_set(&dpage->commit, pos);
> +		local_set(&dpage->commit, pos | flags);
>  		dpage->time_stamp = save_timestamp;
>  
>  		/* we copied everything to the beginning */
> @@ -7204,7 +7216,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  
>  	cpu_buffer->lost_events = 0;
>  
> -	commit = rb_data_page_commit(dpage);
> +	size = rb_data_page_size(dpage);
>  	/*
>  	 * Set a flag in the commit field if we lost events
>  	 */
> @@ -7214,11 +7226,11 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  		 * missed events, then record it there.
>  		 */
>  		if (missed_events > 0 &&
> -		    buffer->subbuf_size - commit >= sizeof(missed_events)) {
> -			memcpy(&dpage->data[commit], &missed_events,
> +		    buffer->subbuf_size - size >= sizeof(missed_events)) {
> +			memcpy(&dpage->data[size], &missed_events,
>  			       sizeof(missed_events));
>  			local_add(RB_MISSED_STORED, &dpage->commit);
> -			commit += sizeof(missed_events);
> +			size += sizeof(missed_events);
>  		}
>  		local_add(RB_MISSED_EVENTS, &dpage->commit);
>  	}
> @@ -7226,8 +7238,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  	/*
>  	 * This page may be off to user land. Zero it out here.
>  	 */
> -	if (commit < buffer->subbuf_size)
> -		memset(&dpage->data[commit], 0, buffer->subbuf_size - commit);
> +	if (size < buffer->subbuf_size)
> +		memset(&dpage->data[size], 0, buffer->subbuf_size - size);
>  
>  	return read;
>  }
> -- 
> 2.53.0
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH] tracing: Disable KCOV instrumentation for trace_irqsoff.o
From: Masami Hiramatsu @ 2026-05-26  6:07 UTC (permalink / raw)
  To: Karl Mehltretter
  Cc: Steven Rostedt, Mathieu Desnoyers, Dmitry Vyukov,
	Andrey Konovalov, Marco Elver, kasan-dev, linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260525170428.67211-1-kmehltretter@gmail.com>

On Mon, 25 May 2026 19:04:28 +0200
Karl Mehltretter <kmehltretter@gmail.com> wrote:

> When KCOV runs its boot selftest with whole-kernel instrumentation
> enabled, it sets current->kcov_mode to KCOV_MODE_TRACE_PC without
> installing a coverage area. Any instrumented code accepted as task-context
> coverage in that window dereferences current->kcov_area and crashes.
> 
> On ARMv5 Versatile PB with CONFIG_KCOV_SELFTEST=y,
> CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y, boot hits a
> NULL pointer fault during the selftest:
> 
>   kcov: running self test
>   Internal error: Oops: 5 [#1] ARM
>   PC is at __sanitizer_cov_trace_pc+0x4c/0x90
>   Kernel panic - not syncing: Fatal exception
> 
> A diagnostic run showed the unwanted coverage comes from the IRQs-off
> tracer callbacks reached from ARM IRQ entry before hardirq context is
> visible to KCOV:
> 
>   __sanitizer_cov_trace_pc from tracer_hardirqs_off+0x18/0x1cc
>   tracer_hardirqs_off from trace_hardirqs_off+0x34/0x54
>   trace_hardirqs_off from __irq_svc+0x58/0xb0
>   __irq_svc from kcov_init+0x7c/0xdc
> 
> and similarly through tracer_hardirqs_on().
> 
> trace_preemptirq.o is already excluded because this tracing path can run
> from early interrupt code and produce coverage unrelated to syscall
> inputs. Exclude trace_irqsoff.o as well, instead of requiring users to
> turn off CONFIG_KCOV_INSTRUMENT_ALL=y, which is the default whole-kernel
> KCOV mode.
> 
> With the exclusion in place, the same ARMv5 Versatile PB QEMU test boots
> through the KCOV selftest and reaches userspace.
> 
> Tested on ARMv5 Versatile PB QEMU with CONFIG_KCOV_SELFTEST=y,
> CONFIG_KCOV_INSTRUMENT_ALL=y and CONFIG_IRQSOFF_TRACER=y.


Thanks for reporting. This looks good to me for a mitigation.
BTW, I could not reproduce the bug with above configs.
Is this only for arm32?


> 
> Assisted-by: Codex:gpt-5
> Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com>
> ---
>  kernel/trace/Makefile | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 8d3d96e847d8..f934ff586bd4 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -48,9 +48,10 @@ ifdef CONFIG_GCOV_PROFILE_FTRACE
>  GCOV_PROFILE := y
>  endif
>  
> -# Functions in this file could be invoked from early interrupt
> -# code and produce random code coverage.
> +# Functions in these files can run from IRQ entry before hardirq context
> +# is visible to KCOV, and produce coverage unrelated to syscall inputs.
>  KCOV_INSTRUMENT_trace_preemptirq.o := n
> +KCOV_INSTRUMENT_trace_irqsoff.o := n
>  
>  CFLAGS_bpf_trace.o := -I$(src)
>  
> -- 
> 2.39.5 (Apple Git-154)
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Wei Yang @ 2026-05-26  6:57 UTC (permalink / raw)
  To: Nico Pache, Andrew Morton
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <20260525121041.2f2508a4f627c338cddd837a@linux-foundation.org>

On Mon, May 25, 2026 at 12:10:41PM -0700, Andrew Morton wrote:
>On Mon, 25 May 2026 08:15:53 -0600 Nico Pache <npache@redhat.com> wrote:
>
>> Can you please append the following fixup that reverts one of the
>> changes requested in V17. The issue with the change is described
>> below.
>
>OK.  fyi, what I received was badly mangled: wordwrapping, tabs messed
>up, etc.
>
>Here's my reconstruction:
>

Hi, Nico

I tried to reply to your mail, but found it has some encoding problems, so I am
replying here.

>
>Author: Nico Pache <npache@redhat.com>
>Subject: fix potential use-after-free of vma in mthp_collapse()
>Date: Mon May 25 07:38:59 2026 -0600
>
>Between V17 and v18, one reviewer (Wei) brought up that we are not doing
>the uffd-armed check until deep in the collapse operation.  While not
>functionally incorrect, it can lead to unnecessary work.

So we decide to tolerate the behavioral change?

>
>We optimized this by passing the vma variable to mthp_collapse() and using
>the collapse_max_ptes_none() function to check the state of uffd-armed
>preventing the wasted work later in the collapse.
>
>mthp_collapse() is called after mmap_read_unlock(), so the vma pointer can
>become stale.  Remove the vma parameter and pass NULL to
>collapse_max_ptes_none() instead.
>
>Link: https://lore.kernel.org/2b2cda8c-358a-4a5c-989c-ae42593ef2ea@redhat.com
>Signed-off-by: Nico Pache <npache@redhat.com>
>...
>
> mm/khugepaged.c |   10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>--- a/mm/khugepaged.c~mm-khugepaged-introduce-mthp-collapse-support-fix
>+++ a/mm/khugepaged.c
>@@ -1502,9 +1502,9 @@ static unsigned int collapse_mthp_count_
>  * If a collapse is permitted, we attempt to collapse the PTE range into a
>  * mTHP.
>  */
>-static int mthp_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
>-		unsigned long address, int referenced, int unmapped,
>-		struct collapse_control *cc, unsigned long enabled_orders)
>+static int mthp_collapse(struct mm_struct *mm, unsigned long address,
>+		int referenced, int unmapped, struct collapse_control *cc,
>+		unsigned long enabled_orders)
> {
> 	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
> 	int collapsed = 0, stack_size = 0;
>@@ -1524,7 +1524,7 @@ static int mthp_collapse(struct mm_struc
> 		if (!test_bit(order, &enabled_orders))
> 			goto next_order;
> 
>-		max_ptes_none = collapse_max_ptes_none(cc, vma, order);
>+		max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
> 
> 		nr_occupied_ptes = collapse_mthp_count_present(cc, offset,
> 							       nr_ptes);
>@@ -1749,7 +1749,7 @@ out_unmap:
> 	if (result == SCAN_SUCCEED) {
> 		/* collapse_huge_page expects the lock to be dropped before calling */
> 		mmap_read_unlock(mm);
>-		nr_collapsed = mthp_collapse(mm, vma, start_addr, referenced,
>+		nr_collapsed = mthp_collapse(mm, start_addr, referenced,
> 					     unmapped, cc, enabled_orders);
> 		/* mmap_lock was released above, set lock_dropped */
> 		*lock_dropped = true;
>_

-- 
Wei Yang
Help you, Help me

^ permalink raw reply

* Re: [PATCH mm-hotfixes-unstable v18 00/14] khugepaged: add mTHP collapse support
From: Lorenzo Stoakes @ 2026-05-26  8:14 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, akpm, linux-kernel, linux-mm, linux-trace-kernel,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <ahCFIDuyrvEfB9jv@lucifer>

Nico,

While I stand by the below, and we very well might wish to delay this until
the next cycle, I will try to take some time to go through this myself as
soon as I am able.

If David's happy with it for this cycle, and I don't find anything too
crazy, then it's not impossible we could still move forward with it now.

My only aim here is to avoid rushing something in that might have
unexpected changes or issues in it, given how late in the cycle we are :)

Cheers, Lorenzo

On Fri, May 22, 2026 at 06:12:59PM +0100, Lorenzo Stoakes wrote:
> On Fri, May 22, 2026 at 10:31:41AM -0600, Nico Pache wrote:
> > On Fri, May 22, 2026 at 10:20 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
> > > There's some kind of confusion here.
> > >
> > > This series isn't suited for 7.2.
> > >
> > > Sorry but Zi's series, unless it depends on functionality here, will have
> > > to be rebased.
> > >
> > > People have been at conferences, people have been on leave, I've had to
> > > pace myself for health reasons and it seems there's been more than simply
> > > review comment-based changes happening here.
> > >
> > > (Again I strongly encourage, at this stage, to ONLY be making changes based
> > > on review, not adding ANYTHING else or changing ANYTHING else to avoid
> > > delays :)
> >
> > All the changes are based on review points. Very small changes in this
> > > version; the largest being the one that you specifically agreed to.
>
> 16->17
>
>  Documentation/admin-guide/mm/transhuge.rst |  24 +++++-------------
>  include/linux/khugepaged.h                 |   7 ++---
>  include/trace/events/huge_memory.h         |   3 ++-
>  mm/huge_memory.c                           |   2 +-
>  mm/khugepaged.c                            | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------
>  mm/vma.c                                   |   6 ++---
>  tools/testing/vma/include/stubs.h          |   3 ++-
>  7 files changed, 103 insertions(+), 110 deletions(-)
>
> 17->18
>
>  Documentation/admin-guide/mm/transhuge.rst |   5 +++--
>  include/trace/events/huge_memory.h         |   3 +--
>  mm/khugepaged.c                            | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------
>  3 files changed, 66 insertions(+), 63 deletions(-)
>
> These are not 'very small changes'.
>
> We're nearly at rc-5, and this is a major, invasive, dangerous change that
> we have to get right.
>
> You've also made changes unrelated to review, repeatedly, throughout this
> process, which as I've told you, is causing delays.
>
> You've also throughout the review of this series done stuff like make MAJOR
> changes to things and _kept review tags_.
>
> You're forcing us to use git range-diff etc. to forensically check that the
> series is what is claimed.
>
> Dude I mean you switched to using // comment style which is not used in mm
> anywhere for instance? Don't do things like that and complain about
> delays. Honestly.
>
> Also, again, LSF happened. Other conferences happened. Bandwidth is
> reduced.
>
> So again, I'm sorry, but you've been hit with some bad luck here.
>
> I really wanted this in for 7.2, and I feel bad that we couldn't make it,
> but you're also doing things that make it difficult for us.
>
> I've spent double-digit hours on your series, and I've also had work
> pushed out because of that, leading me to work evenings and weekends as a
> result.
>
> And I'm not even going to get any credit for it :))
>
> So while I sympathise, really, please have empathy and realise it goes both
> ways, please.
>
> I'm not being mean for the sake of it, I'm pushing back because I feel this
> is not at a stage where I'd feel confident in this being merged at this
> time.
>
> And it's very much a regret, as I _really_ wanted us to have it in this
> time. But life and circumstances and the issues mentioned above have
> intervened, sadly.
>
> >
> > >
> > > Also - shouldn't mm-unstable already have mm-hotfixes-unstable in it?
> > >
> > > I think in mm-next we will have a stable branch, that everything is
> > > based on, where things go once review is complete and things are mergeable.
> > >
> > > And a separate hotfixes branch based on Linus's tree.
> > >
> > > That would avoid issues like this :)
> >
> > I'm sorry, I'm new to this, but I really don't think this tiny error, and
> > something that I'd confirmed with Andrew beforehand, deserves NAKing
> > and deferring it. I've worked through my PTO to clean up some of these
> > review nits just to get it in 7.2. I even put this through my
> > rounds of testing today before resending.
>
> The issue wasn't the error (though it wasn't tiny...!), it's the state of
> review. There were fresh review comments from a few days ago, and there's
> big diffs between revisions.
>
> You've also made unrelated changes as you have done throughout the series.
>
> As I said above, I'm sorry that you spent time in your PTO on this, but we
> cannot rush this in when things are not clearly ready yet, and I am not
> confident in this being ready at this stage.
>
> >
> > >
> > > >
> > > > The intent wasn't that this is a hotfix, just that this was the
> > > > closest base before the v17 that is already in the tree.
> > >
> > > The convention is that [PATCH ... <branch>] indicates the target of the
> > > changes. Putting the hotfixes branch there implies it's a hotfix.
> >
> > Sorry I thought the <branch> was what base you used.
>
> I mean, sure there's clearly confusion here as you sent [PATCH 7.2 v16 ...]
> (against an unreleased kernel version) then a branch specifier then the
> hotfixes one...
>
> Anyway sure, it's fine, I've made vastly more dumb mistakes than that
> myself, nobody minds, but it's concerning as by convention [PATCH
> ... <mm->hotfixes<whatever>] generally is taken to mean 'please rush this
> to hotfixes!' :)
>
> So be careful with that please!
>
> >
> > >
> > > So please be careful with that in future :)
> >
> > Yes will do for sure.
>
> Thanks!
>
> >
> > >
> > > >
> > > > Sorry for the confusion, hopefully Andrew can still apply it to the
> > > > correct tree.
> > >
> > > I'm not even sure what's best for that at this stage given we have
> > > conflicts and this has to be delayed until 7.3.
> > >
> > > I wonder if given that we should not have this in mm-unstable at all and
> > > just wait it out until the next cycle begins? Review can happen
> > > concurrently.
> >
> > I still don't see why this has to be deferred, I was working with
> > Andrew to prevent merge headaches.
>
> I've explained the why above, and David and I co-maintain THP so I feel
> that ultimately given the blood, sweat and tears we've put into THP review
> we ought to have some input on this :)
>
> Thanks, Lorenzo

^ permalink raw reply

* Re: [PATCHv3 04/12] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Peter Zijlstra @ 2026-05-26  9:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Oleg Nesterov, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <ahDIVTM5WfVqiYE6@krava>

On Fri, May 22, 2026 at 11:19:17PM +0200, Jiri Olsa wrote:
> On Fri, May 22, 2026 at 11:50:44AM -0700, Andrii Nakryiko wrote:
> > On Thu, May 21, 2026 at 5:44 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Andrii reported an issue with optimized uprobes [1] that can clobber
> > > redzone area with call instruction storing return address on stack
> > > where user code may keep temporary data without adjusting rsp.
> > >
> > > Fixing this by moving the optimized uprobes on top of 10-bytes nop
> > > instruction, so we can squeeze another instruction to escape the
> > > redzone area before doing the call, like:
> > >
> > >   lea -0x80(%rsp), %rsp
> > >   call tramp
> > >
> > > Note the lea instruction is used to adjust the rsp register without
> > > changing the flags.
> > >
> > > > We use nop10 and the following transformation to the optimized instructions
> > > > above and back, as suggested by Peterz [2].
> > >
> > > Optimize path (int3_update_optimize):
> > >
> > >   1) Initial state after set_swbp() installed the uprobe:
> > >       cc 2e 0f 1f 84 00 00 00 00 00
> > >
> > >      From offset 0 this is INT3 followed by the tail of the original
> > >      10-byte NOP.
> > >
> > >   2) Trap the call slot before rewriting the NOP tail:
> > >       cc 2e 0f 1f 84 [cc] 00 00 00 00
> > >
> > >      From offset 0 this traps on the uprobe INT3.  A thread reaching
> > >      offset 5 traps on the temporary INT3 instead of seeing a partially
> > >      patched call.
> > >
> > >   3) Rewrite the LEA tail and call displacement, keeping both INT3 bytes:
> > >       cc [8d 64 24 80] cc [d0 d1 d2 d3]
> > >
> > >      From offset 0 and offset 5 this still traps.  The bytes between
> > >      them are not executable entry points while both traps are in place.
> > >
> > >   4) Restore the call opcode at offset 5:
> > >       cc 8d 64 24 80 [e8] d0 d1 d2 d3
> > >
> > >      From offset 0 this still traps.  From offset 5 the instruction is
> > >      the final CALL to the uprobe trampoline.
> > >
> > 
> > I'm sorry if I'm slow, but I don't understand why we need that second
> > cc at offset 5? Isn't original nop10 processed by CPU as single
> > instruction? So it will either be at ip of nop10, or at ip+10, no? If
> > we trap at ip and in int3 handler +10 from there while we are
> > installing lea+call, why do we need cc on byte 5?
> > 
> > I.e., I don't understand how CPU can end up being at ip+5 until we
> > finalize lea+call sequence? Can it?
> 
> hum, so I thought it's for the case when you do unoptimize+optimize,
> then you can have cpu executing the previous lea and hitting the int3
> on +5 offset.. but as pointed out by Peter (and you) the call instruction
> never changes, so now I'm not sure why we need it

So I missed you did the second INT3 in my initial reading.

That second INT3 is absolutely required *IF* the CALL can ever change.
However Andrii pointed out that once the CALL is written, it will always
be the same CALL -- there is but the one trampoline, it doesn't move.

Therefore, the second INT3 is not strictly required.

Does this clarify?
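
For readers following the byte-level steps quoted above, here is a minimal
userspace sketch that replays the four optimize states on a plain byte
array. It is not the kernel implementation: slot_write() and optimize_upto()
are hypothetical names, the d0..d3 displacement bytes are the placeholders
from the patch description, and real text patching also needs the
cross-CPU synchronization between steps that this model leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_LEN 10	/* size of the nop10 that hosts the probe */

/* Hypothetical model of one write into the 10-byte probe slot. */
static void slot_write(unsigned char *slot, size_t off,
		       const unsigned char *src, size_t len)
{
	assert(off + len <= SLOT_LEN);
	memcpy(slot + off, src, len);
}

/*
 * Replay the quoted optimize path up to and including 'step' (1..4)
 * and leave the resulting bytes in 'slot'.
 */
static void optimize_upto(unsigned char slot[SLOT_LEN], int step)
{
	/* Step 1: uprobe INT3 followed by the tail of the original nop10. */
	static const unsigned char initial[SLOT_LEN] = {
		0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
	};
	static const unsigned char int3 = 0xcc;
	static const unsigned char call_op = 0xe8;
	/* Tail of "lea -0x80(%rsp), %rsp"; its first byte is still INT3. */
	static const unsigned char lea_tail[4] = { 0x8d, 0x64, 0x24, 0x80 };
	/* Placeholder call displacement, as in the patch description. */
	static const unsigned char disp[4] = { 0xd0, 0xd1, 0xd2, 0xd3 };

	memcpy(slot, initial, SLOT_LEN);

	if (step >= 2)		/* trap the call slot at offset 5 */
		slot_write(slot, 5, &int3, 1);

	if (step >= 3) {	/* rewrite LEA tail and call displacement */
		slot_write(slot, 1, lea_tail, sizeof(lea_tail));
		slot_write(slot, 6, disp, sizeof(disp));
	}

	if (step >= 4)		/* restore the CALL opcode at offset 5 */
		slot_write(slot, 5, &call_op, 1);
}
```

Calling optimize_upto(slot, 3) leaves INT3 at both offset 0 and offset 5,
which is the state the second-INT3 question in this thread is about.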

^ permalink raw reply

* [PATCH] tracing: Fix field_var_str allocation errno
From: Yu Peng @ 2026-05-26  9:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Tom Zanussi,
	linux-trace-kernel, linux-kernel, Yu Peng

hist_trigger_elt_data_alloc() returns -EINVAL when the field_var_str
kcalloc() fails. Return -ENOMEM instead, matching the other allocation
failures in the function.

Fixes: c910db943d35 ("tracing: Dynamically allocate the per-elt hist_elt_data array")
Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/trace_events_hist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb2c2bc8bc3d..17fe13e12a4f 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1680,7 +1680,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 	elt_data->field_var_str = kcalloc(n_str, sizeof(char *), GFP_KERNEL);
 	if (!elt_data->field_var_str) {
 		hist_elt_data_free(elt_data);
-		return -EINVAL;
+		return -ENOMEM;
 	}
 	elt_data->n_field_var_str = n_str;
 
-- 
2.43.0

^ permalink raw reply related

* [PATCH] tracing: Use kstrdup_const() for constant hist field type
From: Yu Peng @ 2026-05-26  9:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel,
	linux-kernel, Yu Peng

The HIST_FIELD_FL_CONST path duplicates the literal "u64" type with
kstrdup(), then releases it through kfree_const().

Use kstrdup_const() instead, avoiding the allocation for a .rodata string
while keeping the matching free helper.

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/trace_events_hist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb2c2bc8bc3d..6ffe9f4720a0 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1992,7 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 	if (flags & HIST_FIELD_FL_CONST) {
 		hist_field->fn_num = HIST_FIELD_FN_CONST;
 		hist_field->size = sizeof(u64);
-		hist_field->type = kstrdup("u64", GFP_KERNEL);
+		hist_field->type = kstrdup_const("u64", GFP_KERNEL);
 		if (!hist_field->type)
 			goto free;
 		goto out;
-- 
2.43.0
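
As an illustration of the pairing described in the changelog, a minimal
userspace sketch of the kstrdup_const()/kfree_const() idea: a string that
lives in read-only data is returned as-is and never freed, anything else
is duplicated and freed normally. All toy_* names are hypothetical, and
the single static array stands in for the kernel's .rodata range check;
this is an assumption-laden model, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a string literal placed in .rodata. */
static const char toy_rodata[] = "u64";

/* Hypothetical range check; the kernel tests against its .rodata section. */
static int toy_is_rodata(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)toy_rodata;

	return addr >= start && addr < start + sizeof(toy_rodata);
}

/* Return the string itself if it is constant, else a heap copy. */
static const char *toy_strdup_const(const char *s)
{
	size_t len;
	char *copy;

	if (toy_is_rodata(s))
		return s;

	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free only what toy_strdup_const() actually allocated. */
static void toy_free_const(const char *s)
{
	if (!toy_is_rodata(s))
		free((void *)s);
}
```

With this pairing the constant case costs no allocation, while callers keep
a single free helper for both cases.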

^ permalink raw reply related

* Re: [PATCH bpf-next v2 2/3] tracing: Expose tracepoint BTF ids via tracefs
From: Mykyta Yatsenko @ 2026-05-26 10:07 UTC (permalink / raw)
  To: bpf, rostedt
  Cc: Mykyta Yatsenko, linux-trace-kernel, Andrii Nakryiko,
	Alexei Starovoitov
In-Reply-To: <20260518-generic_tracepoint-v2-2-b755a5cf67bb@meta.com>

Hi Steven,

Gentle ping on this patch from the series.

Since this part touches tracing, I’d appreciate your thoughts on the
tracing changes whenever you have a chance.

Thanks,
Mykyta

^ permalink raw reply

* Re: [PATCHv3 04/12] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Jiri Olsa @ 2026-05-26 10:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Andrii Nakryiko, Oleg Nesterov, Ingo Molnar,
	Masami Hiramatsu, Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <20260526091944.GB4149641@noisy.programming.kicks-ass.net>

On Tue, May 26, 2026 at 11:19:44AM +0200, Peter Zijlstra wrote:
> On Fri, May 22, 2026 at 11:19:17PM +0200, Jiri Olsa wrote:
> > On Fri, May 22, 2026 at 11:50:44AM -0700, Andrii Nakryiko wrote:
> > > On Thu, May 21, 2026 at 5:44 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > Andrii reported an issue with optimized uprobes [1] that can clobber
> > > > redzone area with call instruction storing return address on stack
> > > > where user code may keep temporary data without adjusting rsp.
> > > >
> > > > Fixing this by moving the optimized uprobes on top of 10-bytes nop
> > > > instruction, so we can squeeze another instruction to escape the
> > > > redzone area before doing the call, like:
> > > >
> > > >   lea -0x80(%rsp), %rsp
> > > >   call tramp
> > > >
> > > > Note the lea instruction is used to adjust the rsp register without
> > > > changing the flags.
> > > >
> > > > > We use nop10 and the following transformation to the optimized instructions
> > > > > above and back, as suggested by Peterz [2].
> > > >
> > > > Optimize path (int3_update_optimize):
> > > >
> > > >   1) Initial state after set_swbp() installed the uprobe:
> > > >       cc 2e 0f 1f 84 00 00 00 00 00
> > > >
> > > >      From offset 0 this is INT3 followed by the tail of the original
> > > >      10-byte NOP.
> > > >
> > > >   2) Trap the call slot before rewriting the NOP tail:
> > > >       cc 2e 0f 1f 84 [cc] 00 00 00 00
> > > >
> > > >      From offset 0 this traps on the uprobe INT3.  A thread reaching
> > > >      offset 5 traps on the temporary INT3 instead of seeing a partially
> > > >      patched call.
> > > >
> > > >   3) Rewrite the LEA tail and call displacement, keeping both INT3 bytes:
> > > >       cc [8d 64 24 80] cc [d0 d1 d2 d3]
> > > >
> > > >      From offset 0 and offset 5 this still traps.  The bytes between
> > > >      them are not executable entry points while both traps are in place.
> > > >
> > > >   4) Restore the call opcode at offset 5:
> > > >       cc 8d 64 24 80 [e8] d0 d1 d2 d3
> > > >
> > > >      From offset 0 this still traps.  From offset 5 the instruction is
> > > >      the final CALL to the uprobe trampoline.
> > > >
> > > 
> > > I'm sorry if I'm slow, but I don't understand why we need that second
> > > cc at offset 5? Isn't original nop10 processed by CPU as single
> > > instruction? So it will either be at ip of nop10, or at ip+10, no? If
> > > we trap at ip and in int3 handler +10 from there while we are
> > > installing lea+call, why do we need cc on byte 5?
> > > 
> > > I.e., I don't understand how CPU can end up being at ip+5 until we
> > > finalize lea+call sequence? Can it?
> > 
> > hum, so I thought it's for the case when you do unoptimize+optimize,
> > then you can have cpu executing the previous lea and hitting the int3
> > on +5 offset.. but as pointed out by Peter (and you) the call instruction
> > never changes, so now I'm not sure why we need it
> 
> So I missed you did the second INT3 in my initial reading.
> 
> That second INT3 is absolutely required *IF* the CALL can ever change.
> However Andrii pointed out that once the CALL is written, it will always
> be the same CALL -- there is but the one trampoline, it doesn't move.
> 
> Therefore, the second INT3 is not strictly required.
> 
> Does this clarify?

yes, will change that in next version

thanks,
jirka

^ permalink raw reply

* Re: [PATCH] tracing: Disable KCOV instrumentation for trace_irqsoff.o
From: Karl Mehltretter @ 2026-05-26 10:22 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Mathieu Desnoyers, Dmitry Vyukov,
	Andrey Konovalov, Marco Elver, kasan-dev, linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260526150758.4e0f37745d688f95a1c710d8@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3082 bytes --]

On Tue, May 26, 2026 at 03:07:58PM +0900, Masami Hiramatsu wrote:
> Thanks for reporting. This looks good to me for a mitigation.
> BTW, I could not reproduce the bug with above configs.
> Is this only for arm32?

I was able to reproduce this on arm64 QEMU virt with the attached
config and log.

Test base:
  4cbfe4502e3d ("Merge tag 'v7.1-rc5-ksmbd-server-fixes' ...")

QEMU command:
  qemu-system-aarch64 \
    -machine virt,gic-version=2 -cpu cortex-a57 -m 512M -smp 1 \
    -kernel arch/arm64/boot/Image \
    -append "console=ttyAMA0,115200 earlycon=pl011,0x9000000 rdinit=/init panic_on_warn=0 oops=panic loglevel=8 printk.time=1" \
    -nographic -no-reboot

Relevant config options:
  CONFIG_TRACE_IRQFLAGS=y
  CONFIG_IRQSOFF_TRACER=y
  CONFIG_KCOV=y
  CONFIG_KCOV_INSTRUMENT_ALL=y
  CONFIG_KCOV_SELFTEST=y

The raw arm64 crash first runs into other KCOV-instrumented early
IRQ/stack helpers. To isolate the trace_irqsoff.o part, I used the
following additional changes. This is not intended for merge:

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 74b76bb70452..d69eb3fd0577 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -24,6 +24,9 @@ KASAN_SANITIZE_stacktrace.o := n
 # inhibit KCOV instrumentation, disable it for the entire compilation unit.
 KCOV_INSTRUMENT_entry-common.o := n
 KCOV_INSTRUMENT_idle.o := n
+KCOV_INSTRUMENT_irq.o := n
+KCOV_INSTRUMENT_return_address.o := n
+KCOV_INSTRUMENT_stacktrace.o := n

 # Object file lists.
 obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index eaf290c972f9..2641a44f6339 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -21,6 +21,7 @@ ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
  obj-$(CONFIG_TICK_ONESHOT)			+= tick-broadcast-hrtimer.o
 endif
 obj-$(CONFIG_GENERIC_SCHED_CLOCK)		+= sched_clock.o
+KCOV_INSTRUMENT_sched_clock.o := n
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-oneshot.o tick-sched.o
 obj-$(CONFIG_LEGACY_TIMER_TICK)			+= tick-legacy.o
 ifeq ($(CONFIG_SMP),y)

With these changes, but with trace_irqsoff.o still instrumented,
the kernel still crashes during the KCOV selftest:

  kcov: running self test
  pc : __sanitizer_cov_trace_pc+0x64/0x84
  Kernel panic - not syncing: kernel stack overflow
  ...
  tracer_hardirqs_off+0x1c/0x78
  trace_hardirqs_off.part.0+0x70/0x1a0
  trace_hardirqs_off_finish+0x60/0x6c
  arm64_enter_from_kernel_mode.isra.0+0x18/0x38
  el1_interrupt+0x24/0x58
  el1h_64_irq+0x6c/0x70
  kcov_init+0xc8/0x118

Then adding the line from my original ARMv5
mitigation makes the arm64 kernel boot through the KCOV selftest:

  KCOV_INSTRUMENT_trace_irqsoff.o := n

The boot log then shows:

  kcov: running self test
  kcov: done running self test
  tiny-init: reached userspace

So arm64 also confirms that trace_irqsoff.o is reachable from this early
IRQ entry path while KCOV selftest mode is active.

Arm64 appears to have additional KCOV/early-entry paths with this config,
which probably need to be investigated independently.

Regards,
Karl
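
To make the failure window from the original changelog concrete, a toy
model in plain C: the mode is set to trace-PC while no coverage area is
installed, and an instrumented callback that is (wrongly) treated as task
context would then touch the NULL area. The toy_* names and the
seen_as_task_context parameter are hypothetical; this only mirrors the
description in this thread and is not the kernel's KCOV code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_kcov_mode {
	TOY_KCOV_MODE_DISABLED,
	TOY_KCOV_MODE_TRACE_PC,
};

/* Hypothetical subset of the per-task KCOV state. */
struct toy_task {
	enum toy_kcov_mode kcov_mode;
	unsigned long *kcov_area;	/* NULL during the selftest window */
};

/*
 * Model of an instrumented call site. Returns true if a PC would be
 * recorded; sets *would_fault when the call would dereference a
 * missing coverage area.
 */
static bool toy_trace_pc(const struct toy_task *t, bool seen_as_task_context,
			 bool *would_fault)
{
	*would_fault = false;

	/* Coverage is only collected in trace-PC mode from task context. */
	if (t->kcov_mode != TOY_KCOV_MODE_TRACE_PC || !seen_as_task_context)
		return false;

	/* The selftest window: mode set, but no area installed. */
	if (!t->kcov_area) {
		*would_fault = true;
		return false;
	}

	return true;
}
```

In this model, excluding an object file from instrumentation corresponds to
never reaching toy_trace_pc() from that file at all.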

[-- Attachment #2: arm64-kcov.config.gz --]
[-- Type: application/x-gunzip, Size: 11591 bytes --]

[-- Attachment #3: arm64-kcov-trace-irqsoff-crash.log.gz --]
[-- Type: application/x-gunzip, Size: 4458 bytes --]

^ permalink raw reply related

* [PATCH 1/4] rtla/actions: Restore continue flag in actions_perform()
From: Tomas Glozar @ 2026-05-26 10:25 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel

Currently, actions_perform() only ever sets the continue flag (when
performing the continue action), but never resets it. That leads to
RTLA continuing tracing even if the continue action was not performed in
the current iteration.

For example, the following command:

$ rtla timerlat hist -T 100 --on-threshold shell,command='
    echo Spike!
    if [ -f /tmp/a ]
    then
      exit 1
    else
      touch /tmp/a
    fi' --on-threshold continue

should print Spike! at most once, because after hitting the threshold
for the first time, /tmp/a exists, the shell action will fail, and the
continue action is not performed. However, unless /tmp/a exists before
the measurement, it will print Spike! until stopped, as the continue
flag stays set.

Set the continue flag to false in the beginning of actions_perform() to
make RTLA continue only if the action was actually performed.

Fixes: 8d933d5c89e ("rtla/timerlat: Add continue action")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/actions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/tracing/rtla/src/actions.c b/tools/tracing/rtla/src/actions.c
index b0d68b5de08d..bf13d9d68f16 100644
--- a/tools/tracing/rtla/src/actions.c
+++ b/tools/tracing/rtla/src/actions.c
@@ -247,6 +247,8 @@ actions_perform(struct actions *self)
 	int pid, retval;
 	const struct action *action;

+	self->continue_flag = false;
+
 	for_each_action(self, action) {
 		switch (action->type) {
 		case ACTION_TRACE_OUTPUT:
-- 
2.54.0
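
The effect of the reset can be sketched outside of RTLA with a small
shell model. This is not the rtla implementation: the shell function
actions_perform(), the continue_flag variable and the marker file are
hypothetical stand-ins that only mirror the control flow described in
the commit message (run actions in order, stop at the first failure,
set the flag when the continue action is reached).

```sh
#!/bin/sh
# Hypothetical model of the continue-flag logic; not RTLA code.
continue_flag=false

# Run each action in order and stop at the first failing one.
# The pseudo-action "continue" sets the flag instead of running a command.
actions_perform() {
	continue_flag=false	# the reset that the patch adds
	for action in "$@"; do
		if [ "$action" = continue ]; then
			continue_flag=true
		else
			sh -c "$action" || return $?
		fi
	done
	return 0
}

# Shell action that succeeds only the first time (marker file is absent).
dir=$(mktemp -d)
marker="$dir/a"
shell_action="[ ! -f '$marker' ] && touch '$marker'"

first_status=0
actions_perform "$shell_action" continue || first_status=$?
first_flag=$continue_flag

second_status=0
actions_perform "$shell_action" continue || second_status=$?
second_flag=$continue_flag

rm -r "$dir"
echo "first:  status=$first_status flag=$first_flag"
echo "second: status=$second_status flag=$second_flag"
```

In this model the first call succeeds and sets the flag, while the
second call fails in the shell action and leaves the flag false. With
the reset line removed, the flag would still be true after the second
call, which corresponds to the behavior the patch fixes.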

* [PATCH 2/4] rtla/tests: Add unit test for restoring continue flag
From: Tomas Glozar @ 2026-05-26 10:25 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260526102523.2662391-1-tglozar@redhat.com>

If an action preceding the continue action fails, it is not enough
that the continue flag is not set: the flag must also be cleared if it
was set by a previous run of actions_perform().

Add a unit test to check if this is implemented correctly.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---

Depends on "rtla/tests: Add unit tests for actions module"
- https://lore.kernel.org/linux-trace-kernel/20260424140244.958495-1-tglozar@redhat.com/

 tools/tracing/rtla/tests/unit/actions.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/tracing/rtla/tests/unit/actions.c b/tools/tracing/rtla/tests/unit/actions.c
index a5808ab71a4d..94ad5ad42774 100644
--- a/tools/tracing/rtla/tests/unit/actions.c
+++ b/tools/tracing/rtla/tests/unit/actions.c
@@ -328,6 +328,18 @@ START_TEST(test_actions_perform_continue_after_failed_shell_command)
 }
 END_TEST
 
+START_TEST(test_actions_perform_continue_unset_flag)
+{
+	actions_fixture.continue_flag = true;
+
+	actions_add_shell(&actions_fixture, "exit 1");
+	actions_add_continue(&actions_fixture);
+	ck_assert_int_eq(actions_perform(&actions_fixture), 1 << 8);
+
+	ck_assert(!actions_fixture.continue_flag);
+}
+END_TEST
+
 Suite *actions_suite(void)
 {
 	Suite *s = suite_create("actions");
@@ -374,6 +386,7 @@ Suite *actions_suite(void)
 	tcase_add_test(tc, test_actions_perform_continue);
 	tcase_add_test(tc, test_actions_perform_continue_after_successful_shell_command);
 	tcase_add_test(tc, test_actions_perform_continue_after_failed_shell_command);
+	tcase_add_test(tc, test_actions_perform_continue_unset_flag);
 	suite_add_tcase(s, tc);
 
 	return s;
-- 
2.54.0


* [PATCH 3/4] rtla/tests: Run runtime tests in temporary directory
From: Tomas Glozar @ 2026-05-26 10:25 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260526102523.2662391-1-tglozar@redhat.com>

Create a temporary directory before each test case to serve as the
working directory for the duration of the test.

This keeps the original working directory clean and gives each test a
private directory it can use without path conflicts.

To avoid breaking existing tests, also add a new "testdir" variable
containing the directory where the test file is located. This is then
used to locate artifacts used during testing, such as BPF programs and
scripts for checking the tracer threads.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---

Depends on "rtla/tests: Extend runtime test coverage" patchset
- https://lore.kernel.org/linux-trace-kernel/20260423130558.882022-1-tglozar@redhat.com/

 tools/tracing/rtla/tests/engine.sh  | 12 ++++++++++++
 tools/tracing/rtla/tests/osnoise.t  |  8 ++++----
 tools/tracing/rtla/tests/timerlat.t | 16 ++++++++--------
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/tools/tracing/rtla/tests/engine.sh b/tools/tracing/rtla/tests/engine.sh
index 27d92f19a322..5bf8453d354d 100644
--- a/tools/tracing/rtla/tests/engine.sh
+++ b/tools/tracing/rtla/tests/engine.sh
@@ -4,6 +4,9 @@ test_begin() {
 	# Count tests to allow the test harness to double-check if all were
 	# included correctly.
 	ctr=0
+	# Set test directory to the directory of the script
+	scriptfile=$(realpath "$0")
+	testdir=$(dirname "$scriptfile")
 	[ -z "$RTLA" ] && RTLA="./rtla"
 	[ -n "$TEST_COUNT" ] && echo "1..$TEST_COUNT"
 }
@@ -51,6 +54,11 @@ check() {
 	then
 		# Reset osnoise options before running test.
 		[ "$NO_RESET_OSNOISE" == 1 ] || reset_osnoise
+
+		# Create a temporary directory to contain rtla output
+		tmpdir=$(mktemp -d)
+		pushd $tmpdir >/dev/null
+
 		# Run rtla; in case of failure, include its output as comment
 		# in the test results.
 		result=$(eval stdbuf -oL $TIMEOUT "$RTLA" $2 2>&1); exitcode=$?
@@ -82,6 +90,10 @@ check() {
 			echo "$result" | col -b | while read line; do echo "# $line"; done
 			printf "#\n# exit code %s\n" $exitcode
 		fi
+
+		# Remove temporary directory
+		popd >/dev/null
+		rm -r $tmpdir
 	fi
 }
 
diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 06787471d0e8..9c2f84a4187d 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -16,15 +16,15 @@ check_top_q_hist "verify the --trace param" \
 
 # Thread tests
 check_top_q_hist "verify the --priority/-P param" \
-	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"$testdir/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
 check_top_q_hist "verify the -C/--cgroup param" \
-	"osnoise TOOL -C -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-cgroup-match.sh\"" \
+	"osnoise TOOL -C -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"$testdir/scripts/check-cgroup-match.sh\"" \
 	2 "cgroup matches for all workload PIDs"
 check_top_q_hist "verify the -c/--cpus param" \
-	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=$testdir/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify the -H/--house-keeping param" \
-	"osnoise TOOL -P F:1 -H 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
+	"osnoise TOOL -P F:1 -H 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=$testdir/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
 
 # Histogram tests
 check "hist with -b/--bucket-size" \
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 3ebfe316b39e..f3e5f99e862b 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -41,19 +41,19 @@ check_top_hist "disable auto-analysis" \
 
 # Thread tests
 check_top_hist "verify -P/--priority" \
-	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
+	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"$testdir/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
 check_top_hist "verify -C/--cgroup" \
-	"timerlat TOOL -k -C -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-cgroup-match.sh\"" \
+	"timerlat TOOL -k -C -c 0 -d 10s -T 1 --on-threshold shell,command=\"$testdir/scripts/check-cgroup-match.sh\"" \
 	2 "cgroup matches for all workload PIDs"
 check_top_q_hist "verify -c/--cpus" \
-	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
+	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=$testdir/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify -H/--house-keeping" \
-	"timerlat TOOL -H 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
+	"timerlat TOOL -H 0 -d 10s -T 1 --on-threshold shell,command=$testdir/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify -k/--kernel-threads" \
-	"timerlat TOOL -k -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-user-kernel-threads.sh" 2 "1 kernel threads, 0 user threads"
+	"timerlat TOOL -k -c 0 -d 10s -T 1 --on-threshold shell,command=$testdir/scripts/check-user-kernel-threads.sh" 2 "1 kernel threads, 0 user threads"
 check_top_q_hist "verify -u/--user-threads" \
-	"timerlat TOOL -u -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-user-kernel-threads.sh" 2 "0 kernel threads, 1 user threads"
+	"timerlat TOOL -u -c 0 -d 10s -T 1 --on-threshold shell,command=$testdir/scripts/check-user-kernel-threads.sh" 2 "0 kernel threads, 1 user threads"
 
 # Histogram tests
 check "hist with -b/--bucket-size" \
@@ -103,12 +103,12 @@ then
 	# Test BPF action program properly in BPF mode
 	[ -z "$BPFTOOL" ] && BPFTOOL=bpftool
 	check_top_q_hist "with BPF action program (BPF mode)" \
-		"timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
+		"timerlat TOOL -T 2 --bpf-action $testdir/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
 		2 '"value": 42'
 else
 	# Test BPF action program failure in non-BPF mode
 	check_top_q_hist "with BPF action program (non-BPF mode)" \
-		"timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o" \
+		"timerlat TOOL -T 2 --bpf-action $testdir/bpf/bpf_action_map.o" \
 		1 "BPF actions are not supported in tracefs-only mode"
 fi
 done
-- 
2.54.0
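
The temporary working directory pattern used in check() can be tried
in isolation with the sketch below. The helper run_in_tmpdir() and its
rit_* variables are hypothetical names, not part of engine.sh, and the
sketch uses a subshell with cd instead of the patch's pushd/popd so
that it also runs under a plain POSIX sh.

```sh
#!/bin/sh
# Sketch: run a command inside a throwaway working directory and
# remove the directory afterwards, preserving the command's exit status.
run_in_tmpdir() {
	rit_dir=$(mktemp -d) || return 1
	rit_status=0
	# The subshell confines the directory change to the command itself.
	( cd "$rit_dir" && "$@" ) || rit_status=$?
	rm -r "$rit_dir"
	return "$rit_status"
}

origdir=$PWD
# The command sees the temporary directory as its working directory.
seen_dir=$(run_in_tmpdir pwd)
echo "command ran in: $seen_dir"
echo "caller is still in: $PWD"
```

Files created by the command through relative paths land in the
temporary directory and disappear with it, which is why artifacts that
live next to the test scripts need an absolute location such as the
one stored in "testdir".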


* [PATCH 4/4] rtla/tests: Add runtime tests for restoring continue flag
From: Tomas Glozar @ 2026-05-26 10:25 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260526102523.2662391-1-tglozar@redhat.com>

If an action preceding the continue action fails, it is not enough
that the continue flag is not set: the flag must also be cleared if it
was set by a previous run of actions_perform().

Add a runtime test to both the osnoise and timerlat tools that verifies
this behavior using a temporary marker file.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---

Depends on "rtla/tests: Extend runtime test coverage" patchset
- https://lore.kernel.org/linux-trace-kernel/20260423130558.882022-1-tglozar@redhat.com/

 tools/tracing/rtla/tests/osnoise.t  | 2 ++
 tools/tracing/rtla/tests/timerlat.t | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 9c2f84a4187d..a7956ab605cd 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -65,6 +65,8 @@ check "top stop at failed action" \
 	"osnoise top -S 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
 check_top_q_hist "with continue" \
 	"osnoise TOOL -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_q_hist "with conditional continue" \
+	"osnoise TOOL -S 2 --on-threshold shell,command='if [ -f a ]; then echo 2; exit 1; else echo -n 1; touch a; fi' --on-threshold continue" 2 "^12$" "^2$"
 check_top_hist "with trace output at end" \
 	"osnoise TOOL -d 1s --on-end trace" 0 "^  Saving trace to osnoise_trace.txt$"
 
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index f3e5f99e862b..19fd5af26ebb 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -94,6 +94,8 @@ check "top stop at failed action" \
 	"timerlat top -T 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
 check_top_q_hist "with continue" \
 	"timerlat TOOL -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_q_hist "with conditional continue" \
+	"timerlat TOOL -T 2 --on-threshold shell,command='if [ -f a ]; then echo 2; exit 1; else echo -n 1; touch a; fi' --on-threshold continue" 2 "^12$" "^2$"
 check_top_hist "with trace output at end" \
 	"timerlat TOOL -d 1s --on-end trace" 0 "^  Saving trace to timerlat_trace.txt$"
 
-- 
2.54.0
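
What the shell action in the new test prints can be checked without
rtla by running it twice in a fresh directory, as the sketch below
does. Two assumptions apply: printf replaces the test's "echo -n"
because the latter is not portable across sh implementations, and the
two runs stand in for two consecutive threshold hits.

```sh
#!/bin/sh
# Sketch: emulate two consecutive threshold hits of the test's action.
action='if [ -f a ]; then echo 2; exit 1; else printf 1; touch a; fi'

tmpdir=$(mktemp -d)
first_status=0
second_status=0
# First hit: the marker file "a" is absent, so it is created.
out1=$(cd "$tmpdir" && sh -c "$action") || first_status=$?
# Second hit: the marker file exists, so the action fails.
out2=$(cd "$tmpdir" && sh -c "$action") || second_status=$?
rm -r "$tmpdir"

combined="$out1$out2"
echo "$combined"
```

The combined output is a single line "12", which is what the "^12$"
pattern in the test matches. If tracing kept continuing after the
failed action, further hits would print additional lines containing
only "2"; the "^2$" pattern in the test appears to guard against that.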

