Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* Re: [PATCH v10 0/6] mm/memory-failure: add panic option for unrecoverable pages
From: Andrew Morton @ 2026-06-26 16:27 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Miaohe Lin, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Naoya Horiguchi, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	lance.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260626-ecc_panic-v10-0-6dacb8ad024d@debian.org>

On Fri, 26 Jun 2026 08:33:14 -0700 Breno Leitao <leitao@debian.org> wrote:

> A multi-bit ECC error on a kernel-owned page that the memory failure
> handler cannot recover is currently swallowed: PG_hwpoison is set, the
> event is logged, and the kernel keeps running.  The corrupted memory
> remains accessible to the kernel and either drives silent data
> corruption or surfaces seconds-to-minutes later as an apparently
> unrelated crash.  In a large fleet that delayed, unattributable crash
> turns into significant engineering effort to root-cause; in a kdump
> configuration, by the time the crash happens the original error
> context (faulting PFN, MCE/GHES record, page state) is long gone.
> 
> This series adds an opt-in sysctl,
> vm.panic_on_unrecoverable_memory_failure, that converts an
> unrecoverable kernel-page hwpoison event into an immediate panic with
> a clean dmesg/vmcore that still contains the original failure
> context.  The default is disabled so existing workloads see no
> change.

Cool, thanks.  I added this to mm.git's mm-new branch.  Next week I'll
move it into the mm-unstable branch, where it will receive linux-next
exposure.

Sashiko identified a few possible things, some pre-existing:

	https://sashiko.dev/#/patchset/20260626-ecc_panic-v10-0-6dacb8ad024d@debian.org

^ permalink raw reply

* Re: [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Nathan Chancellor @ 2026-06-26 19:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260626045119.659d1e6b@fedora>

On Fri, Jun 26, 2026 at 04:51:19AM -0400, Steven Rostedt wrote:
> On Thu, 25 Jun 2026 16:41:58 -0700
> Nathan Chancellor <nathan@kernel.org> wrote:
> 
> 
> > The following diff resolves it for me, should I send it as a separate
> > patch or do you want to just fold it in with a note?
> > 
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index 621566345406..2301a701ffbb 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -10,6 +10,7 @@
> >  #ifndef __LINUX_LOCKDEP_H
> >  #define __LINUX_LOCKDEP_H
> >  
> > +#include <linux/instruction_pointer.h>
> 
> Ah, so the reason for this breakage is because lockdep was relying on
> instruction_pointer.h, that just happened to be included in kernel.h
> via trace_printk.h.

Correct.

> This is a separate issue, so it should be a separate patch. I'll add it
> as patch 1 of this series.

Sounds good, thanks!

> Can you send me the config you used. This didn't trigger in my tests.

It is a plain allmodconfig, for example on arm:

  $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- allmodconfig lib/test_context-analysis.o
  In file included from include/linux/local_lock_internal.h:8,
                   from include/linux/local_lock.h:5,
                   from lib/test_context-analysis.c:9:
  include/linux/local_lock_internal.h: In function 'local_lock_acquire':
  include/linux/lockdep.h:541:87: error: '_THIS_IP_' undeclared (first use in this function)
    541 | #define lock_map_acquire(l)                     lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
        |                                                                                       ^~~~~~~~~
  include/linux/lockdep.h:509:88: note: in definition of macro 'lock_acquire_exclusive'
    509 | #define lock_acquire_exclusive(l, s, t, n, i)           lock_acquire(l, s, t, 0, 1, n, i)
        |                                                                                        ^
  include/linux/local_lock_internal.h:46:9: note: in expansion of macro 'lock_map_acquire'
     46 |         lock_map_acquire(&l->dep_map);
        |         ^~~~~~~~~~~~~~~~
  include/linux/lockdep.h:541:87: note: each undeclared identifier is reported only once for each function it appears in
  ...

I also reproduced it on top of allnoconfig:

  $ cat allno.config
  CONFIG_CONTEXT_ANALYSIS_TEST=y
  CONFIG_DEBUG_KERNEL=y
  CONFIG_DEBUG_LOCK_ALLOC=y
  CONFIG_EXPERT=y
  CONFIG_MMU=y
  CONFIG_RUNTIME_TESTING_MENU=y

  $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- KCONFIG_ALLCONFIG=1 clean allnoconfig lib/test_context-analysis.o
  <same error as above>

-- 
Cheers,
Nathan

^ permalink raw reply

* Re: [PATCH v8 24/46] KVM: guest_memfd: Make in-place conversion the default\
From: Sean Christopherson @ 2026-06-26 19:06 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, jmattson, jthoughton, michael.roth, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, forkloop, pratyush, suzuki.poulose,
	aneesh.kumar, liam, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <aj3H2sxymOYTWTnE@yzhao56-desk.sh.intel.com>

On Fri, Jun 26, 2026, Yan Zhao wrote:
> On Thu, Jun 25, 2026 at 07:36:28AM -0700, Sean Christopherson wrote:
> > On Thu, Jun 25, 2026, Yan Zhao wrote:
> > And I'm not remotely convinced that prepending allow_ to the param will help
> > end users diagnose "unexpected" memory consumption, in quotes because anyone that
> > is deploying a stack that utilizes out-of-place conversion absolutely needs to
> > understand and plan for the additional memory consumption.  I.e. if the memory
> > consumption is "unexpected" to the end user, they likely have far bigger problems.
> My first impression of gmem_in_place_conversion=true was that it enforces gmem
> in-place conversion. However, it actually only enforces per-gmem private/shared
> attribute.
> My worry was that people might think it's a kernel bug if userspace can still
> have shared memory from other sources after they configured
> gmem_in_place_conversion=true.

Ah, I see where you're coming from.  FWIW, truly enforcing in-place conversion
is flat out impossible.  E.g. userspace can simply replace the memslot, at which
point the memory effectively reverts to shared.

> However, I have no strong opinion if you think gmem_in_place_conversion is good,
> and with the above documentation. :)

Ya, I think this largely a documentation problem.  I agree that a param name
like gmem_private_memory_attributes would be more precise, but I think it'd be
far less informative for the vast majority of users that only care whether or
not KVM can do in-place conversion, and don't care about how that is done.

^ permalink raw reply

* [PATCH rfc 0/2] Improvements to ftrace comm[] handling
From: David Laight @ 2026-06-26 21:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Michal Koutný
  Cc: David Laight

RFC because they are untested and I want to send them before going
on holiday for 2 weeks (should still have email access).

The first patch avoids a lot of 'potentially unsized' string
functions by embedding the char[] use to hold task->comm[] in a
structure.

The second adjsuts the data structure used to cache the task
names in the thread switch code.

David Laight (2):
  tracing: Embed 'char comm[16]' in a structure
  tracing: Keep pid and comm[] in the same structure

 kernel/trace/blktrace.c              |  28 +++----
 kernel/trace/trace.c                 |   3 +-
 kernel/trace/trace.h                 |   9 ++-
 kernel/trace/trace_events_filter.c   |   2 +-
 kernel/trace/trace_events_hist.c     |  26 +++---
 kernel/trace/trace_functions_graph.c |  10 +--
 kernel/trace/trace_output.c          |  24 +++---
 kernel/trace/trace_sched_switch.c    | 113 ++++++++++++---------------
 8 files changed, 101 insertions(+), 114 deletions(-)

-- 
2.39.5


^ permalink raw reply

* [PATCH 1/2] tracing: Embed 'char comm[16]' in a structure
From: David Laight @ 2026-06-26 21:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Michal Koutný
  Cc: David Laight
In-Reply-To: <20260626212356.64150-1-david.laight.linux@gmail.com>

Embedding the array in a stucture makes the size explicit and lets
structure copies be used.

Limit the size to 16 charatacters even if task_struct.comm is
made larger (there are plans to increase it).

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
 kernel/trace/blktrace.c              | 28 +++++++++----------
 kernel/trace/trace.c                 |  3 +-
 kernel/trace/trace.h                 |  9 ++++--
 kernel/trace/trace_events_filter.c   |  2 +-
 kernel/trace/trace_events_hist.c     | 26 +++++++----------
 kernel/trace/trace_functions_graph.c | 10 +++----
 kernel/trace/trace_output.c          | 24 ++++++++--------
 kernel/trace/trace_sched_switch.c    | 42 ++++++++++++++++------------
 8 files changed, 75 insertions(+), 69 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 8cd2520b4c99..68ffc95548b7 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1590,20 +1590,20 @@ static void blk_log_dump_pdu(struct trace_seq *s,
 
 static void blk_log_generic(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
 {
-	char cmd[TASK_COMM_LEN];
+	struct trace_comm cmd;
 
-	trace_find_cmdline(ent->pid, cmd);
+	trace_find_cmdline(ent->pid, &cmd);
 
 	if (t_action(ent) & BLK_TC_ACT(BLK_TC_PC)) {
 		trace_seq_printf(s, "%u ", t_bytes(ent));
 		blk_log_dump_pdu(s, ent, has_cg);
-		trace_seq_printf(s, "[%s]\n", cmd);
+		trace_seq_printf(s, "[%s]\n", cmd.comm);
 	} else {
 		if (t_sec(ent))
 			trace_seq_printf(s, "%llu + %u [%s]\n",
-						t_sector(ent), t_sec(ent), cmd);
+						t_sector(ent), t_sec(ent), cmd.comm);
 		else
-			trace_seq_printf(s, "[%s]\n", cmd);
+			trace_seq_printf(s, "[%s]\n", cmd.comm);
 	}
 }
 
@@ -1637,30 +1637,30 @@ static void blk_log_remap(struct trace_seq *s, const struct trace_entry *ent, bo
 
 static void blk_log_plug(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
 {
-	char cmd[TASK_COMM_LEN];
+	struct trace_comm cmd;
 
-	trace_find_cmdline(ent->pid, cmd);
+	trace_find_cmdline(ent->pid, &cmd);
 
-	trace_seq_printf(s, "[%s]\n", cmd);
+	trace_seq_printf(s, "[%s]\n", cmd.comm);
 }
 
 static void blk_log_unplug(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
 {
-	char cmd[TASK_COMM_LEN];
+	struct trace_comm cmd;
 
-	trace_find_cmdline(ent->pid, cmd);
+	trace_find_cmdline(ent->pid, &cmd);
 
-	trace_seq_printf(s, "[%s] %llu\n", cmd, get_pdu_int(ent, has_cg));
+	trace_seq_printf(s, "[%s] %llu\n", cmd.comm, get_pdu_int(ent, has_cg));
 }
 
 static void blk_log_split(struct trace_seq *s, const struct trace_entry *ent, bool has_cg)
 {
-	char cmd[TASK_COMM_LEN];
+	struct trace_comm cmd;
 
-	trace_find_cmdline(ent->pid, cmd);
+	trace_find_cmdline(ent->pid, &cmd);
 
 	trace_seq_printf(s, "%llu / %llu [%s]\n", t_sector(ent),
-			 get_pdu_int(ent, has_cg), cmd);
+			 get_pdu_int(ent, has_cg), cmd.comm);
 }
 
 static void blk_log_msg(struct trace_seq *s, const struct trace_entry *ent,
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..7de658b8ee0d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -52,6 +52,7 @@
 #include <linux/sort.h>
 #include <linux/io.h> /* vmap_page_range() */
 #include <linux/fs_context.h>
+#include <linux/trace_printk.h>
 
 #include <asm/setup.h> /* COMMAND_LINE_SIZE */
 
@@ -2972,7 +2973,7 @@ print_trace_header(struct seq_file *m, struct trace_iterator *iter)
 	seq_puts(m, "#    -----------------\n");
 	seq_printf(m, "#    | task: %.16s-%d "
 		   "(uid:%d nice:%ld policy:%ld rt_prio:%ld)\n",
-		   data->comm, data->pid,
+		   data->comm.comm, data->pid,
 		   from_kuid_munged(seq_user_ns(m), data->uid), data->nice,
 		   data->policy, data->rt_priority);
 	seq_puts(m, "#    -----------------\n");
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..afd59d79e1fe 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -183,6 +183,11 @@ struct fexit_trace_entry_head {
 
 struct trace_array;
 
+/* task_struct->comm[] may be truncated to save memory/width */
+struct trace_comm {
+	char comm[16];
+};
+
 /*
  * The CPU trace array - it consists of thousands of trace entries
  * plus some other descriptor data: (for example which task started
@@ -203,7 +208,7 @@ struct trace_array_cpu {
 	u64			preempt_timestamp;
 	pid_t			pid;
 	kuid_t			uid;
-	char			comm[TASK_COMM_LEN];
+	struct trace_comm	comm;
 
 #ifdef CONFIG_FUNCTION_TRACER
 	int			ftrace_ignore_pid;
@@ -906,7 +911,7 @@ void trace_last_func_repeats(struct trace_array *tr,
 
 extern u64 ftrace_now(int cpu);
 
-extern void trace_find_cmdline(int pid, char comm[]);
+extern void trace_find_cmdline(int pid, struct trace_comm *comm);
 extern int trace_find_tgid(int pid);
 extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
 
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 609325f57942..749887aff315 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -994,7 +994,7 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
 	int cmp;
 
 	cmp = pred->regex->match(current->comm, pred->regex,
-				TASK_COMM_LEN);
+				sizeof(current->comm));
 	return cmp ^ pred->not;
 }
 
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0dbbf6cca9bc..1b51491b2a41 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -680,7 +680,7 @@ struct track_data {
 };
 
 struct hist_elt_data {
-	char *comm;
+	struct trace_comm *comm;
 	u64 *var_ref_vals;
 	char **field_var_str;
 	int n_field_var_str;
@@ -756,7 +756,7 @@ static struct track_data *track_data_alloc(unsigned int key_len,
 
 	data->elt.private_data = elt_data;
 
-	elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL);
+	elt_data->comm = kzalloc_obj(*elt_data->comm);
 	if (!elt_data->comm) {
 		track_data_free(data);
 		return ERR_PTR(-ENOMEM);
@@ -1608,19 +1608,19 @@ parse_hist_trigger_attrs(struct trace_array *tr, char *trigger_str)
 	return ERR_PTR(ret);
 }
 
-static inline void save_comm(char *comm, struct task_struct *task)
+static inline void save_comm(struct trace_comm *comm, struct task_struct *task)
 {
 	if (!task->pid) {
-		strcpy(comm, "<idle>");
+		strcpy(comm->comm, "<idle>");
 		return;
 	}
 
 	if (WARN_ON_ONCE(task->pid < 0)) {
-		strcpy(comm, "<XXX>");
+		strcpy(comm->comm, "<XXX>");
 		return;
 	}
 
-	strscpy(comm, task->comm, TASK_COMM_LEN);
+	strscpy(comm->comm, task->comm);
 }
 
 static void hist_elt_data_free(struct hist_elt_data *elt_data)
@@ -1646,7 +1646,6 @@ static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
 static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 {
 	struct hist_trigger_data *hist_data = elt->map->private_data;
-	unsigned int size = TASK_COMM_LEN;
 	struct hist_elt_data *elt_data;
 	struct hist_field *hist_field;
 	unsigned int i, n_str;
@@ -1659,7 +1658,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 		hist_field = hist_data->fields[i];
 
 		if (hist_field->flags & HIST_FIELD_FL_EXECNAME) {
-			elt_data->comm = kzalloc(size, GFP_KERNEL);
+			elt_data->comm = kzalloc_obj(*elt_data->comm);
 			if (!elt_data->comm) {
 				kfree(elt_data);
 				return -ENOMEM;
@@ -1677,8 +1676,6 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 
 	BUILD_BUG_ON(STR_VAR_LEN_MAX & (sizeof(u64) - 1));
 
-	size = STR_VAR_LEN_MAX;
-
 	elt_data->field_var_str = kcalloc(n_str, sizeof(char *), GFP_KERNEL);
 	if (!elt_data->field_var_str) {
 		hist_elt_data_free(elt_data);
@@ -1687,7 +1684,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 	elt_data->n_field_var_str = n_str;
 
 	for (i = 0; i < n_str; i++) {
-		elt_data->field_var_str[i] = kzalloc(size, GFP_KERNEL);
+		elt_data->field_var_str[i] = kzalloc(STR_VAR_LEN_MAX, GFP_KERNEL);
 		if (!elt_data->field_var_str[i]) {
 			hist_elt_data_free(elt_data);
 			return -ENOMEM;
@@ -3449,7 +3446,7 @@ static bool cond_snapshot_update(struct trace_array *tr, void *cond_data)
 	elt_data = context->elt->private_data;
 	track_elt_data = track_data->elt.private_data;
 	if (elt_data->comm)
-		strscpy(track_elt_data->comm, elt_data->comm, TASK_COMM_LEN);
+		track_elt_data->comm = elt_data->comm;
 
 	track_data->updated = true;
 
@@ -5505,16 +5502,13 @@ static void hist_trigger_print_key(struct seq_file *m,
 				   uval, (void *)(uintptr_t)uval);
 		} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
 			struct hist_elt_data *elt_data = elt->private_data;
-			char *comm;
 
 			if (WARN_ON_ONCE(!elt_data))
 				return;
 
-			comm = elt_data->comm;
-
 			uval = *(u64 *)(key + key_field->offset);
 			seq_printf(m, "%s: %-16s[%10llu]", field_name,
-				   comm, uval);
+				   elt_data->comm->comm, uval);
 		} else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
 			const char *syscall_name;
 
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0d2d3a2ea7dd..e46c5aa0d4e4 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -561,19 +561,19 @@ static void print_graph_cpu(struct trace_seq *s, int cpu)
 
 static void print_graph_proc(struct trace_seq *s, pid_t pid)
 {
-	char comm[TASK_COMM_LEN];
+	struct trace_comm comm;
 	/* sign + log10(MAX_INT) + '\0' */
 	char pid_str[12];
 	int spaces = 0;
 	int len;
 	int i;
 
-	trace_find_cmdline(pid, comm);
-	comm[7] = '\0';
+	trace_find_cmdline(pid, &comm);
+	comm.comm[7] = '\0';
 	sprintf(pid_str, "%d", pid);
 
 	/* 1 stands for the "-" character */
-	len = strlen(comm) + strlen(pid_str) + 1;
+	len = strlen(comm.comm) + strlen(pid_str) + 1;
 
 	if (len < TRACE_GRAPH_PROCINFO_LENGTH)
 		spaces = TRACE_GRAPH_PROCINFO_LENGTH - len;
@@ -582,7 +582,7 @@ static void print_graph_proc(struct trace_seq *s, pid_t pid)
 	for (i = 0; i < spaces / 2; i++)
 		trace_seq_putc(s, ' ');
 
-	trace_seq_printf(s, "%s-%s", comm, pid_str);
+	trace_seq_printf(s, "%s-%s", comm.comm, pid_str);
 
 	/* Last spaces to align center */
 	for (i = 0; i < spaces - (spaces / 2); i++)
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index a5ad76175d10..58405291b44f 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -554,12 +554,12 @@ int trace_print_lat_fmt(struct trace_seq *s, struct trace_entry *entry)
 static int
 lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 {
-	char comm[TASK_COMM_LEN];
+	struct trace_comm comm;
 
-	trace_find_cmdline(entry->pid, comm);
+	trace_find_cmdline(entry->pid, &comm);
 
 	trace_seq_printf(s, "%8.8s-%-7d %3d",
-			 comm, entry->pid, cpu);
+			 comm.comm, entry->pid, cpu);
 
 	return trace_print_lat_fmt(s, entry);
 }
@@ -658,11 +658,11 @@ int trace_print_context(struct trace_iterator *iter)
 	struct trace_array *tr = iter->tr;
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	char comm[TASK_COMM_LEN];
+	struct trace_comm comm;
 
-	trace_find_cmdline(entry->pid, comm);
+	trace_find_cmdline(entry->pid, &comm);
 
-	trace_seq_printf(s, "%16s-%-7d ", comm, entry->pid);
+	trace_seq_printf(s, "%16s-%-7d ", comm.comm, entry->pid);
 
 	if (tr->trace_flags & TRACE_ITER(RECORD_TGID)) {
 		unsigned int tgid = trace_find_tgid(entry->pid);
@@ -700,13 +700,13 @@ int trace_print_lat_context(struct trace_iterator *iter)
 	entry = iter->ent;
 
 	if (verbose) {
-		char comm[TASK_COMM_LEN];
+		struct trace_comm comm;
 
-		trace_find_cmdline(entry->pid, comm);
+		trace_find_cmdline(entry->pid, &comm);
 
 		trace_seq_printf(
 			s, "%16s %7d %3d %d %08x %08lx ",
-			comm, entry->pid, iter->cpu, entry->flags,
+			comm.comm, entry->pid, iter->cpu, entry->flags,
 			entry->preempt_count & 0xf, iter->idx);
 	} else {
 		lat_print_generic(s, entry, iter->cpu);
@@ -1276,7 +1276,7 @@ static enum print_line_t trace_ctxwake_print(struct trace_iterator *iter,
 					     char *delim)
 {
 	struct ctx_switch_entry *field;
-	char comm[TASK_COMM_LEN];
+	struct trace_comm comm;
 	int S, T;
 
 
@@ -1284,7 +1284,7 @@ static enum print_line_t trace_ctxwake_print(struct trace_iterator *iter,
 
 	T = task_index_to_char(field->next_state);
 	S = task_index_to_char(field->prev_state);
-	trace_find_cmdline(field->next_pid, comm);
+	trace_find_cmdline(field->next_pid, &comm);
 	trace_seq_printf(&iter->seq,
 			 " %7d:%3d:%c %s [%03d] %7d:%3d:%c %s\n",
 			 field->prev_pid,
@@ -1293,7 +1293,7 @@ static enum print_line_t trace_ctxwake_print(struct trace_iterator *iter,
 			 field->next_cpu,
 			 field->next_pid,
 			 field->next_prio,
-			 T, comm);
+			 T, comm.comm);
 
 	return trace_handle_return(&iter->seq);
 }
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e9f0ff962660..972883643097 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -172,27 +172,33 @@ struct saved_cmdlines_buffer {
 	unsigned *map_cmdline_to_pid;
 	unsigned cmdline_num;
 	int cmdline_idx;
-	char saved_cmdlines[];
+	struct trace_comm saved_cmdlines[];
 };
 static struct saved_cmdlines_buffer *savedcmd;
 
 /* Holds the size of a cmdline and pid element */
 #define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)			\
-	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
+	(sizeof(struct trace_comm) + sizeof((s)->map_cmdline_to_pid[0]))
 
-static inline char *get_saved_cmdlines(int idx)
+static inline struct trace_comm *get_saved_cmdlines(int idx)
 {
-	return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN];
+	return &savedcmd->saved_cmdlines[idx];
 }
 
-static inline void set_cmdline(int idx, const char *cmdline)
+static inline void set_cmdline(int idx, const struct task_struct *tsk)
 {
-	strscpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
+	struct trace_comm *comm = get_saved_cmdlines(idx);
+
+	BUILD_BUG_ON(sizeof(comm->comm) > sizeof(tsk->comm));
+
+	memcpy(comm->comm, tsk->comm, sizeof comm->comm);
+	if (sizeof(comm->comm) != sizeof(tsk->comm))
+		comm->comm[ARRAY_SIZE(comm->comm) - 1] = 0;
 }
 
 static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
 {
-	int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
+	int order = get_order(sizeof(*s) + s->cmdline_num * sizeof(struct trace_comm));
 
 	kmemleak_free(s);
 	free_pages((unsigned long)s, order);
@@ -222,7 +228,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	s->cmdline_num = val;
 
 	/* Place map_cmdline_to_pid array right after saved_cmdlines */
-	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
+	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val];
 
 	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
 	       sizeof(s->map_pid_to_cmdline));
@@ -273,25 +279,25 @@ int trace_save_cmdline(struct task_struct *tsk)
 	}
 
 	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
-	set_cmdline(idx, tsk->comm);
+	set_cmdline(idx, tsk);
 
 	arch_spin_unlock(&trace_cmdline_lock);
 
 	return 1;
 }
 
-static void __trace_find_cmdline(int pid, char comm[])
+static void __trace_find_cmdline(int pid, struct trace_comm *comm)
 {
 	unsigned map;
 	int tpid;
 
 	if (!pid) {
-		strcpy(comm, "<idle>");
+		strcpy(comm->comm, "<idle>");
 		return;
 	}
 
 	if (WARN_ON_ONCE(pid < 0)) {
-		strcpy(comm, "<XXX>");
+		strcpy(comm->comm, "<XXX>");
 		return;
 	}
 
@@ -300,14 +306,14 @@ static void __trace_find_cmdline(int pid, char comm[])
 	if (map != NO_CMDLINE_MAP) {
 		tpid = savedcmd->map_cmdline_to_pid[map];
 		if (tpid == pid) {
-			strscpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
+			*comm = *get_saved_cmdlines(map);
 			return;
 		}
 	}
-	strcpy(comm, "<...>");
+	strcpy(comm->comm, "<...>");
 }
 
-void trace_find_cmdline(int pid, char comm[])
+void trace_find_cmdline(int pid, struct trace_comm *comm)
 {
 	preempt_disable();
 	arch_spin_lock(&trace_cmdline_lock);
@@ -561,11 +567,11 @@ static void saved_cmdlines_stop(struct seq_file *m, void *v)
 
 static int saved_cmdlines_show(struct seq_file *m, void *v)
 {
-	char buf[TASK_COMM_LEN];
+	struct trace_comm buf;
 	unsigned int *pid = v;
 
-	__trace_find_cmdline(*pid, buf);
-	seq_printf(m, "%d %s\n", *pid, buf);
+	__trace_find_cmdline(*pid, &buf);
+	seq_printf(m, "%d %s\n", *pid, buf.comm);
 	return 0;
 }
 
-- 
2.39.5


^ permalink raw reply related

* [PATCH 2/2] tracing: Keep pid and comm[] in the same structure
From: David Laight @ 2026-06-26 21:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Michal Koutný
  Cc: David Laight
In-Reply-To: <20260626212356.64150-1-david.laight.linux@gmail.com>

Rather than have two separate dynamic arrays on the end of struct
saved_commandlines_buffer have a single dynamic array where each
entry contains the pid and associated task->comm[].
This simplifies the initialisation and lookup.

Don't bother trying to initialise the pid field no a non-zero value,
it only matters in the tracing_saved_cmdlines_seq_ops code.
Allocate entry [0] first so that the tracing_saved_cmdlines_seq_ops
code can just index the array with the file offset.

The code now uses the correct size when determining the page 'order'
to free the structure. The smaller size will always give the same
'order'.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---

Is there any reason why this code uses alloc_pages() rather
than vmalloc()?
map_pid_to_cmdline[] is 64k*sizeof(int) so the whole structure
expands to 512k with about 64k/20 (about 3200) pid entries even
though the default is 128.
AFAICT there is only one copy of the data - so it could be static.
Perhaps with pointers to map_pid_cmdline[] and (after this patch)
pid_comm[], both of which could be separately resized.

I also noticed that map_pid_to_cmdline[] contains indexes into
pid_comm[], restricting these to 16bits would half the data area.

 kernel/trace/trace_sched_switch.c | 97 +++++++++++++------------------
 1 file changed, 39 insertions(+), 58 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 972883643097..5e7c8cf444b8 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -167,27 +167,21 @@ static size_t tgid_map_max;
  * where interrupt is disabled.
  */
 static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
+struct pid_comm {
+	pid_t pid;
+	struct trace_comm comm;
+};
 struct saved_cmdlines_buffer {
 	unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
-	unsigned *map_cmdline_to_pid;
 	unsigned cmdline_num;
 	int cmdline_idx;
-	struct trace_comm saved_cmdlines[];
+	struct pid_comm pid_comm[];
 };
 static struct saved_cmdlines_buffer *savedcmd;
 
-/* Holds the size of a cmdline and pid element */
-#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)			\
-	(sizeof(struct trace_comm) + sizeof((s)->map_cmdline_to_pid[0]))
-
-static inline struct trace_comm *get_saved_cmdlines(int idx)
-{
-	return &savedcmd->saved_cmdlines[idx];
-}
-
-static inline void set_cmdline(int idx, const struct task_struct *tsk)
+static inline void set_cmdline(struct pid_comm *pid_comm, const struct task_struct *tsk)
 {
-	struct trace_comm *comm = get_saved_cmdlines(idx);
+	struct trace_comm *comm = &pid_comm->comm;
 
 	BUILD_BUG_ON(sizeof(comm->comm) > sizeof(tsk->comm));
 
@@ -212,7 +206,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	int order;
 
 	/* Figure out how much is needed to hold the given number of cmdlines */
-	orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
+	orig_size = sizeof(*s) + val * sizeof(s->pid_comm[0]);
 	order = get_order(orig_size);
 	size = 1 << (order + PAGE_SHIFT);
 	page = alloc_pages(GFP_KERNEL, order);
@@ -224,16 +218,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	memset(s, 0, sizeof(*s));
 
 	/* Round up to actual allocation */
-	val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
+	val = (size - sizeof(*s)) / sizeof(s->pid_comm[0]);
 	s->cmdline_num = val;
 
-	/* Place map_cmdline_to_pid array right after saved_cmdlines */
-	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val];
-
 	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
 	       sizeof(s->map_pid_to_cmdline));
-	memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
-	       val * sizeof(*s->map_cmdline_to_pid));
 
 	return s;
 }
@@ -247,6 +236,7 @@ int trace_create_savedcmd(void)
 
 int trace_save_cmdline(struct task_struct *tsk)
 {
+	struct pid_comm *pid_comm;
 	unsigned tpid, idx;
 
 	/* treat recording of idle task as a success */
@@ -272,14 +262,16 @@ int trace_save_cmdline(struct task_struct *tsk)
 
 	idx = savedcmd->map_pid_to_cmdline[tpid];
 	if (idx == NO_CMDLINE_MAP) {
-		idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
-
+		idx = savedcmd->cmdline_idx;
 		savedcmd->map_pid_to_cmdline[tpid] = idx;
+		if (++idx >= savedcmd->cmdline_num)
+			idx = 0;
 		savedcmd->cmdline_idx = idx;
 	}
 
-	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
-	set_cmdline(idx, tsk);
+	pid_comm = savedcmd->pid_comm + idx;
+	pid_comm->pid = tsk->pid;
+	set_cmdline(pid_comm, tsk);
 
 	arch_spin_unlock(&trace_cmdline_lock);
 
@@ -288,8 +280,8 @@ int trace_save_cmdline(struct task_struct *tsk)
 
 static void __trace_find_cmdline(int pid, struct trace_comm *comm)
 {
+	struct pid_comm *pid_comm;
 	unsigned map;
-	int tpid;
 
 	if (!pid) {
 		strcpy(comm->comm, "<idle>");
@@ -301,12 +293,11 @@ static void __trace_find_cmdline(int pid, struct trace_comm *comm)
 		return;
 	}
 
-	tpid = pid & (PID_MAX_DEFAULT - 1);
-	map = savedcmd->map_pid_to_cmdline[tpid];
+	map = savedcmd->map_pid_to_cmdline[pid & (PID_MAX_DEFAULT - 1)];
 	if (map != NO_CMDLINE_MAP) {
-		tpid = savedcmd->map_cmdline_to_pid[map];
-		if (tpid == pid) {
-			*comm = *get_saved_cmdlines(map);
+		pid_comm = savedcmd->pid_comm + map;
+		if (pid_comm->pid == pid) {
+			*comm = pid_comm->comm;;
 			return;
 		}
 	}
@@ -521,42 +512,34 @@ const struct file_operations tracing_saved_tgids_fops = {
 	.release	= seq_release,
 };
 
-static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos)
+static struct pid_comm *saved_cmdlines_entry(loff_t off)
 {
-	unsigned int *ptr = v;
+	struct pid_comm *pid_comm;
 
-	if (*pos || m->count)
-		ptr++;
-
-	(*pos)++;
+	if (off >= savedcmd->cmdline_num)
+		return NULL;
 
-	for (; ptr < &savedcmd->map_cmdline_to_pid[savedcmd->cmdline_num];
-	     ptr++) {
-		if (*ptr == -1 || *ptr == NO_CMDLINE_MAP)
-			continue;
+	/* Entries are used in sequence and never freed */
+	pid_comm = &savedcmd->pid_comm[off];
+	return pid_comm->pid ? pid_comm : NULL;
+}
 
-		return ptr;
-	}
+static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	loff_t off = *pos;
 
-	return NULL;
+	v = saved_cmdlines_entry(off);
+	if (v)
+		*pos = off + 1;
+	return v;
 }
 
 static void *saved_cmdlines_start(struct seq_file *m, loff_t *pos)
 {
-	void *v;
-	loff_t l = 0;
-
 	preempt_disable();
 	arch_spin_lock(&trace_cmdline_lock);
 
-	v = &savedcmd->map_cmdline_to_pid[0];
-	while (l <= *pos) {
-		v = saved_cmdlines_next(m, v, &l);
-		if (!v)
-			return NULL;
-	}
-
-	return v;
+	return saved_cmdlines_entry(*pos);
 }
 
 static void saved_cmdlines_stop(struct seq_file *m, void *v)
@@ -567,11 +550,9 @@ static void saved_cmdlines_stop(struct seq_file *m, void *v)
 
 static int saved_cmdlines_show(struct seq_file *m, void *v)
 {
-	struct trace_comm buf;
-	unsigned int *pid = v;
+	struct pid_comm *ptr = v;
 
-	__trace_find_cmdline(*pid, &buf);
-	seq_printf(m, "%d %s\n", *pid, buf.comm);
+	seq_printf(m, "%d %s\n", ptr->pid, ptr->comm.comm);
 	return 0;
 }
 
-- 
2.39.5


^ permalink raw reply related

* Re: [PATCH v10 6/6] selftests/mm: add hwpoison-panic destructive test
From: Mike Rapoport @ 2026-06-27  7:52 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Naoya Horiguchi, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	lance.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260626-ecc_panic-v10-6-6dacb8ad024d@debian.org>

Hi Breno,

On Fri, Jun 26, 2026 at 08:33:20AM -0700, Breno Leitao wrote:
> Add a destructive selftest that verifies
> vm.panic_on_unrecoverable_memory_failure actually panics when a
> hwpoison error hits a kernel-owned page.

> +ksft_skip=4

...

> +ksft_print() { echo "# $*"; }
> +ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
> +ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }

There is tools/testing/selftests/kselftest/ktap_helpers.sh that already
implements this :)

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Steven Rostedt @ 2026-06-27  8:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Martin Kaiser, Masami Hiramatsu (Google), linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260627004538.4cc8b34a4a20ee14f9a38c28@kernel.org>



On June 26, 2026 4:45:38 PM GMT+01:00, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>On Fri, 26 Jun 2026 06:42:23 -0400
>Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> .
>> 
>> I'll queue this up.
>
>I've already queued this on my probes/core branch.
>(which will be probes/fixes)


Awesome. Thanks Masami!

-- Steve


^ permalink raw reply

* Re: [PATCH] usb: typec: add trace point for typec_set_mode
From: Steven Rostedt @ 2026-06-27  9:23 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Ahmad Fatoum, Greg Kroah-Hartman, Masami Hiramatsu,
	Mathieu Desnoyers, linux-kernel, linux-usb, linux-trace-kernel,
	kernel
In-Reply-To: <ajPW9P6qZyhqtjvp@kuha>

On Thu, 18 Jun 2026 14:31:00 +0300
Heikki Krogerus <heikki.krogerus@linux.intel.com> wrote:

> > >  obj-$(CONFIG_TYPEC)            += typec.o
> > >  typec-y                                := class.o mux.o bus.o pd.o retimer.o mode_selection.o
> > >  typec-$(CONFIG_ACPI)           += port-mapper.o
> > > +typec-$(CONFIG_TRACING)                += trace.o  
> > 
> > Thanks for the suggestion. I will do that for v2.
> > 
> > I also saw there is Sashiko AI feedback on this patch[1], but I am not
> > familiar enough with how the event headers are used outside the kernel
> > to determine if that's actionable advice or if it can be ignored.
> > 
> > Do you have an opinion on that?
> > 
> > [1]:
> > https://sashiko.dev/#/patchset/20260617-typec_set_mode-tracepoint-v1-1-bdfbb39cfccd%40pengutronix.de  
> 
> It's correct. You need to use a private trace.h in this case, so just
> move it here: drivers/usb/typec/trace.h
> 
> And also make sure you include everything needed in that header like
> it's telling you.


You may need more updates to make this work in a different directory.

Please read the comments in:

  samples/trace_events/trace-events-sample.h

and

  samples/trace_events/Makefile

-- Steve

^ permalink raw reply

* Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
From: Lorenzo Stoakes @ 2026-06-27  9:28 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-mm, david, willy, surenb, hannes,
	ziy, usama.arif, fvdl, Andrew Morton, Jonathan Corbet,
	Chris Mason, David Sterba, Vlastimil Babka, Steven Rostedt,
	Masami Hiramatsu, Rafael J. Wysocki, Oscar Salvador,
	Mike Rapoport, linux-doc, linux-btrfs, linux-trace-kernel,
	linux-pm, linux-cxl, Linus Torvalds
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>

+cc missing maintainers.
+cc Linus for general issues raised.

(Apologies, this email is VERY long, even for me)

Hi Rik,

I appreciate this is an RFC, but obviously part of that is early
feedback. So if this is meant to be an (EXTREMELY) rough pre-RFC
proof-of-concept then fine (however your 'todo' suggests otherwise).

And of course, before I get critical :) thank you for looking into this,
it's very interesting work. I want this feature to land. BUT :)

So if this in any way resembles what you plan to send upstream un-RFC'd I
need to pour a fairly large bucket of very cold water over this.

TL;DR: The series is completely unmergeable as it stands. Not even close.

Before we get into the code, please please please make sure you cc- the
right people. You're doing very invasive, very major work here.

You failed to even cc- the page allocator maintainer (Vlastimil) for a
series that radically alters page allocation.

MAKE SURE you cc- Vlastimil and all other maintainers and reviewers + lists
on future (RFC!!) revisions of this please.

If you want to do a quiet off-list pre-RFC that's fine, but it's
unforgivable to leave Vlastimil out of this even for that.

It's 30 seconds running a script (filtering for maintainers alone to save
on noise):

$ scripts/get_maintainer.pl --nogit --nogit-fallback --nor 20260520150018.2491267-1-riel@surriel.com.mbx
Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MANAGEMENT - CORE)
David Hildenbrand <david@kernel.org> (maintainer:MEMORY MANAGEMENT - CORE)
Jonathan Corbet <corbet@lwn.net> (maintainer:DOCUMENTATION)
Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM)
David Sterba <dsterba@suse.com> (maintainer:BTRFS FILE SYSTEM)
Vlastimil Babka <vbabka@kernel.org> (maintainer:MEMORY MANAGEMENT - PAGE ALLOCATOR)
Steven Rostedt <rostedt@goodmis.org> (maintainer:TRACING)
Masami Hiramatsu <mhiramat@kernel.org> (maintainer:TRACING)
"Rafael J. Wysocki" <rafael@kernel.org> (maintainer:HIBERNATION (aka Software Suspend, aka swsusp))
Oscar Salvador <osalvador@suse.de> (maintainer:MEMORY HOT(UN)PLUG)
Mike Rapoport <rppt@kernel.org> (maintainer:MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION)
Johannes Weiner <hannes@cmpxchg.org> (maintainer:MEMORY MANAGEMENT - RECLAIM)
linux-mm@kvack.org (open list:MEMORY MANAGEMENT - CORE)
linux-doc@vger.kernel.org (open list:DOCUMENTATION)
linux-kernel@vger.kernel.org (open list)
linux-btrfs@vger.kernel.org (open list:BTRFS FILE SYSTEM)
linux-trace-kernel@vger.kernel.org (open list:TRACING)
linux-pm@vger.kernel.org (open list:HIBERNATION (aka Software Suspend, aka swsusp))
linux-cxl@vger.kernel.org (open list:MEMORY HOT(UN)PLUG)
HIBERNATION (aka Software Suspend, aka swsusp) status: Supported
SUSPEND TO RAM status: Supported

OK, so getting into the code. You didn't include an overall diffstat. So let's
see:

$ git checkout e1914add2799
...
$ b4 shazam 20260520150018.2491267-1-riel@surriel.com
...
$ git format-patch --cover-letter HEAD~40
...
$ tail -31 0000-cover-letter.patch
 Documentation/admin-guide/sysctl/vm.rst |   21 -
 Documentation/mm/physical_memory.rst    |   13 +-
 fs/btrfs/extent_io.c                    |   69 +-
 fs/btrfs/extent_io.h                    |    4 +-
 fs/btrfs/inode.c                        |    2 +-
 fs/btrfs/ioctl.c                        |    2 +-
 fs/btrfs/raid56.c                       |    6 +-
 fs/btrfs/relocation.c                   |    2 +-
 fs/btrfs/scrub.c                        |    3 +-
 include/linux/mmzone.h                  |  236 +-
 include/linux/page-flags.h              |    9 +
 include/linux/pageblock-flags.h         |   10 +
 include/linux/vm_event_item.h           |   10 +
 include/trace/events/kmem.h             |  373 ++
 kernel/power/snapshot.c                 |   35 +-
 mm/compaction.c                         |  360 +-
 mm/debug.c                              |   19 +-
 mm/internal.h                           |   39 +
 mm/memory_hotplug.c                     |    4 +
 mm/mm_init.c                            |  400 +-
 mm/page_alloc.c                         | 5064 ++++++++++++++++++++---
 mm/page_reporting.c                     |  149 +-
 mm/show_mem.c                           |   27 +-
 mm/sparse.c                             |    3 +-
 mm/vmscan.c                             |  115 +-
 mm/vmstat.c                             |   71 +-
 26 files changed, 6131 insertions(+), 915 deletions(-)

This is completely insane :) I do hope the omission of the diffstat was not
to hide this, but at any rate, adding ~5,000 lines of code to page_alloc.c
is a complete non-starter.

In any case I think clearly more files are required, mm/super_pageblock.c
or whatever it'll be, but at any rate this much code being added to me
clearly indicates something is terribly wrong here anyway.

And you added no tests whatsoever. I get that some things are hard to test
in the kernel and mm, but not even an attempt? Surely you have some
localised stuff that you've been using to exercise this, couldn't some of
that be made into selftests?

Certainly if tests are not present, this needs to be justified in the cover
letter.

(Again I appreciate this is an RFC, so perhaps those are coming.)

Anyway let's look at some of this code. Johannes's patches all look
reasonable, but as soon as we start seeing:

Assisted-by: Claude:claude-opus-4.7

Things start to go awry.

The series is full of extremely dense comments that contain far too much
specific information to be parseable. E.g.:

+	 *  - owner_cpu == this CPU (or no owner): take the local PCP
+	 *    lock with spin_trylock and enqueue normally. The trylock
+	 *    fails only on rare local self re-entry (IRQ/NMI fires
+	 *    while the interrupted task already holds the lock) or
+	 *    while a remote drain is active; either way, fall back to
+	 *    free_one_page (or the zone-llist for FPI_TRYLOCK). No
+	 *    irqsave: the trylock cannot block on self, and remote
+	 *    CPUs never take this pcp->lock (they go via free_llist),
+	 *    so an interruption cannot deadlock against another freer.

This isn't readable. This isn't acceptable. It matches exactly what I've
found LLMs to do, adding far too much detail in way that isn't
human-readable.

You must redo all of these.

The patches are all quite dense. 40 patches is already far too many IMO for
a series this massive, but you're also piling in tonnes of complexity in
every patch.

The series should really be broken out into separate series that each
slowly build up the prerequisites, so they are reviewable, parseable, can
be seen in action before we move to the superblock concept and each having
tests or means of asserting that they in fact function.

It feels that there are patches that indeed can be broken out sensibly like
this,
e.g. https://lore.kernel.org/all/20260520150018.2491267-5-riel@surriel.com/
(which also seems to be a fix? So maybe should be sent as one with
appropriate tags?), so I think that's really the only viable way to land
something this huge.

I see Usama commented on you breaking userspace, and you said you are not
going to do that after all (thanks)... but it worries me that you
essentially render the watermark boost broken? I don't think that's OK?

The code in general dumps large blocks of code into already existing huge
functions but, even more unforgivably, turns small, maintainable, easily
understood functions into completely disgusting horror shows.

For instance, __rmqueue_smallest() goes from ~20 lines to 604 lines. This
is completely insane and totally unacceptable.

The code is also full of software engineering 101 fails. For instance in
__rmqueue_sb_find_fallback():

	if (search_cats & SB_SEARCH_PREFERRED) {
		...
		for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
			list_for_each_entry(sb,
					    &zone->spb_lists[cat][full], list) {
				...
				if (movable && cat == SB_TAINTED &&
				    sb->nr_free <= spb_tainted_reserve(sb))
					continue;
				...
				for (i = 0; i < MIGRATE_PCPTYPES - 1; i++) {
					...
					if (page) {
						...
					}
				}
			}
		}
	}

Note the mix of insane nesting AND guard clauses to make it even more
indecipherable.

There's no world in which I, or any other mm maintainer (I would venture to
say) want to maintain stuff like this.

A particularly concerning patch is
https://lore.kernel.org/all/20260520150018.2491267-15-riel@surriel.com/ -

The commit messages suffer from the same issue as the comments - incredibly
dense material that is far too detailed to the point of absolutely
clarifying nothing. E.g.:

"The SPB list heads (zone->spb_empty and the spb_lists[cat][full] matrix)
 are initialized only by setup_superpageblocks(), which is __init and runs
 only at boot.  Hot-add into a previously-empty zone invokes
 init_one_superpageblock() with zero-initialized list_heads, and the
 inlined list_add_tail() NULL-derefs walking ->next->prev."

Is just complete word salad. It may as well be written in Egyptian
hieroglyphs.

You're explaining spaghetti code using English language which makes it even
more indecipherable, and not getting to the root of what you are actually
doing.

Again this feels very LLM-generated. I've noticed that they tend to do this
'write out the code again but in English' stuff (and is why I do not
allow LLMs to generate code or comments for me!)

You need to make commit messages and comments as succinct as possible and
as clear as possible for _human beings_ :)

I also notice you have this huge blocks of word salad comments all over the
place but I haven't seen a single kdoc comment (EDIT: noticed at least 1 later!)

This is a far more helpful, structured, form of describing functions and
for anything put in a header you really do need to provide them.

Back to the patch. The diffstat is:

 include/linux/mmzone.h |  10 +
 mm/compaction.c        |  36 +-
 mm/internal.h          |  10 +
 mm/mm_init.c           | 146 +++++--
 mm/page_alloc.c        | 853 ++++++++++++++++++++++++++++++++---------
 mm/vmstat.c            |  66 ++--
 6 files changed, 883 insertions(+), 238 deletions(-)

This is (generally speaking, and certainly in this case) far, far too much
for a single patch.

But as per above this is the patch where you commit what are essentially
code war crimes against __rmqueue_smallest(), adding code so densely nested
that you have to lop off the end of code to fit:

+	if (!movable && !is_migrate_cma(migratetype)) {
+		for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+			list_for_each_entry(sb,
+				&zone->spb_lists[SB_TAINTED][full], list) {
+				if (!sb->nr_free)
+					continue;
+				for (current_order = max_t(unsigned int,
+						order, pageblock_order);
+				     current_order < NR_PAGE_ORDERS;
+				     ++current_order) {
+					area = &sb->free_area[current_order];
+					page = get_page_from_free_area(
+						area, MIGRATE_MOVABLE);
+					if (!page)
+						continue;
+					if (get_pageblock_isolate(page))
+						continue;
+					if (is_migrate_cma(
+					    get_pageblock_migratetype(page)))
+						continue;
+					page = claim_whole_block(zone, page,
+						current_order, order,
+						migratetype, MIGRATE_MOVABLE);
+					trace_mm_page_alloc_zone_locked(
+						page, order, migratetype,
+						pcp_allowed_order(order) &&
+						migratetype < MIGRATE_PCPTYPES);
+					return page;
+				}
+			}
+		}
+	}

I mean in what world are we taking code like this?

https://lore.kernel.org/all/20260520150018.2491267-22-riel@surriel.com/ has
what seem to be more human comments (good!) but then so densely nested that
they have to be cropped (bad).

using smaller, separate, functions and a structured programming approach
here is the way to go.

There's random smaller issues too like
https://lore.kernel.org/all/20260520150018.2491267-23-riel@surriel.com/
seems to assume the compiler can't figure out to remove dead code (note the
wild inconsistency too in comments)

You seem in some commits to undo or correct stuff you did in previous ones
like
https://lore.kernel.org/all/20260520150018.2491267-24-riel@surriel.com/

I mean perhaps I'm mistaken, but otherwise, can you instead rebase your
series please?

And again we see the world salad:

+ * Maximum tainted superpageblock candidates per spb_evacuate_for_order call.
+ * Collected under zone->lock, then evacuated without it. Larger than the
+ * contig-allocation candidate cap because evacuation runs from the slowpath
+ * after reclaim/compaction failed: we need a meaningful chance of freeing a
+ * non-MOV-claimable pageblock before the slowpath escalates to dropping
+ * ALLOC_NOFRAGMENT (which lets __rmqueue_claim taint clean SPBs). Sized to
+ * scan a meaningful fraction of a typical tainted-pool population.

No kdocs, some functions/complicated logic missing any comments at all,
then piles of words thrown at you like a rock.

This isn't pleasant to read, it adds no clarity, it just makes the whole
thing a mess.

The code in that one is at least vaguely ok (though still too nested).

I feel like where you've manually written code the quality substantially
improves, where the LLM has, the quality nosedives into oblivion.

For instance
https://lore.kernel.org/all/20260520150018.2491267-26-riel@surriel.com/
seems to be radically better, albeit adding far too much code. So I assume
that was more manual.

Stuff like
https://lore.kernel.org/all/20260520150018.2491267-27-riel@surriel.com/
shows more of a structural issue.

In:

+	 * Mark callers that have a cheap fallback if the page allocator returns
+	 * NULL, so __rmqueue can refuse to taint a clean SPB when an existing
+	 * tainted SPB still has free pageblocks waiting to be evacuated.
+	 *
+	 * Two shapes qualify:
+	 *
+	 *  1. Explicit fallback declaration: __GFP_NORETRY without
+	 *     __GFP_RETRY_MAYFAIL. Used by THP, slab high-order refill,
+	 *     skb_page_frag_refill on full sockets, etc.
+	 *
+	 *  2. Atomic-context shape: no __GFP_DIRECT_RECLAIM, no __GFP_NOMEMALLOC,
+	 *     no __GFP_NOFAIL. These callers (GFP_ATOMIC, GFP_NOWAIT, including
+	 *     ALLOC_HIGHATOMIC consumers) have implicit fallbacks: drop the
+	 *     packet, demote the slab order, return ENOMEM up the slowpath,
+	 *     retry from process context with GFP_KERNEL, etc. ALLOC_HIGHATOMIC
+	 *     callers also get a second crack at the dedicated MIGRATE_HIGHATOMIC
+	 *     reserve in rmqueue_buddy after __rmqueue returns NULL.
+	 *     Tainting a 1 GiB SPB to satisfy any of them is a long-lived
+	 *     fragmentation event for short-lived data.
+	 *

The comment is vastly better than most, but you seem to be tying far too
much up in assumptions about what particular workloads do.

This is indicative perhaps of a need to refactor to more reasonably
determine these.

Adding a function and state, say, that expresses these properties would
allow you to break out these concepts and have the code be self-documenting
(as well as adding suitable actual documentation in comments that would be
more succinct).

https://lore.kernel.org/all/20260520150018.2491267-29-riel@surriel.com/
also shows some real issues with how you've implemented this, e.g.:

+ * Called from each PASS_1/2/2B/2C/2D success path after a successful
+ * allocation against a tainted SPB. If the SPB is below its shrink
+ * high-water mark, queue the SPB-driven slab shrink and try to start
+ * the per-SPB defrag worker. Both have their own cooldown gates inside,
+ * so this is cheap to call on every such allocation.

Now you're putting information from one gargantuan function with labels
about 'passes' (rather reminiscent of VMA merge 'cases') into another.

This is a clear sign of a broken abstraction and the code not being
structured correctly.

You should _express_ state encoded in these 'passes' in the _actual code_,
break that code up sensibly and in a way that can be assessed for
correctness, not put a bit-rotting comment on top of a function whose
correctness is much harder to confirm.

Again this patch has similar issues with ridiculously dense indecipherable
comments, e.g.:

+	 * Last-chance defrag trigger before tainting a fresh clean SPB.
+	 * Walk the tainted-SPB list and try to wake the per-SPB defrag
+	 * worker on each. Catches SPBs that are stuck in expired-cooldown
+	 * state because no allocator activity has touched them recently
+	 * (the routine event-driven trigger from spb_update_list only
+	 * fires on bucket transitions, not on every alloc). Once the
+	 * cooldown has expired, spb_maybe_start_defrag() will requeue
+	 * work; otherwise the gate inside spb_needs_defrag() no-ops
+	 * cheaply. Bounded by nr_tainted_spbs and only runs when we are
+	 * already on the slow path of fragmenting the clean pool.

The spoondecker is montiplexed in the fradupple complex dedadderated in the
splunkyfied concratanator underfined by the transpontaculatoration
matrixifier... :)

I think this is symptomatic of the abstraction being fundamentally broken.

If you have to establish vast swathes of cognitive context like this mid
way through a huge function (again poor __rmqueue_smallest() who will never
forgive you), then you've basically failed to abstract it.

Please I beg you ADD FUNCTIONS :) structured programming is a thing,
struct's are a thing, abstraction is a thing :)

Pouring spaghetti into a single function is something you expect to see on
a 1990's PHP website, not in core mm code ;)

https://lore.kernel.org/all/20260520150018.2491267-30-riel@surriel.com/
seems to be another example of you just rethinking parts of what you
already submitted midway through the series.

Again I might be missing something here, but if you are doing this, please
just rebase your series to use $NEWIDEA from the start.

https://lore.kernel.org/all/20260520150018.2491267-35-riel@surriel.com/ is
another wall of text (newlines! Please :) but it seems like something you
can break out separately no?

It seems at least reviewable :)

https://lore.kernel.org/all/20260520150018.2491267-36-riel@surriel.com/
contains more of the same broken abstraction stuff, terrible nesting stuff,
word salad stuff but is at least mercifully smaller.

https://lore.kernel.org/all/20260520150018.2491267-37-riel@surriel.com/ is
almost emblematic of the terrible (LLM I hope?) comment issue in the
series. Overly dense hieroglyphs.

https://lore.kernel.org/all/20260520150018.2491267-38-riel@surriel.com/ is
another 'why didn't you rebase?' patch (again maybe I'm missing something
here!)

https://lore.kernel.org/all/20260520150018.2491267-39-riel@surriel.com/
seems more reasonable although newlines please :)

And finally
https://lore.kernel.org/all/20260520150018.2491267-40-riel@surriel.com/ -
while it's a do-not-merge patch in an RFC that changes tracing so maybe
unfair, but the comments are suffering the same symptoms as the rest of the
series.

~

So really, you need to start again, from scratch, and without the use of an
LLM for generating code, or at least with it kept on a (very very short)
leash.

And to be clear, I _want_ this concept of GB superpageblocks to land. It's
a really exciting concept.

Pulling compaction kicking and screaming into 2026 stands to significantly
benefit linux users and developers.

But the execution has to be _completely_ rethought.

I also worry about correctness - I simply cannot see how you can have sense
of it, given the state of the code. For something so invasive and so
critical to kernel functionality, code quality is simply not optional here.

Practical thoughts on how to rework the series:

- Properly abstract the concepts
- Properly separate out functions and data structures
- Add _human-readable_ comments that are succinct + clear as possible
- If a function becomes too nested, separate into smaller functions
- Add tests, if at all or in any way possible (and justify if you can't)
- Clarify commit messages, don't just rewrite code in English
- Use newlines for new paragraphs everywhere :)
- Refactor existing code in preparation for your changes
- Add a new file to contain your changes (+ add to page alloc MAINTAINERS
  section)
- Add kdocs for anything not static, and clearly describe static functions
- Split into separate series as much as possible, gradually building
  foundations for your changes
- Make everything less dense and more abstracted

In general, write the series with reviewers and other kernel developers in
mind - write clear explanations in comments and commit messages, have the
series slowly build to what you're implementing.

That way, we can see that correctness is maintained throughout, can review
what's there sensibly, and the series becomes upstreamable.

IOW I say we take off and nuke the entire site from orbit. It's the only
way to be sure :)

~

Another issue here is maintainer time - even this _extremely_ light-touch
review has taken me a few hours (of my weekend :). To review it in detail
would take probably DAYS of dedicated work.

In general, I'm concerned that we're going to get caught in a cycle of LLM
code sent - reviewers spending hours reviewing - LLM generates new revision
- etc.

This simply isn't scalable, maintainers cannot do this in the face of the
sheer quantity and complexity of code that LLMs can generate.

We are simply going to have to NAK. And that helps nobody.

Luckily for me, this isn't in a sub-(sub-)system I maintain, so I am not
obliged, but I do have empathy for my fellow maintainers, and am VERY
concerned about this trend.

There has recently been an absolute wave of LLM code, some acked as such
(and I think you for doing so here!) but others unacknowledged entirely,
and the workload, which was already too much, has risen significantly (Jon
has noted the rise in commit count for instance in LWN).

Treating maintainer time as without value was already an issue, but I fear
that we'll see a significant increase in maintainer stress and sadly,
burnout.

As such, I feel that we will have to implement measures at some point to
deprioritise/quickly dismiss such series, unfortunately.

But series from smart and capable engineers such as yourselfk who are well
respected in the communityk will likely not bek and thus will suffer from
this issue indefinitely.

So on that basis, I ask respectfully that you account for this when using
this tooling.

(Also, I appreciate that this is an RFC, but your recent non-RFC GUP series
https://lore.kernel.org/all/20260616190300.1509639-1-riel@surriel.com/
(revision v2 at
https://lore.kernel.org/all/20260625015053.2445008-1-riel@surriel.com/) has
all the same hallmarks as this one, so I feel this point needs to be
underlined).

~

Speaking more generally across the industry, I've been reading about
companies generating code blindly and running into terrible problems with
software that ends up becoming totally unmanageable.

I won't tolerate this happening happening in mm, and strongly object to the
concept (held by some AI proponents) that code is simply an unimportant
byproduct of AI.

The kernel's code (and especially mm's) is _critically_ important, quite
literally, as it forms the basis of the world's critical technical
infrastructure.

I am not opposed to the use of LLMs, but they _must_ serve as tools to
_assist_ experts at their job, not a means by which code bases are
degraded.

~

OK with all that said - to be absolutely clear - I respect you a great
deal, and I KNOW you're (much, much) better than this.

And, to repeat, this idea is very exciting and I _want_ to see this land.

But I feel you've rather let the LLM run amok and it's selling you (very,
very) short, given just how smart and capable you are.

Let's try again :)

Thanks, Lorenzo

^ permalink raw reply

* Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
From: Rik van Riel @ 2026-06-27 13:36 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, kernel-team, linux-mm, david, willy, surenb, hannes,
	ziy, usama.arif, fvdl, Andrew Morton, Jonathan Corbet,
	Chris Mason, David Sterba, Vlastimil Babka, Steven Rostedt,
	Masami Hiramatsu, Rafael J. Wysocki, Oscar Salvador,
	Mike Rapoport, linux-doc, linux-btrfs, linux-trace-kernel,
	linux-pm, linux-cxl, Linus Torvalds
In-Reply-To: <aj9yrlB0TrlYCLlf@lucifer>

On Sat, 2026-06-27 at 10:28 +0100, Lorenzo Stoakes wrote:
> 
> So really, you need to start again, from scratch, and without the use
> of an
> LLM for generating code, or at least with it kept on a (very very
> short)
> leash.
> 
> And to be clear, I _want_ this concept of GB superpageblocks to land.
> It's
> a really exciting concept.

That is the one reason I sent out RFC code before it
is ready. I am looking for feedback on the concepts
in this series.

How do people feel about splitting up the free lists,
so each gigabyte (well, PUD sized) chunk of memory
has its own free lists?

How can we balance the desire for higher-order kernel
allocations, against the desire to preserve gigabyte
sized chunks of memory that can be used for user space?

> 
> Pulling compaction kicking and screaming into 2026 stands to
> significantly
> benefit linux users and developers.

That's another big question. How do we balance the
desire to keep compaction overhead low with the desire
to do higher order allocations almost everywhere?

> 
> But the execution has to be _completely_ rethought.

There's no argument there.

I am just hoping to figure out what I should be
doing on a conceptual level, before figuring out
how to do it cleanly.

The mess in the RFC is the result of trying something
that seemed right, watching it fail in some subtle
way, and trying to fix it up.

Once I know what I need to do, coming up with a
cleaner implementation is very doable.

> 
> IOW I say we take off and nuke the entire site from orbit. It's the
> only
> way to be sure :)
> 
BOOM?

> Another issue here is maintainer time - even this _extremely_ light-
> touch
> review has taken me a few hours (of my weekend :). To review it in
> detail
> would take probably DAYS of dedicated work.

I suspect there is a mismatch in expectations here.

I already knew this code has to be totally redone.

I was looking for feedback on the basic concepts
and design in the patch series, but failed to
clearly communicate that.

You provided some detailed feedback on the code,
but as of yet nobody has really provided any
opinions on things like whether it is desirable
at all to have the free lists per gigablock,
or whether we need to come up with some totally
different approach.

How do we better communicate that kind of thing
in the future?

Is that something to spell out more clearly in
the cover letter?

Is that kind of feedback something developers
could even reasonably ask for? (if not, how do
we figure out what maintainers want?)

-- 
All Rights Reversed.

^ permalink raw reply

* Re: [PATCH 1/2] x86/uprobes: Keep shadow stack in sync for emulated CALLs
From: David Windsor @ 2026-06-27 17:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, mhiramat, tglx, mingo, bp, dave.hansen, x86,
	shuah, linux-trace-kernel, linux-kselftest, linux-kernel
In-Reply-To: <ajqJNju6MY3B0S2F@redhat.com>

On Tue, Jun 23, 2026 at 9:25 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/23, Peter Zijlstra wrote:
> >
> > On Tue, Jun 23, 2026 at 02:52:32PM +0200, Oleg Nesterov wrote:
> > > On 06/22, David Windsor wrote:
> > > >
> > > > --- a/arch/x86/kernel/uprobes.c
> > > > +++ b/arch/x86/kernel/uprobes.c
> > > > @@ -1246,8 +1246,12 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
> > > >           long correction = utask->vaddr - utask->xol_vaddr;
> > > >           regs->ip += correction;
> > > >   } else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) {
> > > > +         unsigned long retaddr = utask->vaddr + auprobe->defparam.ilen;
> > > > +
> > > >           regs->sp += sizeof_long(regs); /* Pop incorrect return address */
> > > > -         if (emulate_push_stack(regs, utask->vaddr + auprobe->defparam.ilen))
> > > > +         if (emulate_push_stack(regs, retaddr))
> > > > +                 return -ERESTART;
> > > > +         if (shstk_update_last_frame(retaddr))
> > > >                   return -ERESTART;
> > >
> > > Well, if shstk_update_last_frame() fails after emulate_push_stack(), we should
> > > probably return another error, so that the caller handle_singlestep() will kill
> > > this task?
> >
> > Makes sense, the other user has a force_sig(SIGSEGV) on failure.
>
> Offtopic question... both shstk_update_last_frame() and shstk_push() are only
> used by arch/x86/kernel/uprobes.c. But they are not symmetric in that
> shstk_update_last_frame() returns 0 if !features_enabled(ARCH_SHSTK_SHSTK),
> while shstk_push() returns -ENOTSUPP in this case.
>
> That is why the users can't just do "if (shstk_push(xxx)) ...". This is really
> minor, but perhaps it makes sense to change shstk_push() to return 0 in this
> case too? I don't think -ENOTSUPP is actually useful...
>

Agreed, will send a follow-up patch changing shstk_push() to return 0
rather than -ENOTSUPP.

I'll send v2 shortly with the additional call to force_sig(SIGSEGV) to
balance out the callers.

> Oleg.
>

^ permalink raw reply

* [PATCH] ring-buffer: serialize read-page order with subbuffer resize
From: Yousef Alhouseen @ 2026-06-28  0:46 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Petr Pavlu, linux-trace-kernel, linux-kernel,
	stable, syzbot+2dd9d02f60775ce5c1fb, Yousef Alhouseen

ring_buffer_read_page() checks that its spare page has the current
subbuffer order before taking cpu_buffer->reader_lock. A concurrent
ring_buffer_subbuf_order_set() can change the order and replace the
reader page after that check. The reader then copies a larger subbuffer
into the old allocation, causing an out-of-bounds write.

Keep spare-page allocation and release under buffer->mutex, which already
serializes order changes. Move the read-side order check under
reader_lock, the lock used by resize when replacing per-CPU pages.

Fixes: f9b94daa542a ("ring-buffer: Set new size of the ring buffer sub page")
Reported-by: syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
Cc: stable@vger.kernel.org
Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 kernel/trace/ring_buffer.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 56a328e94395..eed5d7cffdee 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6950,6 +6950,8 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return ERR_PTR(-ENODEV);
 
+	guard(mutex)(&buffer->mutex);
+
 	bpage = kzalloc_obj(*bpage);
 	if (!bpage)
 		return ERR_PTR(-ENOMEM);
@@ -7000,6 +7002,8 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu,
 	if (!buffer || !buffer->buffers || !buffer->buffers[cpu])
 		return;
 
+	guard(mutex)(&buffer->mutex);
+
 	cpu_buffer = buffer->buffers[cpu];
 
 	/*
@@ -7091,14 +7095,13 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	if (!data_page || !data_page->data)
 		return -1;
 
-	if (data_page->order != buffer->subbuf_order)
-		return -1;
-
 	dpage = data_page->data;
 	if (!dpage)
 		return -1;
 
 	guard(raw_spinlock_irqsave)(&cpu_buffer->reader_lock);
+	if (data_page->order != buffer->subbuf_order)
+		return -1;
 
 	reader = rb_get_reader_page(cpu_buffer);
 	if (!reader)
-- 
2.54.0


^ permalink raw reply related

* xdp: add device context to bpf_xdp_link_attach_failed tracepoint
From: Masashi Honma @ 2026-06-28 11:33 UTC (permalink / raw)
  To: netdev, bpf, linux-trace-kernel
  Cc: leon.hwang, ast, daniel, kuba, hawk, andrii, rostedt, mhiramat,
	edumazet, pabeni, linux-kernel

Hello,

The bpf_xdp_link_attach_failed tracepoint (added in commit bf4ea1d0b2cb
"xdp: Add tracepoint for xdp attaching failure") exposes the netlink
extack message produced when attaching an XDP program via BPF_LINK_CREATE
fails. This is useful because, unlike the netlink attach path, the
bpf_link attach path does not return the extack to userspace -- the caller
only gets an errno (e.g. EINVAL/ERANGE).

We would like to use this in Cilium [1][2]: when attaching the XDP
datapath program fails, surface the kernel's reason (e.g. "single-buffer
XDP requires MTU less than ...") in the agent logs instead of an opaque
errno, so operators don't have to inspect dmesg on the host.

The limitation we hit is that the tracepoint only carries the message
string, so a consumer cannot tell which device a failure belongs to.
This matters for two reasons:

  1. Correlation: with only the message, a consumer cannot reliably
      attribute a failure to a specific attach, particularly if multiple
      XDP attaches happen concurrently.
  2. Scoping: a consumer watching this tracepoint sees XDP attach
      failures system-wide and cannot limit them to the devices it
      manages.

At the call site (bpf_xdp_link_attach() in net/core/dev.c) the net_device
is in scope, so exposing it looks straightforward:

  TRACE_EVENT(bpf_xdp_link_attach_failed,
      TP_PROTO(const char *msg, const struct net_device *dev),
      TP_ARGS(msg, dev),
      TP_STRUCT__entry(
          __string(msg, msg)
          __field(int, ifindex)
      ),
      TP_fast_assign(
          __assign_str(msg);
          __entry->ifindex = dev->ifindex;
      ),
      TP_printk("ifindex=%d errmsg=%s", __entry->ifindex, __get_str(msg))
  );

  - trace_bpf_xdp_link_attach_failed(extack._msg);
  + trace_bpf_xdp_link_attach_failed(extack._msg, dev);

Before sending a formal patch I'd appreciate guidance on a few points:

  - Should the tracepoint take const struct net_device *dev (consistent
    with the other tracepoints in this file, and lets TP_printk show the
    device), or just the ifindex as an int (simpler for raw_tp BPF
    consumers, which otherwise read dev->ifindex via CO-RE)?

  - For raw_tp consumers the argument order is effectively ABI: prepending
    dev would shift the existing msg argument. I've appended dev above to
    keep msg at args[0]. Is preserving the existing argument position the
    right call, or is reordering acceptable given how new and rarely
    consumed this tracepoint is?

  - Is extending the existing tracepoint preferred, or would you rather
    keep it as-is and expose the device context some other way?

This would be my first XDP/BPF tracepoint change, so any direction is
welcome. I'm happy to send a proper patch once the shape is agreed.

Regards,
Masashi Honma

[1] https://github.com/cilium/cilium/issues/40777
[2] https://github.com/cilium/cilium/pull/46546

^ permalink raw reply

* [RFC] xdp: add device context to bpf_xdp_link_attach_failed tracepoint
From: Masashi Honma @ 2026-06-28 11:39 UTC (permalink / raw)
  To: netdev, bpf, linux-trace-kernel
  Cc: leon.hwang, ast, daniel, kuba, hawk, andrii, rostedt, mhiramat,
	edumazet, pabeni, linux-kernel

Hello, I am re-posting this mail because I forget to add [RFC].

The bpf_xdp_link_attach_failed tracepoint (added in commit bf4ea1d0b2cb
"xdp: Add tracepoint for xdp attaching failure") exposes the netlink
extack message produced when attaching an XDP program via BPF_LINK_CREATE
fails. This is useful because, unlike the netlink attach path, the
bpf_link attach path does not return the extack to userspace -- the caller
only gets an errno (e.g. EINVAL/ERANGE).

We would like to use this in Cilium [1][2]: when attaching the XDP
datapath program fails, surface the kernel's reason (e.g. "single-buffer
XDP requires MTU less than ...") in the agent logs instead of an opaque
errno, so operators don't have to inspect dmesg on the host.

The limitation we hit is that the tracepoint only carries the message
string, so a consumer cannot tell which device a failure belongs to.
This matters for two reasons:

  1. Correlation: with only the message, a consumer cannot reliably
      attribute a failure to a specific attach, particularly if multiple
      XDP attaches happen concurrently.
  2. Scoping: a consumer watching this tracepoint sees XDP attach
      failures system-wide and cannot limit them to the devices it
      manages.

At the call site (bpf_xdp_link_attach() in net/core/dev.c) the net_device
is in scope, so exposing it looks straightforward:

  TRACE_EVENT(bpf_xdp_link_attach_failed,
      TP_PROTO(const char *msg, const struct net_device *dev),
      TP_ARGS(msg, dev),
      TP_STRUCT__entry(
          __string(msg, msg)
          __field(int, ifindex)
      ),
      TP_fast_assign(
          __assign_str(msg);
          __entry->ifindex = dev->ifindex;
      ),
      TP_printk("ifindex=%d errmsg=%s", __entry->ifindex, __get_str(msg))
  );

  - trace_bpf_xdp_link_attach_failed(extack._msg);
  + trace_bpf_xdp_link_attach_failed(extack._msg, dev);

Before sending a formal patch I'd appreciate guidance on a few points:

  - Should the tracepoint take const struct net_device *dev (consistent
    with the other tracepoints in this file, and lets TP_printk show the
    device), or just the ifindex as an int (simpler for raw_tp BPF
    consumers, which otherwise read dev->ifindex via CO-RE)?

  - For raw_tp consumers the argument order is effectively ABI: prepending
    dev would shift the existing msg argument. I've appended dev above to
    keep msg at args[0]. Is preserving the existing argument position the
    right call, or is reordering acceptable given how new and rarely
    consumed this tracepoint is?

  - Is extending the existing tracepoint preferred, or would you rather
    keep it as-is and expose the device context some other way?

This would be my first XDP/BPF tracepoint change, so any direction is
welcome. I'm happy to send a proper patch once the shape is agreed.

Regards,
Masashi Honma

[1] https://github.com/cilium/cilium/issues/40777
[2] https://github.com/cilium/cilium/pull/46546

^ permalink raw reply

* [PATCH] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Bradley Morgan @ 2026-06-28 11:56 UTC (permalink / raw)
  To: akpm, mhiramat
  Cc: leitao, linux-kernel, linux-trace-kernel, Bradley Morgan, stable

When xbc_snprint_cmdline() is called during the size-probing phase
(with buf = NULL and size = 0), the function computes the end pointer
as 'buf + size' (NULL + 0) and repeatedly advances the pointer via
'buf += ret'.

Under the C standard, performing pointer arithmetic on a NULL pointer is
undefined behavior. While harmless inside the kernel, this code is also
compiled into the userspace host tool 'tools/bootconfig', where host
compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
detect NULL pointer arithmetic.

Fix this by tracking the running written length as an integer offset
('len') rather than advancing 'buf' directly. Only perform pointer
arithmetic if 'buf' is actually non-NULL.

Fixes: 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c")
Cc: stable@vger.kernel.org
Assisted-by: Gemini:gemini-3.5-flash
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
 lib/bootconfig.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd..0181459505b7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -410,8 +410,6 @@ const char * __init xbc_node_find_next_key_value(struct xbc_node *root,
 
 static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 
-#define rest(dst, end) ((end) > (dst) ? (end) - (dst) : 0)
-
 /**
  * xbc_snprint_cmdline() - Render bootconfig keys under @root as a cmdline string
  * @buf: Destination buffer (may be NULL when @size is 0 to query the length)
@@ -427,8 +425,8 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 {
 	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
 	const char *val, *q;
+	size_t len = 0;
 	int ret;
 
 	xbc_node_for_each_key_value(root, knode, val) {
@@ -439,10 +437,12 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 
 		vnode = xbc_node_get_child(knode);
 		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			ret = snprintf(buf ? buf + len : NULL,
+				       size > len ? size - len : 0,
+				       "%s ", xbc_namebuf);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 			continue;
 		}
 		xbc_array_for_each_value(vnode, val) {
@@ -452,17 +452,18 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 			 * whitespace.
 			 */
 			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
+			ret = snprintf(buf ? buf + len : NULL,
+				       size > len ? size - len : 0,
+				       "%s=%s%s%s ",
 				       xbc_namebuf, q, val, q);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 		}
 	}
 
-	return buf - (end - size);
+	return len;
 }
-#undef rest
 
 /* XBC parse and tree build */
 
-- 
2.53.0


^ permalink raw reply related

* [PATCH v2] tracing: reject invalid preemptirq_delay_test CPU affinity
From: Samuel Moelius @ 2026-06-28 13:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Samuel Moelius, Masami Hiramatsu, Mathieu Desnoyers,
	open list:TRACING, open list:TRACING

preemptirq_delay_test accepts cpu_affinity as a module parameter and,
when it is non-negative, writes that CPU directly into a temporary
cpumask from the worker thread.  Values outside nr_cpu_ids can set a bit
outside the allocated cpumask before the test reports a normal affinity
error.

Validate the requested CPU in preemptirq_delay_run() before setting it
in the temporary cpumask.  Invalid affinity requests are reported by
the test thread and skipped before cpumask_set_cpu() can touch an
out-of-range bit.

Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
---
Changes in v2:
  - Move check to preemptirq_delay_run

 kernel/trace/preemptirq_delay_test.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c
index acb0c971a408..69e5238737ed 100644
--- a/kernel/trace/preemptirq_delay_test.c
+++ b/kernel/trace/preemptirq_delay_test.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/trace_clock.h>
+#include <linux/cpumask.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
@@ -123,6 +124,13 @@ static int preemptirq_delay_run(void *data)
 		return -ENOMEM;
 
 	if (cpu_affinity > -1) {
+		unsigned int cpu = cpu_affinity;
+
+		if (cpu >= nr_cpu_ids || !cpu_possible(cpu)) {
+			pr_err("cpu_affinity:%d, invalid CPU\n", cpu_affinity);
+			goto out;
+		}
+
 		cpumask_clear(cpu_mask);
 		cpumask_set_cpu(cpu_affinity, cpu_mask);
 		if (set_cpus_allowed_ptr(current, cpu_mask))
@@ -132,6 +140,7 @@ static int preemptirq_delay_run(void *data)
 	for (i = 0; i < s; i++)
 		(testfuncs[i])(i);
 
+out:
 	complete(&done);
 
 	set_current_state(TASK_INTERRUPTIBLE);
-- 
2.43.0


^ permalink raw reply related

* Re: [RFC] xdp: add device context to bpf_xdp_link_attach_failed tracepoint
From: Leon Hwang @ 2026-06-28 15:26 UTC (permalink / raw)
  To: Masashi Honma, netdev, bpf, linux-trace-kernel
  Cc: ast, daniel, kuba, hawk, andrii, rostedt, mhiramat, edumazet,
	pabeni, linux-kernel
In-Reply-To: <CAFk-A4mE9Jweo2hfX7y_85xbPyt0FqpMT1EvqX1OcYZ=LTLgRA@mail.gmail.com>

On 2026/6/28 19:39, Masashi Honma wrote:
> Hello, I am re-posting this mail because I forget to add [RFC].
> 
> The bpf_xdp_link_attach_failed tracepoint (added in commit bf4ea1d0b2cb
> "xdp: Add tracepoint for xdp attaching failure") exposes the netlink
> extack message produced when attaching an XDP program via BPF_LINK_CREATE
> fails. This is useful because, unlike the netlink attach path, the

I really appreciate that the XDP tracepoint helped someone.

> bpf_link attach path does not return the extack to userspace -- the caller
> only gets an errno (e.g. EINVAL/ERANGE).
> 
> We would like to use this in Cilium [1][2]: when attaching the XDP
> datapath program fails, surface the kernel's reason (e.g. "single-buffer
> XDP requires MTU less than ...") in the agent logs instead of an opaque
> errno, so operators don't have to inspect dmesg on the host.
> 
> The limitation we hit is that the tracepoint only carries the message
> string, so a consumer cannot tell which device a failure belongs to.
> This matters for two reasons:
> 
>   1. Correlation: with only the message, a consumer cannot reliably
>       attribute a failure to a specific attach, particularly if multiple
>       XDP attaches happen concurrently.
>   2. Scoping: a consumer watching this tracepoint sees XDP attach
>       failures system-wide and cannot limit them to the devices it
>       manages.
> 
> At the call site (bpf_xdp_link_attach() in net/core/dev.c) the net_device
> is in scope, so exposing it looks straightforward:
> 
>   TRACE_EVENT(bpf_xdp_link_attach_failed,
>       TP_PROTO(const char *msg, const struct net_device *dev),
>       TP_ARGS(msg, dev),
>       TP_STRUCT__entry(
>           __string(msg, msg)
>           __field(int, ifindex)
>       ),
>       TP_fast_assign(
>           __assign_str(msg);
>           __entry->ifindex = dev->ifindex;
>       ),
>       TP_printk("ifindex=%d errmsg=%s", __entry->ifindex, __get_str(msg))
>   );
> 
>   - trace_bpf_xdp_link_attach_failed(extack._msg);
>   + trace_bpf_xdp_link_attach_failed(extack._msg, dev);
> 
> Before sending a formal patch I'd appreciate guidance on a few points:
> 
>   - Should the tracepoint take const struct net_device *dev (consistent
>     with the other tracepoints in this file, and lets TP_printk show the
>     device), or just the ifindex as an int (simpler for raw_tp BPF
>     consumers, which otherwise read dev->ifindex via CO-RE)?
> 
>   - For raw_tp consumers the argument order is effectively ABI: prepending
>     dev would shift the existing msg argument. I've appended dev above to
>     keep msg at args[0]. Is preserving the existing argument position the
>     right call, or is reordering acceptable given how new and rarely
>     consumed this tracepoint is?
> 

Good concerns. I'm not sure about these parts.

>   - Is extending the existing tracepoint preferred, or would you rather
>     keep it as-is and expose the device context some other way?
> 

I'm planning to retire this tracepoint. But I think I cannot do it, if
there's user space application relied on the tracepoint.

I'm planning to add BPF syscall common attributes support for
BPF_LINK_CREATE, including XDP link. By that way, the kernel will be
able to back-propagate the 'extack._msg' to user space, when fail to
create XDP link. Thereafter, the user space library will be able to get
the error message alongside the errno.

Thanks,
Leon

> This would be my first XDP/BPF tracepoint change, so any direction is
> welcome. I'm happy to send a proper patch once the shape is agreed.
> 
> Regards,
> Masashi Honma
> 
> [1] https://github.com/cilium/cilium/issues/40777
> [2] https://github.com/cilium/cilium/pull/46546


^ permalink raw reply

* Re: [PATCH v11 11/11] tracing/probes: Add a new testcase for BTF typecasts
From: Masami Hiramatsu @ 2026-06-28 16:03 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248336390.841606.1695444929833877704.stgit@devnote2>

This has a problem and needs some updates.

On Fri, 26 Jun 2026 23:16:04 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
> new file mode 100644
> index 000000000000..96791e120b7d
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +# description: BTF event with typecast and percpu access
> +# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
> +
> +# Check if the sample module is loaded
> +if ! lsmod | grep -q trace_events_sample; then
> +  modprobe trace-events-sample || exit_unsupported

I think this should be exit_unresolved.

> +fi
[...]
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
> new file mode 100644
> index 000000000000..acf0b5a917d3
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
> @@ -0,0 +1,107 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +# description: BTF typecast and percpu access syntax validation
> +# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
> +
> +KPROBES=
> +FPROBES=
> +
> +if grep -qF "p[:[<group>/][<event>]] <place> [<args>]" README ; then
> +  KPROBES=yes
> +fi
> +if grep -qF "f[:[<group>/][<event>]] <func-name>[%return] [<args>]" README ; then
> +  FPROBES=yes
> +fi
> +
> +if [ -z "$KPROBES" -a -z "$FPROBES" ] ; then
> +  exit_unsupported
> +fi
> +
> +echo 0 > events/enable
> +echo > dynamic_events
> +
> +# Load trace-events-sample module if available to have per-CPU counter structure defined
> +if ! lsmod | grep -q trace_events_sample; then
> +  modprobe trace-events-sample || true

I think this must be tested, and if the module is not found,
the test must return UNRESOLVED error.

> +fi
[...]
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
> index 0e65e787e426..1d6d1cf94f16 100644
> --- a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
> @@ -21,8 +21,17 @@ check_error 'e:foo/^bar.1 syscalls/sys_enter_openat'	# BAD_EVENT_NAME
>  
>  check_error 'e:foo/bar syscalls/sys_enter_openat arg=^$foo'	# BAD_ATTACH_ARG
>  
> +check_error 'e:foo/bar syscalls/sys_enter_openat arg=^COMM'	# NO_EVENT_FIELD
> +if grep -q '\\$current' README; then

Sashiko found a problem here. Indeed. this is wrong. See below:

> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
> index c817158b99db..e12dc967ec76 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
> @@ -28,4 +28,9 @@ if grep -q ".*symstr.*" README; then
>  check_error 'p /bin/sh:10 $stack0:^symstr'	# BAD_TYPE
>  fi
>  
> +# $current is not supported by uprobe
> +if grep -q "\$current.*" README; then

This is correct check.

Let me fix those.

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors
From: Wen Yang @ 2026-06-28 16:47 UTC (permalink / raw)
  To: Gabriele Monaco; +Cc: linux-trace-kernel, linux-kernel
In-Reply-To: <a645534b04d9bf32f2c2caa5d2206bb87cfcb3e1.camel@redhat.com>



On 6/16/26 17:49, Gabriele Monaco wrote:
> On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
>> From: Wen Yang <wen.yang@linux.dev>
>>
>> Introduce rv_uprobe, a thin wrapper around uprobe_consumer providing
>> rv_uprobe_attach_path(), rv_uprobe_attach(), and rv_uprobe_detach()
>> for RV monitors.  An opaque priv pointer is forwarded unchanged to
>> entry/return handlers so monitors can carry per-binding state (e.g. a
>> latency threshold) to the hot path without any global lookup.
>>
>> rv_uprobe_detach() is fully synchronous (nosync + sync + path_put +
>> kfree), closing the use-after-free window present in open-coded
>> patterns where kfree() precedes uprobe_unregister_sync().
>>
>> Suggested-by: Gabriele Monaco <gmonaco@redhat.com>
>> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> 
> I find your implementation solid, but I wonder if it isn't a bit
> overdoing it. It's good to have helper functions abstracting away the
> common uprobes code, but I find the number of structures used to indirect/hide
> the process a bit confusing.
> 
> Users of this headers still know they're using uprobes, so we don't
> really need to hide it that much, plus the double function pointers are
> an additional performance hit we could easily skip.
> 
> As I get it, we want to pass a private pointer (essentially to get the
> threshold in tlob) and the uprobe infrastructure doesn't allow that
> transparently. So let's keep on using struct embedding but a bit more
> cut to the bone:
> 
> 	struct rv_uprobe {
> 		struct uprobe_consumer	uc;
> 		struct uprobe		*uprobe;
> 		void			*priv;
> 	};
> 
> Then handlers would use container_of() appropriately to derive the data they
> need.
> 
> 	static int tlob_uprobe_entry_handler(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data)
> 	{
> 		struct rv_uprobe *p = container_of(self, struct rv_uprobe, uc);
> 		struct tlob_uprobe_binding *b = p->priv;
> 
> 		...
> 	}
> 
> You could even define the struct directly inside your private data and save one
> step (and no void *):
> 
> 	struct rv_uprobe {
> 		struct uprobe_consumer	uc;
> 		struct uprobe		*uprobe;
> 	};
> 
> 	#define DECLARE_RV_UPROBE(name) struct rv_uprobe name
> 
> 	// And then
> 
> 	struct tlob_uprobe_binding {
> 		u64 threshold_ns;
> 		...
> 		DECLARE_RV_UPROBE(start);
> 		DECLARE_RV_UPROBE(stop);
> 	}
> 
> 	static int tlob_uprobe_entry_handler(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data)
> 	{
> 		struct tlob_uprobe_binding *b = container_of(self, struct tlob_uprobe_binding, start.uc);
> 
> 		...
> 	}
> 
> 
> Now I intentionally skipped offset, as I don't really see why carry it around,
> and path, as we could live without it by relying on inodes (uprobes
> reference-track them but you'd need to save them somewhere).
> 
> Mind, I only build tested this idea (appended after). This was supported by an
> LLM which did a few more changes like standardising names and return values of
> registration functions (a bit more consistent with other stuff, you don't have
> to commit with that though).
> 
> By the way, do we really need the duplicate check? Would it break if a user
> defines it twice? If not, we could simplify the tlob_add_uprobe() even further
> and doing all path related checks in the library, since those aren't quite tlob
> specific.
> 
> Thoughts?
> 

Hi Gabriele,

Thanks for the detailed v3 review.  All feedback is addressed in v4;
only one item ([2-1]) required a design deviation due to a UAF found 
during PREEMPT_RT testing.

On [2-1]~[2-5]: embedded consumer causes UAF on PREEMPT_RT
-----------------------------------------------------------

The uprobe_bind selftest oopses on PREEMPT_RT(full):

   handler_chain+0xc9: mov rax, [r15+0x18]  ; advance list iterator
   RAX: 000015ec00001f28   ; garbage — &uprobe->consumers after kfree

handler_chain() reads uc->cons_node.next after uc->handler() returns,
still inside rcu_read_lock_trace().  That pointer is &uprobe->consumers
(the list head embedded in struct uprobe), which gets freed through:

   put_uprobe()
     -> schedule_work(uprobe_free_deferred)   /* async */
        -> call_srcu(&uretprobes_srcu, ...)
           -> call_rcu_tasks_trace(kfree_uprobe)
              -> kfree(uprobe)

rv_uprobe_sync() calls uprobe_unregister_sync() which calls
synchronize_srcu(&uretprobes_srcu), but that only matters after the
kworker has submitted work to uretprobes_srcu.  On a loaded PREEMPT_RT
box the kworker may not have run yet, so synchronize_srcu() returns
immediately and kfree(uprobe) races with the still-iterating
handler_chain():

   CPU A                                CPU B
   consumer_del() → list_del_rcu        rcu_read_lock_trace()
   put_uprobe()   → schedule_work         uc->handler() returns
   rv_uprobe_sync():                       reading cons_node.next...
     synchronize_srcu(&uretprobes_srcu)
     ← idle; returns immediately
   [kworker fires later]
     kfree(uprobe)  ← frees &uprobe->consumers
                                         cons_node.next = freed mem → CRASH

With uc embedded in the binding (as [2-1] suggests), no amount of
delaying kfree(binding) helps: uprobe->consumers is freed by a chain we
don't control.  The fix is to keep uc->cons_node in memory that outlives
the handler_chain() iteration, which means a separate allocation freed
only after rv_uprobe_sync():

   rv_uprobe_unregister_nosync()  /* list_del_rcu + schedule_work */
   rv_uprobe_sync()               /* waits for any already-submitted 
srcu work */
   rv_uprobe_free()               /* kfree(impl) — safe, iteration is 
done */
   kfree(b)                       /* binding; never contained uc */

v4 keeps the public API shape you suggested with impl private:

   struct rv_uprobe {
       struct rv_uprobe_impl  *impl;  /* allocated by 
rv_uprobe_register() */
       struct inode           *inode;
   };
   #define DECLARE_RV_UPROBE(name)  struct rv_uprobe name        /* [2-2] */

   int   rv_uprobe_register(const char *binpath, loff_t offset,
                            struct rv_uprobe *p, handler_fn,
                            ret_handler_fn, void *priv);         /* [2-4] */
   void  rv_uprobe_unregister(struct rv_uprobe *p);
   void  rv_uprobe_unregister_nosync(struct rv_uprobe *p);
   void  rv_uprobe_sync(void);
   void  rv_uprobe_free(struct rv_uprobe *p);                    /* 
[2-5] restored */
   bool  rv_uprobe_is_registered(const struct rv_uprobe *p);     /* 
[2-7] added */
   void *rv_uprobe_get_priv(struct uprobe_consumer *uc);

Handler signature is (struct uprobe_consumer *self, ...) [2-3]; private
data is retrieved via rv_uprobe_get_priv(self) instead of container_of().

Then, rv_uprobe_free() is restored ([2-5] partially reverted).


All other v3 items have been resolved, we are waiting for your comments 
on the above（embedded consumer causes UAF）:

[1-1,1-3]  `# define` / `# error` space removed
[1-4]      `⟹` -> `=>`
[1-6,1-7]  #if/#else in da_destroy_storage() and da_monitor_init()
            replaced with plain if (DA_MON_ALLOCATION_STRATEGY == 
DA_ALLOC_POOL)
[1-8]      tracepoint_synchronize_unregister() lifted to 
da_monitor_destroy before the pool/kmalloc branch

[2-2]  DECLARE_RV_UPROBE(name) — kept
[2-3]  handler sig (struct uprobe_consumer *self, ...) — kept
[2-4]  rv_uprobe_register() returns int — kept
[2-5]  rv_uprobe_attach/detach removed; rv_uprobe_free() restored (see 
above)
[2-6]  offset, path fields removed; inode used for identity
[2-7]  rv_uprobe_is_registered() added

[6-1]  Suggested-by removed
[6-2]  tlob Kconfig entry placed after the deadline monitors marker
[6-3]  Unnecessary includes removed (hrtimer.h, ktime.h, sched.h, tracefs.h)
[6-4]  `#include "../../rv.h"` -> `<rv.h>`
[6-5,6-6]  Verbose comments around DA_MON_POOL_SIZE and da_extra_cleanup
            simplified; early return in tlob_reset_notify() on
            !trace_detail_env_tlob_enabled()
[6-7]  ha_setup_invariants():
          if (atomic_read_acquire(&target->stopping)) return;
          if (next_state < state_max_tlob) ha_start_timer_ns(...);
          else ha_cancel_timer(ha_mon);
[6-8]  offsetof() arithmetic -> enum-indexed array:
          enum tlob_acc_idx { TLOB_ACC_RUNNING, TLOB_ACC_WAITING,
                              TLOB_ACC_SLEEPING, TLOB_ACC_MAX };
          u64 accs_ns[TLOB_ACC_MAX];
[6-9]  1000 → TLOB_MIN_THRESHOLD_NS
[6-10] WARN_ON_ONCE(!da_mon) -> if (unlikely(WARN_ON_ONCE(!da_mon)))
[6-11] __free(kfree) + list_add_tail(&no_free_ptr(b)->list, ...) in
        tlob_add_uprobe(); note that "b = no_free_ptr(b)" would restore b
        and re-trigger __free on return — the correct pattern is to use
        no_free_ptr() as the argument to list_add_tail() directly
[6-15] tlob.h enum/struct order aligned with rvgen output
[6-16] tracefs_create_file() -> rv_create_file()

[7-1]  KUnit tests call tlob_parse_uprobe_line/tlob_parse_remove_line
        directly; no real uprobe creation from unit tests
[7-2]  KUNIT_EXPECT_EQ(test, ..., 0) for valid-line cases

[8-1]  ftracetest: walk-up algorithm to locate test.d/functions
[9-4]  tlob_busy/sleep/preempt_work unified to duration_ms


--
Best wishes,
Wen


> Gabriele
> 
> **Suggested simplification:**
> 
> ---
> diff --git a/include/rv/rv_uprobe.h b/include/rv/rv_uprobe.h
> index 9106c5c927..4fb3f50a63 100644
> --- a/include/rv/rv_uprobe.h
> +++ b/include/rv/rv_uprobe.h
> @@ -7,83 +7,56 @@
>   #ifndef _RV_UPROBE_H
>   #define _RV_UPROBE_H
>   
> -#include <linux/path.h>
>   #include <linux/types.h>
> +#include <linux/uprobes.h>
>   
>   struct pt_regs;
> +struct inode;
>   
>   /**
>    * struct rv_uprobe - a single uprobe registered on behalf of an RV monitor
>    *
> - * @offset:   byte offset within the ELF binary where the probe is installed
> - * @priv:     monitor-private pointer; set at attach time, never touched by
> - *            this layer; passed unchanged to entry_fn / ret_fn
> - * @path:     resolved path of the probed binary (read-only after attach);
> - *            callers may use path.dentry for identity comparisons
> - *
> - * The implementation fields (uprobe_consumer, uprobe handle, callbacks) are
> - * private to rv_uprobe.c and are not exposed here; monitors must not access
> - * them directly.
> + * @uc:       underlying uprobe consumer (publicly visible)
> + * @uprobe:   active uprobe structure handle
> + * @inode:    inode of the target binary (read-only after registration)
>    */
>   struct rv_uprobe {
> -	/* public: read-only after rv_uprobe_attach*() */
> -	loff_t		 offset;
> -	void		*priv;
> -	struct path	 path;
> +	struct uprobe_consumer	uc;
> +	struct uprobe		*uprobe;
> +	struct inode		*inode;
>   };
>   
> -/**
> - * rv_uprobe_attach_path - register an uprobe given an already-resolved path
> - * @path:     path of the target binary; rv_uprobe takes its own reference
> - * @offset:   byte offset within the binary
> - * @entry_fn: called on probe hit (entry); may be NULL
> - * @ret_fn:   called on function return (uretprobe); may be NULL
> - * @priv:     opaque pointer forwarded to callbacks unchanged
> - *
> - * Use this variant when the caller has already resolved the path (e.g. to
> - * register multiple probes on the same binary with a single kern_path call).
> - * The inode is derived internally via d_real_inode(), so inode and path are
> - * always consistent.
> - *
> - * Returns a pointer to the new rv_uprobe on success, ERR_PTR on failure.
> - */
> -struct rv_uprobe *rv_uprobe_attach_path(struct path *path, loff_t offset,
> -	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
> -	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -			struct pt_regs *regs, __u64 *data),
> -	void *priv);
> +/* Seamless inline declaration of a named uprobe inside user structs */
> +#define DECLARE_RV_UPROBE(name) \
> +	struct rv_uprobe name
>   
>   /**
> - * rv_uprobe_attach - resolve binpath and register an uprobe
> - * @binpath:  absolute path to the target binary
> - * @offset:   byte offset within the binary
> - * @entry_fn: called on probe hit (entry); may be NULL
> - * @ret_fn:   called on function return (uretprobe); may be NULL
> - * @priv:     opaque pointer forwarded to callbacks unchanged
> + * rv_uprobe_register - resolve binpath and register an uprobe
> + * @binpath:     absolute path to the target binary
> + * @offset:      byte offset within the binary
> + * @p:           pointer to the allocated/embedded rv_uprobe structure
> + * @handler:     called on probe hit (entry); may be NULL
> + * @ret_handler: called on function return (uretprobe); may be NULL
>    *
> - * Resolves binpath via kern_path(), then delegates to rv_uprobe_attach_path().
> + * Resolves binpath via kern_path(), registers the uprobe directly using the
> + * embedded `uprobe_consumer`, and immediately releases the path reference.
>    *
> - * Returns a pointer to the new rv_uprobe on success, ERR_PTR on failure.
> + * Returns 0 on success, or a negative error code on failure.
>    */
> -struct rv_uprobe *rv_uprobe_attach(const char *binpath, loff_t offset,
> -	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
> -	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -			struct pt_regs *regs, __u64 *data),
> -	void *priv);
> +int rv_uprobe_register(const char *binpath, loff_t offset, struct rv_uprobe *p,
> +	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data),
> +	int (*ret_handler)(struct uprobe_consumer *self, unsigned long func,
> +			   struct pt_regs *regs, __u64 *data));
>   
>   /**
> - * rv_uprobe_detach - synchronously unregister an uprobe and free it
> - * @p:  probe to detach; may be NULL (no-op)
> - *
> - * Calls uprobe_unregister_nosync(), then uprobe_unregister_sync() to wait
> - * for any in-progress handler to finish, then releases the path reference
> - * and frees the rv_uprobe struct.  The caller's priv data is NOT freed.
> + * rv_uprobe_unregister - synchronously unregister a uprobe
> + * @p:  probe to unregister; may be NULL (no-op)
>    *
> - * When removing a single probe, prefer this over the three-phase API.
> - * Safe to call from process context only (uprobe_unregister_sync() may
> - * schedule).
> + * Dequeues the uprobe and waits synchronously for all in-flight handlers
> + * to complete.
>    */
> -void rv_uprobe_detach(struct rv_uprobe *p);
> +void rv_uprobe_unregister(struct rv_uprobe *p);
>   
>   /**
>    * rv_uprobe_unregister_nosync - dequeue an uprobe without waiting
> @@ -91,9 +64,7 @@ void rv_uprobe_detach(struct rv_uprobe *p);
>    *
>    * Removes the uprobe from the uprobe subsystem but does NOT wait for
>    * in-flight handlers to complete.  The caller must call rv_uprobe_sync()
> - * before calling rv_uprobe_free() on the same probe.
> - *
> - * Use this to batch multiple deregistrations before a single rv_uprobe_sync().
> + * before freeing any container holding this probe.
>    */
>   void rv_uprobe_unregister_nosync(struct rv_uprobe *p);
>   
> @@ -101,19 +72,8 @@ void rv_uprobe_unregister_nosync(struct rv_uprobe *p);
>    * rv_uprobe_sync - wait for all in-flight uprobe handlers to complete
>    *
>    * Global barrier: waits for every in-flight uprobe handler across the system
> - * to finish.  Call once after a batch of rv_uprobe_unregister_nosync() calls
> - * and before any rv_uprobe_free() call.
> + * to finish.
>    */
>   void rv_uprobe_sync(void);
>   
> -/**
> - * rv_uprobe_free - release resources of a previously deregistered probe
> - * @p:  probe to free; may be NULL (no-op)
> - *
> - * Releases the path reference and frees the rv_uprobe struct.  Must only
> - * be called after rv_uprobe_sync() has returned.  The caller's priv data
> - * is NOT freed.
> - */
> -void rv_uprobe_free(struct rv_uprobe *p);
> -
>   #endif /* _RV_UPROBE_H */
> diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c
> index d8e0c47947..28a6c740c7 100644
> --- a/kernel/trace/rv/monitors/tlob/tlob.c
> +++ b/kernel/trace/rv/monitors/tlob/tlob.c
> @@ -252,8 +252,8 @@ struct tlob_uprobe_binding {
>   	char			binpath[TLOB_MAX_PATH];
>   	loff_t			offset_start;
>   	loff_t			offset_stop;
> -	struct rv_uprobe	*start_probe;
> -	struct rv_uprobe	*stop_probe;
> +	DECLARE_RV_UPROBE(start_probe);
> +	DECLARE_RV_UPROBE(stop_probe);
>   };
>   
>   /* RCU callback: free the slab once no readers remain. */
> @@ -512,16 +512,16 @@ int tlob_stop_task(struct task_struct *task)
>   EXPORT_SYMBOL_GPL(tlob_stop_task);
>   
>   
> -static int tlob_uprobe_entry_handler(struct rv_uprobe *p, struct pt_regs *regs,
> +static int tlob_uprobe_entry_handler(struct uprobe_consumer *self, struct pt_regs *regs,
>   				     __u64 *data)
>   {
> -	struct tlob_uprobe_binding *b = p->priv;
> +	struct tlob_uprobe_binding *b = container_of(self, struct tlob_uprobe_binding, start_probe.uc);
>   
>   	tlob_start_task(current, b->threshold_ns);
>   	return 0;
>   }
>   
> -static int tlob_uprobe_stop_handler(struct rv_uprobe *p, struct pt_regs *regs,
> +static int tlob_uprobe_stop_handler(struct uprobe_consumer *self, struct pt_regs *regs,
>   				    __u64 *data)
>   {
>   	tlob_stop_task(current);
> @@ -537,6 +537,7 @@ static int tlob_add_uprobe(u64 threshold_ns, const char *binpath,
>   {
>   	struct tlob_uprobe_binding *b, *tmp_b;
>   	char pathbuf[TLOB_MAX_PATH];
> +	struct inode *inode;
>   	struct path path;
>   	char *canon;
>   	int ret;
> @@ -561,10 +562,12 @@ static int tlob_add_uprobe(u64 threshold_ns, const char *binpath,
>   		goto err_path;
>   	}
>   
> -	/* Reject duplicate start offset for the same binary. */
> +	inode = d_real_inode(path.dentry);
> +
> +	/* Reject duplicate start offset for the same binary inode. */
>   	list_for_each_entry(tmp_b, &tlob_uprobe_list, list) {
>   		if (tmp_b->offset_start == offset_start &&
> -		    tmp_b->start_probe->path.dentry == path.dentry) {
> +		    tmp_b->start_probe.inode == inode) {
>   			ret = -EEXIST;
>   			goto err_path;
>   		}
> @@ -577,29 +580,25 @@ static int tlob_add_uprobe(u64 threshold_ns, const char *binpath,
>   	}
>   	strscpy(b->binpath, canon, sizeof(b->binpath));
>   
> -	/* Both probes share b (priv) and path; attach_path refs path itself. */
> -	b->start_probe = rv_uprobe_attach_path(&path, offset_start,
> -					       tlob_uprobe_entry_handler, NULL, b);
> -	if (IS_ERR(b->start_probe)) {
> -		ret = PTR_ERR(b->start_probe);
> -		b->start_probe = NULL;
> -		goto err_path;
> -	}
> +	path_put(&path);
> +
> +	/* Both probes are registered directly on the embedded fields */
> +	ret = rv_uprobe_register(b->binpath, offset_start, &b->start_probe,
> +				 tlob_uprobe_entry_handler, NULL);
> +	if (ret)
> +		goto err_free;
>   
> -	b->stop_probe = rv_uprobe_attach_path(&path, offset_stop,
> -					      tlob_uprobe_stop_handler, NULL, b);
> -	if (IS_ERR(b->stop_probe)) {
> -		ret = PTR_ERR(b->stop_probe);
> -		b->stop_probe = NULL;
> +	ret = rv_uprobe_register(b->binpath, offset_stop, &b->stop_probe,
> +				 tlob_uprobe_stop_handler, NULL);
> +	if (ret)
>   		goto err_start;
> -	}
>   
> -	path_put(&path);
>   	list_add_tail(&b->list, &tlob_uprobe_list);
>   	return 0;
>   
>   err_start:
> -	rv_uprobe_detach(b->start_probe);
> +	rv_uprobe_unregister(&b->start_probe);
> +	goto err_free;
>   err_path:
>   	path_put(&path);
>   err_free:
> @@ -611,21 +610,24 @@ static int tlob_remove_uprobe_by_key(loff_t offset_start, const char *binpath)
>   {
>   	struct tlob_uprobe_binding *b, *tmp;
>   	struct path remove_path;
> +	struct inode *inode;
>   	int ret;
>   
>   	ret = kern_path(binpath, LOOKUP_FOLLOW, &remove_path);
>   	if (ret)
>   		return ret;
>   
> +	inode = d_real_inode(remove_path.dentry);
> +
>   	ret = -ENOENT;
>   	list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
>   		if (b->offset_start != offset_start)
>   			continue;
> -		if (b->start_probe->path.dentry != remove_path.dentry)
> +		if (b->start_probe.inode != inode)
>   			continue;
>   		list_del(&b->list);
> -		rv_uprobe_detach(b->start_probe);
> -		rv_uprobe_detach(b->stop_probe);
> +		rv_uprobe_unregister(&b->start_probe);
> +		rv_uprobe_unregister(&b->stop_probe);
>   		kfree(b);
>   		ret = 0;
>   		break;
> @@ -643,8 +645,8 @@ static void tlob_remove_all_uprobes(void)
>   	mutex_lock(&tlob_uprobe_mutex);
>   	list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
>   		list_move(&b->list, &pending);
> -		rv_uprobe_unregister_nosync(b->start_probe);
> -		rv_uprobe_unregister_nosync(b->stop_probe);
> +		rv_uprobe_unregister_nosync(&b->start_probe);
> +		rv_uprobe_unregister_nosync(&b->stop_probe);
>   	}
>   	mutex_unlock(&tlob_uprobe_mutex);
>   
> @@ -658,8 +660,6 @@ static void tlob_remove_all_uprobes(void)
>   	rv_uprobe_sync();
>   
>   	list_for_each_entry_safe(b, tmp, &pending, list) {
> -		rv_uprobe_free(b->start_probe);
> -		rv_uprobe_free(b->stop_probe);
>   		kfree(b);
>   	}
>   }
> diff --git a/kernel/trace/rv/rv_uprobe.c b/kernel/trace/rv/rv_uprobe.c
> index 3d8b764dde..69b8b0c27e 100644
> --- a/kernel/trace/rv/rv_uprobe.c
> +++ b/kernel/trace/rv/rv_uprobe.c
> @@ -10,149 +10,74 @@
>   #include <linux/uprobes.h>
>   #include <rv/rv_uprobe.h>
>   
> -/*
> - * Private extension of struct rv_uprobe.  Allocated by rv_uprobe_attach*()
> - * and returned to callers as &impl->pub.
> - */
> -struct rv_uprobe_impl {
> -	struct rv_uprobe	pub;	/* must be first; callers hold &pub */
> -	struct uprobe_consumer	uc;
> -	struct uprobe		*uprobe;
> -	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data);
> -	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -			struct pt_regs *regs, __u64 *data);
> -};
> -
> -static int rv_uprobe_handler(struct uprobe_consumer *uc,
> -			     struct pt_regs *regs, __u64 *data)
> -{
> -	struct rv_uprobe_impl *impl = container_of(uc, struct rv_uprobe_impl, uc);
> -
> -	if (impl->entry_fn)
> -		return impl->entry_fn(&impl->pub, regs, data);
> -	return 0;
> -}
> -
> -static int rv_uprobe_ret_handler(struct uprobe_consumer *uc,
> -				 unsigned long func,
> -				 struct pt_regs *regs, __u64 *data)
> -{
> -	struct rv_uprobe_impl *impl = container_of(uc, struct rv_uprobe_impl, uc);
> -
> -	if (impl->ret_fn)
> -		return impl->ret_fn(&impl->pub, func, regs, data);
> -	return 0;
> -}
> -
> -static struct rv_uprobe *
> -__rv_uprobe_attach(struct inode *inode, struct path *path, loff_t offset,
> -		   int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
> -		   int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -				   struct pt_regs *regs, __u64 *data),
> -		   void *priv)
> -{
> -	struct rv_uprobe_impl *impl;
> -	int ret;
> -
> -	if (!entry_fn && !ret_fn)
> -		return ERR_PTR(-EINVAL);
> -
> -	impl = kzalloc_obj(*impl, GFP_KERNEL);
> -	if (!impl)
> -		return ERR_PTR(-ENOMEM);
> -
> -	impl->pub.offset = offset;
> -	impl->pub.priv   = priv;
> -	impl->entry_fn   = entry_fn;
> -	impl->ret_fn     = ret_fn;
> -	path_get(path);
> -	impl->pub.path   = *path;
> -
> -	if (entry_fn)
> -		impl->uc.handler     = rv_uprobe_handler;
> -	if (ret_fn)
> -		impl->uc.ret_handler = rv_uprobe_ret_handler;
> -
> -	impl->uprobe = uprobe_register(inode, offset, 0, &impl->uc);
> -	if (IS_ERR(impl->uprobe)) {
> -		ret = PTR_ERR(impl->uprobe);
> -		path_put(&impl->pub.path);
> -		kfree(impl);
> -		return ERR_PTR(ret);
> -	}
> -
> -	return &impl->pub;
> -}
> -
> -/**
> - * rv_uprobe_attach_path - register an uprobe given an already-resolved path
> - */
> -struct rv_uprobe *rv_uprobe_attach_path(struct path *path, loff_t offset,
> -	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
> -	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -			struct pt_regs *regs, __u64 *data),
> -	void *priv)
> -{
> -	struct inode *inode = d_real_inode(path->dentry);
> -
> -	return __rv_uprobe_attach(inode, path, offset, entry_fn, ret_fn, priv);
> -}
> -EXPORT_SYMBOL_GPL(rv_uprobe_attach_path);
> -
>   /**
> - * rv_uprobe_attach - resolve binpath and register an uprobe
> + * rv_uprobe_register - resolve binpath and register an uprobe
>    */
> -struct rv_uprobe *rv_uprobe_attach(const char *binpath, loff_t offset,
> -	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
> -	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
> -			struct pt_regs *regs, __u64 *data),
> -	void *priv)
> +int rv_uprobe_register(const char *binpath, loff_t offset, struct rv_uprobe *p,
> +	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data),
> +	int (*ret_handler)(struct uprobe_consumer *self, unsigned long func,
> +			   struct pt_regs *regs, __u64 *data))
>   {
> -	struct rv_uprobe *p;
> +	struct inode *inode;
>   	struct path path;
>   	int ret;
>   
> +	if (!handler && !ret_handler)
> +		return -EINVAL;
> +
>   	ret = kern_path(binpath, LOOKUP_FOLLOW, &path);
>   	if (ret)
> -		return ERR_PTR(ret);
> +		return ret;
>   
>   	if (!d_is_reg(path.dentry)) {
>   		path_put(&path);
> -		return ERR_PTR(-EINVAL);
> +		return -EINVAL;
>   	}
>   
> -	p = rv_uprobe_attach_path(&path, offset, entry_fn, ret_fn, priv);
> +	inode = d_real_inode(path.dentry);
> +
> +	p->uc.handler     = handler;
> +	p->uc.ret_handler = ret_handler;
> +	p->inode          = inode;
> +
> +	p->uprobe = uprobe_register(inode, offset, 0, &p->uc);
>   	path_put(&path);
> -	return p;
> +
> +	if (IS_ERR(p->uprobe)) {
> +		ret = PTR_ERR(p->uprobe);
> +		p->uprobe = NULL;
> +		p->inode  = NULL;
> +		return ret;
> +	}
> +
> +	return 0;
>   }
> -EXPORT_SYMBOL_GPL(rv_uprobe_attach);
> +EXPORT_SYMBOL_GPL(rv_uprobe_register);
>   
>   /**
> - * rv_uprobe_detach - synchronously unregister an uprobe and free it
> + * rv_uprobe_unregister - synchronously unregister a uprobe
>    */
> -void rv_uprobe_detach(struct rv_uprobe *p)
> +void rv_uprobe_unregister(struct rv_uprobe *p)
>   {
> -	if (!p)
> +	if (!p || IS_ERR_OR_NULL(p->uprobe))
>   		return;
>   
>   	rv_uprobe_unregister_nosync(p);
>   	rv_uprobe_sync();
> -	rv_uprobe_free(p);
>   }
> -EXPORT_SYMBOL_GPL(rv_uprobe_detach);
> +EXPORT_SYMBOL_GPL(rv_uprobe_unregister);
>   
>   /**
>    * rv_uprobe_unregister_nosync - dequeue an uprobe without waiting
>    */
>   void rv_uprobe_unregister_nosync(struct rv_uprobe *p)
>   {
> -	struct rv_uprobe_impl *impl;
> -
> -	if (!p)
> +	if (!p || IS_ERR_OR_NULL(p->uprobe))
>   		return;
>   
> -	impl = container_of(p, struct rv_uprobe_impl, pub);
> -	uprobe_unregister_nosync(impl->uprobe, &impl->uc);
> +	uprobe_unregister_nosync(p->uprobe, &p->uc);
> +	p->uprobe = NULL;
> +	p->inode  = NULL;
>   }
>   EXPORT_SYMBOL_GPL(rv_uprobe_unregister_nosync);
>   
> @@ -164,19 +89,3 @@ void rv_uprobe_sync(void)
>   	uprobe_unregister_sync();
>   }
>   EXPORT_SYMBOL_GPL(rv_uprobe_sync);
> -
> -/**
> - * rv_uprobe_free - release resources of a previously deregistered probe
> - */
> -void rv_uprobe_free(struct rv_uprobe *p)
> -{
> -	struct rv_uprobe_impl *impl;
> -
> -	if (!p)
> -		return;
> -
> -	impl = container_of(p, struct rv_uprobe_impl, pub);
> -	path_put(&p->path);
> -	kfree(impl);
> -}
> -EXPORT_SYMBOL_GPL(rv_uprobe_free);
> 

^ permalink raw reply

* Re: [PATCH v3 17/17] selftests/verification: Add selftests for deadline and stall monitors
From: Wen Yang @ 2026-06-28 16:58 UTC (permalink / raw)
  To: Gabriele Monaco, linux-trace-kernel, linux-kernel, Steven Rostedt,
	Shuah Khan, linux-kselftest
  Cc: Nam Cao, Thomas Weissschuh, Tomas Glozar, John Kacur
In-Reply-To: <20260625121440.116317-18-gmonaco@redhat.com>



On 6/25/26 20:14, Gabriele Monaco wrote:
> Add selftests to verify deadline monitors don't fail under expected
> conditions and the stall monitor report violations only when expected.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   .../verification/test.d/rv_deadline.tc        | 21 ++++++++++++
>   .../selftests/verification/test.d/rv_stall.tc | 33 +++++++++++++++++++
>   2 files changed, 54 insertions(+)
>   create mode 100644 tools/testing/selftests/verification/test.d/rv_deadline.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/rv_stall.tc
> 
> diff --git a/tools/testing/selftests/verification/test.d/rv_deadline.tc b/tools/testing/selftests/verification/test.d/rv_deadline.tc
> new file mode 100644
> index 0000000000..b583096bed
> --- /dev/null
> +++ b/tools/testing/selftests/verification/test.d/rv_deadline.tc
> @@ -0,0 +1,21 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# description: Test deadline monitors trigger no reaction
> +# requires: available_reactors deadline:monitor printk:reactor stress-ng:program
> +
> +load() { # returns true if there was a reaction
> +	local lines_before
> +	lines_before=$(dmesg | wc -l)
> +	stress-ng --cpu 2 --sched deadline --sched-period 100000000 --sched-deadline 100000000 --sched-runtime 20000000 -t 5 &
> +	stress-ng --cpu 2 --sched rr --sched-prio 50 --cyclic 1 --cyclic-policy rr --cyclic-prio 50 -t 5 &
> +	wait
> +	dmesg | tail -n +$((lines_before + 1)) | grep -q "rv: monitor [a-z]\+ does not allow event"
> +}
> +
> +echo 1 > monitors/deadline/enable
> +echo printk > monitors/deadline/reactors
> +
> +! load || false
> +
> +echo nop > monitors/deadline/reactors
> +echo 0 > monitors/deadline/enable
> diff --git a/tools/testing/selftests/verification/test.d/rv_stall.tc b/tools/testing/selftests/verification/test.d/rv_stall.tc
> new file mode 100644
> index 0000000000..8e9bcbb441
> --- /dev/null
> +++ b/tools/testing/selftests/verification/test.d/rv_stall.tc
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# description: Test stall monitor
> +# requires: available_reactors stall:monitor printk:reactor stress-ng:program
> +
> +THRESHOLD=/sys/module/stall/parameters/threshold_jiffies
> +
> +load() { # returns true if there was a reaction
> +	local lines_before cpu
> +	cpu=$(($(nproc) - 1))
> +	lines_before=$(dmesg | wc -l)
> +	stress-ng --cpu 1 --taskset "$cpu" --sched rr --sched-prio 1 -t 3 &
> +	stress-ng --cpu 5 --taskset "$cpu" -t 3 &
> +	wait
> +	dmesg | tail -n +$((lines_before + 1)) | grep -q "rv: monitor stall does not allow event"
> +}
> +
> +echo 5000 > $THRESHOLD
> +echo 1 > monitors/stall/enable
> +echo printk > monitors/stall/reactors
> +
> +! load || false
> +
> +echo 0 > monitors/stall/enable
> +echo 70 > $THRESHOLD
> +echo 1 > monitors/stall/enable
> +
> +load
> +
> +echo nop > monitors/stall/reactors
> +echo 0 > monitors/stall/enable
> +
> +echo 1000 > $THRESHOLD

A little worried:
If the system had a different value configured before the test ran,
this silently changes system state for anything running after it.


--
Best wishes,
Wen





^ permalink raw reply

* Re: [PATCH v3 14/17] verification/rvgen: Add selftests for rvgen kunit
From: Wen Yang @ 2026-06-28 17:06 UTC (permalink / raw)
  To: Gabriele Monaco, linux-trace-kernel, linux-kernel, Steven Rostedt
  Cc: Nam Cao, Thomas Weissschuh, Tomas Glozar, John Kacur
In-Reply-To: <20260625121440.116317-15-gmonaco@redhat.com>



On 6/25/26 20:14, Gabriele Monaco wrote:
> The rvgen kunit command patches monitor files and adds necessary
> definitions for kunit tests.
> 
> Add a test case validating its behaviour on dummy generated files and
> comparing it against reference files, like it's done for rvgen monitor.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   .../rvgen/rvgen/templates/kunit.c             |   2 +-
>   .../rvgen/tests/golden/test_da_kunit/Kconfig  |   9 +
>   .../golden/test_da_kunit/test_da_kunit.c      | 111 ++++++++
>   .../golden/test_da_kunit/test_da_kunit.h      |  47 ++++
>   .../test_da_kunit/test_da_kunit_kunit.c       |  29 ++
>   .../test_da_kunit/test_da_kunit_kunit.h       |  23 ++
>   .../test_da_kunit/test_da_kunit_trace.h       |  15 +
>   .../rvgen/tests/golden/test_ha_kunit/Kconfig  |   9 +
>   .../golden/test_ha_kunit/test_ha_kunit.c      | 264 ++++++++++++++++++
>   .../golden/test_ha_kunit/test_ha_kunit.h      |  88 ++++++
>   .../test_ha_kunit/test_ha_kunit_kunit.c       |  29 ++
>   .../test_ha_kunit/test_ha_kunit_kunit.h       |  24 ++
>   .../test_ha_kunit/test_ha_kunit_trace.h       |  19 ++
>   .../rvgen/tests/golden/test_ltl_kunit/Kconfig |   9 +
>   .../golden/test_ltl_kunit/test_ltl_kunit.c    | 107 +++++++
>   .../golden/test_ltl_kunit/test_ltl_kunit.h    | 108 +++++++
>   .../test_ltl_kunit/test_ltl_kunit_kunit.c     |  29 ++
>   .../test_ltl_kunit/test_ltl_kunit_kunit.h     |  22 ++
>   .../test_ltl_kunit/test_ltl_kunit_trace.h     |  14 +
>   tools/verification/rvgen/tests/rvgen_kunit.t  |  32 +++
>   20 files changed, 989 insertions(+), 1 deletion(-)
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/Kconfig
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_trace.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/Kconfig
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_trace.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/Kconfig
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.c
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.h
>   create mode 100644 tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_trace.h
>   create mode 100644 tools/verification/rvgen/tests/rvgen_kunit.t
> 
> diff --git a/tools/verification/rvgen/rvgen/templates/kunit.c b/tools/verification/rvgen/rvgen/templates/kunit.c
> index d29bbf2ea5..402b5c8575 100644
> --- a/tools/verification/rvgen/rvgen/templates/kunit.c
> +++ b/tools/verification/rvgen/rvgen/templates/kunit.c
> @@ -8,7 +8,7 @@
>    */
>   #include "%%MODEL_NAME%%_kunit.h"
>   
> -#if IS_ENABLED(CONFIG_RV_MON_%%MODEL_NAME_UP%%)
> +#if IS_REACHABLE(CONFIG_RV_MON_%%MODEL_NAME_UP%%)
>   
>   static void rv_test_%%MODEL_NAME%%(struct kunit *test)
>   {
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/Kconfig b/tools/verification/rvgen/tests/golden/test_da_kunit/Kconfig
> new file mode 100644
> index 0000000000..6d664ba562
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +config RV_MON_TEST_DA_KUNIT
> +	depends on RV
> +	# XXX: add dependencies if there
> +	select DA_MON_EVENTS_IMPLICIT
> +	bool "test_da_kunit monitor"
> +	help
> +	  auto-generated
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.c b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.c
> new file mode 100644
> index 0000000000..c2916c3e86
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/ftrace.h>
> +#include <linux/tracepoint.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/rv.h>
> +#include <rv/instrumentation.h>
> +
> +#define MODULE_NAME "test_da_kunit"
> +
> +/*
> + * XXX: include required tracepoint headers, e.g.,
> + * #include <trace/events/sched.h>
> + */
> +#include <rv_trace.h>
> +
> +/*
> + * This is the self-generated part of the monitor. Generally, there is no need
> + * to touch this section.
> + */
> +#define RV_MON_TYPE RV_MON_PER_CPU
> +#include "test_da_kunit.h"
> +#include <rv/da_monitor.h>
> +
> +/*
> + * This is the instrumentation part of the monitor.
> + *
> + * This is the section where manual work is required. Here the kernel events
> + * are translated into model's event.
> + *
> + */
> +static void handle_event_1(void *data, /* XXX: fill header */)
> +{
> +	da_handle_event(event_1_test_da_kunit);
> +}
> +
> +static void handle_event_2(void *data, /* XXX: fill header */)
> +{
> +	/* XXX: validate that this event always leads to the initial state */
> +	da_handle_start_event(event_2_test_da_kunit);
> +}
> +
> +static int enable_test_da_kunit(void)
> +{
> +	int retval;
> +
> +	retval = da_monitor_init();
> +	if (retval)
> +		return retval;
> +
> +	rv_attach_trace_probe("test_da_kunit", /* XXX: tracepoint */, handle_event_1);
> +	rv_attach_trace_probe("test_da_kunit", /* XXX: tracepoint */, handle_event_2);
> +
> +	return 0;
> +}
> +
> +static void disable_test_da_kunit(void)
> +{
> +	rv_this.enabled = 0;
> +
> +	rv_detach_trace_probe("test_da_kunit", /* XXX: tracepoint */, handle_event_1);
> +	rv_detach_trace_probe("test_da_kunit", /* XXX: tracepoint */, handle_event_2);
> +
> +	da_monitor_destroy();
> +}
> +
> +/*
> + * This is the monitor register section.
> + */
> +static struct rv_monitor rv_this = {
> +	.name = "test_da_kunit",
> +	.description = "auto-generated",
> +	.enable = enable_test_da_kunit,
> +	.disable = disable_test_da_kunit,
> +	.reset = da_monitor_reset_all,
> +	.enabled = 0,
> +};
> +
> +static int __init register_test_da_kunit(void)
> +{
> +	return rv_register_monitor(&rv_this, NULL);
> +}
> +
> +static void __exit unregister_test_da_kunit(void)
> +{
> +	rv_unregister_monitor(&rv_this);
> +}
> +
> +module_init(register_test_da_kunit);
> +module_exit(unregister_test_da_kunit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("dot2k: auto-generated");
> +MODULE_DESCRIPTION("test_da_kunit: auto-generated");
> +
> +#if IS_ENABLED(CONFIG_RV_MONITORS_KUNIT_TEST)
> +#include <kunit/visibility.h>
> +#include "test_da_kunit_kunit.h"
> +
> +const struct rv_test_da_kunit_ops rv_test_da_kunit_ops = {
> +	.mon = {
> +		.rv_this = &rv_this,
> +		.monitor_init = da_monitor_init,
> +		.monitor_destroy = da_monitor_destroy,
> +	},
> +	.handle_event_1 = handle_event_1,
> +	.handle_event_2 = handle_event_2,
> +};
> +EXPORT_SYMBOL_IF_KUNIT(rv_test_da_kunit_ops);
> +#endif
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.h b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.h
> new file mode 100644
> index 0000000000..290a9454ca
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Automatically generated C representation of test_da_kunit automaton
> + * For further information about this format, see kernel documentation:
> + *   Documentation/trace/rv/deterministic_automata.rst
> + */
> +
> +#define MONITOR_NAME test_da_kunit
> +
> +enum states_test_da_kunit {
> +	state_a_test_da_kunit,
> +	state_b_test_da_kunit,
> +	state_max_test_da_kunit,
> +};
> +
> +#define INVALID_STATE state_max_test_da_kunit
> +
> +enum events_test_da_kunit {
> +	event_1_test_da_kunit,
> +	event_2_test_da_kunit,
> +	event_max_test_da_kunit,
> +};
> +
> +struct automaton_test_da_kunit {
> +	char *state_names[state_max_test_da_kunit];
> +	char *event_names[event_max_test_da_kunit];
> +	unsigned char function[state_max_test_da_kunit][event_max_test_da_kunit];
> +	unsigned char initial_state;
> +	bool final_states[state_max_test_da_kunit];
> +};
> +
> +static const struct automaton_test_da_kunit automaton_test_da_kunit = {
> +	.state_names = {
> +		"state_a",
> +		"state_b",
> +	},
> +	.event_names = {
> +		"event_1",
> +		"event_2",
> +	},
> +	.function = {
> +		{       state_b_test_da_kunit,       state_a_test_da_kunit },
> +		{               INVALID_STATE,       state_a_test_da_kunit },
> +	},
> +	.initial_state = state_a_test_da_kunit,
> +	.final_states = { 1, 0 },
> +};
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.c b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.c
> new file mode 100644
> index 0000000000..06a280444b
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/kernel.h>
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +/*
> + * XXX: include required headers, e.g.,
> + * #include <linux/sched.h>
> + */
> +#include "test_da_kunit_kunit.h"
> +
> +#if IS_REACHABLE(CONFIG_RV_MON_TEST_DA_KUNIT)
> +
> +static void rv_test_test_da_kunit(struct kunit *test)
> +{
> +	struct rv_kunit_ctx *ctx = test->priv;
> +
> +	prepare_test(test, &rv_test_da_kunit_ops.mon);
> +
> +	/*
> +	 * XXX: write the test here
> +	 * e.g.
> +	 * RV_KUNIT_EXPECT_REACTION_HERE(test, ctx)
> +	 *	rv_test_da_kunit_ops.handle_event(args);
> +	 */
> +}
> +
> +#else
> +#define rv_test_test_da_kunit rv_test_stub
> +#endif
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.h b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.h
> new file mode 100644
> index 0000000000..0094215ff4
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_kunit.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Automatically generated by rvgen kunit.
> + * May need manual intervention for function prototypes that couldn't be
> + * found (e.g. are in another file) or variables to be exported.
> + */
> +
> +#ifndef __TEST_DA_KUNIT_KUNIT_H
> +#define __TEST_DA_KUNIT_KUNIT_H
> +
> +#if IS_ENABLED(CONFIG_RV_MONITORS_KUNIT_TEST)
> +
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +
> +extern const struct rv_test_da_kunit_ops {
> +	struct rv_kunit_mon mon;
> +	void (*handle_event_1)(void *data, /* XXX: fill header */);
> +	void (*handle_event_2)(void *data, /* XXX: fill header */);
> +} rv_test_da_kunit_ops;
> +#endif
> +
> +#endif /* __TEST_DA_KUNIT_KUNIT_H */
> diff --git a/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_trace.h b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_trace.h
> new file mode 100644
> index 0000000000..16804a79e8
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_da_kunit/test_da_kunit_trace.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Snippet to be included in rv_trace.h
> + */
> +
> +#ifdef CONFIG_RV_MON_TEST_DA_KUNIT
> +DEFINE_EVENT(event_da_monitor, event_test_da_kunit,
> +	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
> +	     TP_ARGS(state, event, next_state, final_state));
> +
> +DEFINE_EVENT(error_da_monitor, error_test_da_kunit,
> +	     TP_PROTO(char *state, char *event),
> +	     TP_ARGS(state, event));
> +#endif /* CONFIG_RV_MON_TEST_DA_KUNIT */
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/Kconfig b/tools/verification/rvgen/tests/golden/test_ha_kunit/Kconfig
> new file mode 100644
> index 0000000000..6c48770ace
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +config RV_MON_TEST_HA_KUNIT
> +	depends on RV
> +	# XXX: add dependencies if there
> +	select HA_MON_EVENTS_ID
> +	bool "test_ha_kunit monitor"
> +	help
> +	  auto-generated
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.c b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.c
> new file mode 100644
> index 0000000000..69ca870ac2
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/ftrace.h>
> +#include <linux/tracepoint.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/rv.h>
> +#include <rv/instrumentation.h>
> +
> +#define MODULE_NAME "test_ha_kunit"
> +
> +/*
> + * XXX: include required tracepoint headers, e.g.,
> + * #include <trace/events/sched.h>
> + */
> +#include <rv_trace.h>
> +
> +/*
> + * This is the self-generated part of the monitor. Generally, there is no need
> + * to touch this section.
> + */
> +#define RV_MON_TYPE RV_MON_PER_TASK
> +/* XXX: If the monitor has several instances, consider HA_TIMER_WHEEL */
> +#define HA_TIMER_TYPE HA_TIMER_HRTIMER
> +#include "test_ha_kunit.h"
> +#include <rv/ha_monitor.h>
> +
> +/*
> + * This is the instrumentation part of the monitor.
> + *
> + * This is the section where manual work is required. Here the kernel events
> + * are translated into model's event.
> + *
> + */
> +#define BAR_NS(ha_mon) /* XXX: what is BAR_NS(ha_mon)? */
> +
> +#define FOO_NS /* XXX: what is FOO_NS? */
> +
> +static inline u64 bar_ns(struct ha_monitor *ha_mon)
> +{
> +	return /* XXX: what is bar_ns(ha_mon)? */;
> +}
> +
> +static u64 foo_ns = /* XXX: default value */;
> +module_param(foo_ns, ullong, 0644);
> +
> +/*
> + * These functions define how to read and reset the environment variable.
> + *
> + * Common environment variables like ns-based and jiffy-based clocks have
> + * pre-define getters and resetters you can use. The parser can infer the type
> + * of the environment variable if you supply a measure unit in the constraint.
> + * If you define your own functions, make sure to add appropriate memory
> + * barriers if required.
> + * Some environment variables don't require a storage as they read a system
> + * state (e.g. preemption count). Those variables are never reset, so we don't
> + * define a reset function on monitors only relying on this type of variables.
> + */
> +static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_test_ha_kunit env, u64 time_ns)
> +{
> +	if (env == clk_test_ha_kunit)
> +		return ha_get_clk_ns(ha_mon, env, time_ns);
> +	else if (env == env1_test_ha_kunit)
> +		return /* XXX: how do I read env1? */
> +	else if (env == env2_test_ha_kunit)
> +		return /* XXX: how do I read env2? */
> +	return ENV_INVALID_VALUE;
> +}
> +
> +static void ha_reset_env(struct ha_monitor *ha_mon, enum envs_test_ha_kunit env, u64 time_ns)
> +{
> +	if (env == clk_test_ha_kunit)
> +		ha_reset_clk_ns(ha_mon, env, time_ns);
> +}
> +
> +/*
> + * These functions are used to validate state transitions.
> + *
> + * They are generated by parsing the model, there is usually no need to change them.
> + * If the monitor requires a timer, there are functions responsible to arm it when
> + * the next state has a constraint, cancel it in any other case and to check
> + * that it didn't expire before the callback run. Transitions to the same state
> + * without a reset never affect timers.
> + * Due to the different representations between invariants and guards, there is
> + * a function to convert it in case invariants or guards are reachable from
> + * another invariant without reset. Those are not present if not required in
> + * the model. This is all automatic but is worth checking because it may show
> + * errors in the model (e.g. missing resets).
> + */
> +static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum events event,
> +					enum states next_state, u64 time_ns)
> +{
> +	if (curr_state == S0_test_ha_kunit)
> +		return ha_check_invariant_ns(ha_mon, clk_test_ha_kunit, time_ns);
> +	else if (curr_state == S2_test_ha_kunit)
> +		return ha_check_invariant_ns(ha_mon, clk_test_ha_kunit, time_ns);
> +	return true;
> +}
> +
> +static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum events event,
> +					enum states next_state, u64 time_ns)
> +{
> +	if (curr_state == next_state)
> +		return;
> +	if (curr_state == S2_test_ha_kunit)
> +		ha_inv_to_guard(ha_mon, clk_test_ha_kunit, BAR_NS(ha_mon), time_ns);
> +}
> +
> +static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
> +				    enum states curr_state, enum events event,
> +				    enum states next_state, u64 time_ns)
> +{
> +	bool res = true;
> +
> +	if (curr_state == S0_test_ha_kunit && event == event0_test_ha_kunit)
> +		ha_reset_env(ha_mon, clk_test_ha_kunit, time_ns);
> +	else if (curr_state == S0_test_ha_kunit && event == event1_test_ha_kunit)
> +		ha_reset_env(ha_mon, clk_test_ha_kunit, time_ns);
> +	else if (curr_state == S1_test_ha_kunit && event == event0_test_ha_kunit)
> +		ha_reset_env(ha_mon, clk_test_ha_kunit, time_ns);
> +	else if (curr_state == S1_test_ha_kunit && event == event2_test_ha_kunit) {
> +		res = ha_get_env(ha_mon, env1_test_ha_kunit, time_ns) == 0ull;
> +		ha_reset_env(ha_mon, clk_test_ha_kunit, time_ns);
> +	} else if (curr_state == S2_test_ha_kunit && event == event1_test_ha_kunit)
> +		res = ha_monitor_env_invalid(ha_mon, clk_test_ha_kunit) ||
> +		      ha_get_env(ha_mon, clk_test_ha_kunit, time_ns) < foo_ns;
> +	else if (curr_state == S3_test_ha_kunit && event == event0_test_ha_kunit)
> +		res = ha_monitor_env_invalid(ha_mon, clk_test_ha_kunit) ||
> +		      (ha_get_env(ha_mon, clk_test_ha_kunit, time_ns) < FOO_NS &&
> +		      ha_get_env(ha_mon, env2_test_ha_kunit, time_ns) == 0ull);
> +	else if (curr_state == S3_test_ha_kunit && event == event1_test_ha_kunit) {
> +		res = ha_monitor_env_invalid(ha_mon, clk_test_ha_kunit) ||
> +		      (ha_get_env(ha_mon, clk_test_ha_kunit, time_ns) < 5000ull &&
> +		      ha_get_env(ha_mon, env1_test_ha_kunit, time_ns) == 1ull);
> +		ha_reset_env(ha_mon, clk_test_ha_kunit, time_ns);
> +	}
> +	return res;
> +}
> +
> +static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
> +				       enum states curr_state, enum events event,
> +				       enum states next_state, u64 time_ns)
> +{
> +	if (next_state == curr_state && event != event0_test_ha_kunit)
> +		return;
> +	if (next_state == S0_test_ha_kunit)
> +		ha_start_timer_ns(ha_mon, clk_test_ha_kunit, bar_ns(ha_mon), time_ns);
> +	else if (next_state == S2_test_ha_kunit)
> +		ha_start_timer_ns(ha_mon, clk_test_ha_kunit, BAR_NS(ha_mon), time_ns);
> +	else if (curr_state == S0_test_ha_kunit)
> +		ha_cancel_timer(ha_mon);
> +	else if (curr_state == S2_test_ha_kunit)
> +		ha_cancel_timer(ha_mon);
> +}
> +
> +static bool ha_verify_constraint(struct ha_monitor *ha_mon,
> +				 enum states curr_state, enum events event,
> +				 enum states next_state, u64 time_ns)
> +{
> +	if (!ha_verify_invariants(ha_mon, curr_state, event, next_state, time_ns))
> +		return false;
> +
> +	ha_convert_inv_guard(ha_mon, curr_state, event, next_state, time_ns);
> +
> +	if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
> +		return false;
> +
> +	ha_setup_invariants(ha_mon, curr_state, event, next_state, time_ns);
> +
> +	return true;
> +}
> +
> +static void handle_event0(void *data, /* XXX: fill header */)
> +{
> +	/* XXX: validate that this event always leads to the initial state */
> +	struct task_struct *p = /* XXX: how do I get p? */;
> +	da_handle_start_event(p, event0_test_ha_kunit);
> +}
> +
> +static void handle_event1(void *data, /* XXX: fill header */)
> +{
> +	struct task_struct *p = /* XXX: how do I get p? */;
> +	da_handle_event(p, event1_test_ha_kunit);
> +}
> +
> +static void handle_event2(void *data, /* XXX: fill header */)
> +{
> +	struct task_struct *p = /* XXX: how do I get p? */;
> +	da_handle_event(p, event2_test_ha_kunit);
> +}
> +
> +static int enable_test_ha_kunit(void)
> +{
> +	int retval;
> +
> +	retval = ha_monitor_init();
> +	if (retval)
> +		return retval;
> +
> +	rv_attach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event0);
> +	rv_attach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event1);
> +	rv_attach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event2);
> +
> +	return 0;
> +}
> +
> +static void disable_test_ha_kunit(void)
> +{
> +	rv_this.enabled = 0;
> +
> +	rv_detach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event0);
> +	rv_detach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event1);
> +	rv_detach_trace_probe("test_ha_kunit", /* XXX: tracepoint */, handle_event2);
> +
> +	ha_monitor_destroy();
> +}
> +
> +/*
> + * This is the monitor register section.
> + */
> +static struct rv_monitor rv_this = {
> +	.name = "test_ha_kunit",
> +	.description = "auto-generated",
> +	.enable = enable_test_ha_kunit,
> +	.disable = disable_test_ha_kunit,
> +	.reset = da_monitor_reset_all,
> +	.enabled = 0,
> +};
> +
> +static int __init register_test_ha_kunit(void)
> +{
> +	return rv_register_monitor(&rv_this, NULL);
> +}
> +
> +static void __exit unregister_test_ha_kunit(void)
> +{
> +	rv_unregister_monitor(&rv_this);
> +}
> +
> +module_init(register_test_ha_kunit);
> +module_exit(unregister_test_ha_kunit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("dot2k: auto-generated");
> +MODULE_DESCRIPTION("test_ha_kunit: auto-generated");
> +
> +#if IS_ENABLED(CONFIG_RV_MONITORS_KUNIT_TEST)
> +#include <kunit/visibility.h>
> +#include "test_ha_kunit_kunit.h"
> +
> +const struct rv_test_ha_kunit_ops rv_test_ha_kunit_ops = {
> +	.mon = {
> +		.rv_this = &rv_this,
> +		.monitor_init = ha_monitor_init,
> +		.monitor_destroy = ha_monitor_destroy,
> +	},
> +	.handle_event0 = handle_event0,
> +	.handle_event1 = handle_event1,
> +	.handle_event2 = handle_event2,
> +};
> +EXPORT_SYMBOL_IF_KUNIT(rv_test_ha_kunit_ops);
> +#endif
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.h b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.h
> new file mode 100644
> index 0000000000..5c428f818b
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit.h
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Automatically generated C representation of test_ha_kunit automaton
> + * For further information about this format, see kernel documentation:
> + *   Documentation/trace/rv/deterministic_automata.rst
> + */
> +
> +#define MONITOR_NAME test_ha_kunit
> +
> +enum states_test_ha_kunit {
> +	S0_test_ha_kunit,
> +	S1_test_ha_kunit,
> +	S2_test_ha_kunit,
> +	S3_test_ha_kunit,
> +	state_max_test_ha_kunit,
> +};
> +
> +#define INVALID_STATE state_max_test_ha_kunit
> +
> +enum events_test_ha_kunit {
> +	event0_test_ha_kunit,
> +	event1_test_ha_kunit,
> +	event2_test_ha_kunit,
> +	event_max_test_ha_kunit,
> +};
> +
> +enum envs_test_ha_kunit {
> +	clk_test_ha_kunit,
> +	env1_test_ha_kunit,
> +	env2_test_ha_kunit,
> +	env_max_test_ha_kunit,
> +	env_max_stored_test_ha_kunit = env1_test_ha_kunit,
> +};
> +
> +_Static_assert(env_max_stored_test_ha_kunit <= MAX_HA_ENV_LEN, "Not enough slots");
> +#define HA_CLK_NS
> +
> +struct automaton_test_ha_kunit {
> +	char *state_names[state_max_test_ha_kunit];
> +	char *event_names[event_max_test_ha_kunit];
> +	char *env_names[env_max_test_ha_kunit];
> +	unsigned char function[state_max_test_ha_kunit][event_max_test_ha_kunit];
> +	unsigned char initial_state;
> +	bool final_states[state_max_test_ha_kunit];
> +};
> +
> +static const struct automaton_test_ha_kunit automaton_test_ha_kunit = {
> +	.state_names = {
> +		"S0",
> +		"S1",
> +		"S2",
> +		"S3",
> +	},
> +	.event_names = {
> +		"event0",
> +		"event1",
> +		"event2",
> +	},
> +	.env_names = {
> +		"clk",
> +		"env1",
> +		"env2",
> +	},
> +	.function = {
> +		{
> +			S0_test_ha_kunit,
> +			S1_test_ha_kunit,
> +			INVALID_STATE,
> +		},
> +		{
> +			S0_test_ha_kunit,
> +			INVALID_STATE,
> +			S2_test_ha_kunit,
> +		},
> +		{
> +			INVALID_STATE,
> +			S2_test_ha_kunit,
> +			S3_test_ha_kunit,
> +		},
> +		{
> +			S0_test_ha_kunit,
> +			S1_test_ha_kunit,
> +			INVALID_STATE,
> +		},
> +	},
> +	.initial_state = S0_test_ha_kunit,
> +	.final_states = { 1, 0, 0, 0 },
> +};
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.c b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.c
> new file mode 100644
> index 0000000000..730a9a26a0
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/kernel.h>
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +/*
> + * XXX: include required headers, e.g.,
> + * #include <linux/sched.h>
> + */
> +#include "test_ha_kunit_kunit.h"
> +
> +#if IS_REACHABLE(CONFIG_RV_MON_TEST_HA_KUNIT)
> +
> +static void rv_test_test_ha_kunit(struct kunit *test)
> +{
> +	struct rv_kunit_ctx *ctx = test->priv;
> +
> +	prepare_test(test, &rv_test_ha_kunit_ops.mon);
> +
> +	/*
> +	 * XXX: write the test here
> +	 * e.g.
> +	 * RV_KUNIT_EXPECT_REACTION_HERE(test, ctx)
> +	 *	rv_test_ha_kunit_ops.handle_event(args);
> +	 */
> +}
> +
> +#else
> +#define rv_test_test_ha_kunit rv_test_stub
> +#endif
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.h b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.h
> new file mode 100644
> index 0000000000..0b2030cb64
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_kunit.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Automatically generated by rvgen kunit.
> + * May need manual intervention for function prototypes that couldn't be
> + * found (e.g. are in another file) or variables to be exported.
> + */
> +
> +#ifndef __TEST_HA_KUNIT_KUNIT_H
> +#define __TEST_HA_KUNIT_KUNIT_H
> +
> +#if IS_ENABLED(CONFIG_RV_MONITORS_KUNIT_TEST)
> +
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +
> +extern const struct rv_test_ha_kunit_ops {
> +	struct rv_kunit_mon mon;
> +	void (*handle_event0)(void *data, /* XXX: fill header */);
> +	void (*handle_event1)(void *data, /* XXX: fill header */);
> +	void (*handle_event2)(void *data, /* XXX: fill header */);
> +} rv_test_ha_kunit_ops;
> +#endif
> +
> +#endif /* __TEST_HA_KUNIT_KUNIT_H */
> diff --git a/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_trace.h b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_trace.h
> new file mode 100644
> index 0000000000..6c13ee0068
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ha_kunit/test_ha_kunit_trace.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Snippet to be included in rv_trace.h
> + */
> +
> +#ifdef CONFIG_RV_MON_TEST_HA_KUNIT
> +DEFINE_EVENT(event_da_monitor_id, event_test_ha_kunit,
> +	     TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
> +	     TP_ARGS(id, state, event, next_state, final_state));
> +
> +DEFINE_EVENT(error_da_monitor_id, error_test_ha_kunit,
> +	     TP_PROTO(int id, char *state, char *event),
> +	     TP_ARGS(id, state, event));
> +
> +DEFINE_EVENT(error_env_da_monitor_id, error_env_test_ha_kunit,
> +	     TP_PROTO(int id, char *state, char *event, char *env),
> +	     TP_ARGS(id, state, event, env));
> +#endif /* CONFIG_RV_MON_TEST_HA_KUNIT */
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/Kconfig b/tools/verification/rvgen/tests/golden/test_ltl_kunit/Kconfig
> new file mode 100644
> index 0000000000..3e334c3442
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +config RV_MON_TEST_LTL_KUNIT
> +	depends on RV
> +	# XXX: add dependencies if there
> +	select LTL_MON_EVENTS_ID
> +	bool "test_ltl_kunit monitor"
> +	help
> +	  auto-generated
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.c b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.c
> new file mode 100644
> index 0000000000..5e95a0f462
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.c
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/ftrace.h>
> +#include <linux/tracepoint.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/rv.h>
> +#include <rv/instrumentation.h>
> +
> +#define MODULE_NAME "test_ltl_kunit"
> +
> +/*
> + * XXX: include required tracepoint headers, e.g.,
> + * #include <trace/events/sched.h>
> + */
> +#include <rv_trace.h>
> +
> +
> +/*
> + * This is the self-generated part of the monitor. Generally, there is no need
> + * to touch this section.
> + */
> +#include "test_ltl_kunit.h"
> +#include <rv/ltl_monitor.h>
> +
> +static void ltl_atoms_fetch(struct task_struct *task, struct ltl_monitor *mon)
> +{
> +	/*
> +	 * This is called everytime the Buchi automaton is triggered.
> +	 *
> +	 * This function could be used to fetch the atomic propositions which
> +	 * are expensive to trace. It is possible only if the atomic proposition
> +	 * does not need to be updated at precise time.
> +	 *
> +	 * It is recommended to use tracepoints and ltl_atom_update() instead.
> +	 */
> +}
> +
> +static void ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation)
> +{
> +	/*
> +	 * This should initialize as many atomic propositions as possible.
> +	 *
> +	 * @task_creation indicates whether the task is being created. This is
> +	 * false if the task is already running before the monitor is enabled.
> +	 */
> +	ltl_atom_set(mon, LTL_EVENT_A, true/false);
> +	ltl_atom_set(mon, LTL_EVENT_B, true/false);
> +}
> +
> +/*
> + * This is the instrumentation part of the monitor.
> + *
> + * This is the section where manual work is required. Here the kernel events
> + * are translated into model's event.
> + */
> +static void handle_example_event(void *data, /* XXX: fill header */)
> +{
> +	ltl_atom_update(task, LTL_EVENT_A, true/false);
> +}
> +
> +static int enable_test_ltl_kunit(void)
> +{
> +	int retval;
> +
> +	retval = ltl_monitor_init();
> +	if (retval)
> +		return retval;
> +
> +	rv_attach_trace_probe("test_ltl_kunit", /* XXX: tracepoint */, handle_example_event);
> +
> +	return 0;
> +}
> +
> +static void disable_test_ltl_kunit(void)
> +{
> +	rv_detach_trace_probe("test_ltl_kunit", /* XXX: tracepoint */, handle_sample_event);
> +

one typo:
        handle_sample_event should be handle_example_event.

--
Wen

> +	ltl_monitor_destroy();
> +}
> +
> +/*
> + * This is the monitor register section.
> + */
> +static struct rv_monitor rv_this = {
> +	.name = "test_ltl_kunit",
> +	.description = "auto-generated",
> +	.enable = enable_test_ltl_kunit,
> +	.disable = disable_test_ltl_kunit,
> +};
> +
> +static int __init register_test_ltl_kunit(void)
> +{
> +	return rv_register_monitor(&rv_this, NULL);
> +}
> +
> +static void __exit unregister_test_ltl_kunit(void)
> +{
> +	rv_unregister_monitor(&rv_this);
> +}
> +
> +module_init(register_test_ltl_kunit);
> +module_exit(unregister_test_ltl_kunit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(/* TODO */);

Please use a valid string here.


--
Wen

> +MODULE_DESCRIPTION("test_ltl_kunit: auto-generated");
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.h b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.h
> new file mode 100644
> index 0000000000..acc503b56e
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * C implementation of Buchi automaton, automatically generated by
> + * tools/verification/rvgen from the linear temporal logic specification.
> + * For further information, see kernel documentation:
> + *   Documentation/trace/rv/linear_temporal_logic.rst
> + */
> +
> +#include <linux/rv.h>
> +
> +#define MONITOR_NAME test_ltl_kunit
> +
> +enum ltl_atom {
> +	LTL_EVENT_A,
> +	LTL_EVENT_B,
> +	LTL_NUM_ATOM
> +};
> +static_assert(LTL_NUM_ATOM <= RV_MAX_LTL_ATOM);
> +
> +static const char *ltl_atom_str(enum ltl_atom atom)
> +{
> +	static const char *const names[] = {
> +		"ev_a",
> +		"ev_b",
> +	};
> +
> +	return names[atom];
> +}
> +
> +enum ltl_buchi_state {
> +	S0,
> +	S1,
> +	S2,
> +	S3,
> +	S4,
> +	RV_NUM_BA_STATES
> +};
> +static_assert(RV_NUM_BA_STATES <= RV_MAX_BA_STATES);
> +
> +static void ltl_start(struct task_struct *task, struct ltl_monitor *mon)
> +{
> +	bool event_b = test_bit(LTL_EVENT_B, mon->atoms);
> +	bool event_a = test_bit(LTL_EVENT_A, mon->atoms);
> +	bool val1 = !event_a;
> +
> +	if (val1)
> +		__set_bit(S0, mon->states);
> +	if (true)
> +		__set_bit(S1, mon->states);
> +	if (event_b)
> +		__set_bit(S4, mon->states);
> +}
> +
> +static void
> +ltl_possible_next_states(struct ltl_monitor *mon, unsigned int state, unsigned long *next)
> +{
> +	bool event_b = test_bit(LTL_EVENT_B, mon->atoms);
> +	bool event_a = test_bit(LTL_EVENT_A, mon->atoms);
> +	bool val1 = !event_a;
> +
> +	switch (state) {
> +	case S0:
> +		if (val1)
> +			__set_bit(S0, next);
> +		if (true)
> +			__set_bit(S1, next);
> +		if (event_b)
> +			__set_bit(S4, next);
> +		break;
> +	case S1:
> +		if (true)
> +			__set_bit(S1, next);
> +		if (true && val1)
> +			__set_bit(S2, next);
> +		if (event_b && val1)
> +			__set_bit(S3, next);
> +		if (event_b)
> +			__set_bit(S4, next);
> +		break;
> +	case S2:
> +		if (true)
> +			__set_bit(S1, next);
> +		if (true && val1)
> +			__set_bit(S2, next);
> +		if (event_b && val1)
> +			__set_bit(S3, next);
> +		if (event_b)
> +			__set_bit(S4, next);
> +		break;
> +	case S3:
> +		if (val1)
> +			__set_bit(S0, next);
> +		if (true)
> +			__set_bit(S1, next);
> +		if (event_b)
> +			__set_bit(S4, next);
> +		break;
> +	case S4:
> +		if (val1)
> +			__set_bit(S0, next);
> +		if (true)
> +			__set_bit(S1, next);
> +		if (event_b)
> +			__set_bit(S4, next);
> +		break;
> +	}
> +}
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.c b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.c
> new file mode 100644
> index 0000000000..f64faef1b0
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/kernel.h>
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +/*
> + * XXX: include required headers, e.g.,
> + * #include <linux/sched.h>
> + */
> +#include "test_ltl_kunit_kunit.h"
> +
> +#if IS_REACHABLE(CONFIG_RV_MON_TEST_LTL_KUNIT)
> +
> +static void rv_test_test_ltl_kunit(struct kunit *test)
> +{
> +	struct rv_kunit_ctx *ctx = test->priv;
> +
> +	prepare_test(test, &rv_test_ltl_kunit_ops.mon);
> +
> +	/*
> +	 * XXX: write the test here
> +	 * e.g.
> +	 * RV_KUNIT_EXPECT_REACTION_HERE(test, ctx)
> +	 *	rv_test_ltl_kunit_ops.handle_event(args);
> +	 */
> +}
> +
> +#else
> +#define rv_test_test_ltl_kunit rv_test_stub
> +#endif
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.h b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.h
> new file mode 100644
> index 0000000000..b2ca34be32
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_kunit.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Automatically generated by rvgen kunit.
> + * May need manual intervention for function prototypes that couldn't be
> + * found (e.g. are in another file) or variables to be exported.
> + */
> +
> +#ifndef __TEST_LTL_KUNIT_KUNIT_H
> +#define __TEST_LTL_KUNIT_KUNIT_H
> +
> +#if IS_ENABLED(CONFIG_RV_MONITORS_KUNIT_TEST)
> +
> +#include <linux/rv.h>
> +#include <rv/kunit.h>
> +
> +extern const struct rv_test_ltl_kunit_ops {
> +	struct rv_kunit_mon mon;
> +	void (*handle_example_event)(void *data, /* XXX: fill header */);
> +} rv_test_ltl_kunit_ops;
> +#endif
> +
> +#endif /* __TEST_LTL_KUNIT_KUNIT_H */
> diff --git a/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_trace.h b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_trace.h
> new file mode 100644
> index 0000000000..a054d5b2c0
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/golden/test_ltl_kunit/test_ltl_kunit_trace.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Snippet to be included in rv_trace.h
> + */
> +
> +#ifdef CONFIG_RV_MON_TEST_LTL_KUNIT
> +DEFINE_EVENT(event_ltl_monitor_id, event_test_ltl_kunit,
> +	     TP_PROTO(struct task_struct *task, char *states, char *atoms, char *next),
> +	     TP_ARGS(task, states, atoms, next));
> +DEFINE_EVENT(error_ltl_monitor_id, error_test_ltl_kunit,
> +	     TP_PROTO(struct task_struct *task),
> +	     TP_ARGS(task));
> +#endif /* CONFIG_RV_MON_TEST_LTL_KUNIT */
> diff --git a/tools/verification/rvgen/tests/rvgen_kunit.t b/tools/verification/rvgen/tests/rvgen_kunit.t
> new file mode 100644
> index 0000000000..174a6fac0b
> --- /dev/null
> +++ b/tools/verification/rvgen/tests/rvgen_kunit.t
> @@ -0,0 +1,32 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +source ../tests/engine.sh
> +test_begin
> +
> +set_timeout 30s
> +
> +# Help tests
> +check "verify kunit subcommand help" \
> +	"$RVGEN kunit -h" 0 "model_name" "spec"
> +
> +check_and_compare_folder "KUnit generation with local lookup and test_da_kunit" \
> +	"$RVGEN monitor -c da -s tests/specs/test_da.dot -t per_cpu -n test_da_kunit && $RVGEN kunit -a -l -n test_da_kunit" \
> +	"test_da_kunit" "Now complete the test and add it to rv_monitors_test.c" "monitor_init"
> +
> +check_and_compare_folder "KUnit generation with local lookup and test_ha_kunit" \
> +	"$RVGEN monitor -c ha -s tests/specs/test_ha.dot -t per_task -n test_ha_kunit && $RVGEN kunit -a -l -n test_ha_kunit" \
> +	"test_ha_kunit" "Successfully created KUnit" "Append the following to"
> +
> +check_and_compare_folder "KUnit generation with local lookup and test_ltl_kunit" \
> +	"$RVGEN monitor -c ltl -s tests/specs/test_ltl.ltl -t per_task -n test_ltl_kunit && $RVGEN kunit -l -n test_ltl_kunit" \
> +	"test_ltl_kunit" "monitor_init = ltl_monitor_init"
> +
> +# Error handling tests
> +check "missing required model_name" \
> +	"$RVGEN kunit" 2 "the following arguments are required: -n/--model_name"
> +
> +check "non-existent model_name with auto_patch" \
> +	"$RVGEN kunit -a -n nonexistent" 1 \
> +	"Could not find monitor C file" "Traceback (most recent call last)"
> +
> +test_end

^ permalink raw reply

* Re: [PATCH v3 04/17] tools/rv: Add selftests
From: Wen Yang @ 2026-06-28 17:10 UTC (permalink / raw)
  To: Gabriele Monaco, linux-trace-kernel, linux-kernel, Steven Rostedt
  Cc: Nam Cao, Thomas Weissschuh, Tomas Glozar, John Kacur
In-Reply-To: <20260625121440.116317-5-gmonaco@redhat.com>



On 6/25/26 20:14, Gabriele Monaco wrote:
> The rv tool needs automated testing to catch regressions and verify
> correct functionality across different usage scenarios.
> 
> Add selftests that validate monitor listing (including containers and
> nested monitors), monitor execution with different configurations
> (reactors, verbose output, tracing), and trace output format for both
> per-task and per-cpu monitors. Error handling paths are also tested.
> Tests use a shared engine for common patterns.
> 
> Acked-by: Nam Cao <namcao@linutronix.de>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   tools/verification/rv/Makefile        |   5 +-
>   tools/verification/rv/tests/rv_list.t |  48 ++++++++++
>   tools/verification/rv/tests/rv_mon.t  |  95 ++++++++++++++++++
>   tools/verification/tests/engine.sh    | 132 ++++++++++++++++++++++++++
>   4 files changed, 279 insertions(+), 1 deletion(-)
>   create mode 100644 tools/verification/rv/tests/rv_list.t
>   create mode 100644 tools/verification/rv/tests/rv_mon.t
>   create mode 100644 tools/verification/tests/engine.sh
> 
> diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
> index 5b898360ba..8ae5fc0d1d 100644
> --- a/tools/verification/rv/Makefile
> +++ b/tools/verification/rv/Makefile
> @@ -78,4 +78,7 @@ clean: doc_clean fixdep-clean
>   	$(Q)rm -f rv rv-static fixdep FEATURE-DUMP rv-*
>   	$(Q)rm -rf feature
>   
> -.PHONY: FORCE clean
> +check: $(RV)
> +	RV=$(RV) prove -o --directives -f tests/
> +
> +.PHONY: FORCE clean check
> diff --git a/tools/verification/rv/tests/rv_list.t b/tools/verification/rv/tests/rv_list.t
> new file mode 100644
> index 0000000000..201af33a52
> --- /dev/null
> +++ b/tools/verification/rv/tests/rv_list.t
> @@ -0,0 +1,48 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +source ../tests/engine.sh
> +test_begin
> +
> +set_timeout 30s
> +
> +RVDIR=/sys/kernel/tracing/rv/
> +
> +# Help and basic tests
> +check "verify help page" \
> +	"$RV --help" 0 "usage: rv command"
> +
> +check "verify list subcommand help" \
> +	"$RV list --help" 0 "list all available monitors"
> +
> +all_nested=$(grep : $RVDIR/available_monitors | cut -d: -f2 | paste -s | sed 's/\t/\\|/g')
> +all_non_nested=$(grep -v : $RVDIR/available_monitors | cut -d: -f2 | paste -s | sed 's/\t/\\|/g')
> +sched_monitors=$(grep sched: $RVDIR/available_monitors | cut -d: -f2 | paste -s | sed 's/\t/\\|/g')
> +description_state="[[:space:]]\+[[:print:]]\+\[\(OFF\|ON\)\]"
> +line_nested=" - \($all_nested\)${description_state}"
> +line_non_nested="\($all_non_nested\)${description_state}"
> +
> +# List monitors and containers
> +check "list all monitors" \
> +	"$RV list" 0 "" "" "^\($line_nested\|$line_non_nested\)$"
> +
> +check_if_exists "list container" \
> +	"$RV list sched" "$RVDIR/monitors/sched" \
> +	"" "-- No monitor found in container sched --" \
> +	"^\($sched_monitors\)${description_state}$"
> +
> +check_if_exists "list non-container" \
> +	"$RV list wwnr" "$RVDIR/monitors/wwnr" \
> +	"-- No monitor found in container wwnr --" \
> +	"^\( - \)\?[[:alnum:]]\+${description_state}$"
> +
> +check "list incomplete container name" \
> +	"$RV list s" 0 "-- No monitor found in container s --"
> +
> +# Error handling tests
> +check "no command" \
> +	"$RV" 1 "rv requires a command"
> +
> +check "invalid command" \
> +	"$RV invalid" 1 "rv does not know the invalid command"
> +
> +test_end
> diff --git a/tools/verification/rv/tests/rv_mon.t b/tools/verification/rv/tests/rv_mon.t
> new file mode 100644
> index 0000000000..cbc346c74c
> --- /dev/null
> +++ b/tools/verification/rv/tests/rv_mon.t
> @@ -0,0 +1,95 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +source ../tests/engine.sh
> +test_begin
> +
> +set_timeout 30s
> +
> +RVDIR=/sys/kernel/tracing/rv/
> +
> +# Help and basic tests
> +check "verify mon subcommand help" \
> +	"$RV mon --help" 0 "run a monitor"
> +
> +# Error handling tests
> +check "mon without monitor name" \
> +	"$RV mon" 1 "usage: rv mon"
> +
> +check "invalid monitor name" \
> +	"$RV mon invalid" 1 "monitor invalid does not exist"
> +
> +if [ -d $RVDIR/monitors/wwnr ]; then
> +
> +check "invalid reactor name" \
> +	"$RV mon wwnr -r invalid" 1 "failed to set invalid reactor, is it available?"
> +
> +check "monitor name is substring of another monitor" \
> +	"$RV mon nr" 1 "monitor nr does not exist"
> +
> +check "already enabled monitor returns error" \
> +	"echo 1 > $RVDIR/monitors/wwnr/enable; $RV mon wwnr" 1 \
> +	"monitor wwnr (in-kernel) is already enabled"
> +echo 0 > $RVDIR/monitors/wwnr/enable
> +
> +fi
> +
> +# rv mon runs until terminated
> +set_expected_timeout 2s
> +
> +# Run monitors with different configurations
> +check_if_exists "run the monitor without parameters" \
> +	"$RV mon wwnr" "$RVDIR/monitors/wwnr" "" "."
> +
> +check_if_exists "run the monitor as verbose" \
> +	"$RV mon wwnr -v" "$RVDIR/monitors/wwnr" \
> +	"my pid is \$pid" "\(event\|error\)"
> +
> +check_if_exists "run the monitor with a reactor" \
> +	"$RV mon wwnr -r printk & sleep .5 && cat $RVDIR/monitors/wwnr/reactors && wait" \
> +	"$RVDIR/monitors/wwnr/reactors" "\[printk\]"
> +
> +check_if_exists "reactor is restored after exit" \
> +	"cat $RVDIR/monitors/wwnr/reactors" \
> +	"$RVDIR/monitors/wwnr/reactors" "\[nop\]"
> +
> +check_if_exists "run a nested monitor with a reactor" \
> +	"$RV mon snroc -r printk & sleep .5 && cat $RVDIR/monitors/sched/snroc/reactors && wait" \
> +	"$RVDIR/monitors/sched/snroc/reactors" "\[printk\]"
> +
> +check_if_exists "run an explicitly nested monitor with a reactor" \
> +	"$RV mon sched:sssw -r printk & sleep .5 && cat $RVDIR/monitors/sched/sssw/reactors && wait" \
> +	"$RVDIR/monitors/sched/sssw/reactors" "\[printk\]"
> +
> +check_if_exists "run container monitor" \
> +	"$RV mon sched & sleep .5 && cat $RVDIR/monitors/sched/{sssw,sco}/enable && wait" \
> +	"$RVDIR/monitors/sched" "1" "0" "^1$"
> +
> +# Regexes for the trace
> +header="^[[:space:]]\+\(\([][A-Z_x<>-]\+\||\)[[:space:]]*\)\+$"
> +type="\(event\|error\)[[:space:]]\+"
> +genpid="[0-9]\+[[:space:]]\+"
> +selfpid="\$pid[[:space:]]\+"
> +cpu="\[[0-9]\{3\}\][[:space:]]\+"
> +state="[a-z_]\+ "
> +trace_task="${genpid}${cpu}${type}${genpid}${state}"
> +trace_task_self="${genpid}${cpu}${type}${selfpid}${state}"
> +trace_cpu="${genpid}${cpu}${type}${state}"
> +trace_cpu_self="${selfpid}${cpu}${type}${state}"
> +
> +check_if_exists "run per-task monitor with tracing" \
> +	"$RV mon sssw -t" "$RVDIR/monitors/sched/sssw" \
> +	"$header" "$trace_task_self" "\($header\|$trace_task\)"
> +
> +check_if_exists "run per-task monitor tracing also self" \
> +	"$RV mon sched:sssw -t -s" "$RVDIR/monitors/sched/sssw" \
> +	"$trace_task_self" "" "\($header\|$trace_task\)"
> +
> +check_if_exists "run per-cpu monitor with tracing" \
> +	"$RV mon sched:sco -t" "$RVDIR/monitors/sched/sco" \
> +	"$header" "$trace_cpu_self" "\($header\|$trace_cpu\)"
> +
> +check_if_exists "run per-cpu monitor tracing also self" \
> +	"$RV mon sco -t -s" "$RVDIR/monitors/sched/sco" \
> +	"$trace_cpu_self" "" "\($header\|$trace_cpu\)"
> +
> +test_end
> diff --git a/tools/verification/tests/engine.sh b/tools/verification/tests/engine.sh
> new file mode 100644
> index 0000000000..76cc254ff9
> --- /dev/null
> +++ b/tools/verification/tests/engine.sh
> @@ -0,0 +1,132 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +test_begin() {
> +	# Count tests to allow the test harness to double-check if all were
> +	# included correctly.
> +	ctr=0
> +	[ -z "$RV" ] && RV="../rv/rv"
> +	[ -n "$TEST_COUNT" ] && echo "1..$TEST_COUNT"
> +}
> +
> +failure() {
> +	fail=1
> +	if [ $# -gt 0 ]; then
> +		failbuf+="$1"
> +		failbuf+=$'\n'
> +	fi
> +}
> +
> +report() {
> +	local desc="$1"
> +
> +	if [ "$fail" -eq 0 ]; then
> +		echo "ok $ctr - $desc"
> +	else
> +		# Add output and exit code as comments in case of failure
> +		echo "not ok $ctr - $desc"
> +		echo -n "$failbuf"
> +		echo "$result" | col -b | while read -r line; do echo "# $line"; done
> +		printf "#\n# exit code %s\n" "$exitcode"
> +	fi
> +}
> +
> +_check() {
> +	local command=$2
> +	local expected_exitcode=${3:-0}
> +	local expected_output=$4
> +	local unexpected_output=$5
> +	local all_lines_pattern=$6
> +	local patterns="$expected_output $unexpected_output $all_lines_pattern"
> +
> +	eval "$TIMEOUT" "$command" &> check_output.$$ &
> +	bgpid=$!
> +	pid=$(pgrep -f "${command%%[|;&>]*}" | tail -n1)

The pgrep runs may immediately after the background fork, before the 
child process has had time to exec.

--
Best wishes,
Wen

> +	wait $bgpid
> +	exitcode=$?
> +	result=$(tr -d '\0' < check_output.$$)
> +	rm -f check_output.$$
> +
> +	failbuf=''
> +	fail=0
> +
> +	# Suppress any other error if a needed pid is empty
> +	if [ -z "$pid" ] && grep -q "\$pid" <<< "$patterns"; then
> +		result=''
> +		failure "# Empty pid for $command"
> +		return 1
> +	fi
> +
> +	expected_output="${expected_output//\$pid/$pid}"
> +	unexpected_output="${unexpected_output//\$pid/$pid}"
> +	all_lines_pattern="${all_lines_pattern//\$pid/$pid}"
> +
> +	# Test if the results matches if requested
> +	if [ -n "$expected_output" ] && ! grep -qe "$expected_output" <<< "$result"; then
> +		failure "# Output match failed: \"$expected_output\""
> +	fi
> +
> +	if [ -n "$unexpected_output" ] && grep -qe "$unexpected_output" <<< "$result"; then
> +		failure "# Output non-match failed: \"$unexpected_output\""
> +	fi
> +
> +	if [ -n "$all_lines_pattern" ] && grep -vqe "$all_lines_pattern" <<< "$result"; then
> +		failure "# All-lines pattern failed: \"$all_lines_pattern\""
> +	fi
> +
> +	if [ $exitcode -ne "$expected_exitcode" ]; then
> +		failure "# Expected exit code $expected_exitcode"
> +	fi
> +}
> +
> +check() {
> +	# Simple check: run the command with given arguments and test exit code.
> +	# If TEST_COUNT is set, run the test. Otherwise, just count.
> +	ctr=$((ctr + 1))
> +	if [ -n "$TEST_COUNT" ]; then
> +		_check "$@"
> +		report "$1"
> +	fi
> +}
> +
> +check_if_exists() {
> +	# Conditional check that skips if a file or folder doesn't exist
> +	local desc=$1
> +	local command=$2
> +	local file=$3
> +	local expected_output=$4
> +	local unexpected_output=$5
> +	local all_lines_pattern=$6
> +
> +	ctr=$((ctr + 1))
> +	if [ -n "$TEST_COUNT" ]; then
> +		if [ ! -e "$file" ]; then
> +			echo "ok $ctr - $desc # SKIP file not found: $file"
> +		else
> +			_check "$desc" "$command" 0 "$expected_output" \
> +				"$unexpected_output" "$all_lines_pattern"
> +			report "$desc"
> +		fi
> +	fi
> +}
> +
> +set_timeout() {
> +	TIMEOUT="timeout -v -k 15s $1"
> +}
> +
> +set_expected_timeout() {
> +	TIMEOUT="timeout --preserve-status -k 15s $1"
> +}
> +
> +unset_timeout() {
> +	unset TIMEOUT
> +}
> +
> +test_end() {
> +	# If running without TEST_COUNT, tests are not actually run, just
> +	# counted. In that case, re-run the test with the correct count.
> +	[ -z "$TEST_COUNT" ] && TEST_COUNT=$ctr exec bash "$0" || true
> +}
> +
> +# Avoid any environmental discrepancies
> +export LC_ALL=C
> +unset_timeout

^ permalink raw reply

* Re: [PATCH v3 11/17] rv: Prevent unintentional tracepoints during KUnit tests
From: Wen Yang @ 2026-06-28 17:17 UTC (permalink / raw)
  To: Gabriele Monaco, linux-trace-kernel, linux-kernel, Steven Rostedt,
	Masami Hiramatsu
  Cc: Nam Cao, Thomas Weissschuh, Tomas Glozar, John Kacur
In-Reply-To: <20260625121440.116317-12-gmonaco@redhat.com>



On 6/25/26 20:14, Gabriele Monaco wrote:
> Monitor initialisation also called during KUnit tests may register some
> tracepoints, this can lead to issues since we don't expect real monitor
> events running during KUnit tests.
> 
> Prevent tracepoint registration if an RV KUnit test is running.
> 
> Reviewed-by: Nam Cao <namcao@linutronix.de>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   include/rv/instrumentation.h |  5 +++++
>   include/rv/kunit.h           |  2 ++
>   kernel/trace/rv/rv.c         | 11 +++++++++++
>   3 files changed, 18 insertions(+)
> 
> diff --git a/include/rv/instrumentation.h b/include/rv/instrumentation.h
> index d4e7a02ede..761f8f147d 100644
> --- a/include/rv/instrumentation.h
> +++ b/include/rv/instrumentation.h
> @@ -9,12 +9,15 @@
>    */
>   
>   #include <linux/ftrace.h>
> +#include <rv/kunit.h>
>   
>   /*
>    * rv_attach_trace_probe - check and attach a handler function to a tracepoint
>    */
>   #define rv_attach_trace_probe(monitor, tp, rv_handler)					\
>   	do {										\
> +		if (rv_mon_test_is_running())						\
> +			break;								\
>   		check_trace_callback_type_##tp(rv_handler);				\
>   		WARN_ONCE(register_trace_##tp(rv_handler, NULL),			\
>   				"fail attaching " #monitor " " #tp "handler");		\
> @@ -25,5 +28,7 @@
>    */
>   #define rv_detach_trace_probe(monitor, tp, rv_handler)					\
>   	do {										\
> +		if (rv_mon_test_is_running())						\
> +			break;								\
>   		unregister_trace_##tp(rv_handler, NULL);				\
>   	} while (0)
> diff --git a/include/rv/kunit.h b/include/rv/kunit.h
> index 5e8c604cf7..0c4c4bc933 100644
> --- a/include/rv/kunit.h
> +++ b/include/rv/kunit.h
> @@ -19,6 +19,7 @@
>   
>   int rv_set_testing(struct kunit_suite *suite);
>   void rv_clear_testing(struct kunit_suite *suite);
> +bool rv_mon_test_is_running(void);
>   struct task_struct *rv_get_current(void);
>   
>   struct rv_kunit_ctx {
> @@ -59,6 +60,7 @@ void teardown_test(void *arg);
>   #else /* !CONFIG_RV_MONITORS_KUNIT_TEST */
>   
>   #define rv_get_current() current
> +#define rv_mon_test_is_running() false
>   
>   #endif /* CONFIG_RV_MONITORS_KUNIT_TEST */
>   #endif /* _RV_KUNIT_H */
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 61d4f13960..86ef5dee7f 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -864,6 +864,14 @@ int __init rv_init_interface(void)
>   #include <rv/kunit.h>
>   #include <kunit/visibility.h>
>   
> +static bool rv_mon_test_running;
> +
> +bool rv_mon_test_is_running(void)
> +{
> +	return rv_mon_test_running;
> +}
> +EXPORT_SYMBOL_GPL(rv_mon_test_is_running);
> +

The flag rv_mon_test_running is a plain bool with no ordering
guarantees, and the tracepoint handlers that call 
rv_mon_test_is_running() may execute on
arbitrary CPUs. A store to rv_mon_test_running on the test CPU may not
be visible to another CPU's tracepoint handler.
What abbout READ_ONCE/WRITE_ONCE?

--
Best wishes,
Wen




>   /*
>    * rv_set_testing - ensure mutual exclusion between KUnit tests and real monitors
>    *
> @@ -886,6 +894,8 @@ int rv_set_testing(struct kunit_suite *suite)
>   		}
>   	}
>   
> +	rv_mon_test_running = true;
> +
>   	return 0;
>   }
>   EXPORT_SYMBOL_IF_KUNIT(rv_set_testing);
> @@ -895,6 +905,7 @@ EXPORT_SYMBOL_IF_KUNIT(rv_set_testing);
>    */
>   void rv_clear_testing(struct kunit_suite *suite)
>   {
> +	rv_mon_test_running = false;
>   	mutex_unlock(&rv_interface_lock);
>   }
>   EXPORT_SYMBOL_IF_KUNIT(rv_clear_testing);

^ permalink raw reply

* Re: [PATCH v2 1/2] signal: avoid shared siginfo namespace rewrites
From: Andrew Morton @ 2026-06-28 19:11 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Oleg Nesterov, Christian Brauner, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Marco Elver,
	Aleksandr Nogikh, Thomas Gleixner, Adrian Huang, Kexin Sun,
	linux-kernel, linux-trace-kernel, stable
In-Reply-To: <86a8857d58d43ee26a8b365b837fd24830343494.1782159692.git.include@grrlz.net>

On Mon, 22 Jun 2026 20:25:08 +0000 Bradley Morgan <include@grrlz.net> wrote:

> send_signal_locked() rewrites sender ids for the target namespace.
> Group sends reuse the same siginfo, so one recipient can affect the
> next.
> 
> Copy the siginfo before changing it.

Thanks, I'll queue this for 7.3-rc1.  I don't see a need to fast-track
it into mainline, as 7a0cf094944e is from 2019.

Can someone please send along a paragraph which describes the
userspace-visible effects of the bug?  I think this is important when
proposing a backportable fix.  Important for all fixes, really.

I understand that I'm to take no action with "[PATCH v2 2/2] signal:
make send_signal_locked() take const siginfo".

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox