Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* [PATCH v3 0/2] tracing: preserve repeated boot-time parameters and drain deferred trigger frees
From: Wesley Atwell @ 2026-03-10  6:47 UTC (permalink / raw)
  To: rostedt, mhiramat
  Cc: mark.rutland, mathieu.desnoyers, corbet, skhan, linux-doc,
	linux-kernel, linux-trace-kernel, Wesley Atwell

Patch 1 updates the affected early tracing boot-parameter parsers to
preserve repeated instances in the format their existing parsers already
consume, and documents that repeated-parameter behavior.

Patch 2 fixes deferred trigger-data cleanup so boot-deferred frees are
drained even when the cleanup kthread never starts.

v3:
- Patch 1: use a shared trace_append_boot_param() helper
- Patch 1: document repeated-parameter behavior in kernel-parameters.txt
- Patch 1: reframe as an improvement and drop the Fixes tags
- Patch 2: no changes

v2:
- Patch 1: no changes
- Patch 2: restore the dropped mutex recheck comment
- Patch 2: clarify the synchronous fallback drain path

Wesley Atwell (2):
  tracing: preserve repeated boot-time tracing parameters
  tracing: drain deferred trigger frees if kthread startup fails

 .../admin-guide/kernel-parameters.txt         | 18 ++++-
 kernel/trace/ftrace.c                         | 12 ++-
 kernel/trace/trace.c                          |  3 +-
 kernel/trace/trace.h                          | 29 +++++++
 kernel/trace/trace_events.c                   | 26 +++++-
 kernel/trace/trace_events_trigger.c           | 79 ++++++++++++++++---
 kernel/trace/trace_kprobe.c                   |  3 +-
 7 files changed, 145 insertions(+), 25 deletions(-)

-- 
2.34.1

^ permalink raw reply

* Re: [PATCH 1/2] tracing: preserve repeated boot-time tracing parameters
From: Masami Hiramatsu @ 2026-03-10  5:39 UTC (permalink / raw)
  To: Wesley Atwell
  Cc: rostedt, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260309195220.2726094-1-atwellwea@gmail.com>

On Mon,  9 Mar 2026 13:52:19 -0600
Wesley Atwell <atwellwea@gmail.com> wrote:

> Bootconfig expands arrays into repeated param=value entries, and the
> kernel command line can repeat the same tracing parameter as well.
> Several tracing __setup() handlers still overwrite their boot buffers,
> so only the last ftrace filter, graph filter, trace option, kprobe
> event, or trace trigger entry survives boot.

Actually I expected to use it as a string instead of array.

kernel.ftrace_filter="funcA,funcB,funcC"

Because this "funcA,funcB,funcC" is the parameter for ftrace_filter
option.

And admins can choose to specify it as an array or a string by themselves.
(according to the behavior)

For parameters which supports appending values.

  param=value1, value2, value3

For parameters which does NOT supports appending values.

  param="value1,value2,value3"

The problem, if there is one, is that admin-guide/kernel-parameters.txt
doesn't provide any information about this behavior.
(Whether the subsequent parameters should overwrite, be appended,
 or be ignored)

Maybe it is better to document it.

> 
> Preserve repeated values in the format their existing parsers already
> consume: comma-delimited lists for ftrace filters and trace options,
> semicolon-delimited lists for kprobe events, and per-chunk parsing for
> trace_trigger=. The trace_trigger parser tokenizes its storage in
> place, so keep a running length and only parse the newly appended
> chunk into bootup_triggers[].
> 
> Fixes: 2af15d6a44b8 ("ftrace: add kernel command line function filtering")
> Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
> Fixes: a01fdc897fa5 ("tracing: Add trace_trigger kernel command line option")
> Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")

And I think this is improvement to support appending subsequent parameters.
Not Fix, because this is not a bug.


> Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
> ---
>  kernel/trace/ftrace.c       | 29 +++++++++++++++++++++++++----
>  kernel/trace/trace.c        | 23 ++++++++++++++++++++++-
>  kernel/trace/trace_events.c | 23 ++++++++++++++++++++---
>  kernel/trace/trace_kprobe.c | 23 ++++++++++++++++++++++-
>  4 files changed, 89 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 8df69e702706..cdd46f639333 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -6835,13 +6835,34 @@ EXPORT_SYMBOL_GPL(ftrace_set_global_notrace);
>  static char ftrace_notrace_buf[FTRACE_FILTER_SIZE] __initdata;
>  static char ftrace_filter_buf[FTRACE_FILTER_SIZE] __initdata;
>  
> +static void __init append_ftrace_boot_param(char *buf, const char *str,
> +					    char sep)
> +{
> +	size_t len, str_len;
> +
> +	if (buf[0] == '\0') {
> +		strscpy(buf, str, FTRACE_FILTER_SIZE);
> +		return;
> +	}
> +
> +	len = strlen(buf);
> +	str_len = strlen(str);
> +	if (!str_len)
> +		return;
> +	if (str_len >= FTRACE_FILTER_SIZE - len - 1)
> +		return;
> +
> +	buf[len] = sep;
> +	strscpy(buf + len + 1, str, FTRACE_FILTER_SIZE - len - 1);
> +}

Please make a generic append function in kernel/trace/trace.h, e.g.

void trace_append_boot_param(char *buf, const char *str, char sep, size_t ssize);

and use it instead of strscpy.

Thank you,

> +
>  /* Used by function selftest to not test if filter is set */
>  bool ftrace_filter_param __initdata;
>  
>  static int __init set_ftrace_notrace(char *str)
>  {
>  	ftrace_filter_param = true;
> -	strscpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
> +	append_ftrace_boot_param(ftrace_notrace_buf, str, ',');
>  	return 1;
>  }
>  __setup("ftrace_notrace=", set_ftrace_notrace);
> @@ -6849,7 +6870,7 @@ __setup("ftrace_notrace=", set_ftrace_notrace);
>  static int __init set_ftrace_filter(char *str)
>  {
>  	ftrace_filter_param = true;
> -	strscpy(ftrace_filter_buf, str, FTRACE_FILTER_SIZE);
> +	append_ftrace_boot_param(ftrace_filter_buf, str, ',');
>  	return 1;
>  }
>  __setup("ftrace_filter=", set_ftrace_filter);
> @@ -6861,14 +6882,14 @@ static int ftrace_graph_set_hash(struct ftrace_hash *hash, char *buffer);
>  
>  static int __init set_graph_function(char *str)
>  {
> -	strscpy(ftrace_graph_buf, str, FTRACE_FILTER_SIZE);
> +	append_ftrace_boot_param(ftrace_graph_buf, str, ',');
>  	return 1;
>  }
>  __setup("ftrace_graph_filter=", set_graph_function);
>  
>  static int __init set_graph_notrace_function(char *str)
>  {
> -	strscpy(ftrace_graph_notrace_buf, str, FTRACE_FILTER_SIZE);
> +	append_ftrace_boot_param(ftrace_graph_notrace_buf, str, ',');
>  	return 1;
>  }
>  __setup("ftrace_graph_notrace=", set_graph_notrace_function);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index ebd996f8710e..42d03d36ae39 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -327,9 +327,30 @@ __setup("trace_instance=", boot_instance);
>  
>  static char trace_boot_options_buf[MAX_TRACER_SIZE] __initdata;
>  
> +static void __init append_trace_boot_options(const char *str)
> +{
> +	size_t len, str_len;
> +
> +	if (trace_boot_options_buf[0] == '\0') {
> +		strscpy(trace_boot_options_buf, str, MAX_TRACER_SIZE);
> +		return;
> +	}
> +
> +	len = strlen(trace_boot_options_buf);
> +	str_len = strlen(str);
> +	if (!str_len)
> +		return;
> +	if (str_len >= MAX_TRACER_SIZE - len - 1)
> +		return;
> +
> +	trace_boot_options_buf[len] = ',';
> +	strscpy(trace_boot_options_buf + len + 1, str,
> +		MAX_TRACER_SIZE - len - 1);
> +}
> +
>  static int __init set_trace_boot_options(char *str)
>  {
> -	strscpy(trace_boot_options_buf, str, MAX_TRACER_SIZE);
> +	append_trace_boot_options(str);
>  	return 1;
>  }
>  __setup("trace_options=", set_trace_boot_options);
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 249d1cba72c0..c3981f62e4bc 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -3679,20 +3679,37 @@ static struct boot_triggers {
>  } bootup_triggers[MAX_BOOT_TRIGGERS];
>  
>  static char bootup_trigger_buf[COMMAND_LINE_SIZE];
> +static int bootup_trigger_buf_len;
>  static int nr_boot_triggers;
>  
>  static __init int setup_trace_triggers(char *str)
>  {
>  	char *trigger;
>  	char *buf;
> +	ssize_t copied;
>  	int i;
> +	int start;
>  
> -	strscpy(bootup_trigger_buf, str, COMMAND_LINE_SIZE);
> +	if (bootup_trigger_buf_len >= COMMAND_LINE_SIZE)
> +		return 1;
> +
> +	start = bootup_trigger_buf_len;
> +	if (start && !*str)
> +		return 1;
> +
> +	copied = strscpy(bootup_trigger_buf + start, str,
> +			 COMMAND_LINE_SIZE - start);
> +	if (copied < 0) {
> +		if (start)
> +			return 1;
> +		copied = strlen(bootup_trigger_buf + start);
> +	}
> +	bootup_trigger_buf_len += copied + 1;
>  	trace_set_ring_buffer_expanded(NULL);
>  	disable_tracing_selftest("running event triggers");
>  
> -	buf = bootup_trigger_buf;
> -	for (i = 0; i < MAX_BOOT_TRIGGERS; i++) {
> +	buf = bootup_trigger_buf + start;
> +	for (i = nr_boot_triggers; i < MAX_BOOT_TRIGGERS; i++) {
>  		trigger = strsep(&buf, ",");
>  		if (!trigger)
>  			break;
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index a5dbb72528e0..a63a56b55570 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -29,9 +29,30 @@
>  /* Kprobe early definition from command line */
>  static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
>  
> +static void __init append_kprobe_boot_event(const char *str)
> +{
> +	size_t len, str_len;
> +
> +	if (kprobe_boot_events_buf[0] == '\0') {
> +		strscpy(kprobe_boot_events_buf, str, COMMAND_LINE_SIZE);
> +		return;
> +	}
> +
> +	len = strlen(kprobe_boot_events_buf);
> +	str_len = strlen(str);
> +	if (!str_len)
> +		return;
> +	if (str_len >= COMMAND_LINE_SIZE - len - 1)
> +		return;
> +
> +	kprobe_boot_events_buf[len] = ';';
> +	strscpy(kprobe_boot_events_buf + len + 1, str,
> +		COMMAND_LINE_SIZE - len - 1);
> +}
> +
>  static int __init set_kprobe_boot_events(char *str)
>  {
> -	strscpy(kprobe_boot_events_buf, str, COMMAND_LINE_SIZE);
> +	append_kprobe_boot_event(str);
>  	disable_tracing_selftest("running kprobe events");
>  
>  	return 1;
> -- 
> 2.34.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* [PATCH v2 2/2] tracing: drain deferred trigger frees if kthread startup fails
From: Wesley Atwell @ 2026-03-09 21:27 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Wesley Atwell
In-Reply-To: <20260309212702.13622-1-atwellwea@gmail.com>

Boot-time trigger registration can fail before the trigger-data cleanup
kthread exists. Deferring those frees until late init is fine, but the
post-boot fallback must still drain the deferred list if kthread
creation never succeeds.

Otherwise, boot-deferred nodes can accumulate on
trigger_data_free_list, later frees fall back to synchronously freeing
only the current object, and the older queued entries are leaked
forever.

Keep the deferred boot-time behavior, but when kthread creation fails,
drain the whole queued list synchronously. Do the same in the late-init
drain path so queued entries are not stranded there either.

Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
v2:
- Restore the dropped mutex recheck comment
- Clarify the synchronous fallback drain path

 kernel/trace/trace_events_trigger.c | 79 ++++++++++++++++++++++++-----
 1 file changed, 66 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d5230b759a2d..428b46272ac8 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -22,6 +22,39 @@ static struct task_struct *trigger_kthread;
 static struct llist_head trigger_data_free_list;
 static DEFINE_MUTEX(trigger_data_kthread_mutex);
 
+static int trigger_kthread_fn(void *ignore);
+
+static void trigger_start_kthread_locked(void)
+{
+	lockdep_assert_held(&trigger_data_kthread_mutex);
+
+	if (!trigger_kthread) {
+		struct task_struct *kthread;
+
+		kthread = kthread_create(trigger_kthread_fn, NULL,
+					 "trigger_data_free");
+		if (!IS_ERR(kthread))
+			WRITE_ONCE(trigger_kthread, kthread);
+	}
+}
+
+static void trigger_data_free_queued_locked(void)
+{
+	struct event_trigger_data *data, *tmp;
+	struct llist_node *llnodes;
+
+	lockdep_assert_held(&trigger_data_kthread_mutex);
+
+	llnodes = llist_del_all(&trigger_data_free_list);
+	if (!llnodes)
+		return;
+
+	tracepoint_synchronize_unregister();
+
+	llist_for_each_entry_safe(data, tmp, llnodes, llist)
+		kfree(data);
+}
+
 /* Bulk garbage collection of event_trigger_data elements */
 static int trigger_kthread_fn(void *ignore)
 {
@@ -56,30 +89,50 @@ void trigger_data_free(struct event_trigger_data *data)
 	if (data->cmd_ops->set_filter)
 		data->cmd_ops->set_filter(NULL, data, NULL);
 
+	/*
+	 * Boot-time trigger registration can fail before kthread creation
+	 * works. Keep the deferred-free semantics during boot and let late
+	 * init start the kthread to drain the list.
+	 */
+	if (system_state == SYSTEM_BOOTING && !trigger_kthread) {
+		llist_add(&data->llist, &trigger_data_free_list);
+		return;
+	}
+
 	if (unlikely(!trigger_kthread)) {
 		guard(mutex)(&trigger_data_kthread_mutex);
+
+		trigger_start_kthread_locked();
 		/* Check again after taking mutex */
 		if (!trigger_kthread) {
-			struct task_struct *kthread;
-
-			kthread = kthread_create(trigger_kthread_fn, NULL,
-						 "trigger_data_free");
-			if (!IS_ERR(kthread))
-				WRITE_ONCE(trigger_kthread, kthread);
+			llist_add(&data->llist, &trigger_data_free_list);
+			/* Drain the queued frees synchronously if startup failed. */
+			trigger_data_free_queued_locked();
+			return;
 		}
 	}
 
-	if (!trigger_kthread) {
-		/* Do it the slow way */
-		tracepoint_synchronize_unregister();
-		kfree(data);
-		return;
-	}
-
 	llist_add(&data->llist, &trigger_data_free_list);
 	wake_up_process(trigger_kthread);
 }
 
+static int __init trigger_data_free_init(void)
+{
+	guard(mutex)(&trigger_data_kthread_mutex);
+
+	if (llist_empty(&trigger_data_free_list))
+		return 0;
+
+	trigger_start_kthread_locked();
+	if (trigger_kthread)
+		wake_up_process(trigger_kthread);
+	else
+		trigger_data_free_queued_locked();
+
+	return 0;
+}
+late_initcall(trigger_data_free_init);
+
 static inline void data_ops_trigger(struct event_trigger_data *data,
 				    struct trace_buffer *buffer,  void *rec,
 				    struct ring_buffer_event *event)
-- 
2.34.1

^ permalink raw reply related

* [PATCH v2 1/2] tracing: preserve repeated boot-time tracing parameters
From: Wesley Atwell @ 2026-03-09 21:27 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Wesley Atwell
In-Reply-To: <20260309212702.13622-1-atwellwea@gmail.com>

Bootconfig expands arrays into repeated param=value entries, and the
kernel command line can repeat the same tracing parameter as well.
Several tracing __setup() handlers still overwrite their boot buffers,
so only the last ftrace filter, graph filter, trace option, kprobe
event, or trace trigger entry survives boot.

Preserve repeated values in the format their existing parsers already
consume: comma-delimited lists for ftrace filters and trace options,
semicolon-delimited lists for kprobe events, and per-chunk parsing for
trace_trigger=. The trace_trigger parser tokenizes its storage in
place, so keep a running length and only parse the newly appended
chunk into bootup_triggers[].

Fixes: 2af15d6a44b8 ("ftrace: add kernel command line function filtering")
Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
Fixes: a01fdc897fa5 ("tracing: Add trace_trigger kernel command line option")
Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
v2:
- No changes

 kernel/trace/ftrace.c       | 29 +++++++++++++++++++++++++----
 kernel/trace/trace.c        | 23 ++++++++++++++++++++++-
 kernel/trace/trace_events.c | 23 ++++++++++++++++++++---
 kernel/trace/trace_kprobe.c | 23 ++++++++++++++++++++++-
 4 files changed, 89 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8df69e702706..cdd46f639333 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6835,13 +6835,34 @@ EXPORT_SYMBOL_GPL(ftrace_set_global_notrace);
 static char ftrace_notrace_buf[FTRACE_FILTER_SIZE] __initdata;
 static char ftrace_filter_buf[FTRACE_FILTER_SIZE] __initdata;
 
+static void __init append_ftrace_boot_param(char *buf, const char *str,
+					    char sep)
+{
+	size_t len, str_len;
+
+	if (buf[0] == '\0') {
+		strscpy(buf, str, FTRACE_FILTER_SIZE);
+		return;
+	}
+
+	len = strlen(buf);
+	str_len = strlen(str);
+	if (!str_len)
+		return;
+	if (str_len >= FTRACE_FILTER_SIZE - len - 1)
+		return;
+
+	buf[len] = sep;
+	strscpy(buf + len + 1, str, FTRACE_FILTER_SIZE - len - 1);
+}
+
 /* Used by function selftest to not test if filter is set */
 bool ftrace_filter_param __initdata;
 
 static int __init set_ftrace_notrace(char *str)
 {
 	ftrace_filter_param = true;
-	strscpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
+	append_ftrace_boot_param(ftrace_notrace_buf, str, ',');
 	return 1;
 }
 __setup("ftrace_notrace=", set_ftrace_notrace);
@@ -6849,7 +6870,7 @@ __setup("ftrace_notrace=", set_ftrace_notrace);
 static int __init set_ftrace_filter(char *str)
 {
 	ftrace_filter_param = true;
-	strscpy(ftrace_filter_buf, str, FTRACE_FILTER_SIZE);
+	append_ftrace_boot_param(ftrace_filter_buf, str, ',');
 	return 1;
 }
 __setup("ftrace_filter=", set_ftrace_filter);
@@ -6861,14 +6882,14 @@ static int ftrace_graph_set_hash(struct ftrace_hash *hash, char *buffer);
 
 static int __init set_graph_function(char *str)
 {
-	strscpy(ftrace_graph_buf, str, FTRACE_FILTER_SIZE);
+	append_ftrace_boot_param(ftrace_graph_buf, str, ',');
 	return 1;
 }
 __setup("ftrace_graph_filter=", set_graph_function);
 
 static int __init set_graph_notrace_function(char *str)
 {
-	strscpy(ftrace_graph_notrace_buf, str, FTRACE_FILTER_SIZE);
+	append_ftrace_boot_param(ftrace_graph_notrace_buf, str, ',');
 	return 1;
 }
 __setup("ftrace_graph_notrace=", set_graph_notrace_function);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebd996f8710e..42d03d36ae39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -327,9 +327,30 @@ __setup("trace_instance=", boot_instance);
 
 static char trace_boot_options_buf[MAX_TRACER_SIZE] __initdata;
 
+static void __init append_trace_boot_options(const char *str)
+{
+	size_t len, str_len;
+
+	if (trace_boot_options_buf[0] == '\0') {
+		strscpy(trace_boot_options_buf, str, MAX_TRACER_SIZE);
+		return;
+	}
+
+	len = strlen(trace_boot_options_buf);
+	str_len = strlen(str);
+	if (!str_len)
+		return;
+	if (str_len >= MAX_TRACER_SIZE - len - 1)
+		return;
+
+	trace_boot_options_buf[len] = ',';
+	strscpy(trace_boot_options_buf + len + 1, str,
+		MAX_TRACER_SIZE - len - 1);
+}
+
 static int __init set_trace_boot_options(char *str)
 {
-	strscpy(trace_boot_options_buf, str, MAX_TRACER_SIZE);
+	append_trace_boot_options(str);
 	return 1;
 }
 __setup("trace_options=", set_trace_boot_options);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 249d1cba72c0..c3981f62e4bc 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3679,20 +3679,37 @@ static struct boot_triggers {
 } bootup_triggers[MAX_BOOT_TRIGGERS];
 
 static char bootup_trigger_buf[COMMAND_LINE_SIZE];
+static int bootup_trigger_buf_len;
 static int nr_boot_triggers;
 
 static __init int setup_trace_triggers(char *str)
 {
 	char *trigger;
 	char *buf;
+	ssize_t copied;
 	int i;
+	int start;
 
-	strscpy(bootup_trigger_buf, str, COMMAND_LINE_SIZE);
+	if (bootup_trigger_buf_len >= COMMAND_LINE_SIZE)
+		return 1;
+
+	start = bootup_trigger_buf_len;
+	if (start && !*str)
+		return 1;
+
+	copied = strscpy(bootup_trigger_buf + start, str,
+			 COMMAND_LINE_SIZE - start);
+	if (copied < 0) {
+		if (start)
+			return 1;
+		copied = strlen(bootup_trigger_buf + start);
+	}
+	bootup_trigger_buf_len += copied + 1;
 	trace_set_ring_buffer_expanded(NULL);
 	disable_tracing_selftest("running event triggers");
 
-	buf = bootup_trigger_buf;
-	for (i = 0; i < MAX_BOOT_TRIGGERS; i++) {
+	buf = bootup_trigger_buf + start;
+	for (i = nr_boot_triggers; i < MAX_BOOT_TRIGGERS; i++) {
 		trigger = strsep(&buf, ",");
 		if (!trigger)
 			break;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a5dbb72528e0..a63a56b55570 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -29,9 +29,30 @@
 /* Kprobe early definition from command line */
 static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
 
+static void __init append_kprobe_boot_event(const char *str)
+{
+	size_t len, str_len;
+
+	if (kprobe_boot_events_buf[0] == '\0') {
+		strscpy(kprobe_boot_events_buf, str, COMMAND_LINE_SIZE);
+		return;
+	}
+
+	len = strlen(kprobe_boot_events_buf);
+	str_len = strlen(str);
+	if (!str_len)
+		return;
+	if (str_len >= COMMAND_LINE_SIZE - len - 1)
+		return;
+
+	kprobe_boot_events_buf[len] = ';';
+	strscpy(kprobe_boot_events_buf + len + 1, str,
+		COMMAND_LINE_SIZE - len - 1);
+}
+
 static int __init set_kprobe_boot_events(char *str)
 {
-	strscpy(kprobe_boot_events_buf, str, COMMAND_LINE_SIZE);
+	append_kprobe_boot_event(str);
 	disable_tracing_selftest("running kprobe events");
 
 	return 1;
-- 
2.34.1

^ permalink raw reply related

* [PATCH v2 0/2] tracing: preserve repeated boot-time parameters and drain deferred trigger frees
From: Wesley Atwell @ 2026-03-09 21:27 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Wesley Atwell

Patch 1 preserves repeated tracing boot parameters across the affected
early parsers.

Patch 2 makes deferred trigger-data frees drain correctly if the cleanup
kthread never starts, and restores the review-requested comment context
around the post-mutex recheck and synchronous fallback path.

v2:
- Patch 1: no changes
- Patch 2: restore the dropped mutex recheck comment and clarify the
  synchronous fallback drain path

Wesley Atwell (2):
  tracing: preserve repeated boot-time tracing parameters
  tracing: drain deferred trigger frees if kthread startup fails

 kernel/trace/ftrace.c               | 29 +++++++++--
 kernel/trace/trace.c                | 23 ++++++++-
 kernel/trace/trace_events.c         | 23 +++++++--
 kernel/trace/trace_events_trigger.c | 79 ++++++++++++++++++++++++-----
 kernel/trace/trace_kprobe.c         | 23 ++++++++-
 5 files changed, 155 insertions(+), 22 deletions(-)

-- 
2.34.1

^ permalink raw reply

* Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Steven Rostedt @ 2026-03-09 21:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dmitry Ilvokhin, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Masami Hiramatsu, Mathieu Desnoyers, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Brendan Jackman, Johannes Weiner, Zi Yan,
	Oscar Salvador, Qi Zheng, Shakeel Butt, linux-kernel, linux-mm,
	linux-trace-kernel, linux-pm
In-Reply-To: <aa8xawMxLeUjkyHx@casper.infradead.org>

On Mon, 9 Mar 2026 20:45:31 +0000
Matthew Wilcox <willy@infradead.org> wrote:

> On Mon, Mar 09, 2026 at 03:13:17PM -0400, Steven Rostedt wrote:
> > The biggest issue with making a generic light weight LOCK_STAT is that
> > locks are extremely optimized. Any addition of generic lock encoding will
> > cause a noticeable overhead when compiled in, even when disabled.  
> 
> I'm not sure that's true.  Taking the current Debian kernel config
> leads to a "call" instruction to acquire a spinlock:
> 
> void __insert_inode_hash(struct inode *inode, unsigned long hashval)
> {
>         struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
> 
>         spin_lock(&inode_hash_lock);
>         spin_lock(&inode->i_lock);
>         hlist_add_head_rcu(&inode->i_hash, b);
>         spin_unlock(&inode->i_lock);
>         spin_unlock(&inode_hash_lock);
> }
> 
> compiles to:
> 
> [...]
>      280:       23 35 00 00 00 00       and    0x0(%rip),%esi        # 286 <__insert_inode_hash+0x56>
>                         282: R_X86_64_PC32      .data..ro_after_init+0x10
>      286:       48 8d 2c f0             lea    (%rax,%rsi,8),%rbp
>      28a:       e8 00 00 00 00          call   28f <__insert_inode_hash+0x5f>
>                         28b: R_X86_64_PLT32     _raw_spin_lock-0x4
>      28f:       4c 89 e7                mov    %r12,%rdi
>      292:       e8 00 00 00 00          call   297 <__insert_inode_hash+0x67>
>                         293: R_X86_64_PLT32     _raw_spin_lock-0x4
> [...]

Ah, you're correct. Looks like it's an arch specific thing. I was going
back to my memory from around 2006, but it appears that only a few archs
inline spinlocks anymore. Thomas made it a bit easier to see what does and
does not do that (in 2009).

  6beb000923882 ("locking: Make inlining decision Kconfig based")

So, perhaps adding code to the spinlocks will not be as much of a hit on I$.

> (The spinlock code is too complex for me to follow what config options
> influence whether it's a function call; you probably have enough of it
> in your head that you'd know)

Yeah, I feel like I'm always relearning the code every time I have to jump
in and understand it again.

> 
> > The other issue is the data we store for the lock. A lock is usually just a
> > word (or long) in size, embedded in a structure. LOCKDEP and LOCK_STAT adds
> > a key per lock. This increases the data size of the kernel.  
> 
> It does, but perhaps for a light weight lockstat, we could do better
> than that.  For example it could use the return address to look up
> which lock is being accessed rather than embedding a key in each lock.

Right.

-- Steve

^ permalink raw reply

* Re: [PATCH v3 00/12] vfs: change inode->i_ino from unsigned long to u64
From: Jeff Layton @ 2026-03-09 20:50 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-fsdevel, linux-kernel, linux-trace-kernel, nvdimm, fsverity,
	linux-mm, netfs, linux-ext4, linux-f2fs-devel, linux-nfs,
	linux-cifs, samba-technical, linux-nilfs, v9fs, linux-afs, autofs,
	ceph-devel, codalist, ecryptfs, linux-mtd, jfs-discussion, ntfs3,
	ocfs2-devel, devel, linux-unionfs, apparmor,
	linux-security-module, linux-integrity, selinux, amd-gfx,
	dri-devel, linux-media, linaro-mm-sig, netdev, linux-perf-users,
	linux-fscrypt, linux-xfs, linux-hams, linux-x25, audit,
	linux-bluetooth, linux-can, linux-sctp, bpf
In-Reply-To: <0bd92b4fce00a6111a0fc7764904f7e6ae0ece3a.camel@linux.ibm.com>

On Mon, 2026-03-09 at 16:11 -0400, Mimi Zohar wrote:
> On Mon, 2026-03-09 at 15:33 -0400, Jeff Layton wrote:
> > On Mon, 2026-03-09 at 15:00 -0400, Mimi Zohar wrote:
> > > On Mon, 2026-03-09 at 13:59 -0400, Jeff Layton wrote:
> > > > On Mon, 2026-03-09 at 13:47 -0400, Mimi Zohar wrote:
> > > > > [ I/O socket time out.  Trimming the To list.]
> > > > > 
> > > > > On Wed, 2026-03-04 at 10:32 -0500, Jeff Layton wrote:
> > > > > > This version squashes all of the format-string changes and the i_ino
> > > > > > type change into the same patch. This results in a giant 600+ line patch
> > > > > > at the end of the series, but it does remain bisectable.  Because the
> > > > > > patchset was reorganized (again) some of the R-b's and A-b's have been
> > > > > > dropped.
> > > > > > 
> > > > > > The entire pile is in the "iino-u64" branch of my tree, if anyone is
> > > > > > interested in testing this.
> > > > > > 
> > > > > >     https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/
> > > > > > 
> > > > > > Original cover letter follows:
> > > > > > 
> > > > > > ----------------------8<-----------------------
> > > > > > 
> > > > > > Christian said [1] to "just do it" when I proposed this, so here we are!
> > > > > > 
> > > > > > For historical reasons, the inode->i_ino field is an unsigned long,
> > > > > > which means that it's 32 bits on 32 bit architectures. This has caused a
> > > > > > number of filesystems to implement hacks to hash a 64-bit identifier
> > > > > > into a 32-bit field, and deprives us of a universal identifier field for
> > > > > > an inode.
> > > > > > 
> > > > > > This patchset changes the inode->i_ino field from an unsigned long to a
> > > > > > u64. This shouldn't make any material difference on 64-bit hosts, but
> > > > > > 32-bit hosts will see struct inode grow by at least 4 bytes. This could
> > > > > > have effects on slabcache sizes and field alignment.
> > > > > > 
> > > > > > The bulk of the changes are to format strings and tracepoints, since the
> > > > > > kernel itself doesn't care that much about the i_ino field. The first
> > > > > > patch changes some vfs function arguments, so check that one out
> > > > > > carefully.
> > > > > > 
> > > > > > With this change, we may be able to shrink some inode structures. For
> > > > > > instance, struct nfs_inode has a fileid field that holds the 64-bit
> > > > > > inode number. With this set of changes, that field could be eliminated.
> > > > > > I'd rather leave that sort of cleanups for later just to keep this
> > > > > > simple.
> > > > > > 
> > > > > > Much of this set was generated by LLM, but I attributed it to myself
> > > > > > since I consider this to be in the "menial tasks" category of LLM usage.
> > > > > > 
> > > > > > [1]: https://lore.kernel.org/linux-fsdevel/20260219-portrait-winkt-959070cee42f@brauner/
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > 
> > > > > Jeff, missing from this patch set is EVM.  In hmac_add_misc() EVM copies the
> > > > > i_ino and calculates either an HMAC or file meta-data hash, which is then
> > > > > signed. 
> > > > > 
> > > > > 
> > > > 
> > > > Thanks Mimi, good catch.
> > > > 
> > > > It looks like we should just be able to change the ino field to a u64
> > > > alongside everything else. Something like this:
> > > > 
> > > > diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c
> > > > index c0ca4eedb0fe..77b6c2fa345e 100644
> > > > --- a/security/integrity/evm/evm_crypto.c
> > > > +++ b/security/integrity/evm/evm_crypto.c
> > > > @@ -144,7 +144,7 @@ static void hmac_add_misc(struct shash_desc *desc, struct inode *inode,
> > > >                           char type, char *digest)
> > > >  {
> > > >         struct h_misc {
> > > > -               unsigned long ino;
> > > > +               u64 ino;
> > > >                 __u32 generation;
> > > >                 uid_t uid;
> > > >                 gid_t gid;
> > > > 
> > > 
> > > Agreed.
> > > 
> > > > 
> > > > That should make no material difference on 64-bit hosts. What's the
> > > > effect on 32-bit? Will they just need to remeasure everything or would
> > > > the consequences be more dire? Do we have any clue whether anyone is
> > > > using EVM in 32-bit environments?
> > > 
> > > All good questions. Unfortunately I don't know the answer to most of them. What
> > > we do know: changing the size of the i_ino field would affect EVM file metadata
> > > verification and would require relabeling the filesystem.  Even packages
> > > containing EVM portable signatures, which don't include or verify the i_ino
> > > number, would be affected.
> > > 
> > 
> > Ouch. Technically, I guess this is ABI...
> > 
> > While converting to u64 seems like the ideal thing to do, the other
> > option might be to just keep this as an unsigned long for now.
> > 
> > No effect on 64-bit, but that could keep things working 32-bit when the
> > i_ino casts properly to a u32. ext4 would be fine since they don't
> > issue inode numbers larger than UINT_MAX. xfs and btrfs are a bit more
> > iffy, but worst case they'd just need to be relabeled (which is what
> > they'll need to do anyway).
> > 
> > If we do that, then we should probably add a comment to this function
> > explaining why it's an unsigned long.
> 
> Agreed.
> 

For now, I think that's the best approach. I'll spin up a patch to add
the comment.

> > 
> > Thoughts?
> 
> My concern would be embedded/IoT devices, but I don't have any insight into who
> might be using it on 32 bit.
> 

Yep. This LWN article on Arnd's talk lays out the state of things:

    https://lwn.net/Articles/1035727/

The upshot is that it's hard to buy 32-bit CPUs these days, and will
only get harder. But, there are still a fair number of 32-bit devices
out in the field and will be for some time.

The big question is how many of those EVM users that intend to run new
kernels. I have no idea how to answer that.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply

* Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Matthew Wilcox @ 2026-03-09 20:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dmitry Ilvokhin, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Masami Hiramatsu, Mathieu Desnoyers, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Brendan Jackman, Johannes Weiner, Zi Yan,
	Oscar Salvador, Qi Zheng, Shakeel Butt, linux-kernel, linux-mm,
	linux-trace-kernel, linux-pm
In-Reply-To: <20260309151317.7bba06dd@gandalf.local.home>

On Mon, Mar 09, 2026 at 03:13:17PM -0400, Steven Rostedt wrote:
> The biggest issue with making a generic light weight LOCK_STAT is that
> locks are extremely optimized. Any addition of generic lock encoding will
> cause a noticeable overhead when compiled in, even when disabled.

I'm not sure that's true.  Taking the current Debian kernel config
leads to a "call" instruction to acquire a spinlock:

void __insert_inode_hash(struct inode *inode, unsigned long hashval)
{
        struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);

        spin_lock(&inode_hash_lock);
        spin_lock(&inode->i_lock);
        hlist_add_head_rcu(&inode->i_hash, b);
        spin_unlock(&inode->i_lock);
        spin_unlock(&inode_hash_lock);
}

compiles to:

[...]
     280:       23 35 00 00 00 00       and    0x0(%rip),%esi        # 286 <__insert_inode_hash+0x56>
                        282: R_X86_64_PC32      .data..ro_after_init+0x10
     286:       48 8d 2c f0             lea    (%rax,%rsi,8),%rbp
     28a:       e8 00 00 00 00          call   28f <__insert_inode_hash+0x5f>
                        28b: R_X86_64_PLT32     _raw_spin_lock-0x4
     28f:       4c 89 e7                mov    %r12,%rdi
     292:       e8 00 00 00 00          call   297 <__insert_inode_hash+0x67>
                        293: R_X86_64_PLT32     _raw_spin_lock-0x4
[...]

Debian doesn't do anything too weird here:

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set

(The spinlock code is too complex for me to follow what config options
influence whether it's a function call; you probably have enough of it
in your head that you'd know)

> The other issue is the data we store for the lock. A lock is usually just a
> word (or long) in size, embedded in a structure. LOCKDEP and LOCK_STAT adds
> a key per lock. This increases the data size of the kernel.

It does, but perhaps for a light weight lockstat, we could do better
than that.  For example it could use the return address to look up
which lock is being accessed rather than embedding a key in each lock.


^ permalink raw reply

* Re: [PATCH v3 00/12] vfs: change inode->i_ino from unsigned long to u64
From: Mimi Zohar @ 2026-03-09 20:11 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, linux-trace-kernel, nvdimm, fsverity,
	linux-mm, netfs, linux-ext4, linux-f2fs-devel, linux-nfs,
	linux-cifs, samba-technical, linux-nilfs, v9fs, linux-afs, autofs,
	ceph-devel, codalist, ecryptfs, linux-mtd, jfs-discussion, ntfs3,
	ocfs2-devel, devel, linux-unionfs, apparmor,
	linux-security-module, linux-integrity, selinux, amd-gfx,
	dri-devel, linux-media, linaro-mm-sig, netdev, linux-perf-users,
	linux-fscrypt, linux-xfs, linux-hams, linux-x25, audit,
	linux-bluetooth, linux-can, linux-sctp, bpf
In-Reply-To: <dd3f9873c7939fba0ca2366effd24e4b6326f17b.camel@kernel.org>

On Mon, 2026-03-09 at 15:33 -0400, Jeff Layton wrote:
> On Mon, 2026-03-09 at 15:00 -0400, Mimi Zohar wrote:
> > On Mon, 2026-03-09 at 13:59 -0400, Jeff Layton wrote:
> > > On Mon, 2026-03-09 at 13:47 -0400, Mimi Zohar wrote:
> > > > [ I/O socket time out.  Trimming the To list.]
> > > > 
> > > > On Wed, 2026-03-04 at 10:32 -0500, Jeff Layton wrote:
> > > > > This version squashes all of the format-string changes and the i_ino
> > > > > type change into the same patch. This results in a giant 600+ line patch
> > > > > at the end of the series, but it does remain bisectable.  Because the
> > > > > patchset was reorganized (again) some of the R-b's and A-b's have been
> > > > > dropped.
> > > > > 
> > > > > The entire pile is in the "iino-u64" branch of my tree, if anyone is
> > > > > interested in testing this.
> > > > > 
> > > > >     https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/
> > > > > 
> > > > > Original cover letter follows:
> > > > > 
> > > > > ----------------------8<-----------------------
> > > > > 
> > > > > Christian said [1] to "just do it" when I proposed this, so here we are!
> > > > > 
> > > > > For historical reasons, the inode->i_ino field is an unsigned long,
> > > > > which means that it's 32 bits on 32 bit architectures. This has caused a
> > > > > number of filesystems to implement hacks to hash a 64-bit identifier
> > > > > into a 32-bit field, and deprives us of a universal identifier field for
> > > > > an inode.
> > > > > 
> > > > > This patchset changes the inode->i_ino field from an unsigned long to a
> > > > > u64. This shouldn't make any material difference on 64-bit hosts, but
> > > > > 32-bit hosts will see struct inode grow by at least 4 bytes. This could
> > > > > have effects on slabcache sizes and field alignment.
> > > > > 
> > > > > The bulk of the changes are to format strings and tracepoints, since the
> > > > > kernel itself doesn't care that much about the i_ino field. The first
> > > > > patch changes some vfs function arguments, so check that one out
> > > > > carefully.
> > > > > 
> > > > > With this change, we may be able to shrink some inode structures. For
> > > > > instance, struct nfs_inode has a fileid field that holds the 64-bit
> > > > > inode number. With this set of changes, that field could be eliminated.
> > > > > I'd rather leave that sort of cleanups for later just to keep this
> > > > > simple.
> > > > > 
> > > > > Much of this set was generated by LLM, but I attributed it to myself
> > > > > since I consider this to be in the "menial tasks" category of LLM usage.
> > > > > 
> > > > > [1]: https://lore.kernel.org/linux-fsdevel/20260219-portrait-winkt-959070cee42f@brauner/
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > 
> > > > Jeff, missing from this patch set is EVM.  In hmac_add_misc() EVM copies the
> > > > i_ino and calculates either an HMAC or file meta-data hash, which is then
> > > > signed. 
> > > > 
> > > > 
> > > 
> > > Thanks Mimi, good catch.
> > > 
> > > It looks like we should just be able to change the ino field to a u64
> > > alongside everything else. Something like this:
> > > 
> > > diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c
> > > index c0ca4eedb0fe..77b6c2fa345e 100644
> > > --- a/security/integrity/evm/evm_crypto.c
> > > +++ b/security/integrity/evm/evm_crypto.c
> > > @@ -144,7 +144,7 @@ static void hmac_add_misc(struct shash_desc *desc, struct inode *inode,
> > >                           char type, char *digest)
> > >  {
> > >         struct h_misc {
> > > -               unsigned long ino;
> > > +               u64 ino;
> > >                 __u32 generation;
> > >                 uid_t uid;
> > >                 gid_t gid;
> > > 
> > 
> > Agreed.
> > 
> > > 
> > > That should make no material difference on 64-bit hosts. What's the
> > > effect on 32-bit? Will they just need to remeasure everything or would
> > > the consequences be more dire? Do we have any clue whether anyone is
> > > using EVM in 32-bit environments?
> > 
> > All good questions. Unfortunately I don't know the answer to most of them. What
> > we do know: changing the size of the i_ino field would affect EVM file metadata
> > verification and would require relabeling the filesystem.  Even packages
> > containing EVM portable signatures, which don't include or verify the i_ino
> > number, would be affected.
> > 
> 
> Ouch. Technically, I guess this is ABI...
> 
> While converting to u64 seems like the ideal thing to do, the other
> option might be to just keep this as an unsigned long for now.
> 
> No effect on 64-bit, but that could keep things working 32-bit when the
> i_ino casts properly to a u32. ext4 would be fine since they don't
> issue inode numbers larger than UINT_MAX. xfs and btrfs are a bit more
> iffy, but worst case they'd just need to be relabeled (which is what
> they'll need to do anyway).
> 
> If we do that, then we should probably add a comment to this function
> explaining why it's an unsigned long.

Agreed.

> 
> Thoughts?

My concern would be embedded/IoT devices, but I don't have any insight into who
might be using it on 32 bit.

Mimi

^ permalink raw reply

* Re: [PATCH 2/2] tracing: drain deferred trigger frees if kthread startup fails
From: Steven Rostedt @ 2026-03-09 20:02 UTC (permalink / raw)
  To: Wesley Atwell
  Cc: mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260309195220.2726094-2-atwellwea@gmail.com>

On Mon,  9 Mar 2026 13:52:20 -0600
Wesley Atwell <atwellwea@gmail.com> wrote:

> @@ -56,30 +89,48 @@ void trigger_data_free(struct event_trigger_data *data)
>  	if (data->cmd_ops->set_filter)
>  		data->cmd_ops->set_filter(NULL, data, NULL);
>  
> +	/*
> +	 * Boot-time trigger registration can fail before kthread creation
> +	 * works. Keep the deferred-free semantics during boot and let late
> +	 * init start the kthread to drain the list.
> +	 */
> +	if (system_state == SYSTEM_BOOTING && !trigger_kthread) {
> +		llist_add(&data->llist, &trigger_data_free_list);
> +		return;
> +	}
> +
>  	if (unlikely(!trigger_kthread)) {
>  		guard(mutex)(&trigger_data_kthread_mutex);
> -		/* Check again after taking mutex */
> -		if (!trigger_kthread) {
> -			struct task_struct *kthread;
>  
> -			kthread = kthread_create(trigger_kthread_fn, NULL,
> -						 "trigger_data_free");
> -			if (!IS_ERR(kthread))
> -				WRITE_ONCE(trigger_kthread, kthread);
> +		trigger_start_kthread_locked();
> +		if (!trigger_kthread) {
> +			llist_add(&data->llist, &trigger_data_free_list);
> +			trigger_data_free_queued_locked();
> +			return;
>  		}
>  	}
>  
> -	if (!trigger_kthread) {
> -		/* Do it the slow way */

You removed the above comment. It's still relevant in the above if
statement.

-- Steve

> -		tracepoint_synchronize_unregister();
> -		kfree(data);
> -		return;
> -	}
> -
>  	llist_add(&data->llis

^ permalink raw reply

* [PATCH v4 18/18] rtla/utils: Fix loop condition in PID validation
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The procfs_is_workload_pid() function iterates through a directory
entry name to validate if it represents a process ID. The loop
condition checks if the pointer t_name is non-NULL, but since
incrementing a pointer never makes it NULL, this condition is always
true within the loop's context. Although the inner isdigit() check
catches the NUL terminator and breaks out of the loop, the condition
is semantically misleading and not idiomatic for C string processing.

Correct the loop condition from checking the pointer (t_name) to
checking the character it points to (*t_name). This ensures the loop
terminates when the NUL terminator is reached, aligning with standard
C string iteration practices. While the original code functioned
correctly due to the existing character validation, this change
improves code clarity and maintainability.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index bcc5b5d383d8b..16aa59c90bdb2 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -311,7 +311,7 @@ static int procfs_is_workload_pid(const char *comm_prefix, struct dirent *proc_e
 		return 0;

 	/* check if the string is a pid */
-	for (t_name = proc_entry->d_name; t_name; t_name++) {
+	for (t_name = proc_entry->d_name; *t_name; t_name++) {
 		if (!isdigit(*t_name))
 			break;
 	}
-- 
2.53.0

^ permalink raw reply related

* [PATCH v4 17/18] rtla/utils: Fix resource leak in set_comm_sched_attr()
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The set_comm_sched_attr() function opens the /proc directory via
opendir() but fails to call closedir() on its successful exit path.
If the function iterates through all processes without error, it
returns 0 directly, leaking the DIR stream pointer.

Fix this by refactoring the function to use a single exit path. A
retval variable is introduced to track the success or failure status.
All exit points now jump to a unified out label that calls closedir()
before the function returns, ensuring the resource is always freed.

Fixes: dada03db9bb19 ("rtla: Remove procps-ng dependency")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/utils.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index c3ed9089e7260..bcc5b5d383d8b 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -377,22 +377,23 @@ int set_comm_sched_attr(const char *comm_prefix, struct sched_attr *attr)
 
 		if (strtoi(proc_entry->d_name, &pid)) {
 			err_msg("'%s' is not a valid pid", proc_entry->d_name);
-			goto out_err;
+			retval = 1;
+			goto out;
 		}
 		/* procfs_is_workload_pid confirmed it is a pid */
 		retval = __set_sched_attr(pid, attr);
 		if (retval) {
 			err_msg("Error setting sched attributes for pid:%s\n", proc_entry->d_name);
-			goto out_err;
+			goto out;
 		}
 
 		debug_msg("Set sched attributes for pid:%s\n", proc_entry->d_name);
 	}
-	return 0;
 
-out_err:
+	retval = 0;
+out:
 	closedir(procfs);
-	return 1;
+	return retval;
 }
 
 #define INVALID_VAL	(~0L)
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 16/18] rtla/trace: Fix I/O handling in save_trace_to_file()
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The read/write loop in save_trace_to_file() does not correctly handle
errors from the read() and write() system calls. If either call is
interrupted by a signal, it returns -1 with errno set to EINTR, but
the code treats this as a fatal error and aborts the save operation.
Additionally, write() may perform a partial write, returning fewer
bytes than requested, which the code does not handle.

Fix the I/O loop by introducing proper error handling. The return
value of read() is now stored in a ssize_t variable and checked for
errors, with EINTR causing a retry. For write(), an inner loop ensures
all bytes are written, handling both EINTR and partial writes. Error
messages now include strerror() output for better debugging.

This follows the same pattern established in the previous commit that
fixed trace_event_save_hist(), ensuring consistent and robust I/O
handling throughout the trace saving code.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/trace.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index fed3362527b08..e407447773d04 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -73,6 +73,8 @@ int save_trace_to_file(struct tracefs_instance *inst, const char *filename)
 	char buffer[4096];
 	int out_fd, in_fd;
 	int retval = -1;
+	ssize_t n_read;
+	ssize_t n_written;

 	if (!inst || !filename)
 		return 0;
@@ -90,15 +92,30 @@ int save_trace_to_file(struct tracefs_instance *inst, const char *filename)
 		goto out_close_in;
 	}

-	do {
-		retval = read(in_fd, buffer, sizeof(buffer));
-		if (retval <= 0)
+	for (;;) {
+		n_read = read(in_fd, buffer, sizeof(buffer));
+		if (n_read < 0) {
+			if (errno == EINTR)
+				continue;
+			err_msg("Error reading trace file: %s\n", strerror(errno));
 			goto out_close;
+		}
+		if (n_read == 0)
+			break;

-		retval = write(out_fd, buffer, retval);
-		if (retval < 0)
-			goto out_close;
-	} while (retval > 0);
+		n_written = 0;
+		while (n_written < n_read) {
+			const ssize_t w = write(out_fd, buffer + n_written, n_read - n_written);
+
+			if (w < 0) {
+				if (errno == EINTR)
+					continue;
+				err_msg("Error writing trace file: %s\n", strerror(errno));
+				goto out_close;
+			}
+			n_written += w;
+		}
+	}

 	retval = 0;
 out_close:
-- 
2.53.0

^ permalink raw reply related

* [PATCH v4 15/18] rtla/trace: Fix write loop in trace_event_save_hist()
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The write loop in trace_event_save_hist() does not correctly handle
errors from the write() system call. If write() returns -1, this value
is added to the loop index, leading to an incorrect memory access on
the next iteration and potentially an infinite loop. The loop also
fails to handle EINTR.

Fix the write loop by introducing proper error handling. The return
value of write() is now stored in a ssize_t variable and checked for
errors. The loop retries the call if interrupted by a signal and breaks
on any other error after logging it with strerror().

Additionally, change the index variable type from int to size_t to
match the type used for buffer sizes and by strlen(), improving type
safety.

Fixes: 761916fd02c2 ("rtla/trace: Save event histogram output to a file")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/trace.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index ed7db5f4115ce..fed3362527b08 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -342,11 +342,11 @@ static void trace_event_disable_filter(struct trace_instance *instance,
 static void trace_event_save_hist(struct trace_instance *instance,
 				  struct trace_events *tevent)
 {
-	int index, out_fd;
+	size_t index, hist_len;
 	mode_t mode = 0644;
 	char path[MAX_PATH];
 	char *hist;
-	size_t hist_len;
+	int out_fd;
 
 	if (!tevent)
 		return;
@@ -378,7 +378,15 @@ static void trace_event_save_hist(struct trace_instance *instance,
 	index = 0;
 	hist_len = strlen(hist);
 	do {
-		index += write(out_fd, &hist[index], hist_len - index);
+		const ssize_t written = write(out_fd, &hist[index], hist_len - index);
+
+		if (written < 0) {
+			if (errno == EINTR)
+				continue;
+			err_msg("  Error writing hist file: %s\n", strerror(errno));
+			break;
+		}
+		index += written;
 	} while (index < hist_len);
 
 	free(hist);
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 14/18] rtla/timerlat: Simplify RTLA_NO_BPF environment variable check
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The code that checks the RTLA_NO_BPF environment variable calls
getenv() twice and uses strncmp() with a length of 2 to compare
against the single-character string "1". This is inefficient and
the comparison length is unnecessarily long.

Store the result of getenv() in a local variable to avoid the
redundant call, and replace strncmp() with strncmp_static() for
the exact match comparison. This follows the same pattern
established in recent commits that improved string comparison
consistency throughout the rtla codebase.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/timerlat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/timerlat.c b/tools/tracing/rtla/src/timerlat.c
index 84577daadd668..91f15f86e78d4 100644
--- a/tools/tracing/rtla/src/timerlat.c
+++ b/tools/tracing/rtla/src/timerlat.c
@@ -28,12 +28,13 @@ int
 timerlat_apply_config(struct osnoise_tool *tool, struct timerlat_params *params)
 {
 	int retval;
+	const char *const rtla_no_bpf = getenv("RTLA_NO_BPF");
 
 	/*
 	 * Try to enable BPF, unless disabled explicitly.
 	 * If BPF enablement fails, fall back to tracefs mode.
 	 */
-	if (getenv("RTLA_NO_BPF") && strncmp(getenv("RTLA_NO_BPF"), "1", 2) == 0) {
+	if (rtla_no_bpf && strncmp_static(rtla_no_bpf, "1") == 0) {
 		debug_msg("RTLA_NO_BPF set, disabling BPF\n");
 		params->mode = TRACING_MODE_TRACEFS;
 	} else if (!tep_find_event_by_name(tool->trace.tep, "osnoise", "timerlat_sample")) {
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 13/18] rtla: Use str_has_prefix() for option prefix check
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The argument parsing code in timerlat_main() and osnoise_main() uses
strncmp() with a length of 1 to check if the first argument starts
with a dash, indicating an option flag was passed.

Replace this pattern with str_has_prefix() for consistency with the
rest of the codebase. While character comparison would be slightly
more efficient, using str_has_prefix() provides better readability
and maintains a uniform coding style throughout the rtla tool.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/osnoise.c  | 2 +-
 tools/tracing/rtla/src/timerlat.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/tracing/rtla/src/osnoise.c b/tools/tracing/rtla/src/osnoise.c
index 9d1fd9981fe91..2db3db155c44a 100644
--- a/tools/tracing/rtla/src/osnoise.c
+++ b/tools/tracing/rtla/src/osnoise.c
@@ -1210,7 +1210,7 @@ int osnoise_main(int argc, char *argv[])
 
 	if ((strcmp(argv[1], "-h") == 0) || (strcmp(argv[1], "--help") == 0)) {
 		osnoise_usage(0);
-	} else if (strncmp(argv[1], "-", 1) == 0) {
+	} else if (str_has_prefix(argv[1], "-")) {
 		/* the user skipped the tool, call the default one */
 		run_tool(&osnoise_top_ops, argc, argv);
 		exit(0);
diff --git a/tools/tracing/rtla/src/timerlat.c b/tools/tracing/rtla/src/timerlat.c
index 8f8811f7a13bd..84577daadd668 100644
--- a/tools/tracing/rtla/src/timerlat.c
+++ b/tools/tracing/rtla/src/timerlat.c
@@ -272,7 +272,7 @@ int timerlat_main(int argc, char *argv[])
 
 	if ((strcmp(argv[1], "-h") == 0) || (strcmp(argv[1], "--help") == 0)) {
 		timerlat_usage(0);
-	} else if (strncmp(argv[1], "-", 1) == 0) {
+	} else if (str_has_prefix(argv[1], "-")) {
 		/* the user skipped the tool, call the default one */
 		run_tool(&timerlat_top_ops, argc, argv);
 		exit(0);
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 12/18] rtla: Enforce exact match for time unit suffixes
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The parse_ns_duration() function currently uses prefix matching for
detecting time units. This approach is problematic as it silently
accepts malformed strings such as "100nsx" or "100us_invalid" by
ignoring the trailing characters, leading to potential configuration
errors.

Introduce a match_time_unit() helper that checks the suffix matches
exactly and is followed by either end-of-string or a ':' delimiter.
The ':' is needed because parse_ns_duration() is also called from
get_long_ns_after_colon() when parsing SCHED_DEADLINE priority
specifications in the format "d:runtime:period" (e.g., "d:10ms:100ms").

A plain strcmp() would reject valid deadline strings because the suffix
"ms" is followed by ":100ms", not end-of-string. Similarly,
strncmp_static() would fail because ARRAY_SIZE() includes the NUL
terminator, making it equivalent to strcmp() for this comparison.

The match_time_unit() helper solves both problems: it rejects malformed
input like "100msx" while correctly handling the colon-delimited
deadline format.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/utils.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 486d96e8290fb..c3ed9089e7260 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -200,6 +200,21 @@ long parse_seconds_duration(char *val)
 	return t;
 }

+/*
+ * match_time_unit - check if str starts with unit followed by end-of-string or ':'
+ *
+ * This allows the time unit parser to work both in standalone duration strings
+ * like "100ms" and in colon-delimited SCHED_DEADLINE specifications like
+ * "d:10ms:100ms", while still rejecting malformed input like "100msx".
+ */
+static bool match_time_unit(const char *str, const char *unit)
+{
+	size_t len = strlen(unit);
+
+	return strncmp(str, unit, len) == 0 &&
+	       (str[len] == '\0' || str[len] == ':');
+}
+
 /*
  * parse_ns_duration - parse duration with ns/us/ms/s converting it to nanoseconds
  */
@@ -211,15 +226,15 @@ long parse_ns_duration(char *val)
 	t = strtol(val, &end, 10);

 	if (end) {
-		if (!strncmp(end, "ns", 2)) {
+		if (match_time_unit(end, "ns")) {
 			return t;
-		} else if (!strncmp(end, "us", 2)) {
+		} else if (match_time_unit(end, "us")) {
 			t *= 1000;
 			return t;
-		} else if (!strncmp(end, "ms", 2)) {
+		} else if (match_time_unit(end, "ms")) {
 			t *= 1000 * 1000;
 			return t;
-		} else if (!strncmp(end, "s", 1)) {
+		} else if (match_time_unit(end, "s")) {
 			t *= 1000 * 1000 * 1000;
 			return t;
 		}
-- 
2.53.0

^ permalink raw reply related

* [PATCH v4 11/18] rtla: Use str_has_prefix() for prefix checks
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The code currently uses strncmp() combined with strlen() to check if a
string starts with a specific prefix. This pattern is verbose and prone
to errors if the length does not match the prefix string.

Replace this pattern with the str_has_prefix() helper function in both
trace.c and utils.c. This improves code readability and safety by
handling the prefix length calculation automatically.

In addition, remove the unused retval variable from
trace_event_save_hist() in trace.c to clean up the function and
silence potential compiler warnings.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/trace.c | 5 ++---
 tools/tracing/rtla/src/utils.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index 2f529aaf8deef..ed7db5f4115ce 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -342,7 +342,7 @@ static void trace_event_disable_filter(struct trace_instance *instance,
 static void trace_event_save_hist(struct trace_instance *instance,
 				  struct trace_events *tevent)
 {
-	int retval, index, out_fd;
+	int index, out_fd;
 	mode_t mode = 0644;
 	char path[MAX_PATH];
 	char *hist;
@@ -356,8 +356,7 @@ static void trace_event_save_hist(struct trace_instance *instance,
 		return;
 
 	/* is this a hist: trigger? */
-	retval = strncmp(tevent->trigger, "hist:", strlen("hist:"));
-	if (retval)
+	if (!str_has_prefix(tevent->trigger, "hist:"))
 		return;
 
 	snprintf(path, ARRAY_SIZE(path), "%s_%s_hist.txt", tevent->system, tevent->event);
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index e98288e55db15..486d96e8290fb 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -318,8 +318,7 @@ static int procfs_is_workload_pid(const char *comm_prefix, struct dirent *proc_e
 		return 0;
 
 	buffer[MAX_PATH-1] = '\0';
-	retval = strncmp(comm_prefix, buffer, strlen(comm_prefix));
-	if (retval)
+	if (!str_has_prefix(buffer, comm_prefix))
 		return 0;
 
 	/* comm already have \n */
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 10/18] rtla: Add str_has_prefix() helper function
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Add a str_has_prefix() helper function that tests whether a string
starts with a given prefix. This function provides a cleaner interface
for prefix matching compared to using strncmp() with strlen() directly.

The function returns a boolean value indicating whether the string
starts with the specified prefix. This helper will be used in
subsequent changes to simplify prefix matching code throughout rtla.

Also add the missing string.h include which is needed for the strlen()
and strncmp() functions used by str_has_prefix() and the existing
strncmp_static() macro.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/utils.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/tracing/rtla/src/utils.h b/tools/tracing/rtla/src/utils.h
index 3bca53ee48db6..165d6d0a0c6ab 100644
--- a/tools/tracing/rtla/src/utils.h
+++ b/tools/tracing/rtla/src/utils.h
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <stdint.h>
+#include <string.h>
 #include <time.h>
 #include <sched.h>
 #include <stdbool.h>
@@ -24,6 +25,18 @@
 /* Compare string with static string, length determined at compile time */
 #define strncmp_static(s1, s2) strncmp(s1, s2, ARRAY_SIZE(s2))
 
+/**
+ * str_has_prefix - Test if a string has a given prefix
+ * @str: The string to test
+ * @prefix: The string to see if @str starts with
+ *
+ * Returns: true if @str starts with @prefix, false otherwise
+ */
+static inline bool str_has_prefix(const char *str, const char *prefix)
+{
+	return strncmp(str, prefix, strlen(prefix)) == 0;
+}
+
 #define container_of(ptr, type, member)({			\
 	const typeof(((type *)0)->member) *__mptr = (ptr);	\
 	(type *)((char *)__mptr - offsetof(type, member)) ; })
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 09/18] rtla: Handle pthread_create() failure properly
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Crystal Wood,
	Ivan Pravdin, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Add proper error handling when pthread_create() fails to create the
timerlat user-space dispatcher thread. Previously, the code only logged
an error message but continued execution, which could lead to undefined
behavior when the tool later expects the thread to be running.

When pthread_create() returns an error, the function now jumps to the
out_trace error path to properly clean up resources and exit. This
ensures consistent error handling and prevents the tool from running
in an invalid state without the required user-space thread.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/common.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/common.c b/tools/tracing/rtla/src/common.c
index cbc207fa58707..73906065e7772 100644
--- a/tools/tracing/rtla/src/common.c
+++ b/tools/tracing/rtla/src/common.c
@@ -303,8 +303,10 @@ int run_tool(struct tool_ops *ops, int argc, char *argv[])
 		params->user.cgroup_name = params->cgroup_name;
 
 		retval = pthread_create(&user_thread, NULL, timerlat_u_dispatcher, &params->user);
-		if (retval)
+		if (retval) {
 			err_msg("Error creating timerlat user-space threads\n");
+			goto out_trace;
+		}
 	}
 
 	retval = ops->enable(tool);
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 08/18] rtla/timerlat: Add bounds check for softirq vector
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Add bounds checking when accessing the softirq_name array using the
vector value from kernel trace data. The vector field from the
osnoise:softirq_noise event is used directly as an array index without
validation, which could cause an out-of-bounds read if the kernel
provides an unexpected vector value.

The softirq_name array contains 10 elements corresponding to the
standard Linux softirq vectors. While the kernel should only provide
valid vector values in the range 0-9, defensive programming requires
validating untrusted input before using it as an array index. If an
out-of-range vector is encountered, display the word UNKNOWN instead
of attempting to read beyond the array bounds.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/timerlat_aa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/tracing/rtla/src/timerlat_aa.c b/tools/tracing/rtla/src/timerlat_aa.c
index 30ef56d644f9c..bc421637cc19b 100644
--- a/tools/tracing/rtla/src/timerlat_aa.c
+++ b/tools/tracing/rtla/src/timerlat_aa.c
@@ -417,8 +417,8 @@ static int timerlat_aa_softirq_handler(struct trace_seq *s, struct tep_record *r
 	taa_data->thread_softirq_sum += duration;

 	trace_seq_printf(taa_data->softirqs_seq, "  %24s:%-3llu %.*s %9.2f us\n",
-			 softirq_name[vector], vector,
-			 24, spaces,
+			 vector < ARRAY_SIZE(softirq_name) ? softirq_name[vector] : "UNKNOWN",
+			 vector, 24, spaces,
 			 ns_to_usf(duration));
 	return 0;
 }
-- 
2.53.0

^ permalink raw reply related

* [PATCH v4 07/18] rtla: Add strscpy() and replace strncpy() calls
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Introduce a userspace strscpy() implementation that matches the Linux
kernel's strscpy() semantics. The function is built on top of glibc's
strlcpy() and provides guaranteed NUL-termination along with proper
truncation detection through its return value.

The previous strncpy() calls had potential issues: strncpy() does not
guarantee NUL-termination when the source string length equals or
exceeds the destination buffer size. This required defensive patterns
like pre-zeroing buffers or manually setting the last byte to NUL.
The new strscpy() function always NUL-terminates the destination buffer
unless the size is zero, and returns -E2BIG on truncation, making error
handling cleaner and more consistent with kernel code.

Note that unlike the kernel's strscpy(), this implementation uses
strlcpy() internally, which reads the entire source string to determine
its length. The kernel avoids this to prevent potential DoS attacks from
extremely long untrusted strings. This is harmless for a userspace CLI
tool like rtla where input sources are bounded and trusted.

Replace all strncpy() calls in rtla with strscpy(), using sizeof() for
buffer sizes instead of magic constants to ensure the sizes stay in
sync with the actual buffer declarations. Also remove a now-redundant
memset() call that was previously needed to work around strncpy()
behavior.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/timerlat_aa.c |  6 ++---
 tools/tracing/rtla/src/utils.c       | 34 ++++++++++++++++++++++++++--
 tools/tracing/rtla/src/utils.h       |  1 +
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/tools/tracing/rtla/src/timerlat_aa.c b/tools/tracing/rtla/src/timerlat_aa.c
index 31e66ea2b144c..30ef56d644f9c 100644
--- a/tools/tracing/rtla/src/timerlat_aa.c
+++ b/tools/tracing/rtla/src/timerlat_aa.c
@@ -455,9 +455,9 @@ static int timerlat_aa_thread_handler(struct trace_seq *s, struct tep_record *re
 		taa_data->thread_blocking_duration = duration;
 
 		if (comm)
-			strncpy(taa_data->run_thread_comm, comm, MAX_COMM);
+			strscpy(taa_data->run_thread_comm, comm, sizeof(taa_data->run_thread_comm));
 		else
-			sprintf(taa_data->run_thread_comm, "<...>");
+			strscpy(taa_data->run_thread_comm, "<...>", sizeof(taa_data->run_thread_comm));
 
 	} else {
 		taa_data->thread_thread_sum += duration;
@@ -519,7 +519,7 @@ static int timerlat_aa_sched_switch_handler(struct trace_seq *s, struct tep_reco
 	tep_get_field_val(s, event, "next_pid", record, &taa_data->current_pid, 1);
 	comm = tep_get_field_raw(s, event, "next_comm", record, &val, 1);
 
-	strncpy(taa_data->current_comm, comm, MAX_COMM);
+	strscpy(taa_data->current_comm, comm, sizeof(taa_data->current_comm));
 
 	/*
 	 * If this was a kworker, clean the last kworkers that ran.
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index b5a6007b108d2..e98288e55db15 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -722,8 +722,7 @@ static const int find_mount(const char *fs, char *mp, int sizeof_mp)
 	if (!found)
 		return 0;
 
-	memset(mp, 0, sizeof_mp);
-	strncpy(mp, mount_point, sizeof_mp - 1);
+	strscpy(mp, mount_point, sizeof_mp);
 
 	debug_msg("Fs %s found at %s\n", fs, mp);
 	return 1;
@@ -1036,6 +1035,37 @@ int strtoi(const char *s, int *res)
 	return 0;
 }
 
+/**
+ * strscpy - Copy a C-string into a sized buffer
+ * @dst: Where to copy the string to
+ * @src: Where to copy the string from
+ * @count: Size of destination buffer
+ *
+ * Copy the source string @src, or as much of it as fits, into the destination
+ * @dst buffer. The destination @dst buffer is always NUL-terminated, unless
+ * it's zero-sized.
+ *
+ * This is a userspace implementation matching the kernel's strscpy() semantics,
+ * built on top of glibc's strlcpy().
+ *
+ * Returns the number of characters copied (not including the trailing NUL)
+ * or -E2BIG if @count is 0 or the copy was truncated.
+ */
+ssize_t strscpy(char *dst, const char *src, size_t count)
+{
+	size_t len;
+
+	if (count == 0)
+		return -E2BIG;
+
+	len = strlcpy(dst, src, count);
+
+	if (len >= count)
+		return -E2BIG;
+
+	return (ssize_t) len;
+}
+
 static inline void fatal_alloc(void)
 {
 	fatal("Error allocating memory\n");
diff --git a/tools/tracing/rtla/src/utils.h b/tools/tracing/rtla/src/utils.h
index 11458676f607d..3bca53ee48db6 100644
--- a/tools/tracing/rtla/src/utils.h
+++ b/tools/tracing/rtla/src/utils.h
@@ -97,6 +97,7 @@ static inline int have_libcpupower_support(void) { return 0; }
 #endif /* HAVE_LIBCPUPOWER_SUPPORT */
 int auto_house_keeping(cpu_set_t *monitored_cpus);
 __attribute__((__warn_unused_result__)) int strtoi(const char *s, int *res);
+ssize_t strscpy(char *dst, const char *src, size_t count);
 
 #define ns_to_usf(x) (((double)x/1000))
 #define ns_to_per(total, part) ((part * 100) / (double)total)
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 06/18] rtla: Simplify code by caching string lengths
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Simplify trace_event_save_hist() and set_comm_cgroup() by computing
string lengths once and storing them in local variables, rather than
calling strlen() multiple times on the same unchanged strings. This
makes the code clearer by eliminating redundant function calls and
improving readability.

In trace_event_save_hist(), the write loop previously called strlen()
on the hist buffer twice per iteration for both the size calculation
and loop condition. Store the length in hist_len before entering the
loop. In set_comm_cgroup(), strlen() was called on cgroup_path up to
three times in succession. Store the result in cg_path_len to use in
both the offset calculation and size parameter for subsequent append
operations.

This simplification makes the code easier to read and maintain without
changing program behavior.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/trace.c |  6 ++++--
 tools/tracing/rtla/src/utils.c | 11 +++++++----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index e1af54f9531b8..2f529aaf8deef 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -346,6 +346,7 @@ static void trace_event_save_hist(struct trace_instance *instance,
 	mode_t mode = 0644;
 	char path[MAX_PATH];
 	char *hist;
+	size_t hist_len;
 
 	if (!tevent)
 		return;
@@ -376,9 +377,10 @@ static void trace_event_save_hist(struct trace_instance *instance,
 	}
 
 	index = 0;
+	hist_len = strlen(hist);
 	do {
-		index += write(out_fd, &hist[index], strlen(hist) - index);
-	} while (index < strlen(hist));
+		index += write(out_fd, &hist[index], hist_len - index);
+	} while (index < hist_len);
 
 	free(hist);
 out_close:
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 75cdcc63d5a15..b5a6007b108d2 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -811,6 +811,7 @@ static int open_cgroup_procs(const char *cgroup)
 	char cgroup_procs[MAX_PATH];
 	int retval;
 	int cg_fd;
+	size_t cg_path_len;
 
 	retval = find_mount("cgroup2", cgroup_path, sizeof(cgroup_path));
 	if (!retval) {
@@ -818,16 +819,18 @@ static int open_cgroup_procs(const char *cgroup)
 		return -1;
 	}
 
+	cg_path_len = strlen(cgroup_path);
+
 	if (!cgroup) {
-		retval = get_self_cgroup(&cgroup_path[strlen(cgroup_path)],
-				sizeof(cgroup_path) - strlen(cgroup_path));
+		retval = get_self_cgroup(&cgroup_path[cg_path_len],
+				sizeof(cgroup_path) - cg_path_len);
 		if (!retval) {
 			err_msg("Did not find self cgroup\n");
 			return -1;
 		}
 	} else {
-		snprintf(&cgroup_path[strlen(cgroup_path)],
-				sizeof(cgroup_path) - strlen(cgroup_path), "%s/", cgroup);
+		snprintf(&cgroup_path[cg_path_len],
+				sizeof(cgroup_path) - cg_path_len, "%s/", cgroup);
 	}
 
 	snprintf(cgroup_procs, MAX_PATH, "%s/cgroup.procs", cgroup_path);
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 05/18] rtla: Replace magic number with MAX_PATH
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

The trace functions use a buffer to manipulate strings that will be
written to tracefs files. These buffers are defined with a magic number
of 1024, which is a common source of vulnerabilities.

Replace the magic number 1024 with the MAX_PATH macro to make the code
safer and more readable. While at it, replace other instances of the
magic number with ARRAY_SIZE() when the buffer is locally defined.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/osnoise.c    |  2 +-
 tools/tracing/rtla/src/timerlat_u.c |  4 ++--
 tools/tracing/rtla/src/trace.c      | 20 ++++++++++----------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/tools/tracing/rtla/src/osnoise.c b/tools/tracing/rtla/src/osnoise.c
index ec074cd53dd84..9d1fd9981fe91 100644
--- a/tools/tracing/rtla/src/osnoise.c
+++ b/tools/tracing/rtla/src/osnoise.c
@@ -62,7 +62,7 @@ int osnoise_set_cpus(struct osnoise_context *context, char *cpus)
 	if (!context->curr_cpus)
 		return -1;
 
-	snprintf(buffer, 1024, "%s\n", cpus);
+	snprintf(buffer, ARRAY_SIZE(buffer), "%s\n", cpus);
 
 	debug_msg("setting cpus to %s from %s", cpus, context->orig_cpus);
 
diff --git a/tools/tracing/rtla/src/timerlat_u.c b/tools/tracing/rtla/src/timerlat_u.c
index ce68e39d25fde..efe2f72686486 100644
--- a/tools/tracing/rtla/src/timerlat_u.c
+++ b/tools/tracing/rtla/src/timerlat_u.c
@@ -32,7 +32,7 @@
 static int timerlat_u_main(int cpu, struct timerlat_u_params *params)
 {
 	struct sched_param sp = { .sched_priority = 95 };
-	char buffer[1024];
+	char buffer[MAX_PATH];
 	int timerlat_fd;
 	cpu_set_t set;
 	int retval;
@@ -83,7 +83,7 @@ static int timerlat_u_main(int cpu, struct timerlat_u_params *params)
 
 	/* add should continue with a signal handler */
 	while (true) {
-		retval = read(timerlat_fd, buffer, 1024);
+		retval = read(timerlat_fd, buffer, ARRAY_SIZE(buffer));
 		if (retval < 0)
 			break;
 	}
diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index 211ca54b15b0e..e1af54f9531b8 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -313,7 +313,7 @@ void trace_event_add_trigger(struct trace_events *event, char *trigger)
 static void trace_event_disable_filter(struct trace_instance *instance,
 				       struct trace_events *tevent)
 {
-	char filter[1024];
+	char filter[MAX_PATH];
 	int retval;
 
 	if (!tevent->filter)
@@ -325,7 +325,7 @@ static void trace_event_disable_filter(struct trace_instance *instance,
 	debug_msg("Disabling %s:%s filter %s\n", tevent->system,
 		  tevent->event ? : "*", tevent->filter);
 
-	snprintf(filter, 1024, "!%s\n", tevent->filter);
+	snprintf(filter, ARRAY_SIZE(filter), "!%s\n", tevent->filter);
 
 	retval = tracefs_event_file_write(instance->inst, tevent->system,
 					  tevent->event, "filter", filter);
@@ -344,7 +344,7 @@ static void trace_event_save_hist(struct trace_instance *instance,
 {
 	int retval, index, out_fd;
 	mode_t mode = 0644;
-	char path[1024];
+	char path[MAX_PATH];
 	char *hist;
 
 	if (!tevent)
@@ -359,7 +359,7 @@ static void trace_event_save_hist(struct trace_instance *instance,
 	if (retval)
 		return;
 
-	snprintf(path, 1024, "%s_%s_hist.txt", tevent->system, tevent->event);
+	snprintf(path, ARRAY_SIZE(path), "%s_%s_hist.txt", tevent->system, tevent->event);
 
 	printf("  Saving event %s:%s hist to %s\n", tevent->system, tevent->event, path);
 
@@ -391,7 +391,7 @@ static void trace_event_save_hist(struct trace_instance *instance,
 static void trace_event_disable_trigger(struct trace_instance *instance,
 					struct trace_events *tevent)
 {
-	char trigger[1024];
+	char trigger[MAX_PATH];
 	int retval;
 
 	if (!tevent->trigger)
@@ -405,7 +405,7 @@ static void trace_event_disable_trigger(struct trace_instance *instance,
 
 	trace_event_save_hist(instance, tevent);
 
-	snprintf(trigger, 1024, "!%s\n", tevent->trigger);
+	snprintf(trigger, ARRAY_SIZE(trigger), "!%s\n", tevent->trigger);
 
 	retval = tracefs_event_file_write(instance->inst, tevent->system,
 					  tevent->event, "trigger", trigger);
@@ -444,7 +444,7 @@ void trace_events_disable(struct trace_instance *instance,
 static int trace_event_enable_filter(struct trace_instance *instance,
 				     struct trace_events *tevent)
 {
-	char filter[1024];
+	char filter[MAX_PATH];
 	int retval;
 
 	if (!tevent->filter)
@@ -456,7 +456,7 @@ static int trace_event_enable_filter(struct trace_instance *instance,
 		return 1;
 	}
 
-	snprintf(filter, 1024, "%s\n", tevent->filter);
+	snprintf(filter, ARRAY_SIZE(filter), "%s\n", tevent->filter);
 
 	debug_msg("Enabling %s:%s filter %s\n", tevent->system,
 		  tevent->event ? : "*", tevent->filter);
@@ -479,7 +479,7 @@ static int trace_event_enable_filter(struct trace_instance *instance,
 static int trace_event_enable_trigger(struct trace_instance *instance,
 				      struct trace_events *tevent)
 {
-	char trigger[1024];
+	char trigger[MAX_PATH];
 	int retval;
 
 	if (!tevent->trigger)
@@ -491,7 +491,7 @@ static int trace_event_enable_trigger(struct trace_instance *instance,
 		return 1;
 	}
 
-	snprintf(trigger, 1024, "%s\n", tevent->trigger);
+	snprintf(trigger, ARRAY_SIZE(trigger), "%s\n", tevent->trigger);
 
 	debug_msg("Enabling %s:%s trigger %s\n", tevent->system,
 		  tevent->event ? : "*", tevent->trigger);
-- 
2.53.0


^ permalink raw reply related

* [PATCH v4 04/18] rtla: Introduce common_threshold_handler() helper
From: Wander Lairson Costa @ 2026-03-09 19:46 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar, Wander Lairson Costa, Ivan Pravdin,
	Crystal Wood, Costa Shulyupin, John Kacur, Tiezhu Yang,
	Daniel Bristot de Oliveira, Daniel Wagner,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:Real-time Linux Analysis (RTLA) tools,
	open list:BPF [MISC]:Keyword:(?:b|_)bpf(?:b|_)
In-Reply-To: <20260309195040.1019085-1-wander@redhat.com>

Several functions duplicate the logic for handling threshold actions.
When a threshold is reached, these functions stop the trace, perform
configured actions, and restart the trace if --on-threshold continue
is set.

Create common_threshold_handler() to centralize this shared logic and
avoid code duplication. The function executes the configured threshold
actions and restarts the necessary trace instances when appropriate.

Also add should_continue_tracing() helper to encapsulate the check
for whether tracing should continue after a threshold event, improving
code readability at call sites.

In timerlat_top_bpf_main_loop(), use common_params directly instead
of casting through timerlat_params when only common fields are needed.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 tools/tracing/rtla/src/common.c        | 61 ++++++++++++++++++--------
 tools/tracing/rtla/src/common.h        | 18 ++++++++
 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++----
 tools/tracing/rtla/src/timerlat_top.c  | 32 +++++++-------
 4 files changed, 86 insertions(+), 44 deletions(-)

diff --git a/tools/tracing/rtla/src/common.c b/tools/tracing/rtla/src/common.c
index ceff76a62a30b..cbc207fa58707 100644
--- a/tools/tracing/rtla/src/common.c
+++ b/tools/tracing/rtla/src/common.c
@@ -175,6 +175,38 @@ common_apply_config(struct osnoise_tool *tool, struct common_params *params)
 }
 
 
+/**
+ * common_threshold_handler - handle latency threshold overflow
+ * @tool: pointer to the osnoise_tool instance containing trace contexts
+ *
+ * Executes the configured threshold actions (e.g., saving trace, printing,
+ * sending signals). If the continue flag is set (--on-threshold continue),
+ * restarts the auxiliary trace instances to continue monitoring.
+ *
+ * Return: 0 for success, -1 for error.
+ */
+int
+common_threshold_handler(const struct osnoise_tool *tool)
+{
+	actions_perform(&tool->params->threshold_actions);
+
+	if (!should_continue_tracing(tool->params))
+		/* continue flag not set, break */
+		return 0;
+
+	/* continue action reached, re-enable tracing */
+	if (tool->record && trace_instance_start(&tool->record->trace))
+		goto err;
+	if (tool->aa && trace_instance_start(&tool->aa->trace))
+		goto err;
+
+	return 0;
+
+err:
+	err_msg("Error restarting trace\n");
+	return -1;
+}
+
 int run_tool(struct tool_ops *ops, int argc, char *argv[])
 {
 	struct common_params *params;
@@ -352,17 +384,14 @@ int top_main_loop(struct osnoise_tool *tool)
 				/* stop tracing requested, do not perform actions */
 				return 0;
 
-			actions_perform(&params->threshold_actions);
+			retval = common_threshold_handler(tool);
+			if (retval)
+				return retval;
+
 
-			if (!params->threshold_actions.continue_flag)
-				/* continue flag not set, break */
+			if (!should_continue_tracing(params))
 				return 0;
 
-			/* continue action reached, re-enable tracing */
-			if (record)
-				trace_instance_start(&record->trace);
-			if (tool->aa)
-				trace_instance_start(&tool->aa->trace);
 			trace_instance_start(trace);
 		}
 
@@ -403,18 +432,14 @@ int hist_main_loop(struct osnoise_tool *tool)
 				/* stop tracing requested, do not perform actions */
 				break;
 
-			actions_perform(&params->threshold_actions);
+			retval = common_threshold_handler(tool);
+			if (retval)
+				return retval;
 
-			if (!params->threshold_actions.continue_flag)
-				/* continue flag not set, break */
-				break;
+			if (!should_continue_tracing(params))
+				return 0;
 
-			/* continue action reached, re-enable tracing */
-			if (tool->record)
-				trace_instance_start(&tool->record->trace);
-			if (tool->aa)
-				trace_instance_start(&tool->aa->trace);
-			trace_instance_start(&tool->trace);
+			trace_instance_start(trace);
 		}
 
 		/* is there still any user-threads ? */
diff --git a/tools/tracing/rtla/src/common.h b/tools/tracing/rtla/src/common.h
index 7602c5593ef5d..c548decd3c40f 100644
--- a/tools/tracing/rtla/src/common.h
+++ b/tools/tracing/rtla/src/common.h
@@ -143,6 +143,24 @@ struct tool_ops {
 	void (*free)(struct osnoise_tool *tool);
 };
 
+/**
+ * should_continue_tracing - check if tracing should continue after threshold
+ * @params: pointer to the common parameters structure
+ *
+ * Returns true if the continue action was configured (--on-threshold continue),
+ * indicating that tracing should be restarted after handling the threshold event.
+ *
+ * Return: 1 if tracing should continue, 0 otherwise.
+ */
+static inline int
+should_continue_tracing(const struct common_params *params)
+{
+	return params->threshold_actions.continue_flag;
+}
+
+int
+common_threshold_handler(const struct osnoise_tool *tool);
+
 int osnoise_set_cpus(struct osnoise_context *context, char *cpus);
 void osnoise_restore_cpus(struct osnoise_context *context);
 
diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index 6ea397421f1c9..6b8eaef8a3a09 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -17,6 +17,7 @@
 #include "timerlat.h"
 #include "timerlat_aa.h"
 #include "timerlat_bpf.h"
+#include "common.h"
 
 struct timerlat_hist_cpu {
 	int			*irq;
@@ -1048,7 +1049,6 @@ static struct osnoise_tool
 
 static int timerlat_hist_bpf_main_loop(struct osnoise_tool *tool)
 {
-	struct timerlat_params *params = to_timerlat_params(tool->params);
 	int retval;
 
 	while (!stop_tracing) {
@@ -1056,18 +1056,17 @@ static int timerlat_hist_bpf_main_loop(struct osnoise_tool *tool)
 
 		if (!stop_tracing) {
 			/* Threshold overflow, perform actions on threshold */
-			actions_perform(&params->common.threshold_actions);
+			retval = common_threshold_handler(tool);
+			if (retval)
+				return retval;
 
-			if (!params->common.threshold_actions.continue_flag)
-				/* continue flag not set, break */
+			if (!should_continue_tracing(tool->params))
 				break;
 
-			/* continue action reached, re-enable tracing */
-			if (tool->record)
-				trace_instance_start(&tool->record->trace);
-			if (tool->aa)
-				trace_instance_start(&tool->aa->trace);
-			timerlat_bpf_restart_tracing();
+			if (timerlat_bpf_restart_tracing()) {
+				err_msg("Error restarting BPF trace\n");
+				return -1;
+			}
 		}
 	}
 	timerlat_bpf_detach();
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index dd727cb48b551..c6f6757c3fb6b 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -17,6 +17,7 @@
 #include "timerlat.h"
 #include "timerlat_aa.h"
 #include "timerlat_bpf.h"
+#include "common.h"
 
 struct timerlat_top_cpu {
 	unsigned long long	irq_count;
@@ -801,10 +802,10 @@ static struct osnoise_tool
 static int
 timerlat_top_bpf_main_loop(struct osnoise_tool *tool)
 {
-	struct timerlat_params *params = to_timerlat_params(tool->params);
+	const struct common_params *params = tool->params;
 	int retval, wait_retval;
 
-	if (params->common.aa_only) {
+	if (params->aa_only) {
 		/* Auto-analysis only, just wait for stop tracing */
 		timerlat_bpf_wait(-1);
 		return 0;
@@ -812,8 +813,8 @@ timerlat_top_bpf_main_loop(struct osnoise_tool *tool)
 
 	/* Pull and display data in a loop */
 	while (!stop_tracing) {
-		wait_retval = timerlat_bpf_wait(params->common.quiet ? -1 :
-						params->common.sleep_time);
+		wait_retval = timerlat_bpf_wait(params->quiet ? -1 :
+						params->sleep_time);
 
 		retval = timerlat_top_bpf_pull_data(tool);
 		if (retval) {
@@ -821,28 +822,27 @@ timerlat_top_bpf_main_loop(struct osnoise_tool *tool)
 			return retval;
 		}
 
-		if (!params->common.quiet)
+		if (!params->quiet)
 			timerlat_print_stats(tool);
 
 		if (wait_retval != 0) {
 			/* Stopping requested by tracer */
-			actions_perform(&params->common.threshold_actions);
+			retval = common_threshold_handler(tool);
+			if (retval)
+				return retval;
 
-			if (!params->common.threshold_actions.continue_flag)
-				/* continue flag not set, break */
+			if (!should_continue_tracing(tool->params))
 				break;
 
-			/* continue action reached, re-enable tracing */
-			if (tool->record)
-				trace_instance_start(&tool->record->trace);
-			if (tool->aa)
-				trace_instance_start(&tool->aa->trace);
-			timerlat_bpf_restart_tracing();
+			if (timerlat_bpf_restart_tracing()) {
+				err_msg("Error restarting BPF trace\n");
+				return -1;
+			}
 		}
 
 		/* is there still any user-threads ? */
-		if (params->common.user_workload) {
-			if (params->common.user.stopped_running) {
+		if (params->user_workload) {
+			if (params->user.stopped_running) {
 				debug_msg("timerlat user space threads stopped!\n");
 				break;
 			}
-- 
2.53.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox