public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] tracing: Read user data from futex system call trace event
@ 2026-03-31 18:13 Steven Rostedt
  2026-03-31 18:13 ` [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data Steven Rostedt
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-31 18:13 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
	Ian Rogers, Suleiman Souhlal

We are looking at the performance of futexes and require a bit more
information when tracing them.

The two patches here extend the system call reading of user space to
create specific handling of the futex system call. It now reads the
user space relevant data (the addr, utime and addr2), as well as
parses the flags. This adds a little smarts to the trace event as
it only shows the parameters that are relevant, as well as parses
utime as either a timespec or as val2 depending on the futex_op.

Here's an example of the new output:

 sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e834 (0x4a7) tid: 1191, FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e834 (0) tid: 0, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAIT|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (4aa), val3: 0)
 sys_futex(uaddr: 0x56196292e834 (0x4aa) tid: 1194, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
 sys_futex(uaddr: 0x7f7ed6b29990 (0x4ab), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME)
 sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
 sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
 sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)

Changes since v2: https://lore.kernel.org/all/20260310200954.285663884@kernel.org/

- Removed unused "buf" variable (kernel test robot)

- Iterate __futex_cmds[] make the print statement.
  Note this required exposing __futex_cmds[] to trace_syscall.c
  (Masami Hiramatsu)

- Added back val statement (with the move to futex/syscall.c the
  third parameter was dropped).

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
futex/core

Head SHA1: 79b0609ad15b24d0bbfb1790e17902a6c210ae69


Steven Rostedt (3):
      tracing: Have futex syscall trace event show specific user data
      tracing: Update futex syscall trace event to show more commands
      tracing: Show TID and flags for PI futex system call trace event

----
 include/linux/futex.h         |  39 ++++++-
 kernel/futex/syscalls.c       | 137 +++++++++++++++++++++---
 kernel/trace/trace_syscalls.c | 237 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 396 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data
  2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
@ 2026-03-31 18:13 ` Steven Rostedt
  2026-04-01 18:07   ` Ian Rogers
  2026-03-31 18:13 ` [PATCH v3 2/3] tracing: Update futex syscall trace event to show more commands Steven Rostedt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2026-03-31 18:13 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
	Ian Rogers, Suleiman Souhlal

From: Steven Rostedt <rostedt@goodmis.org>

Add specific reporting of the futex system call. This allows for debugging
the futex code a bit easier. Instead of just showing the values passed
into the futex system call, read the value of the user space memory
pointed to by the addr parameter.

Also make the op parameter more readable by parsing the values to show
what the command is:

 futex_requeue_p-3251    [002] .....  2101.068479: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
 futex_requeue_p-3248    [001] .....  2101.068970: sys_futex(uaddr: 0x7f859072f990 (0xcb2), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val: 3250)
 futex_requeue_p-3252    [005] .....  2101.069108: sys_futex(uaddr: 0x55e79a4da838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7ffe61076aa0, uaddr2: 0x55e79a4da834, uaddr2: 94453214586932, val3: 0)
 futex_requeue_p-3252    [005] .....  2101.069410: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v2: https://patch.msgid.link/20260310201036.542627924@kernel.org

- Removed unused "buf" variable (kernel test robot)

- Iterate __futex_cmds[] make the print statement.
  Note this required exposing __futex_cmds[] to trace_syscall.c

- Added back val statement (with the move to futex/syscall.c the
  third parameter was dropped).

 include/linux/futex.h         |   8 +-
 kernel/futex/syscalls.c       |  97 +++++++++++++++++++++++
 kernel/trace/trace_syscalls.c | 144 +++++++++++++++++++++++++++++++++-
 3 files changed, 245 insertions(+), 4 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9e9750f04980..7eaf01ff68cf 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -82,6 +82,11 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
 int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr);
+extern const char * __futex_cmds[];
+#endif
+
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 int futex_hash_allocate_default(void);
 void futex_hash_free(struct mm_struct *mm);
@@ -114,7 +119,8 @@ static inline int futex_hash_allocate_default(void)
 }
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
 static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-
+static inline void futex_print_syscall(struct seq_buf *s, int nr_args,
+				       unsigned long *args, u32 *ptr) { }
 #endif
 
 #endif
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 743c7a728237..3ba8ca017c9c 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -171,6 +171,18 @@ static __always_inline bool futex_cmd_has_timeout(u32 cmd)
 	return false;
 }
 
+static __always_inline bool futex_cmd_has_addr2(u32 cmd)
+{
+	switch (cmd) {
+	case FUTEX_REQUEUE:
+	case FUTEX_CMP_REQUEUE:
+	case FUTEX_WAKE_OP:
+	case FUTEX_WAIT_REQUEUE_PI:
+		return true;
+	}
+	return false;
+}
+
 static __always_inline int
 futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t)
 {
@@ -207,6 +219,91 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+
+/* End with NULL for iterators */
+const char * __futex_cmds[] =
+{
+	"FUTEX_WAIT", "FUTEX_WAKE", "FUTEX_FD", "FUTEX_REQUEUE",
+	"FUTEX_CMP_REQUEUE", "FUTEX_WAKE_OP", "FUTEX_LOCK_PI",
+	"FUTEX_UNLOCK_PI", "FUTEX_TRYLOCK_PI", "FUTEX_WAIT_BITSET",
+	"FUTEX_WAKE_BITSET", "FUTEX_WAIT_REQUEUE_PI", "FUTEX_CMP_REQUEUE_PI",
+	"FUTEX_LOCK_PI2", NULL
+};
+
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr)
+{
+	unsigned int op, cmd;
+	bool done = false;
+
+	for (int i = 0; !done && i < nr_args; i++) {
+
+		if (seq_buf_has_overflowed(s))
+			break;
+
+		switch (i) {
+		case 0:
+			seq_buf_printf(s, "uaddr: 0x%lx", args[i]);
+			if (ptr) {
+				u32 val = *ptr;
+				if (val < 10)
+					seq_buf_printf(s, " (%u)", val);
+				else
+					seq_buf_printf(s, " (0x%x)", val);
+			}
+			continue;
+		case 1:
+			op = args[i];
+			cmd = op & FUTEX_CMD_MASK;
+			if (cmd <= FUTEX_LOCK_PI2)
+				seq_buf_printf(s, ", %s", __futex_cmds[cmd]);
+			else
+				seq_buf_puts(s, ", UNKNOWN");
+
+			if (op & FUTEX_PRIVATE_FLAG)
+				seq_buf_puts(s, "|FUTEX_PRIVATE_FLAG");
+			if (op & FUTEX_CLOCK_REALTIME)
+				seq_buf_puts(s, "|FUTEX_CLOCK_REALTIME");
+			continue;
+		case 2:
+			if (args[i] < 10)
+				seq_buf_printf(s, ", val: %ld", args[i]);
+			else
+				seq_buf_printf(s, ", val: 0x%lx", args[i]);
+			continue;
+		case 3:
+			if (!futex_cmd_has_timeout(cmd)) {
+
+				if (!futex_cmd_has_addr2(cmd)) {
+					done = true;
+					continue;
+				}
+
+				seq_buf_printf(s, ", val2: 0x%x", (u32)(long)args[i]);
+				continue;
+			}
+
+			if (!args[i])
+				continue;
+
+			seq_buf_printf(s, ", timespec: 0x%lx", args[i]);
+			continue;
+		case 4:
+			if (!futex_cmd_has_addr2(cmd)) {
+				done = true;
+				continue;
+			}
+			seq_buf_printf(s, ", uaddr2: 0x%lx", args[i]);
+			continue;
+		case 5:
+			seq_buf_printf(s, ", val3: %lu", args[i]);
+			done = true;
+			continue;
+		}
+	}
+}
+#endif
+
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
  * @futexv:	Kernel side list of waiters to be filled
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 37317b81fcda..f8213d772f89 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -6,11 +6,13 @@
 #include <linux/slab.h>
 #include <linux/kernel.h>
 #include <linux/module.h>	/* for MODULE_NAME_LEN via KSYM_SYMBOL_LEN */
+#include <linux/futex.h>
 #include <linux/ftrace.h>
 #include <linux/perf_event.h>
 #include <linux/xarray.h>
 #include <asm/syscall.h>
 
+
 #include "trace_output.h"
 #include "trace.h"
 
@@ -237,6 +239,27 @@ sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadat
 	return trace_handle_return(s);
 }
 
+static enum print_line_t
+sys_enter_futex_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry,
+		      struct trace_seq *s, struct trace_event *event, int ent_size)
+{
+	void *end = (void *)trace + ent_size;
+	void *ptr;
+
+	/* Set ptr to the user space copied area */
+	ptr = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
+	if (ptr + 4 > end)
+		ptr = NULL;
+
+	trace_seq_printf(s, "%s(", entry->name);
+
+	futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr);
+
+	trace_seq_puts(s, ")\n");
+
+	return trace_handle_return(s);
+}
+
 static enum print_line_t
 print_syscall_enter(struct trace_iterator *iter, int flags,
 		    struct trace_event *event)
@@ -267,6 +290,10 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 		if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
 			return sys_enter_openat_print(trace, entry, s, event);
 		break;
+	case __NR_futex:
+		if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
+			return sys_enter_futex_print(trace, entry, s, event, iter->ent_size);
+		break;
 	default:
 		break;
 	}
@@ -437,6 +464,48 @@ sys_enter_openat_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 	return pos;
 }
 
+static int __init
+sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
+{
+	int pos = 0;
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"\"uaddr: 0x%%lx (0x%%lx) cmd=%%s%%s%%s");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"  val: 0x%%x timeout/val2: 0x%%llx");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" uaddr2: 0x%%lx val3: 0x%%x\", ");
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->uaddr,");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->__value,");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"  __print_symbolic(REC->op & 0x%x, ", FUTEX_CMD_MASK);
+
+	for (int i = 0; __futex_cmds[i]; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO,
+				"%s{%d, \"%s\"} ",
+				i ? "," : "", i, __futex_cmds[i]);
+	}
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "), ");
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" (REC->op & %d) ? \"|FUTEX_PRIVATE_FLAG\" : \"\",",
+			FUTEX_PRIVATE_FLAG);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" (REC->op & %d) ? \"|FUTEX_CLOCK_REALTIME\" : \"\",",
+			FUTEX_CLOCK_REALTIME);
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->val, REC->utime,");
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->uaddr, REC->val3");
+	return pos;
+}
+
 static int __init
 __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 {
@@ -447,6 +516,8 @@ __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 	switch (entry->syscall_nr) {
 	case __NR_openat:
 		return sys_enter_openat_print_fmt(entry, buf, len);
+	case __NR_futex:
+		return sys_enter_futex_print_fmt(entry, buf, len);
 	default:
 		break;
 	}
@@ -523,6 +594,21 @@ static void __init free_syscall_print_fmt(struct trace_event_call *call)
 		kfree(call->print_fmt);
 }
 
+static int __init futex_fields(struct trace_event_call *call, int offset)
+{
+	char *arg;
+	int ret;
+
+	arg = kstrdup("__value", GFP_KERNEL);
+	if (WARN_ON_ONCE(!arg))
+		return -ENOMEM;
+	ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
+				 FILTER_OTHER);
+	if (ret)
+		kfree(arg);
+	return ret;
+}
+
 static int __init syscall_enter_define_fields(struct trace_event_call *call)
 {
 	struct syscall_trace_enter trace;
@@ -544,6 +630,9 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 		offset += sizeof(unsigned long);
 	}
 
+	if (!ret && meta->syscall_nr == __NR_futex)
+		return futex_fields(call, offset);
+
 	if (ret || !meta->user_mask)
 		return ret;
 
@@ -689,6 +778,45 @@ static int syscall_copy_user_array(char *buf, const char __user *ptr,
 	return 0;
 }
 
+static int
+syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
+{
+	struct syscall_user_buffer *sbuf;
+	const char __user *ptr;
+
+	/* buf_size of zero means user doesn't want user space read */
+	if (!buf_size)
+		return -1;
+
+	/* If the syscall_buffer is NULL, tracing is being shutdown */
+	sbuf = READ_ONCE(syscall_buffer);
+	if (!sbuf)
+		return -1;
+
+	ptr = (char __user *)args[0];
+
+	*buffer = trace_user_fault_read(&sbuf->buf, ptr, 4, NULL, NULL);
+	if (!*buffer)
+		return -1;
+
+	/* Add room for the value */
+	*size += 4;
+
+	return 0;
+}
+
+static void syscall_put_futex(struct syscall_metadata *sys_data,
+			      struct syscall_trace_enter *entry,
+			      char *buffer)
+{
+	u32 *ptr;
+
+	/* Place the futex key into the storage */
+	ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+
+	*ptr = *(u32 *)buffer;
+}
+
 static char *sys_fault_user(unsigned int buf_size,
 			    struct syscall_metadata *sys_data,
 			    struct syscall_user_buffer *sbuf,
@@ -905,6 +1033,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 		if (syscall_get_data(sys_data, args, &user_ptr,
 				     &size, user_sizes, &uargs, tr->syscall_buf_sz) < 0)
 			return;
+	} else if (syscall_nr == __NR_futex) {
+		if (syscall_get_futex(args, &user_ptr, &size, tr->syscall_buf_sz) < 0)
+			return;
 	}
 
 	size += sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
@@ -921,6 +1052,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (mayfault)
 		syscall_put_data(sys_data, entry, user_ptr, size, user_sizes, uargs);
 
+	else if (syscall_nr == __NR_futex)
+		syscall_put_futex(sys_data, entry, user_ptr);
+
 	trace_event_buffer_commit(&fbuffer);
 }
 
@@ -971,14 +1105,18 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 {
 	struct syscall_metadata *sys_data = call->data;
 	struct trace_array *tr = file->tr;
+	bool enable_faults;
 	int ret = 0;
 	int num;
 
 	num = sys_data->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
+
+	enable_faults = sys_data->user_mask || num == __NR_futex;
+
 	guard(mutex)(&syscall_trace_lock);
-	if (sys_data->user_mask) {
+	if (enable_faults) {
 		ret = syscall_fault_buffer_enable();
 		if (ret < 0)
 			return ret;
@@ -986,7 +1124,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 	if (!tr->sys_refcount_enter) {
 		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
 		if (ret < 0) {
-			if (sys_data->user_mask)
+			if (enable_faults)
 				syscall_fault_buffer_disable();
 			return ret;
 		}
@@ -1011,7 +1149,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 	WRITE_ONCE(tr->enter_syscall_files[num], NULL);
 	if (!tr->sys_refcount_enter)
 		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
-	if (sys_data->user_mask)
+	if (sys_data->user_mask || num == __NR_futex)
 		syscall_fault_buffer_disable();
 }
 
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/3] tracing: Update futex syscall trace event to show more commands
  2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
  2026-03-31 18:13 ` [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data Steven Rostedt
@ 2026-03-31 18:13 ` Steven Rostedt
  2026-03-31 18:13 ` [PATCH v3 3/3] tracing: Show TID and flags for PI futex system call trace event Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-31 18:13 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
	Ian Rogers, Suleiman Souhlal

From: Steven Rostedt <rostedt@goodmis.org>

Make the futex syscall trace event a little more smart. Have it read the
futex_op instruction to determine what else it can save and print. For the
appropriate options, it will read the utime (timespec) parameter and show
its output as well as the uaddr2.

 futex_requeue_p-1154    [004] .....   144.568339: sys_futex(uaddr: 0x5652b178d834 (0x482), FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
 futex_requeue_p-1162    [002] .....   144.568696: sys_futex(uaddr: 0x7f763b7fece0 (2), FUTEX_WAIT|FUTEX_PRIVATE_FLAG, val: 2)
 futex_requeue_p-1151    [000] .....   144.568700: sys_futex(uaddr: 0x7f763b7fece0 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG, val: 1)
 futex_requeue_p-1162    [002] .....   144.568705: sys_futex(uaddr: 0x7f763b7fece0 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG, val: 1)
 futex_requeue_p-1151    [000] .....   144.568715: sys_futex(uaddr: 0x7f764369e990 (0x483), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val: 1155)
 futex_requeue_p-1155    [005] .....   144.569420: sys_futex(uaddr: 0x5652b178d838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7ffdacfba500 (143.890024054), uaddr2: 0x5652b178d834 (0), val3: 0)
 futex_requeue_p-1155    [005] .....   144.569454: sys_futex(uaddr: 0x5652b178d834 (0), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/futex.h         |  35 ++++++++-
 kernel/futex/syscalls.c       |  48 ++++++-------
 kernel/trace/trace_syscalls.c | 129 +++++++++++++++++++++++++++++-----
 3 files changed, 164 insertions(+), 48 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 7eaf01ff68cf..ec0d9cfa8a59 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -82,8 +82,35 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
 int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
 
+static __always_inline bool futex_cmd_has_timeout(u32 cmd)
+{
+	switch (cmd) {
+	case FUTEX_WAIT:
+	case FUTEX_LOCK_PI:
+	case FUTEX_LOCK_PI2:
+	case FUTEX_WAIT_BITSET:
+	case FUTEX_WAIT_REQUEUE_PI:
+		return true;
+	}
+	return false;
+}
+
+static __always_inline bool futex_cmd_has_addr2(u32 cmd)
+{
+	switch (cmd) {
+	case FUTEX_REQUEUE:
+	case FUTEX_CMP_REQUEUE:
+	case FUTEX_WAKE_OP:
+	case FUTEX_WAIT_REQUEUE_PI:
+		return true;
+	}
+	return false;
+}
+
 #ifdef CONFIG_FTRACE_SYSCALLS
-void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr);
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
+			 u32 *ptr1, u32 *ptr2, unsigned long *ts1,
+			 unsigned long *ts2);
 extern const char * __futex_cmds[];
 #endif
 
@@ -120,7 +147,11 @@ static inline int futex_hash_allocate_default(void)
 static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
 static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
 static inline void futex_print_syscall(struct seq_buf *s, int nr_args,
-				       unsigned long *args, u32 *ptr) { }
+				       unsigned long *args, u32 *ptr1,
+				       u32 *ptr2, unsigned long *ts1,
+				       unsigned long *ts2) { }
+static __always_inline bool futex_cmd_has_timeout(u32 cmd) { return false; }
+static __always_inline bool futex_cmd_has_addr2(u32 cmd) { return false; }
 #endif
 
 #endif
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 3ba8ca017c9c..265ec0a236d0 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -158,31 +158,6 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	return -ENOSYS;
 }
 
-static __always_inline bool futex_cmd_has_timeout(u32 cmd)
-{
-	switch (cmd) {
-	case FUTEX_WAIT:
-	case FUTEX_LOCK_PI:
-	case FUTEX_LOCK_PI2:
-	case FUTEX_WAIT_BITSET:
-	case FUTEX_WAIT_REQUEUE_PI:
-		return true;
-	}
-	return false;
-}
-
-static __always_inline bool futex_cmd_has_addr2(u32 cmd)
-{
-	switch (cmd) {
-	case FUTEX_REQUEUE:
-	case FUTEX_CMP_REQUEUE:
-	case FUTEX_WAKE_OP:
-	case FUTEX_WAIT_REQUEUE_PI:
-		return true;
-	}
-	return false;
-}
-
 static __always_inline int
 futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t)
 {
@@ -231,7 +206,9 @@ const char * __futex_cmds[] =
 	"FUTEX_LOCK_PI2", NULL
 };
 
-void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr)
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
+			 u32 *ptr1, u32 *ptr2, unsigned long *ts1,
+			 unsigned long *ts2)
 {
 	unsigned int op, cmd;
 	bool done = false;
@@ -244,8 +221,8 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
 		switch (i) {
 		case 0:
 			seq_buf_printf(s, "uaddr: 0x%lx", args[i]);
-			if (ptr) {
-				u32 val = *ptr;
+			if (ptr1) {
+				u32 val = *ptr1;
 				if (val < 10)
 					seq_buf_printf(s, " (%u)", val);
 				else
@@ -287,6 +264,15 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
 				continue;
 
 			seq_buf_printf(s, ", timespec: 0x%lx", args[i]);
+
+			if (!ts1 || !ts2)
+				continue;
+
+			if (!*ts1 && !*ts2) {
+				seq_buf_puts(s, " (0)");
+				continue;
+			}
+			seq_buf_printf(s, " (%lu.%09lu)", *ts1, *ts2);
 			continue;
 		case 4:
 			if (!futex_cmd_has_addr2(cmd)) {
@@ -294,6 +280,12 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
 				continue;
 			}
 			seq_buf_printf(s, ", uaddr2: 0x%lx", args[i]);
+
+			if (!ptr2)
+				continue;
+
+			seq_buf_printf(s, " (%x)", *ptr2);
+
 			continue;
 		case 5:
 			seq_buf_printf(s, ", val3: %lu", args[i]);
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index f8213d772f89..0c86986ec7c4 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -239,21 +239,35 @@ sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadat
 	return trace_handle_return(s);
 }
 
+struct futex_data {
+	u32		val1;
+	u32		val2;
+	unsigned long	ts1;
+	unsigned long	ts2;
+};
+
 static enum print_line_t
 sys_enter_futex_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry,
 		      struct trace_seq *s, struct trace_event *event, int ent_size)
 {
+	struct futex_data *data;
 	void *end = (void *)trace + ent_size;
-	void *ptr;
+	unsigned long *ts1 = NULL, *ts2 = NULL;
+	u32 *ptr1 = NULL, *ptr2 = NULL;
 
 	/* Set ptr to the user space copied area */
-	ptr = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
-	if (ptr + 4 > end)
-		ptr = NULL;
+	data = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
+	if ((void *)data + sizeof(*data) <= end) {
+		ptr1 = &data->val1;
+		ptr2 = &data->val2;
+		ts1 = &data->ts1;
+		ts2 = &data->ts2;
+	}
 
 	trace_seq_printf(s, "%s(", entry->name);
 
-	futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr);
+	futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr1, ptr2,
+			    ts1, ts2);
 
 	trace_seq_puts(s, ")\n");
 
@@ -472,9 +486,9 @@ sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
 			"\"uaddr: 0x%%lx (0x%%lx) cmd=%%s%%s%%s");
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
-			"  val: 0x%%x timeout/val2: 0x%%llx");
+			"  val: 0x%%x timeout/val2: 0x%%llx (%%lu.%%lu)");
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
-			" uaddr2: 0x%%lx val3: 0x%%x\", ");
+			" uaddr2: 0x%%lx (0x%%lx) val3: 0x%%x\", ");
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
 			" REC->uaddr,");
@@ -499,10 +513,12 @@ sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 			FUTEX_CLOCK_REALTIME);
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
-			" REC->val, REC->utime,");
+			" REC->val, REC->utime, REC->__ts1, REC->__ts2,");
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO,
-			" REC->uaddr, REC->val3");
+			" REC->uaddr,");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->__value2, REC->val3");
 	return pos;
 }
 
@@ -605,7 +621,39 @@ static int __init futex_fields(struct trace_event_call *call, int offset)
 	ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
 				 FILTER_OTHER);
 	if (ret)
-		kfree(arg);
+		goto free;
+	offset += sizeof(int);
+
+	arg = kstrdup("__value2", GFP_KERNEL);
+	if (WARN_ON_ONCE(!arg))
+		return -ENOMEM;
+	ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
+				 FILTER_OTHER);
+	if (ret)
+		goto free;
+	offset += sizeof(int);
+
+	arg = kstrdup("__ts1", GFP_KERNEL);
+	if (WARN_ON_ONCE(!arg))
+		return -ENOMEM;
+	ret = trace_define_field(call, "unsigned long", arg, offset,
+				 sizeof(unsigned long), 0, FILTER_OTHER);
+	if (ret)
+		goto free;
+	offset += sizeof(long);
+
+	arg = kstrdup("__ts2", GFP_KERNEL);
+	if (WARN_ON_ONCE(!arg))
+		return -ENOMEM;
+	ret = trace_define_field(call, "unsigned long", arg, offset,
+				 sizeof(unsigned long), 0, FILTER_OTHER);
+	if (ret)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(arg);
 	return ret;
 }
 
@@ -778,11 +826,51 @@ static int syscall_copy_user_array(char *buf, const char __user *ptr,
 	return 0;
 }
 
+struct tp_futex_data {
+	u32			cmd;
+	const u32		__user *val1;
+	const u32 		__user *val2;
+	void			__user *timeout;
+};
+
+static int syscall_copy_futex(char *buf, const char __user *ptr,
+			      size_t size, void *data)
+{
+	struct tp_futex_data *tp_data = data;
+	struct futex_data *fdata = (void *)buf;
+	int cmd = tp_data->cmd & FUTEX_CMD_MASK;
+	int ret;
+
+	memset(fdata, 0, sizeof(*fdata));
+
+	if (tp_data->val1) {
+		ret = __copy_from_user(&fdata->val1, tp_data->val1, 4);
+		if (ret)
+			return -1;
+	}
+
+	if (tp_data->val2 && futex_cmd_has_addr2(cmd)) {
+		ret = __copy_from_user(&fdata->val2, tp_data->val2, 4);
+		if (ret)
+			return -1;
+	}
+
+	if (tp_data->timeout && futex_cmd_has_timeout(cmd)) {
+		/* Copies both ts1 and ts2 */
+		ret = __copy_from_user(&fdata->ts1, tp_data->timeout,
+				       sizeof(long) * 2);
+		if (ret)
+			return -1;
+	}
+
+	return 0;
+}
+
 static int
 syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
 {
 	struct syscall_user_buffer *sbuf;
-	const char __user *ptr;
+	struct tp_futex_data tp_data;
 
 	/* buf_size of zero means user doesn't want user space read */
 	if (!buf_size)
@@ -793,14 +881,18 @@ syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
 	if (!sbuf)
 		return -1;
 
-	ptr = (char __user *)args[0];
+	tp_data.cmd = args[1];
+	tp_data.val1 = (u32 __user *)args[0];
+	tp_data.val2 = (u32 __user *)args[4];
+	tp_data.timeout = (u64 __user *)args[3];
 
-	*buffer = trace_user_fault_read(&sbuf->buf, ptr, 4, NULL, NULL);
+	*buffer = trace_user_fault_read(&sbuf->buf, NULL, 0,
+					syscall_copy_futex, &tp_data);
 	if (!*buffer)
 		return -1;
 
-	/* Add room for the value */
-	*size += 4;
+	/* Add room for values */
+	*size += sizeof(struct futex_data);
 
 	return 0;
 }
@@ -809,12 +901,13 @@ static void syscall_put_futex(struct syscall_metadata *sys_data,
 			      struct syscall_trace_enter *entry,
 			      char *buffer)
 {
-	u32 *ptr;
+	struct futex_data *fdata = (void *)buffer;
+	struct futex_data *data;
 
 	/* Place the futex key into the storage */
-	ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+	data = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
 
-	*ptr = *(u32 *)buffer;
+	*data = *fdata;
 }
 
 static char *sys_fault_user(unsigned int buf_size,
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/3] tracing: Show TID and flags for PI futex system call trace event
  2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
  2026-03-31 18:13 ` [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data Steven Rostedt
  2026-03-31 18:13 ` [PATCH v3 2/3] tracing: Update futex syscall trace event to show more commands Steven Rostedt
@ 2026-03-31 18:13 ` Steven Rostedt
  2026-04-01 17:15 ` [PATCH v3 0/3] tracing: Read user data from " Steven Rostedt
  2026-04-01 19:31 ` Thomas Gleixner
  4 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-31 18:13 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
	Ian Rogers, Suleiman Souhlal

From: Steven Rostedt <rostedt@goodmis.org>

For the futex system call trace event for FUTEX_LOCK_PI and
FUTEX_UNLOCK_PI commands, show the TID from the value (which is usually in
hex) as well as translate the flags (DIED and WAITERS).

 pi_mutex_hammer-1098    [000] .....   121.876928: sys_futex(uaddr: 0x560f40cc8180 (0x450) tid: 1104, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7f2f9d4b1e50 (0.000100000))
 pi_mutex_hammer-1128    [000] .....   121.877120: sys_futex(uaddr: 0x560f40cc8180 (0x8000042a) tid: 1066 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7f2f8e493e50 (0.000100000))
 pi_mutex_hammer-1106    [000] .....   121.877242: sys_futex(uaddr: 0x560f40cc8180 (0x80000452) tid: 1106 (WAITERS), FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)

This makes it easier to see the hand off of a mutex and who the owner was.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/futex/syscalls.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 265ec0a236d0..8144781a9ff0 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -213,6 +213,9 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
 	unsigned int op, cmd;
 	bool done = false;
 
+	op = args[1];
+	cmd = op & FUTEX_CMD_MASK;
+
 	for (int i = 0; !done && i < nr_args; i++) {
 
 		if (seq_buf_has_overflowed(s))
@@ -227,11 +230,30 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
 					seq_buf_printf(s, " (%u)", val);
 				else
 					seq_buf_printf(s, " (0x%x)", val);
+
+				switch(cmd) {
+				case FUTEX_LOCK_PI:
+				case FUTEX_UNLOCK_PI:
+					seq_buf_printf(s, " tid: %d",
+						       val & FUTEX_TID_MASK);
+
+					if (!(val & (FUTEX_OWNER_DIED|FUTEX_WAITERS)))
+						break;
+
+					seq_buf_puts(s, " (");
+					if (val & FUTEX_WAITERS)
+						seq_buf_puts(s, "WAITERS");
+					if (val & FUTEX_OWNER_DIED) {
+						if (op & FUTEX_WAITERS)
+							seq_buf_putc(s, '|');
+						seq_buf_puts(s, "DIED");
+					}
+					seq_buf_putc(s, ')');
+					break;
+				}
 			}
 			continue;
 		case 1:
-			op = args[i];
-			cmd = op & FUTEX_CMD_MASK;
 			if (cmd <= FUTEX_LOCK_PI2)
 				seq_buf_printf(s, ", %s", __futex_cmds[cmd]);
 			else
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] tracing: Read user data from futex system call trace event
  2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
                   ` (2 preceding siblings ...)
  2026-03-31 18:13 ` [PATCH v3 3/3] tracing: Show TID and flags for PI futex system call trace event Steven Rostedt
@ 2026-04-01 17:15 ` Steven Rostedt
  2026-04-01 19:31 ` Thomas Gleixner
  4 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-04-01 17:15 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, Peter Zijlstra
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Thomas Gleixner, Brian Geffon, John Stultz, Ian Rogers,
	Suleiman Souhlal


Peter,

Any thoughts on this version?

-- Steve


On Tue, 31 Mar 2026 14:13:49 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> We are looking at the performance of futexes and require a bit more
> information when tracing them.
> 
> The two patches here extend the system call reading of user space to
> create specific handling of the futex system call. It now reads the
> user space relevant data (the addr, utime and addr2), as well as
> parses the flags. This adds a little smarts to the trace event as
> it only shows the parameters that are relevant, as well as parses
> utime as either a timespec or as val2 depending on the futex_op.
> 
> Here's an example of the new output:
> 
>  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e834 (0x4a7) tid: 1191, FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e834 (0) tid: 0, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAIT|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (4aa), val3: 0)
>  sys_futex(uaddr: 0x56196292e834 (0x4aa) tid: 1194, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
>  sys_futex(uaddr: 0x7f7ed6b29990 (0x4ab), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME)
>  sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
>  sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
> 
> Changes since v2: https://lore.kernel.org/all/20260310200954.285663884@kernel.org/
> 
> - Removed unused "buf" variable (kernel test robot)
> 
> - Iterate __futex_cmds[] make the print statement.
>   Note this required exposing __futex_cmds[] to trace_syscall.c
>   (Masami Hiramatsu)
> 
> - Added back val statement (with the move to futex/syscall.c the
>   third parameter was dropped).
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
> futex/core
> 
> Head SHA1: 79b0609ad15b24d0bbfb1790e17902a6c210ae69
> 
> 
> Steven Rostedt (3):
>       tracing: Have futex syscall trace event show specific user data
>       tracing: Update futex syscall trace event to show more commands
>       tracing: Show TID and flags for PI futex system call trace event
> 
> ----
>  include/linux/futex.h         |  39 ++++++-
>  kernel/futex/syscalls.c       | 137 +++++++++++++++++++++---
>  kernel/trace/trace_syscalls.c | 237 +++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 396 insertions(+), 17 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data
  2026-03-31 18:13 ` [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data Steven Rostedt
@ 2026-04-01 18:07   ` Ian Rogers
  2026-04-01 18:17     ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2026-04-01 18:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Brian Geffon, John Stultz, Suleiman Souhlal

On Tue, Mar 31, 2026 at 11:13 AM Steven Rostedt <rostedt@kernel.org> wrote:
>
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add specific reporting of the futex system call. This allows for debugging
> the futex code a bit easier. Instead of just showing the values passed
> into the futex system call, read the value of the user space memory
> pointed to by the addr parameter.
>
> Also make the op parameter more readable by parsing the values to show
> what the command is:
>
>  futex_requeue_p-3251    [002] .....  2101.068479: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
>  futex_requeue_p-3248    [001] .....  2101.068970: sys_futex(uaddr: 0x7f859072f990 (0xcb2), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val: 3250)

nit: Would this read better if `val` were also hex?
futex_requeue_p-3248    [001] .....  2101.068970: sys_futex(uaddr:
0x7f859072f990 (0xcb2), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val:
0xcb2)
The val matching the value in uaddr is the condition necessary for the
futex to sleep.

>  futex_requeue_p-3252    [005] .....  2101.069108: sys_futex(uaddr: 0x55e79a4da838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7ffe61076aa0, uaddr2: 0x55e79a4da834, uaddr2: 94453214586932, val3: 0)
>  futex_requeue_p-3252    [005] .....  2101.069410: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v2: https://patch.msgid.link/20260310201036.542627924@kernel.org
>
> - Removed unused "buf" variable (kernel test robot)
>
> - Iterate __futex_cmds[] make the print statement.
>   Note this required exposing __futex_cmds[] to trace_syscall.c
>
> - Added back val statement (with the move to futex/syscall.c the
>   third parameter was dropped).
>
>  include/linux/futex.h         |   8 +-
>  kernel/futex/syscalls.c       |  97 +++++++++++++++++++++++
>  kernel/trace/trace_syscalls.c | 144 +++++++++++++++++++++++++++++++++-
>  3 files changed, 245 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 9e9750f04980..7eaf01ff68cf 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -82,6 +82,11 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
>               u32 __user *uaddr2, u32 val2, u32 val3);
>  int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
>
> +#ifdef CONFIG_FTRACE_SYSCALLS
> +void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr);
> +extern const char * __futex_cmds[];
> +#endif
> +
>  #ifdef CONFIG_FUTEX_PRIVATE_HASH
>  int futex_hash_allocate_default(void);
>  void futex_hash_free(struct mm_struct *mm);
> @@ -114,7 +119,8 @@ static inline int futex_hash_allocate_default(void)
>  }
>  static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
>  static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
> -
> +static inline void futex_print_syscall(struct seq_buf *s, int nr_args,
> +                                      unsigned long *args, u32 *ptr) { }
>  #endif
>
>  #endif
> diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
> index 743c7a728237..3ba8ca017c9c 100644
> --- a/kernel/futex/syscalls.c
> +++ b/kernel/futex/syscalls.c
> @@ -171,6 +171,18 @@ static __always_inline bool futex_cmd_has_timeout(u32 cmd)
>         return false;
>  }
>
> +static __always_inline bool futex_cmd_has_addr2(u32 cmd)
> +{
> +       switch (cmd) {
> +       case FUTEX_REQUEUE:
> +       case FUTEX_CMP_REQUEUE:
> +       case FUTEX_WAKE_OP:
> +       case FUTEX_WAIT_REQUEUE_PI:
> +               return true;
> +       }
> +       return false;
> +}
> +
>  static __always_inline int
>  futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t)
>  {
> @@ -207,6 +219,91 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>         return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
>  }
>
> +#ifdef CONFIG_FTRACE_SYSCALLS
> +
> +/* End with NULL for iterators */
> +const char * __futex_cmds[] =
> +{
> +       "FUTEX_WAIT", "FUTEX_WAKE", "FUTEX_FD", "FUTEX_REQUEUE",
> +       "FUTEX_CMP_REQUEUE", "FUTEX_WAKE_OP", "FUTEX_LOCK_PI",
> +       "FUTEX_UNLOCK_PI", "FUTEX_TRYLOCK_PI", "FUTEX_WAIT_BITSET",
> +       "FUTEX_WAKE_BITSET", "FUTEX_WAIT_REQUEUE_PI", "FUTEX_CMP_REQUEUE_PI",
> +       "FUTEX_LOCK_PI2", NULL
> +};
> +
> +void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr)
> +{
> +       unsigned int op, cmd;
> +       bool done = false;
> +
> +       for (int i = 0; !done && i < nr_args; i++) {
> +
> +               if (seq_buf_has_overflowed(s))
> +                       break;
> +
> +               switch (i) {
> +               case 0:
> +                       seq_buf_printf(s, "uaddr: 0x%lx", args[i]);
> +                       if (ptr) {
> +                               u32 val = *ptr;
> +                               if (val < 10)
> +                                       seq_buf_printf(s, " (%u)", val);
> +                               else
> +                                       seq_buf_printf(s, " (0x%x)", val);
> +                       }
> +                       continue;
> +               case 1:
> +                       op = args[i];
> +                       cmd = op & FUTEX_CMD_MASK;
> +                       if (cmd <= FUTEX_LOCK_PI2)
> +                               seq_buf_printf(s, ", %s", __futex_cmds[cmd]);
> +                       else
> +                               seq_buf_puts(s, ", UNKNOWN");
> +
> +                       if (op & FUTEX_PRIVATE_FLAG)
> +                               seq_buf_puts(s, "|FUTEX_PRIVATE_FLAG");
> +                       if (op & FUTEX_CLOCK_REALTIME)
> +                               seq_buf_puts(s, "|FUTEX_CLOCK_REALTIME");
> +                       continue;
> +               case 2:
> +                       if (args[i] < 10)
> +                               seq_buf_printf(s, ", val: %ld", args[i]);
> +                       else
> +                               seq_buf_printf(s, ", val: 0x%lx", args[i]);

I'm confused why this hex formatting didn't appear in the sample output.

Thanks,
Ian

> +                       continue;
> +               case 3:
> +                       if (!futex_cmd_has_timeout(cmd)) {
> +
> +                               if (!futex_cmd_has_addr2(cmd)) {
> +                                       done = true;
> +                                       continue;
> +                               }
> +
> +                               seq_buf_printf(s, ", val2: 0x%x", (u32)(long)args[i]);
> +                               continue;
> +                       }
> +
> +                       if (!args[i])
> +                               continue;
> +
> +                       seq_buf_printf(s, ", timespec: 0x%lx", args[i]);
> +                       continue;
> +               case 4:
> +                       if (!futex_cmd_has_addr2(cmd)) {
> +                               done = true;
> +                               continue;
> +                       }
> +                       seq_buf_printf(s, ", uaddr2: 0x%lx", args[i]);
> +                       continue;
> +               case 5:
> +                       seq_buf_printf(s, ", val3: %lu", args[i]);
> +                       done = true;
> +                       continue;
> +               }
> +       }
> +}
> +#endif
> +
>  /**
>   * futex_parse_waitv - Parse a waitv array from userspace
>   * @futexv:    Kernel side list of waiters to be filled
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 37317b81fcda..f8213d772f89 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -6,11 +6,13 @@
>  #include <linux/slab.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>      /* for MODULE_NAME_LEN via KSYM_SYMBOL_LEN */
> +#include <linux/futex.h>
>  #include <linux/ftrace.h>
>  #include <linux/perf_event.h>
>  #include <linux/xarray.h>
>  #include <asm/syscall.h>
>
> +
>  #include "trace_output.h"
>  #include "trace.h"
>
> @@ -237,6 +239,27 @@ sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadat
>         return trace_handle_return(s);
>  }
>
> +static enum print_line_t
> +sys_enter_futex_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry,
> +                     struct trace_seq *s, struct trace_event *event, int ent_size)
> +{
> +       void *end = (void *)trace + ent_size;
> +       void *ptr;
> +
> +       /* Set ptr to the user space copied area */
> +       ptr = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
> +       if (ptr + 4 > end)
> +               ptr = NULL;
> +
> +       trace_seq_printf(s, "%s(", entry->name);
> +
> +       futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr);
> +
> +       trace_seq_puts(s, ")\n");
> +
> +       return trace_handle_return(s);
> +}
> +
>  static enum print_line_t
>  print_syscall_enter(struct trace_iterator *iter, int flags,
>                     struct trace_event *event)
> @@ -267,6 +290,10 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
>                 if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
>                         return sys_enter_openat_print(trace, entry, s, event);
>                 break;
> +       case __NR_futex:
> +               if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
> +                       return sys_enter_futex_print(trace, entry, s, event, iter->ent_size);
> +               break;
>         default:
>                 break;
>         }
> @@ -437,6 +464,48 @@ sys_enter_openat_print_fmt(struct syscall_metadata *entry, char *buf, int len)
>         return pos;
>  }
>
> +static int __init
> +sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
> +{
> +       int pos = 0;
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       "\"uaddr: 0x%%lx (0x%%lx) cmd=%%s%%s%%s");
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       "  val: 0x%%x timeout/val2: 0x%%llx");
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " uaddr2: 0x%%lx val3: 0x%%x\", ");
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " REC->uaddr,");
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " REC->__value,");
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       "  __print_symbolic(REC->op & 0x%x, ", FUTEX_CMD_MASK);
> +
> +       for (int i = 0; __futex_cmds[i]; i++) {
> +               pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                               "%s{%d, \"%s\"} ",
> +                               i ? "," : "", i, __futex_cmds[i]);
> +       }
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO, "), ");
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " (REC->op & %d) ? \"|FUTEX_PRIVATE_FLAG\" : \"\",",
> +                       FUTEX_PRIVATE_FLAG);
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " (REC->op & %d) ? \"|FUTEX_CLOCK_REALTIME\" : \"\",",
> +                       FUTEX_CLOCK_REALTIME);
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " REC->val, REC->utime,");
> +
> +       pos += snprintf(buf + pos, LEN_OR_ZERO,
> +                       " REC->uaddr, REC->val3");
> +       return pos;
> +}
> +
>  static int __init
>  __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
>  {
> @@ -447,6 +516,8 @@ __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
>         switch (entry->syscall_nr) {
>         case __NR_openat:
>                 return sys_enter_openat_print_fmt(entry, buf, len);
> +       case __NR_futex:
> +               return sys_enter_futex_print_fmt(entry, buf, len);
>         default:
>                 break;
>         }
> @@ -523,6 +594,21 @@ static void __init free_syscall_print_fmt(struct trace_event_call *call)
>                 kfree(call->print_fmt);
>  }
>
> +static int __init futex_fields(struct trace_event_call *call, int offset)
> +{
> +       char *arg;
> +       int ret;
> +
> +       arg = kstrdup("__value", GFP_KERNEL);
> +       if (WARN_ON_ONCE(!arg))
> +               return -ENOMEM;
> +       ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
> +                                FILTER_OTHER);
> +       if (ret)
> +               kfree(arg);
> +       return ret;
> +}
> +
>  static int __init syscall_enter_define_fields(struct trace_event_call *call)
>  {
>         struct syscall_trace_enter trace;
> @@ -544,6 +630,9 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
>                 offset += sizeof(unsigned long);
>         }
>
> +       if (!ret && meta->syscall_nr == __NR_futex)
> +               return futex_fields(call, offset);
> +
>         if (ret || !meta->user_mask)
>                 return ret;
>
> @@ -689,6 +778,45 @@ static int syscall_copy_user_array(char *buf, const char __user *ptr,
>         return 0;
>  }
>
> +static int
> +syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
> +{
> +       struct syscall_user_buffer *sbuf;
> +       const char __user *ptr;
> +
> +       /* buf_size of zero means user doesn't want user space read */
> +       if (!buf_size)
> +               return -1;
> +
> +       /* If the syscall_buffer is NULL, tracing is being shutdown */
> +       sbuf = READ_ONCE(syscall_buffer);
> +       if (!sbuf)
> +               return -1;
> +
> +       ptr = (char __user *)args[0];
> +
> +       *buffer = trace_user_fault_read(&sbuf->buf, ptr, 4, NULL, NULL);
> +       if (!*buffer)
> +               return -1;
> +
> +       /* Add room for the value */
> +       *size += 4;
> +
> +       return 0;
> +}
> +
> +static void syscall_put_futex(struct syscall_metadata *sys_data,
> +                             struct syscall_trace_enter *entry,
> +                             char *buffer)
> +{
> +       u32 *ptr;
> +
> +       /* Place the futex key into the storage */
> +       ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
> +
> +       *ptr = *(u32 *)buffer;
> +}
> +
>  static char *sys_fault_user(unsigned int buf_size,
>                             struct syscall_metadata *sys_data,
>                             struct syscall_user_buffer *sbuf,
> @@ -905,6 +1033,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>                 if (syscall_get_data(sys_data, args, &user_ptr,
>                                      &size, user_sizes, &uargs, tr->syscall_buf_sz) < 0)
>                         return;
> +       } else if (syscall_nr == __NR_futex) {
> +               if (syscall_get_futex(args, &user_ptr, &size, tr->syscall_buf_sz) < 0)
> +                       return;
>         }
>
>         size += sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
> @@ -921,6 +1052,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>         if (mayfault)
>                 syscall_put_data(sys_data, entry, user_ptr, size, user_sizes, uargs);
>
> +       else if (syscall_nr == __NR_futex)
> +               syscall_put_futex(sys_data, entry, user_ptr);
> +
>         trace_event_buffer_commit(&fbuffer);
>  }
>
> @@ -971,14 +1105,18 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
>  {
>         struct syscall_metadata *sys_data = call->data;
>         struct trace_array *tr = file->tr;
> +       bool enable_faults;
>         int ret = 0;
>         int num;
>
>         num = sys_data->syscall_nr;
>         if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
>                 return -ENOSYS;
> +
> +       enable_faults = sys_data->user_mask || num == __NR_futex;
> +
>         guard(mutex)(&syscall_trace_lock);
> -       if (sys_data->user_mask) {
> +       if (enable_faults) {
>                 ret = syscall_fault_buffer_enable();
>                 if (ret < 0)
>                         return ret;
> @@ -986,7 +1124,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
>         if (!tr->sys_refcount_enter) {
>                 ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
>                 if (ret < 0) {
> -                       if (sys_data->user_mask)
> +                       if (enable_faults)
>                                 syscall_fault_buffer_disable();
>                         return ret;
>                 }
> @@ -1011,7 +1149,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
>         WRITE_ONCE(tr->enter_syscall_files[num], NULL);
>         if (!tr->sys_refcount_enter)
>                 unregister_trace_sys_enter(ftrace_syscall_enter, tr);
> -       if (sys_data->user_mask)
> +       if (sys_data->user_mask || num == __NR_futex)
>                 syscall_fault_buffer_disable();
>  }
>
> --
> 2.51.0
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data
  2026-04-01 18:07   ` Ian Rogers
@ 2026-04-01 18:17     ` Steven Rostedt
  0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-04-01 18:17 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Brian Geffon, John Stultz, Suleiman Souhlal

On Wed, 1 Apr 2026 11:07:07 -0700
Ian Rogers <irogers@google.com> wrote:

> nit: Would this read better if `val` were also hex?
> futex_requeue_p-3248    [001] .....  2101.068970: sys_futex(uaddr:
> 0x7f859072f990 (0xcb2), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val:
> 0xcb2)
> The val matching the value in uaddr is the condition necessary for the
> futex to sleep.

Good point. I can update that in v4.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] tracing: Read user data from futex system call trace event
  2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
                   ` (3 preceding siblings ...)
  2026-04-01 17:15 ` [PATCH v3 0/3] tracing: Read user data from " Steven Rostedt
@ 2026-04-01 19:31 ` Thomas Gleixner
  2026-04-01 20:13   ` Peter Zijlstra
  2026-04-01 20:19   ` Steven Rostedt
  4 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2026-04-01 19:31 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Peter Zijlstra, Brian Geffon, John Stultz, Ian Rogers,
	Suleiman Souhlal

On Tue, Mar 31 2026 at 14:13, Steven Rostedt wrote:
> We are looking at the performance of futexes and require a bit more
> information when tracing them.
>
> The two patches here extend the system call reading of user space to

s/two/three/ :)

> create specific handling of the futex system call. It now reads the
> user space relevant data (the addr, utime and addr2), as well as
> parses the flags. This adds a little smarts to the trace event as
> it only shows the parameters that are relevant, as well as parses
> utime as either a timespec or as val2 depending on the futex_op.
>
> Here's an example of the new output:
>
>  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e834 (0x4a7) tid: 1191, FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e834 (0) tid: 0, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAIT|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (4aa), val3: 0)
>  sys_futex(uaddr: 0x56196292e834 (0x4aa) tid: 1194, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
>  sys_futex(uaddr: 0x7f7ed6b29990 (0x4ab), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME)
>  sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
>  sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
>  sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)

I understand what you are trying to achieve, but do we really need all
the complexity of decoding and pretty printing in the kernel?

Isn't it sufficient to store and expose the raw data and use post
processing to make it readable?

I've been doing complex futex analysis for two decades with a small set
of python scripts which translate raw text or binary trace data into
human readable information.

I agree that it's useful to have the actual timeout value and other data
which is missing today, but that still does not require all this
customized printing.

The initial idea of having at least some information about the data
entry (type, meaning etc.) in $event/format and use that for kernel text
output and for user space tools to analyze a binary trace has been
definitely the right way to go.

But that now deviates because $event/format cannot carry that
information you translate to in the kernel. It will still describe raw
event data, no?

So why not keeping the well known and working solution of identifying
the data in the format, print it raw and leave the post processing to
user space tools in case there is a need.

You actually make it harder to do development. Look at the patch series
related to robust futexes:

  https://lore.kernel.org/lkml/20260330114212.927686587@kernel.org/

So your decoding:

>  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)

fails to decode the new flag and the usage of uaddr2 unless I go and add
it in the first place _before_ working on the code. Right now it is just
printing op as a hex value and it just works when a new bit is added.

Stick 100 lines of python into tools/tracing and be done with it. I'm
happy to contribute to that.

Aside of that:

  Putting the decoder (futex_print_syscall) into the futex code itself
  is admittedly a smart move to offload the work of keeping that up to
  date to the people who are actually working on futexes.

TBH, I'm not interested to deal with that at all. If you want this
ftrace magic pretty printing, then stick it into kernel/trace or if
there is a real technical reason (hint there is none) into
kernel/futex/trace.c and take ownership of it. But please do not burden
others with your fancy toy of the day.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] tracing: Read user data from futex system call trace event
  2026-04-01 19:31 ` Thomas Gleixner
@ 2026-04-01 20:13   ` Peter Zijlstra
  2026-04-01 20:19   ` Steven Rostedt
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2026-04-01 20:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Brian Geffon, John Stultz, Ian Rogers, Suleiman Souhlal

On Wed, Apr 01, 2026 at 09:31:19PM +0200, Thomas Gleixner wrote:

> Stick 100 lines of python into tools/tracing and be done with it. I'm
> happy to contribute to that.
> 
> Aside of that:
> 
>   Putting the decoder (futex_print_syscall) into the futex code itself
>   is admittedly a smart move to offload the work of keeping that up to
>   date to the people who are actually working on futexes.
> 
> TBH, I'm not interested to deal with that at all. If you want this
> ftrace magic pretty printing, then stick it into kernel/trace or if
> there is a real technical reason (hint there is none) into
> kernel/futex/trace.c and take ownership of it. But please do not burden
> others with your fancy toy of the day.

Him putting it near futex is my fault. I really hate having this
description nonsense separate -- *IF* is has to exist at all.

That said, I was thinking of Sasha's syscall api endeavour, I feel that
if we do any of this, it should all come from the same one place.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] tracing: Read user data from futex system call trace event
  2026-04-01 19:31 ` Thomas Gleixner
  2026-04-01 20:13   ` Peter Zijlstra
@ 2026-04-01 20:19   ` Steven Rostedt
  2026-04-01 20:25     ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2026-04-01 20:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Peter Zijlstra, Brian Geffon,
	John Stultz, Ian Rogers, Suleiman Souhlal

On Wed, 01 Apr 2026 21:31:19 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, Mar 31 2026 at 14:13, Steven Rostedt wrote:
> > We are looking at the performance of futexes and require a bit more
> > information when tracing them.
> >
> > The two patches here extend the system call reading of user space to  
> 
> s/two/three/ :)

Ah v1 had only two patches and this was cut and pasted from there.

> 
> I understand what you are trying to achieve, but do we really need all
> the complexity of decoding and pretty printing in the kernel?

You could say the same for most tracepoints. ;-)

> 
> Isn't it sufficient to store and expose the raw data and use post
> processing to make it readable?

Yes this is possible, and will also work too, as libtraceevent will be
updated to parse the raw data.

> 
> I've been doing complex futex analysis for two decades with a small set
> of python scripts which translate raw text or binary trace data into
> human readable information.
> 
> I agree that it's useful to have the actual timeout value and other data
> which is missing today, but that still does not require all this
> customized printing.
> 
> The initial idea of having at least some information about the data
> entry (type, meaning etc.) in $event/format and use that for kernel text
> output and for user space tools to analyze a binary trace has been
> definitely the right way to go.
> 
> But that now deviates because $event/format cannot carry that
> information you translate to in the kernel. It will still describe raw
> event data, no?

It still shows a bit:

name: sys_enter_futex
ID: 592
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
	field:u32 * uaddr;	offset:16;	size:8;	signed:0;
	field:int op;	offset:24;	size:8;	signed:0;
	field:u32 val;	offset:32;	size:8;	signed:0;
	field:const struct __kernel_timespec * utime;	offset:40;	size:8;	signed:0;
	field:u32 * uaddr2;	offset:48;	size:8;	signed:0;
	field:u32 val3;	offset:56;	size:8;	signed:0;
	field:u32 __value;	offset:64;	size:4;	signed:0;
	field:u32 __value2;	offset:68;	size:4;	signed:0;
	field:unsigned long __ts1;	offset:72;	size:8;	signed:0;
	field:unsigned long __ts2;	offset:80;	size:8;	signed:0;

print fmt: "uaddr: 0x%lx (0x%lx) cmd=%s%s%s  val: 0x%x timeout/val2: 0x%llx (%lu.%lu) uaddr2: 0x%lx (0x%lx) val3: 0x%x",  REC->uaddr, REC->__value,  __print_symbolic(REC->op & 0xfffffe7f, {0, "FUTEX_WAIT"} ,{1, "FUTEX_WAKE"} ,{2, "FUTEX_FD"} ,{3, "FUTEX_REQUEUE"} ,{4, "FUTEX_CMP_REQUEUE"} ,{5, "FUTEX_WAKE_OP"} ,{6, "FUTEX_LOCK_PI"} ,{7, "FUTEX_UNLOCK_PI"} ,{8, "FUTEX_TRYLOCK_PI"} ,{9, "FUTEX_WAIT_BITSET"} ,{10, "FUTEX_WAKE_BITSET"} ,{11, "FUTEX_WAIT_REQUEUE_PI"} ,{12, "FUTEX_CMP_REQUEUE_PI"} ,{13, "FUTEX_LOCK_PI2"} ),  (REC->op & 128) ? "|FUTEX_PRIVATE_FLAG" : "", (REC->op & 256) ? "|FUTEX_CLOCK_REALTIME" : "", REC->val, REC->utime, REC->__ts1, REC->__ts2, REC->uaddr, REC->__value2, REC->val3

> 
> So why not keeping the well known and working solution of identifying
> the data in the format, print it raw and leave the post processing to
> user space tools in case there is a need.
> 
> You actually make it harder to do development. Look at the patch series
> related to robust futexes:
> 
>   https://lore.kernel.org/lkml/20260330114212.927686587@kernel.org/
> 
> So your decoding:
> 
> >  sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)  
> 
> fails to decode the new flag and the usage of uaddr2 unless I go and add
> it in the first place _before_ working on the code. Right now it is just
> printing op as a hex value and it just works when a new bit is added.
> 
> Stick 100 lines of python into tools/tracing and be done with it. I'm
> happy to contribute to that.

Well, it would be updated for trace-cmd not tools/tracing.

> 
> Aside of that:
> 
>   Putting the decoder (futex_print_syscall) into the futex code itself
>   is admittedly a smart move to offload the work of keeping that up to
>   date to the people who are actually working on futexes.
> 
> TBH, I'm not interested to deal with that at all. If you want this
> ftrace magic pretty printing, then stick it into kernel/trace or if
> there is a real technical reason (hint there is none) into
> kernel/futex/trace.c and take ownership of it. But please do not burden
> others with your fancy toy of the day.

v1 kept it all within the tracing subsystem, but Peter suggested that it be
closer to the syscall:

   https://lore.kernel.org/all/20260304090748.GO606826@noisy.programming.kicks-ass.net/

I'm happy to put it back and maintain it separately.

Or I can just keep the simple bits (the reading of user space), and not do
all the more fancy formatting. Basically dropping patch 2 and 3.

I've been using trace-cmd start / show for testing. But I could also move
the logic to libtraceevent, which would require using trace-cmd record
instead.

How much are you against the full series? Are you OK with it if it stays
within the tracing subsystem? Or would you prefer just keeping with patch 1
and dropping the other patches and doing that work in libtraceevent?

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] tracing: Read user data from futex system call trace event
  2026-04-01 20:19   ` Steven Rostedt
@ 2026-04-01 20:25     ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2026-04-01 20:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Brian Geffon, John Stultz, Ian Rogers, Suleiman Souhlal

On Wed, Apr 01, 2026 at 04:19:19PM -0400, Steven Rostedt wrote:

> Well, it would be updated for trace-cmd not tools/tracing.

So I tried using that a while ago:

  https://lkml.kernel.org/r/20260116181524.GF831285@noisy.programming.kicks-ass.net

and I found it really rough.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-04-01 20:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-31 18:13 [PATCH v3 0/3] tracing: Read user data from futex system call trace event Steven Rostedt
2026-03-31 18:13 ` [PATCH v3 1/3] tracing: Have futex syscall trace event show specific user data Steven Rostedt
2026-04-01 18:07   ` Ian Rogers
2026-04-01 18:17     ` Steven Rostedt
2026-03-31 18:13 ` [PATCH v3 2/3] tracing: Update futex syscall trace event to show more commands Steven Rostedt
2026-03-31 18:13 ` [PATCH v3 3/3] tracing: Show TID and flags for PI futex system call trace event Steven Rostedt
2026-04-01 17:15 ` [PATCH v3 0/3] tracing: Read user data from " Steven Rostedt
2026-04-01 19:31 ` Thomas Gleixner
2026-04-01 20:13   ` Peter Zijlstra
2026-04-01 20:19   ` Steven Rostedt
2026-04-01 20:25     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox