[PATCH v2 0/2] perf trace: Refactor augmented_raw_syscalls using bpf

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* [PATCH v2 0/2] perf trace: Refactor augmented_raw_syscalls using bpf_for
@ 2026-07-03 10:32 Viktor Malik
  2026-07-03 10:32 ` [PATCH v2 1/2] perf trace: Factor out BPF loop body Viktor Malik
  2026-07-03 10:32 ` [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
  0 siblings, 2 replies; 5+ messages in thread
From: Viktor Malik @ 2026-07-03 10:32 UTC (permalink / raw)
  To: linux-perf-users
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Viktor Malik, Howard Chu,
	linux-kernel, bpf, Michael Petlan

The loop for processing syscall args in augment_raw_syscalls has a
history of breaking with Clang updates. In the past, we've seen it break
(i.e. stop passing the BPF verifier) between Clang 15 and 16 and now a
similar thing happened between Clang 21 and 22. While the issue is
mitigated on the main line by a recent verifier update, it remains
broken on the 6.12 and 6.18 stable branches, effectively breaking
`perf trace`.

The reason is that the loop is quite complex and the BPF verifier often
struggles to prove that it terminates.

This series fixes the issue by replacing the standard for loop by the
bpf_for macro, which uses numeric BPF iterator. This should prevent
future breakages of this kind since the verifier has much easier job
proving that the loop terminates. Small adjustments were necessary for
the loop to make it work, see the second commit message for details.

To keep perf compatible with older kernels, the first commit factors out
the loop body into a function, which is then called either from bpf_for
or from a standard for loop, depending on whether BPF numeric iterators
are available.

Changes from v1:
- v1: https://lore.kernel.org/bpf/akWqIfWPMCdaGgGg@google.com/T/
- Use bpf_for instead of bpf_loop (suggested by Alexei and Andrii)
- Keep the change backwards compatible with older kernels (required by
  Namhyung)

Viktor Malik (2):
  perf trace: Factor out BPF loop body
  perf trace: Refactor augmented_raw_syscalls using bpf_for

 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 155 +++++++++++-------
 1 file changed, 98 insertions(+), 57 deletions(-)

-- 
2.54.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] perf trace: Factor out BPF loop body
  2026-07-03 10:32 [PATCH v2 0/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
@ 2026-07-03 10:32 ` Viktor Malik
  2026-07-03 23:49   ` Namhyung Kim
  2026-07-03 10:32 ` [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
  1 sibling, 1 reply; 5+ messages in thread
From: Viktor Malik @ 2026-07-03 10:32 UTC (permalink / raw)
  To: linux-perf-users
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Viktor Malik, Howard Chu,
	linux-kernel, bpf, Michael Petlan, stable

The BPF program in augmented_raw_syscalls uses a for loop to iterate all
syscall arguments. The loop body is quite complex and often poses
problems for the BPF verifier. As a preparation step for addressing this
issue, factor out the loop body into a separate function.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Cc: stable@vger.kernel.org
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 127 ++++++++++--------
 1 file changed, 72 insertions(+), 55 deletions(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 2a6e61864ee0..cbdd5ce19a2f 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -429,15 +429,79 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
+/*
+ * Determine what type of argument and how many bytes to read from user space, using the
+ * value in the beauty_map. This is the relation of parameter type and its corresponding
+ * value in the beauty map, and how many bytes we read eventually:
+ *
+ * string: 1			      -> size of string
+ * struct: size of struct	      -> size of struct
+ * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
+ */
+static inline int augment_arg(struct syscall_enter_args *args, int i,
+			      unsigned int *beauty_map, void *payload_offset)
+{
+	int index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
+	s64 aug_size, size;
+	bool augmented;
+	void *arg;
+
+	arg = (void *)args->args[i];
+	augmented = false;
+	size = beauty_map[i];
+	aug_size = size; /* size of the augmented data read from user space */
+
+	if (size == 0 || arg == NULL)
+		return 0;
+
+	if (size == 1) { /* string */
+		aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
+		/* minimum of 0 to pass the verifier */
+		if (aug_size < 0)
+			aug_size = 0;
+
+		augmented = true;
+	} else if (size > 0 && size <= value_size) { /* struct */
+		if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
+			augmented = true;
+	} else if ((int)size < 0 && size >= -6) { /* buffer */
+		index = -(size + 1);
+		barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
+		index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
+		aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
+
+		if (aug_size > 0) {
+			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
+				augmented = true;
+		}
+	}
+
+	/* Augmented data size is limited to sizeof(augmented_arg->unnamed union with value field) */
+	if (aug_size > value_size)
+		aug_size = value_size;
+
+	/* write data to payload */
+	if (augmented) {
+		int written = offsetof(struct augmented_arg, value) + aug_size;
+
+		if (written < 0 || written > sizeof(struct augmented_arg))
+			return -1;
+
+		((struct augmented_arg *)payload_offset)->size = aug_size;
+		return written;
+	}
+
+	return 0;
+}
+
 static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 {
-	bool augmented, do_output = false;
-	int zero = 0, index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
+	bool do_output = false;
+	int zero = 0, written;
 	u64 output = 0; /* has to be u64, otherwise it won't pass the verifier */
-	s64 aug_size, size;
 	unsigned int nr, *beauty_map;
 	struct beauty_payload_enter *payload;
-	void *arg, *payload_offset;
+	void *payload_offset;
 
 	/* fall back to do predefined tail call */
 	if (args == NULL)
@@ -457,58 +521,11 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	/* copy the sys_enter header, which has the syscall_nr */
 	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
 
-	/*
-	 * Determine what type of argument and how many bytes to read from user space, using the
-	 * value in the beauty_map. This is the relation of parameter type and its corresponding
-	 * value in the beauty map, and how many bytes we read eventually:
-	 *
-	 * string: 1			      -> size of string
-	 * struct: size of struct	      -> size of struct
-	 * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
-	 */
 	for (int i = 0; i < 6; i++) {
-		arg = (void *)args->args[i];
-		augmented = false;
-		size = beauty_map[i];
-		aug_size = size; /* size of the augmented data read from user space */
-
-		if (size == 0 || arg == NULL)
-			continue;
-
-		if (size == 1) { /* string */
-			aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
-			/* minimum of 0 to pass the verifier */
-			if (aug_size < 0)
-				aug_size = 0;
-
-			augmented = true;
-		} else if (size > 0 && size <= value_size) { /* struct */
-			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
-				augmented = true;
-		} else if ((int)size < 0 && size >= -6) { /* buffer */
-			index = -(size + 1);
-			barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
-			index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
-			aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
-
-			if (aug_size > 0) {
-				if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
-					augmented = true;
-			}
-		}
-
-		/* Augmented data size is limited to sizeof(augmented_arg->unnamed union with value field) */
-		if (aug_size > value_size)
-			aug_size = value_size;
-
-		/* write data to payload */
-		if (augmented) {
-			int written = offsetof(struct augmented_arg, value) + aug_size;
-
-			if (written < 0 || written > sizeof(struct augmented_arg))
-				return 1;
-
-			((struct augmented_arg *)payload_offset)->size = aug_size;
+		written = augment_arg(args, i, beauty_map, payload_offset);
+		if (written < 0)
+			return 1;
+		if (written > 0) {
 			output += written;
 			payload_offset += written;
 			do_output = true;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for
  2026-07-03 10:32 [PATCH v2 0/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
  2026-07-03 10:32 ` [PATCH v2 1/2] perf trace: Factor out BPF loop body Viktor Malik
@ 2026-07-03 10:32 ` Viktor Malik
  2026-07-03 23:50   ` Namhyung Kim
  1 sibling, 1 reply; 5+ messages in thread
From: Viktor Malik @ 2026-07-03 10:32 UTC (permalink / raw)
  To: linux-perf-users
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Viktor Malik, Howard Chu,
	linux-kernel, bpf, Michael Petlan, Andrii Nakryiko, stable

The loop for processing syscall args in augment_raw_syscalls has a
history of breaking with Clang updates, see e.g. commit 013eb043f37b
("perf trace: Fix BPF loading failure (-E2BIG)") from Clang 15 to 16.

Now, a similar thing happened between Clang 21 and 22. While the issue
is mitigated on the main line by a recent verifier update, it remains
broken on the 6.12 and 6.18 stable branches:

    [linux-6.18.y]# sudo perf trace true
    libbpf: prog 'sys_enter': BPF program load failed: -E2BIG
    libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
    [...]
    BPF program is too large. Processed 1000001 insn
    processed 1000001 insns (limit 1000000) max_states_per_insn 40 total_states 37941 peak_states 232 mark_read 0
    -- END PROG LOAD LOG --
    libbpf: prog 'sys_enter': failed to load: -E2BIG
    libbpf: failed to load object 'augmented_raw_syscalls_bpf'
    libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -E2BIG
    Error: failed to get syscall or beauty map fd
    [...]

The reason is that the loop is quite complex and the BPF verifier often
struggles to prove that it terminates.

Fix the issue by replacing the standard for loop by the bpf_for macro,
which uses numeric BPF iterator. This should prevent future breakages of
this kind since the verifier has much easier job proving that the loop
terminates.

Small adjustments were necessary for the loop to make it work.  The main
problem is that the verifier has sometimes problems with bpf_for loops
that use a carry-over state, such as the `payload_offset` and `output`
vars here, since the verifier tries to track their values too precisely
and cannot prove loop convergence. To resolve the issue, we (1)
explicitly recompute `payload_offset` in every iteration and (2) use a
trick with adding a global zero to `output` to help verifier forget its
precise state and use a range instead.

In exchange, this also allows to drop a few artificial checks to help
the verifier, including the changes introduced by 013eb043f37b.

Finally, to keep backwards compatibility with older kernel versions
which do not have bpf_for (i.e. numeric iterators), fall back to
standard for loop in such a case.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Fixes: a68fd6a6cdd3 ("perf trace: Collect augmented data using BPF")
Fixes: 013eb043f37b ("perf trace: Fix BPF loading failure (-E2BIG)")
Cc: stable@vger.kernel.org
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 54 +++++++++++++------
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index cbdd5ce19a2f..60babc06f381 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -429,6 +429,8 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
+u64 ZERO = 0;
+
 /*
  * Determine what type of argument and how many bytes to read from user space, using the
  * value in the beauty_map. This is the relation of parameter type and its corresponding
@@ -439,12 +441,13 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
  * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
  */
 static inline int augment_arg(struct syscall_enter_args *args, int i,
-			      unsigned int *beauty_map, void *payload_offset)
+			      unsigned int *beauty_map,
+			      struct beauty_payload_enter *payload, u64 offset)
 {
 	int index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
 	s64 aug_size, size;
 	bool augmented;
-	void *arg;
+	void *arg, *payload_offset;
 
 	arg = (void *)args->args[i];
 	augmented = false;
@@ -454,6 +457,12 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
 	if (size == 0 || arg == NULL)
 		return 0;
 
+	/* bounds check for the verifier */
+	if (offset > sizeof(payload->aug_args) - sizeof(payload->aug_args[0]))
+		return -1;
+	barrier_var(offset);
+	payload_offset = (void *)&payload->aug_args + offset;
+
 	if (size == 1) { /* string */
 		aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
 		/* minimum of 0 to pass the verifier */
@@ -464,11 +473,13 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
 	} else if (size > 0 && size <= value_size) { /* struct */
 		if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
 			augmented = true;
-	} else if ((int)size < 0 && size >= -6) { /* buffer */
+	} else if (size < 0 && size >= -6) { /* buffer */
 		index = -(size + 1);
 		barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
 		index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
-		aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
+		aug_size = args->args[index];
+		if (aug_size > TRACE_AUG_MAX_BUF)
+			aug_size = TRACE_AUG_MAX_BUF;
 
 		if (aug_size > 0) {
 			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
@@ -497,11 +508,10 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
 static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 {
 	bool do_output = false;
-	int zero = 0, written;
+	int i, zero = 0, written;
 	u64 output = 0; /* has to be u64, otherwise it won't pass the verifier */
 	unsigned int nr, *beauty_map;
 	struct beauty_payload_enter *payload;
-	void *payload_offset;
 
 	/* fall back to do predefined tail call */
 	if (args == NULL)
@@ -513,7 +523,6 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 
 	/* set up payload for output */
 	payload        = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
-	payload_offset = (void *)&payload->aug_args;
 
 	if (beauty_map == NULL || payload == NULL)
 		return 1;
@@ -521,14 +530,29 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	/* copy the sys_enter header, which has the syscall_nr */
 	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
 
-	for (int i = 0; i < 6; i++) {
-		written = augment_arg(args, i, beauty_map, payload_offset);
-		if (written < 0)
-			return 1;
-		if (written > 0) {
-			output += written;
-			payload_offset += written;
-			do_output = true;
+	if (bpf_ksym_exists(bpf_iter_num_new)) {
+		bpf_for(i, 0, 6) {
+			written = augment_arg(args, i, beauty_map, payload, output);
+			if (written < 0)
+				return 1;
+			if (written > 0) {
+				output += written;
+				/* guide the verifier to forget range of `output`, which
+				 * helps to prove convergence of the loop
+				 */
+				output += ZERO;
+				do_output = true;
+			}
+		}
+	} else {
+		for (i = 0; i < 6; i++) {
+			written = augment_arg(args, i, beauty_map, payload, output);
+			if (written < 0)
+				return 1;
+			if (written > 0) {
+				output += written;
+				do_output = true;
+			}
 		}
 	}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] perf trace: Factor out BPF loop body
  2026-07-03 10:32 ` [PATCH v2 1/2] perf trace: Factor out BPF loop body Viktor Malik
@ 2026-07-03 23:49   ` Namhyung Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2026-07-03 23:49 UTC (permalink / raw)
  To: Viktor Malik
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Howard Chu,
	linux-kernel, bpf, Michael Petlan, stable

Hello,

On Fri, Jul 03, 2026 at 12:32:14PM +0200, Viktor Malik wrote:
> The BPF program in augmented_raw_syscalls uses a for loop to iterate all
> syscall arguments. The loop body is quite complex and often poses
> problems for the BPF verifier. As a preparation step for addressing this
> issue, factor out the loop body into a separate function.
> 
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  .../bpf_skel/augmented_raw_syscalls.bpf.c     | 127 ++++++++++--------
>  1 file changed, 72 insertions(+), 55 deletions(-)
> 
> diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> index 2a6e61864ee0..cbdd5ce19a2f 100644
> --- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> @@ -429,15 +429,79 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
>  	return bpf_map_lookup_elem(pids, &pid) != NULL;
>  }
>  
> +/*
> + * Determine what type of argument and how many bytes to read from user space, using the
> + * value in the beauty_map. This is the relation of parameter type and its corresponding
> + * value in the beauty map, and how many bytes we read eventually:
> + *
> + * string: 1			      -> size of string
> + * struct: size of struct	      -> size of struct
> + * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
> + */
> +static inline int augment_arg(struct syscall_enter_args *args, int i,
> +			      unsigned int *beauty_map, void *payload_offset)

Can we make it 'struct augmented_arg *payload_offset' instead?

Thanks,
Namhyung


> +{
> +	int index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
> +	s64 aug_size, size;
> +	bool augmented;
> +	void *arg;
> +
> +	arg = (void *)args->args[i];
> +	augmented = false;
> +	size = beauty_map[i];
> +	aug_size = size; /* size of the augmented data read from user space */
> +
> +	if (size == 0 || arg == NULL)
> +		return 0;
> +
> +	if (size == 1) { /* string */
> +		aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
> +		/* minimum of 0 to pass the verifier */
> +		if (aug_size < 0)
> +			aug_size = 0;
> +
> +		augmented = true;
> +	} else if (size > 0 && size <= value_size) { /* struct */
> +		if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
> +			augmented = true;
> +	} else if ((int)size < 0 && size >= -6) { /* buffer */
> +		index = -(size + 1);
> +		barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
> +		index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
> +		aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
> +
> +		if (aug_size > 0) {
> +			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
> +				augmented = true;
> +		}
> +	}
> +
> +	/* Augmented data size is limited to sizeof(augmented_arg->unnamed union with value field) */
> +	if (aug_size > value_size)
> +		aug_size = value_size;
> +
> +	/* write data to payload */
> +	if (augmented) {
> +		int written = offsetof(struct augmented_arg, value) + aug_size;
> +
> +		if (written < 0 || written > sizeof(struct augmented_arg))
> +			return -1;
> +
> +		((struct augmented_arg *)payload_offset)->size = aug_size;
> +		return written;
> +	}
> +
> +	return 0;
> +}
> +
>  static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
>  {
> -	bool augmented, do_output = false;
> -	int zero = 0, index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
> +	bool do_output = false;
> +	int zero = 0, written;
>  	u64 output = 0; /* has to be u64, otherwise it won't pass the verifier */
> -	s64 aug_size, size;
>  	unsigned int nr, *beauty_map;
>  	struct beauty_payload_enter *payload;
> -	void *arg, *payload_offset;
> +	void *payload_offset;
>  
>  	/* fall back to do predefined tail call */
>  	if (args == NULL)
> @@ -457,58 +521,11 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
>  	/* copy the sys_enter header, which has the syscall_nr */
>  	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
>  
> -	/*
> -	 * Determine what type of argument and how many bytes to read from user space, using the
> -	 * value in the beauty_map. This is the relation of parameter type and its corresponding
> -	 * value in the beauty map, and how many bytes we read eventually:
> -	 *
> -	 * string: 1			      -> size of string
> -	 * struct: size of struct	      -> size of struct
> -	 * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
> -	 */
>  	for (int i = 0; i < 6; i++) {
> -		arg = (void *)args->args[i];
> -		augmented = false;
> -		size = beauty_map[i];
> -		aug_size = size; /* size of the augmented data read from user space */
> -
> -		if (size == 0 || arg == NULL)
> -			continue;
> -
> -		if (size == 1) { /* string */
> -			aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
> -			/* minimum of 0 to pass the verifier */
> -			if (aug_size < 0)
> -				aug_size = 0;
> -
> -			augmented = true;
> -		} else if (size > 0 && size <= value_size) { /* struct */
> -			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
> -				augmented = true;
> -		} else if ((int)size < 0 && size >= -6) { /* buffer */
> -			index = -(size + 1);
> -			barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
> -			index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
> -			aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
> -
> -			if (aug_size > 0) {
> -				if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
> -					augmented = true;
> -			}
> -		}
> -
> -		/* Augmented data size is limited to sizeof(augmented_arg->unnamed union with value field) */
> -		if (aug_size > value_size)
> -			aug_size = value_size;
> -
> -		/* write data to payload */
> -		if (augmented) {
> -			int written = offsetof(struct augmented_arg, value) + aug_size;
> -
> -			if (written < 0 || written > sizeof(struct augmented_arg))
> -				return 1;
> -
> -			((struct augmented_arg *)payload_offset)->size = aug_size;
> +		written = augment_arg(args, i, beauty_map, payload_offset);
> +		if (written < 0)
> +			return 1;
> +		if (written > 0) {
>  			output += written;
>  			payload_offset += written;
>  			do_output = true;
> -- 
> 2.54.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for
  2026-07-03 10:32 ` [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
@ 2026-07-03 23:50   ` Namhyung Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Namhyung Kim @ 2026-07-03 23:50 UTC (permalink / raw)
  To: Viktor Malik
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Howard Chu,
	linux-kernel, bpf, Michael Petlan, Andrii Nakryiko, stable

On Fri, Jul 03, 2026 at 12:32:15PM +0200, Viktor Malik wrote:
> The loop for processing syscall args in augment_raw_syscalls has a
> history of breaking with Clang updates, see e.g. commit 013eb043f37b
> ("perf trace: Fix BPF loading failure (-E2BIG)") from Clang 15 to 16.
> 
> Now, a similar thing happened between Clang 21 and 22. While the issue
> is mitigated on the main line by a recent verifier update, it remains
> broken on the 6.12 and 6.18 stable branches:
> 
>     [linux-6.18.y]# sudo perf trace true
>     libbpf: prog 'sys_enter': BPF program load failed: -E2BIG
>     libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
>     [...]
>     BPF program is too large. Processed 1000001 insn
>     processed 1000001 insns (limit 1000000) max_states_per_insn 40 total_states 37941 peak_states 232 mark_read 0
>     -- END PROG LOAD LOG --
>     libbpf: prog 'sys_enter': failed to load: -E2BIG
>     libbpf: failed to load object 'augmented_raw_syscalls_bpf'
>     libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -E2BIG
>     Error: failed to get syscall or beauty map fd
>     [...]
> 
> The reason is that the loop is quite complex and the BPF verifier often
> struggles to prove that it terminates.
> 
> Fix the issue by replacing the standard for loop by the bpf_for macro,
> which uses numeric BPF iterator. This should prevent future breakages of
> this kind since the verifier has much easier job proving that the loop
> terminates.
> 
> Small adjustments were necessary for the loop to make it work.  The main
> problem is that the verifier has sometimes problems with bpf_for loops
> that use a carry-over state, such as the `payload_offset` and `output`
> vars here, since the verifier tries to track their values too precisely
> and cannot prove loop convergence. To resolve the issue, we (1)
> explicitly recompute `payload_offset` in every iteration and (2) use a
> trick with adding a global zero to `output` to help verifier forget its
> precise state and use a range instead.
> 
> In exchange, this also allows to drop a few artificial checks to help
> the verifier, including the changes introduced by 013eb043f37b.
> 
> Finally, to keep backwards compatibility with older kernel versions
> which do not have bpf_for (i.e. numeric iterators), fall back to
> standard for loop in such a case.
> 
> Signed-off-by: Viktor Malik <vmalik@redhat.com>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Fixes: a68fd6a6cdd3 ("perf trace: Collect augmented data using BPF")
> Fixes: 013eb043f37b ("perf trace: Fix BPF loading failure (-E2BIG)")
> Cc: stable@vger.kernel.org
> ---
>  .../bpf_skel/augmented_raw_syscalls.bpf.c     | 54 +++++++++++++------
>  1 file changed, 39 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> index cbdd5ce19a2f..60babc06f381 100644
> --- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> @@ -429,6 +429,8 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
>  	return bpf_map_lookup_elem(pids, &pid) != NULL;
>  }
>  
> +u64 ZERO = 0;
> +
>  /*
>   * Determine what type of argument and how many bytes to read from user space, using the
>   * value in the beauty_map. This is the relation of parameter type and its corresponding
> @@ -439,12 +441,13 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
>   * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
>   */
>  static inline int augment_arg(struct syscall_enter_args *args, int i,
> -			      unsigned int *beauty_map, void *payload_offset)
> +			      unsigned int *beauty_map,
> +			      struct beauty_payload_enter *payload, u64 offset)
>  {
>  	int index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
>  	s64 aug_size, size;
>  	bool augmented;
> -	void *arg;
> +	void *arg, *payload_offset;
>  
>  	arg = (void *)args->args[i];
>  	augmented = false;
> @@ -454,6 +457,12 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
>  	if (size == 0 || arg == NULL)
>  		return 0;
>  
> +	/* bounds check for the verifier */
> +	if (offset > sizeof(payload->aug_args) - sizeof(payload->aug_args[0]))
> +		return -1;
> +	barrier_var(offset);
> +	payload_offset = (void *)&payload->aug_args + offset;
> +
>  	if (size == 1) { /* string */
>  		aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
>  		/* minimum of 0 to pass the verifier */
> @@ -464,11 +473,13 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
>  	} else if (size > 0 && size <= value_size) { /* struct */
>  		if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
>  			augmented = true;
> -	} else if ((int)size < 0 && size >= -6) { /* buffer */
> +	} else if (size < 0 && size >= -6) { /* buffer */
>  		index = -(size + 1);
>  		barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
>  		index &= 7;	    // Satisfy the bounds checking with the verifier in some kernels.
> -		aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
> +		aug_size = args->args[index];
> +		if (aug_size > TRACE_AUG_MAX_BUF)
> +			aug_size = TRACE_AUG_MAX_BUF;
>  
>  		if (aug_size > 0) {
>  			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
> @@ -497,11 +508,10 @@ static inline int augment_arg(struct syscall_enter_args *args, int i,
>  static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
>  {
>  	bool do_output = false;
> -	int zero = 0, written;
> +	int i, zero = 0, written;
>  	u64 output = 0; /* has to be u64, otherwise it won't pass the verifier */
>  	unsigned int nr, *beauty_map;
>  	struct beauty_payload_enter *payload;
> -	void *payload_offset;
>  
>  	/* fall back to do predefined tail call */
>  	if (args == NULL)
> @@ -513,7 +523,6 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
>  
>  	/* set up payload for output */
>  	payload        = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
> -	payload_offset = (void *)&payload->aug_args;
>  
>  	if (beauty_map == NULL || payload == NULL)
>  		return 1;
> @@ -521,14 +530,29 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
>  	/* copy the sys_enter header, which has the syscall_nr */
>  	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
>  
> -	for (int i = 0; i < 6; i++) {
> -		written = augment_arg(args, i, beauty_map, payload_offset);
> -		if (written < 0)
> -			return 1;
> -		if (written > 0) {
> -			output += written;
> -			payload_offset += written;
> -			do_output = true;
> +	if (bpf_ksym_exists(bpf_iter_num_new)) {
> +		bpf_for(i, 0, 6) {
> +			written = augment_arg(args, i, beauty_map, payload, output);
> +			if (written < 0)
> +				return 1;
> +			if (written > 0) {
> +				output += written;
> +				/* guide the verifier to forget range of `output`, which
> +				 * helps to prove convergence of the loop
> +				 */
> +				output += ZERO;
> +				do_output = true;
> +			}
> +		}
> +	} else {
> +		for (i = 0; i < 6; i++) {
> +			written = augment_arg(args, i, beauty_map, payload, output);
> +			if (written < 0)
> +				return 1;
> +			if (written > 0) {
> +				output += written;

Woundn't it also need '+= ZERO' here?

Thanks,
Namhyung


> +				do_output = true;
> +			}
>  		}
>  	}
>  
> -- 
> 2.54.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-07-03 23:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-03 10:32 [PATCH v2 0/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
2026-07-03 10:32 ` [PATCH v2 1/2] perf trace: Factor out BPF loop body Viktor Malik
2026-07-03 23:49   ` Namhyung Kim
2026-07-03 10:32 ` [PATCH v2 2/2] perf trace: Refactor augmented_raw_syscalls using bpf_for Viktor Malik
2026-07-03 23:50   ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox