* [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF
@ 2026-07-01 13:45 Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
Hi,
I investigated the feasibility of optimizing `fetcharg` in probe events
using BPF conversion. The result looks promising. It can reduce about
30% of overhead (and maybe more if we have more than 3 arguments.)
I actually thought there was not such a big difference because I guessed
major overhead source is unsafe pointer dereferencing (e.g.
copy_from_kernel_nofault()). Actually without CONFIG_BPF_JIT, the overhead
is more than double. But with the JIT compiler it showed better performance.
The basic concept is quite simple. The process remains the same up until
the point where user input is converted into `fetcharg` code. It is
possible to convert some of the fundamental `fetcharg` operations into
an equivalent sequence of BPF instructions. This creates a single
`bpf_prog` for each probe event (rather than one per argument).
This program executes within the event handler, reads `pt_regs` directly,
and stores the results in the ftrace ring buffer, just as `fetcharg`
does.
So here are the benchmark results on qemu (KVM) on Intel Core i7-8565U.
When enabling BPF with JIT:
--------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
--------------------------------------------------------------------------------
Baseline 298882359 - - - loops/sec
- - - - overhead
Kprobe 9740841 8664195 7944956 7608274 loops/sec
99.31 ns 12.76 ns 23.21 ns 28.78 ns overhead
Fprobe 10827749 9220918 7992512 7683757 loops/sec
89.01 ns 16.09 ns 32.76 ns 37.79 ns overhead
Eprobe 6746389 6245994 5319037 4845406 loops/sec
144.88 ns 11.88 ns 39.78 ns 58.15 ns overhead
--------------------------------------------------------------------------------
When enabling BPF without JIT:
-----------------------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
-----------------------------------------------------------------------------------------------
Baseline 84067374 - - - loops/sec
- - - - overhead
Kprobe 7092949 5834913 3848776 3443408 loops/sec
129.09 ns 30.40 ns 118.84 ns 149.42 ns overhead
Fprobe 9426302 6441734 4350313 3710814 loops/sec
94.19 ns 49.15 ns 123.78 ns 163.40 ns overhead
Eprobe 5681716 4958113 3940999 3953434 loops/sec
164.11 ns 25.69 ns 77.74 ns 76.94 ns overhead
-----------------------------------------------------------------------------------------------
When disabling BPF (legacy fetcharg)
--------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
--------------------------------------------------------------------------------
Baseline 245433525 - - - loops/sec
- - - - overhead
Kprobe 9055348 8488351 7219595 6453928 loops/sec
106.36 ns 7.38 ns 28.08 ns 44.51 ns overhead
Fprobe 10859326 9288801 7492518 6607046 loops/sec
88.01 ns 15.57 ns 41.38 ns 59.27 ns overhead
Eprobe 6987128 5114526 5055084 4803759 loops/sec
139.05 ns 52.40 ns 54.70 ns 65.05 ns overhead
--------------------------------------------------------------------------------
The number is still unstable (because of the benchmark problem) but the
trend shows the BPF+JIT is the winner.
TODOs:
- Add a new Kconfig which depends on CONFIG_BPF_JIT=y.
- Even if a single dereference operation fails, processing of subsequent
arguments continues.
- Allow mixing with unsupported FETCH_OPs on the same event.
Thank you,
---
base-commit: c0c56fe6fb52cfb28419242cfa6235125f818f94
Masami Hiramatsu (Google) (4):
tools/tracing: Add fetcharg performance micro-benchmark
tracing/probes: Compile all fetchargs into a single BPF program per event
tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
selftests/ftrace: Add a test for eBPF compiled fetchargs
kernel/trace/trace.c | 7 +
kernel/trace/trace.h | 8 +
kernel/trace/trace_probe.c | 249 ++++++++++++++++++++
kernel/trace/trace_probe.h | 15 +
kernel/trace/trace_probe_tmpl.h | 13 +
.../ftrace/test.d/dynevent/test_bpf_fetchargs.tc | 51 ++++
tools/tracing/benchmark/Kbuild | 3
tools/tracing/benchmark/Makefile | 12 +
tools/tracing/benchmark/bench_fetcharg.sh | 195 ++++++++++++++++
tools/tracing/benchmark/fetcharg_bench.c | 98 ++++++++
tools/tracing/benchmark/fetcharg_bench_trace.h | 37 +++
11 files changed, 684 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
create mode 100644 tools/tracing/benchmark/Kbuild
create mode 100644 tools/tracing/benchmark/Makefile
create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 13:53 ` sashiko-bot
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a benchmark test module (fetcharg_bench) and bench_fetcharg.sh script
to measure the execution overhead of fetchargs across kprobe, fprobe,
and eprobe configurations.
The benchmark runs for a baseline (no probe events), 0-arguments,
1-argument, 2-arguments, and 3-arguments configurations, calculating
the estimated overhead in nanoseconds. It also supports a --debug option
to dump registered dynamic events.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
tools/tracing/benchmark/Kbuild | 3
tools/tracing/benchmark/Makefile | 12 +
tools/tracing/benchmark/bench_fetcharg.sh | 195 ++++++++++++++++++++++++
tools/tracing/benchmark/fetcharg_bench.c | 98 ++++++++++++
tools/tracing/benchmark/fetcharg_bench_trace.h | 37 +++++
5 files changed, 345 insertions(+)
create mode 100644 tools/tracing/benchmark/Kbuild
create mode 100644 tools/tracing/benchmark/Makefile
create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h
diff --git a/tools/tracing/benchmark/Kbuild b/tools/tracing/benchmark/Kbuild
new file mode 100644
index 000000000000..4c31b26ca51c
--- /dev/null
+++ b/tools/tracing/benchmark/Kbuild
@@ -0,0 +1,3 @@
+obj-m += fetcharg_bench.o
+
+ccflags-y += -I$(src)
diff --git a/tools/tracing/benchmark/Makefile b/tools/tracing/benchmark/Makefile
new file mode 100644
index 000000000000..bf5b3cbaff03
--- /dev/null
+++ b/tools/tracing/benchmark/Makefile
@@ -0,0 +1,12 @@
+ifeq ($(O),)
+KDIR ?= /lib/modules/$(shell uname -r)/build
+else
+KDIR := $(O)
+endif
+MDIR := $(CURDIR)
+
+all:
+ $(MAKE) -C $(KDIR) M=$(MDIR) modules
+
+clean:
+ $(MAKE) -C $(KDIR) M=$(MDIR) clean
diff --git a/tools/tracing/benchmark/bench_fetcharg.sh b/tools/tracing/benchmark/bench_fetcharg.sh
new file mode 100755
index 000000000000..0b2a2b8a896e
--- /dev/null
+++ b/tools/tracing/benchmark/bench_fetcharg.sh
@@ -0,0 +1,195 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# description: Benchmark fetcharg performance (baseline vs kprobe vs fprobe vs eprobe)
+
+DEBUG=0
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ --debug|-d)
+ DEBUG=1
+ shift
+ ;;
+ *)
+ echo "Unknown option: $1"
+ echo "Usage: $0 [--debug|-d]"
+ exit 1
+ ;;
+ esac
+done
+
+DEBUGFS_MOUNT=$(grep ^debugfs /proc/mounts | awk '{print $2}')
+if [ -z "$DEBUGFS_MOUNT" ]; then
+ mount -t debugfs nodev /sys/kernel/debug
+ DEBUGFS_MOUNT="/sys/kernel/debug"
+fi
+
+TRACEFS_MOUNT=$(grep ^tracefs /proc/mounts | awk '{print $2}')
+if [ -z "$TRACEFS_MOUNT" ]; then
+ mount -t tracefs nodev /sys/kernel/tracing
+ TRACEFS_MOUNT="/sys/kernel/tracing"
+fi
+
+MOD_NAME="fetcharg_bench"
+MOD_FILE="./${MOD_NAME}.ko"
+
+if [ ! -f "$MOD_FILE" ]; then
+ echo "Module $MOD_FILE not found. Please run 'make' first."
+ exit 1
+fi
+
+rmmod $MOD_NAME 2>/dev/null
+insmod $MOD_FILE || { echo "Failed to load $MOD_FILE"; exit 1; }
+
+TRIGGER_FILE="${DEBUGFS_MOUNT}/fetcharg_benchmark/trigger"
+
+if [ ! -f "$TRIGGER_FILE" ]; then
+ echo "Trigger file $TRIGGER_FILE not found."
+ rmmod $MOD_NAME
+ exit 1
+fi
+
+DYN_EVENTS="${TRACEFS_MOUNT}/dynamic_events"
+
+# Helper to clear events
+clear_events() {
+ echo 0 > "${TRACEFS_MOUNT}/events/enable"
+ echo > "$DYN_EVENTS"
+}
+
+run_bench() {
+ if [ "$DEBUG" = "1" ]; then
+ echo "=== [DEBUG] dynamic_events ===" >&2
+ cat "$DYN_EVENTS" >&2
+ echo "==============================" >&2
+ fi
+ cat "$TRIGGER_FILE"
+}
+
+calc_overhead() {
+ local lps=$1
+ local base_lps=$2
+ if [ -z "$lps" ] || [ -z "$base_lps" ] || [ "$lps" = "-" ] || [ "$base_lps" = "-" ]; then
+ echo "-"
+ return
+ fi
+ awk -v lps="$lps" -v base_lps="$base_lps" 'BEGIN {
+ if (lps == 0 || base_lps == 0) {
+ print "-"
+ exit
+ }
+ t = 1000000000.0 / lps
+ t_base = 1000000000.0 / base_lps
+ diff = t - t_base
+ printf "%.2f ns", diff
+ }'
+}
+
+echo "Running Fetcharg Micro Benchmark..."
+echo "Please wait, this may take a few seconds..."
+
+# Baseline
+clear_events
+baseline=$(run_bench)
+
+# Kprobe
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_0=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_1=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_2=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32 c=+0(\$arg3):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_3=$(run_bench)
+
+# Fprobe
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_0=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_1=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_2=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32 c=+0(\$arg3):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_3=$(run_bench)
+
+# Eprobe
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_0=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_1=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a b=+0(+0(\$b_ptr)):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_2=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a b=+0(+0(\$b_ptr)):u32 c=+0(\$c_ptr):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_3=$(run_bench)
+
+echo "--------------------------------------------------------------------------------"
+echo "Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs"
+echo "--------------------------------------------------------------------------------"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" "Baseline" "$baseline" "-" "-" "-"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " "-" "-" "-" "-"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Kprobe" "$kprobe_0" "$kprobe_1" "$kprobe_2" "$kprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $kprobe_0 $baseline)" \
+ "$(calc_overhead $kprobe_1 $kprobe_0)" \
+ "$(calc_overhead $kprobe_2 $kprobe_0)" \
+ "$(calc_overhead $kprobe_3 $kprobe_0)"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Fprobe" "$fprobe_0" "$fprobe_1" "$fprobe_2" "$fprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $fprobe_0 $baseline)" \
+ "$(calc_overhead $fprobe_1 $fprobe_0)" \
+ "$(calc_overhead $fprobe_2 $fprobe_0)" \
+ "$(calc_overhead $fprobe_3 $fprobe_0)"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Eprobe" "$eprobe_0" "$eprobe_1" "$eprobe_2" "$eprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $eprobe_0 $baseline)" \
+ "$(calc_overhead $eprobe_1 $eprobe_0)" \
+ "$(calc_overhead $eprobe_2 $eprobe_0)" \
+ "$(calc_overhead $eprobe_3 $eprobe_0)"
+echo "--------------------------------------------------------------------------------"
+
+clear_events
+rmmod $MOD_NAME
+exit 0
diff --git a/tools/tracing/benchmark/fetcharg_bench.c b/tools/tracing/benchmark/fetcharg_bench.c
new file mode 100644
index 000000000000..af18183c1f5d
--- /dev/null
+++ b/tools/tracing/benchmark/fetcharg_bench.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/ktime.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+
+#define CREATE_TRACE_POINTS
+#include "fetcharg_bench_trace.h"
+
+static noinline int fetcharg_bench_target(int a, int **b, char *c)
+{
+ /* Prevent compiler from optimizing the loop out entirely */
+ asm volatile ("" : : "r"(a), "r"(b), "r"(c) : "memory");
+ trace_fetcharg_bench_event(a, b, c);
+ return a + **b;
+}
+
+/* Indirect pointer to prevent inlining */
+static int (*bench_func_ptr)(int, int **, char *) = fetcharg_bench_target;
+
+#define BENCH_ITERATIONS 1000000
+
+static ssize_t fetcharg_bench_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ char buf[64];
+ int len;
+ u64 start, current_time;
+ u64 elapsed;
+ u64 loops_per_sec;
+ int dummy = 0;
+ int a = 1;
+ int b_val = 2;
+ int *b_ptr = &b_val;
+ int **b = &b_ptr;
+ char c[] = "benchmark";
+ int i;
+
+ if (*ppos > 0)
+ return 0; /* EOF */
+
+ start = ktime_get_ns();
+ for (i = 0; i < BENCH_ITERATIONS; i++)
+ dummy += bench_func_ptr(a, b, c);
+ current_time = ktime_get_ns();
+
+ elapsed = current_time - start;
+ loops_per_sec = ((u64)BENCH_ITERATIONS * NSEC_PER_SEC) / elapsed;
+
+ len = snprintf(buf, sizeof(buf), "%llu\n", loops_per_sec);
+ if (len < 0)
+ return len;
+
+ if (copy_to_user(user_buf, buf, len))
+ return -EFAULT;
+
+ *ppos += len;
+
+ /*
+ * Use 'dummy' to ensure the compiler doesn't optimize out
+ * the call completely, though the asm volatile helps too.
+ */
+ if (dummy == 0xdeadbeef)
+ pr_info("dummy=%d\n", dummy);
+
+ return len;
+}
+
+static const struct file_operations fetcharg_bench_fops = {
+ .read = fetcharg_bench_read,
+ .open = simple_open,
+ .llseek = default_llseek,
+};
+
+static struct dentry *bench_dir;
+
+static int __init fetcharg_bench_init(void)
+{
+ bench_dir = debugfs_create_dir("fetcharg_benchmark", NULL);
+ if (!bench_dir)
+ return -ENOMEM;
+
+ debugfs_create_file("trigger", 0444, bench_dir, NULL, &fetcharg_bench_fops);
+
+ return 0;
+}
+
+static void __exit fetcharg_bench_exit(void)
+{
+ debugfs_remove_recursive(bench_dir);
+}
+
+module_init(fetcharg_bench_init);
+module_exit(fetcharg_bench_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Antigravity");
+MODULE_DESCRIPTION("Fetcharg performance benchmark test module");
diff --git a/tools/tracing/benchmark/fetcharg_bench_trace.h b/tools/tracing/benchmark/fetcharg_bench_trace.h
new file mode 100644
index 000000000000..6560f62337e3
--- /dev/null
+++ b/tools/tracing/benchmark/fetcharg_bench_trace.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fetcharg_bench
+
+#if !defined(_FETCHARG_BENCH_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _FETCHARG_BENCH_TRACE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(fetcharg_bench_event,
+
+ TP_PROTO(int a, int **b, char *c),
+
+ TP_ARGS(a, b, c),
+
+ TP_STRUCT__entry(
+ __field(int, a)
+ __field(int **, b_ptr)
+ __field(char *, c_ptr)
+ ),
+
+ TP_fast_assign(
+ __entry->a = a;
+ __entry->b_ptr = b;
+ __entry->c_ptr = c;
+ ),
+
+ TP_printk("a=%d b=%p c=%p", __entry->a, __entry->b_ptr, __entry->c_ptr)
+);
+
+#endif /* _FETCHARG_BENCH_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE fetcharg_bench_trace
+#include <trace/define_trace.h>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 14:06 ` sashiko-bot
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
3 siblings, 2 replies; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Compile all fetch arguments of a trace probe event into a single BPF
program instead of separate programs per argument to reduce prologue
and dispatching overhead.
BPF-compatible arguments (such as register, immediate, dereferences,
and raw stores) are compiled, including registers mapping for x86_64,
arm64, and s390. If any argument requires non-BPF operations (such as
dynamic strings), we fallback to the interpreter loop for all arguments.
Also, correctly initialize prog->len to prevent invalid opcode execution in
the BPF interpreter.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
kernel/trace/trace_probe.c | 249 ++++++++++++++++++++++++++++++++++++++-
kernel/trace/trace_probe.h | 15 ++
kernel/trace/trace_probe_tmpl.h | 13 ++
3 files changed, 273 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 18c212122344..0deb53c22ae3 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -2003,11 +2003,208 @@ static char *generate_probe_arg_name(const char *arg, int idx)
return name;
}
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/filter.h>
+#include <linux/uaccess.h>
+
+static int regs_get_kernel_argument_offset(unsigned int n)
+{
+#ifdef CONFIG_X86_64
+ static const int argument_offsets[] = {
+ offsetof(struct pt_regs, di),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ };
+ if (n < ARRAY_SIZE(argument_offsets))
+ return argument_offsets[n];
+#elif defined(CONFIG_ARM64)
+ if (n < 8)
+ return offsetof(struct pt_regs, regs[n]);
+#elif defined(CONFIG_S390)
+ if (n < 5)
+ return offsetof(struct pt_regs, gprs[2 + n]);
+#endif
+ return -1;
+}
+
+static bool trace_probe_can_compile_bpf(struct trace_probe *tp)
+{
+ int i;
+
+ if (tp->nr_args == 0)
+ return false;
+
+ for (i = 0; i < tp->nr_args; i++) {
+ struct probe_arg *parg = &tp->args[i];
+ struct fetch_insn *code = parg->code;
+
+ while (code->op != FETCH_OP_END) {
+ switch (code->op) {
+ case FETCH_OP_REG:
+ case FETCH_OP_IMM:
+ case FETCH_OP_DEREF:
+ case FETCH_OP_ST_RAW:
+ case FETCH_OP_ST_MEM:
+ break;
+ case FETCH_OP_ARG:
+ if (regs_get_kernel_argument_offset(code->param) < 0)
+ return false;
+ break;
+ default:
+ return false;
+ }
+ code++;
+ }
+ }
+ return true;
+}
+
+static void trace_probe_compile_bpf(struct trace_probe *tp)
+{
+ struct bpf_insn *insns;
+ int i = 0;
+ struct bpf_prog *prog;
+ int err, idx;
+
+ if (!trace_probe_can_compile_bpf(tp))
+ return;
+
+ insns = kmalloc_array(512, sizeof(struct bpf_insn), GFP_KERNEL);
+ if (!insns)
+ return;
+
+ /* Prologue: R6 = ctx */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
+ /* R7 = ctx->rec */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+ offsetof(struct fetch_bpf_ctx, rec));
+ /* R8 = ctx->data */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_6,
+ offsetof(struct fetch_bpf_ctx, data));
+ /* R9 = total size (0) */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_9, 0);
+
+ for (idx = 0; idx < tp->nr_args; idx++) {
+ struct probe_arg *parg = &tp->args[idx];
+ struct fetch_insn *code = parg->code;
+
+ while (code->op != FETCH_OP_END && i < 500) {
+ switch (code->op) {
+ case FETCH_OP_REG:
+ /* R0 = *(unsigned long *)(R7 + code->param) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, code->param);
+ break;
+ case FETCH_OP_ARG: {
+ int offset = regs_get_kernel_argument_offset(code->param);
+ /* R0 = *(unsigned long *)(R7 + offset) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, offset);
+ break;
+ }
+ case FETCH_OP_IMM:
+ insns[i++] = BPF_LD_IMM64(BPF_REG_0, code->immediate);
+ break;
+ case FETCH_OP_DEREF:
+ /* Add offset: R3 = R0 + code->offset (src) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
+ if (code->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ code->offset);
+ /* R1 = dst (R10 - 8 on stack) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8);
+ /* R3 = size */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_3, sizeof(unsigned long));
+ /* Call copy_from_kernel_nofault(dst, src, size) */
+ insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
+ /* if (R0 < 0) return R0; */
+ insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
+ insns[i++] = BPF_EXIT_INSN();
+ /* R0 = *(unsigned long *)(R10 - 8) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8);
+ break;
+ case FETCH_OP_ST_RAW:
+ /* Store R0 into R8 (data) + parg->offset based on size */
+ switch (code->size) {
+ case 1:
+ insns[i++] = BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 2:
+ insns[i++] = BPF_STX_MEM(BPF_H, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 4:
+ insns[i++] = BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 8:
+ insns[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ }
+ break;
+ case FETCH_OP_ST_MEM:
+ /* Add offset: R2 = R0 + code->offset (src) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
+ if (code->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ code->offset);
+ /* R1 = dst (R8 + parg->offset) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8);
+ if (parg->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+ parg->offset);
+ /* R3 = size */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_3, code->size);
+ /* Call copy_from_kernel_nofault(dst, src, size) */
+ insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
+ /* if (R0 < 0) return R0; */
+ insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
+ insns[i++] = BPF_EXIT_INSN();
+ break;
+ default:
+ goto out;
+ }
+ code++;
+ }
+ }
+
+ if (i >= 500)
+ goto out;
+
+ /* Epilogue: return R9 (0) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_9);
+ insns[i++] = BPF_EXIT_INSN();
+
+ prog = bpf_prog_alloc(bpf_prog_size(i), 0);
+ if (!prog)
+ goto out;
+
+ prog->len = i;
+ memcpy(prog->insnsi, insns, prog->len * sizeof(struct bpf_insn));
+ prog->type = BPF_PROG_TYPE_KPROBE;
+
+ prog = bpf_prog_select_runtime(prog, &err);
+ if (IS_ERR(prog))
+ goto out;
+ tp->prog = prog;
+
+out:
+ kfree(insns);
+}
+#endif
+
+/* Parse an argument */
+/* The caller must pass a null-terminated argument string */
int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
struct traceprobe_parse_context *ctx)
{
struct probe_arg *parg = &tp->args[i];
const char *body;
+ int ret;
ctx->tp = tp;
body = strchr(arg, '=');
@@ -2038,7 +2235,11 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
}
ctx->offset = body - arg;
/* Parse fetch argument */
- return traceprobe_parse_probe_arg_body(body, &tp->size, parg, ctx);
+ ret = traceprobe_parse_probe_arg_body(body, &tp->size, parg, ctx);
+ if (ret)
+ return ret;
+
+ return 0;
}
void traceprobe_free_probe_arg(struct probe_arg *arg)
@@ -2443,6 +2644,13 @@ void trace_probe_cleanup(struct trace_probe *tp)
for (i = 0; i < tp->nr_args; i++)
traceprobe_free_probe_arg(&tp->args[i]);
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ bpf_prog_put(tp->prog);
+ tp->prog = NULL;
+ }
+#endif
+
if (tp->entry_arg) {
kfree(tp->entry_arg);
tp->entry_arg = NULL;
@@ -2531,15 +2739,32 @@ int trace_probe_register_event_call(struct trace_probe *tp)
trace_probe_name(tp)))
return -EEXIST;
+#ifdef CONFIG_BPF_SYSCALL
+ trace_probe_compile_bpf(tp);
+#endif
+
ret = register_trace_event(&call->event);
- if (!ret)
- return -ENODEV;
+ if (!ret) {
+ ret = -ENODEV;
+ goto err_free_bpf;
+ }
ret = trace_add_event_call(call);
- if (ret)
+ if (ret) {
unregister_trace_event(&call->event);
+ goto err_free_bpf;
+ }
return ret;
+
+err_free_bpf:
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ bpf_prog_put(tp->prog);
+ tp->prog = NULL;
+ }
+#endif
+ return ret;
}
int trace_probe_add_file(struct trace_probe *tp, struct trace_event_file *file)
@@ -2768,5 +2993,21 @@ void trace_probe_dump_args(struct seq_file *m, struct trace_probe *tp)
for (i = 0; i < tp->nr_args; i++)
trace_probe_dump_arg(m, &tp->args[i]);
+
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ seq_printf(m, "# [BPF%s]:", tp->prog->jited ? "-JIT" : "");
+ for (i = 0; i < tp->prog->len; i++) {
+ struct bpf_insn *insn = &tp->prog->insnsi[i];
+
+ seq_printf(m, " %02x %02x %04x %08x", insn->code,
+ insn->dst_reg | (insn->src_reg << 4),
+ insn->off, insn->imm);
+ if (i < tp->prog->len - 1)
+ seq_putc(m, ',');
+ }
+ seq_putc(m, '\n');
+ }
+#endif
}
#endif /* CONFIG_PROBE_EVENTS_DUMP_FETCHARG */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index e6268a8dc378..10589414451c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -274,6 +274,9 @@ struct trace_probe {
ssize_t size; /* trace entry size */
unsigned int nr_args;
struct probe_entry_arg *entry_arg; /* This is only for return probe */
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_prog *prog;
+#endif
struct probe_arg args[];
};
@@ -299,6 +302,7 @@ static inline void trace_probe_set_flag(struct trace_probe *tp,
smp_store_release(&tp->event->flags, tp->event->flags | flag);
}
+
static inline void trace_probe_clear_flag(struct trace_probe *tp,
unsigned int flag)
{
@@ -631,3 +635,14 @@ struct uprobe_dispatch_data {
struct trace_uprobe *tu;
unsigned long bp_addr;
};
+
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/filter.h>
+
+struct fetch_bpf_ctx {
+ void *rec;
+ void *edata;
+ void *data;
+ void *base;
+};
+#endif
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 8db12f758fda..6ca2dfe59a0f 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -273,6 +273,19 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec, void *edata,
u32 *dl; /* Data location */
int ret, i;
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ struct fetch_bpf_ctx ctx = {
+ .rec = rec,
+ .edata = edata,
+ .data = data,
+ .base = base,
+ };
+ bpf_prog_run(tp->prog, &ctx);
+ return;
+ }
+#endif
+
for (i = 0; i < tp->nr_args; i++) {
arg = tp->args + i;
dl = data + arg->offset;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 14:02 ` sashiko-bot
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
3 siblings, 1 reply; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a trace option "disable_bpf" to disable BPF execution for fetchargs,
forcing the execution to fallback to the interpreter loop. This is useful
for evaluating BPF compilation performance impact.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
kernel/trace/trace.c | 7 +++++++
kernel/trace/trace.h | 8 ++++++++
kernel/trace/trace_probe_tmpl.h | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9e182d40059..7c0f7b629fcb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9940,6 +9940,13 @@ struct trace_array *trace_get_global_array(void)
}
#endif
+#ifdef CONFIG_BPF_SYSCALL
+bool trace_probe_bpf_disabled(void)
+{
+ return !!(global_trace.trace_flags & TRACE_ITER(DISABLE_BPF));
+}
+#endif
+
void __init early_trace_init(void)
{
if (tracepoint_printk) {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..bf83680e0ba7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1503,6 +1503,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
C(PAUSE_ON_TRACE, "pause-on-trace"), \
C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \
C(BITMASK_LIST, "bitmask-list"), \
+ C(DISABLE_BPF, "disable_bpf"), \
FUNCTION_FLAGS \
FGRAPH_FLAGS \
STACK_FLAGS \
@@ -2505,4 +2506,11 @@ static inline int rv_init_interface(void)
_args; \
})
+#ifdef CONFIG_BPF_SYSCALL
+bool trace_probe_bpf_disabled(void);
+#else
+static inline bool trace_probe_bpf_disabled(void) { return false; }
+#endif
+
#endif /* _LINUX_KERNEL_TRACE_H */
+
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 6ca2dfe59a0f..015208aefbaf 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -274,7 +274,7 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec, void *edata,
int ret, i;
#ifdef CONFIG_BPF_SYSCALL
- if (tp->prog) {
+ if (tp->prog && !trace_probe_bpf_disabled()) {
struct fetch_bpf_ctx ctx = {
.rec = rec,
.edata = edata,
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
` (2 preceding siblings ...)
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
@ 2026-07-01 13:46 ` Masami Hiramatsu (Google)
2026-07-01 13:53 ` sashiko-bot
3 siblings, 1 reply; 16+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:46 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a selftest for trace probe BPF compilation to verify that BPF is
correctly compiled and executes properly for valid configurations,
and falls back to interpreter on unsupported cases.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
.../ftrace/test.d/dynevent/test_bpf_fetchargs.tc | 51 ++++++++++++++++++++
1 file changed, 51 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
new file mode 100644
index 000000000000..6c6f4dd4517a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Dynamic event - test eBPF compiled fetchargs for tprobe and fprobe
+# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]":README
+
+# Check if the sample module is loaded
+if ! lsmod | grep -q trace_events_sample; then
+ modprobe trace-events-sample || exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+clear_trace
+
+# Add a tprobe to sample-trace:foo_bar
+# foo_bar args: const char *foo, int bar, ...
+# So $arg1 is char ptr (points to "hello"), $arg2 is int.
+# Let's test REG (arg2) and DEREF (+0($arg1))
+# "hello" in little-endian 32-bit: 'h'(0x68), 'e'(0x65), 'l'(0x6c), 'l'(0x6c) -> 0x6c6c6568 -> 1819043176
+echo "t:test_tprobe foo_bar mybar=\$arg2 myfoo=+0(\$arg1):u32" >> dynamic_events
+
+# Add an fprobe to __traceiter_foo_bar
+if grep -q "__traceiter_foo_bar" /proc/kallsyms; then
+ # __traceiter_foo_bar args: void *__data, const char *foo, int bar, ...
+ # $arg1 is __data, $arg2 is foo (points to "hello")
+ # "hello" in little-endian 32-bit: 'h'(0x68), 'e'(0x65), 'l'(0x6c), 'l'(0x6c) -> 0x6c6c6568 -> 1819043176
+ echo "f:test_fprobe __traceiter_foo_bar myfoo=\$arg2 myfmt=+0(\$arg2):u32" >> dynamic_events
+fi
+
+echo 1 > events/sample-trace/foo_bar/enable
+echo 1 > events/tracepoints/test_tprobe/enable
+if [ -d events/fprobes/test_fprobe ]; then
+ echo 1 > events/fprobes/test_fprobe/enable
+fi
+
+# Wait for 2 seconds to let the sample thread trigger the events
+sleep 2
+
+echo 0 > events/enable
+
+# Now verify the trace
+grep -q "test_tprobe.*myfoo=1819043176" trace || exit_fail
+
+if [ -d events/fprobes/test_fprobe ]; then
+ grep -q "test_fprobe.*myfmt=1819043176" trace || exit_fail
+fi
+
+echo > dynamic_events
+rmmod trace-events-sample
+
+exit 0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
@ 2026-07-01 13:53 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-07-01 13:53 UTC (permalink / raw)
Cc: bpf
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Medium] Hardcoded little-endian expectation for string memory read as u32 causes selftest failure on big-endian architectures.
- [Medium] Improper cleanup of `trace-events-sample` module causes resource leaks on early test failure and violates test isolation on success.
- [Medium] The test fails to verify the fallback to the interpreter on unsupported cases, despite claims in the commit message.
- [Low] The test configures multiple fetch arguments but only verifies the dereferenced string, ignoring the register argument.
--
commit 4bb415f7039780520dc7b73a0fda0fb23e2616db
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
selftests/ftrace: Add a test for eBPF compiled fetchargs
Add a selftest for trace probe BPF compilation to verify that BPF is
correctly compiled and executes properly for valid configurations,
and falls back to interpreter on unsupported cases.
> Add a selftest for trace probe BPF compilation to verify that BPF is
> correctly compiled and executes properly for valid configurations,
> and falls back to interpreter on unsupported cases.
[Severity: Medium]
Does the test actually verify the fallback to the interpreter on unsupported
cases? The script configures REG and DEREF fetchargs, but there doesn't
appear to be a test case that intentionally uses an unsupported configuration
(such as dynamic strings) to verify the interpreter fallback mechanism.
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
> new file mode 100644
> index 0000000000000..6c6f4dd4517ac
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +# description: Dynamic event - test eBPF compiled fetchargs for tprobe and fprobe
> +# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]":README
> +
> +# Check if the sample module is loaded
> +if ! lsmod | grep -q trace_events_sample; then
> + modprobe trace-events-sample || exit_unsupported
> +fi
[Severity: Medium]
If the script exits early on failure, will this module be left loaded
since there is no trap EXIT registered to clean it up?
[ ... ]
> +
> +# Now verify the trace
> +grep -q "test_tprobe.*myfoo=1819043176" trace || exit_fail
[Severity: Medium]
Will this hardcoded value cause false-positive test failures on big-endian
architectures? The test dereferences the string "hello" as a 32-bit unsigned
integer and unconditionally expects the little-endian integer representation
1819043176 (0x6c6c6568). On big-endian systems, this value would be
1751477356 (0x68656c6c).
[Severity: Low]
Does this grep ignore the register argument configured earlier? The tprobe
was configured to capture two arguments (mybar=$arg2 myfoo=+0($arg1):u32),
but this verification only checks for the presence of the dereferenced
myfoo value.
> +
> +if [ -d events/fprobes/test_fprobe ]; then
> + grep -q "test_fprobe.*myfmt=1819043176" trace || exit_fail
[Severity: Low]
Similarly, does this grep check ignore the myfoo register argument configured
earlier for the fprobe?
> +fi
> +
> +echo > dynamic_events
> +rmmod trace-events-sample
[Severity: Medium]
If the trace-events-sample module was already loaded before the test started,
will this unconditional rmmod incorrectly unload it and alter the preexisting
system state?
> +
> +exit 0
--
Sashiko AI review · https://sashiko.dev/#/patchset/178291352217.1566898.14481561093843379745.stgit@devnote2?part=4
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
@ 2026-07-01 13:53 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-07-01 13:53 UTC (permalink / raw)
Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Divide by zero in benchmark calculation causing a kernel panic
- [Medium] Buffer Overflow / User-Space Out-of-Bounds Write due to ignored `count` in `fetcharg_bench_read`
--
commit e30ac26f087c17e4a2160772405d441b6e416abd
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
tools/tracing: Add fetcharg performance micro-benchmark
This commit adds a benchmark test module and a shell script to measure the
execution overhead of fetchargs across kprobe, fprobe, and eprobe
configurations.
> diff --git a/tools/tracing/benchmark/fetcharg_bench.c b/tools/tracing/benchmark/fetcharg_bench.c
> new file mode 100644
> index 0000000000000..af18183c1f5d5
> --- /dev/null
> +++ b/tools/tracing/benchmark/fetcharg_bench.c
[ ... ]
> +static ssize_t fetcharg_bench_read(struct file *file, char __user *user_buf,
> + size_t count, loff_t *ppos)
> +{
> + char buf[64];
> + int len;
> + u64 start, current_time;
> + u64 elapsed;
> + u64 loops_per_sec;
[ ... ]
> + start = ktime_get_ns();
> + for (i = 0; i < BENCH_ITERATIONS; i++)
> + dummy += bench_func_ptr(a, b, c);
> + current_time = ktime_get_ns();
> +
> + elapsed = current_time - start;
> + loops_per_sec = ((u64)BENCH_ITERATIONS * NSEC_PER_SEC) / elapsed;
[Severity: High]
Could this result in a divide by zero? If the system lacks high-resolution
timers or if the loop completes within the same timer tick, elapsed could
evaluate to 0 and cause a kernel panic.
> +
> + len = snprintf(buf, sizeof(buf), "%llu\n", loops_per_sec);
> + if (len < 0)
> + return len;
> +
> + if (copy_to_user(user_buf, buf, len))
[Severity: Medium]
Does this code ignore the user-provided count parameter?
If a user program calls read() on the debugfs file with a count parameter
smaller than the formatted string length len, it appears we could write beyond
the bounds of the user-provided buffer by copying len bytes instead of
honoring the count.
--
Sashiko AI review · https://sashiko.dev/#/patchset/178291352217.1566898.14481561093843379745.stgit@devnote2?part=1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
@ 2026-07-01 14:02 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-07-01 14:02 UTC (permalink / raw)
Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] The `disable_bpf` trace option is improperly exposed to instance directories and non-BPF builds, creating a dead configuration knob that silently ignores user input.
Pre-existing issues:
- [High] The `IS_ERR()` check on the return value of `bpf_prog_select_runtime()` is broken, leading to a silent failure and an uninitialized memory leak to user-space.
- [High] An early `BPF_EXIT_INSN()` generated for memory fetch failures aborts the entire probe fetch program, leaking uninitialized memory for all subsequent arguments.
--
commit 87050ba78ea1f240fee455c53d7eb225ed51e533
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
Add a trace option "disable_bpf" to disable BPF execution for fetchargs,
forcing the execution to fallback to the interpreter loop. This is useful
for evaluating BPF compilation performance impact.
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 80fe152af1dd8..bf83680e0ba70 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1503,6 +1503,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
> C(PAUSE_ON_TRACE, "pause-on-trace"), \
> C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \
> C(BITMASK_LIST, "bitmask-list"), \
> + C(DISABLE_BPF, "disable_bpf"), \
[Severity: Medium]
Does this option need an #ifdef CONFIG_BPF_SYSCALL wrapper? It appears this
unconditionally adds the flag, making it visible even on kernels built without
BPF support.
Additionally, since trace_probe_bpf_disabled() strictly checks
global_trace.trace_flags, toggling this option inside a local trace instance
directory won't have any effect. Should this be added to
TOP_LEVEL_TRACE_FLAGS in kernel/trace/trace.c to restrict it to the global
instance?
> FUNCTION_FLAGS \
> FGRAPH_FLAGS \
> STACK_FLAGS \
[ ... ]
This isn't a bug introduced by this patch, but I noticed a couple of problems
in the previous patch of this series in kernel/trace/trace_probe.c.
[Severity: High]
This is a pre-existing issue, but in trace_probe_compile_bpf(),
bpf_prog_select_runtime() returns errors via its err pointer argument and
always returns a valid program pointer (or the original one), not an ERR_PTR:
prog = bpf_prog_select_runtime(prog, &err);
if (IS_ERR(prog))
goto out;
Because IS_ERR(prog) is always false, won't this silently attach a dummy
program (__bpf_prog_ret0_warn) to the trace probe on JIT failure? This dummy
program immediately returns 0 without fetching arguments. Since
trace_event_buffer_reserve() doesn't zero-initialize the raw ring buffer space,
does this result in leaking uninitialized memory to user-space?
[Severity: High]
This isn't a bug introduced by this patch, but also in
trace_probe_compile_bpf(), when compiling memory fetch instructions:
insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
insns[i++] = BPF_EXIT_INSN();
If copy_from_kernel_nofault() returns a negative error (e.g., from an invalid
pointer), doesn't this early exit terminate the entire BPF fetch program? Since
it stops processing any subsequent arguments, and trace_event_buffer_reserve()
doesn't zero-initialize the buffer, could this leak stale kernel ring buffer
memory to user space for all subsequent un-fetched arguments?
--
Sashiko AI review · https://sashiko.dev/#/patchset/178291352217.1566898.14481561093843379745.stgit@devnote2?part=3
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
@ 2026-07-01 14:06 ` sashiko-bot
2026-07-01 18:41 ` Alexei Starovoitov
1 sibling, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-07-01 14:06 UTC (permalink / raw)
Cc: bpf
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Early termination of the BPF fetch program upon a single memory fault skips the processing of all subsequent probe arguments, exposing uninitialized ring-buffer memory to userspace.
- [Critical] The generated BPF program bypasses the verifier and leaves `stack_depth` as 0, causing JIT to omit stack allocation and leading to stack corruption during memory fetches.
- [Critical] Direct invocation of `copy_from_kernel_nofault` using `BPF_EMIT_CALL` violates kCFI and architecture calling conventions.
- [High] Failures from `bpf_prog_select_runtime` are silently ignored because the code checks `IS_ERR(prog)` instead of the `err` out-parameter.
- [High] Missing `refcnt` initialization of the manually allocated BPF program leads to a permanent memory leak.
--
commit fb399d3b83423ab453a35bd3c084a85256eaf7df
Author: Masami Hiramatsu (Google) <mhiramat@kernel.org>
tracing/probes: Compile all fetchargs into a single BPF program per event
This commit compiles trace probe arguments into a single BPF program to
reduce prologue and dispatching overhead. It supports basic BPF-compatible
arguments and falls back to the interpreter loop if non-BPF operations
are required.
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 18c2121223440..0deb53c22ae39 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
[ ... ]
> +static void trace_probe_compile_bpf(struct trace_probe *tp)
> +{
[ ... ]
> + case FETCH_OP_DEREF:
> + /* Add offset: R3 = R0 + code->offset (src) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
> + if (code->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
> + code->offset);
> + /* R1 = dst (R10 - 8 on stack) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8);
[Severity: Critical]
Does this stack access corrupt the caller's stack frame?
Since this manually constructed BPF program bypasses the verifier,
prog->aux->stack_depth remains 0. Because the stack depth is 0, the JIT
compiler will not allocate a stack frame for the program. When the program
writes to R10 - 8, it might overwrite the caller's stack frame or
callee-saved registers, potentially leading to kernel panics.
> + /* R3 = size */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_3, sizeof(unsigned long));
> + /* Call copy_from_kernel_nofault(dst, src, size) */
> + insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
[Severity: Critical]
Can this direct call to copy_from_kernel_nofault trigger a kCFI panic?
copy_from_kernel_nofault is a standard C function with a 3-argument
signature, but BPF_EMIT_CALL executes a BPF helper call which expects
to pass 5 arguments in 64-bit BPF registers. Under Control Flow Integrity,
invoking this through the interpreter's 5-argument function pointer might
cause a signature mismatch and panic the kernel. It could also break the
standard calling convention on 32-bit architectures.
> + /* if (R0 < 0) return R0; */
> + insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
> + insns[i++] = BPF_EXIT_INSN();
[Severity: Critical]
If a memory fault occurs here and the program exits early, does it leave the
remaining trace event ring buffer fields uninitialized?
Since trace_event_buffer_reserve() leaves the ring buffer uninitialized,
terminating the BPF program immediately on a single fault without zeroing
the current field or processing subsequent arguments might expose stale
kernel memory from the stack or heap to userspace when the buffer is
subsequently committed.
> + /* R0 = *(unsigned long *)(R10 - 8) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8);
> + break;
[ ... ]
> + /* Epilogue: return R9 (0) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_9);
> + insns[i++] = BPF_EXIT_INSN();
> +
> + prog = bpf_prog_alloc(bpf_prog_size(i), 0);
> + if (!prog)
> + goto out;
[Severity: High]
Is a prog->aux->refcnt initialization needed here?
bpf_prog_alloc() zero-initializes the bpf_prog_aux structure, leaving
refcnt as 0. When bpf_prog_put() is later called during cleanup,
atomic64_dec_and_test(0) will evaluate to false, which causes the
program memory to never be freed.
> +
> + prog->len = i;
> + memcpy(prog->insnsi, insns, prog->len * sizeof(struct bpf_insn));
> + prog->type = BPF_PROG_TYPE_KPROBE;
> +
> + prog = bpf_prog_select_runtime(prog, &err);
> + if (IS_ERR(prog))
> + goto out;
> + tp->prog = prog;
[Severity: High]
Does this error check properly handle failures from bpf_prog_select_runtime?
bpf_prog_select_runtime() does not return ERR_PTR on failure; instead, it
returns the program pointer and sets the out-parameter err to a negative
error code. Checking IS_ERR(prog) evaluates to false, which might cause
the error to be ignored. If JIT compilation fails and the interpreter is
unavailable, this leaves prog->bpf_func as NULL, leading to a NULL pointer
dereference later.
--
Sashiko AI review · https://sashiko.dev/#/patchset/178291352217.1566898.14481561093843379745.stgit@devnote2?part=2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
2026-07-01 14:06 ` sashiko-bot
@ 2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
2026-07-01 22:40 ` Masami Hiramatsu
1 sibling, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2026-07-01 18:41 UTC (permalink / raw)
To: Masami Hiramatsu (Google), Steven Rostedt, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
On Wed Jul 1, 2026 at 6:45 AM PDT, Masami Hiramatsu (Google) wrote:
> +static bool trace_probe_can_compile_bpf(struct trace_probe *tp)
> +{
> + int i;
> +
> + if (tp->nr_args == 0)
> + return false;
> +
> + for (i = 0; i < tp->nr_args; i++) {
> + struct probe_arg *parg = &tp->args[i];
> + struct fetch_insn *code = parg->code;
> +
> + while (code->op != FETCH_OP_END) {
> + switch (code->op) {
> + case FETCH_OP_REG:
> + case FETCH_OP_IMM:
> + case FETCH_OP_DEREF:
> + case FETCH_OP_ST_RAW:
> + case FETCH_OP_ST_MEM:
> + break;
> + case FETCH_OP_ARG:
> + if (regs_get_kernel_argument_offset(code->param) < 0)
> + return false;
> + break;
> + default:
> + return false;
> + }
> + code++;
> + }
> + }
> + return true;
> +}
> +
> +static void trace_probe_compile_bpf(struct trace_probe *tp)
> +{
> + struct bpf_insn *insns;
> + int i = 0;
> + struct bpf_prog *prog;
> + int err, idx;
> +
> + if (!trace_probe_can_compile_bpf(tp))
> + return;
> +
> + insns = kmalloc_array(512, sizeof(struct bpf_insn), GFP_KERNEL);
> + if (!insns)
> + return;
> +
> + /* Prologue: R6 = ctx */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
> + /* R7 = ctx->rec */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
> + offsetof(struct fetch_bpf_ctx, rec));
> + /* R8 = ctx->data */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_6,
> + offsetof(struct fetch_bpf_ctx, data));
> + /* R9 = total size (0) */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_9, 0);
> +
> + for (idx = 0; idx < tp->nr_args; idx++) {
> + struct probe_arg *parg = &tp->args[idx];
> + struct fetch_insn *code = parg->code;
> +
> + while (code->op != FETCH_OP_END && i < 500) {
> + switch (code->op) {
> + case FETCH_OP_REG:
> + /* R0 = *(unsigned long *)(R7 + code->param) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, code->param);
> + break;
> + case FETCH_OP_ARG: {
> + int offset = regs_get_kernel_argument_offset(code->param);
> + /* R0 = *(unsigned long *)(R7 + offset) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, offset);
> + break;
> + }
> + case FETCH_OP_IMM:
> + insns[i++] = BPF_LD_IMM64(BPF_REG_0, code->immediate);
> + break;
> + case FETCH_OP_DEREF:
> + /* Add offset: R3 = R0 + code->offset (src) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
> + if (code->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
> + code->offset);
> + /* R1 = dst (R10 - 8 on stack) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8);
> + /* R3 = size */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_3, sizeof(unsigned long));
> + /* Call copy_from_kernel_nofault(dst, src, size) */
> + insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
> + /* if (R0 < 0) return R0; */
> + insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
> + insns[i++] = BPF_EXIT_INSN();
> + /* R0 = *(unsigned long *)(R10 - 8) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8);
> + break;
> + case FETCH_OP_ST_RAW:
> + /* Store R0 into R8 (data) + parg->offset based on size */
> + switch (code->size) {
> + case 1:
> + insns[i++] = BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 2:
> + insns[i++] = BPF_STX_MEM(BPF_H, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 4:
> + insns[i++] = BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 8:
> + insns[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + }
> + break;
> + case FETCH_OP_ST_MEM:
> + /* Add offset: R2 = R0 + code->offset (src) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
> + if (code->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
> + code->offset);
> + /* R1 = dst (R8 + parg->offset) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8);
> + if (parg->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
> + parg->offset);
> + /* R3 = size */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_3, code->size);
> + /* Call copy_from_kernel_nofault(dst, src, size) */
> + insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
> + /* if (R0 < 0) return R0; */
> + insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
> + insns[i++] = BPF_EXIT_INSN();
> + break;
> + default:
> + goto out;
> + }
> + code++;
> + }
> + }
Nack.
I really don't like it.
There were days in the past when the kernel generating bpf directly was appealing.
These days are gone. Performance improvements for fetchargs is not a good reason
to add all this complexity and bypass verifier checks.
bpf insns should come from user space.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:41 ` Alexei Starovoitov
@ 2026-07-01 18:47 ` Steven Rostedt
2026-07-01 18:53 ` Alexei Starovoitov
2026-07-01 22:40 ` Masami Hiramatsu
1 sibling, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2026-07-01 18:47 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masami Hiramatsu (Google), Shuah Khan, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, linux-kselftest, bpf
On Wed, 01 Jul 2026 11:41:26 -0700
"Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
> Nack.
> I really don't like it.
So the nack is mostly your opinion and not technical?
> There were days in the past when the kernel generating bpf directly was appealing.
> These days are gone. Performance improvements for fetchargs is not a good reason
Why is performance *not* a good reason?
> to add all this complexity and bypass verifier checks.
The code lives in the kernel. What reason is there to add verification
checks? The point of verification is because we don't trust user space. Why
do we not trust the kernel?
> bpf insns should come from user space.
Why?
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:47 ` Steven Rostedt
@ 2026-07-01 18:53 ` Alexei Starovoitov
0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2026-07-01 18:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu (Google), Shuah Khan, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, linux-kselftest, bpf
On Wed Jul 1, 2026 at 11:47 AM PDT, Steven Rostedt wrote:
> On Wed, 01 Jul 2026 11:41:26 -0700
> "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
>> Nack.
>> I really don't like it.
>
> So the nack is mostly your opinion and not technical?
>
>> There were days in the past when the kernel generating bpf directly was appealing.
>> These days are gone. Performance improvements for fetchargs is not a good reason
>
> Why is performance *not* a good reason?
>
>> to add all this complexity and bypass verifier checks.
>
> The code lives in the kernel. What reason is there to add verification
> checks? The point of verification is because we don't trust user space. Why
> do we not trust the kernel?
>
>> bpf insns should come from user space.
>
> Why?
because I see this patchset as pointless kernel bloat with the "help" of bpf.
Hence the nack to avoid dragging bpf into this.
Call it philosophical disagreement if you like.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
@ 2026-07-01 22:40 ` Masami Hiramatsu
2026-07-02 0:01 ` Alexei Starovoitov
1 sibling, 1 reply; 16+ messages in thread
From: Masami Hiramatsu @ 2026-07-01 22:40 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, linux-kselftest, bpf
On Wed, 01 Jul 2026 11:41:26 -0700
"Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
> Nack.
> I really don't like it.
> There were days in the past when the kernel generating bpf directly was appealing.
> These days are gone. Performance improvements for fetchargs is not a good reason
> to add all this complexity and bypass verifier checks.
> bpf insns should come from user space.
Thanks for your comment!
OK, I don't mind because this is a kind of investigation project. And some
people had asked me about the same idea, now I can tell them the result.
I'm satisfied with the current outcome, as this development process gave me
insight into the implementation of BPF and demonstrated the potential for
optimization via JIT. :)
And also, as noted in the cover letter, the current performance of fetcharg
is better than I thought, and is good enough for debugging. :)
BTW, I'm also interested in calling the verifier on this generated code.
Even it it is not merged, I think showing the correct way to implement it
will be useful in the future.
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 22:40 ` Masami Hiramatsu
@ 2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2026-07-02 0:01 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, Jul 1, 2026 at 3:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Wed, 01 Jul 2026 11:41:26 -0700
> "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
> >
> > Nack.
> > I really don't like it.
> > There were days in the past when the kernel generating bpf directly was appealing.
> > These days are gone. Performance improvements for fetchargs is not a good reason
> > to add all this complexity and bypass verifier checks.
> > bpf insns should come from user space.
>
> Thanks for your comment!
> OK, I don't mind because this is a kind of investigation project. And some
> people had asked me about the same idea, now I can tell them the result.
>
> I'm satisfied with the current outcome, as this development process gave me
> insight into the implementation of BPF and demonstrated the potential for
> optimization via JIT. :)
>
> And also, as noted in the cover letter, the current performance of fetcharg
> is better than I thought, and is good enough for debugging. :)
>
> BTW, I'm also interested in calling the verifier on this generated code.
> Even it it is not merged, I think showing the correct way to implement it
> will be useful in the future.
The whole feature you're trying to do is imo reinvention of the wheel.
bpf could do that kind of filtering years ago.
I don't buy the excuse that embedded environments without any
kind of user space needs this facility.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-02 0:01 ` Alexei Starovoitov
@ 2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
1 sibling, 0 replies; 16+ messages in thread
From: Masami Hiramatsu @ 2026-07-02 1:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, 1 Jul 2026 17:01:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Wed, Jul 1, 2026 at 3:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Wed, 01 Jul 2026 11:41:26 -0700
> > "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
> >
> > >
> > > Nack.
> > > I really don't like it.
> > > There were days in the past when the kernel generating bpf directly was appealing.
> > > These days are gone. Performance improvements for fetchargs is not a good reason
> > > to add all this complexity and bypass verifier checks.
> > > bpf insns should come from user space.
> >
> > Thanks for your comment!
> > OK, I don't mind because this is a kind of investigation project. And some
> > people had asked me about the same idea, now I can tell them the result.
> >
> > I'm satisfied with the current outcome, as this development process gave me
> > insight into the implementation of BPF and demonstrated the potential for
> > optimization via JIT. :)
> >
> > And also, as noted in the cover letter, the current performance of fetcharg
> > is better than I thought, and is good enough for debugging. :)
> >
> > BTW, I'm also interested in calling the verifier on this generated code.
> > Even it it is not merged, I think showing the correct way to implement it
> > will be useful in the future.
>
> The whole feature you're trying to do is imo reinvention of the wheel.
> bpf could do that kind of filtering years ago.
> I don't buy the excuse that embedded environments without any
> kind of user space needs this facility.
>
Yeah, I understood it. I have no intention to merge this series.
This is totally just for fun and my interest about BPF.
But thanks for your comments. It is good to hear your thought in the RFC
stage. I can stop this series here with peace of mind. :)
Thank you!
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
@ 2026-07-02 14:04 ` Steven Rostedt
1 sibling, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2026-07-02 14:04 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masami Hiramatsu, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, 1 Jul 2026 17:01:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> The whole feature you're trying to do is imo reinvention of the wheel.
> bpf could do that kind of filtering years ago.
> I don't buy the excuse that embedded environments without any
> kind of user space needs this facility.
And what embedded environments do you work with?
Feel free to go to Embedded Recipes and talk with developers there.
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-07-02 14:04 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
2026-07-01 13:53 ` sashiko-bot
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
2026-07-01 14:06 ` sashiko-bot
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
2026-07-01 18:53 ` Alexei Starovoitov
2026-07-01 22:40 ` Masami Hiramatsu
2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
2026-07-01 14:02 ` sashiko-bot
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
2026-07-01 13:53 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox