* [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF
@ 2026-07-01 13:45 Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
Hi,
I investigated the feasibility of optimizing `fetcharg` in probe events
using BPF conversion. The result looks promising. It can reduce about
30% of overhead (and maybe more if we have more than 3 arguments.)
I actually thought there was not such a big difference because I guessed
major overhead source is unsafe pointer dereferencing (e.g.
copy_from_kernel_nofault()). Actually without CONFIG_BPF_JIT, the overhead
is more than double. But with the JIT compiler it showed better performance.
The basic concept is quite simple. The process remains the same up until
the point where user input is converted into `fetcharg` code. It is
possible to convert some of the fundamental `fetcharg` operations into
an equivalent sequence of BPF instructions. This creates a single
`bpf_prog` for each probe event (rather than one per argument).
This program executes within the event handler, reads `pt_regs` directly,
and stores the results in the ftrace ring buffer, just as `fetcharg`
does.
So here are the benchmark results on qemu (KVM) on Intel Core i7-8565U.
When enabling BPF with JIT:
--------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
--------------------------------------------------------------------------------
Baseline 298882359 - - - loops/sec
- - - - overhead
Kprobe 9740841 8664195 7944956 7608274 loops/sec
99.31 ns 12.76 ns 23.21 ns 28.78 ns overhead
Fprobe 10827749 9220918 7992512 7683757 loops/sec
89.01 ns 16.09 ns 32.76 ns 37.79 ns overhead
Eprobe 6746389 6245994 5319037 4845406 loops/sec
144.88 ns 11.88 ns 39.78 ns 58.15 ns overhead
--------------------------------------------------------------------------------
When enabling BPF without JIT:
-----------------------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
-----------------------------------------------------------------------------------------------
Baseline 84067374 - - - loops/sec
- - - - overhead
Kprobe 7092949 5834913 3848776 3443408 loops/sec
129.09 ns 30.40 ns 118.84 ns 149.42 ns overhead
Fprobe 9426302 6441734 4350313 3710814 loops/sec
94.19 ns 49.15 ns 123.78 ns 163.40 ns overhead
Eprobe 5681716 4958113 3940999 3953434 loops/sec
164.11 ns 25.69 ns 77.74 ns 76.94 ns overhead
-----------------------------------------------------------------------------------------------
When disabling BPF (legacy fetcharg)
--------------------------------------------------------------------------------
Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs
--------------------------------------------------------------------------------
Baseline 245433525 - - - loops/sec
- - - - overhead
Kprobe 9055348 8488351 7219595 6453928 loops/sec
106.36 ns 7.38 ns 28.08 ns 44.51 ns overhead
Fprobe 10859326 9288801 7492518 6607046 loops/sec
88.01 ns 15.57 ns 41.38 ns 59.27 ns overhead
Eprobe 6987128 5114526 5055084 4803759 loops/sec
139.05 ns 52.40 ns 54.70 ns 65.05 ns overhead
--------------------------------------------------------------------------------
The number is still unstable (because of the benchmark problem) but the
trend shows the BPF+JIT is the winner.
TODOs:
- Add a new Kconfig which depends on CONFIG_BPF_JIT=y.
- Even if a single dereference operation fails, processing of subsequent
arguments continues.
- Allow mixing with unsupported FETCH_OPs on the same event.
Thank you,
---
base-commit: c0c56fe6fb52cfb28419242cfa6235125f818f94
Masami Hiramatsu (Google) (4):
tools/tracing: Add fetcharg performance micro-benchmark
tracing/probes: Compile all fetchargs into a single BPF program per event
tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
selftests/ftrace: Add a test for eBPF compiled fetchargs
kernel/trace/trace.c | 7 +
kernel/trace/trace.h | 8 +
kernel/trace/trace_probe.c | 249 ++++++++++++++++++++
kernel/trace/trace_probe.h | 15 +
kernel/trace/trace_probe_tmpl.h | 13 +
.../ftrace/test.d/dynevent/test_bpf_fetchargs.tc | 51 ++++
tools/tracing/benchmark/Kbuild | 3
tools/tracing/benchmark/Makefile | 12 +
tools/tracing/benchmark/bench_fetcharg.sh | 195 ++++++++++++++++
tools/tracing/benchmark/fetcharg_bench.c | 98 ++++++++
tools/tracing/benchmark/fetcharg_bench_trace.h | 37 +++
11 files changed, 684 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
create mode 100644 tools/tracing/benchmark/Kbuild
create mode 100644 tools/tracing/benchmark/Makefile
create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a benchmark test module (fetcharg_bench) and bench_fetcharg.sh script
to measure the execution overhead of fetchargs across kprobe, fprobe,
and eprobe configurations.
The benchmark runs for a baseline (no probe events), 0-arguments,
1-argument, 2-arguments, and 3-arguments configurations, calculating
the estimated overhead in nanoseconds. It also supports a --debug option
to dump registered dynamic events.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
tools/tracing/benchmark/Kbuild | 3
tools/tracing/benchmark/Makefile | 12 +
tools/tracing/benchmark/bench_fetcharg.sh | 195 ++++++++++++++++++++++++
tools/tracing/benchmark/fetcharg_bench.c | 98 ++++++++++++
tools/tracing/benchmark/fetcharg_bench_trace.h | 37 +++++
5 files changed, 345 insertions(+)
create mode 100644 tools/tracing/benchmark/Kbuild
create mode 100644 tools/tracing/benchmark/Makefile
create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h
diff --git a/tools/tracing/benchmark/Kbuild b/tools/tracing/benchmark/Kbuild
new file mode 100644
index 000000000000..4c31b26ca51c
--- /dev/null
+++ b/tools/tracing/benchmark/Kbuild
@@ -0,0 +1,3 @@
+obj-m += fetcharg_bench.o
+
+ccflags-y += -I$(src)
diff --git a/tools/tracing/benchmark/Makefile b/tools/tracing/benchmark/Makefile
new file mode 100644
index 000000000000..bf5b3cbaff03
--- /dev/null
+++ b/tools/tracing/benchmark/Makefile
@@ -0,0 +1,12 @@
+ifeq ($(O),)
+KDIR ?= /lib/modules/$(shell uname -r)/build
+else
+KDIR := $(O)
+endif
+MDIR := $(CURDIR)
+
+all:
+ $(MAKE) -C $(KDIR) M=$(MDIR) modules
+
+clean:
+ $(MAKE) -C $(KDIR) M=$(MDIR) clean
diff --git a/tools/tracing/benchmark/bench_fetcharg.sh b/tools/tracing/benchmark/bench_fetcharg.sh
new file mode 100755
index 000000000000..0b2a2b8a896e
--- /dev/null
+++ b/tools/tracing/benchmark/bench_fetcharg.sh
@@ -0,0 +1,195 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# description: Benchmark fetcharg performance (baseline vs kprobe vs fprobe vs eprobe)
+
+DEBUG=0
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ --debug|-d)
+ DEBUG=1
+ shift
+ ;;
+ *)
+ echo "Unknown option: $1"
+ echo "Usage: $0 [--debug|-d]"
+ exit 1
+ ;;
+ esac
+done
+
+DEBUGFS_MOUNT=$(grep ^debugfs /proc/mounts | awk '{print $2}')
+if [ -z "$DEBUGFS_MOUNT" ]; then
+ mount -t debugfs nodev /sys/kernel/debug
+ DEBUGFS_MOUNT="/sys/kernel/debug"
+fi
+
+TRACEFS_MOUNT=$(grep ^tracefs /proc/mounts | awk '{print $2}')
+if [ -z "$TRACEFS_MOUNT" ]; then
+ mount -t tracefs nodev /sys/kernel/tracing
+ TRACEFS_MOUNT="/sys/kernel/tracing"
+fi
+
+MOD_NAME="fetcharg_bench"
+MOD_FILE="./${MOD_NAME}.ko"
+
+if [ ! -f "$MOD_FILE" ]; then
+ echo "Module $MOD_FILE not found. Please run 'make' first."
+ exit 1
+fi
+
+rmmod $MOD_NAME 2>/dev/null
+insmod $MOD_FILE || { echo "Failed to load $MOD_FILE"; exit 1; }
+
+TRIGGER_FILE="${DEBUGFS_MOUNT}/fetcharg_benchmark/trigger"
+
+if [ ! -f "$TRIGGER_FILE" ]; then
+ echo "Trigger file $TRIGGER_FILE not found."
+ rmmod $MOD_NAME
+ exit 1
+fi
+
+DYN_EVENTS="${TRACEFS_MOUNT}/dynamic_events"
+
+# Helper to clear events
+clear_events() {
+ echo 0 > "${TRACEFS_MOUNT}/events/enable"
+ echo > "$DYN_EVENTS"
+}
+
+run_bench() {
+ if [ "$DEBUG" = "1" ]; then
+ echo "=== [DEBUG] dynamic_events ===" >&2
+ cat "$DYN_EVENTS" >&2
+ echo "==============================" >&2
+ fi
+ cat "$TRIGGER_FILE"
+}
+
+calc_overhead() {
+ local lps=$1
+ local base_lps=$2
+ if [ -z "$lps" ] || [ -z "$base_lps" ] || [ "$lps" = "-" ] || [ "$base_lps" = "-" ]; then
+ echo "-"
+ return
+ fi
+ awk -v lps="$lps" -v base_lps="$base_lps" 'BEGIN {
+ if (lps == 0 || base_lps == 0) {
+ print "-"
+ exit
+ }
+ t = 1000000000.0 / lps
+ t_base = 1000000000.0 / base_lps
+ diff = t - t_base
+ printf "%.2f ns", diff
+ }'
+}
+
+echo "Running Fetcharg Micro Benchmark..."
+echo "Please wait, this may take a few seconds..."
+
+# Baseline
+clear_events
+baseline=$(run_bench)
+
+# Kprobe
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_0=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_1=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_2=$(run_bench)
+
+clear_events
+echo "p:bench_kprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32 c=+0(\$arg3):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/kprobes/bench_kprobe/enable"
+kprobe_3=$(run_bench)
+
+# Fprobe
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_0=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_1=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_2=$(run_bench)
+
+clear_events
+echo "f:bench_fprobe fetcharg_bench_target a=\$arg1 b=+0(+0(\$arg2)):u32 c=+0(\$arg3):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/fprobes/bench_fprobe/enable"
+fprobe_3=$(run_bench)
+
+# Eprobe
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_0=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a" >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_1=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a b=+0(+0(\$b_ptr)):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_2=$(run_bench)
+
+clear_events
+echo "e:bench_eprobe fetcharg_bench/fetcharg_bench_event a=\$a b=+0(+0(\$b_ptr)):u32 c=+0(\$c_ptr):u32" \
+ >> "$DYN_EVENTS"
+echo 1 > "${TRACEFS_MOUNT}/events/eprobes/bench_eprobe/enable"
+echo 1 > "${TRACEFS_MOUNT}/events/fetcharg_bench/fetcharg_bench_event/enable"
+eprobe_3=$(run_bench)
+
+echo "--------------------------------------------------------------------------------"
+echo "Configuration 0 Fetchargs 1 Fetcharg 2 Fetchargs 3 Fetchargs"
+echo "--------------------------------------------------------------------------------"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" "Baseline" "$baseline" "-" "-" "-"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " "-" "-" "-" "-"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Kprobe" "$kprobe_0" "$kprobe_1" "$kprobe_2" "$kprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $kprobe_0 $baseline)" \
+ "$(calc_overhead $kprobe_1 $kprobe_0)" \
+ "$(calc_overhead $kprobe_2 $kprobe_0)" \
+ "$(calc_overhead $kprobe_3 $kprobe_0)"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Fprobe" "$fprobe_0" "$fprobe_1" "$fprobe_2" "$fprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $fprobe_0 $baseline)" \
+ "$(calc_overhead $fprobe_1 $fprobe_0)" \
+ "$(calc_overhead $fprobe_2 $fprobe_0)" \
+ "$(calc_overhead $fprobe_3 $fprobe_0)"
+printf "%-18s %15s %15s %18s %18s loops/sec\n" \
+ "Eprobe" "$eprobe_0" "$eprobe_1" "$eprobe_2" "$eprobe_3"
+printf "%-18s %15s %15s %18s %18s overhead\n" " " \
+ "$(calc_overhead $eprobe_0 $baseline)" \
+ "$(calc_overhead $eprobe_1 $eprobe_0)" \
+ "$(calc_overhead $eprobe_2 $eprobe_0)" \
+ "$(calc_overhead $eprobe_3 $eprobe_0)"
+echo "--------------------------------------------------------------------------------"
+
+clear_events
+rmmod $MOD_NAME
+exit 0
diff --git a/tools/tracing/benchmark/fetcharg_bench.c b/tools/tracing/benchmark/fetcharg_bench.c
new file mode 100644
index 000000000000..af18183c1f5d
--- /dev/null
+++ b/tools/tracing/benchmark/fetcharg_bench.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/ktime.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+
+#define CREATE_TRACE_POINTS
+#include "fetcharg_bench_trace.h"
+
+static noinline int fetcharg_bench_target(int a, int **b, char *c)
+{
+ /* Prevent compiler from optimizing the loop out entirely */
+ asm volatile ("" : : "r"(a), "r"(b), "r"(c) : "memory");
+ trace_fetcharg_bench_event(a, b, c);
+ return a + **b;
+}
+
+/* Indirect pointer to prevent inlining */
+static int (*bench_func_ptr)(int, int **, char *) = fetcharg_bench_target;
+
+#define BENCH_ITERATIONS 1000000
+
+static ssize_t fetcharg_bench_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ char buf[64];
+ int len;
+ u64 start, current_time;
+ u64 elapsed;
+ u64 loops_per_sec;
+ int dummy = 0;
+ int a = 1;
+ int b_val = 2;
+ int *b_ptr = &b_val;
+ int **b = &b_ptr;
+ char c[] = "benchmark";
+ int i;
+
+ if (*ppos > 0)
+ return 0; /* EOF */
+
+ start = ktime_get_ns();
+ for (i = 0; i < BENCH_ITERATIONS; i++)
+ dummy += bench_func_ptr(a, b, c);
+ current_time = ktime_get_ns();
+
+ elapsed = current_time - start;
+ loops_per_sec = ((u64)BENCH_ITERATIONS * NSEC_PER_SEC) / elapsed;
+
+ len = snprintf(buf, sizeof(buf), "%llu\n", loops_per_sec);
+ if (len < 0)
+ return len;
+
+ if (copy_to_user(user_buf, buf, len))
+ return -EFAULT;
+
+ *ppos += len;
+
+ /*
+ * Use 'dummy' to ensure the compiler doesn't optimize out
+ * the call completely, though the asm volatile helps too.
+ */
+ if (dummy == 0xdeadbeef)
+ pr_info("dummy=%d\n", dummy);
+
+ return len;
+}
+
+static const struct file_operations fetcharg_bench_fops = {
+ .read = fetcharg_bench_read,
+ .open = simple_open,
+ .llseek = default_llseek,
+};
+
+static struct dentry *bench_dir;
+
+static int __init fetcharg_bench_init(void)
+{
+ bench_dir = debugfs_create_dir("fetcharg_benchmark", NULL);
+ if (!bench_dir)
+ return -ENOMEM;
+
+ debugfs_create_file("trigger", 0444, bench_dir, NULL, &fetcharg_bench_fops);
+
+ return 0;
+}
+
+static void __exit fetcharg_bench_exit(void)
+{
+ debugfs_remove_recursive(bench_dir);
+}
+
+module_init(fetcharg_bench_init);
+module_exit(fetcharg_bench_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Antigravity");
+MODULE_DESCRIPTION("Fetcharg performance benchmark test module");
diff --git a/tools/tracing/benchmark/fetcharg_bench_trace.h b/tools/tracing/benchmark/fetcharg_bench_trace.h
new file mode 100644
index 000000000000..6560f62337e3
--- /dev/null
+++ b/tools/tracing/benchmark/fetcharg_bench_trace.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fetcharg_bench
+
+#if !defined(_FETCHARG_BENCH_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _FETCHARG_BENCH_TRACE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(fetcharg_bench_event,
+
+ TP_PROTO(int a, int **b, char *c),
+
+ TP_ARGS(a, b, c),
+
+ TP_STRUCT__entry(
+ __field(int, a)
+ __field(int **, b_ptr)
+ __field(char *, c_ptr)
+ ),
+
+ TP_fast_assign(
+ __entry->a = a;
+ __entry->b_ptr = b;
+ __entry->c_ptr = c;
+ ),
+
+ TP_printk("a=%d b=%p c=%p", __entry->a, __entry->b_ptr, __entry->c_ptr)
+);
+
+#endif /* _FETCHARG_BENCH_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE fetcharg_bench_trace
+#include <trace/define_trace.h>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
3 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Compile all fetch arguments of a trace probe event into a single BPF
program instead of separate programs per argument to reduce prologue
and dispatching overhead.
BPF-compatible arguments (such as register, immediate, dereferences,
and raw stores) are compiled, including registers mapping for x86_64,
arm64, and s390. If any argument requires non-BPF operations (such as
dynamic strings), we fallback to the interpreter loop for all arguments.
Also, correctly initialize prog->len to prevent invalid opcode execution in
the BPF interpreter.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
kernel/trace/trace_probe.c | 249 ++++++++++++++++++++++++++++++++++++++-
kernel/trace/trace_probe.h | 15 ++
kernel/trace/trace_probe_tmpl.h | 13 ++
3 files changed, 273 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 18c212122344..0deb53c22ae3 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -2003,11 +2003,208 @@ static char *generate_probe_arg_name(const char *arg, int idx)
return name;
}
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/filter.h>
+#include <linux/uaccess.h>
+
+static int regs_get_kernel_argument_offset(unsigned int n)
+{
+#ifdef CONFIG_X86_64
+ static const int argument_offsets[] = {
+ offsetof(struct pt_regs, di),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ };
+ if (n < ARRAY_SIZE(argument_offsets))
+ return argument_offsets[n];
+#elif defined(CONFIG_ARM64)
+ if (n < 8)
+ return offsetof(struct pt_regs, regs[n]);
+#elif defined(CONFIG_S390)
+ if (n < 5)
+ return offsetof(struct pt_regs, gprs[2 + n]);
+#endif
+ return -1;
+}
+
+static bool trace_probe_can_compile_bpf(struct trace_probe *tp)
+{
+ int i;
+
+ if (tp->nr_args == 0)
+ return false;
+
+ for (i = 0; i < tp->nr_args; i++) {
+ struct probe_arg *parg = &tp->args[i];
+ struct fetch_insn *code = parg->code;
+
+ while (code->op != FETCH_OP_END) {
+ switch (code->op) {
+ case FETCH_OP_REG:
+ case FETCH_OP_IMM:
+ case FETCH_OP_DEREF:
+ case FETCH_OP_ST_RAW:
+ case FETCH_OP_ST_MEM:
+ break;
+ case FETCH_OP_ARG:
+ if (regs_get_kernel_argument_offset(code->param) < 0)
+ return false;
+ break;
+ default:
+ return false;
+ }
+ code++;
+ }
+ }
+ return true;
+}
+
+static void trace_probe_compile_bpf(struct trace_probe *tp)
+{
+ struct bpf_insn *insns;
+ int i = 0;
+ struct bpf_prog *prog;
+ int err, idx;
+
+ if (!trace_probe_can_compile_bpf(tp))
+ return;
+
+ insns = kmalloc_array(512, sizeof(struct bpf_insn), GFP_KERNEL);
+ if (!insns)
+ return;
+
+ /* Prologue: R6 = ctx */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
+ /* R7 = ctx->rec */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+ offsetof(struct fetch_bpf_ctx, rec));
+ /* R8 = ctx->data */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_6,
+ offsetof(struct fetch_bpf_ctx, data));
+ /* R9 = total size (0) */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_9, 0);
+
+ for (idx = 0; idx < tp->nr_args; idx++) {
+ struct probe_arg *parg = &tp->args[idx];
+ struct fetch_insn *code = parg->code;
+
+ while (code->op != FETCH_OP_END && i < 500) {
+ switch (code->op) {
+ case FETCH_OP_REG:
+ /* R0 = *(unsigned long *)(R7 + code->param) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, code->param);
+ break;
+ case FETCH_OP_ARG: {
+ int offset = regs_get_kernel_argument_offset(code->param);
+ /* R0 = *(unsigned long *)(R7 + offset) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, offset);
+ break;
+ }
+ case FETCH_OP_IMM:
+ insns[i++] = BPF_LD_IMM64(BPF_REG_0, code->immediate);
+ break;
+ case FETCH_OP_DEREF:
+ /* Add offset: R3 = R0 + code->offset (src) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
+ if (code->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ code->offset);
+ /* R1 = dst (R10 - 8 on stack) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8);
+ /* R3 = size */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_3, sizeof(unsigned long));
+ /* Call copy_from_kernel_nofault(dst, src, size) */
+ insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
+ /* if (R0 < 0) return R0; */
+ insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
+ insns[i++] = BPF_EXIT_INSN();
+ /* R0 = *(unsigned long *)(R10 - 8) */
+ insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8);
+ break;
+ case FETCH_OP_ST_RAW:
+ /* Store R0 into R8 (data) + parg->offset based on size */
+ switch (code->size) {
+ case 1:
+ insns[i++] = BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 2:
+ insns[i++] = BPF_STX_MEM(BPF_H, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 4:
+ insns[i++] = BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ case 8:
+ insns[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0,
+ parg->offset);
+ break;
+ }
+ break;
+ case FETCH_OP_ST_MEM:
+ /* Add offset: R2 = R0 + code->offset (src) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
+ if (code->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ code->offset);
+ /* R1 = dst (R8 + parg->offset) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8);
+ if (parg->offset)
+ insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+ parg->offset);
+ /* R3 = size */
+ insns[i++] = BPF_MOV64_IMM(BPF_REG_3, code->size);
+ /* Call copy_from_kernel_nofault(dst, src, size) */
+ insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
+ /* if (R0 < 0) return R0; */
+ insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
+ insns[i++] = BPF_EXIT_INSN();
+ break;
+ default:
+ goto out;
+ }
+ code++;
+ }
+ }
+
+ if (i >= 500)
+ goto out;
+
+ /* Epilogue: return R9 (0) */
+ insns[i++] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_9);
+ insns[i++] = BPF_EXIT_INSN();
+
+ prog = bpf_prog_alloc(bpf_prog_size(i), 0);
+ if (!prog)
+ goto out;
+
+ prog->len = i;
+ memcpy(prog->insnsi, insns, prog->len * sizeof(struct bpf_insn));
+ prog->type = BPF_PROG_TYPE_KPROBE;
+
+ prog = bpf_prog_select_runtime(prog, &err);
+ if (IS_ERR(prog))
+ goto out;
+ tp->prog = prog;
+
+out:
+ kfree(insns);
+}
+#endif
+
+/* Parse an argument */
+/* The caller must pass a null-terminated argument string */
int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
struct traceprobe_parse_context *ctx)
{
struct probe_arg *parg = &tp->args[i];
const char *body;
+ int ret;
ctx->tp = tp;
body = strchr(arg, '=');
@@ -2038,7 +2235,11 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
}
ctx->offset = body - arg;
/* Parse fetch argument */
- return traceprobe_parse_probe_arg_body(body, &tp->size, parg, ctx);
+ ret = traceprobe_parse_probe_arg_body(body, &tp->size, parg, ctx);
+ if (ret)
+ return ret;
+
+ return 0;
}
void traceprobe_free_probe_arg(struct probe_arg *arg)
@@ -2443,6 +2644,13 @@ void trace_probe_cleanup(struct trace_probe *tp)
for (i = 0; i < tp->nr_args; i++)
traceprobe_free_probe_arg(&tp->args[i]);
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ bpf_prog_put(tp->prog);
+ tp->prog = NULL;
+ }
+#endif
+
if (tp->entry_arg) {
kfree(tp->entry_arg);
tp->entry_arg = NULL;
@@ -2531,15 +2739,32 @@ int trace_probe_register_event_call(struct trace_probe *tp)
trace_probe_name(tp)))
return -EEXIST;
+#ifdef CONFIG_BPF_SYSCALL
+ trace_probe_compile_bpf(tp);
+#endif
+
ret = register_trace_event(&call->event);
- if (!ret)
- return -ENODEV;
+ if (!ret) {
+ ret = -ENODEV;
+ goto err_free_bpf;
+ }
ret = trace_add_event_call(call);
- if (ret)
+ if (ret) {
unregister_trace_event(&call->event);
+ goto err_free_bpf;
+ }
return ret;
+
+err_free_bpf:
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ bpf_prog_put(tp->prog);
+ tp->prog = NULL;
+ }
+#endif
+ return ret;
}
int trace_probe_add_file(struct trace_probe *tp, struct trace_event_file *file)
@@ -2768,5 +2993,21 @@ void trace_probe_dump_args(struct seq_file *m, struct trace_probe *tp)
for (i = 0; i < tp->nr_args; i++)
trace_probe_dump_arg(m, &tp->args[i]);
+
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ seq_printf(m, "# [BPF%s]:", tp->prog->jited ? "-JIT" : "");
+ for (i = 0; i < tp->prog->len; i++) {
+ struct bpf_insn *insn = &tp->prog->insnsi[i];
+
+ seq_printf(m, " %02x %02x %04x %08x", insn->code,
+ insn->dst_reg | (insn->src_reg << 4),
+ insn->off, insn->imm);
+ if (i < tp->prog->len - 1)
+ seq_putc(m, ',');
+ }
+ seq_putc(m, '\n');
+ }
+#endif
}
#endif /* CONFIG_PROBE_EVENTS_DUMP_FETCHARG */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index e6268a8dc378..10589414451c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -274,6 +274,9 @@ struct trace_probe {
ssize_t size; /* trace entry size */
unsigned int nr_args;
struct probe_entry_arg *entry_arg; /* This is only for return probe */
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_prog *prog;
+#endif
struct probe_arg args[];
};
@@ -299,6 +302,7 @@ static inline void trace_probe_set_flag(struct trace_probe *tp,
smp_store_release(&tp->event->flags, tp->event->flags | flag);
}
+
static inline void trace_probe_clear_flag(struct trace_probe *tp,
unsigned int flag)
{
@@ -631,3 +635,14 @@ struct uprobe_dispatch_data {
struct trace_uprobe *tu;
unsigned long bp_addr;
};
+
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/filter.h>
+
+struct fetch_bpf_ctx {
+ void *rec;
+ void *edata;
+ void *data;
+ void *base;
+};
+#endif
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 8db12f758fda..6ca2dfe59a0f 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -273,6 +273,19 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec, void *edata,
u32 *dl; /* Data location */
int ret, i;
+#ifdef CONFIG_BPF_SYSCALL
+ if (tp->prog) {
+ struct fetch_bpf_ctx ctx = {
+ .rec = rec,
+ .edata = edata,
+ .data = data,
+ .base = base,
+ };
+ bpf_prog_run(tp->prog, &ctx);
+ return;
+ }
+#endif
+
for (i = 0; i < tp->nr_args; i++) {
arg = tp->args + i;
dl = data + arg->offset;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
@ 2026-07-01 13:45 ` Masami Hiramatsu (Google)
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
3 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:45 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a trace option "disable_bpf" to disable BPF execution for fetchargs,
forcing the execution to fallback to the interpreter loop. This is useful
for evaluating BPF compilation performance impact.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
kernel/trace/trace.c | 7 +++++++
kernel/trace/trace.h | 8 ++++++++
kernel/trace/trace_probe_tmpl.h | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9e182d40059..7c0f7b629fcb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9940,6 +9940,13 @@ struct trace_array *trace_get_global_array(void)
}
#endif
+#ifdef CONFIG_BPF_SYSCALL
+bool trace_probe_bpf_disabled(void)
+{
+ return !!(global_trace.trace_flags & TRACE_ITER(DISABLE_BPF));
+}
+#endif
+
void __init early_trace_init(void)
{
if (tracepoint_printk) {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..bf83680e0ba7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1503,6 +1503,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
C(PAUSE_ON_TRACE, "pause-on-trace"), \
C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \
C(BITMASK_LIST, "bitmask-list"), \
+ C(DISABLE_BPF, "disable_bpf"), \
FUNCTION_FLAGS \
FGRAPH_FLAGS \
STACK_FLAGS \
@@ -2505,4 +2506,11 @@ static inline int rv_init_interface(void)
_args; \
})
+#ifdef CONFIG_BPF_SYSCALL
+bool trace_probe_bpf_disabled(void);
+#else
+static inline bool trace_probe_bpf_disabled(void) { return false; }
+#endif
+
#endif /* _LINUX_KERNEL_TRACE_H */
+
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 6ca2dfe59a0f..015208aefbaf 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -274,7 +274,7 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec, void *edata,
int ret, i;
#ifdef CONFIG_BPF_SYSCALL
- if (tp->prog) {
+ if (tp->prog && !trace_probe_bpf_disabled()) {
struct fetch_bpf_ctx ctx = {
.rec = rec,
.edata = edata,
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
` (2 preceding siblings ...)
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
@ 2026-07-01 13:46 ` Masami Hiramatsu (Google)
3 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu (Google) @ 2026-07-01 13:46 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a selftest for trace probe BPF compilation to verify that BPF is
correctly compiled and executes properly for valid configurations,
and falls back to interpreter on unsupported cases.
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
.../ftrace/test.d/dynevent/test_bpf_fetchargs.tc | 51 ++++++++++++++++++++
1 file changed, 51 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
new file mode 100644
index 000000000000..6c6f4dd4517a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Dynamic event - test eBPF compiled fetchargs for tprobe and fprobe
+# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]":README
+
+# Check if the sample module is loaded
+if ! lsmod | grep -q trace_events_sample; then
+ modprobe trace-events-sample || exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+clear_trace
+
+# Add a tprobe to sample-trace:foo_bar
+# foo_bar args: const char *foo, int bar, ...
+# So $arg1 is char ptr (points to "hello"), $arg2 is int.
+# Let's test REG (arg2) and DEREF (+0($arg1))
+# "hello" in little-endian 32-bit: 'h'(0x68), 'e'(0x65), 'l'(0x6c), 'l'(0x6c) -> 0x6c6c6568 -> 1819043176
+echo "t:test_tprobe foo_bar mybar=\$arg2 myfoo=+0(\$arg1):u32" >> dynamic_events
+
+# Add an fprobe to __traceiter_foo_bar
+if grep -q "__traceiter_foo_bar" /proc/kallsyms; then
+ # __traceiter_foo_bar args: void *__data, const char *foo, int bar, ...
+ # $arg1 is __data, $arg2 is foo (points to "hello")
+ # "hello" in little-endian 32-bit: 'h'(0x68), 'e'(0x65), 'l'(0x6c), 'l'(0x6c) -> 0x6c6c6568 -> 1819043176
+ echo "f:test_fprobe __traceiter_foo_bar myfoo=\$arg2 myfmt=+0(\$arg2):u32" >> dynamic_events
+fi
+
+echo 1 > events/sample-trace/foo_bar/enable
+echo 1 > events/tracepoints/test_tprobe/enable
+if [ -d events/fprobes/test_fprobe ]; then
+ echo 1 > events/fprobes/test_fprobe/enable
+fi
+
+# Wait for 2 seconds to let the sample thread trigger the events
+sleep 2
+
+echo 0 > events/enable
+
+# Now verify the trace
+grep -q "test_tprobe.*myfoo=1819043176" trace || exit_fail
+
+if [ -d events/fprobes/test_fprobe ]; then
+ grep -q "test_fprobe.*myfmt=1819043176" trace || exit_fail
+fi
+
+echo > dynamic_events
+rmmod trace-events-sample
+
+exit 0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
@ 2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
2026-07-01 22:40 ` Masami Hiramatsu
0 siblings, 2 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2026-07-01 18:41 UTC (permalink / raw)
To: Masami Hiramatsu (Google), Steven Rostedt, Shuah Khan
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel,
linux-kselftest, bpf
On Wed Jul 1, 2026 at 6:45 AM PDT, Masami Hiramatsu (Google) wrote:
> +static bool trace_probe_can_compile_bpf(struct trace_probe *tp)
> +{
> + int i;
> +
> + if (tp->nr_args == 0)
> + return false;
> +
> + for (i = 0; i < tp->nr_args; i++) {
> + struct probe_arg *parg = &tp->args[i];
> + struct fetch_insn *code = parg->code;
> +
> + while (code->op != FETCH_OP_END) {
> + switch (code->op) {
> + case FETCH_OP_REG:
> + case FETCH_OP_IMM:
> + case FETCH_OP_DEREF:
> + case FETCH_OP_ST_RAW:
> + case FETCH_OP_ST_MEM:
> + break;
> + case FETCH_OP_ARG:
> + if (regs_get_kernel_argument_offset(code->param) < 0)
> + return false;
> + break;
> + default:
> + return false;
> + }
> + code++;
> + }
> + }
> + return true;
> +}
> +
> +static void trace_probe_compile_bpf(struct trace_probe *tp)
> +{
> + struct bpf_insn *insns;
> + int i = 0;
> + struct bpf_prog *prog;
> + int err, idx;
> +
> + if (!trace_probe_can_compile_bpf(tp))
> + return;
> +
> + insns = kmalloc_array(512, sizeof(struct bpf_insn), GFP_KERNEL);
> + if (!insns)
> + return;
> +
> + /* Prologue: R6 = ctx */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
> + /* R7 = ctx->rec */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
> + offsetof(struct fetch_bpf_ctx, rec));
> + /* R8 = ctx->data */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_6,
> + offsetof(struct fetch_bpf_ctx, data));
> + /* R9 = total size (0) */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_9, 0);
> +
> + for (idx = 0; idx < tp->nr_args; idx++) {
> + struct probe_arg *parg = &tp->args[idx];
> + struct fetch_insn *code = parg->code;
> +
> + while (code->op != FETCH_OP_END && i < 500) {
> + switch (code->op) {
> + case FETCH_OP_REG:
> + /* R0 = *(unsigned long *)(R7 + code->param) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, code->param);
> + break;
> + case FETCH_OP_ARG: {
> + int offset = regs_get_kernel_argument_offset(code->param);
> + /* R0 = *(unsigned long *)(R7 + offset) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, offset);
> + break;
> + }
> + case FETCH_OP_IMM:
> + insns[i++] = BPF_LD_IMM64(BPF_REG_0, code->immediate);
> + break;
> + case FETCH_OP_DEREF:
> + /* Add offset: R3 = R0 + code->offset (src) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
> + if (code->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
> + code->offset);
> + /* R1 = dst (R10 - 8 on stack) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8);
> + /* R3 = size */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_3, sizeof(unsigned long));
> + /* Call copy_from_kernel_nofault(dst, src, size) */
> + insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
> + /* if (R0 < 0) return R0; */
> + insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
> + insns[i++] = BPF_EXIT_INSN();
> + /* R0 = *(unsigned long *)(R10 - 8) */
> + insns[i++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8);
> + break;
> + case FETCH_OP_ST_RAW:
> + /* Store R0 into R8 (data) + parg->offset based on size */
> + switch (code->size) {
> + case 1:
> + insns[i++] = BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 2:
> + insns[i++] = BPF_STX_MEM(BPF_H, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 4:
> + insns[i++] = BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + case 8:
> + insns[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0,
> + parg->offset);
> + break;
> + }
> + break;
> + case FETCH_OP_ST_MEM:
> + /* Add offset: R2 = R0 + code->offset (src) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_2, BPF_REG_0);
> + if (code->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
> + code->offset);
> + /* R1 = dst (R8 + parg->offset) */
> + insns[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8);
> + if (parg->offset)
> + insns[i++] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
> + parg->offset);
> + /* R3 = size */
> + insns[i++] = BPF_MOV64_IMM(BPF_REG_3, code->size);
> + /* Call copy_from_kernel_nofault(dst, src, size) */
> + insns[i++] = BPF_EMIT_CALL(copy_from_kernel_nofault);
> + /* if (R0 < 0) return R0; */
> + insns[i++] = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1);
> + insns[i++] = BPF_EXIT_INSN();
> + break;
> + default:
> + goto out;
> + }
> + code++;
> + }
> + }
Nack.
I really don't like it.
There were days in the past when the kernel generating bpf directly was appealing.
These days are gone. Performance improvements for fetchargs is not a good reason
to add all this complexity and bypass verifier checks.
bpf insns should come from user space.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:41 ` Alexei Starovoitov
@ 2026-07-01 18:47 ` Steven Rostedt
2026-07-01 18:53 ` Alexei Starovoitov
2026-07-01 22:40 ` Masami Hiramatsu
1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2026-07-01 18:47 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masami Hiramatsu (Google), Shuah Khan, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, linux-kselftest, bpf
On Wed, 01 Jul 2026 11:41:26 -0700
"Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
> Nack.
> I really don't like it.
So the nack is mostly your opinion and not technical?
> There were days in the past when the kernel generating bpf directly was appealing.
> These days are gone. Performance improvements for fetchargs is not a good reason
Why is performance *not* a good reason?
> to add all this complexity and bypass verifier checks.
The code lives in the kernel. What reason is there to add verification
checks? The point of verification is because we don't trust user space. Why
do we not trust the kernel?
> bpf insns should come from user space.
Why?
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:47 ` Steven Rostedt
@ 2026-07-01 18:53 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2026-07-01 18:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu (Google), Shuah Khan, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, linux-kselftest, bpf
On Wed Jul 1, 2026 at 11:47 AM PDT, Steven Rostedt wrote:
> On Wed, 01 Jul 2026 11:41:26 -0700
> "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
>> Nack.
>> I really don't like it.
>
> So the nack is mostly your opinion and not technical?
>
>> There were days in the past when the kernel generating bpf directly was appealing.
>> These days are gone. Performance improvements for fetchargs is not a good reason
>
> Why is performance *not* a good reason?
>
>> to add all this complexity and bypass verifier checks.
>
> The code lives in the kernel. What reason is there to add verification
> checks? The point of verification is because we don't trust user space. Why
> do we not trust the kernel?
>
>> bpf insns should come from user space.
>
> Why?
because I see this patchset as pointless kernel bloat with the "help" of bpf.
Hence the nack to avoid dragging bpf into this.
Call it philosophical disagreement if you like.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
@ 2026-07-01 22:40 ` Masami Hiramatsu
2026-07-02 0:01 ` Alexei Starovoitov
1 sibling, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2026-07-01 22:40 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, linux-kselftest, bpf
On Wed, 01 Jul 2026 11:41:26 -0700
"Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
> Nack.
> I really don't like it.
> There were days in the past when the kernel generating bpf directly was appealing.
> These days are gone. Performance improvements for fetchargs is not a good reason
> to add all this complexity and bypass verifier checks.
> bpf insns should come from user space.
Thanks for your comment!
OK, I don't mind because this is a kind of investigation project. And some
people had asked me about the same idea, now I can tell them the result.
I'm satisfied with the current outcome, as this development process gave me
insight into the implementation of BPF and demonstrated the potential for
optimization via JIT. :)
And also, as noted in the cover letter, the current performance of fetcharg
is better than I thought, and is good enough for debugging. :)
BTW, I'm also interested in calling the verifier on this generated code.
Even it it is not merged, I think showing the correct way to implement it
will be useful in the future.
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-01 22:40 ` Masami Hiramatsu
@ 2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
0 siblings, 2 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2026-07-02 0:01 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, Jul 1, 2026 at 3:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Wed, 01 Jul 2026 11:41:26 -0700
> "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
> >
> > Nack.
> > I really don't like it.
> > There were days in the past when the kernel generating bpf directly was appealing.
> > These days are gone. Performance improvements for fetchargs is not a good reason
> > to add all this complexity and bypass verifier checks.
> > bpf insns should come from user space.
>
> Thanks for your comment!
> OK, I don't mind because this is a kind of investigation project. And some
> people had asked me about the same idea, now I can tell them the result.
>
> I'm satisfied with the current outcome, as this development process gave me
> insight into the implementation of BPF and demonstrated the potential for
> optimization via JIT. :)
>
> And also, as noted in the cover letter, the current performance of fetcharg
> is better than I thought, and is good enough for debugging. :)
>
> BTW, I'm also interested in calling the verifier on this generated code.
> Even it it is not merged, I think showing the correct way to implement it
> will be useful in the future.
The whole feature you're trying to do is imo reinvention of the wheel.
bpf could do that kind of filtering years ago.
I don't buy the excuse that embedded environments without any
kind of user space needs this facility.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-02 0:01 ` Alexei Starovoitov
@ 2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
1 sibling, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2026-07-02 1:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Steven Rostedt, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, 1 Jul 2026 17:01:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Wed, Jul 1, 2026 at 3:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Wed, 01 Jul 2026 11:41:26 -0700
> > "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
> >
> > >
> > > Nack.
> > > I really don't like it.
> > > There were days in the past when the kernel generating bpf directly was appealing.
> > > These days are gone. Performance improvements for fetchargs is not a good reason
> > > to add all this complexity and bypass verifier checks.
> > > bpf insns should come from user space.
> >
> > Thanks for your comment!
> > OK, I don't mind because this is a kind of investigation project. And some
> > people had asked me about the same idea, now I can tell them the result.
> >
> > I'm satisfied with the current outcome, as this development process gave me
> > insight into the implementation of BPF and demonstrated the potential for
> > optimization via JIT. :)
> >
> > And also, as noted in the cover letter, the current performance of fetcharg
> > is better than I thought, and is good enough for debugging. :)
> >
> > BTW, I'm also interested in calling the verifier on this generated code.
> > Even it it is not merged, I think showing the correct way to implement it
> > will be useful in the future.
>
> The whole feature you're trying to do is imo reinvention of the wheel.
> bpf could do that kind of filtering years ago.
> I don't buy the excuse that embedded environments without any
> kind of user space needs this facility.
>
Yeah, I understood it. I have no intention to merge this series.
This is totally just for fun and my interest about BPF.
But thanks for your comments. It is good to hear your thought in the RFC
stage. I can stop this series here with peace of mind. :)
Thank you!
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event
2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
@ 2026-07-02 14:04 ` Steven Rostedt
1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2026-07-02 14:04 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masami Hiramatsu, Shuah Khan, Mathieu Desnoyers, LKML,
linux-trace-kernel, open list:KERNEL SELFTEST FRAMEWORK, bpf
On Wed, 1 Jul 2026 17:01:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> The whole feature you're trying to do is imo reinvention of the wheel.
> bpf could do that kind of filtering years ago.
> I don't buy the excuse that embedded environments without any
> kind of user space needs this facility.
And what embedded environments do you work with?
Feel free to go to Embedded Recipes and talk with developers there.
-- Steve
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-07-02 14:04 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-01 13:45 [RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 1/4] tools/tracing: Add fetcharg performance micro-benchmark Masami Hiramatsu (Google)
2026-07-01 13:45 ` [RFC PATCH 2/4] tracing/probes: Compile all fetchargs into a single BPF program per event Masami Hiramatsu (Google)
2026-07-01 18:41 ` Alexei Starovoitov
2026-07-01 18:47 ` Steven Rostedt
2026-07-01 18:53 ` Alexei Starovoitov
2026-07-01 22:40 ` Masami Hiramatsu
2026-07-02 0:01 ` Alexei Starovoitov
2026-07-02 1:01 ` Masami Hiramatsu
2026-07-02 14:04 ` Steven Rostedt
2026-07-01 13:45 ` [RFC PATCH 3/4] tracing: Add disable_bpf trace option to ignore eBPF for fetchargs Masami Hiramatsu (Google)
2026-07-01 13:46 ` [RFC PATCH 4/4] selftests/ftrace: Add a test for eBPF compiled fetchargs Masami Hiramatsu (Google)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox