* [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
@ 2025-04-26 16:00 KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 1/4] " KaFai Wan
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: KaFai Wan @ 2025-04-26 16:00 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah
Cc: linux-kernel, bpf, linux-trace-kernel, netdev, linux-kselftest,
leon.hwang, mannkafai
Hi,
The get_func_[arg|arg_cnt] helpers can currently be used in
fentry/fexit/fmod_ret programs[1], but they can't be used in
raw_tp/tp_btf programs.
This series adds support for using the get_func_[arg|arg_cnt] helpers in
raw_tp/tp_btf programs, enables BPF_PROG_TEST_RUN for tp_btf, and adds
selftests to check them.
Thanks,
KaFai
[1] https://lore.kernel.org/bpf/20211208193245.172141-1-jolsa@kernel.org/
---
KaFai Wan (4):
bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
bpf: Enable BPF_PROG_TEST_RUN for tp_btf
selftests/bpf: Add raw_tp_test_run for tp_btf
selftests/bpf: Add tests for get_func_[arg|arg_cnt] helpers in raw
tracepoint programs
kernel/trace/bpf_trace.c | 17 +++++--
net/bpf/test_run.c | 16 +++----
.../bpf/prog_tests/raw_tp_get_func_args.c | 48 +++++++++++++++++++
.../bpf/prog_tests/raw_tp_test_run.c | 18 ++++++-
.../bpf/progs/test_raw_tp_get_func_args.c | 47 ++++++++++++++++++
.../bpf/progs/test_raw_tp_test_run.c | 16 +++++--
6 files changed, 146 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_get_func_args.c
create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_get_func_args.c
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-26 16:00 [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
@ 2025-04-26 16:00 ` KaFai Wan
2025-04-30 2:46 ` Alexei Starovoitov
2025-04-26 16:00 ` [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf KaFai Wan
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: KaFai Wan @ 2025-04-26 16:00 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah
Cc: linux-kernel, bpf, linux-trace-kernel, netdev, linux-kselftest,
leon.hwang, mannkafai
Adding support to use get_func_[arg|arg_cnt] helpers in raw_tp/tp_btf
programs.
We can use get_func_[arg|ret|arg_cnt] helpers in fentry/fexit/fmod_ret
programs currently. If we try to use get_func_[arg|arg_cnt] helpers in
raw_tp/tp_btf programs, verifier will fail to load the program with:
; __u64 cnt = bpf_get_func_arg_cnt(ctx);
3: (85) call bpf_get_func_arg_cnt#185
unknown func bpf_get_func_arg_cnt#185
Adding get_func_[arg|arg_cnt] helpers in raw_tp_prog_func_proto and
tracing_prog_func_proto for raw tracepoint.
Prepend one extra slot to the ctx of raw tracepoint programs and store the
number of arguments at ctx-8, so it's easy to verify the argument index
and find an argument's position.
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
---
kernel/trace/bpf_trace.c | 17 ++++++++++++++---
net/bpf/test_run.c | 13 +++++--------
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 52c432a44aeb..eb4c56013493 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1892,6 +1892,10 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_stackid_proto_raw_tp;
case BPF_FUNC_get_stack:
return &bpf_get_stack_proto_raw_tp;
+ case BPF_FUNC_get_func_arg:
+ return &bpf_get_func_arg_proto;
+ case BPF_FUNC_get_func_arg_cnt:
+ return &bpf_get_func_arg_cnt_proto;
case BPF_FUNC_get_attach_cookie:
return &bpf_get_attach_cookie_proto_tracing;
default:
@@ -1950,10 +1954,16 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_d_path:
return &bpf_d_path_proto;
case BPF_FUNC_get_func_arg:
+ if (prog->type == BPF_PROG_TYPE_TRACING &&
+ prog->expected_attach_type == BPF_TRACE_RAW_TP)
+ return &bpf_get_func_arg_proto;
return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_proto : NULL;
case BPF_FUNC_get_func_ret:
return bpf_prog_has_trampoline(prog) ? &bpf_get_func_ret_proto : NULL;
case BPF_FUNC_get_func_arg_cnt:
+ if (prog->type == BPF_PROG_TYPE_TRACING &&
+ prog->expected_attach_type == BPF_TRACE_RAW_TP)
+ return &bpf_get_func_arg_cnt_proto;
return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_cnt_proto : NULL;
case BPF_FUNC_get_attach_cookie:
if (prog->type == BPF_PROG_TYPE_TRACING &&
@@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
#define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
#define SARG(X) u64 arg##X
-#define COPY(X) args[X] = arg##X
+#define COPY(X) args[X + 1] = arg##X
#define __DL_COM (,)
#define __DL_SEM (;)
@@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
{ \
- u64 args[x]; \
+ u64 args[x + 1]; \
+ args[0] = x; \
REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
- __bpf_trace_run(link, args); \
+ __bpf_trace_run(link, args + 1); \
} \
EXPORT_SYMBOL_GPL(bpf_trace_run##x)
BPF_TRACE_DEFN_x(1);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index aaf13a7d58ed..8cb285187270 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -760,6 +760,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in);
__u32 ctx_size_in = kattr->test.ctx_size_in;
struct bpf_raw_tp_test_run_info info;
+ u64 args[MAX_BPF_FUNC_ARGS + 1] = {};
int cpu = kattr->test.cpu, err = 0;
int current_cpu;
@@ -776,14 +777,11 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0)
return -EINVAL;
- if (ctx_size_in) {
- info.ctx = memdup_user(ctx_in, ctx_size_in);
- if (IS_ERR(info.ctx))
- return PTR_ERR(info.ctx);
- } else {
- info.ctx = NULL;
- }
+ if (ctx_size_in && copy_from_user(args + 1, ctx_in, ctx_size_in))
+ return -EFAULT;
+ args[0] = ctx_size_in / sizeof(u64);
+ info.ctx = args + 1;
info.prog = prog;
current_cpu = get_cpu();
@@ -807,7 +805,6 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))
err = -EFAULT;
- kfree(info.ctx);
return err;
}
--
2.43.0
* [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf
2025-04-26 16:00 [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 1/4] " KaFai Wan
@ 2025-04-26 16:00 ` KaFai Wan
2025-05-01 20:55 ` Andrii Nakryiko
2025-04-26 16:00 ` [PATCH bpf-next 3/4] selftests/bpf: Add raw_tp_test_run " KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
3 siblings, 1 reply; 16+ messages in thread
From: KaFai Wan @ 2025-04-26 16:00 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah
Cc: linux-kernel, bpf, linux-trace-kernel, netdev, linux-kselftest,
leon.hwang, mannkafai
Add a .test_run callback for tp_btf, reusing the existing raw_tp
test_run implementation.
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
---
net/bpf/test_run.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 8cb285187270..8c901ec92341 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -690,6 +690,9 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
int b = 2, err = -EFAULT;
u32 retval = 0;
+ if (prog->expected_attach_type == BPF_TRACE_RAW_TP)
+ return bpf_prog_test_run_raw_tp(prog, kattr, uattr);
+
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
return -EINVAL;
--
2.43.0
* [PATCH bpf-next 3/4] selftests/bpf: Add raw_tp_test_run for tp_btf
2025-04-26 16:00 [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 1/4] " KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf KaFai Wan
@ 2025-04-26 16:00 ` KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
3 siblings, 0 replies; 16+ messages in thread
From: KaFai Wan @ 2025-04-26 16:00 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah
Cc: linux-kernel, bpf, linux-trace-kernel, netdev, linux-kselftest,
leon.hwang, mannkafai
Add a test_run test for tp_btf based on the existing raw_tp test.
Change test_raw_tp_test_run to cover both raw_tp and tp_btf.
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
---
.../selftests/bpf/prog_tests/raw_tp_test_run.c | 18 ++++++++++++++++--
.../selftests/bpf/progs/test_raw_tp_test_run.c | 16 +++++++++++++---
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_test_run.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_test_run.c
index fe5b8fae2c36..e2968a1e73bf 100644
--- a/tools/testing/selftests/bpf/prog_tests/raw_tp_test_run.c
+++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_test_run.c
@@ -5,7 +5,7 @@
#include "bpf/libbpf_internal.h"
#include "test_raw_tp_test_run.skel.h"
-void test_raw_tp_test_run(void)
+static void test_raw_tp(bool is_tp_btf)
{
int comm_fd = -1, err, nr_online, i, prog_fd;
__u64 args[2] = {0x1234ULL, 0x5678ULL};
@@ -28,6 +28,9 @@ void test_raw_tp_test_run(void)
if (!ASSERT_OK_PTR(skel, "skel_open"))
goto cleanup;
+ bpf_program__set_autoattach(skel->progs.rename_tp_btf, is_tp_btf);
+ bpf_program__set_autoattach(skel->progs.rename_raw_tp, !is_tp_btf);
+
err = test_raw_tp_test_run__attach(skel);
if (!ASSERT_OK(err, "skel_attach"))
goto cleanup;
@@ -42,7 +45,10 @@ void test_raw_tp_test_run(void)
ASSERT_NEQ(skel->bss->count, 0, "check_count");
ASSERT_EQ(skel->data->on_cpu, 0xffffffff, "check_on_cpu");
- prog_fd = bpf_program__fd(skel->progs.rename);
+ if (is_tp_btf)
+ prog_fd = bpf_program__fd(skel->progs.rename_tp_btf);
+ else
+ prog_fd = bpf_program__fd(skel->progs.rename_raw_tp);
opts.ctx_in = args;
opts.ctx_size_in = sizeof(__u64);
@@ -84,3 +90,11 @@ void test_raw_tp_test_run(void)
test_raw_tp_test_run__destroy(skel);
free(online);
}
+
+void test_raw_tp_test_run(void)
+{
+ if (test__start_subtest("raw_tp"))
+ test_raw_tp(false);
+ if (test__start_subtest("tp_btf"))
+ test_raw_tp(true);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_raw_tp_test_run.c b/tools/testing/selftests/bpf/progs/test_raw_tp_test_run.c
index 4c63cc87b9d0..ddc22e6cfdd9 100644
--- a/tools/testing/selftests/bpf/progs/test_raw_tp_test_run.c
+++ b/tools/testing/selftests/bpf/progs/test_raw_tp_test_run.c
@@ -8,10 +8,8 @@
__u32 count = 0;
__u32 on_cpu = 0xffffffff;
-SEC("raw_tp/task_rename")
-int BPF_PROG(rename, struct task_struct *task, char *comm)
+static __always_inline int check_test_run(struct task_struct *task, char *comm)
{
-
count++;
if ((__u64) task == 0x1234ULL && (__u64) comm == 0x5678ULL) {
on_cpu = bpf_get_smp_processor_id();
@@ -21,4 +19,16 @@ int BPF_PROG(rename, struct task_struct *task, char *comm)
return 0;
}
+SEC("raw_tp/task_rename")
+int BPF_PROG(rename_raw_tp, struct task_struct *task, char *comm)
+{
+ return check_test_run(task, comm);
+}
+
+SEC("tp_btf/task_rename")
+int BPF_PROG(rename_tp_btf, struct task_struct *task, char *comm)
+{
+ return check_test_run(task, comm);
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
* [PATCH bpf-next 4/4] selftests/bpf: Add tests for get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-26 16:00 [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
` (2 preceding siblings ...)
2025-04-26 16:00 ` [PATCH bpf-next 3/4] selftests/bpf: Add raw_tp_test_run " KaFai Wan
@ 2025-04-26 16:00 ` KaFai Wan
3 siblings, 0 replies; 16+ messages in thread
From: KaFai Wan @ 2025-04-26 16:00 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah
Cc: linux-kernel, bpf, linux-trace-kernel, netdev, linux-kselftest,
leon.hwang, mannkafai
Add tests for the get_func_[arg|arg_cnt] helpers in raw tracepoint
programs, exercising the helpers from both raw_tp and tp_btf programs.
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
---
.../bpf/prog_tests/raw_tp_get_func_args.c | 48 +++++++++++++++++++
.../bpf/progs/test_raw_tp_get_func_args.c | 47 ++++++++++++++++++
2 files changed, 95 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_get_func_args.c
create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_get_func_args.c
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_get_func_args.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_get_func_args.c
new file mode 100644
index 000000000000..cbe9b441d8d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_get_func_args.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <linux/bpf.h>
+#include "bpf/libbpf_internal.h"
+#include "test_raw_tp_get_func_args.skel.h"
+
+static void test_raw_tp_args(bool is_tp_btf)
+{
+ __u64 args[2] = {0x1234ULL, 0x5678ULL};
+ int expected_retval = 0x1234 + 0x5678;
+ struct test_raw_tp_get_func_args *skel;
+ LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .ctx_in = args,
+ .ctx_size_in = sizeof(args),
+ );
+ int err, prog_fd;
+
+ skel = test_raw_tp_get_func_args__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ bpf_program__set_autoattach(skel->progs.tp_btf_test, is_tp_btf);
+ bpf_program__set_autoattach(skel->progs.raw_tp_test, !is_tp_btf);
+
+ err = test_raw_tp_get_func_args__attach(skel);
+ if (!ASSERT_OK(err, "skel_attach"))
+ goto cleanup;
+
+ if (is_tp_btf)
+ prog_fd = bpf_program__fd(skel->progs.tp_btf_test);
+ else
+ prog_fd = bpf_program__fd(skel->progs.raw_tp_test);
+ err = bpf_prog_test_run_opts(prog_fd, &opts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(opts.retval, expected_retval, "check_retval");
+ ASSERT_EQ(skel->bss->test_result, 1, "test_result");
+
+cleanup:
+ test_raw_tp_get_func_args__destroy(skel);
+}
+
+void test_raw_tp_get_func_args(void)
+{
+ if (test__start_subtest("raw_tp"))
+ test_raw_tp_args(false);
+ if (test__start_subtest("tp_btf"))
+ test_raw_tp_args(true);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_raw_tp_get_func_args.c b/tools/testing/selftests/bpf/progs/test_raw_tp_get_func_args.c
new file mode 100644
index 000000000000..5069bbd15283
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_raw_tp_get_func_args.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+__u64 test_result = 0;
+
+static __always_inline int check_args(void *ctx, struct task_struct *task,
+ char *comm)
+{
+ __u64 cnt = bpf_get_func_arg_cnt(ctx);
+ __u64 a = 0, b = 0, z = 0;
+ __s64 err;
+
+ if ((__u64)task != 0x1234ULL || (__u64)comm != 0x5678ULL)
+ return 0;
+
+ test_result = cnt == 2;
+
+ /* valid arguments */
+ err = bpf_get_func_arg(ctx, 0, &a);
+ test_result &= err == 0 && a == 0x1234ULL;
+
+ err = bpf_get_func_arg(ctx, 1, &b);
+ test_result &= err == 0 && b == 0x5678ULL;
+
+ /* not valid argument */
+ err = bpf_get_func_arg(ctx, 2, &z);
+ test_result &= err == -EINVAL;
+
+ return a + b;
+}
+
+SEC("raw_tp/task_rename")
+int BPF_PROG(raw_tp_test, struct task_struct *task, char *comm)
+{
+ return check_args(ctx, task, comm);
+}
+
+SEC("tp_btf/task_rename")
+int BPF_PROG(tp_btf_test, struct task_struct *task, char *comm)
+{
+ return check_args(ctx, task, comm);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.43.0
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-26 16:00 ` [PATCH bpf-next 1/4] " KaFai Wan
@ 2025-04-30 2:46 ` Alexei Starovoitov
2025-04-30 12:43 ` Kafai Wan
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-04-30 2:46 UTC (permalink / raw)
To: KaFai Wan
Cc: Song Liu, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, Leon Hwang
On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
>
> Adding support to use get_func_[arg|arg_cnt] helpers in raw_tp/tp_btf
> programs.
>
> We can use get_func_[arg|ret|arg_cnt] helpers in fentry/fexit/fmod_ret
> programs currently. If we try to use get_func_[arg|arg_cnt] helpers in
> raw_tp/tp_btf programs, verifier will fail to load the program with:
>
> ; __u64 cnt = bpf_get_func_arg_cnt(ctx);
> 3: (85) call bpf_get_func_arg_cnt#185
> unknown func bpf_get_func_arg_cnt#185
>
> Adding get_func_[arg|arg_cnt] helpers in raw_tp_prog_func_proto and
> tracing_prog_func_proto for raw tracepoint.
>
> Prepend one extra slot to the ctx of raw tracepoint programs and store the
> number of arguments at ctx-8, so it's easy to verify the argument index
> and find an argument's position.
>
> Signed-off-by: KaFai Wan <mannkafai@gmail.com>
> ---
> kernel/trace/bpf_trace.c | 17 ++++++++++++++---
> net/bpf/test_run.c | 13 +++++--------
> 2 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 52c432a44aeb..eb4c56013493 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1892,6 +1892,10 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_get_stackid_proto_raw_tp;
> case BPF_FUNC_get_stack:
> return &bpf_get_stack_proto_raw_tp;
> + case BPF_FUNC_get_func_arg:
> + return &bpf_get_func_arg_proto;
> + case BPF_FUNC_get_func_arg_cnt:
> + return &bpf_get_func_arg_cnt_proto;
> case BPF_FUNC_get_attach_cookie:
> return &bpf_get_attach_cookie_proto_tracing;
> default:
> @@ -1950,10 +1954,16 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> case BPF_FUNC_d_path:
> return &bpf_d_path_proto;
> case BPF_FUNC_get_func_arg:
> + if (prog->type == BPF_PROG_TYPE_TRACING &&
> + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> + return &bpf_get_func_arg_proto;
> return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_proto : NULL;
> case BPF_FUNC_get_func_ret:
> return bpf_prog_has_trampoline(prog) ? &bpf_get_func_ret_proto : NULL;
> case BPF_FUNC_get_func_arg_cnt:
> + if (prog->type == BPF_PROG_TYPE_TRACING &&
> + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> + return &bpf_get_func_arg_cnt_proto;
> return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_cnt_proto : NULL;
> case BPF_FUNC_get_attach_cookie:
> if (prog->type == BPF_PROG_TYPE_TRACING &&
> @@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
>
> #define SARG(X) u64 arg##X
> -#define COPY(X) args[X] = arg##X
> +#define COPY(X) args[X + 1] = arg##X
>
> #define __DL_COM (,)
> #define __DL_SEM (;)
> @@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
> REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
> { \
> - u64 args[x]; \
> + u64 args[x + 1]; \
> + args[0] = x; \
> REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
> - __bpf_trace_run(link, args); \
> + __bpf_trace_run(link, args + 1); \
This is neat, but what is this for?
The program that attaches to a particular raw_tp knows what it is
attaching to and how many arguments are there,
so bpf_get_func_arg_cnt() is a 5th wheel.
If the reason is "for completeness" then it's not a good reason
to penalize performance. Though it's just an extra 8 byte of stack
and a single store of a constant.
pw-bot: cr
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-30 2:46 ` Alexei Starovoitov
@ 2025-04-30 12:43 ` Kafai Wan
2025-04-30 15:54 ` Leon Hwang
2025-05-01 20:53 ` Andrii Nakryiko
0 siblings, 2 replies; 16+ messages in thread
From: Kafai Wan @ 2025-04-30 12:43 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, Leon Hwang
On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
> >
> > Adding support to use get_func_[arg|arg_cnt] helpers in raw_tp/tp_btf
> > programs.
> >
> > We can use get_func_[arg|ret|arg_cnt] helpers in fentry/fexit/fmod_ret
> > programs currently. If we try to use get_func_[arg|arg_cnt] helpers in
> > raw_tp/tp_btf programs, verifier will fail to load the program with:
> >
> > ; __u64 cnt = bpf_get_func_arg_cnt(ctx);
> > 3: (85) call bpf_get_func_arg_cnt#185
> > unknown func bpf_get_func_arg_cnt#185
> >
> > Adding get_func_[arg|arg_cnt] helpers in raw_tp_prog_func_proto and
> > tracing_prog_func_proto for raw tracepoint.
> >
> > Prepend one extra slot to the ctx of raw tracepoint programs and store the
> > number of arguments at ctx-8, so it's easy to verify the argument index
> > and find an argument's position.
> >
> > Signed-off-by: KaFai Wan <mannkafai@gmail.com>
> > ---
> > kernel/trace/bpf_trace.c | 17 ++++++++++++++---
> > net/bpf/test_run.c | 13 +++++--------
> > 2 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 52c432a44aeb..eb4c56013493 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1892,6 +1892,10 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > return &bpf_get_stackid_proto_raw_tp;
> > case BPF_FUNC_get_stack:
> > return &bpf_get_stack_proto_raw_tp;
> > + case BPF_FUNC_get_func_arg:
> > + return &bpf_get_func_arg_proto;
> > + case BPF_FUNC_get_func_arg_cnt:
> > + return &bpf_get_func_arg_cnt_proto;
> > case BPF_FUNC_get_attach_cookie:
> > return &bpf_get_attach_cookie_proto_tracing;
> > default:
> > @@ -1950,10 +1954,16 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > case BPF_FUNC_d_path:
> > return &bpf_d_path_proto;
> > case BPF_FUNC_get_func_arg:
> > + if (prog->type == BPF_PROG_TYPE_TRACING &&
> > + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> > + return &bpf_get_func_arg_proto;
> > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_proto : NULL;
> > case BPF_FUNC_get_func_ret:
> > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_ret_proto : NULL;
> > case BPF_FUNC_get_func_arg_cnt:
> > + if (prog->type == BPF_PROG_TYPE_TRACING &&
> > + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> > + return &bpf_get_func_arg_cnt_proto;
> > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_cnt_proto : NULL;
> > case BPF_FUNC_get_attach_cookie:
> > if (prog->type == BPF_PROG_TYPE_TRACING &&
> > @@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> > #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
> >
> > #define SARG(X) u64 arg##X
> > -#define COPY(X) args[X] = arg##X
> > +#define COPY(X) args[X + 1] = arg##X
> >
> > #define __DL_COM (,)
> > #define __DL_SEM (;)
> > @@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> > void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
> > REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
> > { \
> > - u64 args[x]; \
> > + u64 args[x + 1]; \
> > + args[0] = x; \
> > REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
> > - __bpf_trace_run(link, args); \
> > + __bpf_trace_run(link, args + 1); \
>
> This is neat, but what is this for?
> The program that attaches to a particular raw_tp knows what it is
> attaching to and how many arguments are there,
> so bpf_get_func_arg_cnt() is a 5th wheel.
>
> If the reason is "for completeness" then it's not a good reason
> to penalize performance. Though it's just an extra 8 byte of stack
> and a single store of a constant.
>
If we try to capture all arguments of a specific raw_tp in tracing programs,
we currently have to obtain the argument count from the format file in
debugfs or from BTF and pass that count to the BPF program via the .bss
section or a cookie (if available).
If we instead store the count in ctx and fetch it via the get_func_arg_cnt
helper in the BPF program:
a) It's easier and more efficient to get the argument count in the BPF
program.
b) A single BPF program can capture arguments for multiple raw_tps,
reducing the number of BPF programs needed for mass tracing.
Thanks,
KaFai
> pw-bot: cr
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-30 12:43 ` Kafai Wan
@ 2025-04-30 15:54 ` Leon Hwang
2025-04-30 16:53 ` Alexei Starovoitov
2025-05-01 20:53 ` Andrii Nakryiko
1 sibling, 1 reply; 16+ messages in thread
From: Leon Hwang @ 2025-04-30 15:54 UTC (permalink / raw)
To: Kafai Wan, Alexei Starovoitov
Cc: Song Liu, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK
On 2025/4/30 20:43, Kafai Wan wrote:
> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
>>>
[...]
>>> @@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
>>> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
>>>
>>> #define SARG(X) u64 arg##X
>>> -#define COPY(X) args[X] = arg##X
>>> +#define COPY(X) args[X + 1] = arg##X
>>>
>>> #define __DL_COM (,)
>>> #define __DL_SEM (;)
>>> @@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
>>> void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
>>> REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
>>> { \
>>> - u64 args[x]; \
>>> + u64 args[x + 1]; \
>>> + args[0] = x; \
>>> REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
>>> - __bpf_trace_run(link, args); \
>>> + __bpf_trace_run(link, args + 1); \
>>
>> This is neat, but what is this for?
>> The program that attaches to a particular raw_tp knows what it is
>> attaching to and how many arguments are there,
>> so bpf_get_func_arg_cnt() is a 5th wheel.
>>
>> If the reason is "for completeness" then it's not a good reason
>> to penalize performance. Though it's just an extra 8 byte of stack
>> and a single store of a constant.
>>
> If we try to capture all arguments of a specific raw_tp in tracing programs,
> We first obtain the arguments count from the format file in debugfs or BTF
> and pass this count to the BPF program via .bss section or cookie (if
> available).
>
> If we store the count in ctx and get it via get_func_arg_cnt helper in
> the BPF program,
> a) It's easier and more efficient to get the arguments count in the BPF program.
> b) It could use a single BPF program to capture arguments for multiple raw_tps,
> reduce the number of BPF programs when massive tracing.
>
bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
In bpfsnoop, it can generate a small snippet of bpf instructions to use
bpf_get_func_arg() for retrieving and filtering arguments. For example,
with the netif_receive_skb tracepoint, bpfsnoop can use
bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
a custom attribute-based filter. This will allow bpfsnoop to trace
multiple tracepoints with a single bpf program.
[1] https://github.com/bpfsnoop/bpfsnoop
[2] https://www.tcpdump.org/manpages/pcap-filter.7.html
Thanks,
Leon
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-30 15:54 ` Leon Hwang
@ 2025-04-30 16:53 ` Alexei Starovoitov
2025-05-02 14:25 ` Leon Hwang
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-04-30 16:53 UTC (permalink / raw)
To: Leon Hwang
Cc: Kafai Wan, Song Liu, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK
On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2025/4/30 20:43, Kafai Wan wrote:
> > On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
> >>>
>
> [...]
>
> >>> @@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> >>> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
> >>>
> >>> #define SARG(X) u64 arg##X
> >>> -#define COPY(X) args[X] = arg##X
> >>> +#define COPY(X) args[X + 1] = arg##X
> >>>
> >>> #define __DL_COM (,)
> >>> #define __DL_SEM (;)
> >>> @@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> >>> void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
> >>> REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
> >>> { \
> >>> - u64 args[x]; \
> >>> + u64 args[x + 1]; \
> >>> + args[0] = x; \
> >>> REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
> >>> - __bpf_trace_run(link, args); \
> >>> + __bpf_trace_run(link, args + 1); \
> >>
> >> This is neat, but what is this for?
> >> The program that attaches to a particular raw_tp knows what it is
> >> attaching to and how many arguments are there,
> >> so bpf_get_func_arg_cnt() is a 5th wheel.
> >>
> >> If the reason is "for completeness" then it's not a good reason
> >> to penalize performance. Though it's just an extra 8 byte of stack
> >> and a single store of a constant.
> >>
> > If we try to capture all arguments of a specific raw_tp in tracing programs,
> > We first obtain the arguments count from the format file in debugfs or BTF
> > and pass this count to the BPF program via .bss section or cookie (if
> > available).
> >
> > If we store the count in ctx and get it via get_func_arg_cnt helper in
> > the BPF program,
> > a) It's easier and more efficient to get the arguments count in the BPF program.
> > b) A single BPF program could capture arguments for multiple raw_tps,
> > reducing the number of BPF programs when tracing at scale.
> >
>
>
> bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
>
> In bpfsnoop, it can generate a small snippet of bpf instructions to use
> bpf_get_func_arg() for retrieving and filtering arguments. For example,
> with the netif_receive_skb tracepoint, bpfsnoop can use
> bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
> a custom attribute-based filter. This will allow bpfsnoop to trace
> multiple tracepoints using a single bpf program code.
I doubt you thought it through end to end.
When tracepoint prog attaches we have this check:
/*
* check that program doesn't access arguments beyond what's
* available in this tracepoint
*/
if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
return -EINVAL;
So you cannot have a single bpf prog attached to many tracepoints
to read many arguments as-is.
You can hack around that limit with probe_read,
but the values won't be trusted and you won't be able to pass
such untrusted pointers into skb and other helpers/kfuncs.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-30 12:43 ` Kafai Wan
2025-04-30 15:54 ` Leon Hwang
@ 2025-05-01 20:53 ` Andrii Nakryiko
1 sibling, 0 replies; 16+ messages in thread
From: Andrii Nakryiko @ 2025-05-01 20:53 UTC (permalink / raw)
To: Kafai Wan
Cc: Alexei Starovoitov, Song Liu, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK, Leon Hwang
On Wed, Apr 30, 2025 at 5:44 AM Kafai Wan <mannkafai@gmail.com> wrote:
>
> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
> > >
> > > Adding support to use get_func_[arg|arg_cnt] helpers in raw_tp/tp_btf
> > > programs.
> > >
> > > We can use get_func_[arg|ret|arg_cnt] helpers in fentry/fexit/fmod_ret
> > > programs currently. If we try to use get_func_[arg|arg_cnt] helpers in
> > > raw_tp/tp_btf programs, verifier will fail to load the program with:
> > >
> > > ; __u64 cnt = bpf_get_func_arg_cnt(ctx);
> > > 3: (85) call bpf_get_func_arg_cnt#185
> > > unknown func bpf_get_func_arg_cnt#185
> > >
> > > Adding get_func_[arg|arg_cnt] helpers in raw_tp_prog_func_proto and
> > > tracing_prog_func_proto for raw tracepoint.
> > >
> > > Add one argument to the ctx of raw tracepoint programs and make it store the
> > > number of arguments at ctx-8, so it's easy to verify the argument index and
> > > find an argument's position.
> > >
> > > Signed-off-by: KaFai Wan <mannkafai@gmail.com>
> > > ---
> > > kernel/trace/bpf_trace.c | 17 ++++++++++++++---
> > > net/bpf/test_run.c | 13 +++++--------
> > > 2 files changed, 19 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 52c432a44aeb..eb4c56013493 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -1892,6 +1892,10 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > > return &bpf_get_stackid_proto_raw_tp;
> > > case BPF_FUNC_get_stack:
> > > return &bpf_get_stack_proto_raw_tp;
> > > + case BPF_FUNC_get_func_arg:
> > > + return &bpf_get_func_arg_proto;
> > > + case BPF_FUNC_get_func_arg_cnt:
> > > + return &bpf_get_func_arg_cnt_proto;
> > > case BPF_FUNC_get_attach_cookie:
> > > return &bpf_get_attach_cookie_proto_tracing;
> > > default:
> > > @@ -1950,10 +1954,16 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > > case BPF_FUNC_d_path:
> > > return &bpf_d_path_proto;
> > > case BPF_FUNC_get_func_arg:
> > > + if (prog->type == BPF_PROG_TYPE_TRACING &&
> > > + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> > > + return &bpf_get_func_arg_proto;
> > > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_proto : NULL;
> > > case BPF_FUNC_get_func_ret:
> > > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_ret_proto : NULL;
> > > case BPF_FUNC_get_func_arg_cnt:
> > > + if (prog->type == BPF_PROG_TYPE_TRACING &&
> > > + prog->expected_attach_type == BPF_TRACE_RAW_TP)
> > > + return &bpf_get_func_arg_cnt_proto;
> > > return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_cnt_proto : NULL;
> > > case BPF_FUNC_get_attach_cookie:
> > > if (prog->type == BPF_PROG_TYPE_TRACING &&
> > > @@ -2312,7 +2322,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> > > #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
> > >
> > > #define SARG(X) u64 arg##X
> > > -#define COPY(X) args[X] = arg##X
> > > +#define COPY(X) args[X + 1] = arg##X
> > >
> > > #define __DL_COM (,)
> > > #define __DL_SEM (;)
> > > @@ -2323,9 +2333,10 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
> > > void bpf_trace_run##x(struct bpf_raw_tp_link *link, \
> > > REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
> > > { \
> > > - u64 args[x]; \
> > > + u64 args[x + 1]; \
> > > + args[0] = x; \
> > > REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
> > > - __bpf_trace_run(link, args); \
> > > + __bpf_trace_run(link, args + 1); \
> >
> > This is neat, but what is this for?
> > The program that attaches to a particular raw_tp knows what it is
> > attaching to and how many arguments are there,
> > so bpf_get_func_arg_cnt() is a 5th wheel.
> >
> > If the reason is "for completeness" then it's not a good reason
> > to penalize performance. Though it's just an extra 8 byte of stack
> > and a single store of a constant.
> >
> If we try to capture all arguments of a specific raw_tp in tracing programs,
> we first obtain the argument count from the format file in debugfs or BTF
> and pass this count to the BPF program via .bss section or cookie (if
> available).
To do anything useful with those arguments besides printing their
values in hex you'd need to look up BTF anyway, no? So at that point,
what's the problem with just passing the number of arguments as a BPF
cookie?
And then bpf_probe_read_kernel(..., cnt * 8, ctx)?
>
> If we store the count in ctx and get it via get_func_arg_cnt helper in
> the BPF program,
> a) It's easier and more efficient to get the arguments count in the BPF program.
> b) A single BPF program could capture arguments for multiple raw_tps,
> reducing the number of BPF programs when tracing at scale.
>
> Thanks,
> KaFai
>
> > pw-bot: cr
* Re: [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf
2025-04-26 16:00 ` [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf KaFai Wan
@ 2025-05-01 20:55 ` Andrii Nakryiko
0 siblings, 0 replies; 16+ messages in thread
From: Andrii Nakryiko @ 2025-05-01 20:55 UTC (permalink / raw)
To: KaFai Wan
Cc: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
mattbobrowski, rostedt, mhiramat, mathieu.desnoyers, davem,
edumazet, kuba, pabeni, horms, mykolal, shuah, linux-kernel, bpf,
linux-trace-kernel, netdev, linux-kselftest, leon.hwang
On Sat, Apr 26, 2025 at 9:01 AM KaFai Wan <mannkafai@gmail.com> wrote:
>
> Add .test_run for tp_btf. Use the .test_run for raw_tp.
Hm... so now you'll be able to pass arbitrary values as pointers to
kernel structs (e.g., arbitrary u64 as struct task_struct * pointer),
not sure this is a good idea...
>
> Signed-off-by: KaFai Wan <mannkafai@gmail.com>
> ---
> net/bpf/test_run.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 8cb285187270..8c901ec92341 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -690,6 +690,9 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
> int b = 2, err = -EFAULT;
> u32 retval = 0;
>
> + if (prog->expected_attach_type == BPF_TRACE_RAW_TP)
> + return bpf_prog_test_run_raw_tp(prog, kattr, uattr);
> +
> if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
> return -EINVAL;
>
> --
> 2.43.0
>
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-04-30 16:53 ` Alexei Starovoitov
@ 2025-05-02 14:25 ` Leon Hwang
2025-05-06 21:01 ` Andrii Nakryiko
0 siblings, 1 reply; 16+ messages in thread
From: Leon Hwang @ 2025-05-02 14:25 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Kafai Wan, Song Liu, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mykola Lysenko, Shuah Khan, LKML, bpf,
linux-trace-kernel, Network Development,
open list:KERNEL SELFTEST FRAMEWORK
On 2025/5/1 00:53, Alexei Starovoitov wrote:
> On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 2025/4/30 20:43, Kafai Wan wrote:
>>> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
>>>>>
>>
[...]
>>
>>
>> bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
>>
>> In bpfsnoop, it can generate a small snippet of bpf instructions to use
>> bpf_get_func_arg() for retrieving and filtering arguments. For example,
>> with the netif_receive_skb tracepoint, bpfsnoop can use
>> bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
>> a custom attribute-based filter. This will allow bpfsnoop to trace
>> multiple tracepoints using a single bpf program code.
>
> I doubt you thought it through end to end.
> When tracepoint prog attaches we have this check:
> /*
> * check that program doesn't access arguments beyond what's
> * available in this tracepoint
> */
> if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> return -EINVAL;
>
> So you cannot have a single bpf prog attached to many tracepoints
> to read many arguments as-is.
> You can hack around that limit with probe_read,
> but the values won't be trusted and you won't be able to pass
> such untrusted pointers into skb and other helpers/kfuncs.
I understand that a single bpf program cannot be attached to multiple
tracepoints using tp_btf. However, the same bpf code can be reused to
create multiple bpf programs, each attached to a different tracepoint.
For example:
SEC("fentry")
int BPF_PROG(fentry_fn)
{
/* ... */
return BPF_OK;
}
The above fentry code can be compiled into multiple bpf programs to
trace different kernel functions. Each program can then use the
bpf_get_func_arg() helper to access the arguments of the traced function.
With this patch, tp_btf will gain similar flexibility. For example:
SEC("tp_btf")
int BPF_PROG(tp_btf_fn)
{
/* ... */
return BPF_OK;
}
Here, bpf_get_func_arg() can be used to access tracepoint arguments.
Currently, due to the lack of bpf_get_func_arg() support in tp_btf,
bpfsnoop[1] uses bpf_probe_read_kernel() to read tracepoint arguments.
This is also used when filtering specific argument attributes.
For instance, to filter the skb argument of the netif_receive_skb
tracepoint by 'skb->dev->ifindex == 2', the translated bpf instructions
with bpf_probe_read_kernel() would look like this:
bool filter_arg(__u64 * args):
; filter_arg(__u64 *args)
209: (79) r1 = *(u64 *)(r1 +0) /* all of the tracepoint's arguments have been
read into args using bpf_probe_read_kernel() */
210: (bf) r3 = r1
211: (07) r3 += 16
212: (b7) r2 = 8
213: (bf) r1 = r10
214: (07) r1 += -8
215: (85) call bpf_probe_read_kernel#-125280
216: (79) r3 = *(u64 *)(r10 -8)
217: (15) if r3 == 0x0 goto pc+10
218: (07) r3 += 224
219: (b7) r2 = 8
220: (bf) r1 = r10
221: (07) r1 += -8
222: (85) call bpf_probe_read_kernel#-125280
223: (79) r3 = *(u64 *)(r10 -8)
224: (67) r3 <<= 32
225: (77) r3 >>= 32
226: (b7) r0 = 1
227: (15) if r3 == 0x2 goto pc+1
228: (af) r0 ^= r0
229: (95) exit
If bpf_get_func_arg() is supported in tp_btf, the bpf program will
instead look like:
static __noinline bool
filter_skb(void *ctx)
{
struct sk_buff *skb;
(void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
return skb->dev->ifindex == 2;
}
This will simplify the generated code and eliminate the need for
bpf_probe_read_kernel() calls. However, in my tests (on kernel
6.8.0-35-generic, Ubuntu 24.04 LTS), the pointer returned by
bpf_get_func_arg() is marked as a scalar rather than a trusted pointer:
0: R1=ctx() R10=fp0
; if (!filter_skb(ctx))
0: (85) call pc+3
caller:
R10=fp0
callee:
frame1: R1=ctx() R10=fp0
4: frame1: R1=ctx() R10=fp0
; filter_skb(void *ctx)
4: (bf) r3 = r10 ; frame1: R3_w=fp0 R10=fp0
;
5: (07) r3 += -8 ; frame1: R3_w=fp-8
; (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
6: (b7) r2 = 0 ; frame1: R2_w=0
7: (85) call bpf_get_func_arg#183 ; frame1: R0_w=scalar()
; return skb->dev->ifindex == 2;
8: (79) r1 = *(u64 *)(r10 -8) ; frame1: R1_w=scalar() R10=fp0
fp-8=mmmmmmmm
; return skb->dev->ifindex == 2;
9: (79) r1 = *(u64 *)(r1 +16)
R1 invalid mem access 'scalar'
processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
If the returned skb is a trusted pointer, the verifier will accept
something like:
static __noinline bool
filter_skb(struct sk_buff *skb)
{
return skb->dev->ifindex == 2;
}
Which will compile into much simpler and more efficient instructions:
bool filter_skb(struct sk_buff * skb):
; return skb->dev->ifindex == 2;
92: (79) r1 = *(u64 *)(r1 +16)
; return skb->dev->ifindex == 2;
93: (61) r1 = *(u32 *)(r1 +224)
94: (b7) r0 = 1
; return skb->dev->ifindex == 2;
95: (15) if r1 == 0x2 goto pc+1
96: (b7) r0 = 0
; return skb->dev->ifindex == 2;
97: (95) exit
In conclusion:
1. It would be better if the pointer returned by bpf_get_func_arg() were
trusted, at least when the argument index is a known constant.
2. Adding bpf_get_func_arg() support to tp_btf will significantly
simplify and improve tools like bpfsnoop.
[1] https://github.com/bpfsnoop/bpfsnoop
Thanks,
Leon
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-05-02 14:25 ` Leon Hwang
@ 2025-05-06 21:01 ` Andrii Nakryiko
2025-05-12 11:12 ` Leon Hwang
0 siblings, 1 reply; 16+ messages in thread
From: Andrii Nakryiko @ 2025-05-06 21:01 UTC (permalink / raw)
To: Leon Hwang
Cc: Alexei Starovoitov, Kafai Wan, Song Liu, Jiri Olsa,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Yonghong Song, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Mykola Lysenko, Shuah Khan, LKML, bpf, linux-trace-kernel,
Network Development, open list:KERNEL SELFTEST FRAMEWORK
On Fri, May 2, 2025 at 7:26 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2025/5/1 00:53, Alexei Starovoitov wrote:
> > On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 2025/4/30 20:43, Kafai Wan wrote:
> >>> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>>>
> >>>> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
> >>>>>
> >>
>
> [...]
>
> >>
> >>
> >> bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
> >>
> >> In bpfsnoop, it can generate a small snippet of bpf instructions to use
> >> bpf_get_func_arg() for retrieving and filtering arguments. For example,
> >> with the netif_receive_skb tracepoint, bpfsnoop can use
> >> bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
> >> a custom attribute-based filter. This will allow bpfsnoop to trace
> >> multiple tracepoints using a single bpf program code.
> >
> > I doubt you thought it through end to end.
> > When tracepoint prog attaches we have this check:
> > /*
> > * check that program doesn't access arguments beyond what's
> > * available in this tracepoint
> > */
> > if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> > return -EINVAL;
> >
> > So you cannot have a single bpf prog attached to many tracepoints
> > to read many arguments as-is.
> > You can hack around that limit with probe_read,
> > but the values won't be trusted and you won't be able to pass
> > such untrusted pointers into skb and other helpers/kfuncs.
>
> I understand that a single bpf program cannot be attached to multiple
> tracepoints using tp_btf. However, the same bpf code can be reused to
> create multiple bpf programs, each attached to a different tracepoint.
>
> For example:
>
> SEC("fentry")
> int BPF_PROG(fentry_fn)
> {
> /* ... */
> return BPF_OK;
> }
>
> The above fentry code can be compiled into multiple bpf programs to
> trace different kernel functions. Each program can then use the
> bpf_get_func_arg() helper to access the arguments of the traced function.
>
> With this patch, tp_btf will gain similar flexibility. For example:
>
> SEC("tp_btf")
> int BPF_PROG(tp_btf_fn)
> {
> /* ... */
> return BPF_OK;
> }
>
> Here, bpf_get_func_arg() can be used to access tracepoint arguments.
>
> Currently, due to the lack of bpf_get_func_arg() support in tp_btf,
> bpfsnoop[1] uses bpf_probe_read_kernel() to read tracepoint arguments.
> This is also used when filtering specific argument attributes.
>
> For instance, to filter the skb argument of the netif_receive_skb
> tracepoint by 'skb->dev->ifindex == 2', the translated bpf instructions
> with bpf_probe_read_kernel() would look like this:
>
> bool filter_arg(__u64 * args):
> ; filter_arg(__u64 *args)
> 209: (79) r1 = *(u64 *)(r1 +0) /* all tracepoint's argument has been
> read into args using bpf_probe_read_kernel() */
> 210: (bf) r3 = r1
> 211: (07) r3 += 16
> 212: (b7) r2 = 8
> 213: (bf) r1 = r10
> 214: (07) r1 += -8
> 215: (85) call bpf_probe_read_kernel#-125280
> 216: (79) r3 = *(u64 *)(r10 -8)
> 217: (15) if r3 == 0x0 goto pc+10
> 218: (07) r3 += 224
> 219: (b7) r2 = 8
> 220: (bf) r1 = r10
> 221: (07) r1 += -8
> 222: (85) call bpf_probe_read_kernel#-125280
> 223: (79) r3 = *(u64 *)(r10 -8)
> 224: (67) r3 <<= 32
> 225: (77) r3 >>= 32
> 226: (b7) r0 = 1
> 227: (15) if r3 == 0x2 goto pc+1
> 228: (af) r0 ^= r0
> 229: (95) exit
>
> If bpf_get_func_arg() is supported in tp_btf, the bpf program will
> instead look like:
>
> static __noinline bool
> filter_skb(void *ctx)
> {
> struct sk_buff *skb;
>
> (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
> return skb->dev->ifindex == 2;
> }
>
> This will simplify the generated code and eliminate the need for
> bpf_probe_read_kernel() calls. However, in my tests (on kernel
> 6.8.0-35-generic, Ubuntu 24.04 LTS), the pointer returned by
> bpf_get_func_arg() is marked as a scalar rather than a trusted pointer:
>
> 0: R1=ctx() R10=fp0
> ; if (!filter_skb(ctx))
> 0: (85) call pc+3
> caller:
> R10=fp0
> callee:
> frame1: R1=ctx() R10=fp0
> 4: frame1: R1=ctx() R10=fp0
> ; filter_skb(void *ctx)
> 4: (bf) r3 = r10 ; frame1: R3_w=fp0 R10=fp0
> ;
> 5: (07) r3 += -8 ; frame1: R3_w=fp-8
> ; (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
> 6: (b7) r2 = 0 ; frame1: R2_w=0
> 7: (85) call bpf_get_func_arg#183 ; frame1: R0_w=scalar()
> ; return skb->dev->ifindex == 2;
> 8: (79) r1 = *(u64 *)(r10 -8) ; frame1: R1_w=scalar() R10=fp0
> fp-8=mmmmmmmm
> ; return skb->dev->ifindex == 2;
> 9: (79) r1 = *(u64 *)(r1 +16)
> R1 invalid mem access 'scalar'
> processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0
> peak_states 0 mark_read 0
>
> If the returned skb is a trusted pointer, the verifier will accept
> something like:
>
> static __noinline bool
> filter_skb(struct sk_buff *skb)
> {
> return skb->dev->ifindex == 2;
> }
>
> Which will compile into much simpler and more efficient instructions:
>
> bool filter_skb(struct sk_buff * skb):
> ; return skb->dev->ifindex == 2;
> 92: (79) r1 = *(u64 *)(r1 +16)
> ; return skb->dev->ifindex == 2;
> 93: (61) r1 = *(u32 *)(r1 +224)
> 94: (b7) r0 = 1
> ; return skb->dev->ifindex == 2;
> 95: (15) if r1 == 0x2 goto pc+1
> 96: (b7) r0 = 0
> ; return skb->dev->ifindex == 2;
> 97: (95) exit
>
> In conclusion:
>
> 1. It will be better if the pointer returned by bpf_get_func_arg() is
> trusted, only when the argument index is a known constant.
bpf_get_func_arg() was never meant to return trusted arguments, so
this, IMO, is pushing it too far.
> 2. Adding bpf_get_func_arg() support to tp_btf will significantly
> simplify and improve tools like bpfsnoop.
"Significantly simplify and improve" is a bit of an exaggeration,
given that BPF cookies can be used to get the number of arguments of
a tp_btf program. As for getting rid of bpf_probe_read_kernel(), tbh, a
more generally useful addition would be an untyped counterpart to
bpf_core_cast(), which wouldn't need BTF type information but would
treat all accessed memory as raw bytes (while still installing an
exception handler, just like bpf_core_cast()).
>
> [1] https://github.com/bpfsnoop/bpfsnoop
>
> Thanks,
> Leon
>
>
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-05-06 21:01 ` Andrii Nakryiko
@ 2025-05-12 11:12 ` Leon Hwang
2025-05-12 15:25 ` Alexei Starovoitov
0 siblings, 1 reply; 16+ messages in thread
From: Leon Hwang @ 2025-05-12 11:12 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Kafai Wan, Song Liu, Jiri Olsa,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Yonghong Song, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Mykola Lysenko, Shuah Khan, LKML, bpf, linux-trace-kernel,
Network Development, open list:KERNEL SELFTEST FRAMEWORK
On 2025/5/7 05:01, Andrii Nakryiko wrote:
> On Fri, May 2, 2025 at 7:26 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 2025/5/1 00:53, Alexei Starovoitov wrote:
>>> On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 2025/4/30 20:43, Kafai Wan wrote:
>>>>> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>
>>>>>> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
>>>>>>>
>>>>
>>
>> [...]
>>
>>>>
>>>>
>>>> bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
>>>>
>>>> In bpfsnoop, it can generate a small snippet of bpf instructions to use
>>>> bpf_get_func_arg() for retrieving and filtering arguments. For example,
>>>> with the netif_receive_skb tracepoint, bpfsnoop can use
>>>> bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
>>>> a custom attribute-based filter. This will allow bpfsnoop to trace
>>>> multiple tracepoints using a single bpf program code.
>>>
>>> I doubt you thought it through end to end.
>>> When tracepoint prog attaches we have this check:
>>> /*
>>> * check that program doesn't access arguments beyond what's
>>> * available in this tracepoint
>>> */
>>> if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
>>> return -EINVAL;
>>>
>>> So you cannot have a single bpf prog attached to many tracepoints
>>> to read many arguments as-is.
>>> You can hack around that limit with probe_read,
>>> but the values won't be trusted and you won't be able to pass
>>> such untrusted pointers into skb and other helpers/kfuncs.
>>
>> I understand that a single bpf program cannot be attached to multiple
>> tracepoints using tp_btf. However, the same bpf code can be reused to
>> create multiple bpf programs, each attached to a different tracepoint.
>>
>> For example:
>>
>> SEC("fentry")
>> int BPF_PROG(fentry_fn)
>> {
>> /* ... */
>> return BPF_OK;
>> }
>>
>> The above fentry code can be compiled into multiple bpf programs to
>> trace different kernel functions. Each program can then use the
>> bpf_get_func_arg() helper to access the arguments of the traced function.
>>
>> With this patch, tp_btf will gain similar flexibility. For example:
>>
>> SEC("tp_btf")
>> int BPF_PROG(tp_btf_fn)
>> {
>> /* ... */
>> return BPF_OK;
>> }
>>
>> Here, bpf_get_func_arg() can be used to access tracepoint arguments.
>>
>> Currently, due to the lack of bpf_get_func_arg() support in tp_btf,
>> bpfsnoop[1] uses bpf_probe_read_kernel() to read tracepoint arguments.
>> This is also used when filtering specific argument attributes.
>>
>> For instance, to filter the skb argument of the netif_receive_skb
>> tracepoint by 'skb->dev->ifindex == 2', the translated bpf instructions
>> with bpf_probe_read_kernel() would look like this:
>>
>> bool filter_arg(__u64 * args):
>> ; filter_arg(__u64 *args)
>> 209: (79) r1 = *(u64 *)(r1 +0) /* all tracepoint's argument has been
>> read into args using bpf_probe_read_kernel() */
>> 210: (bf) r3 = r1
>> 211: (07) r3 += 16
>> 212: (b7) r2 = 8
>> 213: (bf) r1 = r10
>> 214: (07) r1 += -8
>> 215: (85) call bpf_probe_read_kernel#-125280
>> 216: (79) r3 = *(u64 *)(r10 -8)
>> 217: (15) if r3 == 0x0 goto pc+10
>> 218: (07) r3 += 224
>> 219: (b7) r2 = 8
>> 220: (bf) r1 = r10
>> 221: (07) r1 += -8
>> 222: (85) call bpf_probe_read_kernel#-125280
>> 223: (79) r3 = *(u64 *)(r10 -8)
>> 224: (67) r3 <<= 32
>> 225: (77) r3 >>= 32
>> 226: (b7) r0 = 1
>> 227: (15) if r3 == 0x2 goto pc+1
>> 228: (af) r0 ^= r0
>> 229: (95) exit
>>
>> If bpf_get_func_arg() is supported in tp_btf, the bpf program will
>> instead look like:
>>
>> static __noinline bool
>> filter_skb(void *ctx)
>> {
>> struct sk_buff *skb;
>>
>> (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
>> return skb->dev->ifindex == 2;
>> }
>>
>> This will simplify the generated code and eliminate the need for
>> bpf_probe_read_kernel() calls. However, in my tests (on kernel
>> 6.8.0-35-generic, Ubuntu 24.04 LTS), the pointer returned by
>> bpf_get_func_arg() is marked as a scalar rather than a trusted pointer:
>>
>> 0: R1=ctx() R10=fp0
>> ; if (!filter_skb(ctx))
>> 0: (85) call pc+3
>> caller:
>> R10=fp0
>> callee:
>> frame1: R1=ctx() R10=fp0
>> 4: frame1: R1=ctx() R10=fp0
>> ; filter_skb(void *ctx)
>> 4: (bf) r3 = r10 ; frame1: R3_w=fp0 R10=fp0
>> ;
>> 5: (07) r3 += -8 ; frame1: R3_w=fp-8
>> ; (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
>> 6: (b7) r2 = 0 ; frame1: R2_w=0
>> 7: (85) call bpf_get_func_arg#183 ; frame1: R0_w=scalar()
>> ; return skb->dev->ifindex == 2;
>> 8: (79) r1 = *(u64 *)(r10 -8) ; frame1: R1_w=scalar() R10=fp0
>> fp-8=mmmmmmmm
>> ; return skb->dev->ifindex == 2;
>> 9: (79) r1 = *(u64 *)(r1 +16)
>> R1 invalid mem access 'scalar'
>> processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0
>> peak_states 0 mark_read 0
>>
>> If the returned skb is a trusted pointer, the verifier will accept
>> something like:
>>
>> static __noinline bool
>> filter_skb(struct sk_buff *skb)
>> {
>> return skb->dev->ifindex == 2;
>> }
>>
>> Which will compile into much simpler and more efficient instructions:
>>
>> bool filter_skb(struct sk_buff * skb):
>> ; return skb->dev->ifindex == 2;
>> 92: (79) r1 = *(u64 *)(r1 +16)
>> ; return skb->dev->ifindex == 2;
>> 93: (61) r1 = *(u32 *)(r1 +224)
>> 94: (b7) r0 = 1
>> ; return skb->dev->ifindex == 2;
>> 95: (15) if r1 == 0x2 goto pc+1
>> 96: (b7) r0 = 0
>> ; return skb->dev->ifindex == 2;
>> 97: (95) exit
>>
>> In conclusion:
>>
>> 1. It will be better if the pointer returned by bpf_get_func_arg() is
>> trusted, only when the argument index is a known constant.
>
> bpf_get_func_arg() was never meant to return trusted arguments, so
> this, IMO, is pushing it too far.
>
>> 2. Adding bpf_get_func_arg() support to tp_btf will significantly
>> simplify and improve tools like bpfsnoop.
>
> "Significantly simplify and improve" is a bit of an exaggeration,
> given BPF cookies can be used for getting number of arguments of
> tp_btf, as for the getting rid of bpf_probe_read_kernel(), tbh, more
> generally useful addition would be an untyped counterpart to
> bpf_core_cast(), which wouldn't need BTF type information, but will
> treat all accessed memory as raw bytes (but will still install
> exception handler just like with bpf_core_cast()).
>
Cool! The bpf_rdonly_cast() kfunc used by the bpf_core_cast() macro
works well in bpfsnoop.
The expression 'skb->dev->ifindex == 2' is translated into:
bool filter_arg(__u64 * args):
; filter_arg(__u64 *args)
209: (bf) r9 = r1
210: (79) r8 = *(u64 *)(r9 +0)
211: (bf) r1 = r8
212: (b7) r2 = 6973
213: (bf) r0 = r1
214: (79) r1 = *(u64 *)(r0 +16)
215: (15) if r1 == 0x0 goto pc+12
216: (07) r1 += 224
217: (bf) r3 = r1
218: (b7) r2 = 8
219: (bf) r1 = r10
220: (07) r1 += -8
221: (85) call bpf_probe_read_kernel#-125280
222: (79) r8 = *(u64 *)(r10 -8)
223: (67) r8 <<= 32
224: (77) r8 >>= 32
225: (55) if r8 != 0x2 goto pc+2
226: (b7) r8 = 1
227: (05) goto pc+1
228: (af) r8 ^= r8
229: (bf) r0 = r8
230: (95) exit
However, since bpf_rdonly_cast() is a kfunc, it causes registers r1–r5
to be considered volatile.
If the verifier could trust the pointer fetched by bpf_get_func_arg(),
this extra cost from bpf_rdonly_cast() could be avoided.
Thanks,
Leon
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-05-12 11:12 ` Leon Hwang
@ 2025-05-12 15:25 ` Alexei Starovoitov
2025-05-12 16:01 ` Leon Hwang
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-05-12 15:25 UTC (permalink / raw)
To: Leon Hwang
Cc: Andrii Nakryiko, Kafai Wan, Song Liu, Jiri Olsa,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Yonghong Song, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Mykola Lysenko, Shuah Khan, LKML, bpf, linux-trace-kernel,
Network Development, open list:KERNEL SELFTEST FRAMEWORK
On Mon, May 12, 2025 at 4:12 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 2025/5/7 05:01, Andrii Nakryiko wrote:
> > On Fri, May 2, 2025 at 7:26 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 2025/5/1 00:53, Alexei Starovoitov wrote:
> >>> On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2025/4/30 20:43, Kafai Wan wrote:
> >>>>> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov
> >>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>>>
> >>>>>> On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan <mannkafai@gmail.com> wrote:
> >>>>>>>
> >>>>
> >>
> >> [...]
> >>
> >>>>
> >>>>
> >>>> bpf_get_func_arg() will be very helpful for bpfsnoop[1] when tracing tp_btf.
> >>>>
> >>>> In bpfsnoop, it can generate a small snippet of bpf instructions to use
> >>>> bpf_get_func_arg() for retrieving and filtering arguments. For example,
> >>>> with the netif_receive_skb tracepoint, bpfsnoop can use
> >>>> bpf_get_func_arg() to filter the skb argument using pcap-filter(7)[2] or
> >>>> a custom attribute-based filter. This will allow bpfsnoop to trace
> >>>> multiple tracepoints using a single bpf program code.
> >>>
> >>> I doubt you thought it through end to end.
> >>> When tracepoint prog attaches we have this check:
> >>> /*
> >>> * check that program doesn't access arguments beyond what's
> >>> * available in this tracepoint
> >>> */
> >>> if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64))
> >>> return -EINVAL;
> >>>
> >>> So you cannot have a single bpf prog attached to many tracepoints
> >>> to read many arguments as-is.
> >>> You can hack around that limit with probe_read,
> >>> but the values won't be trusted and you won't be able to pass
> >>> such untrusted pointers into skb and other helpers/kfuncs.
> >>
> >> I understand that a single bpf program cannot be attached to multiple
> >> tracepoints using tp_btf. However, the same bpf code can be reused to
> >> create multiple bpf programs, each attached to a different tracepoint.
> >>
> >> For example:
> >>
> >> SEC("fentry")
> >> int BPF_PROG(fentry_fn)
> >> {
> >> /* ... */
> >> return BPF_OK;
> >> }
> >>
> >> The above fentry code can be compiled into multiple bpf programs to
> >> trace different kernel functions. Each program can then use the
> >> bpf_get_func_arg() helper to access the arguments of the traced function.
> >>
> >> With this patch, tp_btf will gain similar flexibility. For example:
> >>
> >> SEC("tp_btf")
> >> int BPF_PROG(tp_btf_fn)
> >> {
> >> /* ... */
> >> return BPF_OK;
> >> }
> >>
> >> Here, bpf_get_func_arg() can be used to access tracepoint arguments.
> >>
> >> Currently, due to the lack of bpf_get_func_arg() support in tp_btf,
> >> bpfsnoop[1] uses bpf_probe_read_kernel() to read tracepoint arguments.
> >> This is also used when filtering specific argument attributes.
> >>
> >> For instance, to filter the skb argument of the netif_receive_skb
> >> tracepoint by 'skb->dev->ifindex == 2', the translated bpf instructions
> >> with bpf_probe_read_kernel() would look like this:
> >>
> >> bool filter_arg(__u64 * args):
> >> ; filter_arg(__u64 *args)
> >> 209: (79) r1 = *(u64 *)(r1 +0) /* all tracepoint arguments have been
> >> read into args using bpf_probe_read_kernel() */
> >> 210: (bf) r3 = r1
> >> 211: (07) r3 += 16
> >> 212: (b7) r2 = 8
> >> 213: (bf) r1 = r10
> >> 214: (07) r1 += -8
> >> 215: (85) call bpf_probe_read_kernel#-125280
> >> 216: (79) r3 = *(u64 *)(r10 -8)
> >> 217: (15) if r3 == 0x0 goto pc+10
> >> 218: (07) r3 += 224
> >> 219: (b7) r2 = 8
> >> 220: (bf) r1 = r10
> >> 221: (07) r1 += -8
> >> 222: (85) call bpf_probe_read_kernel#-125280
> >> 223: (79) r3 = *(u64 *)(r10 -8)
> >> 224: (67) r3 <<= 32
> >> 225: (77) r3 >>= 32
> >> 226: (b7) r0 = 1
> >> 227: (15) if r3 == 0x2 goto pc+1
> >> 228: (af) r0 ^= r0
> >> 229: (95) exit
> >>
> >> If bpf_get_func_arg() is supported in tp_btf, the bpf program will
> >> instead look like:
> >>
> >> static __noinline bool
> >> filter_skb(void *ctx)
> >> {
> >> struct sk_buff *skb;
> >>
> >> (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
> >> return skb->dev->ifindex == 2;
> >> }
> >>
> >> This will simplify the generated code and eliminate the need for
> >> bpf_probe_read_kernel() calls. However, in my tests (on kernel
> >> 6.8.0-35-generic, Ubuntu 24.04 LTS), the pointer returned by
> >> bpf_get_func_arg() is marked as a scalar rather than a trusted pointer:
> >>
> >> 0: R1=ctx() R10=fp0
> >> ; if (!filter_skb(ctx))
> >> 0: (85) call pc+3
> >> caller:
> >> R10=fp0
> >> callee:
> >> frame1: R1=ctx() R10=fp0
> >> 4: frame1: R1=ctx() R10=fp0
> >> ; filter_skb(void *ctx)
> >> 4: (bf) r3 = r10 ; frame1: R3_w=fp0 R10=fp0
> >> ;
> >> 5: (07) r3 += -8 ; frame1: R3_w=fp-8
> >> ; (void) bpf_get_func_arg(ctx, 0, (__u64 *) &skb);
> >> 6: (b7) r2 = 0 ; frame1: R2_w=0
> >> 7: (85) call bpf_get_func_arg#183 ; frame1: R0_w=scalar()
> >> ; return skb->dev->ifindex == 2;
> >> 8: (79) r1 = *(u64 *)(r10 -8) ; frame1: R1_w=scalar() R10=fp0
> >> fp-8=mmmmmmmm
> >> ; return skb->dev->ifindex == 2;
> >> 9: (79) r1 = *(u64 *)(r1 +16)
> >> R1 invalid mem access 'scalar'
> >> processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0
> >> peak_states 0 mark_read 0
> >>
> >> If the returned skb is a trusted pointer, the verifier will accept
> >> something like:
> >>
> >> static __noinline bool
> >> filter_skb(struct sk_buff *skb)
> >> {
> >> return skb->dev->ifindex == 2;
> >> }
> >>
> >> Which will compile into much simpler and more efficient instructions:
> >>
> >> bool filter_skb(struct sk_buff * skb):
> >> ; return skb->dev->ifindex == 2;
> >> 92: (79) r1 = *(u64 *)(r1 +16)
> >> ; return skb->dev->ifindex == 2;
> >> 93: (61) r1 = *(u32 *)(r1 +224)
> >> 94: (b7) r0 = 1
> >> ; return skb->dev->ifindex == 2;
> >> 95: (15) if r1 == 0x2 goto pc+1
> >> 96: (b7) r0 = 0
> >> ; return skb->dev->ifindex == 2;
> >> 97: (95) exit
> >>
> >> In conclusion:
> >>
> >> 1. It would be better if the pointer returned by bpf_get_func_arg()
> >> were trusted, at least when the argument index is a known constant.
> >
> > bpf_get_func_arg() was never meant to return trusted arguments, so
> > this, IMO, is pushing it too far.
> >
> >> 2. Adding bpf_get_func_arg() support to tp_btf will significantly
> >> simplify and improve tools like bpfsnoop.
> >
> > "Significantly simplify and improve" is a bit of an exaggeration,
> > given BPF cookies can be used for getting the number of arguments of
> > a tp_btf program. As for getting rid of bpf_probe_read_kernel(), tbh,
> > a more generally useful addition would be an untyped counterpart to
> > bpf_core_cast(), which wouldn't need BTF type information but would
> > treat all accessed memory as raw bytes (while still installing an
> > exception handler, just like bpf_core_cast()).
> >
>
> Cool! The bpf_rdonly_cast() kfunc used by the bpf_core_cast() macro
> works well in bpfsnoop.
>
> The expression 'skb->dev->ifindex == 2' is translated into:
>
> bool filter_arg(__u64 * args):
> ; filter_arg(__u64 *args)
> 209: (bf) r9 = r1
> 210: (79) r8 = *(u64 *)(r9 +0)
> 211: (bf) r1 = r8
> 212: (b7) r2 = 6973
> 213: (bf) r0 = r1
> 214: (79) r1 = *(u64 *)(r0 +16)
> 215: (15) if r1 == 0x0 goto pc+12
> 216: (07) r1 += 224
> 217: (bf) r3 = r1
> 218: (b7) r2 = 8
> 219: (bf) r1 = r10
> 220: (07) r1 += -8
> 221: (85) call bpf_probe_read_kernel#-125280
> 222: (79) r8 = *(u64 *)(r10 -8)
> 223: (67) r8 <<= 32
> 224: (77) r8 >>= 32
> 225: (55) if r8 != 0x2 goto pc+2
> 226: (b7) r8 = 1
> 227: (05) goto pc+1
> 228: (af) r8 ^= r8
> 229: (bf) r0 = r8
> 230: (95) exit
>
> However, since bpf_rdonly_cast() is a kfunc, it causes registers r1–r5
> to be considered volatile.
It is not.
See:
BTF_ID_FLAGS(func, bpf_rdonly_cast, KF_FASTCALL)
and relevant commits.
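[Editor's note: the bpf_rdonly_cast() pattern discussed above can be sketched in a
minimal userspace mock. This is illustrative only: the struct layouts are trimmed
stand-ins for the kernel's types, and bpf_rdonly_cast() is stubbed as an identity
cast. In a real tp_btf program it is the kfunc declared in vmlinux.h, whose result
the verifier treats as read-only untrusted memory with exception handling installed
for the dereferences.]

```c
#include <stdint.h>

/* Trimmed mock types standing in for the kernel's struct net_device and
 * struct sk_buff; only the fields the filter touches are modeled. */
struct net_device { int ifindex; };
struct sk_buff { struct net_device *dev; };

/* Stub for the bpf_rdonly_cast() kfunc. The real kfunc takes a pointer
 * and a BTF type id and returns a pointer the verifier allows to be
 * dereferenced as read-only memory; here it is just an identity cast. */
static void *bpf_rdonly_cast(const void *obj, uint32_t btf_id)
{
    (void)btf_id;
    return (void *)obj;
}

/* The filter the thread discusses: 'skb->dev->ifindex == 2'. */
static int filter_skb(void *arg)
{
    struct sk_buff *skb = bpf_rdonly_cast(arg, 0 /* BTF id of sk_buff */);

    return skb->dev && skb->dev->ifindex == 2;
}
```

In the real bpfsnoop use case, filter_skb() would run inside the tp_btf program,
with arg obtained from the program context, as discussed earlier in the thread.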
* Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs
2025-05-12 15:25 ` Alexei Starovoitov
@ 2025-05-12 16:01 ` Leon Hwang
0 siblings, 0 replies; 16+ messages in thread
From: Leon Hwang @ 2025-05-12 16:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, Kafai Wan, Song Liu, Jiri Olsa,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Yonghong Song, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Mykola Lysenko, Shuah Khan, LKML, bpf, linux-trace-kernel,
Network Development, open list:KERNEL SELFTEST FRAMEWORK
On 2025/5/12 23:25, Alexei Starovoitov wrote:
> On Mon, May 12, 2025 at 4:12 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
[...]
>>
>> However, since bpf_rdonly_cast() is a kfunc, it causes registers r1–r5
>> to be considered volatile.
>
> It is not.
> See:
> BTF_ID_FLAGS(func, bpf_rdonly_cast, KF_FASTCALL)
> and relevant commits.
Thanks for the reminder — you're right, bpf_rdonly_cast() is marked with
KF_FASTCALL, so it doesn't make r1–r5 volatile.
Thanks,
Leon
end of thread, other threads:[~2025-05-12 16:02 UTC | newest]
Thread overview: 16+ messages
2025-04-26 16:00 [PATCH bpf-next 0/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 1/4] " KaFai Wan
2025-04-30 2:46 ` Alexei Starovoitov
2025-04-30 12:43 ` Kafai Wan
2025-04-30 15:54 ` Leon Hwang
2025-04-30 16:53 ` Alexei Starovoitov
2025-05-02 14:25 ` Leon Hwang
2025-05-06 21:01 ` Andrii Nakryiko
2025-05-12 11:12 ` Leon Hwang
2025-05-12 15:25 ` Alexei Starovoitov
2025-05-12 16:01 ` Leon Hwang
2025-05-01 20:53 ` Andrii Nakryiko
2025-04-26 16:00 ` [PATCH bpf-next 2/4] bpf: Enable BPF_PROG_TEST_RUN for tp_btf KaFai Wan
2025-05-01 20:55 ` Andrii Nakryiko
2025-04-26 16:00 ` [PATCH bpf-next 3/4] selftests/bpf: Add raw_tp_test_run " KaFai Wan
2025-04-26 16:00 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for get_func_[arg|arg_cnt] helpers in raw tracepoint programs KaFai Wan