* [PATCH bpf-next 00/25] bpf: tracing multi-link support
@ 2025-05-28 3:46 Menglong Dong
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
After four months, I have finally finished the coding and testing of this
series. This is my first time writing such a complex series, and it was
hard :/ Anyway, it is done.

(I'm scared :/)
For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be attached
to a single hook, so we have to create a separate BPF program for each
kernel function that we want to trace, even though all the programs have
the same (or similar) logic. This consumes extra memory and makes program
loading slow when there are plenty of kernel functions to trace.

In this series, we add support for attaching a tracing BPF program to
multiple hooks, similar to BPF_TRACE_KPROBE_MULTI.
Generally speaking, this series can be divided into 5 parts:
1. Add per-function metadata storage support.
2. Add bpf global trampoline support for x86_64.
3. Add bpf global trampoline link support.
4. Add tracing multi-link support.
5. Compatibility between tracing and tracing_multi.
per-function metadata storage
-----------------------------
The per-function metadata storage is the basis of the bpf global
trampoline. It has 2 modes: function padding mode and hash table mode.
When CONFIG_FUNCTION_METADATA_PADDING is enabled, the function padding
mode is used, which has higher performance with almost no overhead: we
allocate a metadata array and store the index of the metadata in the
function padding.

The function padding can increase the text size, so it may not always be
enabled. When CONFIG_FUNCTION_METADATA_PADDING is not enabled, we fall
back to the hash table mode and store the function metadata in a hlist.
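
To illustrate the padding mode, here is a minimal C sketch of the lookup,
based on kfunc_md_fast_get() and kfunc_md_get_index() from the 1st patch
(RCU and error handling omitted):

	/* the function padding holds a 32-bit index into a global
	 * metadata array, so the lookup is a load plus an array index
	 */
	static struct kfunc_md *md_lookup_sketch(unsigned long ip)
	{
		u32 index;

		if (!kfunc_md_arch_exist(ip, KFUNC_MD_INSN_OFFSET))
			return NULL;

		index = *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
		return &kfunc_mds->mds[index];
	}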
The release of the metadata is the main difficulty in the function padding
mode. In this mode, we use a metadata array to store the metadata, and the
array needs to be enlarged when it is full. So we not only need to control
the release of the metadata, but also the release of the metadata array.
Per-cpu refcounts and RCU are used for both. The release of the metadata
is much easier in the hash table mode: we just use RCU to release the
metadata.
For now, the function padding mode is supported on x86_64 and arm64, but
the function metadata is only used by the bpf global trampoline on x86_64.
So maybe we don't need to support the function padding mode on arm64 in
this series?

The performance comparison between the function padding mode and the hash
table mode can be found in the commit logs below. As Alexei pointed out in
[1], we can fall back to the hash table mode when the function padding is
not supported.
bpf global trampoline
---------------------
The bpf global trampoline is similar to the regular bpf trampoline. The
bpf trampoline stores the bpf progs and some metadata directly in the
trampoline instructions, whereas the bpf global trampoline gets the
metadata from the function metadata with kfunc_md_get_noref(). This makes
the bpf global trampoline more flexible: a single instance can be used for
all kernel functions.

The bpf global trampoline is designed to implement the tracing multi-link
for FENTRY, FEXIT and MODIFY_RETURN. Sleepable bpf progs are not
supported, as we call __bpf_prog_enter_recur() and __bpf_prog_exit_recur()
directly in the bpf global trampoline, which can be optimized later. We
also make __bpf_prog_{enter,exit}_recur() global so they can be used in
the global trampoline.
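
Roughly, the control flow of the global caller corresponds to the
following C-level sketch. The real implementation is assembly (5th patch);
run_prog() here only stands in for the bpf_caller_prog_run macro, and
details such as recursion protection and arg saving are omitted:

	md = kfunc_md_get_noref(ip);	/* per-function metadata */

	for (p = md->bpf_progs[BPF_TRAMP_FENTRY]; p; p = p->next)
		run_prog(p->prog, ctx);

	if (md->flags & KFUNC_MD_FL_TRACING_ORIGIN) {
		for (p = md->bpf_progs[BPF_TRAMP_MODIFY_RETURN]; p; p = p->next)
			run_prog(p->prog, ctx);

		ret = origin_fn(args);	/* the traced function itself */

		for (p = md->bpf_progs[BPF_TRAMP_FEXIT]; p; p = p->next)
			run_prog(p->prog, ctx);
	}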
The overhead of the bpf global trampoline is slightly higher than that of
the function trampoline, as we need to prepare everything we need on the
stack. For example, we store the address of the traced function on the
stack for bpf_get_func_ip(), even if it is not used.

As Steven mentioned in [1], we need to mitigate Spectre, as we use an
indirect call in the bpf global trampoline. I haven't fully understood
Spectre yet, and I make the indirect call with CALL_NOSPEC just like the
others do. Could it prevent Spectre?
bpf global trampoline link
--------------------------
We reuse part of the code in [2] to implement the tracing multi-link. The
struct bpf_gtramp_link is introduced for the bpf global trampoline link.
Similar to the bpf trampoline link, the bpf global trampoline link has
bpf_gtrampoline_link_prog() and bpf_gtrampoline_unlink_prog() to link and
unlink bpf progs.

The "entries" field in bpf_gtramp_link is an array of struct
bpf_gtramp_link_entry, which contains all the information about the
functions that we trace, such as the address, the number of args, the
cookie and so on.
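
Roughly, an entry has the following shape (field names here are only
illustrative; the real layout is defined in the 8th patch):

	struct bpf_gtramp_link_entry {
		void *addr;	/* address of the traced function */
		u32 nr_args;	/* number of function args, from BTF */
		u64 cookie;	/* bpf cookie for this function */
		/* ... */
	};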
The bpf global trampoline is much simpler than the bpf trampoline, and we
introduce the new struct bpf_global_trampoline for it. The "image" field
is a pointer to bpf_global_caller. We implement the global trampoline
based on direct ftrace, and the "fops" field exists for this purpose. This
means bpf2bpf is not supported by the tracing multi-link.

When we link a bpf prog, we add it to the kfunc_md of all the target
functions. Then we get all the function addresses that have bpf progs with
kfunc_md_bpf_ips() and reset the ftrace filter of the fops to them. Direct
ftrace doesn't support resetting the filter functions yet, so we introduce
reset_ftrace_direct_ips() to do this.

We use a global lock to protect the global trampoline, and we need to hold
it when we link or unlink a bpf prog. The global lock is a rw_semaphore.
In fact, a mutex would be enough here; the rw_semaphore is used by the
following patches that make tracing_multi compatible with tracing.
tracing multi-link
------------------
Most of the code in this part comes from the series [2].

In the 10th patch, we add support to record the indexes of the target's
function args that the tracing program accesses. Meanwhile, we add the
function btf_check_func_part_match() to compare the accessed function args
of two function prototypes. This function will be used in a later commit.

In the 11th patch, we refactor the struct modules_array to ptr_array, as
we need a similar structure to hold the target btf, target programs and
kernel modules that we reference in the following commits.
In the 15th patch, we implement the multi-link support for tracing, and
the following new attach types are added:

  BPF_TRACE_FENTRY_MULTI
  BPF_TRACE_FEXIT_MULTI
  BPF_MODIFY_RETURN_MULTI

We introduce the struct bpf_tracing_multi_link for this purpose, which
can hold all the kernel modules, the target bpf program (for attaching to
a bpf program) or the target btf (for attaching to kernel functions) that
we reference.

During loading, the first target is used for verification by the verifier.
And during attaching, we check the consistency of all the targets against
the first target, as sketched below.
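
Roughly (the signature of btf_check_func_part_match() shown here is
illustrative; see the 10th patch for the real one):

	/* the prog was verified against target[0]; every other target must
	 * match it on all the function args that the prog actually accesses
	 */
	for (i = 1; i < cnt; i++) {
		err = btf_check_func_part_match(btf0, func_proto0,
						target[i].btf,
						target[i].func_proto);
		if (err)
			return err;
	}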
Compatibility between tracing and tracing_multi
------------------------------------------------
The tracing_multi is not compatible with tracing without the 16-18th
patches. For example, attaching FENTRY_MULTI to a function that already
has FENTRY attached will fail, and attaching FENTRY will also fail if
FENTRY_MULTI already exists.

Generally speaking, we replace the global trampoline with the bpf
trampoline if both of them exist on the same function. The function
replace_ftrace_direct() is added for this, which is used to replace the
direct ftrace_ops with another one for all the filter functions in the
source ftrace_ops.
The most difficult part is the synchronization between
bpf_gtrampoline_link_prog() and bpf_trampoline_link_prog(), and we use a
rw_semaphore here, which is quite ugly. We hold the write lock in
bpf_gtrampoline_link_prog() and the read lock in
bpf_trampoline_link_prog().

We introduce the function bpf_gtrampoline_link_tramp() to make
bpf_gtramp_link fit bpf_trampoline; it is called in
bpf_gtrampoline_link_prog(). If the bpf_trampoline of the function exists
in the kfunc_md, or we find it with bpf_trampoline_lookup_exist(), it
means that we need to do the fitting. The fitting is simple: we create a
bpf_shim_tramp_link for our prog and link it to the bpf_trampoline with
__bpf_trampoline_link_prog().
The bpf_trampoline_link_prog() case is a little more complex. We create a
bpf_shim_tramp_link for every bpf prog in the kfunc_md and add them to the
bpf_trampoline before we call __bpf_trampoline_link_prog() in
bpf_gtrampoline_override(). And we fall back in
bpf_gtrampoline_override_finish() if an error is returned by
__bpf_trampoline_link_prog().

Another solution is to fit into the existing trampoline. For example, we
could add the bpf prog to the kfunc_md if a tracing_multi bpf prog is
already attached to the target function when we attach a tracing bpf prog.
And we could also add the tracing_multi prog to the trampoline if a
tracing prog exists on the target function. I think this would make the
compatibility much easier.

The code in this part is very ugly and messy, and I think it will be a
liberation to split it out into another series :/
Performance comparison
----------------------
We have implemented the performance tests in the selftests in
test_tracing_multi_bench(), and you can get the results by running:

  ./test_progs -t tracing_multi_bench -v | grep time

In this testcase, bpf_fentry_test1() is called 10000000 times, and the
time consumed is returned and printed. The following cases are considered:
- nop: nothing is attached to bpf_fentry_test1()
- fentry: an empty FENTRY bpf program is attached to bpf_fentry_test1()
- fentry_multi_single: an empty FENTRY_MULTI bpf program is attached to
  bpf_fentry_test1(). We alias it as "fm_single" below.
- fentry_multi_all: an empty FENTRY_MULTI bpf program is attached to all
  the kernel functions. We alias it as "fm_all" below.
- kprobe_multi_single: an empty KPROBE_MULTI bpf program is attached to
  bpf_fentry_test1(). We alias it as "km_single" below.
- kprobe_multi_all: an empty KPROBE_MULTI bpf program is attached to all
  the kernel functions. We alias it as "km_all" below.
The "fentry_multi_all" is used to test the performance of the hash table
mode of the function metadata.
Different kconfig can affect the performance, and the base kconfig we use
comes from debian 12.
no-mitigate + hash table mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
nop | fentry | fm_single | fm_all | km_single | km_all
9.014ms | 162.378ms | 180.511ms | 446.286ms | 220.634ms | 1465.133ms
9.038ms | 161.600ms | 178.757ms | 445.807ms | 220.656ms | 1463.714ms
9.048ms | 161.435ms | 180.510ms | 452.530ms | 220.943ms | 1487.494ms
9.030ms | 161.585ms | 178.699ms | 448.167ms | 220.107ms | 1463.785ms
9.056ms | 161.530ms | 178.947ms | 445.609ms | 221.026ms | 1560.584ms
The mitigations are enabled by default in the kernel, and we can disable
them with the "mitigations=off" cmdline to do the testing.

The count of the kernel functions that we trace in the fentry_multi_all
and kprobe_multi_all testcases is 43871, which can be obtained by running:

  $ ./test_progs -t trace_bench -v | grep before
  attach 43871 functions before testings
  attach 43871 functions before testings

However, we use hlist_add_tail_rcu() in the kfunc_md, and bpf_fentry_test1
is the 39425th function in
/sys/kernel/debug/tracing/available_filter_functions, which means
bpf_fentry_test1 sits at the tail of its hlist bucket. So we can imagine
that the total number of traced functions is 79k (39425 * 2) to make the
hash table lookup comparison fair enough.
Note that the performance of fentry can vary significantly with different
kconfigs. When the kernel is compiled with the x86_64 "tinyconfig", the
fentry case takes about 80ms, and fentry_multi about 95ms.
mitigate + hash table mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
nop | fentry | fm_single | fm_all | km_single | km_all
37.348ms | 753.621ms | 844.110ms | 1033.151ms | 1033.814ms | 2403.759ms
37.439ms | 753.894ms | 843.922ms | 1033.182ms | 1034.066ms | 2364.879ms
37.420ms | 754.802ms | 844.430ms | 1046.192ms | 1035.357ms | 2368.233ms
37.436ms | 754.051ms | 844.831ms | 1035.204ms | 1034.431ms | 2252.827ms
37.425ms | 753.802ms | 844.714ms | 1106.462ms | 1034.119ms | 2252.217ms
The overhead of fentry_multi is noticeably higher than that of fentry
when mitigations are enabled, and I think that is because we make one more
function call in fentry_multi, which is kfunc_md_get_noref(). What's more,
the mitigation for the indirect call in the bpf global trampoline also
increases the overhead. We can still do some optimizations in the bpf
global trampoline.

I think this is what Alexei meant in [1]: the performance suffers from the
mitigations anyway, so the indirect call doesn't matter in this case.
no-mitigate + function padding mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
nop | fentry | fm_single | fm_all | km_single | km_all
9.320ms | 166.454ms | 184.094ms | 193.884ms | 227.320ms | 1441.462ms
9.326ms | 166.651ms | 183.954ms | 193.912ms | 227.503ms | 1544.634ms
9.313ms | 170.501ms | 183.985ms | 191.738ms | 227.801ms | 1441.284ms
9.311ms | 166.957ms | 182.086ms | 192.063ms | 410.411ms | 1489.665ms
9.329ms | 166.332ms | 182.196ms | 194.154ms | 227.443ms | 1511.272ms
The overhead of fentry_multi_all is a little higher than that of
fentry_multi_single. Maybe it is because the function
ktime_get_boottime_ns(), which is used in bpf_testmod_bench_run(), is also
traced? I haven't figured it out yet, but it doesn't matter :/

The overhead of fentry_multi_single is a little higher than that of
fentry, which makes sense, as we do more things in the bpf global
trampoline. In comparison, the bpf trampoline is simpler.
mitigate + function padding mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
nop | fentry | fm_single | fm_all | km_single | km_all
37.340ms | 754.659ms | 840.295ms | 849.632ms | 1043.997ms | 2180.874ms
37.543ms | 753.809ms | 840.535ms | 849.746ms | 1034.481ms | 2355.085ms
37.442ms | 753.803ms | 840.797ms | 850.012ms | 1034.462ms | 2276.567ms
37.501ms | 753.931ms | 840.789ms | 850.938ms | 1035.594ms | 2218.350ms
37.700ms | 753.714ms | 840.875ms | 851.614ms | 1034.465ms | 2329.307ms
conclusion
^^^^^^^^^^
The performance of fentry_multi is close to that of fentry in the
function padding mode. In the hash table mode, the run time of
fentry_multi can be 276% of that of fentry when the number of traced
functions is 79k.
Link: https://lore.kernel.org/all/20250303132837.498938-1-dongml2@chinatelecom.cn/ [1]
Link: https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@bytedance.com/ [2]
Menglong Dong (25):
add per-function metadata storage support
x86: implement per-function metadata storage for x86
arm64: implement per-function metadata storage for arm64
bpf: make kfunc_md support global trampoline link
x86,bpf: add bpf_global_caller for global trampoline
ftrace: factor out ftrace_direct_update from register_ftrace_direct
ftrace: add reset_ftrace_direct_ips
bpf: introduce bpf_gtramp_link
bpf: tracing: add support to record and check the accessed args
bpf: refactor the modules_array to ptr_array
bpf: verifier: add btf to the function args of bpf_check_attach_target
bpf: verifier: move btf_id_deny to bpf_check_attach_target
x86,bpf: factor out __arch_get_bpf_regs_nr
bpf: tracing: add multi-link support
ftrace: factor out __unregister_ftrace_direct
ftrace: supporting replace direct ftrace_ops
bpf: make trampoline compatible with global trampoline
libbpf: don't free btf if tracing_multi progs existing
libbpf: support tracing_multi
libbpf: add btf type hash lookup support
libbpf: add skip_invalid and attach_tracing for tracing_multi
selftests/bpf: use the glob_match() from libbpf in test_progs.c
selftests/bpf: add get_ksyms and get_addrs to test_progs.c
selftests/bpf: add testcases for multi-link of tracing
selftests/bpf: add performance bench test for trace prog
arch/arm64/Kconfig | 21 +
arch/arm64/Makefile | 23 +-
arch/arm64/include/asm/ftrace.h | 34 +
arch/arm64/kernel/ftrace.c | 13 +-
arch/x86/Kconfig | 30 +
arch/x86/include/asm/alternative.h | 2 +
arch/x86/include/asm/ftrace.h | 47 ++
arch/x86/kernel/asm-offsets.c | 15 +
arch/x86/kernel/callthunks.c | 2 +-
arch/x86/kernel/ftrace.c | 26 +
arch/x86/kernel/ftrace_64.S | 231 ++++++
arch/x86/net/bpf_jit_comp.c | 36 +-
include/linux/bpf.h | 67 ++
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 1 +
include/linux/ftrace.h | 15 +
include/linux/kfunc_md.h | 63 ++
include/uapi/linux/bpf.h | 10 +
kernel/bpf/btf.c | 113 ++-
kernel/bpf/syscall.c | 409 ++++++++++-
kernel/bpf/trampoline.c | 476 +++++++++++-
kernel/bpf/verifier.c | 161 ++--
kernel/trace/Makefile | 1 +
kernel/trace/bpf_trace.c | 48 +-
kernel/trace/ftrace.c | 288 ++++++--
kernel/trace/kfunc_md.c | 689 ++++++++++++++++++
net/bpf/test_run.c | 3 +
net/core/bpf_sk_storage.c | 2 +
tools/bpf/bpftool/common.c | 3 +
tools/include/uapi/linux/bpf.h | 10 +
tools/lib/bpf/bpf.c | 10 +
tools/lib/bpf/bpf.h | 6 +
tools/lib/bpf/btf.c | 102 +++
tools/lib/bpf/btf.h | 6 +
tools/lib/bpf/libbpf.c | 288 +++++++-
tools/lib/bpf/libbpf.h | 25 +
tools/lib/bpf/libbpf.map | 5 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/fentry_fexit.c | 22 +-
.../selftests/bpf/prog_tests/fentry_test.c | 79 +-
.../selftests/bpf/prog_tests/fexit_test.c | 79 +-
.../bpf/prog_tests/kprobe_multi_test.c | 220 +-----
.../selftests/bpf/prog_tests/modify_return.c | 60 ++
.../selftests/bpf/prog_tests/trace_bench.c | 149 ++++
.../bpf/prog_tests/tracing_multi_link.c | 276 +++++++
.../selftests/bpf/progs/fentry_empty.c | 13 +
.../selftests/bpf/progs/fentry_multi_empty.c | 13 +
.../testing/selftests/bpf/progs/trace_bench.c | 21 +
.../bpf/progs/tracing_multi_override.c | 28 +
.../selftests/bpf/progs/tracing_multi_test.c | 181 +++++
.../selftests/bpf/test_kmods/bpf_testmod.c | 40 +
tools/testing/selftests/bpf/test_progs.c | 349 ++++++++-
tools/testing/selftests/bpf/test_progs.h | 5 +
53 files changed, 4334 insertions(+), 485 deletions(-)
create mode 100644 include/linux/kfunc_md.h
create mode 100644 kernel/trace/kfunc_md.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_bench.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
create mode 100644 tools/testing/selftests/bpf/progs/fentry_empty.c
create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_empty.c
create mode 100644 tools/testing/selftests/bpf/progs/trace_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_override.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
--
2.39.5
* [PATCH bpf-next 01/25] add per-function metadata storage support
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
For now, there isn't a way to set and get per-function metadata with
low overhead, which is inconvenient in some situations. Take the
BPF trampoline for example: we need to create a trampoline for each
kernel function, as we have to store some information about the function
in the trampoline, such as the BPF progs, the function arg count, etc. The
performance overhead and memory consumption can be high when creating
these trampolines. With support for per-function metadata storage, we can
store this information in the metadata and create a single global BPF
trampoline for all the kernel functions. In the global trampoline, we
get the information that we need from the function metadata through the
ip (function address) with almost no overhead.

Another beneficiary can be fprobe. For now, fprobe adds all the
functions that it hooks into a hash table, and in fprobe_entry() it looks
up all the handlers of the function in the hash table. The performance
can suffer from the hash table lookup. We could optimize it by adding the
handlers to the function metadata instead.
Support per-function metadata storage in the function padding; the
previous discussion can be found in [1]. Generally speaking, we have two
ways to implement this feature:

1. Create a function metadata array, and prepend an insn that can hold
   the index of the function metadata in the array. Store the insn in
   the function padding.
2. Allocate the function metadata with kmalloc(), and prepend an insn that
   holds the pointer to the metadata. Store the insn in the function
   padding.

Compared with way 2, way 1 consumes less space, but we need to do more
work on the global function metadata array. This patch implements the
feature in way 1.
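
A minimal usage sketch of the API added below (assuming the caller holds
kfunc_md_lock around creation and release, as the comments in the code
require):

	struct kfunc_md *md;

	kfunc_md_lock();
	md = kfunc_md_create(ip, nr_args);	/* get or create, takes a reference */
	if (md) {
		/* ... store/consume the per-function data ... */
		kfunc_md_put_entry(md);		/* drop the reference again */
	}
	kfunc_md_unlock();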
Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/kfunc_md.h | 44 +++
kernel/trace/Makefile | 1 +
kernel/trace/kfunc_md.c | 566 +++++++++++++++++++++++++++++++++++++++
3 files changed, 611 insertions(+)
create mode 100644 include/linux/kfunc_md.h
create mode 100644 kernel/trace/kfunc_md.c
diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
new file mode 100644
index 000000000000..21c0b879cc03
--- /dev/null
+++ b/include/linux/kfunc_md.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KFUNC_MD_H
+#define _LINUX_KFUNC_MD_H
+
+#define KFUNC_MD_FL_DEAD (1 << 0) /* the md shouldn't be reused */
+
+#ifndef __ASSEMBLER__
+
+#include <linux/kernel.h>
+#include <linux/bpf.h>
+
+struct kfunc_md_array;
+
+struct kfunc_md {
+#ifndef CONFIG_FUNCTION_METADATA_PADDING
+ /* this is used for the hash table mode */
+ struct hlist_node hash;
+ /* this is used for table mode */
+ struct rcu_head rcu;
+#endif
+ unsigned long func;
+#ifdef CONFIG_FUNCTION_METADATA
+ /* the array is used for the fast mode */
+ struct kfunc_md_array *array;
+#endif
+ struct percpu_ref pcref;
+ u32 flags;
+ u16 users;
+ u8 nr_args;
+};
+
+struct kfunc_md *kfunc_md_get(unsigned long ip);
+struct kfunc_md *kfunc_md_get_noref(unsigned long ip);
+struct kfunc_md *kfunc_md_create(unsigned long ip, int nr_args);
+void kfunc_md_put_entry(struct kfunc_md *meta);
+void kfunc_md_put(unsigned long ip);
+void kfunc_md_lock(void);
+void kfunc_md_unlock(void);
+void kfunc_md_exit(struct kfunc_md *md);
+void kfunc_md_enter(struct kfunc_md *md);
+bool kfunc_md_arch_support(int *insn, int *data);
+
+#endif
+#endif
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 057cd975d014..d8c19ff1e55e 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_TRACING) += trace_seq.o
obj-$(CONFIG_TRACING) += trace_stat.o
obj-$(CONFIG_TRACING) += trace_printk.o
obj-$(CONFIG_TRACING) += pid_list.o
+obj-$(CONFIG_TRACING) += kfunc_md.o
obj-$(CONFIG_TRACING_MAP) += tracing_map.o
obj-$(CONFIG_PREEMPTIRQ_DELAY_TEST) += preemptirq_delay_test.o
obj-$(CONFIG_SYNTH_EVENT_GEN_TEST) += synth_event_gen_test.o
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
new file mode 100644
index 000000000000..9571081f6560
--- /dev/null
+++ b/kernel/trace/kfunc_md.c
@@ -0,0 +1,566 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+
+#include <linux/slab.h>
+#include <linux/memory.h>
+#include <linux/rcupdate.h>
+#include <linux/ftrace.h>
+#include <linux/kfunc_md.h>
+
+#include <uapi/linux/bpf.h>
+
+#ifndef CONFIG_FUNCTION_METADATA_PADDING
+
+DEFINE_STATIC_KEY_TRUE(kfunc_md_use_padding);
+static int __insn_offset, __data_offset;
+#define insn_offset __insn_offset
+#define data_offset __data_offset
+
+#define KFUNC_MD_HASH_BITS 10
+static struct hlist_head kfunc_md_table[1 << KFUNC_MD_HASH_BITS];
+
+#else
+#define insn_offset KFUNC_MD_INSN_OFFSET
+#define data_offset KFUNC_MD_DATA_OFFSET
+#endif
+
+#define insn_size KFUNC_MD_INSN_SIZE
+
+#define ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(struct kfunc_md))
+
+#define KFUNC_MD_ARRAY_FL_DEAD 0
+
+struct kfunc_md_array {
+ struct kfunc_md *mds;
+ u32 kfunc_md_count;
+ unsigned long flags;
+ atomic_t kfunc_md_used;
+ union {
+ struct work_struct work;
+ struct rcu_head rcu;
+ };
+};
+
+static struct kfunc_md_array empty_array = {
+ .mds = NULL,
+ .kfunc_md_count = 0,
+};
+/* used for the padding-based function metadata */
+static struct kfunc_md_array __rcu *kfunc_mds = &empty_array;
+
+/* any function metadata write should hold this lock */
+static DEFINE_MUTEX(kfunc_md_mutex);
+
+
+#ifndef CONFIG_FUNCTION_METADATA_PADDING
+
+static struct hlist_head *kfunc_md_hash_head(unsigned long ip)
+{
+ return &kfunc_md_table[hash_ptr((void *)ip, KFUNC_MD_HASH_BITS)];
+}
+
+static struct kfunc_md *kfunc_md_hash_get(unsigned long ip)
+{
+ struct hlist_head *head;
+ struct kfunc_md *md;
+
+ head = kfunc_md_hash_head(ip);
+ hlist_for_each_entry_rcu_notrace(md, head, hash) {
+ if (md->func == ip)
+ return md;
+ }
+
+ return NULL;
+}
+
+static void kfunc_md_hash_release(struct percpu_ref *pcref)
+{
+ struct kfunc_md *md;
+
+ md = container_of(pcref, struct kfunc_md, pcref);
+ kfree_rcu(md, rcu);
+}
+
+static struct kfunc_md *kfunc_md_hash_create(unsigned long ip, int nr_args)
+{
+ struct kfunc_md *md = kfunc_md_hash_get(ip);
+ struct hlist_head *head;
+ int err;
+
+ if (md) {
+ md->users++;
+ return md;
+ }
+
+ md = kzalloc(sizeof(*md), GFP_KERNEL);
+ if (!md)
+ return NULL;
+
+ md->users = 1;
+ md->func = ip;
+ md->nr_args = nr_args;
+
+ err = percpu_ref_init(&md->pcref, kfunc_md_hash_release, 0, GFP_KERNEL);
+ if (err) {
+ kfree(md);
+ return NULL;
+ }
+
+ head = kfunc_md_hash_head(ip);
+ hlist_add_tail_rcu(&md->hash, head);
+ atomic_inc(&kfunc_mds->kfunc_md_used);
+
+ return md;
+}
+
+static void kfunc_md_hash_put(struct kfunc_md *md)
+{
+ if (WARN_ON_ONCE(md->users <= 0))
+ return;
+
+ md->users--;
+ if (md->users > 0)
+ return;
+
+ hlist_del_rcu(&md->hash);
+ percpu_ref_kill(&md->pcref);
+ atomic_dec(&kfunc_mds->kfunc_md_used);
+}
+
+static bool kfunc_md_fast(void)
+{
+ return static_branch_likely(&kfunc_md_use_padding);
+}
+#else
+
+static void kfunc_md_hash_put(struct kfunc_md *md)
+{
+}
+
+static struct kfunc_md *kfunc_md_hash_get(unsigned long ip)
+{
+ return NULL;
+}
+
+static struct kfunc_md *kfunc_md_hash_create(unsigned long ip, int nr_args)
+{
+ return NULL;
+}
+
+#define kfunc_md_fast() 1
+#endif /* CONFIG_FUNCTION_METADATA_PADDING */
+
+#ifdef CONFIG_FUNCTION_METADATA
+static void kfunc_md_release(struct percpu_ref *pcref);
+
+static __always_inline u32 kfunc_md_get_index(unsigned long ip)
+{
+ return *(u32 *)(ip - data_offset);
+}
+
+static struct kfunc_md_array *kfunc_md_array_alloc(struct kfunc_md_array *old)
+{
+ struct kfunc_md_array *new_mds;
+ int len = old->kfunc_md_count;
+ struct kfunc_md *md;
+ int err, i;
+
+ new_mds = kmalloc(sizeof(*new_mds), __GFP_ZERO | GFP_KERNEL);
+ if (!new_mds)
+ return NULL;
+
+ /* if the length of old kfunc md array is zero, we make ENTRIES_PER_PAGE
+ * as the default size of the new kfunc md array.
+ */
+ new_mds->kfunc_md_count = (len * 2) ?: ENTRIES_PER_PAGE;
+ new_mds->mds = kvmalloc_array(new_mds->kfunc_md_count, sizeof(*new_mds->mds),
+ __GFP_ZERO | GFP_KERNEL);
+ if (!new_mds->mds) {
+ kfree(new_mds);
+ return NULL;
+ }
+
+ if (len) {
+ memcpy(new_mds->mds, old->mds, sizeof(*new_mds->mds) * len);
+ new_mds->kfunc_md_used = old->kfunc_md_used;
+ }
+
+ for (i = 0; i < new_mds->kfunc_md_count; i++) {
+ md = &new_mds->mds[i];
+
+ if (md->users) {
+ err = percpu_ref_init(&md->pcref, kfunc_md_release,
+ 0, GFP_KERNEL);
+ if (err)
+ goto pcref_fail;
+ md->array = new_mds;
+ }
+ }
+
+ return new_mds;
+
+pcref_fail:
+ for (int j = 0; j < i; j++) {
+ md = &new_mds->mds[j];
+ if (md->users)
+ percpu_ref_exit(&md->pcref);
+ }
+ kvfree(new_mds->mds);
+ kfree(new_mds);
+ return NULL;
+}
+
+static void kfunc_md_array_release_deferred(struct work_struct *work)
+{
+ struct kfunc_md_array *mds;
+
+ mds = container_of(work, struct kfunc_md_array, work);
+ /* the kfunc metadata array is not used anywhere, we can free it
+ * directly.
+ */
+ if (atomic_read(&mds->kfunc_md_used) == 0) {
+ for (int i = 0; i < mds->kfunc_md_count; i++) {
+ if (mds->mds[i].users)
+ percpu_ref_exit(&mds->mds[i].pcref);
+ }
+
+ kvfree(mds->mds);
+ kfree_rcu(mds, rcu);
+ return;
+ }
+
+ for (int i = 0; i < mds->kfunc_md_count; i++) {
+ if (mds->mds[i].users)
+ percpu_ref_kill(&mds->mds[i].pcref);
+ }
+}
+
+static void kfunc_md_array_release(struct rcu_head *rcu)
+{
+ struct kfunc_md_array *mds;
+
+ mds = container_of(rcu, struct kfunc_md_array, rcu);
+ if (mds == &empty_array)
+ return;
+
+ INIT_WORK(&mds->work, kfunc_md_array_release_deferred);
+ schedule_work(&mds->work);
+}
+
+static void kfunc_md_release(struct percpu_ref *pcref)
+{
+ struct kfunc_md *md;
+
+ md = container_of(pcref, struct kfunc_md, pcref);
+ if (test_bit(KFUNC_MD_ARRAY_FL_DEAD, &md->array->flags)) {
+ if (atomic_dec_and_test(&md->array->kfunc_md_used)) {
+ call_rcu_tasks(&md->array->rcu, kfunc_md_array_release);
+ return;
+ }
+ }
+ percpu_ref_exit(&md->pcref);
+ /* clear the flags, so it can be reused */
+ md->flags = 0;
+}
+
+static int kfunc_md_text_poke(unsigned long ip, void *insn, void *nop)
+{
+ void *target;
+ int ret = 0;
+ u8 *prog;
+
+ target = (void *)(ip - insn_offset);
+ mutex_lock(&text_mutex);
+ if (insn) {
+ if (!memcmp(target, insn, insn_size))
+ goto out;
+
+ if (memcmp(target, nop, insn_size)) {
+ ret = -EBUSY;
+ goto out;
+ }
+ prog = insn;
+ } else {
+ if (!memcmp(target, nop, insn_size))
+ goto out;
+ prog = nop;
+ }
+
+ ret = kfunc_md_arch_poke(target, prog, insn_offset);
+out:
+ mutex_unlock(&text_mutex);
+ return ret;
+}
+
+/* Get next usable function metadata. On success, return the usable
+ * kfunc_md and store the index of it to *index. If no usable kfunc_md is
+ * found in kfunc_mds, a larger array will be allocated.
+ */
+static struct kfunc_md *kfunc_md_fast_next(u32 *index)
+{
+ struct kfunc_md_array *mds, *new_mds;
+ struct kfunc_md *md;
+ u32 i;
+
+ mds = kfunc_mds;
+do_retry:
+ if (likely(atomic_read(&mds->kfunc_md_used) < mds->kfunc_md_count)) {
+ /* maybe we can manage the used function metadata entry
+ * with a bit map ?
+ */
+ for (i = 0; i < mds->kfunc_md_count; i++) {
+ md = &mds->mds[i];
+ if (!md->users && !(md->flags & KFUNC_MD_FL_DEAD)) {
+ atomic_inc(&mds->kfunc_md_used);
+ *index = i;
+ return md;
+ }
+ }
+ }
+
+ /* no available function metadata, so allocate a bigger function
+ * metadata array.
+ *
+ * TODO: we increase the array length here, and we also need to
+ * shrink it somewhere.
+ */
+ new_mds = kfunc_md_array_alloc(mds);
+ if (!new_mds)
+ return NULL;
+
+ rcu_assign_pointer(kfunc_mds, new_mds);
+ /* release of the old kfunc metadata array.
+ *
+ * First step, set KFUNC_MD_ARRAY_FL_DEAD on it. The old mds will
+ * not be accessed by anyone anymore from now on.
+ *
+ * Second step, call rcu to wakeup the work queue to call
+ * kfunc_md_array_release_deferred() in kfunc_md_array_release.
+ *
+ * Third step, kill all the percpu ref of the mds in
+ * kfunc_md_array_release_deferred().
+ *
+ * Fourth step, decrease the mds->kfunc_md_used in the callback of
+ * the percpu ref. And the callback is kfunc_md_release().
+ *
+ * Fifth step, wakeup the work queue to call
+ * kfunc_md_array_release_deferred() if old->kfunc_md_used is decreased
+ * to 0, and the old mds will be freed.
+ */
+ set_bit(KFUNC_MD_ARRAY_FL_DEAD, &mds->flags);
+ call_rcu_tasks(&mds->rcu, kfunc_md_array_release);
+ mds = new_mds;
+
+ goto do_retry;
+}
+
+static void kfunc_md_fast_put(struct kfunc_md *md)
+{
+ u8 nop_insn[insn_size];
+
+ if (WARN_ON_ONCE(md->users <= 0))
+ return;
+
+ md->users--;
+ if (md->users > 0)
+ return;
+
+ if (WARN_ON_ONCE(!kfunc_md_arch_exist(md->func, insn_offset)))
+ return;
+
+ atomic_dec(&md->array->kfunc_md_used);
+ kfunc_md_arch_nops(nop_insn);
+ /* release the metadata by recovering the function padding to NOPS */
+ kfunc_md_text_poke(md->func, NULL, nop_insn);
+ /* mark it as dead, so it will not be reused before we release it
+ * fully in kfunc_md_release().
+ */
+ md->flags |= KFUNC_MD_FL_DEAD;
+ percpu_ref_kill(&md->pcref);
+}
+
+/* Get an existing metadata entry by the function address; NULL will be
+ * returned if it does not exist.
+ *
+ * NOTE: rcu lock or kfunc_md_lock should be held during reading the metadata,
+ * and kfunc_md_lock should be held if writing happens.
+ */
+static struct kfunc_md *kfunc_md_fast_get(unsigned long ip)
+{
+ struct kfunc_md *md;
+ u32 index;
+
+ if (kfunc_md_arch_exist(ip, insn_offset)) {
+ index = kfunc_md_get_index(ip);
+ md = READ_ONCE(kfunc_mds->mds) + index;
+ return md;
+ }
+ return NULL;
+}
+
+/* Get an existing metadata entry by the function address, and create one
+ * if it does not exist. The reference count of the metadata is increased by 1.
+ *
+ * NOTE: always call this function with kfunc_md_lock held, and all
+ * updating to metadata should also hold the kfunc_md_lock.
+ */
+static struct kfunc_md *kfunc_md_fast_create(unsigned long ip, int nr_args)
+{
+ u8 nop_insn[insn_size], insn[insn_size];
+ struct kfunc_md *md;
+ u32 index;
+ int err;
+
+ md = kfunc_md_fast_get(ip);
+ if (md) {
+ md->users++;
+ return md;
+ }
+
+ md = kfunc_md_fast_next(&index);
+ if (!md)
+ return NULL;
+
+ memset(md, 0, sizeof(*md));
+ err = percpu_ref_init(&md->pcref, kfunc_md_release, 0, GFP_KERNEL);
+ if (err)
+ return NULL;
+
+ kfunc_md_arch_pretend(insn, index);
+ kfunc_md_arch_nops(nop_insn);
+
+ if (kfunc_md_text_poke(ip, insn, nop_insn)) {
+ atomic_dec(&kfunc_mds->kfunc_md_used);
+ percpu_ref_exit(&md->pcref);
+ return NULL;
+ }
+
+ md->users = 1;
+ md->func = ip;
+ md->array = kfunc_mds;
+ md->nr_args = nr_args;
+
+ return md;
+}
+#else
+
+static void kfunc_md_fast_put(struct kfunc_md *md)
+{
+}
+
+static struct kfunc_md *kfunc_md_fast_get(unsigned long ip)
+{
+ return NULL;
+}
+
+static struct kfunc_md *kfunc_md_fast_create(unsigned long ip, int nr_args)
+{
+ return NULL;
+}
+
+#endif /* !CONFIG_FUNCTION_METADATA */
+
+void kfunc_md_enter(struct kfunc_md *md)
+{
+ percpu_ref_get(&md->pcref);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_enter);
+
+void kfunc_md_exit(struct kfunc_md *md)
+{
+ percpu_ref_put(&md->pcref);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_exit);
+
+void kfunc_md_unlock(void)
+{
+ mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_unlock);
+
+void kfunc_md_lock(void)
+{
+ mutex_lock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_lock);
+
+#undef CALL
+#define CALL(fast, slow, type, ...) ({ \
+ type ___ret; \
+ if (kfunc_md_fast()) \
+ ___ret = fast(__VA_ARGS__); \
+ else \
+ ___ret = slow(__VA_ARGS__); \
+ ___ret; \
+})
+
+#undef CALL_VOID
+#define CALL_VOID(fast, slow, ...) do { \
+ if (kfunc_md_fast()) \
+ fast(__VA_ARGS__); \
+ else \
+ slow(__VA_ARGS__); \
+} while (0)
+
+struct kfunc_md *kfunc_md_get_noref(unsigned long ip)
+{
+ return CALL(kfunc_md_fast_get, kfunc_md_hash_get, struct kfunc_md *,
+ ip);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_get_noref);
+
+struct kfunc_md *kfunc_md_get(unsigned long ip)
+{
+ struct kfunc_md *md;
+
+ md = CALL(kfunc_md_fast_get, kfunc_md_hash_get, struct kfunc_md *,
+ ip);
+ if (md)
+ md->users++;
+ return md;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_get);
+
+void kfunc_md_put(unsigned long ip)
+{
+ struct kfunc_md *md = kfunc_md_get_noref(ip);
+
+ if (md)
+ CALL_VOID(kfunc_md_fast_put, kfunc_md_hash_put, md);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put);
+
+/* Decrease the reference of the md, release it if "md->users <= 0" */
+void kfunc_md_put_entry(struct kfunc_md *md)
+{
+ if (!md)
+ return;
+
+ CALL_VOID(kfunc_md_fast_put, kfunc_md_hash_put, md);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put_entry);
+
+struct kfunc_md *kfunc_md_create(unsigned long ip, int nr_args)
+{
+ return CALL(kfunc_md_fast_create, kfunc_md_hash_create,
+ struct kfunc_md *, ip, nr_args);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_create);
+
+bool __weak kfunc_md_arch_support(int *insn, int *data)
+{
+ return false;
+}
+
+static int __init kfunc_md_init_test(void)
+{
+#ifndef CONFIG_FUNCTION_METADATA_PADDING
+ /* When the CONFIG_FUNCTION_METADATA_PADDING is not available, try
+ * to probe the usable function padding dynamically.
+ */
+ if (!kfunc_md_arch_support(&__insn_offset, &__data_offset))
+ static_branch_disable(&kfunc_md_use_padding);
+#endif
+ return 0;
+}
+late_initcall(kfunc_md_init_test);
--
2.39.5
* [PATCH bpf-next 02/25] x86: implement per-function metadata storage for x86
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
With CONFIG_CALL_PADDING enabled, there is a 16-byte padding space
before every kernel function, and some kernel features can use it,
such as MITIGATION_CALL_DEPTH_TRACKING, CFI_CLANG, FINEIBT, etc.

In my research, MITIGATION_CALL_DEPTH_TRACKING consumes the tail
9 bytes of the function padding, CFI_CLANG consumes the head 5 bytes,
and FINEIBT consumes all 16 bytes if it is enabled. So there is no space
for us if MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are both enabled,
or if FINEIBT is enabled.

In order to implement the padding-based function metadata, we need 5 bytes
to prepend a "mov %eax, xxx" insn on x86_64, which can hold a 4-byte
index. So we have the following logic:
1. use the head 5 bytes if CFI_CLANG is not enabled
2. use the tail 5 bytes if MITIGATION_CALL_DEPTH_TRACKING and FINEIBT are
   not enabled
3. otherwise, probe dynamically after boot whether FineIBT or the call
   thunks are enabled

In the third case, we implement the function metadata with the hash table
if "cfi_mode == CFI_FINEIBT || thunks_initialized". Therefore, we need to
make thunks_initialized global in arch/x86/kernel/callthunks.c.
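
To illustrate, a rough sketch of the 16-byte padding based on the
description above (not to scale):

	|<---------------- 16-byte padding ---------------->| <function entry>
	[ head 5 bytes: CFI hash ][ 2 bytes ][ tail 9 bytes: call depth tracking ]

The 5-byte "mov %eax, xxx" holding the metadata index goes into the head
when CFI_CLANG is disabled, or into the tail when call depth tracking and
FineIBT are disabled.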
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
arch/x86/Kconfig | 26 +++++++++++++++++
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/include/asm/ftrace.h | 47 ++++++++++++++++++++++++++++++
arch/x86/kernel/callthunks.c | 2 +-
arch/x86/kernel/ftrace.c | 26 +++++++++++++++++
5 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b9f378e05f6..0405288c42c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2459,6 +2459,32 @@ config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG
+config FUNCTION_METADATA
+ bool "Per-function metadata storage support"
+ default y
+ depends on CC_HAS_ENTRY_PADDING && OBJTOOL
+ help
+ Support function padding based per-function metadata storage for
+ kernel functions, and get the metadata of the function by its
+ address with almost no overhead.
+
+ The index of the metadata will be stored in the function padding
+ and consumes 5-bytes.
+
+ Hash table based function metadata will be used if this option
+ is not enabled.
+
+config FUNCTION_METADATA_PADDING
+ bool "function padding is available for metadata"
+ default y
+ depends on FUNCTION_METADATA && !FINEIBT && !(CFI_CLANG && CALL_THUNKS)
+ select CALL_PADDING
+ help
+ Function padding is available for the function metadata. If this
+ option is disabled, function metadata will try to probe if there
+ are usable function padding during the system boot. If not, the
+ hash table based function metadata will be used instead.
+
menuconfig CPU_MITIGATIONS
bool "Mitigations for CPU vulnerabilities"
default y
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4a37a8bd87fd..951edf1857c3 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -103,6 +103,8 @@ struct callthunk_sites {
};
#ifdef CONFIG_CALL_THUNKS
+extern bool thunks_initialized;
+
extern void callthunks_patch_builtin_calls(void);
extern void callthunks_patch_module_calls(struct callthunk_sites *sites,
struct module *mod);
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 93156ac4ffe0..ed1fdfce824e 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -4,6 +4,21 @@
#include <asm/ptrace.h>
+#ifdef CONFIG_FUNCTION_METADATA_PADDING
+
+#ifdef CONFIG_CFI_CLANG
+/* use the space that CALL_THUNKS is supposed to use */
+#define KFUNC_MD_INSN_OFFSET (5)
+#define KFUNC_MD_DATA_OFFSET (4)
+#else
+/* use the space that CFI_CLANG is supposed to use */
+#define KFUNC_MD_INSN_OFFSET (CONFIG_FUNCTION_PADDING_BYTES)
+#define KFUNC_MD_DATA_OFFSET (CONFIG_FUNCTION_PADDING_BYTES - 1)
+#endif
+#endif
+
+#define KFUNC_MD_INSN_SIZE (5)
+
#ifdef CONFIG_FUNCTION_TRACER
#ifndef CC_USING_FENTRY
# error Compiler does not support fentry?
@@ -154,6 +169,38 @@ static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
}
#endif /* CONFIG_FTRACE_SYSCALLS && CONFIG_IA32_EMULATION */
#endif /* !COMPILE_OFFSETS */
+
+#ifdef CONFIG_FUNCTION_METADATA
+#include <asm/text-patching.h>
+
+static inline bool kfunc_md_arch_exist(unsigned long ip, int insn_offset)
+{
+ return *(u8 *)(ip - insn_offset) == 0xB8;
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+ *insn = 0xB8;
+ *(u32 *)(insn + 1) = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+ *(insn++) = BYTES_NOP1;
+ *(insn++) = BYTES_NOP1;
+ *(insn++) = BYTES_NOP1;
+ *(insn++) = BYTES_NOP1;
+ *(insn++) = BYTES_NOP1;
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn, int insn_offset)
+{
+ text_poke(ip, insn, insn_offset);
+ text_poke_sync();
+ return 0;
+}
+#endif /* CONFIG_FUNCTION_METADATA */
+
#endif /* !__ASSEMBLER__ */
#endif /* _ASM_X86_FTRACE_H */
diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index d86d7d6e750c..6ed49904cd61 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -56,7 +56,7 @@ struct core_text {
const char *name;
};
-static bool thunks_initialized __ro_after_init;
+bool thunks_initialized __ro_after_init;
static const struct core_text builtin_coretext = {
.base = (unsigned long)_text,
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index cace6e8d7cc7..2504c2556508 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -26,6 +26,7 @@
#include <linux/vmalloc.h>
#include <linux/set_memory.h>
#include <linux/execmem.h>
+#include <linux/kfunc_md.h>
#include <trace/syscall.h>
@@ -569,6 +570,31 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
ops->trampoline = 0;
}
+#if defined(CONFIG_FUNCTION_METADATA) && !defined(CONFIG_FUNCTION_METADATA_PADDING)
+bool kfunc_md_arch_support(int *insn, int *data)
+{
+ /* when fineibt is enabled, the 16-bytes padding are all used */
+ if (IS_ENABLED(CONFIG_FINEIBT) && cfi_mode == CFI_FINEIBT)
+ return false;
+
+ if (IS_ENABLED(CONFIG_CALL_THUNKS) && IS_ENABLED(CONFIG_CFI_CLANG)) {
+ /* when call thunks and cfi are both enabled, no enough space
+ * for us.
+ */
+ if (thunks_initialized)
+ return false;
+ /* use the tail 5-bytes for function meta data */
+ *insn = 5;
+ *data = 4;
+
+ return true;
+ }
+
+ WARN_ON_ONCE(1);
+ return true;
+}
+#endif
+
#endif /* CONFIG_X86_64 */
#endif /* CONFIG_DYNAMIC_FTRACE */
--
2.39.5
* [PATCH bpf-next 03/25] arm64: implement per-function metadata storage for arm64
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
The per-function metadata storage is already used by ftrace if
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS is enabled; it stores the pointer
to the callback directly in the function padding, which consumes 8 bytes,
since commit
baaf553d3bc3 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS").

So we can directly store the index in the function padding too, without
prepending an insn. With CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS enabled, the
functions are 8-byte aligned, and we compile the kernel with an extra
8 bytes (2 NOPs) of padding space. Otherwise, the functions are 4-byte
aligned, and only an extra 4 bytes (1 NOP) is needed.

However, we have the same problem as Mark had in the commit above: we
can't use the function padding together with CFI_CLANG, which can make
clang compute a wrong offset to the pre-function type hash. So we fall
back to the hash table mode for function metadata if CFI_CLANG is enabled.
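
As a worked example derived from the Makefile change below (take it as an
illustration): with DYNAMIC_FTRACE_WITH_CALL_OPS and FUNCTION_ALIGNMENT_8B,
__padding_nops becomes 2 + 2 = 4, so the kernel is built with

	CC_FLAGS_FTRACE := -fpatchable-function-entry=6,4

instead of the previous 4,2; with 4-byte alignment and CALL_OPS it would
be 5,3.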
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
arch/arm64/Kconfig | 21 ++++++++++++++++++++
arch/arm64/Makefile | 23 ++++++++++++++++++++--
arch/arm64/include/asm/ftrace.h | 34 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/ftrace.c | 13 +++++++++++--
4 files changed, 87 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a182295e6f08..db504df07072 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1549,6 +1549,27 @@ config NODES_SHIFT
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
+config FUNCTION_METADATA
+ bool "Per-function metadata storage support"
+ default y
+ select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE if !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+ depends on !CFI_CLANG
+ help
+ Support function padding based per-function metadata storage for
+ kernel functions, and get the metadata of the function by its
+ address with almost no overhead.
+
+ The index of the metadata will be stored in the function padding,
+ which will consume 4-bytes. If FUNCTION_ALIGNMENT_8B is enabled,
+ extra 8-bytes function padding will be reserved during compiling.
+ Otherwise, only extra 4-bytes function padding is needed.
+
+ Hash table based function metadata will be used if this option
+ is not enabled.
+
+config FUNCTION_METADATA_PADDING
+ def_bool FUNCTION_METADATA
+
source "kernel/Kconfig.hz"
config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 1d5dfcd1c13e..576d6ab94dc5 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -144,12 +144,31 @@ endif
CHECKFLAGS += -D__aarch64__
+ifeq ($(CONFIG_FUNCTION_METADATA_PADDING),y)
+ ifeq ($(CONFIG_FUNCTION_ALIGNMENT_8B),y)
+ __padding_nops := 2
+ else
+ __padding_nops := 1
+ endif
+else
+ __padding_nops := 0
+endif
+
ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
+ __padding_nops := $(shell echo $(__padding_nops) + 2 | bc)
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
- CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
else ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
- CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+else ifeq ($(CONFIG_FUNCTION_METADATA_PADDING),y)
+ CC_FLAGS_FTRACE += -fpatchable-function-entry=$(__padding_nops),$(__padding_nops)
+ ifneq ($(CONFIG_FUNCTION_TRACER),y)
+ KBUILD_CFLAGS += $(CC_FLAGS_FTRACE)
+ # some file need to remove this cflag when CONFIG_FUNCTION_TRACER
+ # is not enabled, so we need to export it here
+ export CC_FLAGS_FTRACE
+ endif
endif
ifeq ($(CONFIG_KASAN_SW_TAGS), y)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index bfe3ce9df197..9aafb3103829 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -24,6 +24,16 @@
#define FTRACE_PLT_IDX 0
#define NR_FTRACE_PLTS 1
+#ifdef CONFIG_FUNCTION_METADATA_PADDING
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+#define KFUNC_MD_DATA_OFFSET (AARCH64_INSN_SIZE * 3)
+#else
+#define KFUNC_MD_DATA_OFFSET AARCH64_INSN_SIZE
+#endif
+#define KFUNC_MD_INSN_SIZE AARCH64_INSN_SIZE
+#define KFUNC_MD_INSN_OFFSET KFUNC_MD_DATA_OFFSET
+#endif
+
/*
* Currently, gcc tends to save the link register after the local variables
* on the stack. This causes the max stack tracer to report the function
@@ -216,6 +226,30 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
*/
return !strcmp(sym + 8, name);
}
+
+#ifdef CONFIG_FUNCTION_METADATA_PADDING
+#include <asm/text-patching.h>
+
+static inline bool kfunc_md_arch_exist(unsigned long ip, int insn_offset)
+{
+ return !aarch64_insn_is_nop(*(u32 *)(ip - insn_offset));
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+ *(u32 *)insn = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+ *(u32 *)insn = aarch64_insn_gen_nop();
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn, int insn_offset)
+{
+ return aarch64_insn_patch_text_nosync(ip, *(u32 *)insn);
+}
+#endif
#endif /* ifndef __ASSEMBLY__ */
#ifndef __ASSEMBLY__
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 5a890714ee2e..869946dabdd0 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -88,8 +88,10 @@ unsigned long ftrace_call_adjust(unsigned long addr)
* to `BL <caller>`, which is at `addr + 4` bytes in either case.
*
*/
- if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
- return addr + AARCH64_INSN_SIZE;
+ if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) {
+ addr += AARCH64_INSN_SIZE;
+ goto out;
+ }
/*
* When using patchable-function-entry with pre-function NOPs, addr is
@@ -139,6 +141,13 @@ unsigned long ftrace_call_adjust(unsigned long addr)
/* Skip the first NOP after function entry */
addr += AARCH64_INSN_SIZE;
+out:
+ if (IS_ENABLED(CONFIG_FUNCTION_METADATA_PADDING)) {
+ if (IS_ENABLED(CONFIG_FUNCTION_ALIGNMENT_8B))
+ addr += 2 * AARCH64_INSN_SIZE;
+ else
+ addr += AARCH64_INSN_SIZE;
+ }
return addr;
}
--
2.39.5
* [PATCH bpf-next 04/25] bpf: make kfunc_md support global trampoline link
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Introduce the struct kfunc_md_tramp_prog for BPF_PROG_TYPE_TRACING, and
add the field "bpf_progs" to struct kfunc_md. These fields will be used
by the bpf global trampoline in the next patch.

The KFUNC_MD_FL_TRACING_ORIGIN flag is introduced to indicate that an
origin call is needed on this function.

Add the functions kfunc_md_bpf_link() and kfunc_md_bpf_unlink() to add a
bpf prog to, or remove it from, a kfunc_md. Meanwhile, introduce
kfunc_md_bpf_ips() to get all the kernel functions in kfunc_mds that
contain bpf progs.

The KFUNC_MD_FL_BPF_REMOVING flag indicates that a removal is in progress,
and we shouldn't return the function if "bpf_prog_cnt <= 1" in
kfunc_md_bpf_ips().
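
A rough usage sketch of the new helpers (simplified; the real callers are
added later in the series):

	kfunc_md_lock();
	md = kfunc_md_create(ip, nr_args);
	if (md) {
		err = kfunc_md_bpf_link(md, prog, BPF_TRAMP_FENTRY, cookie);
		if (err)
			kfunc_md_put_entry(md);
	}
	kfunc_md_unlock();

	/* later: collect every function that still has bpf progs attached */
	cnt = kfunc_md_bpf_ips(&ips);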
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/kfunc_md.h | 17 ++++++
kernel/trace/kfunc_md.c | 118 +++++++++++++++++++++++++++++++++++++++
2 files changed, 135 insertions(+)
diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
index 21c0b879cc03..f1b1012eeab2 100644
--- a/include/linux/kfunc_md.h
+++ b/include/linux/kfunc_md.h
@@ -3,12 +3,21 @@
#define _LINUX_KFUNC_MD_H
#define KFUNC_MD_FL_DEAD (1 << 0) /* the md shouldn't be reused */
+#define KFUNC_MD_FL_TRACING_ORIGIN (1 << 1)
+#define KFUNC_MD_FL_BPF_REMOVING (1 << 2)
#ifndef __ASSEMBLER__
#include <linux/kernel.h>
#include <linux/bpf.h>
+struct kfunc_md_tramp_prog {
+ struct kfunc_md_tramp_prog *next;
+ struct bpf_prog *prog;
+ u64 cookie;
+ struct rcu_head rcu;
+};
+
struct kfunc_md_array;
struct kfunc_md {
@@ -19,6 +28,7 @@ struct kfunc_md {
struct rcu_head rcu;
#endif
unsigned long func;
+ struct kfunc_md_tramp_prog *bpf_progs[BPF_TRAMP_MAX];
#ifdef CONFIG_FUNCTION_METADATA
/* the array is used for the fast mode */
struct kfunc_md_array *array;
@@ -26,6 +36,7 @@ struct kfunc_md {
struct percpu_ref pcref;
u32 flags;
u16 users;
+ u8 bpf_prog_cnt;
u8 nr_args;
};
@@ -40,5 +51,11 @@ void kfunc_md_exit(struct kfunc_md *md);
void kfunc_md_enter(struct kfunc_md *md);
bool kfunc_md_arch_support(int *insn, int *data);
+int kfunc_md_bpf_ips(void ***ips);
+
+int kfunc_md_bpf_unlink(struct kfunc_md *md, struct bpf_prog *prog, int type);
+int kfunc_md_bpf_link(struct kfunc_md *md, struct bpf_prog *prog, int type,
+ u64 cookie);
+
#endif
#endif
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
index 9571081f6560..ebb4e46d482d 100644
--- a/kernel/trace/kfunc_md.c
+++ b/kernel/trace/kfunc_md.c
@@ -131,6 +131,23 @@ static bool kfunc_md_fast(void)
{
return static_branch_likely(&kfunc_md_use_padding);
}
+
+static int kfunc_md_hash_bpf_ips(void **ips)
+{
+ struct hlist_head *head;
+ struct kfunc_md *md;
+ int c = 0, i;
+
+ for (i = 0; i < (1 << KFUNC_MD_HASH_BITS); i++) {
+ head = &kfunc_md_table[i];
+ hlist_for_each_entry(md, head, hash) {
+ if (md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING))
+ ips[c++] = (void *)md->func;
+ }
+ }
+
+ return c;
+}
#else
static void kfunc_md_hash_put(struct kfunc_md *md)
@@ -148,6 +165,11 @@ static struct kfunc_md *kfunc_md_hash_create(unsigned long ip, int nr_args)
}
#define kfunc_md_fast() 1
+
+static int kfunc_md_hash_bpf_ips(void **ips)
+{
+ return 0;
+}
#endif /* CONFIG_FUNCTION_METADATA_PADDING */
#ifdef CONFIG_FUNCTION_METADATA
@@ -442,6 +464,19 @@ static struct kfunc_md *kfunc_md_fast_create(unsigned long ip, int nr_args)
return md;
}
+
+static int kfunc_md_fast_bpf_ips(void **ips)
+{
+ struct kfunc_md *md;
+ int i, c = 0;
+
+ for (i = 0; i < kfunc_mds->kfunc_md_count; i++) {
+ md = &kfunc_mds->mds[i];
+ if (md->users && md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING))
+ ips[c++] = (void *)md->func;
+ }
+ return c;
+}
#else
static void kfunc_md_fast_put(struct kfunc_md *md)
@@ -458,6 +493,10 @@ static struct kfunc_md *kfunc_md_fast_create(unsigned long ip, int nr_args)
return NULL;
}
+static int kfunc_md_fast_bpf_ips(void **ips)
+{
+ return 0;
+}
#endif /* !CONFIG_FUNCTION_METADATA */
void kfunc_md_enter(struct kfunc_md *md)
@@ -547,6 +586,85 @@ struct kfunc_md *kfunc_md_create(unsigned long ip, int nr_args)
}
EXPORT_SYMBOL_GPL(kfunc_md_create);
+int kfunc_md_bpf_ips(void ***ips)
+{
+ void **tmp;
+ int c;
+
+ c = atomic_read(&kfunc_mds->kfunc_md_used);
+ if (!c)
+ return 0;
+
+ tmp = kmalloc_array(c, sizeof(*tmp), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ rcu_read_lock();
+ c = CALL(kfunc_md_fast_bpf_ips, kfunc_md_hash_bpf_ips, int, tmp);
+ rcu_read_unlock();
+
+ *ips = tmp;
+
+ return c;
+}
+
+int kfunc_md_bpf_link(struct kfunc_md *md, struct bpf_prog *prog, int type,
+ u64 cookie)
+{
+ struct kfunc_md_tramp_prog *tramp_prog, **last;
+
+ tramp_prog = md->bpf_progs[type];
+ /* check if the prog is already linked */
+ while (tramp_prog) {
+ if (tramp_prog->prog == prog)
+ return -EEXIST;
+ tramp_prog = tramp_prog->next;
+ }
+
+ tramp_prog = kmalloc(sizeof(*tramp_prog), GFP_KERNEL);
+ if (!tramp_prog)
+ return -ENOMEM;
+
+ tramp_prog->prog = prog;
+ tramp_prog->cookie = cookie;
+ tramp_prog->next = NULL;
+
+ /* add the new prog to the list tail */
+ last = &md->bpf_progs[type];
+ while (*last)
+ last = &(*last)->next;
+ *last = tramp_prog;
+
+ md->bpf_prog_cnt++;
+ if (type == BPF_TRAMP_FEXIT || type == BPF_TRAMP_MODIFY_RETURN)
+ md->flags |= KFUNC_MD_FL_TRACING_ORIGIN;
+
+ return 0;
+}
+
+int kfunc_md_bpf_unlink(struct kfunc_md *md, struct bpf_prog *prog, int type)
+{
+ struct kfunc_md_tramp_prog *cur, **prev;
+
+ prev = &md->bpf_progs[type];
+ while (*prev && (*prev)->prog != prog)
+ prev = &(*prev)->next;
+
+ cur = *prev;
+ if (!cur)
+ return -EINVAL;
+
+ *prev = cur->next;
+ kfree_rcu(cur, rcu);
+ md->bpf_prog_cnt--;
+
+ if (!md->bpf_progs[BPF_TRAMP_FEXIT] &&
+ !md->bpf_progs[BPF_TRAMP_MODIFY_RETURN])
+ md->flags &= ~KFUNC_MD_FL_TRACING_ORIGIN;
+
+ return 0;
+}
+
bool __weak kfunc_md_arch_support(int *insn, int *data)
{
return false;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 05/25] x86,bpf: add bpf_global_caller for global trampoline
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (3 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 04/25] bpf: make kfunc_md support global trampoline link Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 06/25] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
` (21 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Implement the bpf global trampoline "bpf_global_caller" for x86_64. Its
logic is similar to the bpf trampoline:
1. save the regs for the function args. For now, only functions with no
   more than 6 args are supported
2. save rbx and r12, which will be used to store the prog list and the
   return value of __bpf_prog_enter_recur
3. get the origin function address from the stack. To get the real
   function address, we do "&= $0xfffffffffffffff0" on it, as it is
   always 16-byte aligned
4. get the function metadata by calling kfunc_md_get_noref()
5. get the function args count from the kfunc_md and store it on the
   stack
6. get the kfunc_md flags and store them on the stack. Call
   kfunc_md_enter() if the origin call is needed
7. get the prog list for FENTRY, and run all the progs in the list with
   bpf_caller_prog_run
8. go to the end if the origin call is not necessary
9. get the prog list for MODIFY_RETURN, and run all the progs in the list
   with bpf_caller_prog_run
10. restore the regs and do the origin call. We get the ip of the origin
    function from the rip on the stack
11. save the return value of the origin call to the stack
12. get the prog list for FEXIT, and run all the progs in the list with
    bpf_caller_prog_run
13. restore rbx, r12 and r13. In order to rebalance the RSB, we return
    via bpf_global_caller_rsb here
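To make the assembly easier to follow, here is a rough C sketch of what
the bpf_caller_prog_run macro does for one prog list. This is only an
illustration (not part of the patch); the function name and the calling
details are simplified:

static void caller_run_progs(struct kfunc_md_tramp_prog *p, int type,
			     u64 *args, u64 *retval)
{
	struct bpf_tramp_run_ctx run_ctx;
	u64 start;

	for (; p; p = p->next) {
		run_ctx.bpf_cookie = p->cookie;
		start = __bpf_prog_enter_recur(p->prog, &run_ctx);
		if (start) {
			/* args points at the regs saved on the stack */
			u64 ret = p->prog->bpf_func(args, NULL);

			if (type == BPF_TRAMP_MODIFY_RETURN)
				*retval = ret;
		}
		__bpf_prog_exit_recur(p->prog, start, &run_ctx);
		/* a non-zero modify_return result skips the origin call */
		if (type == BPF_TRAMP_MODIFY_RETURN && *retval)
			break;
	}
}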
Indirect calls are used in bpf_caller_prog_run, as we load and call the
function address from the stack in the origin call case. What's more, we
get the bpf progs from the kfunc_md and call them indirectly. We make the
indirect calls with CALL_NOSPEC, and I'm not sure if that is enough to
prevent Spectre. I just saw others do it the same way :/
We use r13 to keep the address where we put the return value of the
origin call on the stack. Its offset is "FUNC_ARGS_OFFSET + 8 * nr_args"
(e.g. with nr_args == 2, r13 points at the slot that would otherwise hold
arg3).
The call to kfunc_md_get_noref() should be within rcu_read_lock(), which
I don't do, as that would add the overhead of a function call. And I'm
considering making the whole bpf prog list run within the rcu lock:
rcu_read_lock()
kfunc_md_get_noref()
call fentry progs
call modify_return progs
rcu_read_unlock()
call origin
rcu_read_lock()
call fexit progs
rcu_read_unlock()
I'm not sure why the general bpf trampoline doesn't do it this way. Is it
because this would make the trampoline hold the rcu lock for too long?
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
arch/x86/Kconfig | 4 +
arch/x86/kernel/asm-offsets.c | 15 +++
arch/x86/kernel/ftrace_64.S | 231 ++++++++++++++++++++++++++++++++++
include/linux/bpf.h | 4 +
kernel/bpf/trampoline.c | 6 +-
5 files changed, 257 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0405288c42c6..6d37f814701a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -153,6 +153,7 @@ config X86
select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
+ select ARCH_HAS_BPF_GLOBAL_CALLER if X86_64
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_WATCHDOG
@@ -431,6 +432,9 @@ config PGTABLE_LEVELS
default 3 if X86_PAE
default 2
+config ARCH_HAS_BPF_GLOBAL_CALLER
+ bool
+
menu "Processor type and features"
config SMP
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ad4ea6fb3b6c..a35831be3054 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -12,7 +12,9 @@
#include <linux/stddef.h>
#include <linux/hardirq.h>
#include <linux/suspend.h>
+#include <linux/bpf.h>
#include <linux/kbuild.h>
+#include <linux/kfunc_md.h>
#include <asm/processor.h>
#include <asm/thread_info.h>
#include <asm/sigframe.h>
@@ -115,4 +117,17 @@ static void __used common(void)
OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
#endif
+ DEFINE(RUN_CTX_SIZE, sizeof(struct bpf_tramp_run_ctx));
+ OFFSET(RUN_CTX_cookie, bpf_tramp_run_ctx, bpf_cookie);
+
+ OFFSET(BPF_PROG_func, bpf_prog, bpf_func);
+
+ DEFINE(KFUNC_MD_SIZE, sizeof(struct kfunc_md));
+ OFFSET(KFUNC_MD_progs, kfunc_md, bpf_progs);
+ OFFSET(KFUNC_MD_addr, kfunc_md, func);
+ OFFSET(KFUNC_MD_flags, kfunc_md, flags);
+ OFFSET(KFUNC_MD_nr_args, kfunc_md, nr_args);
+
+ OFFSET(KFUNC_MD_PROG_prog, kfunc_md_tramp_prog, prog);
+ OFFSET(KFUNC_MD_PROG_cookie, kfunc_md_tramp_prog, cookie);
}
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 367da3638167..62269a67bf3a 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -6,6 +6,7 @@
#include <linux/export.h>
#include <linux/cfi_types.h>
#include <linux/linkage.h>
+#include <linux/kfunc_md.h>
#include <asm/asm-offsets.h>
#include <asm/ptrace.h>
#include <asm/ftrace.h>
@@ -384,3 +385,233 @@ SYM_CODE_START(return_to_handler)
X86_FEATURE_CALL_DEPTH
SYM_CODE_END(return_to_handler)
#endif
+
+/*
+ * the stack layout for bpf_global_caller:
+ *
+ * callee rip
+ * rbp
+ * ---------------------- rbp
+ * return value - for origin call
+ * arg6
+ * ......
+ * arg1
+ * arg count
+ * origin ip - for bpf_get_func_ip()
+ * rbx - keep pointer to kfunc_md_tramp_prog
+ * bpf_tramp_run_ctx
+ * kfunc_md_ptr
+ * kfunc_md_flags
+ * r12 - keep the start time
+ * r13 - keep the return value address, for origin call
+ *
+ * Note: the return value can be in the position of arg6, arg5, etc,
+ * depending on the number of args. That's why we need the %r13
+ */
+
+#define FUNC_RETURN_OFFSET (-8)
+
+#define FUNC_ARGS_SIZE (6 * 8)
+#define FUNC_ARGS_OFFSET (FUNC_RETURN_OFFSET - FUNC_ARGS_SIZE)
+#define FUNC_ARGS_1 (FUNC_ARGS_OFFSET + 0 * 8)
+#define FUNC_ARGS_2 (FUNC_ARGS_OFFSET + 1 * 8)
+#define FUNC_ARGS_3 (FUNC_ARGS_OFFSET + 2 * 8)
+#define FUNC_ARGS_4 (FUNC_ARGS_OFFSET + 3 * 8)
+#define FUNC_ARGS_5 (FUNC_ARGS_OFFSET + 4 * 8)
+#define FUNC_ARGS_6 (FUNC_ARGS_OFFSET + 5 * 8)
+
+/* the args count, rbp - 8 * 8 */
+#define FUNC_ARGS_COUNT_OFFSET (FUNC_ARGS_OFFSET - 1 * 8)
+#define FUNC_ORIGIN_IP (FUNC_ARGS_OFFSET - 2 * 8) /* -9 * 8 */
+#define RBX_OFFSET (FUNC_ARGS_OFFSET - 3 * 8)
+
+/* bpf_tramp_run_ctx, rbp - RUN_CTX_OFFSET */
+#define RUN_CTX_OFFSET (RBX_OFFSET - RUN_CTX_SIZE)
+#define KFUNC_MD_OFFSET (RUN_CTX_OFFSET - 1 * 8)
+#define KFUNC_MD_FLAGS_OFFSET (KFUNC_MD_OFFSET - 8)
+#define R12_OFFSET (KFUNC_MD_OFFSET - 16)
+#define R13_OFFSET (KFUNC_MD_OFFSET - 24)
+#define STACK_SIZE (-1 * R13_OFFSET)
+
+/* restore the regs before we return or before we do the origin call */
+.macro tramp_restore_regs
+ movq FUNC_ARGS_1(%rbp), %rdi
+ movq FUNC_ARGS_2(%rbp), %rsi
+ movq FUNC_ARGS_3(%rbp), %rdx
+ movq FUNC_ARGS_4(%rbp), %rcx
+ movq FUNC_ARGS_5(%rbp), %r8
+ movq FUNC_ARGS_6(%rbp), %r9
+ .endm
+
+/* save the args to stack, only regs is supported for now */
+.macro tramp_save_regs
+ movq %rdi, FUNC_ARGS_1(%rbp)
+ movq %rsi, FUNC_ARGS_2(%rbp)
+ movq %rdx, FUNC_ARGS_3(%rbp)
+ movq %rcx, FUNC_ARGS_4(%rbp)
+ movq %r8, FUNC_ARGS_5(%rbp)
+ movq %r9, FUNC_ARGS_6(%rbp)
+ .endm
+
+#define BPF_TRAMP_FENTRY 0
+#define BPF_TRAMP_FEXIT 1
+#define BPF_TRAMP_MODIFY_RETURN 2
+
+.macro bpf_caller_prog_run type
+ /* check if the prog list is NULL */
+1: testq %rbx, %rbx
+ jz 3f
+
+ /* load bpf prog to the 1st arg */
+ movq KFUNC_MD_PROG_prog(%rbx), %rdi
+
+ /* load the pointer of tramp_run_ctx to the 2nd arg */
+ leaq RUN_CTX_OFFSET(%rbp), %rsi
+ /* save the bpf cookie to the tramp_run_ctx */
+ movq KFUNC_MD_PROG_cookie(%rbx), %rax
+ movq %rax, RUN_CTX_cookie(%rsi)
+ call __bpf_prog_enter_recur
+ /* save the start time to r12 */
+ movq %rax, %r12
+ testq %rax, %rax
+ jz 2f
+
+ movq KFUNC_MD_PROG_prog(%rbx), %rdi
+ /* load the JITed prog to rax */
+ movq BPF_PROG_func(%rdi), %rax
+ /* load func args array to the 1st arg */
+ leaq FUNC_ARGS_OFFSET(%rbp), %rdi
+
+ /* load and call the JITed bpf func */
+ CALL_NOSPEC rax
+.if \type==BPF_TRAMP_MODIFY_RETURN
+ /* modify_return case, save the return value */
+ movq %rax, (%r13)
+.endif
+
+ /* load bpf prog to the 1st arg */
+2: movq KFUNC_MD_PROG_prog(%rbx), %rdi
+ /* load the rbx(start time) to the 2nd arg */
+ movq %r12, %rsi
+ /* load the pointer of tramp_run_ctx to the 3rd arg */
+ leaq RUN_CTX_OFFSET(%rbp), %rdx
+ call __bpf_prog_exit_recur
+
+.if \type==BPF_TRAMP_MODIFY_RETURN
+ /* modify_return case, break the loop and skip the origin function call */
+ cmpq $0, (%r13)
+ jne do_bpf_fexit
+.endif
+ /* load the next tramp prog to rbx */
+ movq 0(%rbx), %rbx
+ jmp 1b
+
+3:
+ .endm
+
+SYM_FUNC_START(bpf_global_caller)
+ ANNOTATE_NOENDBR
+
+ /* prepare the stack space and store the args to the stack */
+ pushq %rbp
+ movq %rsp, %rbp
+ subq $STACK_SIZE, %rsp
+ tramp_save_regs
+
+ CALL_DEPTH_ACCOUNT
+
+ /* save rbx and r12, which will be used later */
+ movq %rbx, RBX_OFFSET(%rbp)
+ movq %r12, R12_OFFSET(%rbp)
+
+ /* get the function address */
+ movq 8(%rbp), %rdi
+ /* for x86_64, the function is 16-bytes aligned */
+ andq $0xfffffffffffffff0, %rdi
+ /* save the origin function ip */
+ movq %rdi, FUNC_ORIGIN_IP(%rbp)
+
+ /* get the function meta data */
+ call kfunc_md_get_noref
+ testq %rax, %rax
+ jz do_bpf_out
+ movq %rax, %rbx
+ movq %rbx, KFUNC_MD_OFFSET(%rbp)
+
+ /* save the function args count */
+ movzbq KFUNC_MD_nr_args(%rbx), %rax
+ movq %rax, FUNC_ARGS_COUNT_OFFSET(%rbp)
+
+ /* call kfunc_md_enter only if we need origin call */
+ movl KFUNC_MD_flags(%rbx), %edi
+ movl %edi, KFUNC_MD_FLAGS_OFFSET(%rbp)
+ andl $KFUNC_MD_FL_TRACING_ORIGIN, %edi
+ jz 1f
+
+ /* save the address of the return value to r13 */
+ movq %r13, R13_OFFSET(%rbp)
+ leaq FUNC_ARGS_OFFSET(%rbp, %rax, 8), %r13
+
+ movq %rbx, %rdi
+ call kfunc_md_enter
+
+ /* try run fentry progs */
+1: movq (KFUNC_MD_progs + BPF_TRAMP_FENTRY * 8)(%rbx), %rbx
+ bpf_caller_prog_run BPF_TRAMP_FENTRY
+
+ /* check if we need to do the origin call */
+ movl KFUNC_MD_FLAGS_OFFSET(%rbp), %eax
+ andl $KFUNC_MD_FL_TRACING_ORIGIN, %eax
+ jz do_bpf_out
+
+ /* try run modify_return progs */
+ movq KFUNC_MD_OFFSET(%rbp), %r12
+ movq (KFUNC_MD_progs + BPF_TRAMP_MODIFY_RETURN * 8)(%r12), %rbx
+ movq $0, (%r13)
+ bpf_caller_prog_run BPF_TRAMP_MODIFY_RETURN
+
+ /* do the origin call */
+ tramp_restore_regs
+ /* call the origin function from the stack, just like BPF_TRAMP_F_ORIG_STACK */
+ movq 8(%rbp), %rax
+ CALL_NOSPEC rax
+ movq %rax, (%r13)
+
+do_bpf_fexit:
+ /* for origin case, run fexit and return */
+ movq KFUNC_MD_OFFSET(%rbp), %r12
+ movq (KFUNC_MD_progs + BPF_TRAMP_FEXIT * 8)(%r12), %rbx
+ bpf_caller_prog_run BPF_TRAMP_FEXIT
+ movq KFUNC_MD_OFFSET(%rbp), %rdi
+ call kfunc_md_exit
+
+ movq (%r13), %rax
+ movq RBX_OFFSET(%rbp), %rbx
+ movq R12_OFFSET(%rbp), %r12
+ movq R13_OFFSET(%rbp), %r13
+ leave
+
+ /* rebalance the RSB. We can simply use:
+ *
+ * leaq 8(%rsp), %rsp
+ * RET
+ *
+ * instead here if we don't want do the rebalance.
+ */
+ movq $bpf_global_caller_rsb, (%rsp)
+ RET
+SYM_INNER_LABEL(bpf_global_caller_rsb, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ RET
+
+do_bpf_out:
+ /* for no origin call case, restore regs and return */
+ tramp_restore_regs
+
+ movq RBX_OFFSET(%rbp), %rbx
+ movq R12_OFFSET(%rbp), %r12
+ leave
+ RET
+
+SYM_FUNC_END(bpf_global_caller)
+STACK_FRAME_NON_STANDARD(bpf_global_caller)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b25d278409b..8979e397ea06 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3554,6 +3554,10 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
u32 num_args, struct bpf_bprintf_data *data);
void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
+u64 __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx);
+void __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
+ struct bpf_tramp_run_ctx *run_ctx);
+
#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
void bpf_cgroup_atype_put(int cgroup_atype);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index c4b1a98ff726..da4be23f03c3 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -894,7 +894,7 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
* [2..MAX_U64] - execute bpf prog and record execution time.
* This is start time.
*/
-static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
+u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
__acquires(RCU)
{
rcu_read_lock();
@@ -934,8 +934,8 @@ static void notrace update_prog_stats(struct bpf_prog *prog,
}
}
-static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
- struct bpf_tramp_run_ctx *run_ctx)
+void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
+ struct bpf_tramp_run_ctx *run_ctx)
__releases(RCU)
{
bpf_reset_run_ctx(run_ctx->saved_run_ctx);
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 06/25] ftrace: factor out ftrace_direct_update from register_ftrace_direct
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (4 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 05/25] x86,bpf: add bpf_global_caller for global trampoline Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 07/25] ftrace: add reset_ftrace_direct_ips Menglong Dong
` (20 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Factor out ftrace_direct_update() from register_ftrace_direct(); it is
used to add new entries to direct_functions. This function will be used
in a later patch.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
kernel/trace/ftrace.c | 108 +++++++++++++++++++++++-------------------
1 file changed, 60 insertions(+), 48 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 61130bb34d6c..a1028942e743 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5910,53 +5910,18 @@ static void register_ftrace_direct_cb(struct rcu_head *rhp)
free_ftrace_hash(fhp);
}
-/**
- * register_ftrace_direct - Call a custom trampoline directly
- * for multiple functions registered in @ops
- * @ops: The address of the struct ftrace_ops object
- * @addr: The address of the trampoline to call at @ops functions
- *
- * This is used to connect a direct calls to @addr from the nop locations
- * of the functions registered in @ops (with by ftrace_set_filter_ip
- * function).
- *
- * The location that it calls (@addr) must be able to handle a direct call,
- * and save the parameters of the function being traced, and restore them
- * (or inject new ones if needed), before returning.
- *
- * Returns:
- * 0 on success
- * -EINVAL - The @ops object was already registered with this call or
- * when there are no functions in @ops object.
- * -EBUSY - Another direct function is already attached (there can be only one)
- * -ENODEV - @ip does not point to a ftrace nop location (or not supported)
- * -ENOMEM - There was an allocation failure.
- */
-int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
+static int ftrace_direct_update(struct ftrace_hash *hash, unsigned long addr)
{
- struct ftrace_hash *hash, *new_hash = NULL, *free_hash = NULL;
struct ftrace_func_entry *entry, *new;
+ struct ftrace_hash *new_hash = NULL;
int err = -EBUSY, size, i;
- if (ops->func || ops->trampoline)
- return -EINVAL;
- if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED))
- return -EINVAL;
- if (ops->flags & FTRACE_OPS_FL_ENABLED)
- return -EINVAL;
-
- hash = ops->func_hash->filter_hash;
- if (ftrace_hash_empty(hash))
- return -EINVAL;
-
- mutex_lock(&direct_mutex);
-
/* Make sure requested entries are not already registered.. */
size = 1 << hash->size_bits;
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
if (ftrace_find_rec_direct(entry->ip))
- goto out_unlock;
+ goto out;
}
}
@@ -5969,7 +5934,7 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
size = FTRACE_HASH_MAX_BITS;
new_hash = alloc_ftrace_hash(size);
if (!new_hash)
- goto out_unlock;
+ goto out;
/* Now copy over the existing direct entries */
size = 1 << direct_functions->size_bits;
@@ -5977,7 +5942,7 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
hlist_for_each_entry(entry, &direct_functions->buckets[i], hlist) {
new = add_hash_entry(new_hash, entry->ip);
if (!new)
- goto out_unlock;
+ goto out;
new->direct = entry->direct;
}
}
@@ -5988,16 +5953,67 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = add_hash_entry(new_hash, entry->ip);
if (!new)
- goto out_unlock;
+ goto out;
/* Update both the copy and the hash entry */
new->direct = addr;
entry->direct = addr;
}
}
- free_hash = direct_functions;
rcu_assign_pointer(direct_functions, new_hash);
new_hash = NULL;
+ err = 0;
+out:
+ if (new_hash)
+ free_ftrace_hash(new_hash);
+
+ return err;
+}
+
+/**
+ * register_ftrace_direct - Call a custom trampoline directly
+ * for multiple functions registered in @ops
+ * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the trampoline to call at @ops functions
+ *
+ * This is used to connect a direct calls to @addr from the nop locations
+ * of the functions registered in @ops (with by ftrace_set_filter_ip
+ * function).
+ *
+ * The location that it calls (@addr) must be able to handle a direct call,
+ * and save the parameters of the function being traced, and restore them
+ * (or inject new ones if needed), before returning.
+ *
+ * Returns:
+ * 0 on success
+ * -EINVAL - The @ops object was already registered with this call or
+ * when there are no functions in @ops object.
+ * -EBUSY - Another direct function is already attached (there can be only one)
+ * -ENODEV - @ip does not point to a ftrace nop location (or not supported)
+ * -ENOMEM - There was an allocation failure.
+ */
+int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
+{
+ struct ftrace_hash *hash, *free_hash = NULL;
+ int err = -EBUSY;
+
+ if (ops->func || ops->trampoline)
+ return -EINVAL;
+ if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED))
+ return -EINVAL;
+ if (ops->flags & FTRACE_OPS_FL_ENABLED)
+ return -EINVAL;
+
+ hash = ops->func_hash->filter_hash;
+ if (ftrace_hash_empty(hash))
+ return -EINVAL;
+
+ mutex_lock(&direct_mutex);
+
+ free_hash = direct_functions;
+ err = ftrace_direct_update(hash, addr);
+ if (err)
+ goto out_unlock;
ops->func = call_direct_funcs;
ops->flags = MULTI_FLAGS;
@@ -6005,15 +6021,11 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
ops->direct_call = addr;
err = register_ftrace_function_nolock(ops);
-
- out_unlock:
- mutex_unlock(&direct_mutex);
-
if (free_hash && free_hash != EMPTY_HASH)
call_rcu_tasks(&free_hash->rcu, register_ftrace_direct_cb);
- if (new_hash)
- free_ftrace_hash(new_hash);
+ out_unlock:
+ mutex_unlock(&direct_mutex);
return err;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 07/25] ftrace: add reset_ftrace_direct_ips
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (5 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 06/25] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 08/25] bpf: introduce bpf_gtramp_link Menglong Dong
` (19 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
For now, we can change the address of a direct ftrace_ops with
modify_ftrace_direct(). However, we can't change the set of functions
filtered by a direct ftrace_ops. Therefore, we introduce the function
reset_ftrace_direct_ips(), which resets the functions filtered by a
direct ftrace_ops.
This function works in the following steps:
1. filter out the functions in ips that don't exist in
   ops->func_hash->filter_hash yet and add them to a new hash.
2. add all the functions in the new ftrace_hash to direct_functions with
   ftrace_direct_update().
3. reset the filtered functions of the ftrace_ops to ips with
   ftrace_set_filter_ips().
4. remove the functions that are in the old ftrace_hash, but not in the
   new one, from direct_functions.
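For example, a user that already has a direct ftrace_ops registered could
retarget it roughly like this (illustrative sketch only; "my_ops" and the
new addresses come from the caller):

static int retarget_direct_ops(struct ftrace_ops *my_ops,
			       unsigned long *new_ips, unsigned int cnt)
{
	int err;

	/* my_ops->direct_call was set when it was first registered */
	err = reset_ftrace_direct_ips(my_ops, new_ips, cnt);
	if (err)
		pr_warn("failed to reset direct ips: %d\n", err);
	return err;
}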
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/ftrace.h | 7 ++++
kernel/trace/ftrace.c | 75 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 82 insertions(+)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index fbabc3d848b3..40727d3f125d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -526,6 +526,8 @@ int modify_ftrace_direct_nolock(struct ftrace_ops *ops, unsigned long addr);
void ftrace_stub_direct_tramp(void);
+int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt);
#else
struct ftrace_ops;
static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
@@ -549,6 +551,11 @@ static inline int modify_ftrace_direct_nolock(struct ftrace_ops *ops, unsigned l
{
return -ENODEV;
}
+static inline int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt)
+{
+ return -ENODEV;
+}
/*
* This must be implemented by the architecture.
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a1028942e743..0befb4c93e89 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6181,6 +6181,81 @@ int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
return err;
}
EXPORT_SYMBOL_GPL(modify_ftrace_direct);
+
+/* reset the ips for a direct ftrace (add or remove) */
+int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt)
+{
+ struct ftrace_hash *hash, *free_hash;
+ struct ftrace_func_entry *entry, *del;
+ unsigned long ip;
+ int err, size;
+
+ if (check_direct_multi(ops))
+ return -EINVAL;
+ if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
+ return -EINVAL;
+
+ mutex_lock(&direct_mutex);
+ hash = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
+ if (!hash) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+
+ /* find out the new functions from ips and add to hash */
+ for (int i = 0; i < cnt; i++) {
+ ip = ftrace_location(ips[i]);
+ if (!ip) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+ if (__ftrace_lookup_ip(ops->func_hash->filter_hash, ip))
+ continue;
+ err = __ftrace_match_addr(hash, ip, 0);
+ if (err)
+ goto out_unlock;
+ }
+
+ free_hash = direct_functions;
+ /* add the new ips to direct hash. */
+ err = ftrace_direct_update(hash, ops->direct_call);
+ if (err)
+ goto out_unlock;
+
+ if (free_hash && free_hash != EMPTY_HASH)
+ call_rcu_tasks(&free_hash->rcu, register_ftrace_direct_cb);
+
+ free_ftrace_hash(hash);
+ hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS,
+ ops->func_hash->filter_hash);
+ if (!hash) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+ err = ftrace_set_filter_ips(ops, ips, cnt, 0, 1);
+
+ /* remove the entries that don't exist in our filter_hash anymore
+ * from the direct_functions.
+ */
+ size = 1 << hash->size_bits;
+ for (int i = 0; i < size; i++) {
+ hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
+ if (__ftrace_lookup_ip(ops->func_hash->filter_hash, entry->ip))
+ continue;
+ del = __ftrace_lookup_ip(direct_functions, entry->ip);
+ if (del && del->direct == ops->direct_call) {
+ remove_hash_entry(direct_functions, del);
+ kfree(del);
+ }
+ }
+ }
+out_unlock:
+ mutex_unlock(&direct_mutex);
+ if (hash)
+ free_ftrace_hash(hash);
+ return err;
+}
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
/**
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 08/25] bpf: introduce bpf_gtramp_link
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (6 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 07/25] ftrace: add reset_ftrace_direct_ips Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 09/25] bpf: tracing: add support to record and check the accessed args Menglong Dong
` (18 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Introduce the struct bpf_gtramp_link, which is used to attach a bpf prog
to multiple functions. Meanwhile, introduce the corresponding functions
bpf_gtrampoline_{link,unlink}_prog().
The lock global_tr_lock is held during global trampoline link and unlink.
Why do we define global_tr_lock as a rw_semaphore? Well, a mutex would be
enough here, but we will use the rw_semaphore in a later patch for the
trampoline override case :/
When unlinking the global trampoline link, we mark all the functions in
the bpf_gtramp_link with KFUNC_MD_FL_BPF_REMOVING and update the global
trampoline with bpf_gtrampoline_update(). If this is the last bpf prog in
the kfunc_md, the function will be removed from the filter_hash of the
ftrace_ops of the bpf_global_trampoline. Then, we remove the bpf prog from
the kfunc_md, and free the kfunc_md if necessary.
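Condensed, the unlink path described above looks roughly like this
(illustrative sketch; the kfunc_md locking and error handling are left
out, see bpf_gtrampoline_unlink_prog() in the diff for the real code):

static void gtramp_unlink_sketch(struct bpf_gtramp_link *link)
{
	struct kfunc_md *md;
	int i;

	down_read(&global_tr_lock);
	/* 1. mark every target function as being removed */
	for (i = 0; i < link->entry_cnt; i++) {
		md = kfunc_md_get_noref((long)link->entries[i].addr);
		md->flags |= KFUNC_MD_FL_BPF_REMOVING;
	}
	/* 2. functions whose last prog is going away drop out of the
	 * ftrace filter_hash of the global trampoline
	 */
	bpf_gtrampoline_update(&global_tr);
	/* 3. remove the prog from each kfunc_md, free unused mds */
	__bpf_gtrampoline_unlink_prog(link, link->entry_cnt);
	up_read(&global_tr_lock);
}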
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf.h | 31 +++++++
kernel/bpf/trampoline.c | 183 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 214 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8979e397ea06..7527399bab5b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -58,12 +58,15 @@ struct bpf_token;
struct user_namespace;
struct super_block;
struct inode;
+struct bpf_tramp_link;
+struct bpf_gtramp_link;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
extern bool bpf_global_ma_set;
+extern struct bpf_global_trampoline global_tr;
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
@@ -1279,6 +1282,12 @@ struct bpf_trampoline {
struct bpf_tramp_image *cur_image;
};
+struct bpf_global_trampoline {
+ struct list_head list;
+ struct ftrace_ops *fops;
+ void *image;
+};
+
struct bpf_attach_target_info {
struct btf_func_model fmodel;
long tgt_addr;
@@ -1382,6 +1391,12 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
void bpf_trampoline_put(struct bpf_trampoline *tr);
int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs);
+#ifdef CONFIG_ARCH_HAS_BPF_GLOBAL_CALLER
+void bpf_global_caller(void);
+#endif
+int bpf_gtrampoline_link_prog(struct bpf_gtramp_link *link);
+int bpf_gtrampoline_unlink_prog(struct bpf_gtramp_link *link);
+
/*
* When the architecture supports STATIC_CALL replace the bpf_dispatcher_fn
* indirection with a direct call to the bpf program. If the architecture does
@@ -1746,6 +1761,22 @@ struct bpf_shim_tramp_link {
struct bpf_trampoline *trampoline;
};
+struct bpf_gtramp_link_entry {
+ struct bpf_prog *tgt_prog;
+ struct bpf_trampoline *trampoline;
+ void *addr;
+ struct btf *attach_btf;
+ u64 cookie;
+ u32 btf_id;
+ u32 nr_args;
+};
+
+struct bpf_gtramp_link {
+ struct bpf_link link;
+ struct bpf_gtramp_link_entry *entries;
+ u32 entry_cnt;
+};
+
struct bpf_tracing_link {
struct bpf_tramp_link link;
enum bpf_attach_type attach_type;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index da4be23f03c3..be06dd76505a 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -13,6 +13,7 @@
#include <linux/bpf_verifier.h>
#include <linux/bpf_lsm.h>
#include <linux/delay.h>
+#include <linux/kfunc_md.h>
/* dummy _ops. The verifier will operate on target program's ops. */
const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -29,6 +30,10 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline_table */
static DEFINE_MUTEX(trampoline_mutex);
+struct bpf_global_trampoline global_tr;
+static DECLARE_RWSEM(global_tr_lock);
+static const struct bpf_link_ops bpf_shim_tramp_link_lops;
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
@@ -645,6 +650,172 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
return err;
}
+#if defined(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS) && defined(CONFIG_ARCH_HAS_BPF_GLOBAL_CALLER)
+static int bpf_gtrampoline_update(struct bpf_global_trampoline *tr)
+{
+ struct ftrace_ops *fops;
+ int ips_count, err = 0;
+ void **ips = NULL;
+
+ ips_count = kfunc_md_bpf_ips(&ips);
+ if (ips_count < 0) {
+ err = ips_count;
+ goto out;
+ }
+
+ fops = tr->fops;
+ if (ips_count == 0) {
+ if (!(fops->flags & FTRACE_OPS_FL_ENABLED))
+ goto out;
+ err = unregister_ftrace_direct(fops, (unsigned long)tr->image,
+ true);
+ goto out;
+ }
+
+ if (fops->flags & FTRACE_OPS_FL_ENABLED) {
+ err = reset_ftrace_direct_ips(fops, (unsigned long *)ips,
+ ips_count);
+ goto out;
+ }
+
+ err = ftrace_set_filter_ips(tr->fops, (unsigned long *)ips,
+ ips_count, 0, 1);
+ if (err)
+ goto out;
+
+ err = register_ftrace_direct(fops, (unsigned long)tr->image);
+out:
+ kfree(ips);
+
+ return err;
+}
+#else
+static int bpf_gtrampoline_update(struct bpf_global_trampoline *tr)
+{
+ return -ENODEV;
+}
+#endif
+
+static int __bpf_gtrampoline_unlink_prog(struct bpf_gtramp_link *link,
+ u32 cnt)
+{
+ enum bpf_tramp_prog_type kind;
+ struct kfunc_md *md;
+ int err = 0;
+
+ kind = bpf_attach_type_to_tramp(link->link.prog);
+ kfunc_md_lock();
+ for (int i = 0; i < cnt; i++) {
+ md = kfunc_md_get_noref((long)link->entries[i].addr);
+ if (WARN_ON_ONCE(!md)) {
+ err = -EINVAL;
+ break;
+ }
+
+ if (md->tramp)
+ bpf_gtrampoline_remove(md->tramp, link->link.prog, false);
+
+ md->flags &= ~KFUNC_MD_FL_BPF_REMOVING;
+ err = kfunc_md_bpf_unlink(md, link->link.prog, kind);
+ kfunc_md_put_entry(md);
+ if (err)
+ break;
+ }
+ kfunc_md_unlock();
+
+ return err;
+}
+
+int bpf_gtrampoline_unlink_prog(struct bpf_gtramp_link *link)
+{
+ struct kfunc_md *md;
+ int err;
+
+
+ /* hold the global trampoline lock, to make the target functions
+ * consist during we unlink the prog.
+ */
+ down_read(&global_tr_lock);
+ /* update the kfunc_md status, meanwhile update corresponding fops */
+ kfunc_md_lock();
+ for (int i = 0; i < link->entry_cnt; i++) {
+ md = kfunc_md_get_noref((long)link->entries[i].addr);
+ if (WARN_ON_ONCE(!md))
+ continue;
+
+ md->flags |= KFUNC_MD_FL_BPF_REMOVING;
+ }
+ kfunc_md_unlock();
+
+ bpf_gtrampoline_update(&global_tr);
+
+ /* update the ftrace filter first, then the corresponding kfunc_md */
+ err = __bpf_gtrampoline_unlink_prog(link, link->entry_cnt);
+ up_read(&global_tr_lock);
+
+ return err;
+}
+
+int bpf_gtrampoline_link_prog(struct bpf_gtramp_link *link)
+{
+ struct bpf_gtramp_link_entry *entry;
+ enum bpf_tramp_prog_type kind;
+ struct bpf_prog *prog;
+ struct kfunc_md *md;
+ bool update = false;
+ int err = 0, i;
+
+ prog = link->link.prog;
+ kind = bpf_attach_type_to_tramp(prog);
+
+ /* hold the global trampoline lock, to make the target functions
+ * consist during we link the prog.
+ */
+ down_read(&global_tr_lock);
+
+ /* update the bpf prog to all the corresponding function metadata */
+ for (i = 0; i < link->entry_cnt; i++) {
+ entry = &link->entries[i];
+ /* it seems that we hold this lock too long, we can use rcu
+ * lock instead.
+ */
+ kfunc_md_lock();
+ md = kfunc_md_create((long)entry->addr, entry->nr_args);
+ if (md) {
+ /* the function is not in the filter hash of gtr,
+ * we need update the global trampoline.
+ */
+ if (!md->bpf_prog_cnt)
+ update = true;
+ err = kfunc_md_bpf_link(md, prog, kind, entry->cookie);
+ } else {
+ err = -ENOMEM;
+ }
+
+ if (err) {
+ kfunc_md_put_entry(md);
+ kfunc_md_unlock();
+ goto on_fallback;
+ }
+ kfunc_md_unlock();
+ }
+
+ if (update) {
+ err = bpf_gtrampoline_update(&global_tr);
+ if (err)
+ goto on_fallback;
+ }
+ up_read(&global_tr_lock);
+
+ return 0;
+
+on_fallback:
+ __bpf_gtrampoline_unlink_prog(link, i);
+ up_read(&global_tr_lock);
+
+ return err;
+}
+
#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
static void bpf_shim_tramp_link_release(struct bpf_link *link)
{
@@ -1131,6 +1302,18 @@ static int __init init_trampolines(void)
{
int i;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ global_tr.fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
+ if (!global_tr.fops)
+ return -ENOMEM;
+
+ global_tr.fops->private = &global_tr;
+ global_tr.fops->ops_func = bpf_tramp_ftrace_ops_func;
+#endif
+#ifdef CONFIG_ARCH_HAS_BPF_GLOBAL_CALLER
+ global_tr.image = bpf_global_caller;
+#endif
+
for (i = 0; i < TRAMPOLINE_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&trampoline_table[i]);
return 0;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 09/25] bpf: tracing: add support to record and check the accessed args
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (7 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 08/25] bpf: introduce bpf_gtramp_link Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 10/25] bpf: refactor the modules_array to ptr_array Menglong Dong
` (17 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
In this commit, we add the 'accessed_args' field to struct bpf_prog_aux,
which is used to record the indexes of the function args that are
accessed in btf_ctx_access().
Meanwhile, we add the function btf_check_func_part_match() to compare
only the accessed function args of two function prototypes. This function
will be used in a following commit.
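For instance (illustrative; btf1/proto1 and btf2/proto2 stand for the two
targets being compared), a tracing prog that only reads ctx[0] and ctx[2]
of its target ends up with bits 0 and 2 set, and only those args have to
match when the prog is checked against another target:

	/* illustrative: a tracing prog that read ctx[0] and ctx[2] */
	u64 mask = (1ULL << 0) | (1ULL << 2);	/* prog->aux->accessed_args */
	int err;

	/* only args 0 and 2 of the two prototypes are compared */
	err = btf_check_func_part_match(btf1, proto1, btf2, proto2, mask);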
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf.h | 4 ++
kernel/bpf/btf.c | 108 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 110 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7527399bab5b..abf504e95ff2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1601,6 +1601,7 @@ struct bpf_prog_aux {
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
const char *attach_func_name;
+ u64 accessed_args;
struct bpf_prog **func;
void *jit_data; /* JIT specific data. arch dependent */
struct bpf_jit_poke_descriptor *poke_tab;
@@ -2779,6 +2780,9 @@ struct bpf_reg_state;
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog);
int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
struct btf *btf, const struct btf_type *t);
+int btf_check_func_part_match(struct btf *btf1, const struct btf_type *t1,
+ struct btf *btf2, const struct btf_type *t2,
+ u64 func_args);
const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key);
int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0f7828380895..64538625ee91 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6392,19 +6392,24 @@ static bool is_void_or_int_ptr(struct btf *btf, const struct btf_type *t)
}
static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
- int off)
+ int off, int *aligned_idx)
{
const struct btf_param *args;
const struct btf_type *t;
u32 offset = 0, nr_args;
int i;
+ if (aligned_idx)
+ *aligned_idx = -ENOENT;
+
if (!func_proto)
return off / 8;
nr_args = btf_type_vlen(func_proto);
args = (const struct btf_param *)(func_proto + 1);
for (i = 0; i < nr_args; i++) {
+ if (aligned_idx && offset == off)
+ *aligned_idx = i;
t = btf_type_skip_modifiers(btf, args[i].type, NULL);
offset += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
if (off < offset)
@@ -6671,7 +6676,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
tname, off);
return false;
}
- arg = get_ctx_arg_idx(btf, t, off);
+ arg = get_ctx_arg_idx(btf, t, off, NULL);
args = (const struct btf_param *)(t + 1);
/* if (t == NULL) Fall back to default BPF prog with
* MAX_BPF_FUNC_REG_ARGS u64 arguments.
@@ -6681,6 +6686,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
/* skip first 'void *__data' argument in btf_trace_##name typedef */
args++;
nr_args--;
+ prog->aux->accessed_args |= (1 << (arg + 1));
+ } else {
+ prog->aux->accessed_args |= (1 << arg);
}
if (arg > nr_args) {
@@ -7540,6 +7548,102 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
return btf_check_func_type_match(log, btf1, t1, btf2, t2);
}
+static u32 get_ctx_arg_total_size(struct btf *btf, const struct btf_type *t)
+{
+ const struct btf_param *args;
+ u32 size = 0, nr_args;
+ int i;
+
+ nr_args = btf_type_vlen(t);
+ args = (const struct btf_param *)(t + 1);
+ for (i = 0; i < nr_args; i++) {
+ t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+ size += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
+ }
+
+ return size;
+}
+
+/* This function is similar to btf_check_func_type_match(), except that it
+ * only compare some function args of the function prototype t1 and t2.
+ */
+int btf_check_func_part_match(struct btf *btf1, const struct btf_type *func1,
+ struct btf *btf2, const struct btf_type *func2,
+ u64 func_args)
+{
+ const struct btf_param *args1, *args2;
+ u32 nargs1, i, offset = 0;
+ const char *s1, *s2;
+
+ if (!btf_type_is_func_proto(func1) || !btf_type_is_func_proto(func2))
+ return -EINVAL;
+
+ args1 = (const struct btf_param *)(func1 + 1);
+ args2 = (const struct btf_param *)(func2 + 1);
+ nargs1 = btf_type_vlen(func1);
+
+ for (i = 0; i <= nargs1; i++) {
+ const struct btf_type *t1, *t2;
+
+ if (!(func_args & (1 << i)))
+ goto next;
+
+ if (i < nargs1) {
+ int t2_index;
+
+ /* get the index of the arg corresponding to args1[i]
+ * by the offset.
+ */
+ get_ctx_arg_idx(btf2, func2, offset, &t2_index);
+ if (t2_index < 0)
+ return -EINVAL;
+
+ t1 = btf_type_skip_modifiers(btf1, args1[i].type, NULL);
+ t2 = btf_type_skip_modifiers(btf2, args2[t2_index].type,
+ NULL);
+ } else {
+ /* i == nargs1, this is the index of return value of t1 */
+ if (get_ctx_arg_total_size(btf1, func1) !=
+ get_ctx_arg_total_size(btf2, func2))
+ return -EINVAL;
+
+ /* check the return type of t1 and t2 */
+ t1 = btf_type_skip_modifiers(btf1, func1->type, NULL);
+ t2 = btf_type_skip_modifiers(btf2, func2->type, NULL);
+ }
+
+ if (t1->info != t2->info ||
+ (btf_type_has_size(t1) && t1->size != t2->size))
+ return -EINVAL;
+ if (btf_type_is_int(t1) || btf_is_any_enum(t1))
+ goto next;
+
+ if (btf_type_is_struct(t1))
+ goto on_struct;
+
+ if (!btf_type_is_ptr(t1))
+ return -EINVAL;
+
+ t1 = btf_type_skip_modifiers(btf1, t1->type, NULL);
+ t2 = btf_type_skip_modifiers(btf2, t2->type, NULL);
+ if (!btf_type_is_struct(t1) || !btf_type_is_struct(t2))
+ return -EINVAL;
+
+on_struct:
+ s1 = btf_name_by_offset(btf1, t1->name_off);
+ s2 = btf_name_by_offset(btf2, t2->name_off);
+ if (strcmp(s1, s2))
+ return -EINVAL;
+next:
+ if (i < nargs1) {
+ t1 = btf_type_skip_modifiers(btf1, args1[i].type, NULL);
+ offset += btf_type_is_ptr(t1) ? 8 : roundup(t1->size, 8);
+ }
+ }
+
+ return 0;
+}
+
static bool btf_is_dynptr_ptr(const struct btf *btf, const struct btf_type *t)
{
const char *name;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 10/25] bpf: refactor the modules_array to ptr_array
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (8 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 09/25] bpf: tracing: add support to record and check the accessed args Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 11/25] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
` (16 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Refactor the struct modules_array into the more general struct ptr_array,
which is used to store pointers.
Meanwhile, introduce bpf_try_add_ptr(), which checks whether the ptr
already exists in the array before adding it.
It seems this should be moved to some file in "lib", but I'm not sure
where to add it now, so let's keep it in kernel/bpf/syscall.c for now.
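The intended usage pattern is roughly the following (illustrative sketch;
"ptrs_in" and "cnt" stand for whatever the caller collected, and the
caller kfree()s arr->ptrs when done). get_modules_for_addrs() below does
this for module pointers:

static int collect_unique(struct ptr_array *arr, void **ptrs_in, int cnt)
{
	int i, err;

	for (i = 0; i < cnt; i++) {
		err = bpf_try_add_ptr(arr, ptrs_in[i]);
		if (err == -EEXIST)	/* already stored, skip it */
			continue;
		if (err)		/* -ENOMEM */
			return err;
	}
	/* arr->ptrs now holds arr->cnt unique pointers */
	return 0;
}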
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf.h | 10 +++++++++
kernel/bpf/syscall.c | 36 ++++++++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 48 ++++++----------------------------------
3 files changed, 53 insertions(+), 41 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index abf504e95ff2..c35da9d91125 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -315,6 +315,16 @@ struct bpf_map {
s64 __percpu *elem_count;
};
+struct ptr_array {
+ void **ptrs;
+ int cnt;
+ int cap;
+};
+
+int bpf_add_ptr(struct ptr_array *arr, void *ptr);
+bool bpf_has_ptr(struct ptr_array *arr, struct module *mod);
+int bpf_try_add_ptr(struct ptr_array *arr, void *ptr);
+
static inline const char *btf_field_type_name(enum btf_field_type type)
{
switch (type) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4b5f29168618..e22a23aa03d1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -619,6 +619,42 @@ int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
return ret;
}
+int bpf_add_ptr(struct ptr_array *arr, void *ptr)
+{
+ void **ptrs;
+
+ if (arr->cnt == arr->cap) {
+ arr->cap = max(16, arr->cap * 3 / 2);
+ ptrs = krealloc_array(arr->ptrs, arr->cap, sizeof(*ptrs), GFP_KERNEL);
+ if (!ptrs)
+ return -ENOMEM;
+ arr->ptrs = ptrs;
+ }
+
+ arr->ptrs[arr->cnt] = ptr;
+ arr->cnt++;
+ return 0;
+}
+
+bool bpf_has_ptr(struct ptr_array *arr, struct module *mod)
+{
+ int i;
+
+ for (i = arr->cnt - 1; i >= 0; i--) {
+ if (arr->ptrs[i] == mod)
+ return true;
+ }
+ return false;
+}
+
+int bpf_try_add_ptr(struct ptr_array *arr, void *ptr)
+{
+ if (bpf_has_ptr(arr, ptr))
+ return -EEXIST;
+ if (bpf_add_ptr(arr, ptr))
+ return -ENOMEM;
+ return 0;
+}
static int btf_field_cmp(const void *a, const void *b)
{
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 132c8be6f635..8f134f291b81 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2779,43 +2779,9 @@ static void symbols_swap_r(void *a, void *b, int size, const void *priv)
}
}
-struct modules_array {
- struct module **mods;
- int mods_cnt;
- int mods_cap;
-};
-
-static int add_module(struct modules_array *arr, struct module *mod)
-{
- struct module **mods;
-
- if (arr->mods_cnt == arr->mods_cap) {
- arr->mods_cap = max(16, arr->mods_cap * 3 / 2);
- mods = krealloc_array(arr->mods, arr->mods_cap, sizeof(*mods), GFP_KERNEL);
- if (!mods)
- return -ENOMEM;
- arr->mods = mods;
- }
-
- arr->mods[arr->mods_cnt] = mod;
- arr->mods_cnt++;
- return 0;
-}
-
-static bool has_module(struct modules_array *arr, struct module *mod)
-{
- int i;
-
- for (i = arr->mods_cnt - 1; i >= 0; i--) {
- if (arr->mods[i] == mod)
- return true;
- }
- return false;
-}
-
static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u32 addrs_cnt)
{
- struct modules_array arr = {};
+ struct ptr_array arr = {};
u32 i, err = 0;
for (i = 0; i < addrs_cnt; i++) {
@@ -2825,7 +2791,7 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
scoped_guard(rcu) {
mod = __module_address(addrs[i]);
/* Either no module or it's already stored */
- if (!mod || has_module(&arr, mod)) {
+ if (!mod || bpf_has_ptr(&arr, mod)) {
skip_add = true;
break; /* scoped_guard */
}
@@ -2836,7 +2802,7 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
continue;
if (err)
break;
- err = add_module(&arr, mod);
+ err = bpf_add_ptr(&arr, mod);
if (err) {
module_put(mod);
break;
@@ -2845,14 +2811,14 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
/* We return either err < 0 in case of error, ... */
if (err) {
- kprobe_multi_put_modules(arr.mods, arr.mods_cnt);
- kfree(arr.mods);
+ kprobe_multi_put_modules((struct module **)arr.ptrs, arr.cnt);
+ kfree(arr.ptrs);
return err;
}
/* or number of modules found if everything is ok. */
- *mods = arr.mods;
- return arr.mods_cnt;
+ *mods = (struct module **)arr.ptrs;
+ return arr.cnt;
}
static int addrs_check_error_injection_list(unsigned long *addrs, u32 cnt)
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 11/25] bpf: verifier: add btf to the function args of bpf_check_attach_target
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (9 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 10/25] bpf: refactor the modules_array to ptr_array Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 12/25] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
` (15 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Add the target btf to the function args of bpf_check_attach_target(), so
that the caller can specify the btf to check.
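The callers now pick the btf themselves and pass it in; the updated call
sites in this patch follow this pattern (sketch, "log" is whatever the
caller already has):

	btf = tgt_prog ? tgt_prog->aux->btf : prog->aux->attach_btf;
	err = bpf_check_attach_target(log, prog, tgt_prog, btf, btf_id,
				      &tgt_info);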
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/syscall.c | 6 ++++--
kernel/bpf/trampoline.c | 1 +
kernel/bpf/verifier.c | 8 +++++---
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 78c97e12ea4e..85ed52a4e50b 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -894,6 +894,7 @@ static inline void bpf_trampoline_unpack_key(u64 key, u32 *obj_id, u32 *btf_id)
int bpf_check_attach_target(struct bpf_verifier_log *log,
const struct bpf_prog *prog,
const struct bpf_prog *tgt_prog,
+ struct btf *btf,
u32 btf_id,
struct bpf_attach_target_info *tgt_info);
void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e22a23aa03d1..60865a27d7d3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3593,9 +3593,11 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
* need a new trampoline and a check for compatibility
*/
struct bpf_attach_target_info tgt_info = {};
+ struct btf *btf;
- err = bpf_check_attach_target(NULL, prog, tgt_prog, btf_id,
- &tgt_info);
+ btf = tgt_prog ? tgt_prog->aux->btf : prog->aux->attach_btf;
+ err = bpf_check_attach_target(NULL, prog, tgt_prog, btf,
+ btf_id, &tgt_info);
if (err)
goto out_unlock;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index be06dd76505a..3d7fd59107ed 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -907,6 +907,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
int err;
err = bpf_check_attach_target(NULL, prog, NULL,
+ prog->aux->attach_btf,
prog->aux->attach_btf_id,
&tgt_info);
if (err)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5807d2efc92..b3927db15254 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23078,6 +23078,7 @@ static int check_non_sleepable_error_inject(u32 btf_id)
int bpf_check_attach_target(struct bpf_verifier_log *log,
const struct bpf_prog *prog,
const struct bpf_prog *tgt_prog,
+ struct btf *btf,
u32 btf_id,
struct bpf_attach_target_info *tgt_info)
{
@@ -23090,7 +23091,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
const struct btf_type *t;
bool conservative = true;
const char *tname, *fname;
- struct btf *btf;
long addr = 0;
struct module *mod = NULL;
@@ -23098,7 +23098,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
bpf_log(log, "Tracing programs must provide btf_id\n");
return -EINVAL;
}
- btf = tgt_prog ? tgt_prog->aux->btf : prog->aux->attach_btf;
if (!btf) {
bpf_log(log,
"FENTRY/FEXIT program can only be attached to another program annotated with BTF\n");
@@ -23477,6 +23476,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
struct bpf_attach_target_info tgt_info = {};
u32 btf_id = prog->aux->attach_btf_id;
struct bpf_trampoline *tr;
+ struct btf *btf;
int ret;
u64 key;
@@ -23501,7 +23501,9 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
prog->type != BPF_PROG_TYPE_EXT)
return 0;
- ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id, &tgt_info);
+ btf = tgt_prog ? tgt_prog->aux->btf : prog->aux->attach_btf;
+ ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf,
+ btf_id, &tgt_info);
if (ret)
return ret;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 12/25] bpf: verifier: move btf_id_deny to bpf_check_attach_target
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (10 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 11/25] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
@ 2025-05-28 3:46 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 13/25] x86,bpf: factor out __arch_get_bpf_regs_nr Menglong Dong
` (14 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:46 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Move the checking of btf_id_deny and noreturn_deny from
check_attach_btf_id() to bpf_check_attach_target(). This way, we can do
such checking during attaching for tracing multi-link in later patches.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
kernel/bpf/verifier.c | 125 ++++++++++++++++++++++--------------------
1 file changed, 65 insertions(+), 60 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b3927db15254..5d2e70425c1d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23075,6 +23075,52 @@ static int check_non_sleepable_error_inject(u32 btf_id)
return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
}
+BTF_SET_START(btf_id_deny)
+BTF_ID_UNUSED
+#ifdef CONFIG_SMP
+BTF_ID(func, migrate_disable)
+BTF_ID(func, migrate_enable)
+#endif
+#if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
+BTF_ID(func, rcu_read_unlock_strict)
+#endif
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
+BTF_ID(func, preempt_count_add)
+BTF_ID(func, preempt_count_sub)
+#endif
+#ifdef CONFIG_PREEMPT_RCU
+BTF_ID(func, __rcu_read_lock)
+BTF_ID(func, __rcu_read_unlock)
+#endif
+BTF_SET_END(btf_id_deny)
+
+/* fexit and fmod_ret can't be used to attach to __noreturn functions.
+ * Currently, we must manually list all __noreturn functions here. Once a more
+ * robust solution is implemented, this workaround can be removed.
+ */
+BTF_SET_START(noreturn_deny)
+#ifdef CONFIG_IA32_EMULATION
+BTF_ID(func, __ia32_sys_exit)
+BTF_ID(func, __ia32_sys_exit_group)
+#endif
+#ifdef CONFIG_KUNIT
+BTF_ID(func, __kunit_abort)
+BTF_ID(func, kunit_try_catch_throw)
+#endif
+#ifdef CONFIG_MODULES
+BTF_ID(func, __module_put_and_kthread_exit)
+#endif
+#ifdef CONFIG_X86_64
+BTF_ID(func, __x64_sys_exit)
+BTF_ID(func, __x64_sys_exit_group)
+#endif
+BTF_ID(func, do_exit)
+BTF_ID(func, do_group_exit)
+BTF_ID(func, kthread_complete_and_exit)
+BTF_ID(func, kthread_exit)
+BTF_ID(func, make_task_dead)
+BTF_SET_END(noreturn_deny)
+
int bpf_check_attach_target(struct bpf_verifier_log *log,
const struct bpf_prog *prog,
const struct bpf_prog *tgt_prog,
@@ -23398,6 +23444,25 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
break;
}
+
+ if (prog->type == BPF_PROG_TYPE_LSM) {
+ ret = bpf_lsm_verify_prog(log, prog);
+ if (ret < 0) {
+ module_put(mod);
+ return ret;
+ }
+ } else if (prog->type == BPF_PROG_TYPE_TRACING &&
+ btf_id_set_contains(&btf_id_deny, btf_id)) {
+ module_put(mod);
+ return -EINVAL;
+ } else if ((prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ prog->expected_attach_type == BPF_MODIFY_RETURN) &&
+ btf_id_set_contains(&noreturn_deny, btf_id)) {
+ module_put(mod);
+ bpf_log(log, "Attaching fexit/fmod_ret to __noreturn functions is rejected.\n");
+ return -EINVAL;
+ }
+
tgt_info->tgt_addr = addr;
tgt_info->tgt_name = tname;
tgt_info->tgt_type = t;
@@ -23405,52 +23470,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
return 0;
}
-BTF_SET_START(btf_id_deny)
-BTF_ID_UNUSED
-#ifdef CONFIG_SMP
-BTF_ID(func, migrate_disable)
-BTF_ID(func, migrate_enable)
-#endif
-#if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
-BTF_ID(func, rcu_read_unlock_strict)
-#endif
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
-BTF_ID(func, preempt_count_add)
-BTF_ID(func, preempt_count_sub)
-#endif
-#ifdef CONFIG_PREEMPT_RCU
-BTF_ID(func, __rcu_read_lock)
-BTF_ID(func, __rcu_read_unlock)
-#endif
-BTF_SET_END(btf_id_deny)
-
-/* fexit and fmod_ret can't be used to attach to __noreturn functions.
- * Currently, we must manually list all __noreturn functions here. Once a more
- * robust solution is implemented, this workaround can be removed.
- */
-BTF_SET_START(noreturn_deny)
-#ifdef CONFIG_IA32_EMULATION
-BTF_ID(func, __ia32_sys_exit)
-BTF_ID(func, __ia32_sys_exit_group)
-#endif
-#ifdef CONFIG_KUNIT
-BTF_ID(func, __kunit_abort)
-BTF_ID(func, kunit_try_catch_throw)
-#endif
-#ifdef CONFIG_MODULES
-BTF_ID(func, __module_put_and_kthread_exit)
-#endif
-#ifdef CONFIG_X86_64
-BTF_ID(func, __x64_sys_exit)
-BTF_ID(func, __x64_sys_exit_group)
-#endif
-BTF_ID(func, do_exit)
-BTF_ID(func, do_group_exit)
-BTF_ID(func, kthread_complete_and_exit)
-BTF_ID(func, kthread_exit)
-BTF_ID(func, make_task_dead)
-BTF_SET_END(noreturn_deny)
-
static bool can_be_sleepable(struct bpf_prog *prog)
{
if (prog->type == BPF_PROG_TYPE_TRACING) {
@@ -23533,20 +23552,6 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
return bpf_iter_prog_supported(prog);
}
- if (prog->type == BPF_PROG_TYPE_LSM) {
- ret = bpf_lsm_verify_prog(&env->log, prog);
- if (ret < 0)
- return ret;
- } else if (prog->type == BPF_PROG_TYPE_TRACING &&
- btf_id_set_contains(&btf_id_deny, btf_id)) {
- return -EINVAL;
- } else if ((prog->expected_attach_type == BPF_TRACE_FEXIT ||
- prog->expected_attach_type == BPF_MODIFY_RETURN) &&
- btf_id_set_contains(&noreturn_deny, btf_id)) {
- verbose(env, "Attaching fexit/fmod_ret to __noreturn functions is rejected.\n");
- return -EINVAL;
- }
-
key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);
tr = bpf_trampoline_get(key, &tgt_info);
if (!tr)
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 13/25] x86,bpf: factor out __arch_get_bpf_regs_nr
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (11 preceding siblings ...)
2025-05-28 3:46 ` [PATCH bpf-next 12/25] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 14/25] bpf: tracing: add multi-link support Menglong Dong
` (13 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Factor out the function __arch_bpf_get_regs_nr() to get the count of regs
used by the function args.
The arch_bpf_get_regs_nr() wrapper will return -EOPNOTSUPP if the regs
are not enough to hold the function args.
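For example (illustrative, "struct foo" and "bar" are made up), for a
prototype like:

	struct foo { u64 a; u64 b; };		/* 16 bytes */
	int bar(int x, struct foo f, void *p);

m->nr_args is 3 and the struct arg takes (16 + 7) / 8 = 2 registers, so
__arch_bpf_get_regs_nr() returns 3 + (2 - 1) = 4, which still fits in the
6 registers; a prototype that needs more than 6 registers makes
arch_bpf_get_regs_nr() fail instead.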
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
arch/x86/net/bpf_jit_comp.c | 36 +++++++++++++++++++++++++++++-------
include/linux/bpf.h | 1 +
kernel/bpf/verifier.c | 5 +++++
3 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9e5fe2ba858f..84bb668f3bee 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2945,6 +2945,33 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
return 0;
}
+static int __arch_bpf_get_regs_nr(const struct btf_func_model *m)
+{
+ int nr_regs = m->nr_args;
+
+ /* extra registers for struct arguments */
+ for (int i = 0; i < m->nr_args; i++) {
+ if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
+ nr_regs += (m->arg_size[i] + 7) / 8 - 1;
+ }
+
+ return nr_regs;
+}
+
+int arch_bpf_get_regs_nr(const struct btf_func_model *m)
+{
+ int nr_regs = __arch_bpf_get_regs_nr(m);
+
+ /* The maximum number of registers that can be used to pass
+ * arguments is 6. If the number of registers exceeds this,
+ * return -EOPNOTSUPP.
+ */
+ if (nr_regs > 6)
+ return -EOPNOTSUPP;
+
+ return nr_regs;
+}
+
/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
#define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
__LOAD_TCC_PTR(-round_up(stack, 8) - 8)
@@ -3015,7 +3042,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
struct bpf_tramp_links *tlinks,
void *func_addr)
{
- int i, ret, nr_regs = m->nr_args, stack_size = 0;
+ int i, ret, nr_regs, stack_size = 0;
int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
@@ -3033,15 +3060,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
WARN_ON_ONCE((flags & BPF_TRAMP_F_INDIRECT) &&
(flags & ~(BPF_TRAMP_F_INDIRECT | BPF_TRAMP_F_RET_FENTRY_RET)));
- /* extra registers for struct arguments */
- for (i = 0; i < m->nr_args; i++) {
- if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
- nr_regs += (m->arg_size[i] + 7) / 8 - 1;
- }
-
/* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6
* are passed through regs, the remains are through stack.
*/
+ nr_regs = __arch_bpf_get_regs_nr(m);
if (nr_regs > MAX_BPF_FUNC_ARGS)
return -ENOTSUPP;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c35da9d91125..080bb966d026 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1221,6 +1221,7 @@ void arch_free_bpf_trampoline(void *image, unsigned int size);
int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size);
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
struct bpf_tramp_links *tlinks, void *func_addr);
+int arch_bpf_get_regs_nr(const struct btf_func_model *m);
u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5d2e70425c1d..9c4e29bc98c0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22901,6 +22901,11 @@ static int do_check_main(struct bpf_verifier_env *env)
}
+int __weak arch_bpf_get_regs_nr(const struct btf_func_model *m)
+{
+ return -ENODEV;
+}
+
static void print_verification_stats(struct bpf_verifier_env *env)
{
int i;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 14/25] bpf: tracing: add multi-link support
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (12 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 13/25] x86,bpf: factor out __arch_get_bpf_regs_nr Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 11:34 ` kernel test robot
2025-05-28 3:47 ` [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct Menglong Dong
` (12 subsequent siblings)
26 siblings, 1 reply; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
In this commit, we add support for attaching a tracing BPF program to
multiple hooks, which is similar to BPF_TRACE_KPROBE_MULTI.
The use case is obvious. For now, we have to create a BPF program for each
kernel function that we want to trace, even though all the programs have the
same (or similar) logic. This can consume extra memory, and make program
loading slow if we have plenty of kernel functions to trace.
KPROBE_MULTI may be an alternative, but it can't do what TRACING does. For
example, a kretprobe can't obtain the function arguments, but FEXIT can.
For now, we support creating multi-links for fentry/fexit/modify_return with
the following new attach types:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI
We introduce struct bpf_tracing_multi_link for this purpose, which holds all
the kernel modules, target BPF programs (for attaching to BPF programs) and
target BTFs (for attaching to kernel functions) that we reference.
During loading, the first target is used for verification by the verifier.
During attaching, we check that all the targets are consistent with the
first one.
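For illustration, a minimal user-space sketch (not part of this patch) of
creating such a link with the raw bpf() syscall. It assumes a uapi header
that already carries the tracing_multi fields from this patch, and that the
caller has resolved btf_ids[] (and optional cookies[]) beforehand:

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/bpf.h>

	/* prog_fd: a loaded BPF_TRACE_FENTRY_MULTI program,
	 * btf_ids/cookies: arrays of 'cnt' resolved BTF ids and cookies.
	 */
	static int tracing_multi_link_create(int prog_fd, __u32 *btf_ids,
					     __u64 *cookies, __u32 cnt)
	{
		union bpf_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.link_create.prog_fd = prog_fd;
		attr.link_create.attach_type = BPF_TRACE_FENTRY_MULTI;
		attr.link_create.tracing_multi.cnt = cnt;
		attr.link_create.tracing_multi.btf_ids = (__u64)(unsigned long)btf_ids;
		attr.link_create.tracing_multi.cookies = (__u64)(unsigned long)cookies;
		/* tgt_fds left zeroed: all targets resolve against vmlinux BTF */

		return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
	}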
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf.h | 11 +
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 10 +
kernel/bpf/btf.c | 5 +
kernel/bpf/syscall.c | 365 +++++++++++++++++++++++++++++++++
kernel/bpf/trampoline.c | 7 +-
kernel/bpf/verifier.c | 25 ++-
net/bpf/test_run.c | 3 +
net/core/bpf_sk_storage.c | 2 +
tools/include/uapi/linux/bpf.h | 10 +
10 files changed, 435 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 080bb966d026..7191ad25d519 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1802,6 +1802,17 @@ struct bpf_raw_tp_link {
u64 cookie;
};
+struct bpf_tracing_multi_link {
+ struct bpf_gtramp_link link;
+ enum bpf_attach_type attach_type;
+ struct bpf_prog **tgt_progs;
+ struct btf **tgt_btfs;
+ struct module **mods;
+ u32 prog_cnt;
+ u32 btf_cnt;
+ u32 mods_cnt;
+};
+
struct bpf_link_primer {
struct bpf_link *link;
struct file *file;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa78f49d4a9a..139d5436ce4c 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -154,3 +154,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
+BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING_MULTI, tracing_multi)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16e95398c91c..45dfaf40230e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1120,6 +1120,9 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_FENTRY_MULTI,
+ BPF_TRACE_FEXIT_MULTI,
+ BPF_MODIFY_RETURN_MULTI,
__MAX_BPF_ATTACH_TYPE
};
@@ -1144,6 +1147,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
+ BPF_LINK_TYPE_TRACING_MULTI = 15,
__MAX_BPF_LINK_TYPE,
};
@@ -1765,6 +1769,12 @@ union bpf_attr {
*/
__u64 cookie;
} tracing;
+ struct {
+ __u32 cnt;
+ __aligned_u64 tgt_fds;
+ __aligned_u64 btf_ids;
+ __aligned_u64 cookies;
+ } tracing_multi;
struct {
__u32 pf;
__u32 hooknum;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 64538625ee91..03045f7d428f 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6100,6 +6100,9 @@ static int btf_validate_prog_ctx_type(struct bpf_verifier_log *log, const struct
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
+ case BPF_MODIFY_RETURN_MULTI:
/* allow u64* as ctx */
if (btf_is_int(t) && t->size == 8)
return 0;
@@ -6705,6 +6708,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
fallthrough;
case BPF_LSM_CGROUP:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_FEXIT_MULTI:
/* When LSM programs are attached to void LSM hooks
* they use FEXIT trampolines and when attached to
* int LSM hooks, they use MODIFY_RETURN trampolines.
@@ -6723,6 +6727,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
t = btf_type_by_id(btf, t->type);
break;
case BPF_MODIFY_RETURN:
+ case BPF_MODIFY_RETURN_MULTI:
/* For now the BPF_MODIFY_RETURN can only be attached to
* functions that return an int.
*/
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 60865a27d7d3..0cd989381128 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -37,6 +37,7 @@
#include <linux/trace_events.h>
#include <linux/tracepoint.h>
#include <linux/overflow.h>
+#include <linux/kfunc_md.h>
#include <net/netfilter/nf_bpf_link.h>
#include <net/netkit.h>
@@ -3466,6 +3467,34 @@ static const struct bpf_link_ops bpf_tracing_link_lops = {
.fill_link_info = bpf_tracing_link_fill_link_info,
};
+static int bpf_tracing_check_multi(struct bpf_prog *prog,
+ struct bpf_prog *tgt_prog,
+ struct btf *btf2,
+ const struct btf_type *t2)
+{
+ const struct btf_type *t1;
+ struct btf *btf1;
+
+ /* this case is already validated in bpf_check_attach_target() */
+ if (prog->type == BPF_PROG_TYPE_EXT)
+ return 0;
+
+ btf1 = prog->aux->dst_prog ? prog->aux->dst_prog->aux->btf :
+ prog->aux->attach_btf;
+ if (!btf1)
+ return -EOPNOTSUPP;
+
+ btf2 = btf2 ?: tgt_prog->aux->btf;
+ t1 = prog->aux->attach_func_proto;
+
+ /* the target is the same as the original one; this is a re-attach */
+ if (t1 == t2)
+ return 0;
+
+ return btf_check_func_part_match(btf1, t1, btf2, t2,
+ prog->aux->accessed_args);
+}
+
static int bpf_tracing_prog_attach(struct bpf_prog *prog,
int tgt_prog_fd,
u32 btf_id,
@@ -3665,6 +3694,335 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
return err;
}
+static void __bpf_tracing_multi_link_release(struct bpf_tracing_multi_link *link)
+{
+ int i;
+
+ if (link->mods_cnt) {
+ for (i = 0; i < link->mods_cnt; i++)
+ module_put(link->mods[i]);
+ kfree(link->mods);
+ }
+
+ if (link->prog_cnt) {
+ for (i = 0; i < link->prog_cnt; i++)
+ bpf_prog_put(link->tgt_progs[i]);
+ kfree(link->tgt_progs);
+ }
+
+ if (link->btf_cnt) {
+ for (i = 0; i < link->btf_cnt; i++)
+ btf_put(link->tgt_btfs[i]);
+ kfree(link->tgt_btfs);
+ }
+
+ kfree(link->link.entries);
+}
+
+static void bpf_tracing_multi_link_release(struct bpf_link *link)
+{
+ struct bpf_tracing_multi_link *multi_link =
+ container_of(link, struct bpf_tracing_multi_link, link.link);
+
+ bpf_gtrampoline_unlink_prog(&multi_link->link);
+ __bpf_tracing_multi_link_release(multi_link);
+}
+
+static void bpf_tracing_multi_link_dealloc(struct bpf_link *link)
+{
+ struct bpf_tracing_multi_link *tr_link =
+ container_of(link, struct bpf_tracing_multi_link, link.link);
+
+ kfree(tr_link);
+}
+
+static void bpf_tracing_multi_link_show_fdinfo(const struct bpf_link *link,
+ struct seq_file *seq)
+{
+ struct bpf_tracing_multi_link *tr_link =
+ container_of(link, struct bpf_tracing_multi_link, link.link);
+ int i;
+
+ for (i = 0; i < tr_link->link.entry_cnt; i++) {
+ seq_printf(seq,
+ "attach_type:\t%d\n"
+ "target_addr:\t%p\n",
+ tr_link->attach_type,
+ tr_link->link.entries[i].addr);
+ }
+}
+
+static const struct bpf_link_ops bpf_tracing_multi_link_lops = {
+ .release = bpf_tracing_multi_link_release,
+ .dealloc = bpf_tracing_multi_link_dealloc,
+ .show_fdinfo = bpf_tracing_multi_link_show_fdinfo,
+};
+
+#define MAX_TRACING_MULTI_CNT 102400
+
+static int bpf_tracing_get_target(u32 fd, struct bpf_prog **tgt_prog,
+ struct btf **tgt_btf)
+{
+ struct bpf_prog *prog = NULL;
+ struct btf *btf = NULL;
+ int err = 0;
+
+ if (fd) {
+ prog = bpf_prog_get(fd);
+ if (!IS_ERR(prog))
+ goto found;
+
+ prog = NULL;
+ /* "fd" is the fd of the kernel module BTF */
+ btf = btf_get_by_fd(fd);
+ if (IS_ERR(btf)) {
+ err = PTR_ERR(btf);
+ goto err;
+ }
+ if (!btf_is_kernel(btf)) {
+ btf_put(btf);
+ err = -EOPNOTSUPP;
+ goto err;
+ }
+ } else {
+ btf = bpf_get_btf_vmlinux();
+ if (IS_ERR(btf)) {
+ err = PTR_ERR(btf);
+ goto err;
+ }
+ if (!btf) {
+ err = -EINVAL;
+ goto err;
+ }
+ btf_get(btf);
+ }
+found:
+ *tgt_prog = prog;
+ *tgt_btf = btf;
+ return 0;
+err:
+ *tgt_prog = NULL;
+ *tgt_btf = NULL;
+ return err;
+}
+
+static int bpf_tracing_multi_link_check(const union bpf_attr *attr, u32 **btf_ids,
+ u32 **tgt_fds, u64 **cookies,
+ u32 cnt)
+{
+ void __user *ubtf_ids;
+ void __user *utgt_fds;
+ void __user *ucookies;
+ void *tmp;
+ int i;
+
+ if (!cnt)
+ return -EINVAL;
+
+ if (cnt > MAX_TRACING_MULTI_CNT)
+ return -E2BIG;
+
+ ucookies = u64_to_user_ptr(attr->link_create.tracing_multi.cookies);
+ if (ucookies) {
+ tmp = kvmalloc_array(cnt, sizeof(**cookies), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ *cookies = tmp;
+ if (copy_from_user(tmp, ucookies, cnt * sizeof(**cookies)))
+ return -EFAULT;
+ }
+
+ utgt_fds = u64_to_user_ptr(attr->link_create.tracing_multi.tgt_fds);
+ if (utgt_fds) {
+ tmp = kvmalloc_array(cnt, sizeof(**tgt_fds), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ *tgt_fds = tmp;
+ if (copy_from_user(tmp, utgt_fds, cnt * sizeof(**tgt_fds)))
+ return -EFAULT;
+ }
+
+ ubtf_ids = u64_to_user_ptr(attr->link_create.tracing_multi.btf_ids);
+ if (!ubtf_ids)
+ return -EINVAL;
+
+ tmp = kvmalloc_array(cnt, sizeof(**btf_ids), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ *btf_ids = tmp;
+ if (copy_from_user(tmp, ubtf_ids, cnt * sizeof(**btf_ids)))
+ return -EFAULT;
+
+ for (i = 0; i < cnt; i++) {
+ if (!(*btf_ids)[i])
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void bpf_tracing_multi_link_ptr_fill(struct bpf_tracing_multi_link *link,
+ struct ptr_array *progs,
+ struct ptr_array *mods,
+ struct ptr_array *btfs)
+{
+ link->mods = (struct module **) mods->ptrs;
+ link->mods_cnt = mods->cnt;
+ link->tgt_btfs = (struct btf **) btfs->ptrs;
+ link->btf_cnt = btfs->cnt;
+ link->tgt_progs = (struct bpf_prog **) progs->ptrs;
+ link->prog_cnt = progs->cnt;
+}
+
+static int bpf_tracing_prog_attach_multi(const union bpf_attr *attr,
+ struct bpf_prog *prog)
+{
+ struct bpf_tracing_multi_link *link = NULL;
+ u32 cnt, *btf_ids = NULL, *tgt_fds = NULL;
+ struct bpf_link_primer link_primer;
+ struct ptr_array prog_array = { };
+ struct ptr_array btf_array = { };
+ struct ptr_array mod_array = { };
+ u64 *cookies = NULL;
+ int err = 0, i;
+
+ if ((prog->expected_attach_type != BPF_TRACE_FENTRY_MULTI &&
+ prog->expected_attach_type != BPF_TRACE_FEXIT_MULTI &&
+ prog->expected_attach_type != BPF_MODIFY_RETURN_MULTI) ||
+ prog->type != BPF_PROG_TYPE_TRACING)
+ return -EINVAL;
+
+ cnt = attr->link_create.tracing_multi.cnt;
+ err = bpf_tracing_multi_link_check(attr, &btf_ids, &tgt_fds, &cookies,
+ cnt);
+ if (err)
+ goto err_out;
+
+ link = kzalloc(sizeof(*link), GFP_USER);
+ if (!link) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ link->link.entries = kzalloc(sizeof(*link->link.entries) * cnt,
+ GFP_USER);
+ if (!link->link.entries) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ bpf_link_init(&link->link.link, BPF_LINK_TYPE_TRACING_MULTI,
+ &bpf_tracing_multi_link_lops, prog);
+ link->attach_type = prog->expected_attach_type;
+
+ mutex_lock(&prog->aux->dst_mutex);
+
+ for (i = 0; i < cnt; i++) {
+ struct bpf_attach_target_info tgt_info = {};
+ struct bpf_gtramp_link_entry *entry;
+ struct bpf_prog *tgt_prog = NULL;
+ u32 tgt_fd, btf_id = btf_ids[i];
+ struct btf *tgt_btf = NULL;
+ struct module *mod = NULL;
+ int nr_regs;
+
+ entry = &link->link.entries[i];
+ tgt_fd = tgt_fds ? tgt_fds[i] : 0;
+ err = bpf_tracing_get_target(tgt_fd, &tgt_prog, &tgt_btf);
+ if (err)
+ goto err_out_unlock;
+
+ if (tgt_prog) {
+ err = bpf_try_add_ptr(&prog_array, tgt_prog);
+ if (err) {
+ bpf_prog_put(tgt_prog);
+ if (err != -EEXIST)
+ goto err_out_unlock;
+ }
+ }
+
+ if (tgt_btf) {
+ err = bpf_try_add_ptr(&btf_array, tgt_btf);
+ if (err) {
+ btf_put(tgt_btf);
+ if (err != -EEXIST)
+ goto err_out_unlock;
+ }
+ }
+
+ prog->aux->attach_tracing_prog = tgt_prog &&
+ tgt_prog->type == BPF_PROG_TYPE_TRACING &&
+ prog->type == BPF_PROG_TYPE_TRACING;
+
+ err = bpf_check_attach_target(NULL, prog, tgt_prog, tgt_btf,
+ btf_id, &tgt_info);
+ if (err)
+ goto err_out_unlock;
+
+ nr_regs = arch_bpf_get_regs_nr(&tgt_info.fmodel);
+ if (nr_regs < 0) {
+ err = nr_regs;
+ goto err_out_unlock;
+ }
+
+ mod = tgt_info.tgt_mod;
+ if (mod) {
+ err = bpf_try_add_ptr(&mod_array, mod);
+ if (err) {
+ module_put(mod);
+ if (err != -EEXIST)
+ goto err_out_unlock;
+ }
+ }
+
+ err = bpf_tracing_check_multi(prog, tgt_prog, tgt_btf,
+ tgt_info.tgt_type);
+ if (err)
+ goto err_out_unlock;
+
+ entry->cookie = cookies ? cookies[i] : 0;
+ entry->addr = (void *)tgt_info.tgt_addr;
+ entry->tgt_prog = tgt_prog;
+ entry->attach_btf = tgt_btf;
+ entry->btf_id = btf_id;
+ entry->nr_args = nr_regs;
+
+ link->link.entry_cnt++;
+ }
+
+ err = bpf_gtrampoline_link_prog(&link->link);
+ if (err)
+ goto err_out_unlock;
+
+ err = bpf_link_prime(&link->link.link, &link_primer);
+ if (err) {
+ bpf_gtrampoline_unlink_prog(&link->link);
+ goto err_out_unlock;
+ }
+
+ bpf_tracing_multi_link_ptr_fill(link, &prog_array, &mod_array,
+ &btf_array);
+ mutex_unlock(&prog->aux->dst_mutex);
+
+ kfree(btf_ids);
+ kfree(tgt_fds);
+ kfree(cookies);
+ return bpf_link_settle(&link_primer);
+err_out_unlock:
+ bpf_tracing_multi_link_ptr_fill(link, &prog_array, &mod_array,
+ &btf_array);
+ __bpf_tracing_multi_link_release(link);
+ mutex_unlock(&prog->aux->dst_mutex);
+err_out:
+ kfree(btf_ids);
+ kfree(tgt_fds);
+ kfree(cookies);
+ kfree(link);
+ return err;
+}
+
static void bpf_raw_tp_link_release(struct bpf_link *link)
{
struct bpf_raw_tp_link *raw_tp =
@@ -4133,6 +4491,9 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
+ case BPF_MODIFY_RETURN_MULTI:
return BPF_PROG_TYPE_TRACING;
case BPF_LSM_MAC:
return BPF_PROG_TYPE_LSM;
@@ -5439,6 +5800,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = bpf_iter_link_attach(attr, uattr, prog);
else if (prog->expected_attach_type == BPF_LSM_CGROUP)
ret = cgroup_bpf_link_attach(attr, prog);
+ else if (prog->expected_attach_type == BPF_TRACE_FENTRY_MULTI ||
+ prog->expected_attach_type == BPF_TRACE_FEXIT_MULTI ||
+ prog->expected_attach_type == BPF_MODIFY_RETURN_MULTI)
+ ret = bpf_tracing_prog_attach_multi(attr, prog);
else
ret = bpf_tracing_prog_attach(prog,
attr->link_create.target_fd,
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 3d7fd59107ed..b92d1d4f1033 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -116,7 +116,9 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
return (ptype == BPF_PROG_TYPE_TRACING &&
(eatype == BPF_TRACE_FENTRY || eatype == BPF_TRACE_FEXIT ||
- eatype == BPF_MODIFY_RETURN)) ||
+ eatype == BPF_MODIFY_RETURN ||
+ eatype == BPF_TRACE_FENTRY_MULTI || eatype == BPF_TRACE_FEXIT_MULTI ||
+ eatype == BPF_MODIFY_RETURN_MULTI)) ||
(ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC);
}
@@ -515,10 +517,13 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
{
switch (prog->expected_attach_type) {
case BPF_TRACE_FENTRY:
+ case BPF_TRACE_FENTRY_MULTI:
return BPF_TRAMP_FENTRY;
case BPF_MODIFY_RETURN:
+ case BPF_MODIFY_RETURN_MULTI:
return BPF_TRAMP_MODIFY_RETURN;
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_FEXIT_MULTI:
return BPF_TRAMP_FEXIT;
case BPF_LSM_MAC:
if (!prog->aux->attach_func_proto->type)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9c4e29bc98c0..bbfe8ae39f3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16945,10 +16945,13 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
switch (env->prog->expected_attach_type) {
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
range = retval_range(0, 0);
break;
case BPF_TRACE_RAW_TP:
case BPF_MODIFY_RETURN:
+ case BPF_MODIFY_RETURN_MULTI:
return 0;
case BPF_TRACE_ITER:
break;
@@ -22264,7 +22267,9 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
if (prog_type == BPF_PROG_TYPE_TRACING &&
insn->imm == BPF_FUNC_get_func_ret) {
if (eatype == BPF_TRACE_FEXIT ||
- eatype == BPF_MODIFY_RETURN) {
+ eatype == BPF_MODIFY_RETURN ||
+ eatype == BPF_TRACE_FEXIT_MULTI ||
+ eatype == BPF_MODIFY_RETURN_MULTI) {
/* Load nr_args from ctx - 8 */
insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
insn_buf[1] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
@@ -23246,7 +23251,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
if (tgt_prog->type == BPF_PROG_TYPE_TRACING &&
prog_extension &&
(tgt_prog->expected_attach_type == BPF_TRACE_FENTRY ||
- tgt_prog->expected_attach_type == BPF_TRACE_FEXIT)) {
+ tgt_prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ tgt_prog->expected_attach_type == BPF_TRACE_FENTRY_MULTI ||
+ tgt_prog->expected_attach_type == BPF_TRACE_FEXIT_MULTI)) {
/* Program extensions can extend all program types
* except fentry/fexit. The reason is the following.
* The fentry/fexit programs are used for performance
@@ -23345,6 +23352,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
case BPF_LSM_CGROUP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_MODIFY_RETURN_MULTI:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
if (!btf_type_is_func(t)) {
bpf_log(log, "attach_btf_id %u is not a function\n",
btf_id);
@@ -23430,7 +23440,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
bpf_log(log, "%s is not sleepable\n", tname);
return ret;
}
- } else if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
+ } else if (prog->expected_attach_type == BPF_MODIFY_RETURN ||
+ prog->expected_attach_type == BPF_MODIFY_RETURN_MULTI) {
if (tgt_prog) {
module_put(mod);
bpf_log(log, "can't modify return codes of BPF programs\n");
@@ -23483,6 +23494,9 @@ static bool can_be_sleepable(struct bpf_prog *prog)
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
case BPF_TRACE_ITER:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
+ case BPF_MODIFY_RETURN_MULTI:
return true;
default:
return false;
@@ -23557,6 +23571,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
return bpf_iter_prog_supported(prog);
}
+ if (prog->expected_attach_type == BPF_TRACE_FENTRY_MULTI ||
+ prog->expected_attach_type == BPF_TRACE_FEXIT_MULTI ||
+ prog->expected_attach_type == BPF_MODIFY_RETURN_MULTI)
+ return 0;
+
key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);
tr = bpf_trampoline_get(key, &tgt_info);
if (!tr)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index aaf13a7d58ed..bcb609425fda 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -696,6 +696,8 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
switch (prog->expected_attach_type) {
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
if (bpf_fentry_test1(1) != 2 ||
bpf_fentry_test2(2, 3) != 5 ||
bpf_fentry_test3(4, 5, 6) != 15 ||
@@ -709,6 +711,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
goto out;
break;
case BPF_MODIFY_RETURN:
+ case BPF_MODIFY_RETURN_MULTI:
ret = bpf_modify_return_test(1, &b);
if (b != 2)
side_effect++;
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 2e538399757f..c5b1fd714b58 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -369,6 +369,8 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
return true;
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
return !!strncmp(prog->aux->attach_func_name, "bpf_sk_storage",
strlen("bpf_sk_storage"));
default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16e95398c91c..45dfaf40230e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1120,6 +1120,9 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_FENTRY_MULTI,
+ BPF_TRACE_FEXIT_MULTI,
+ BPF_MODIFY_RETURN_MULTI,
__MAX_BPF_ATTACH_TYPE
};
@@ -1144,6 +1147,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
+ BPF_LINK_TYPE_TRACING_MULTI = 15,
__MAX_BPF_LINK_TYPE,
};
@@ -1765,6 +1769,12 @@ union bpf_attr {
*/
__u64 cookie;
} tracing;
+ struct {
+ __u32 cnt;
+ __aligned_u64 tgt_fds;
+ __aligned_u64 btf_ids;
+ __aligned_u64 cookies;
+ } tracing_multi;
struct {
__u32 pf;
__u32 hooknum;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (13 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 14/25] bpf: tracing: add multi-link support Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 12:37 ` kernel test robot
2025-05-28 3:47 ` [PATCH bpf-next 16/25] ftrace: supporting replace direct ftrace_ops Menglong Dong
` (11 subsequent siblings)
26 siblings, 1 reply; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Factor out __unregister_ftrace_direct, which doesn't hold the direct_mutex
lock.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
kernel/trace/ftrace.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 0befb4c93e89..5b6b74ea4c20 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -113,6 +113,8 @@ bool ftrace_pids_enabled(struct ftrace_ops *ops)
}
static void ftrace_update_trampoline(struct ftrace_ops *ops);
+static int __unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
+ bool free_filters);
/*
* ftrace_disabled is set when an anomaly is discovered.
@@ -6046,8 +6048,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
* 0 on success
* -EINVAL - The @ops object was not properly registered.
*/
-int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
- bool free_filters)
+static int __unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
+ bool free_filters)
{
struct ftrace_hash *hash = ops->func_hash->filter_hash;
int err;
@@ -6057,10 +6059,8 @@ int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
return -EINVAL;
- mutex_lock(&direct_mutex);
err = unregister_ftrace_function(ops);
remove_direct_functions_hash(hash, addr);
- mutex_unlock(&direct_mutex);
/* cleanup for possible another register call */
ops->func = NULL;
@@ -6070,6 +6070,18 @@ int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
ftrace_free_filter(ops);
return err;
}
+
+int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
+ bool free_filters)
+{
+ int err;
+
+ mutex_lock(&direct_mutex);
+ err = __unregister_ftrace_direct(ops, addr, free_filters);
+ mutex_unlock(&direct_mutex);
+
+ return err;
+}
EXPORT_SYMBOL_GPL(unregister_ftrace_direct);
static int
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 16/25] ftrace: supporting replace direct ftrace_ops
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (14 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 17/25] bpf: make trampoline compatible with global trampoline Menglong Dong
` (10 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Introduce the function replace_ftrace_direct(). This is used to replace
the direct ftrace_ops for a function, and will be used in the next patch.
Let's call the original ftrace_ops A and the new ftrace_ops B. First, we
register B directly, and the callbacks of the functions in both A and B fall
back to the ftrace_ops_list case.
Then we modify the address of each entry in direct_functions to
B->direct_call and remove it from A. This updates the dyn_rec and makes the
functions call B->direct_call directly. If no function is left in
A->filter_hash, just unregister A.
So a record can have more than one direct ftrace_ops, and we need to check
whether any direct ops remains for the record before removing
FTRACE_FL_DIRECT in __ftrace_hash_rec_update().
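For illustration, a rough sketch (not part of this patch) of the intended
calling pattern, with hypothetical ops A (currently live) and B (new, filter
hash already populated, not yet enabled):

	/* Move the functions in B's filter hash from A's direct trampoline
	 * to new_tramp_addr in one step.  B must be initialized
	 * (FTRACE_OPS_FL_INITIALIZED) but not yet enabled, and must have
	 * neither ->func nor ->trampoline set.
	 */
	static int switch_direct_trampoline(struct ftrace_ops *A,
					    struct ftrace_ops *B,
					    unsigned long new_tramp_addr)
	{
		/* registers B as a direct ops, rewrites the matching
		 * direct_functions entries to new_tramp_addr and removes
		 * them from A; A is unregistered once its filter hash
		 * would become empty.
		 */
		return replace_ftrace_direct(B, A, new_tramp_addr);
	}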
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/ftrace.h | 8 ++++
kernel/trace/ftrace.c | 87 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 40727d3f125d..1d162e331e99 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -528,6 +528,9 @@ void ftrace_stub_direct_tramp(void);
int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long *ips,
unsigned int cnt);
+int replace_ftrace_direct(struct ftrace_ops *ops, struct ftrace_ops *src_ops,
+ unsigned long addr);
+
#else
struct ftrace_ops;
static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
@@ -556,6 +559,11 @@ static inline int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long
{
return -ENODEV;
}
+static inline int replace_ftrace_direct(struct ftrace_ops *ops, struct ftrace_ops *src_ops,
+ unsigned long addr)
+{
+ return -ENODEV;
+}
/*
* This must be implemented by the architecture.
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5b6b74ea4c20..7f2313e4c3d9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1727,6 +1727,24 @@ static bool skip_record(struct dyn_ftrace *rec)
!(rec->flags & FTRACE_FL_ENABLED);
}
+static struct ftrace_ops *
+ftrace_find_direct_ops_any_other(struct dyn_ftrace *rec, struct ftrace_ops *op_exclude)
+{
+ struct ftrace_ops *op;
+ unsigned long ip = rec->ip;
+
+ do_for_each_ftrace_op(op, ftrace_ops_list) {
+
+ if (op == op_exclude || !(op->flags & FTRACE_OPS_FL_DIRECT))
+ continue;
+
+ if (hash_contains_ip(ip, op->func_hash))
+ return op;
+ } while_for_each_ftrace_op(op);
+
+ return NULL;
+}
+
/*
* This is the main engine to the ftrace updates to the dyn_ftrace records.
*
@@ -1831,8 +1849,10 @@ static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
* function, then that function should no longer
* be direct.
*/
- if (ops->flags & FTRACE_OPS_FL_DIRECT)
- rec->flags &= ~FTRACE_FL_DIRECT;
+ if (ops->flags & FTRACE_OPS_FL_DIRECT) {
+ if (!ftrace_find_direct_ops_any_other(rec, ops))
+ rec->flags &= ~FTRACE_FL_DIRECT;
+ }
/*
* If the rec had REGS enabled and the ops that is
@@ -6033,6 +6053,69 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
}
EXPORT_SYMBOL_GPL(register_ftrace_direct);
+int replace_ftrace_direct(struct ftrace_ops *ops, struct ftrace_ops *src_ops,
+ unsigned long addr)
+{
+ struct ftrace_hash *hash;
+ struct ftrace_func_entry *entry, *iter;
+ int err = -EBUSY, size, count;
+
+ if (ops->func || ops->trampoline)
+ return -EINVAL;
+ if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED))
+ return -EINVAL;
+ if (ops->flags & FTRACE_OPS_FL_ENABLED)
+ return -EINVAL;
+
+ hash = ops->func_hash->filter_hash;
+ if (ftrace_hash_empty(hash))
+ return -EINVAL;
+
+ mutex_lock(&direct_mutex);
+
+ ops->func = call_direct_funcs;
+ ops->flags = MULTI_FLAGS;
+ ops->trampoline = FTRACE_REGS_ADDR;
+ ops->direct_call = addr;
+
+ err = register_ftrace_function_nolock(ops);
+ if (err)
+ goto out_unlock;
+
+ hash = ops->func_hash->filter_hash;
+ size = 1 << hash->size_bits;
+ for (int i = 0; i < size; i++) {
+ hlist_for_each_entry(iter, &hash->buckets[i], hlist) {
+ entry = __ftrace_lookup_ip(direct_functions, iter->ip);
+ if (!entry) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+ WRITE_ONCE(entry->direct, addr);
+ /* remove the ip from the hash, and this will make the trampoline
+ * be called directly.
+ */
+ count = src_ops->func_hash->filter_hash->count;
+ if (count <= 1) {
+ if (WARN_ON_ONCE(!count))
+ continue;
+ err = __unregister_ftrace_direct(src_ops, src_ops->direct_call,
+ true);
+ } else {
+ err = ftrace_set_filter_ip(src_ops, iter->ip, 1, 0);
+ }
+ if (err)
+ goto out_unlock;
+ }
+ }
+
+out_unlock:
+ mutex_unlock(&direct_mutex);
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(replace_ftrace_direct);
+
/**
* unregister_ftrace_direct - Remove calls to custom trampoline
* previously registered by register_ftrace_direct for @ops object.
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 17/25] bpf: make trampoline compatible with global trampoline
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (15 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 16/25] ftrace: supporting replace direct ftrace_ops Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 18/25] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
` (9 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
For now, the bpf global trampoline can't work together with the regular
trampoline. For example, attaching FENTRY_MULTI to a function where FENTRY
already exists will fail, and attaching FENTRY will also fail if
FENTRY_MULTI already exists.
We make the global trampoline work together with the trampoline in this
commit.
It is not easy. The most difficult part is synchronization between
bpf_gtrampoline_link_prog and bpf_trampoline_link_prog, and we use a
rw_semaphore here, which is quite ugly. We hold the write lock in
bpf_gtrampoline_link_prog and read lock in bpf_trampoline_link_prog.
We introduce the function bpf_gtrampoline_link_tramp() to make
bpf_gtramp_link fit bpf_trampoline; it is called in
bpf_gtrampoline_link_prog(). If the bpf_trampoline of the function exists
in the kfunc_md, or we find it with bpf_trampoline_lookup_exist(), it means
that we need to do the fitting. The fitting is simple: we create a
bpf_shim_tramp_link for our prog and link it to the bpf_trampoline with
__bpf_trampoline_link_prog().
The bpf_trampoline_link_prog() case is a little more complex. We create a
bpf_shim_tramp_link for every bpf prog in the kfunc_md and add them to the
bpf_trampoline in bpf_gtrampoline_replace() before we call
__bpf_trampoline_link_prog(). And we roll back in
bpf_gtrampoline_replace_finish() if an error is returned by
__bpf_trampoline_link_prog().
In __bpf_gtrampoline_unlink_prog(), we call bpf_gtrampoline_remove() to
release the bpf_shim_tramp_link, and the bpf prog is unlinked in
bpf_link_free() if it was ever linked successfully.
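For illustration, a rough sketch (not part of this patch) of how the two
attach paths interleave after this change; it only summarizes the code and
the description above:

	bpf_trampoline_link_prog():             /* regular trampoline side */
		down_read(&global_tr_lock);
		bpf_gtrampoline_replace(tr);    /* shim links for the progs in kfunc_md */
		mutex_lock(&tr->mutex);
		__bpf_trampoline_link_prog(link, tr, tgt_prog);
		mutex_unlock(&tr->mutex);
		bpf_gtrampoline_replace_finish(tr, err); /* set md->tramp, or roll back */
		up_read(&global_tr_lock);

	bpf_gtrampoline_link_prog():            /* global trampoline side, under the
						 * write side of global_tr_lock */
		kfunc_md_bpf_link(md, prog, kind, cookie);
		bpf_gtrampoline_link_tramp(entry, prog); /* if an existing trampoline is
							  * found: shim link +
							  * __bpf_trampoline_link_prog() */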
Another solution is to fit into the existing trampoline. For example, when
we attach a tracing bpf prog, we can add it to the kfunc_md if a
tracing_multi bpf prog is already attached to the target function. And we
can also add the tracing_multi prog to the trampoline if a tracing prog
exists on the target function. I think this would make the compatibility
much easier.
The code in this part is very ugly and messy, and I think it would be a
relief to split it out into another series :/
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/bpf.h | 6 +
include/linux/kfunc_md.h | 2 +
kernel/bpf/syscall.c | 2 +-
kernel/bpf/trampoline.c | 291 +++++++++++++++++++++++++++++++++++++--
kernel/trace/kfunc_md.c | 9 +-
5 files changed, 293 insertions(+), 17 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7191ad25d519..0f4605be87fc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1173,6 +1173,11 @@ struct btf_func_model {
*/
#define BPF_TRAMP_F_INDIRECT BIT(8)
+/* Indicate that bpf global trampoline is also used on this function and
+ * the trampoline is replacing it.
+ */
+#define BPF_TRAMP_F_REPLACE BIT(9)
+
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
* bytes on x86.
*/
@@ -2554,6 +2559,7 @@ void bpf_link_put(struct bpf_link *link);
int bpf_link_new_fd(struct bpf_link *link);
struct bpf_link *bpf_link_get_from_fd(u32 ufd);
struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
+void bpf_link_free(struct bpf_link *link);
void bpf_token_inc(struct bpf_token *token);
void bpf_token_put(struct bpf_token *token);
diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
index f1b1012eeab2..956e16f96d82 100644
--- a/include/linux/kfunc_md.h
+++ b/include/linux/kfunc_md.h
@@ -29,6 +29,8 @@ struct kfunc_md {
#endif
unsigned long func;
struct kfunc_md_tramp_prog *bpf_progs[BPF_TRAMP_MAX];
+ /* fallback case, there is already a trampoline on this function */
+ struct bpf_trampoline *tramp;
#ifdef CONFIG_FUNCTION_METADATA
/* the array is used for the fast mode */
struct kfunc_md_array *array;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0cd989381128..c1c92c2b2cfc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3184,7 +3184,7 @@ static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
}
/* bpf_link_free is guaranteed to be called from process context */
-static void bpf_link_free(struct bpf_link *link)
+void bpf_link_free(struct bpf_link *link)
{
const struct bpf_link_ops *ops = link->ops;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index b92d1d4f1033..81b62aae9faf 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -14,6 +14,7 @@
#include <linux/bpf_lsm.h>
#include <linux/delay.h>
#include <linux/kfunc_md.h>
+#include <linux/execmem.h>
/* dummy _ops. The verifier will operate on target program's ops. */
const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -142,20 +143,44 @@ void bpf_image_ksym_del(struct bpf_ksym *ksym)
PAGE_SIZE, true, ksym->name);
}
-static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
+static struct bpf_trampoline *__bpf_trampoline_lookup_exist(u64 key)
{
struct bpf_trampoline *tr;
struct hlist_head *head;
- int i;
- mutex_lock(&trampoline_mutex);
head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
hlist_for_each_entry(tr, head, hlist) {
if (tr->key == key) {
refcount_inc(&tr->refcnt);
- goto out;
+ return tr;
}
}
+
+ return NULL;
+}
+
+static struct bpf_trampoline *bpf_trampoline_lookup_exist(u64 key)
+{
+ struct bpf_trampoline *tr;
+
+ mutex_lock(&trampoline_mutex);
+ tr = __bpf_trampoline_lookup_exist(key);
+ mutex_unlock(&trampoline_mutex);
+
+ return tr;
+}
+
+static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
+{
+ struct bpf_trampoline *tr;
+ struct hlist_head *head;
+ int i;
+
+ mutex_lock(&trampoline_mutex);
+ tr = __bpf_trampoline_lookup_exist(key);
+ if (tr)
+ goto out;
+
tr = kzalloc(sizeof(*tr), GFP_KERNEL);
if (!tr)
goto out;
@@ -172,6 +197,7 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
tr->key = key;
INIT_HLIST_NODE(&tr->hlist);
+ head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
hlist_add_head(&tr->hlist, head);
refcount_set(&tr->refcnt, 1);
mutex_init(&tr->mutex);
@@ -228,7 +254,11 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
if (tr->func.ftrace_managed) {
ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
- ret = register_ftrace_direct(tr->fops, (long)new_addr);
+ if (tr->flags & BPF_TRAMP_F_REPLACE)
+ ret = replace_ftrace_direct(tr->fops, global_tr.fops,
+ (long)new_addr);
+ else
+ ret = register_ftrace_direct(tr->fops, (long)new_addr);
} else {
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
}
@@ -236,6 +266,17 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
return ret;
}
+static int
+bpf_trampoline_get_count(const struct bpf_trampoline *tr)
+{
+ int count = 0;
+
+ for (int kind = 0; kind < BPF_TRAMP_MAX; kind++)
+ count += tr->progs_cnt[kind];
+
+ return count;
+}
+
static struct bpf_tramp_links *
bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_arg)
{
@@ -608,15 +649,173 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
return err;
}
+static int bpf_gtrampoline_get_link(struct bpf_trampoline *tr, struct bpf_prog *prog,
+ u64 cookie, int kind,
+ struct bpf_shim_tramp_link **link)
+{
+ struct bpf_shim_tramp_link *__link;
+
+ __link = kzalloc(sizeof(*__link), GFP_KERNEL);
+ if (!__link)
+ return -ENOMEM;
+
+ __link->link.cookie = cookie;
+
+ bpf_link_init(&__link->link.link, BPF_LINK_TYPE_UNSPEC,
+ &bpf_shim_tramp_link_lops, prog);
+
+ /* the bpf_shim_tramp_link will hold a reference on the prog and tr */
+ refcount_inc(&tr->refcnt);
+ bpf_prog_inc(prog);
+ *link = __link;
+
+ return 0;
+}
+
+static struct bpf_tramp_link *
+bpf_gtrampoline_find_link(struct bpf_trampoline *tr, struct bpf_prog *prog)
+{
+ struct bpf_tramp_link *link;
+
+ for (int kind = 0; kind < BPF_TRAMP_MAX; kind++) {
+ hlist_for_each_entry(link, &tr->progs_hlist[kind], tramp_hlist) {
+ if (link->link.prog == prog)
+ return link;
+ }
+ }
+
+ return NULL;
+}
+
+static int bpf_gtrampoline_remove(struct bpf_trampoline *tr, struct bpf_prog *prog,
+ bool remove_list)
+{
+ struct bpf_shim_tramp_link *slink;
+ int kind;
+
+ slink = (struct bpf_shim_tramp_link *)bpf_gtrampoline_find_link(tr, prog);
+ if (WARN_ON_ONCE(!slink))
+ return -EINVAL;
+
+ if (!slink->trampoline && remove_list) {
+ kind = bpf_attach_type_to_tramp(prog);
+ hlist_del_init(&slink->link.tramp_hlist);
+ tr->progs_cnt[kind]--;
+ }
+ bpf_link_free(&slink->link.link);
+
+ return 0;
+}
+
+static int bpf_gtrampoline_replace(struct bpf_trampoline *tr)
+{
+ struct kfunc_md_tramp_prog *progs;
+ struct bpf_shim_tramp_link *link;
+ struct kfunc_md *md;
+ int err = 0, count;
+
+ kfunc_md_lock();
+ md = kfunc_md_get((unsigned long)tr->func.addr);
+ if (!md || md->tramp) {
+ kfunc_md_put_entry(md);
+ kfunc_md_unlock();
+ return 0;
+ }
+ kfunc_md_unlock();
+
+ rcu_read_lock();
+ md = kfunc_md_get_noref((unsigned long)tr->func.addr);
+ if (!md || md->tramp)
+ goto on_fail;
+
+ count = bpf_trampoline_get_count(tr);
+ /* we are attaching a new link, so +1 here */
+ count += md->bpf_prog_cnt + 1;
+ if (count > BPF_MAX_TRAMP_LINKS) {
+ err = -E2BIG;
+ goto on_fail;
+ }
+
+ for (int kind = 0; kind < BPF_TRAMP_MAX; kind++) {
+ progs = md->bpf_progs[kind];
+ while (progs) {
+ err = bpf_gtrampoline_get_link(tr, progs->prog, progs->cookie,
+ kind, &link);
+ if (err)
+ goto on_fail;
+
+ hlist_add_head(&link->link.tramp_hlist, &tr->progs_hlist[kind]);
+ tr->progs_cnt[kind]++;
+ progs = progs->next;
+ link->trampoline = tr;
+ }
+ }
+
+ tr->flags |= BPF_TRAMP_F_REPLACE;
+ rcu_read_unlock();
+
+ return 0;
+
+on_fail:
+ kfunc_md_put_entry(md);
+ rcu_read_unlock();
+
+ return err;
+}
+
+static void bpf_gtrampoline_replace_finish(struct bpf_trampoline *tr, int err)
+{
+ struct kfunc_md_tramp_prog *progs;
+ struct kfunc_md *md;
+
+ if (!(tr->flags & BPF_TRAMP_F_REPLACE))
+ return;
+
+ kfunc_md_lock();
+ md = kfunc_md_get_noref((unsigned long)tr->func.addr);
+ /* this shouldn't happen, as the md->tramp can only be set with
+ * global_tr_lock.
+ */
+ if (WARN_ON_ONCE(!md || md->tramp))
+ return;
+
+ if (err) {
+ for (int kind = 0; kind < BPF_TRAMP_MAX; kind++) {
+ progs = md->bpf_progs[kind];
+ while (progs) {
+ /* the progs are already added to the trampoline
+ * and we need to clean them up in this case.
+ */
+ bpf_gtrampoline_remove(tr, progs->prog, true);
+ progs = progs->next;
+ }
+ }
+ } else {
+ md->tramp = tr;
+ }
+
+ kfunc_md_put_entry(md);
+ kfunc_md_unlock();
+}
+
int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
int err;
- mutex_lock(&tr->mutex);
- err = __bpf_trampoline_link_prog(link, tr, tgt_prog);
- mutex_unlock(&tr->mutex);
+ down_read(&global_tr_lock);
+
+ err = bpf_gtrampoline_replace(tr);
+ if (!err) {
+ mutex_lock(&tr->mutex);
+ err = __bpf_trampoline_link_prog(link, tr, tgt_prog);
+ mutex_unlock(&tr->mutex);
+ }
+
+ bpf_gtrampoline_replace_finish(tr, err);
+ up_read(&global_tr_lock);
+
return err;
}
@@ -745,7 +944,7 @@ int bpf_gtrampoline_unlink_prog(struct bpf_gtramp_link *link)
kfunc_md_lock();
for (int i = 0; i < link->entry_cnt; i++) {
md = kfunc_md_get_noref((long)link->entries[i].addr);
- if (WARN_ON_ONCE(!md))
+ if (WARN_ON_ONCE(!md) || md->tramp)
continue;
md->flags |= KFUNC_MD_FL_BPF_REMOVING;
@@ -761,13 +960,65 @@ int bpf_gtrampoline_unlink_prog(struct bpf_gtramp_link *link)
return err;
}
+static int bpf_gtrampoline_link_tramp(struct bpf_gtramp_link_entry *entry,
+ struct bpf_prog *prog)
+{
+ struct bpf_trampoline *tr, *new_tr = NULL;
+ struct bpf_shim_tramp_link *slink = NULL;
+ struct kfunc_md *md;
+ int err, kind;
+ u64 key;
+
+ kfunc_md_lock();
+ md = kfunc_md_get_noref((long)entry->addr);
+ kind = bpf_attach_type_to_tramp(prog);
+ if (!md->tramp) {
+ key = bpf_trampoline_compute_key(NULL, entry->attach_btf,
+ entry->btf_id);
+ new_tr = bpf_trampoline_lookup_exist(key);
+ md->tramp = new_tr;
+ }
+
+ /* check if we need to be replaced by trampoline */
+ tr = md->tramp;
+ kfunc_md_unlock();
+ if (!tr)
+ return 0;
+
+ mutex_lock(&tr->mutex);
+ err = bpf_gtrampoline_get_link(tr, prog, entry->cookie, kind, &slink);
+ if (err)
+ goto err_out;
+
+ err = __bpf_trampoline_link_prog(&slink->link, tr, NULL);
+ if (err)
+ goto err_out;
+ mutex_unlock(&tr->mutex);
+
+ bpf_trampoline_put(new_tr);
+ /* this can only be set on the link success */
+ slink->trampoline = tr;
+ tr->flags |= BPF_TRAMP_F_REPLACE;
+
+ return 0;
+err_out:
+ mutex_unlock(&tr->mutex);
+
+ bpf_trampoline_put(new_tr);
+ if (slink) {
+ bpf_trampoline_put(tr);
+ bpf_link_free(&slink->link.link);
+ }
+ return err;
+}
+
int bpf_gtrampoline_link_prog(struct bpf_gtramp_link *link)
{
struct bpf_gtramp_link_entry *entry;
enum bpf_tramp_prog_type kind;
struct bpf_prog *prog;
struct kfunc_md *md;
- bool update = false;
+ bool update = false, linked;
int err = 0, i;
prog = link->link.prog;
@@ -785,6 +1036,7 @@ int bpf_gtrampoline_link_prog(struct bpf_gtramp_link *link)
* lock instead.
*/
kfunc_md_lock();
+ linked = false;
md = kfunc_md_create((long)entry->addr, entry->nr_args);
if (md) {
/* the function is not in the filter hash of gtr,
@@ -793,16 +1045,27 @@ int bpf_gtrampoline_link_prog(struct bpf_gtramp_link *link)
if (!md->bpf_prog_cnt)
update = true;
err = kfunc_md_bpf_link(md, prog, kind, entry->cookie);
+ if (!err)
+ linked = true;
} else {
err = -ENOMEM;
}
+ kfunc_md_unlock();
- if (err) {
- kfunc_md_put_entry(md);
- kfunc_md_unlock();
- goto on_fallback;
+ if (!err) {
+ err = bpf_gtrampoline_link_tramp(entry, prog);
+ if (!err)
+ continue;
}
+
+ /* on the error case, roll back the md and the previous ones */
+ kfunc_md_lock();
+ md = kfunc_md_get_noref((long)entry->addr);
+ if (linked)
+ kfunc_md_bpf_unlink(md, prog, kind);
+ kfunc_md_put_entry(md);
kfunc_md_unlock();
+ goto on_fallback;
}
if (update) {
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
index ebb4e46d482d..5d61a8be3768 100644
--- a/kernel/trace/kfunc_md.c
+++ b/kernel/trace/kfunc_md.c
@@ -141,7 +141,8 @@ static int kfunc_md_hash_bpf_ips(void **ips)
for (i = 0; i < (1 << KFUNC_MD_HASH_BITS); i++) {
head = &kfunc_md_table[i];
hlist_for_each_entry(md, head, hash) {
- if (md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING))
+ if (md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING) &&
+ !md->tramp)
ips[c++] = (void *)md->func;
}
}
@@ -472,7 +473,8 @@ static int kfunc_md_fast_bpf_ips(void **ips)
for (i = 0; i < kfunc_mds->kfunc_md_count; i++) {
md = &kfunc_mds->mds[i];
- if (md->users && md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING))
+ if (md->users && md->bpf_prog_cnt > !!(md->flags & KFUNC_MD_FL_BPF_REMOVING) &&
+ !md->tramp)
ips[c++] = (void *)md->func;
}
return c;
@@ -662,6 +664,9 @@ int kfunc_md_bpf_unlink(struct kfunc_md *md, struct bpf_prog *prog, int type)
!md->bpf_progs[BPF_TRAMP_MODIFY_RETURN])
md->flags &= ~KFUNC_MD_FL_TRACING_ORIGIN;
+ if (!md->bpf_prog_cnt)
+ md->tramp = NULL;
+
return 0;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 18/25] libbpf: don't free btf if tracing_multi progs existing
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (16 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 17/25] bpf: make trampoline compatible with global trampoline Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 19/25] libbpf: support tracing_multi Menglong Dong
` (8 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
By default, the kernel BTFs that we load during program loading are freed
after the programs are loaded in bpf_object_load(). However, we still need
these BTFs for multi-link tracing during attaching.
Therefore, we don't free the BTFs until the bpf object is closed if any bpf
programs of the multi-link tracing type exist.
Meanwhile, introduce the new API bpf_object__free_btfs() to manually free
the BTFs after attaching.
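For illustration, a minimal usage sketch (not part of this patch); the
object and program names are made up, error handling is omitted, and the
auto-attach of the *_multi program relies on the libbpf support added later
in this series:

	struct bpf_object *obj = bpf_object__open_file("tracing_multi.bpf.o", NULL);
	struct bpf_program *prog;
	struct bpf_link *link;

	bpf_object__load(obj);            /* kernel BTFs are kept alive here */
	prog = bpf_object__find_program_by_name(obj, "fentry_multi_prog");
	link = bpf_program__attach(prog); /* attaching still needs the kernel BTFs */
	bpf_object__free_btfs(obj);       /* now they can be released manually */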
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/lib/bpf/libbpf.c | 16 +++++++++++++++-
tools/lib/bpf/libbpf.h | 2 ++
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e9c641a2fb20..cfe81e1640d8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8581,6 +8581,20 @@ static void bpf_object_post_load_cleanup(struct bpf_object *obj)
obj->btf_vmlinux = NULL;
}
+static void bpf_object_early_free_btf(struct bpf_object *obj)
+{
+ struct bpf_program *prog;
+
+ bpf_object__for_each_program(prog, obj) {
+ if (prog->expected_attach_type == BPF_TRACE_FENTRY_MULTI ||
+ prog->expected_attach_type == BPF_TRACE_FEXIT_MULTI ||
+ prog->expected_attach_type == BPF_MODIFY_RETURN_MULTI)
+ return;
+ }
+
+ bpf_object_post_load_cleanup(obj);
+}
+
static int bpf_object_prepare(struct bpf_object *obj, const char *target_btf_path)
{
int err;
@@ -8652,7 +8666,7 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
err = bpf_gen__finish(obj->gen_loader, obj->nr_programs, obj->nr_maps);
}
- bpf_object_post_load_cleanup(obj);
+ bpf_object_early_free_btf(obj);
obj->state = OBJ_LOADED; /* doesn't matter if successfully or not */
if (err) {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d39f19c8396d..ded98e8cf327 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -323,6 +323,8 @@ LIBBPF_API struct bpf_program *
bpf_object__find_program_by_name(const struct bpf_object *obj,
const char *name);
+LIBBPF_API void bpf_object__free_btfs(struct bpf_object *obj);
+
LIBBPF_API int
libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 1205f9a4fe04..23df00ae0b73 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -415,6 +415,7 @@ LIBBPF_1.4.0 {
bpf_token_create;
btf__new_split;
btf_ext__raw_data;
+ bpf_object__free_btfs;
} LIBBPF_1.3.0;
LIBBPF_1.5.0 {
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 19/25] libbpf: support tracing_multi
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (17 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 18/25] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 20/25] libbpf: add btf type hash lookup support Menglong Dong
` (7 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Add support for the following attach types:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI
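For illustration, a minimal sketch (not part of this patch) of the new
section names and the opts-based attach API; the target functions are only
examples and the opts layout is the one added in this patch:

	/* BPF side: a comma-separated target list; the first symbol is used
	 * for verification during load.
	 */
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("fentry.multi/ip_rcv,tcp_v4_rcv")
	int BPF_PROG(fentry_multi_test)
	{
		return 0;
	}

	/* user space: attach 'prog' by symbol names instead of the section list */
	const char *syms[] = { "ip_rcv", "tcp_v4_rcv" };
	LIBBPF_OPTS(bpf_trace_multi_opts, opts, .syms = syms, .cnt = 2);
	struct bpf_link *link =
		bpf_program__attach_trace_multi_opts(prog, &opts);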
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/bpf/bpftool/common.c | 3 +
tools/lib/bpf/bpf.c | 10 +++
tools/lib/bpf/bpf.h | 6 ++
tools/lib/bpf/libbpf.c | 168 ++++++++++++++++++++++++++++++++++++-
tools/lib/bpf/libbpf.h | 19 +++++
tools/lib/bpf/libbpf.map | 1 +
6 files changed, 204 insertions(+), 3 deletions(-)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index ecfa790adc13..8e681fe3dd6b 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1162,6 +1162,9 @@ const char *bpf_attach_type_input_str(enum bpf_attach_type t)
case BPF_TRACE_FENTRY: return "fentry";
case BPF_TRACE_FEXIT: return "fexit";
case BPF_MODIFY_RETURN: return "mod_ret";
+ case BPF_TRACE_FENTRY_MULTI: return "fentry_multi";
+ case BPF_TRACE_FEXIT_MULTI: return "fexit_multi";
+ case BPF_MODIFY_RETURN_MULTI: return "mod_ret_multi";
case BPF_SK_REUSEPORT_SELECT: return "sk_skb_reuseport_select";
case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: return "sk_skb_reuseport_select_or_migrate";
default: return libbpf_bpf_attach_type_str(t);
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a9c3e33d0f8a..75a917de1a3c 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -797,6 +797,16 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, tracing))
return libbpf_err(-EINVAL);
break;
+ case BPF_TRACE_FENTRY_MULTI:
+ case BPF_TRACE_FEXIT_MULTI:
+ case BPF_MODIFY_RETURN_MULTI:
+ attr.link_create.tracing_multi.btf_ids = ptr_to_u64(OPTS_GET(opts, tracing_multi.btf_ids, 0));
+ attr.link_create.tracing_multi.tgt_fds = ptr_to_u64(OPTS_GET(opts, tracing_multi.tgt_fds, 0));
+ attr.link_create.tracing_multi.cookies = ptr_to_u64(OPTS_GET(opts, tracing_multi.cookies, 0));
+ attr.link_create.tracing_multi.cnt = OPTS_GET(opts, tracing_multi.cnt, 0);
+ if (!OPTS_ZEROED(opts, tracing_multi))
+ return libbpf_err(-EINVAL);
+ break;
case BPF_NETFILTER:
attr.link_create.netfilter.pf = OPTS_GET(opts, netfilter.pf, 0);
attr.link_create.netfilter.hooknum = OPTS_GET(opts, netfilter.hooknum, 0);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 777627d33d25..c279b3bc80be 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -422,6 +422,12 @@ struct bpf_link_create_opts {
struct {
__u64 cookie;
} tracing;
+ struct {
+ __u32 cnt;
+ const __u32 *btf_ids;
+ const __u32 *tgt_fds;
+ const __u64 *cookies;
+ } tracing_multi;
struct {
__u32 pf;
__u32 hooknum;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cfe81e1640d8..0c4ed5d237e5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -136,6 +136,9 @@ static const char * const attach_type_name[] = {
[BPF_NETKIT_PEER] = "netkit_peer",
[BPF_TRACE_KPROBE_SESSION] = "trace_kprobe_session",
[BPF_TRACE_UPROBE_SESSION] = "trace_uprobe_session",
+ [BPF_TRACE_FENTRY_MULTI] = "trace_fentry_multi",
+ [BPF_TRACE_FEXIT_MULTI] = "trace_fexit_multi",
+ [BPF_MODIFY_RETURN_MULTI] = "modify_return_multi",
};
static const char * const link_type_name[] = {
@@ -410,6 +413,8 @@ enum sec_def_flags {
SEC_XDP_FRAGS = 16,
/* Setup proper attach type for usdt probes. */
SEC_USDT = 32,
+ /* attachment target is multi-link */
+ SEC_ATTACH_BTF_MULTI = 64,
};
struct bpf_sec_def {
@@ -7417,9 +7422,9 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog,
opts->expected_attach_type = BPF_TRACE_UPROBE_MULTI;
}
- if ((def & SEC_ATTACH_BTF) && !prog->attach_btf_id) {
+ if ((def & (SEC_ATTACH_BTF | SEC_ATTACH_BTF_MULTI)) && !prog->attach_btf_id) {
int btf_obj_fd = 0, btf_type_id = 0, err;
- const char *attach_name;
+ const char *attach_name, *name_end;
attach_name = strchr(prog->sec_name, '/');
if (!attach_name) {
@@ -7438,7 +7443,27 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog,
}
attach_name++; /* skip over / */
- err = libbpf_find_attach_btf_id(prog, attach_name, &btf_obj_fd, &btf_type_id);
+ name_end = strchr(attach_name, ',');
+ /* for multi-link tracing, use the first target symbol during
+ * loading.
+ */
+ if ((def & SEC_ATTACH_BTF_MULTI) && name_end) {
+ int len = name_end - attach_name + 1;
+ char *first_tgt;
+
+ first_tgt = malloc(len);
+ if (!first_tgt)
+ return -ENOMEM;
+ libbpf_strlcpy(first_tgt, attach_name, len);
+ first_tgt[len - 1] = '\0';
+ err = libbpf_find_attach_btf_id(prog, first_tgt, &btf_obj_fd,
+ &btf_type_id);
+ free(first_tgt);
+ } else {
+ err = libbpf_find_attach_btf_id(prog, attach_name, &btf_obj_fd,
+ &btf_type_id);
+ }
+
if (err)
return err;
@@ -9507,6 +9532,7 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie, st
static int attach_uprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_trace_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static const struct bpf_sec_def section_defs[] = {
SEC_DEF("socket", SOCKET_FILTER, 0, SEC_NONE),
@@ -9553,6 +9579,13 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("fentry.s+", TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fmod_ret.s+", TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fexit.s+", TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
+ SEC_DEF("tp_btf+", TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace),
+ SEC_DEF("fentry.multi+", TRACING, BPF_TRACE_FENTRY_MULTI, SEC_ATTACH_BTF_MULTI, attach_trace_multi),
+ SEC_DEF("fmod_ret.multi+", TRACING, BPF_MODIFY_RETURN_MULTI, SEC_ATTACH_BTF_MULTI, attach_trace_multi),
+ SEC_DEF("fexit.multi+", TRACING, BPF_TRACE_FEXIT_MULTI, SEC_ATTACH_BTF_MULTI, attach_trace_multi),
+ SEC_DEF("fentry.multi.s+", TRACING, BPF_TRACE_FENTRY_MULTI, SEC_ATTACH_BTF_MULTI | SEC_SLEEPABLE, attach_trace_multi),
+ SEC_DEF("fmod_ret.multi.s+", TRACING, BPF_MODIFY_RETURN_MULTI, SEC_ATTACH_BTF_MULTI | SEC_SLEEPABLE, attach_trace_multi),
+ SEC_DEF("fexit.multi.s+", TRACING, BPF_TRACE_FEXIT_MULTI, SEC_ATTACH_BTF_MULTI | SEC_SLEEPABLE, attach_trace_multi),
SEC_DEF("freplace+", EXT, 0, SEC_ATTACH_BTF, attach_trace),
SEC_DEF("lsm+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
SEC_DEF("lsm.s+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
@@ -12787,6 +12820,135 @@ static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_
return libbpf_get_error(*link);
}
+struct bpf_link *bpf_program__attach_trace_multi_opts(const struct bpf_program *prog,
+ const struct bpf_trace_multi_opts *opts)
+{
+ LIBBPF_OPTS(bpf_link_create_opts, link_opts);
+ __u32 *btf_ids = NULL, *tgt_fds = NULL;
+ struct bpf_link *link = NULL;
+ char errmsg[STRERR_BUFSIZE];
+ int prog_fd, pfd, cnt, err;
+
+ if (!OPTS_VALID(opts, bpf_trace_multi_opts))
+ return libbpf_err_ptr(-EINVAL);
+
+ prog_fd = bpf_program__fd(prog);
+ if (prog_fd < 0) {
+ pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+ return libbpf_err_ptr(-EINVAL);
+ }
+
+ cnt = OPTS_GET(opts, cnt, 0);
+ if (opts->syms) {
+ int btf_obj_fd, btf_type_id, i;
+
+ if (opts->btf_ids || opts->tgt_fds) {
+ pr_warn("can set both opts->syms and opts->btf_ids\n");
+ return libbpf_err_ptr(-EINVAL);
+ }
+
+ btf_ids = malloc(sizeof(*btf_ids) * cnt);
+ tgt_fds = malloc(sizeof(*tgt_fds) * cnt);
+ if (!btf_ids || !tgt_fds) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+ for (i = 0; i < cnt; i++) {
+ btf_obj_fd = btf_type_id = 0;
+
+ err = find_kernel_btf_id(prog->obj, opts->syms[i],
+ prog->expected_attach_type, &btf_obj_fd,
+ &btf_type_id);
+ if (err)
+ goto err_free;
+ btf_ids[i] = btf_type_id;
+ tgt_fds[i] = btf_obj_fd;
+ }
+ link_opts.tracing_multi.btf_ids = btf_ids;
+ link_opts.tracing_multi.tgt_fds = tgt_fds;
+ } else {
+ link_opts.tracing_multi.btf_ids = OPTS_GET(opts, btf_ids, 0);
+ link_opts.tracing_multi.tgt_fds = OPTS_GET(opts, tgt_fds, 0);
+ }
+
+ link = calloc(1, sizeof(*link));
+ if (!link) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+ link->detach = &bpf_link__detach_fd;
+
+ link_opts.tracing_multi.cookies = OPTS_GET(opts, cookies, 0);
+ link_opts.tracing_multi.cnt = cnt;
+
+ pfd = bpf_link_create(prog_fd, 0, bpf_program__expected_attach_type(prog), &link_opts);
+ if (pfd < 0) {
+ err = -errno;
+ pr_warn("prog '%s': failed to attach: %s\n",
+ prog->name, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
+ goto err_free;
+ }
+ link->fd = pfd;
+
+ free(btf_ids);
+ free(tgt_fds);
+ return link;
+err_free:
+ free(btf_ids);
+ free(tgt_fds);
+ free(link);
+ return libbpf_err_ptr(err);
+}
+
+static int attach_trace_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+ LIBBPF_OPTS(bpf_trace_multi_opts, opts);
+ int i, err, len, cnt = 1;
+ char **syms, *buf, *name;
+ const char *spec;
+
+ spec = strchr(prog->sec_name, '/');
+ if (!spec || !*(++spec))
+ return -EINVAL;
+
+ len = strlen(spec) + 1;
+ buf = malloc(len);
+ if (!buf)
+ return -ENOMEM;
+
+ libbpf_strlcpy(buf, spec, len);
+ for (i = 0; i < len; i++) {
+ if (buf[i] == ',')
+ cnt++;
+ }
+
+ syms = malloc(sizeof(*syms) * cnt);
+ if (!syms) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ opts.syms = (const char **)syms;
+ opts.cnt = cnt;
+ name = buf;
+ err = -EINVAL;
+ while (name) {
+ if (*name == '\0')
+ goto out_free;
+ *(syms++) = name;
+ name = strchr(name, ',');
+ if (name)
+ *(name++) = '\0';
+ }
+
+ *link = bpf_program__attach_trace_multi_opts(prog, &opts);
+ err = libbpf_get_error(*link);
+out_free:
+ free(buf);
+ free(opts.syms);
+ return err;
+}
+
static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_link **link)
{
*link = bpf_program__attach_lsm(prog);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ded98e8cf327..d7f0db7ab586 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -833,6 +833,25 @@ bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
LIBBPF_API struct bpf_link *
bpf_program__attach_freplace(const struct bpf_program *prog,
int target_fd, const char *attach_func_name);
+struct bpf_trace_multi_opts {
+ /* size of this struct, for forward/backward compatibility */
+ size_t sz;
+ /* array of function symbols to attach */
+ const char **syms;
+ /* array of the btf type id to attach */
+ __u32 *btf_ids;
+ /* array of the target fds */
+ __u32 *tgt_fds;
+ /* array of the cookies */
+ __u64 *cookies;
+ /* number of elements in syms/btf_ids/cookies arrays */
+ size_t cnt;
+};
+#define bpf_trace_multi_opts__last_field cnt
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_trace_multi_opts(const struct bpf_program *prog,
+ const struct bpf_trace_multi_opts *opts);
struct bpf_netfilter_opts {
/* size of this struct, for forward/backward compatibility */
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 23df00ae0b73..fab014528b86 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -416,6 +416,7 @@ LIBBPF_1.4.0 {
btf__new_split;
btf_ext__raw_data;
bpf_object__free_btfs;
+ bpf_program__attach_trace_multi_opts;
} LIBBPF_1.3.0;
LIBBPF_1.5.0 {
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 20/25] libbpf: add btf type hash lookup support
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (18 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 19/25] libbpf: support tracing_multi Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 21/25] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
` (6 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
For now, libbpf finds the BTF type id by looping over all the BTF types
and comparing names, which is inefficient when we have many functions to
look up.
We add a "use_hash" argument to find_kernel_btf_id() to indicate whether
the BTF type id should be looked up through a hash table. The hash table
is built lazily the first time it is needed.
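To illustrate how the new helpers are meant to be used together, here is
a rough sketch (loading the vmlinux BTF and falling back to the linear
search are only for illustration, not code from this patch):

struct btf *vmlinux_btf = btf__load_vmlinux_btf();
__s32 id;

if (vmlinux_btf && !btf__make_hash(vmlinux_btf)) {
	/* hashed lookup of a BTF_KIND_FUNC instead of scanning every type */
	id = btf__find_by_func_hash(vmlinux_btf, "bpf_fentry_test1",
				    BTF_KIND_FUNC);
	if (id < 0)	/* -ENOENT: fall back to the linear search */
		id = btf__find_by_name_kind(vmlinux_btf, "bpf_fentry_test1",
					    BTF_KIND_FUNC);
}
btf__free(vmlinux_btf);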
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/lib/bpf/btf.c | 102 +++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/btf.h | 6 +++
tools/lib/bpf/libbpf.c | 37 +++++++++++---
tools/lib/bpf/libbpf.map | 3 ++
4 files changed, 140 insertions(+), 8 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index f1d495dc66bb..a0df16296a94 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -35,6 +35,7 @@ struct btf {
void *raw_data;
/* raw BTF data in non-native endianness */
void *raw_data_swapped;
+ struct hashmap *func_hash;
__u32 raw_size;
/* whether target endianness differs from the native one */
bool swapped_endian;
@@ -131,6 +132,12 @@ struct btf {
int ptr_sz;
};
+struct btf_type_key {
+ __u32 dummy;
+ const char *name;
+ int kind;
+};
+
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64) (unsigned long) ptr;
@@ -938,6 +945,100 @@ static __s32 btf_find_by_name_kind(const struct btf *btf, int start_id,
return libbpf_err(-ENOENT);
}
+static size_t btf_hash_name(long key, void *btf)
+{
+ const struct btf_type *t = (const struct btf_type *)key;
+ const char *name;
+
+ if (t->name_off > BTF_MAX_NAME_OFFSET)
+ name = ((struct btf_type_key *)key)->name;
+ else
+ name = btf__name_by_offset(btf, t->name_off);
+
+ return str_hash(name);
+}
+
+static bool btf_name_equal(long key1, long key2, void *btf)
+{
+ const struct btf_type *t1 = (const struct btf_type *)key1,
+ *t2 = (const struct btf_type *)key2;
+ const char *name1, *name2;
+ int k1, k2;
+
+ name1 = btf__name_by_offset(btf, t1->name_off);
+ k1 = btf_kind(t1);
+
+ if (t2->name_off > BTF_MAX_NAME_OFFSET) {
+ struct btf_type_key *t2_key = (struct btf_type_key *)key2;
+
+ name2 = t2_key->name;
+ k2 = t2_key->kind;
+ } else {
+ name2 = btf__name_by_offset(btf, t2->name_off);
+ k2 = btf_kind(t2);
+ }
+
+ return k1 == k2 && strcmp(name1, name2) == 0;
+}
+
+__s32 btf__make_hash(struct btf *btf)
+{
+ __u32 i, nr_types = btf__type_cnt(btf);
+ struct hashmap *map;
+
+ if (btf->func_hash)
+ return 0;
+
+ map = hashmap__new(btf_hash_name, btf_name_equal, (void *)btf);
+ if (!map)
+ return libbpf_err(-ENOMEM);
+
+ for (i = btf->start_id; i < nr_types; i++) {
+ const struct btf_type *t = btf__type_by_id(btf, i);
+ int err;
+
/* only functions need this */
+ if (btf_kind(t) != BTF_KIND_FUNC)
+ continue;
+
+ err = hashmap__add(map, t, i);
+ if (err == -EEXIST) {
+ pr_warn("btf type exist: name=%s\n",
+ btf__name_by_offset(btf, t->name_off));
+ continue;
+ }
+
+ if (err)
+ return libbpf_err(err);
+ }
+
+ btf->func_hash = map;
+ return 0;
+}
+
+bool btf__hash_hash(struct btf *btf)
+{
+ return !!btf->func_hash;
+}
+
+int btf__find_by_func_hash(struct btf *btf, const char *type_name, __u32 kind)
+{
+ struct btf_type_key key = {
+ .dummy = 0xffffffff,
+ .name = type_name,
+ .kind = kind,
+ };
+ long t;
+
+ if (!btf->func_hash)
+ return -ENOENT;
+
+ if (hashmap__find(btf->func_hash, &key, &t))
+ return t;
+
+ return -ENOENT;
+}
+
__s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
__u32 kind)
{
@@ -974,6 +1075,7 @@ void btf__free(struct btf *btf)
if (btf->fd >= 0)
close(btf->fd);
+ hashmap__free(btf->func_hash);
if (btf_is_modifiable(btf)) {
/* if BTF was modified after loading, it will have a split
* in-memory representation for header, types, and strings
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 4392451d634b..8639377a1f3b 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -335,6 +335,12 @@ btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
const void *data, size_t data_sz,
const struct btf_dump_type_data_opts *opts);
+
+LIBBPF_API __s32 btf__make_hash(struct btf *btf);
+LIBBPF_API bool btf__hash_hash(struct btf *btf);
+LIBBPF_API int
+btf__find_by_func_hash(struct btf *btf, const char *type_name, __u32 kind);
+
/*
* A set of helpers for easier BTF types handling.
*
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0c4ed5d237e5..4a903102e0c7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -634,6 +634,7 @@ struct extern_desc {
struct module_btf {
struct btf *btf;
+ struct hashmap *btf_name_hash;
char *name;
__u32 id;
int fd;
@@ -717,6 +718,7 @@ struct bpf_object {
* it at load time.
*/
struct btf *btf_vmlinux;
+ struct hashmap *btf_name_hash;
/* Path to the custom BTF to be used for BPF CO-RE relocations as an
* override for vmlinux BTF.
*/
@@ -1004,7 +1006,7 @@ static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
struct module_btf **res_mod_btf);
#define STRUCT_OPS_VALUE_PREFIX "bpf_struct_ops_"
-static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
+static int find_btf_by_prefix_kind(struct btf *btf, const char *prefix,
const char *name, __u32 kind);
static int
@@ -10040,7 +10042,7 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
}
}
-static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
+static int find_btf_by_prefix_kind(struct btf *btf, const char *prefix,
const char *name, __u32 kind)
{
char btf_type_name[BTF_MAX_NAME_SIZE];
@@ -10054,6 +10056,10 @@ static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
*/
if (ret < 0 || ret >= sizeof(btf_type_name))
return -ENAMETOOLONG;
+
+ if (btf__hash_hash(btf))
+ return btf__find_by_func_hash(btf, btf_type_name, kind);
+
return btf__find_by_name_kind(btf, btf_type_name, kind);
}
@@ -10126,9 +10132,9 @@ static int libbpf_find_prog_btf_id(const char *name, __u32 attach_prog_fd, int t
static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
enum bpf_attach_type attach_type,
- int *btf_obj_fd, int *btf_type_id)
+ int *btf_obj_fd, int *btf_type_id, bool use_hash)
{
- int ret, i, mod_len;
+ int ret, i, mod_len, err;
const char *fn_name, *mod_name = NULL;
fn_name = strchr(attach_name, ':');
@@ -10139,6 +10145,11 @@ static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
}
if (!mod_name || strncmp(mod_name, "vmlinux", mod_len) == 0) {
+ if (use_hash) {
+ err = btf__make_hash(obj->btf_vmlinux);
+ if (err)
+ return err;
+ }
ret = find_attach_btf_id(obj->btf_vmlinux,
mod_name ? fn_name : attach_name,
attach_type);
@@ -10161,6 +10172,11 @@ static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
if (mod_name && strncmp(mod->name, mod_name, mod_len) != 0)
continue;
+ if (use_hash) {
+ err = btf__make_hash(mod->btf);
+ if (err)
+ return err;
+ }
ret = find_attach_btf_id(mod->btf,
mod_name ? fn_name : attach_name,
attach_type);
@@ -10210,7 +10226,7 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
} else {
err = find_kernel_btf_id(prog->obj, attach_name,
attach_type, btf_obj_fd,
- btf_type_id);
+ btf_type_id, false);
}
if (err) {
pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %s\n",
@@ -12854,11 +12870,16 @@ struct bpf_link *bpf_program__attach_trace_multi_opts(const struct bpf_program *
goto err_free;
}
for (i = 0; i < cnt; i++) {
- btf_obj_fd = btf_type_id = 0;
+ /* only use btf type function hashmap when the count
+ * is big enough.
+ */
+ bool func_hash = cnt > 1024;
+
+ btf_obj_fd = btf_type_id = 0;
err = find_kernel_btf_id(prog->obj, opts->syms[i],
prog->expected_attach_type, &btf_obj_fd,
- &btf_type_id);
+ &btf_type_id, func_hash);
if (err)
goto err_free;
btf_ids[i] = btf_type_id;
@@ -13936,7 +13957,7 @@ int bpf_program__set_attach_target(struct bpf_program *prog,
return libbpf_err(err);
err = find_kernel_btf_id(prog->obj, attach_func_name,
prog->expected_attach_type,
- &btf_obj_fd, &btf_id);
+ &btf_obj_fd, &btf_id, false);
if (err)
return libbpf_err(err);
}
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index fab014528b86..100b14de9b22 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -445,4 +445,7 @@ LIBBPF_1.6.0 {
bpf_program__line_info_cnt;
btf__add_decl_attr;
btf__add_type_attr;
+ btf__hash_hash;
+ btf__find_by_func_hash;
+ btf__make_hash;
} LIBBPF_1.5.0;
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 21/25] libbpf: add skip_invalid and attach_tracing for tracing_multi
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (19 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 20/25] libbpf: add btf type hash lookup support Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 22/25] selftests/bpf: use the glob_match() from libbpf in test_progs.c Menglong Dong
` (5 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
We add the skip_invalid and attach_tracing options to tracing_multi for
the selftests.
When we try to attach all the functions in available_filter_functions
with tracing_multi, we can't tell in advance whether a target symbol can
be attached, and a single bad entry makes the whole attach fail. When
skip_invalid is set to true, libbpf checks each symbol and skips the
invalid entries (see the sketch below).
We skip the symbols in the following cases:
1. the BTF type does not exist
2. the BTF type is not a function proto
3. the function has more than 6 arguments
4. the return type is a struct or union
5. any function argument is a struct or union
The 5th rule can reject functions that would actually work, but that is
fine for the tests.
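For illustration, a minimal sketch of how a caller can use it (the
wrapper function and the fake second symbol are made up for this example;
prog must be a loaded fentry.multi/fexit.multi program):

static struct bpf_link *attach_some(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_trace_multi_opts, opts);
	/* the second symbol does not exist on purpose: with skip_invalid
	 * it is dropped instead of failing the whole attach
	 */
	const char *syms[] = { "bpf_fentry_test1", "no_such_symbol" };

	opts.syms = syms;
	opts.cnt = sizeof(syms) / sizeof(syms[0]);
	opts.skip_invalid = true;

	return bpf_program__attach_trace_multi_opts(prog, &opts);
}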
"attach_tracing" is used to convert a TRACING prog to TRACING_MULTI. For
example, we can set the attach type to FENTRY_MULTI before we load the
skel. And we can attach the prog with
bpf_program__attach_trace_multi_opts() with "attach_tracing=1". The libbpf
will attach the target btf type of the prog automatically. This is also
used to reuse the selftests of tracing.
(Oh my goodness! What am I doing?)
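A rough sketch of the attach_tracing flow (skel and test1 here are just
placeholders for whatever skeleton and fentry prog are being converted):

LIBBPF_OPTS(bpf_trace_multi_opts, opts, .attach_tracing = true);
struct bpf_link *link;

/* before load: turn the plain fentry prog into a fentry.multi one */
bpf_program__set_expected_attach_type(skel->progs.test1,
				      BPF_TRACE_FENTRY_MULTI);
/* ... skeleton __load() ... */

/* no syms/btf_ids needed: libbpf reuses the prog's attach_btf_id */
link = bpf_program__attach_trace_multi_opts(skel->progs.test1, &opts);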
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/lib/bpf/libbpf.c | 97 ++++++++++++++++++++++++++++++++++++------
tools/lib/bpf/libbpf.h | 6 ++-
2 files changed, 89 insertions(+), 14 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4a903102e0c7..911fda3f678c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10132,7 +10132,8 @@ static int libbpf_find_prog_btf_id(const char *name, __u32 attach_prog_fd, int t
static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
enum bpf_attach_type attach_type,
- int *btf_obj_fd, int *btf_type_id, bool use_hash)
+ int *btf_obj_fd, int *btf_type_id, bool use_hash,
+ const struct btf **btf)
{
int ret, i, mod_len, err;
const char *fn_name, *mod_name = NULL;
@@ -10156,6 +10157,8 @@ static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
if (ret > 0) {
*btf_obj_fd = 0; /* vmlinux BTF */
*btf_type_id = ret;
+ if (btf)
+ *btf = obj->btf_vmlinux;
return 0;
}
if (ret != -ENOENT)
@@ -10183,6 +10186,8 @@ static int find_kernel_btf_id(struct bpf_object *obj, const char *attach_name,
if (ret > 0) {
*btf_obj_fd = mod->fd;
*btf_type_id = ret;
+ if (btf)
+ *btf = mod->btf;
return 0;
}
if (ret == -ENOENT)
@@ -10226,7 +10231,7 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
} else {
err = find_kernel_btf_id(prog->obj, attach_name,
attach_type, btf_obj_fd,
- btf_type_id, false);
+ btf_type_id, false, NULL);
}
if (err) {
pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %s\n",
@@ -12836,6 +12841,53 @@ static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_
return libbpf_get_error(*link);
}
+static bool is_trace_valid(const struct btf *btf, int btf_type_id, const char *name)
+{
+ const struct btf_type *t;
+
+ t = skip_mods_and_typedefs(btf, btf_type_id, NULL);
+ if (btf_is_func(t)) {
+ const struct btf_param *args;
+ __u32 nargs, m;
+
+ t = skip_mods_and_typedefs(btf, t->type, NULL);
+ if (!btf_is_func_proto(t)) {
+ pr_debug("skipping no function btf type for %s\n",
+ name);
+ return false;
+ }
+
+ args = (const struct btf_param *)(t + 1);
+ nargs = btf_vlen(t);
+ if (nargs > 6) {
+ pr_debug("skipping args count more than 6 for %s\n",
+ name);
+ return false;
+ }
+
+ t = skip_mods_and_typedefs(btf, t->type, NULL);
+ if (btf_is_struct(t) || btf_is_union(t) ||
+ (nargs && args[nargs - 1].type == 0)) {
+ pr_debug("skipping invalid return type for %s\n",
+ name);
+ return false;
+ }
+
+ for (m = 0; m < nargs; m++) {
+ t = skip_mods_and_typedefs(btf, args[m].type, NULL);
+ if (btf_is_struct(t) || btf_is_union(t)) {
+ pr_debug("skipping not supported arg type %s\n",
+ name);
+ break;
+ }
+ }
+ if (m < nargs)
+ return false;
+ }
+
+ return true;
+}
+
struct bpf_link *bpf_program__attach_trace_multi_opts(const struct bpf_program *prog,
const struct bpf_trace_multi_opts *opts)
{
@@ -12856,7 +12908,7 @@ struct bpf_link *bpf_program__attach_trace_multi_opts(const struct bpf_program *
cnt = OPTS_GET(opts, cnt, 0);
if (opts->syms) {
- int btf_obj_fd, btf_type_id, i;
+ int btf_obj_fd, btf_type_id, i, j = 0;
if (opts->btf_ids || opts->tgt_fds) {
pr_warn("can set both opts->syms and opts->btf_ids\n");
@@ -12870,23 +12922,41 @@ struct bpf_link *bpf_program__attach_trace_multi_opts(const struct bpf_program *
goto err_free;
}
for (i = 0; i < cnt; i++) {
+ const struct btf *btf = NULL;
+ bool func_hash;
+
/* only use btf type function hashmap when the count
* is big enough.
*/
- bool func_hash = cnt > 1024;
-
-
+ func_hash = cnt > 1024;
btf_obj_fd = btf_type_id = 0;
err = find_kernel_btf_id(prog->obj, opts->syms[i],
- prog->expected_attach_type, &btf_obj_fd,
- &btf_type_id, func_hash);
- if (err)
- goto err_free;
- btf_ids[i] = btf_type_id;
- tgt_fds[i] = btf_obj_fd;
+ prog->expected_attach_type, &btf_obj_fd,
+ &btf_type_id, func_hash, &btf);
+ if (err) {
+ if (!opts->skip_invalid)
+ goto err_free;
+
+ pr_debug("can't find btf type for %s, skip\n",
+ opts->syms[i]);
+ continue;
+ }
+
+ if (opts->skip_invalid &&
+ !is_trace_valid(btf, btf_type_id, opts->syms[i]))
+ continue;
+
+ btf_ids[j] = btf_type_id;
+ tgt_fds[j] = btf_obj_fd;
+ j++;
}
+ cnt = j;
link_opts.tracing_multi.btf_ids = btf_ids;
link_opts.tracing_multi.tgt_fds = tgt_fds;
+ } else if (opts->attach_tracing) {
+ link_opts.tracing_multi.btf_ids = &prog->attach_btf_id;
+ link_opts.tracing_multi.tgt_fds = &prog->attach_btf_obj_fd;
+ cnt = 1;
} else {
link_opts.tracing_multi.btf_ids = OPTS_GET(opts, btf_ids, 0);
link_opts.tracing_multi.tgt_fds = OPTS_GET(opts, tgt_fds, 0);
@@ -13957,7 +14027,8 @@ int bpf_program__set_attach_target(struct bpf_program *prog,
return libbpf_err(err);
err = find_kernel_btf_id(prog->obj, attach_func_name,
prog->expected_attach_type,
- &btf_obj_fd, &btf_id, false);
+ &btf_obj_fd, &btf_id, false,
+ NULL);
if (err)
return libbpf_err(err);
}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d7f0db7ab586..c087525ad25a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -846,8 +846,12 @@ struct bpf_trace_multi_opts {
__u64 *cookies;
/* number of elements in syms/btf_ids/cookies arrays */
size_t cnt;
+ /* skip the invalid btf type before attaching */
+ bool skip_invalid;
+ /* attach a TRACING prog as TRACING_MULTI */
+ bool attach_tracing;
};
-#define bpf_trace_multi_opts__last_field cnt
+#define bpf_trace_multi_opts__last_field attach_tracing
LIBBPF_API struct bpf_link *
bpf_program__attach_trace_multi_opts(const struct bpf_program *prog,
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 22/25] selftests/bpf: use the glob_match() from libbpf in test_progs.c
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (20 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 21/25] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 23/25] selftests/bpf: add get_ksyms and get_addrs to test_progs.c Menglong Dong
` (4 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
The glob_match() in test_progs.c has almost the same logic as the
glob_match() in libbpf.c, so drop the local copy and use the libbpf one
to simplify the code.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/testing/selftests/bpf/test_progs.c | 23 +----------------------
1 file changed, 1 insertion(+), 22 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 309d9d4a8ace..e246fe4b7b70 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -17,6 +17,7 @@
#include <sys/un.h>
#include <bpf/btf.h>
#include <time.h>
+#include "bpf/libbpf_internal.h"
#include "json_writer.h"
#include "network_helpers.h"
@@ -129,28 +130,6 @@ static int traffic_monitor_print_fn(const char *format, va_list args)
return 0;
}
-/* Adapted from perf/util/string.c */
-static bool glob_match(const char *str, const char *pat)
-{
- while (*str && *pat && *pat != '*') {
- if (*str != *pat)
- return false;
- str++;
- pat++;
- }
- /* Check wild card */
- if (*pat == '*') {
- while (*pat == '*')
- pat++;
- if (!*pat) /* Tail wild card matches all */
- return true;
- while (*str)
- if (glob_match(str++, pat))
- return true;
- }
- return !*str && !*pat;
-}
-
#define EXIT_NO_TEST 2
#define EXIT_ERR_SETUP_INFRA 3
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 23/25] selftests/bpf: add get_ksyms and get_addrs to test_progs.c
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (21 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 22/25] selftests/bpf: use the glob_match() from libbpf in test_progs.c Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 24/25] selftests/bpf: add testcases for multi-link of tracing Menglong Dong
` (3 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
We sometimes need to get all the kernel functions that can be traced, so
move get_syms() and get_addrs() from kprobe_multi_test.c to test_progs.c
and rename them to bpf_get_ksyms() and bpf_get_addrs().
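The usage stays the same as before the move, e.g. (a sketch inside a
test; ASSERT_OK is the usual test_progs helper):

char **syms = NULL;
size_t cnt = 0;

/* kernel == true: only vmlinux symbols, no [module] entries */
if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, true), "bpf_get_ksyms"))
	return;
/* ... use syms[0..cnt-1] ... */
free(syms);	/* the strings themselves live in the kallsyms cache */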
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
.../bpf/prog_tests/kprobe_multi_test.c | 220 +-----------------
tools/testing/selftests/bpf/test_progs.c | 214 +++++++++++++++++
tools/testing/selftests/bpf/test_progs.h | 2 +
3 files changed, 219 insertions(+), 217 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index e19ef509ebf8..171706e78da8 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -422,220 +422,6 @@ static void test_unique_match(void)
kprobe_multi__destroy(skel);
}
-static size_t symbol_hash(long key, void *ctx __maybe_unused)
-{
- return str_hash((const char *) key);
-}
-
-static bool symbol_equal(long key1, long key2, void *ctx __maybe_unused)
-{
- return strcmp((const char *) key1, (const char *) key2) == 0;
-}
-
-static bool is_invalid_entry(char *buf, bool kernel)
-{
- if (kernel && strchr(buf, '['))
- return true;
- if (!kernel && !strchr(buf, '['))
- return true;
- return false;
-}
-
-static bool skip_entry(char *name)
-{
- /*
- * We attach to almost all kernel functions and some of them
- * will cause 'suspicious RCU usage' when fprobe is attached
- * to them. Filter out the current culprits - arch_cpu_idle
- * default_idle and rcu_* functions.
- */
- if (!strcmp(name, "arch_cpu_idle"))
- return true;
- if (!strcmp(name, "default_idle"))
- return true;
- if (!strncmp(name, "rcu_", 4))
- return true;
- if (!strcmp(name, "bpf_dispatcher_xdp_func"))
- return true;
- if (!strncmp(name, "__ftrace_invalid_address__",
- sizeof("__ftrace_invalid_address__") - 1))
- return true;
- return false;
-}
-
-/* Do comparision by ignoring '.llvm.<hash>' suffixes. */
-static int compare_name(const char *name1, const char *name2)
-{
- const char *res1, *res2;
- int len1, len2;
-
- res1 = strstr(name1, ".llvm.");
- res2 = strstr(name2, ".llvm.");
- len1 = res1 ? res1 - name1 : strlen(name1);
- len2 = res2 ? res2 - name2 : strlen(name2);
-
- if (len1 == len2)
- return strncmp(name1, name2, len1);
- if (len1 < len2)
- return strncmp(name1, name2, len1) <= 0 ? -1 : 1;
- return strncmp(name1, name2, len2) >= 0 ? 1 : -1;
-}
-
-static int load_kallsyms_compare(const void *p1, const void *p2)
-{
- return compare_name(((const struct ksym *)p1)->name, ((const struct ksym *)p2)->name);
-}
-
-static int search_kallsyms_compare(const void *p1, const struct ksym *p2)
-{
- return compare_name(p1, p2->name);
-}
-
-static int get_syms(char ***symsp, size_t *cntp, bool kernel)
-{
- size_t cap = 0, cnt = 0;
- char *name = NULL, *ksym_name, **syms = NULL;
- struct hashmap *map;
- struct ksyms *ksyms;
- struct ksym *ks;
- char buf[256];
- FILE *f;
- int err = 0;
-
- ksyms = load_kallsyms_custom_local(load_kallsyms_compare);
- if (!ASSERT_OK_PTR(ksyms, "load_kallsyms_custom_local"))
- return -EINVAL;
-
- /*
- * The available_filter_functions contains many duplicates,
- * but other than that all symbols are usable in kprobe multi
- * interface.
- * Filtering out duplicates by using hashmap__add, which won't
- * add existing entry.
- */
-
- if (access("/sys/kernel/tracing/trace", F_OK) == 0)
- f = fopen("/sys/kernel/tracing/available_filter_functions", "r");
- else
- f = fopen("/sys/kernel/debug/tracing/available_filter_functions", "r");
-
- if (!f)
- return -EINVAL;
-
- map = hashmap__new(symbol_hash, symbol_equal, NULL);
- if (IS_ERR(map)) {
- err = libbpf_get_error(map);
- goto error;
- }
-
- while (fgets(buf, sizeof(buf), f)) {
- if (is_invalid_entry(buf, kernel))
- continue;
-
- free(name);
- if (sscanf(buf, "%ms$*[^\n]\n", &name) != 1)
- continue;
- if (skip_entry(name))
- continue;
-
- ks = search_kallsyms_custom_local(ksyms, name, search_kallsyms_compare);
- if (!ks) {
- err = -EINVAL;
- goto error;
- }
-
- ksym_name = ks->name;
- err = hashmap__add(map, ksym_name, 0);
- if (err == -EEXIST) {
- err = 0;
- continue;
- }
- if (err)
- goto error;
-
- err = libbpf_ensure_mem((void **) &syms, &cap,
- sizeof(*syms), cnt + 1);
- if (err)
- goto error;
-
- syms[cnt++] = ksym_name;
- }
-
- *symsp = syms;
- *cntp = cnt;
-
-error:
- free(name);
- fclose(f);
- hashmap__free(map);
- if (err)
- free(syms);
- return err;
-}
-
-static int get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel)
-{
- unsigned long *addr, *addrs, *tmp_addrs;
- int err = 0, max_cnt, inc_cnt;
- char *name = NULL;
- size_t cnt = 0;
- char buf[256];
- FILE *f;
-
- if (access("/sys/kernel/tracing/trace", F_OK) == 0)
- f = fopen("/sys/kernel/tracing/available_filter_functions_addrs", "r");
- else
- f = fopen("/sys/kernel/debug/tracing/available_filter_functions_addrs", "r");
-
- if (!f)
- return -ENOENT;
-
- /* In my local setup, the number of entries is 50k+ so Let us initially
- * allocate space to hold 64k entries. If 64k is not enough, incrementally
- * increase 1k each time.
- */
- max_cnt = 65536;
- inc_cnt = 1024;
- addrs = malloc(max_cnt * sizeof(long));
- if (addrs == NULL) {
- err = -ENOMEM;
- goto error;
- }
-
- while (fgets(buf, sizeof(buf), f)) {
- if (is_invalid_entry(buf, kernel))
- continue;
-
- free(name);
- if (sscanf(buf, "%p %ms$*[^\n]\n", &addr, &name) != 2)
- continue;
- if (skip_entry(name))
- continue;
-
- if (cnt == max_cnt) {
- max_cnt += inc_cnt;
- tmp_addrs = realloc(addrs, max_cnt);
- if (!tmp_addrs) {
- err = -ENOMEM;
- goto error;
- }
- addrs = tmp_addrs;
- }
-
- addrs[cnt++] = (unsigned long)addr;
- }
-
- *addrsp = addrs;
- *cntp = cnt;
-
-error:
- free(name);
- fclose(f);
- if (err)
- free(addrs);
- return err;
-}
-
static void do_bench_test(struct kprobe_multi_empty *skel, struct bpf_kprobe_multi_opts *opts)
{
long attach_start_ns, attach_end_ns;
@@ -670,7 +456,7 @@ static void test_kprobe_multi_bench_attach(bool kernel)
char **syms = NULL;
size_t cnt = 0;
- if (!ASSERT_OK(get_syms(&syms, &cnt, kernel), "get_syms"))
+ if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, kernel), "bpf_get_ksyms"))
return;
skel = kprobe_multi_empty__open_and_load();
@@ -696,13 +482,13 @@ static void test_kprobe_multi_bench_attach_addr(bool kernel)
size_t cnt = 0;
int err;
- err = get_addrs(&addrs, &cnt, kernel);
+ err = bpf_get_addrs(&addrs, &cnt, kernel);
if (err == -ENOENT) {
test__skip();
return;
}
- if (!ASSERT_OK(err, "get_addrs"))
+ if (!ASSERT_OK(err, "bpf_get_addrs"))
return;
skel = kprobe_multi_empty__open_and_load();
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e246fe4b7b70..26cc50bbed8b 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -18,6 +18,7 @@
#include <bpf/btf.h>
#include <time.h>
#include "bpf/libbpf_internal.h"
+#include "bpf/hashmap.h"
#include "json_writer.h"
#include "network_helpers.h"
@@ -646,6 +647,219 @@ int bpf_find_map(const char *test, struct bpf_object *obj, const char *name)
return bpf_map__fd(map);
}
+static size_t symbol_hash(long key, void *ctx __maybe_unused)
+{
+ return str_hash((const char *) key);
+}
+
+static bool symbol_equal(long key1, long key2, void *ctx __maybe_unused)
+{
+ return strcmp((const char *) key1, (const char *) key2) == 0;
+}
+
+static bool is_invalid_entry(char *buf, bool kernel)
+{
+ if (kernel && strchr(buf, '['))
+ return true;
+ if (!kernel && !strchr(buf, '['))
+ return true;
+ return false;
+}
+
+static bool skip_entry(char *name)
+{
+ /*
+ * We attach to almost all kernel functions and some of them
+ * will cause 'suspicious RCU usage' when fprobe is attached
+ * to them. Filter out the current culprits - arch_cpu_idle
+ * default_idle and rcu_* functions.
+ */
+ if (!strcmp(name, "arch_cpu_idle"))
+ return true;
+ if (!strcmp(name, "default_idle"))
+ return true;
+ if (!strncmp(name, "rcu_", 4))
+ return true;
+ if (!strcmp(name, "bpf_dispatcher_xdp_func"))
+ return true;
+ if (!strncmp(name, "__ftrace_invalid_address__",
+ sizeof("__ftrace_invalid_address__") - 1))
+ return true;
+ return false;
+}
+
+/* Do comparison by ignoring '.llvm.<hash>' suffixes. */
+static int compare_name(const char *name1, const char *name2)
+{
+ const char *res1, *res2;
+ int len1, len2;
+
+ res1 = strstr(name1, ".llvm.");
+ res2 = strstr(name2, ".llvm.");
+ len1 = res1 ? res1 - name1 : strlen(name1);
+ len2 = res2 ? res2 - name2 : strlen(name2);
+
+ if (len1 == len2)
+ return strncmp(name1, name2, len1);
+ if (len1 < len2)
+ return strncmp(name1, name2, len1) <= 0 ? -1 : 1;
+ return strncmp(name1, name2, len2) >= 0 ? 1 : -1;
+}
+
+static int load_kallsyms_compare(const void *p1, const void *p2)
+{
+ return compare_name(((const struct ksym *)p1)->name, ((const struct ksym *)p2)->name);
+}
+
+static int search_kallsyms_compare(const void *p1, const struct ksym *p2)
+{
+ return compare_name(p1, p2->name);
+}
+
+int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel)
+{
+ size_t cap = 0, cnt = 0;
+ char *name = NULL, *ksym_name, **syms = NULL;
+ struct hashmap *map;
+ struct ksyms *ksyms;
+ struct ksym *ks;
+ char buf[256];
+ FILE *f;
+ int err = 0;
+
+ ksyms = load_kallsyms_custom_local(load_kallsyms_compare);
+ if (!ASSERT_OK_PTR(ksyms, "load_kallsyms_custom_local"))
+ return -EINVAL;
+
+ /*
+ * The available_filter_functions contains many duplicates,
* but other than that all symbols are usable for tracing.
+ * Filtering out duplicates by using hashmap__add, which won't
+ * add existing entry.
+ */
+
+ if (access("/sys/kernel/tracing/trace", F_OK) == 0)
+ f = fopen("/sys/kernel/tracing/available_filter_functions", "r");
+ else
+ f = fopen("/sys/kernel/debug/tracing/available_filter_functions", "r");
+
+ if (!f)
+ return -EINVAL;
+
+ map = hashmap__new(symbol_hash, symbol_equal, NULL);
+ if (IS_ERR(map)) {
+ err = libbpf_get_error(map);
+ goto error;
+ }
+
+ while (fgets(buf, sizeof(buf), f)) {
+ if (is_invalid_entry(buf, kernel))
+ continue;
+
+ free(name);
+ if (sscanf(buf, "%ms$*[^\n]\n", &name) != 1)
+ continue;
+ if (skip_entry(name))
+ continue;
+
+ ks = search_kallsyms_custom_local(ksyms, name, search_kallsyms_compare);
+ if (!ks) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ ksym_name = ks->name;
+ err = hashmap__add(map, ksym_name, 0);
+ if (err == -EEXIST) {
+ err = 0;
+ continue;
+ }
+ if (err)
+ goto error;
+
+ err = libbpf_ensure_mem((void **) &syms, &cap,
+ sizeof(*syms), cnt + 1);
+ if (err)
+ goto error;
+
+ syms[cnt++] = ksym_name;
+ }
+
+ *symsp = syms;
+ *cntp = cnt;
+
+error:
+ free(name);
+ fclose(f);
+ hashmap__free(map);
+ if (err)
+ free(syms);
+ return err;
+}
+
+int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel)
+{
+ unsigned long *addr, *addrs, *tmp_addrs;
+ int err = 0, max_cnt, inc_cnt;
+ char *name = NULL;
+ size_t cnt = 0;
+ char buf[256];
+ FILE *f;
+
+ if (access("/sys/kernel/tracing/trace", F_OK) == 0)
+ f = fopen("/sys/kernel/tracing/available_filter_functions_addrs", "r");
+ else
+ f = fopen("/sys/kernel/debug/tracing/available_filter_functions_addrs", "r");
+
+ if (!f)
+ return -ENOENT;
+
+ /* In my local setup, the number of entries is 50k+ so Let us initially
+ * allocate space to hold 64k entries. If 64k is not enough, incrementally
+ * increase 1k each time.
+ */
+ max_cnt = 65536;
+ inc_cnt = 1024;
+ addrs = malloc(max_cnt * sizeof(long));
+ if (addrs == NULL) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ while (fgets(buf, sizeof(buf), f)) {
+ if (is_invalid_entry(buf, kernel))
+ continue;
+
+ free(name);
+ if (sscanf(buf, "%p %ms$*[^\n]\n", &addr, &name) != 2)
+ continue;
+ if (skip_entry(name))
+ continue;
+
+ if (cnt == max_cnt) {
+ max_cnt += inc_cnt;
+ tmp_addrs = realloc(addrs, max_cnt);
+ if (!tmp_addrs) {
+ err = -ENOMEM;
+ goto error;
+ }
+ addrs = tmp_addrs;
+ }
+
+ addrs[cnt++] = (unsigned long)addr;
+ }
+
+ *addrsp = addrs;
+ *cntp = cnt;
+
+error:
+ free(name);
+ fclose(f);
+ if (err)
+ free(addrs);
+ return err;
+}
+
int compare_map_keys(int map1_fd, int map2_fd)
{
__u32 key, next_key;
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 870694f2a359..477a785c8b5f 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -468,6 +468,8 @@ int trigger_module_test_write(int write_sz);
int write_sysctl(const char *sysctl, const char *value);
int get_bpf_max_tramp_links_from(struct btf *btf);
int get_bpf_max_tramp_links(void);
+int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel);
+int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel);
struct netns_obj;
struct netns_obj *netns_new(const char *name, bool open);
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 24/25] selftests/bpf: add testcases for multi-link of tracing
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (22 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 23/25] selftests/bpf: add get_ksyms and get_addrs to test_progs.c Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 25/25] selftests/bpf: add performance bench test for trace prog Menglong Dong
` (2 subsequent siblings)
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
In this commit, we add some testcases for the following attach types:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI
We reuse the tests in fentry_test.c, fexit_test.c and modify_return.c by
attaching the tracing BPF progs as tracing_multi, as sketched below.
We also add some functions that tracing progs need to skip to
bpf_get_ksyms() in this commit.
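The reuse works by converting each prog's expected attach type before
load; the helper added to test_progs.c boils down to something like this
(a sketch of the idea, not the exact implementation; the *_MULTI mapping
is the important part):

/* progs points at the skeleton's progs struct, treated as an array of
 * struct bpf_program pointers; must run after __open() and before __load()
 */
int bpf_to_tracing_multi(struct bpf_program **progs, int prog_cnt)
{
	for (int i = 0; i < prog_cnt; i++) {
		struct bpf_program *prog = progs[i];

		switch (bpf_program__expected_attach_type(prog)) {
		case BPF_TRACE_FENTRY:
			bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY_MULTI);
			break;
		case BPF_TRACE_FEXIT:
			bpf_program__set_expected_attach_type(prog, BPF_TRACE_FEXIT_MULTI);
			break;
		case BPF_MODIFY_RETURN:
			bpf_program__set_expected_attach_type(prog, BPF_MODIFY_RETURN_MULTI);
			break;
		default:
			break;
		}
	}
	return 0;
}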
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/fentry_fexit.c | 22 +-
.../selftests/bpf/prog_tests/fentry_test.c | 79 +++--
.../selftests/bpf/prog_tests/fexit_test.c | 79 +++--
.../selftests/bpf/prog_tests/modify_return.c | 60 ++++
.../bpf/prog_tests/tracing_multi_link.c | 276 ++++++++++++++++++
.../selftests/bpf/progs/fentry_multi_empty.c | 13 +
.../bpf/progs/tracing_multi_override.c | 28 ++
.../selftests/bpf/progs/tracing_multi_test.c | 181 ++++++++++++
.../selftests/bpf/test_kmods/bpf_testmod.c | 24 ++
tools/testing/selftests/bpf/test_progs.c | 112 +++++++
tools/testing/selftests/bpf/test_progs.h | 3 +
12 files changed, 831 insertions(+), 48 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_empty.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_override.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index cf5ed3bee573..93cacb56591e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -496,7 +496,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \
test_subskeleton.skel.h test_subskeleton_lib.skel.h \
test_usdt.skel.h
-LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \
+LSKELS := fexit_sleep.c atomics.c \
trace_printk.c trace_vprintk.c map_ptr_kern.c \
core_kern.c core_kern_overflow.c test_ringbuf.c \
test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c
diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
index 130f5b82d2e6..84cc8b669684 100644
--- a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
@@ -1,32 +1,32 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
-#include "fentry_test.lskel.h"
-#include "fexit_test.lskel.h"
+#include "fentry_test.skel.h"
+#include "fexit_test.skel.h"
void test_fentry_fexit(void)
{
- struct fentry_test_lskel *fentry_skel = NULL;
- struct fexit_test_lskel *fexit_skel = NULL;
+ struct fentry_test *fentry_skel = NULL;
+ struct fexit_test *fexit_skel = NULL;
__u64 *fentry_res, *fexit_res;
int err, prog_fd, i;
LIBBPF_OPTS(bpf_test_run_opts, topts);
- fentry_skel = fentry_test_lskel__open_and_load();
+ fentry_skel = fentry_test__open_and_load();
if (!ASSERT_OK_PTR(fentry_skel, "fentry_skel_load"))
goto close_prog;
- fexit_skel = fexit_test_lskel__open_and_load();
+ fexit_skel = fexit_test__open_and_load();
if (!ASSERT_OK_PTR(fexit_skel, "fexit_skel_load"))
goto close_prog;
- err = fentry_test_lskel__attach(fentry_skel);
+ err = fentry_test__attach(fentry_skel);
if (!ASSERT_OK(err, "fentry_attach"))
goto close_prog;
- err = fexit_test_lskel__attach(fexit_skel);
+ err = fexit_test__attach(fexit_skel);
if (!ASSERT_OK(err, "fexit_attach"))
goto close_prog;
- prog_fd = fexit_skel->progs.test1.prog_fd;
+ prog_fd = bpf_program__fd(fexit_skel->progs.test1);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "ipv6 test_run");
ASSERT_OK(topts.retval, "ipv6 test retval");
@@ -40,6 +40,6 @@ void test_fentry_fexit(void)
}
close_prog:
- fentry_test_lskel__destroy(fentry_skel);
- fexit_test_lskel__destroy(fexit_skel);
+ fentry_test__destroy(fentry_skel);
+ fexit_test__destroy(fexit_skel);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_test.c b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
index aee1bc77a17f..9edd383feabd 100644
--- a/tools/testing/selftests/bpf/prog_tests/fentry_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
@@ -1,26 +1,16 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
-#include "fentry_test.lskel.h"
+#include "fentry_test.skel.h"
#include "fentry_many_args.skel.h"
-static int fentry_test_common(struct fentry_test_lskel *fentry_skel)
+static int fentry_test_check(struct fentry_test *fentry_skel)
{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd, i;
- int link_fd;
__u64 *result;
- LIBBPF_OPTS(bpf_test_run_opts, topts);
-
- err = fentry_test_lskel__attach(fentry_skel);
- if (!ASSERT_OK(err, "fentry_attach"))
- return err;
- /* Check that already linked program can't be attached again. */
- link_fd = fentry_test_lskel__test1__attach(fentry_skel);
- if (!ASSERT_LT(link_fd, 0, "fentry_attach_link"))
- return -1;
-
- prog_fd = fentry_skel->progs.test1.prog_fd;
+ prog_fd = bpf_program__fd(fentry_skel->progs.test1);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, 0, "test_run");
@@ -31,7 +21,28 @@ static int fentry_test_common(struct fentry_test_lskel *fentry_skel)
return -1;
}
- fentry_test_lskel__detach(fentry_skel);
+ return 0;
+}
+
+static int fentry_test_common(struct fentry_test *fentry_skel)
+{
+ struct bpf_link *link;
+ int err;
+
+ err = fentry_test__attach(fentry_skel);
+ if (!ASSERT_OK(err, "fentry_attach"))
+ return err;
+
+ /* Check that already linked program can't be attached again. */
+ link = bpf_program__attach(fentry_skel->progs.test1);
+ if (!ASSERT_ERR_PTR(link, "fentry_attach_link"))
+ return -1;
+
+ err = fentry_test_check(fentry_skel);
+ if (!ASSERT_OK(err, "fentry_test_check"))
+ return err;
+
+ fentry_test__detach(fentry_skel);
/* zero results for re-attach test */
memset(fentry_skel->bss, 0, sizeof(*fentry_skel->bss));
@@ -40,10 +51,10 @@ static int fentry_test_common(struct fentry_test_lskel *fentry_skel)
static void fentry_test(void)
{
- struct fentry_test_lskel *fentry_skel = NULL;
+ struct fentry_test *fentry_skel = NULL;
int err;
- fentry_skel = fentry_test_lskel__open_and_load();
+ fentry_skel = fentry_test__open_and_load();
if (!ASSERT_OK_PTR(fentry_skel, "fentry_skel_load"))
goto cleanup;
@@ -55,7 +66,7 @@ static void fentry_test(void)
ASSERT_OK(err, "fentry_second_attach");
cleanup:
- fentry_test_lskel__destroy(fentry_skel);
+ fentry_test__destroy(fentry_skel);
}
static void fentry_many_args(void)
@@ -84,10 +95,42 @@ static void fentry_many_args(void)
fentry_many_args__destroy(fentry_skel);
}
+static void fentry_multi_test(void)
+{
+ struct fentry_test *fentry_skel = NULL;
+ int err, prog_cnt;
+
+ fentry_skel = fentry_test__open();
+ if (!ASSERT_OK_PTR(fentry_skel, "fentry_skel_open"))
+ goto cleanup;
+
+ prog_cnt = sizeof(fentry_skel->progs) / sizeof(long);
+ err = bpf_to_tracing_multi((void *)&fentry_skel->progs, prog_cnt);
+ if (!ASSERT_OK(err, "fentry_to_multi"))
+ goto cleanup;
+
+ err = fentry_test__load(fentry_skel);
+ if (!ASSERT_OK(err, "fentry_skel_load"))
+ goto cleanup;
+
+ err = bpf_attach_as_tracing_multi((void *)&fentry_skel->progs,
+ prog_cnt,
+ (void *)&fentry_skel->links);
+ if (!ASSERT_OK(err, "fentry_attach_multi"))
+ goto cleanup;
+
+ err = fentry_test_check(fentry_skel);
+ ASSERT_OK(err, "fentry_first_attach");
+cleanup:
+ fentry_test__destroy(fentry_skel);
+}
+
void test_fentry_test(void)
{
if (test__start_subtest("fentry"))
fentry_test();
+ if (test__start_subtest("fentry_multi"))
+ fentry_multi_test();
if (test__start_subtest("fentry_many_args"))
fentry_many_args();
}
diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_test.c b/tools/testing/selftests/bpf/prog_tests/fexit_test.c
index 1c13007e37dd..5652d02b3ad9 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_test.c
@@ -1,26 +1,16 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
-#include "fexit_test.lskel.h"
+#include "fexit_test.skel.h"
#include "fexit_many_args.skel.h"
-static int fexit_test_common(struct fexit_test_lskel *fexit_skel)
+static int fexit_test_check(struct fexit_test *fexit_skel)
{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd, i;
- int link_fd;
__u64 *result;
- LIBBPF_OPTS(bpf_test_run_opts, topts);
-
- err = fexit_test_lskel__attach(fexit_skel);
- if (!ASSERT_OK(err, "fexit_attach"))
- return err;
- /* Check that already linked program can't be attached again. */
- link_fd = fexit_test_lskel__test1__attach(fexit_skel);
- if (!ASSERT_LT(link_fd, 0, "fexit_attach_link"))
- return -1;
-
- prog_fd = fexit_skel->progs.test1.prog_fd;
+ prog_fd = bpf_program__fd(fexit_skel->progs.test1);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, 0, "test_run");
@@ -31,7 +21,28 @@ static int fexit_test_common(struct fexit_test_lskel *fexit_skel)
return -1;
}
- fexit_test_lskel__detach(fexit_skel);
+ return 0;
+}
+
+static int fexit_test_common(struct fexit_test *fexit_skel)
+{
+ struct bpf_link *link;
+ int err;
+
+ err = fexit_test__attach(fexit_skel);
+ if (!ASSERT_OK(err, "fexit_attach"))
+ return err;
+
+ /* Check that already linked program can't be attached again. */
+ link = bpf_program__attach(fexit_skel->progs.test1);
+ if (!ASSERT_ERR_PTR(link, "fexit_attach_link"))
+ return -1;
+
+ err = fexit_test_check(fexit_skel);
+ if (!ASSERT_OK(err, "fexit_test_check"))
+ return err;
+
+ fexit_test__detach(fexit_skel);
/* zero results for re-attach test */
memset(fexit_skel->bss, 0, sizeof(*fexit_skel->bss));
@@ -40,10 +51,10 @@ static int fexit_test_common(struct fexit_test_lskel *fexit_skel)
static void fexit_test(void)
{
- struct fexit_test_lskel *fexit_skel = NULL;
+ struct fexit_test *fexit_skel = NULL;
int err;
- fexit_skel = fexit_test_lskel__open_and_load();
+ fexit_skel = fexit_test__open_and_load();
if (!ASSERT_OK_PTR(fexit_skel, "fexit_skel_load"))
goto cleanup;
@@ -55,7 +66,7 @@ static void fexit_test(void)
ASSERT_OK(err, "fexit_second_attach");
cleanup:
- fexit_test_lskel__destroy(fexit_skel);
+ fexit_test__destroy(fexit_skel);
}
static void fexit_many_args(void)
@@ -84,10 +95,42 @@ static void fexit_many_args(void)
fexit_many_args__destroy(fexit_skel);
}
+static void fexit_test_multi(void)
+{
+ struct fexit_test *fexit_skel = NULL;
+ int err, prog_cnt;
+
+ fexit_skel = fexit_test__open();
+ if (!ASSERT_OK_PTR(fexit_skel, "fexit_skel_open"))
+ goto cleanup;
+
+ prog_cnt = sizeof(fexit_skel->progs) / sizeof(long);
+ err = bpf_to_tracing_multi((void *)&fexit_skel->progs, prog_cnt);
+ if (!ASSERT_OK(err, "fexit_to_multi"))
+ goto cleanup;
+
+ err = fexit_test__load(fexit_skel);
+ if (!ASSERT_OK(err, "fexit_skel_load"))
+ goto cleanup;
+
+ err = bpf_attach_as_tracing_multi((void *)&fexit_skel->progs,
+ prog_cnt,
+ (void *)&fexit_skel->links);
+ if (!ASSERT_OK(err, "fexit_attach_multi"))
+ goto cleanup;
+
+ err = fexit_test_check(fexit_skel);
+ ASSERT_OK(err, "fexit_first_attach");
+cleanup:
+ fexit_test__destroy(fexit_skel);
+}
+
void test_fexit_test(void)
{
if (test__start_subtest("fexit"))
fexit_test();
+ if (test__start_subtest("fexit_multi"))
+ fexit_test_multi();
if (test__start_subtest("fexit_many_args"))
fexit_many_args();
}
diff --git a/tools/testing/selftests/bpf/prog_tests/modify_return.c b/tools/testing/selftests/bpf/prog_tests/modify_return.c
index a70c99c2f8c8..3ca454379e90 100644
--- a/tools/testing/selftests/bpf/prog_tests/modify_return.c
+++ b/tools/testing/selftests/bpf/prog_tests/modify_return.c
@@ -49,6 +49,56 @@ static void run_test(__u32 input_retval, __u16 want_side_effect, __s16 want_ret)
modify_return__destroy(skel);
}
+static void run_multi_test(__u32 input_retval, __u16 want_side_effect, __s16 want_ret)
+{
+ struct modify_return *skel = NULL;
+ int err, prog_fd, prog_cnt;
+ __u16 side_effect;
+ __s16 ret;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = modify_return__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ /* stack function args is not supported by tracing multi-link yet,
+ * so we only enable the bpf progs without stack function args.
+ */
+ bpf_program__set_expected_attach_type(skel->progs.fentry_test,
+ BPF_TRACE_FENTRY_MULTI);
+ bpf_program__set_expected_attach_type(skel->progs.fexit_test,
+ BPF_TRACE_FEXIT_MULTI);
+ bpf_program__set_expected_attach_type(skel->progs.fmod_ret_test,
+ BPF_MODIFY_RETURN_MULTI);
+
+ err = modify_return__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ prog_cnt = sizeof(skel->progs) / sizeof(long);
+ err = bpf_attach_as_tracing_multi((void *)&skel->progs,
+ prog_cnt,
+ (void *)&skel->links);
+ if (!ASSERT_OK(err, "modify_return__attach failed"))
+ goto cleanup;
+
+ skel->bss->input_retval = input_retval;
+ prog_fd = bpf_program__fd(skel->progs.fmod_ret_test);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+
+ side_effect = UPPER(topts.retval);
+ ret = LOWER(topts.retval);
+
+ ASSERT_EQ(ret, want_ret, "test_run ret");
+ ASSERT_EQ(side_effect, want_side_effect, "modify_return side_effect");
+ ASSERT_EQ(skel->bss->fentry_result, 1, "modify_return fentry_result");
+ ASSERT_EQ(skel->bss->fexit_result, 1, "modify_return fexit_result");
+ ASSERT_EQ(skel->bss->fmod_ret_result, 1, "modify_return fmod_ret_result");
+cleanup:
+ modify_return__destroy(skel);
+}
+
/* TODO: conflict with get_func_ip_test */
void serial_test_modify_return(void)
{
@@ -59,3 +109,13 @@ void serial_test_modify_return(void)
0 /* want_side_effect */,
-EINVAL * 2 /* want_ret */);
}
+
+void serial_test_modify_return_multi(void)
+{
+ run_multi_test(0 /* input_retval */,
+ 2 /* want_side_effect */,
+ 33 /* want_ret */);
+ run_multi_test(-EINVAL /* input_retval */,
+ 1 /* want_side_effect */,
+ -EINVAL + 29 /* want_ret */);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c b/tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
new file mode 100644
index 000000000000..f730e26be911
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+
+#include <test_progs.h>
+#include "bpf/libbpf_internal.h"
+
+#include "tracing_multi_test.skel.h"
+#include "tracing_multi_override.skel.h"
+#include "fentry_multi_empty.skel.h"
+
+static void test_run(struct tracing_multi_test *skel)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ int err, prog_fd;
+
+ skel->bss->pid = getpid();
+ prog_fd = bpf_program__fd(skel->progs.fentry_cookie_test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->fentry_test1_result, 1, "fentry_test1_result");
+ ASSERT_EQ(skel->bss->fentry_test2_result, 1, "fentry_test2_result");
+ ASSERT_EQ(skel->bss->fentry_test3_result, 1, "fentry_test3_result");
+ ASSERT_EQ(skel->bss->fentry_test4_result, 1, "fentry_test4_result");
+ ASSERT_EQ(skel->bss->fentry_test5_result, 1, "fentry_test5_result");
+ ASSERT_EQ(skel->bss->fentry_test6_result, 1, "fentry_test6_result");
+ ASSERT_EQ(skel->bss->fentry_test7_result, 1, "fentry_test7_result");
+ ASSERT_EQ(skel->bss->fentry_test8_result, 1, "fentry_test8_result");
+}
+
+static void test_skel_auto_api(void)
+{
+ struct tracing_multi_test *skel;
+ int err;
+
+ skel = tracing_multi_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "tracing_multi_test__open_and_load"))
+ return;
+
+ /* disable all programs that should fail */
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test1, false);
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test2, false);
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test3, false);
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test4, false);
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test5, false);
+ bpf_program__set_autoattach(skel->progs.fentry_fail_test6, false);
+
+ bpf_program__set_autoattach(skel->progs.fexit_fail_test1, false);
+ bpf_program__set_autoattach(skel->progs.fexit_fail_test2, false);
+ bpf_program__set_autoattach(skel->progs.fexit_fail_test3, false);
+
+ err = tracing_multi_test__attach(skel);
+ if (!ASSERT_OK(err, "tracing_multi_test__attach"))
+ goto cleanup;
+
+ test_run(skel);
+
+cleanup:
+ tracing_multi_test__destroy(skel);
+}
+
+static void test_skel_manual_api(void)
+{
+ struct tracing_multi_test *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = tracing_multi_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "tracing_multi_test__open_and_load"))
+ return;
+
+#define ATTACH_PROG(name, success) \
+do { \
+ link = bpf_program__attach(skel->progs.name); \
+ err = libbpf_get_error(link); \
+ if (!ASSERT_OK(success ? err : !err, \
+ "bpf_program__attach: " #name)) \
+ goto cleanup; \
+ skel->links.name = err ? NULL : link; \
+} while (0)
+
+ ATTACH_PROG(fentry_success_test1, true);
+ ATTACH_PROG(fentry_success_test2, true);
+ ATTACH_PROG(fentry_success_test3, true);
+ ATTACH_PROG(fentry_success_test4, true);
+
+ ATTACH_PROG(fexit_success_test1, true);
+ ATTACH_PROG(fexit_success_test2, true);
+
+ ATTACH_PROG(fentry_fail_test1, false);
+ ATTACH_PROG(fentry_fail_test2, false);
+ ATTACH_PROG(fentry_fail_test3, false);
+ ATTACH_PROG(fentry_fail_test4, false);
+ ATTACH_PROG(fentry_fail_test5, false);
+ ATTACH_PROG(fentry_fail_test6, false);
+
+ ATTACH_PROG(fexit_fail_test1, false);
+ ATTACH_PROG(fexit_fail_test2, false);
+ ATTACH_PROG(fexit_fail_test3, false);
+
+ ATTACH_PROG(fentry_cookie_test1, true);
+
+ test_run(skel);
+
+cleanup:
+ tracing_multi_test__destroy(skel);
+}
+
+static void test_attach_api(void)
+{
+ LIBBPF_OPTS(bpf_trace_multi_opts, opts);
+ struct tracing_multi_test *skel;
+ struct bpf_link *link;
+ const char *syms[8] = {
+ "bpf_fentry_test1",
+ "bpf_fentry_test2",
+ "bpf_fentry_test3",
+ "bpf_fentry_test4",
+ "bpf_fentry_test5",
+ "bpf_fentry_test6",
+ "bpf_fentry_test7",
+ "bpf_fentry_test8",
+ };
+ __u64 cookies[] = {1, 7, 2, 3, 4, 5, 6, 8};
+
+ skel = tracing_multi_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "tracing_multi_test__open_and_load"))
+ return;
+
+ opts.syms = syms;
+ opts.cookies = cookies;
+ opts.cnt = ARRAY_SIZE(syms);
+ link = bpf_program__attach_trace_multi_opts(skel->progs.fentry_cookie_test1,
+ &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_trace_multi_opts"))
+ goto cleanup;
+ skel->links.fentry_cookie_test1 = link;
+
+ skel->bss->test_cookie = true;
+ test_run(skel);
+cleanup:
+ tracing_multi_test__destroy(skel);
+}
+
+static void test_attach_bench(bool kernel)
+{
+ LIBBPF_OPTS(bpf_trace_multi_opts, opts);
+ struct fentry_multi_empty *skel;
+ long attach_start_ns, attach_end_ns;
+ long detach_start_ns, detach_end_ns;
+ double attach_delta, detach_delta;
+ struct bpf_link *link = NULL;
+ char **syms = NULL;
+ size_t cnt = 0;
+
+ if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, kernel), "get_syms"))
+ return;
+
+ skel = fentry_multi_empty__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_multi_empty__open_and_load"))
+ goto cleanup;
+
+ opts.syms = (const char **) syms;
+ opts.cnt = cnt;
+ opts.skip_invalid = true;
+
+ attach_start_ns = get_time_ns();
+ link = bpf_program__attach_trace_multi_opts(skel->progs.fentry_multi_empty,
+ &opts);
+ attach_end_ns = get_time_ns();
+
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_trace_multi_opts"))
+ goto cleanup;
+
+ detach_start_ns = get_time_ns();
+ bpf_link__destroy(link);
+ detach_end_ns = get_time_ns();
+
+ attach_delta = (attach_end_ns - attach_start_ns) / 1000000000.0;
+ detach_delta = (detach_end_ns - detach_start_ns) / 1000000000.0;
+
+ printf("%s: found %lu functions\n", __func__, opts.cnt);
+ printf("%s: attached in %7.3lfs\n", __func__, attach_delta);
+ printf("%s: detached in %7.3lfs\n", __func__, detach_delta);
+
+cleanup:
+ fentry_multi_empty__destroy(skel);
+ if (syms)
+ free(syms);
+}
+
+static void test_attach_override(bool fentry_over_multi)
+{
+ struct tracing_multi_override *skel;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ struct bpf_link *link;
+ int err, prog_fd;
+
+ skel = tracing_multi_override__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "tracing_multi_override__open_and_load"))
+ goto cleanup;
+
+ if (fentry_over_multi) {
+ ATTACH_PROG(fentry_multi_override_test1, true);
+ ATTACH_PROG(fentry_override_test1, true);
+ } else {
+ ATTACH_PROG(fentry_override_test1, true);
+ ATTACH_PROG(fentry_multi_override_test1, true);
+ }
+
+ prog_fd = bpf_program__fd(skel->progs.fentry_multi_override_test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ ASSERT_EQ(skel->data->fentry_override_test1_result, 3,
+ "fentry_override_test1_result");
+cleanup:
+ tracing_multi_override__destroy(skel);
+}
+
+static void test_attach_multi(void)
+{
+ struct tracing_multi_override *skel;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ struct bpf_link *link;
+ int err, prog_fd;
+
+ skel = tracing_multi_override__open();
+ if (!ASSERT_OK_PTR(skel, "tracing_multi_override__open"))
+ goto cleanup;
+
+ /* don't load fentry_override_test1, it will create a trampoline */
+ bpf_program__set_autoload(skel->progs.fentry_override_test1, false);
+ err = tracing_multi_override__load(skel);
+ if (!ASSERT_OK(err, "tracing_multi_override__load"))
+ goto cleanup;
+
+ ATTACH_PROG(fentry_multi_override_test1, true);
+ ATTACH_PROG(fentry_multi_override_test2, true);
+
+ prog_fd = bpf_program__fd(skel->progs.fentry_multi_override_test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ ASSERT_EQ(skel->data->fentry_override_test1_result, 4,
+ "fentry_override_test1_result");
+cleanup:
+ tracing_multi_override__destroy(skel);
+}
+
+void serial_test_tracing_multi_attach_bench(void)
+{
+ if (test__start_subtest("kernel"))
+ test_attach_bench(true);
+ if (test__start_subtest("modules"))
+ test_attach_bench(false);
+}
+
+void test_tracing_multi_attach_test(void)
+{
+ if (test__start_subtest("skel_auto_api"))
+ test_skel_auto_api();
+ if (test__start_subtest("skel_manual_api"))
+ test_skel_manual_api();
+ if (test__start_subtest("attach_api"))
+ test_attach_api();
+ if (test__start_subtest("attach_over_multi"))
+ test_attach_override(true);
+ if (test__start_subtest("attach_over_fentry"))
+ test_attach_override(false);
+ if (test__start_subtest("attach_multi"))
+ test_attach_multi();
+}
diff --git a/tools/testing/selftests/bpf/progs/fentry_multi_empty.c b/tools/testing/selftests/bpf/progs/fentry_multi_empty.c
new file mode 100644
index 000000000000..a09ba216dff8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fentry_multi_empty.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry.multi/bpf_fentry_test1")
+int BPF_PROG(fentry_multi_empty)
+{
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/tracing_multi_override.c b/tools/testing/selftests/bpf/progs/tracing_multi_override.c
new file mode 100644
index 000000000000..8001be433914
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tracing_multi_override.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+__u64 fentry_override_test1_result = 1;
+
+SEC("fentry.multi/bpf_fentry_test1")
+int BPF_PROG(fentry_multi_override_test1)
+{
+ fentry_override_test1_result++;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_fentry_test1")
+int BPF_PROG(fentry_multi_override_test2)
+{
+ fentry_override_test1_result <<= 1;
+ return 0;
+}
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(fentry_override_test1)
+{
+ fentry_override_test1_result++;
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/tracing_multi_test.c b/tools/testing/selftests/bpf/progs/tracing_multi_test.c
new file mode 100644
index 000000000000..fa27851896b9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tracing_multi_test.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_testmod_struct_arg_1 {
+ int a;
+};
+struct bpf_testmod_struct_arg_2 {
+ long a;
+ long b;
+};
+
+__u64 test_result = 0;
+
+int pid = 0;
+int test_cookie = 0;
+
+__u64 fentry_test1_result = 0;
+__u64 fentry_test2_result = 0;
+__u64 fentry_test3_result = 0;
+__u64 fentry_test4_result = 0;
+__u64 fentry_test5_result = 0;
+__u64 fentry_test6_result = 0;
+__u64 fentry_test7_result = 0;
+__u64 fentry_test8_result = 0;
+
+extern const void bpf_fentry_test1 __ksym;
+extern const void bpf_fentry_test2 __ksym;
+extern const void bpf_fentry_test3 __ksym;
+extern const void bpf_fentry_test4 __ksym;
+extern const void bpf_fentry_test5 __ksym;
+extern const void bpf_fentry_test6 __ksym;
+extern const void bpf_fentry_test7 __ksym;
+extern const void bpf_fentry_test8 __ksym;
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_13")
+int BPF_PROG2(fentry_success_test1, struct bpf_testmod_struct_arg_2, a)
+{
+ test_result = a.a + a.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_10")
+int BPF_PROG2(fentry_success_test2, int, a, struct bpf_testmod_struct_arg_2, b)
+{
+ test_result = a + b.a + b.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_4")
+int BPF_PROG2(fentry_success_test3, struct bpf_testmod_struct_arg_2, a, int, b,
+ int, c)
+{
+ test_result = c;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_2")
+int BPF_PROG2(fentry_success_test4, struct bpf_testmod_struct_arg_2, a, int, b,
+ int, c)
+{
+ test_result = c;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_1")
+int BPF_PROG2(fentry_fail_test1, struct bpf_testmod_struct_arg_2, a)
+{
+ test_result = a.a + a.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_2")
+int BPF_PROG2(fentry_fail_test2, struct bpf_testmod_struct_arg_2, a)
+{
+ test_result = a.a + a.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_4")
+int BPF_PROG2(fentry_fail_test3, struct bpf_testmod_struct_arg_2, a)
+{
+ test_result = a.a + a.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_2")
+int BPF_PROG2(fentry_fail_test4, int, a, struct bpf_testmod_struct_arg_2, b)
+{
+ test_result = a + b.a + b.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_13")
+int BPF_PROG2(fentry_fail_test5, int, a, struct bpf_testmod_struct_arg_2, b)
+{
+ test_result = a + b.a + b.b;
+ return 0;
+}
+
+SEC("fentry.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_12")
+int BPF_PROG2(fentry_fail_test6, struct bpf_testmod_struct_arg_2, a, int, b,
+ int, c)
+{
+ test_result = c;
+ return 0;
+}
+
+SEC("fexit.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_3")
+int BPF_PROG2(fexit_success_test1, struct bpf_testmod_struct_arg_2, a, int, b,
+ int, c, int, retval)
+{
+ test_result = retval;
+ return 0;
+}
+
+SEC("fexit.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_12")
+int BPF_PROG2(fexit_success_test2, int, a, struct bpf_testmod_struct_arg_2, b,
+ int, c, int, retval)
+{
+ test_result = a + b.a + b.b + retval;
+ return 0;
+}
+
+SEC("fexit.multi/bpf_testmod_test_struct_arg_1,bpf_testmod_test_struct_arg_4")
+int BPF_PROG2(fexit_fail_test1, struct bpf_testmod_struct_arg_2, a, int, b,
+ int, c, int, retval)
+{
+ test_result = retval;
+ return 0;
+}
+
+SEC("fexit.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_10")
+int BPF_PROG2(fexit_fail_test2, int, a, struct bpf_testmod_struct_arg_2, b,
+ int, c, int, retval)
+{
+ test_result = a + b.a + b.b + retval;
+ return 0;
+}
+
+SEC("fexit.multi/bpf_testmod_test_struct_arg_2,bpf_testmod_test_struct_arg_11")
+int BPF_PROG2(fexit_fail_test3, int, a, struct bpf_testmod_struct_arg_2, b,
+ int, c, int, retval)
+{
+ test_result = a + b.a + b.b + retval;
+ return 0;
+}
+
+static void tracing_multi_check_cookie(unsigned long long *ctx)
+{
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return;
+
+ __u64 cookie = test_cookie ? bpf_get_attach_cookie(ctx) : 0;
+ __u64 addr = bpf_get_func_ip(ctx);
+
+#define SET(__var, __addr, __cookie) ({ \
+ if (((const void *) addr == __addr) && \
+ (!test_cookie || (cookie == __cookie))) \
+ __var = 1; \
+})
+ SET(fentry_test1_result, &bpf_fentry_test1, 1);
+ SET(fentry_test2_result, &bpf_fentry_test2, 7);
+ SET(fentry_test3_result, &bpf_fentry_test3, 2);
+ SET(fentry_test4_result, &bpf_fentry_test4, 3);
+ SET(fentry_test5_result, &bpf_fentry_test5, 4);
+ SET(fentry_test6_result, &bpf_fentry_test6, 5);
+ SET(fentry_test7_result, &bpf_fentry_test7, 6);
+ SET(fentry_test8_result, &bpf_fentry_test8, 8);
+}
+
+SEC("fentry.multi/bpf_fentry_test1,bpf_fentry_test2,bpf_fentry_test3,bpf_fentry_test4,bpf_fentry_test5,bpf_fentry_test6,bpf_fentry_test7,bpf_fentry_test8")
+int BPF_PROG(fentry_cookie_test1)
+{
+ tracing_multi_check_cookie(ctx);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index 2e54b95ad898..ebc4d5204136 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -128,6 +128,30 @@ bpf_testmod_test_struct_arg_9(u64 a, void *b, short c, int d, void *e, char f,
return bpf_testmod_test_struct_arg_result;
}
+noinline int
+bpf_testmod_test_struct_arg_10(int a, struct bpf_testmod_struct_arg_2 b) {
+ bpf_testmod_test_struct_arg_result = a + b.a + b.b;
+ return bpf_testmod_test_struct_arg_result;
+}
+
+noinline struct bpf_testmod_struct_arg_2 *
+bpf_testmod_test_struct_arg_11(int a, struct bpf_testmod_struct_arg_2 b, int c) {
+ bpf_testmod_test_struct_arg_result = a + b.a + b.b + c;
+ return (void *)bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_12(int a, struct bpf_testmod_struct_arg_2 b, int *c) {
+ bpf_testmod_test_struct_arg_result = a + b.a + b.b + *c;
+ return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_13(struct bpf_testmod_struct_arg_2 b) {
+ bpf_testmod_test_struct_arg_result = b.a + b.b;
+ return bpf_testmod_test_struct_arg_result;
+}
+
noinline int
bpf_testmod_test_arg_ptr_to_struct(struct bpf_testmod_struct_arg_1 *a) {
bpf_testmod_test_struct_arg_result = a->a;
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 26cc50bbed8b..286a30c1c7ae 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -685,6 +685,68 @@ static bool skip_entry(char *name)
if (!strncmp(name, "__ftrace_invalid_address__",
sizeof("__ftrace_invalid_address__") - 1))
return true;
+
+ /* skip functions in "btf_id_deny" */
+ if (!strcmp(name, "migrate_disable"))
+ return true;
+ if (!strcmp(name, "migrate_enable"))
+ return true;
+ if (!strcmp(name, "rcu_read_unlock_strict"))
+ return true;
+ if (!strcmp(name, "preempt_count_add"))
+ return true;
+ if (!strcmp(name, "preempt_count_sub"))
+ return true;
+ if (!strcmp(name, "__rcu_read_lock"))
+ return true;
+ if (!strcmp(name, "__rcu_read_unlock"))
+ return true;
+
+ /* The following symbols have multiple definitions in kallsyms. Take
+ * "t_next" for example:
+ *
+ * ffffffff813c10d0 t t_next
+ * ffffffff813d31b0 t t_next
+ * ffffffff813e06b0 t t_next
+ * ffffffff813eb360 t t_next
+ * ffffffff81613360 t t_next
+ *
+ * but only one of them has a corresponding mrecord:
+ * ffffffff81613364 t_next
+ *
+ * The kernel looks up the target function address by the symbol
+ * name "t_next" with kallsyms_lookup_name() during attaching,
+ * and the function at 0xffffffff813c10d0 can be matched, which
+ * doesn't have a corresponding mrecord. This makes the attach
+ * fail. Skip functions like this.
+ *
+ * The list may not be complete, so attaching can still fail.
+ */
+ if (!strcmp(name, "kill_pid_usb_asyncio"))
+ return true;
+ if (!strcmp(name, "t_next"))
+ return true;
+ if (!strcmp(name, "t_stop"))
+ return true;
+ if (!strcmp(name, "t_start"))
+ return true;
+ if (!strcmp(name, "p_next"))
+ return true;
+ if (!strcmp(name, "p_stop"))
+ return true;
+ if (!strcmp(name, "p_start"))
+ return true;
+ if (!strcmp(name, "mem32_serial_out"))
+ return true;
+ if (!strcmp(name, "mem32_serial_in"))
+ return true;
+ if (!strcmp(name, "io_serial_in"))
+ return true;
+ if (!strcmp(name, "io_serial_out"))
+ return true;
+ if (!strcmp(name, "event_callback"))
+ return true;
+
return false;
}
@@ -860,6 +922,56 @@ int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel)
return err;
}
+int bpf_to_tracing_multi(struct bpf_program **progs, int prog_cnt)
+{
+ enum bpf_attach_type type;
+ int i, err;
+
+ for (i = 0; i < prog_cnt; i++) {
+ type = bpf_program__get_expected_attach_type(progs[i]);
+ if (type == BPF_TRACE_FENTRY)
+ type = BPF_TRACE_FENTRY_MULTI;
+ else if (type == BPF_TRACE_FEXIT)
+ type = BPF_TRACE_FEXIT_MULTI;
+ else if (type == BPF_MODIFY_RETURN)
+ type = BPF_MODIFY_RETURN_MULTI;
+ else
+ continue;
+ err = bpf_program__set_expected_attach_type(progs[i], type);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+int bpf_attach_as_tracing_multi(struct bpf_program **progs, int prog_cnt,
+ struct bpf_link **link)
+{
+ struct bpf_link *__link;
+ int err, type;
+
+ for (int i = 0; i < prog_cnt; i++) {
+ LIBBPF_OPTS(bpf_trace_multi_opts, opts);
+
+ type = bpf_program__get_expected_attach_type(progs[i]);
+ if (type != BPF_TRACE_FENTRY_MULTI &&
+ type != BPF_TRACE_FEXIT_MULTI &&
+ type != BPF_MODIFY_RETURN_MULTI)
+ continue;
+
+ opts.attach_tracing = true;
+ __link = bpf_program__attach_trace_multi_opts(progs[i], &opts);
+ err = libbpf_get_error(__link);
+ if (err)
+ return err;
+
+ link[i] = __link;
+ }
+
+ return 0;
+}
+
int compare_map_keys(int map1_fd, int map2_fd)
{
__u32 key, next_key;
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 477a785c8b5f..d0878bf605af 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -470,6 +470,9 @@ int get_bpf_max_tramp_links_from(struct btf *btf);
int get_bpf_max_tramp_links(void);
int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel);
int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel);
+int bpf_to_tracing_multi(struct bpf_program **progs, int prog_cnt);
+int bpf_attach_as_tracing_multi(struct bpf_program **progs, int prog_cnt,
+ struct bpf_link **link);
struct netns_obj;
struct netns_obj *netns_new(const char *name, bool open);
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH bpf-next 25/25] selftests/bpf: add performance bench test for trace prog
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (23 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 24/25] selftests/bpf: add testcases for multi-link of tracing Menglong Dong
@ 2025-05-28 3:47 ` Menglong Dong
2025-05-28 13:51 ` [PATCH bpf-next 00/25] bpf: tracing multi-link support Steven Rostedt
2025-06-11 3:32 ` Alexei Starovoitov
26 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-28 3:47 UTC (permalink / raw)
To: alexei.starovoitov, rostedt, jolsa; +Cc: bpf, Menglong Dong, linux-kernel
Add a testcase for the performance of trace bpf progs. In this testcase,
bpf_fentry_test1() is called 10000000 times in bpf_testmod_bench_run(),
and the time consumed is returned. The following cases are considered:
- nop: nothing is attached to bpf_fentry_test1()
- fentry: an empty FENTRY bpf program is attached to bpf_fentry_test1()
- fentry_multi_single: an empty FENTRY_MULTI bpf program is attached to
bpf_fentry_test1()
- fentry_multi_all: an empty FENTRY_MULTI bpf program is attached to all
the kernel functions
- kprobe_multi_single: an empty KPROBE_MULTI bpf program is attached to
bpf_fentry_test1()
- kprobe_multi_all: an empty KPROBE_MULTI bpf program is attached to all
the kernel functions
We can get the results by running:
./test_progs -t trace_bench -v | grep time
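For illustration, the output then looks like the following (the numbers
below are made up; see the cover letter for real measurements):
  bench time for nop: 9.000ms
  bench time for fentry_single: 160.000ms
  bench time for fentry_multi_all: 190.000ms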
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
.../selftests/bpf/prog_tests/trace_bench.c | 149 ++++++++++++++++++
.../selftests/bpf/progs/fentry_empty.c | 13 ++
.../testing/selftests/bpf/progs/trace_bench.c | 21 +++
.../selftests/bpf/test_kmods/bpf_testmod.c | 16 ++
4 files changed, 199 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/fentry_empty.c
create mode 100644 tools/testing/selftests/bpf/progs/trace_bench.c
diff --git a/tools/testing/selftests/bpf/prog_tests/trace_bench.c b/tools/testing/selftests/bpf/prog_tests/trace_bench.c
new file mode 100644
index 000000000000..673c9acf358c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/trace_bench.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+
+#include <test_progs.h>
+#include "bpf/libbpf_internal.h"
+
+#include "fentry_multi_empty.skel.h"
+#include "fentry_empty.skel.h"
+#include "kprobe_multi_empty.skel.h"
+#include "trace_bench.skel.h"
+
+static void test_bench_run(const char *name)
+{
+ struct trace_bench *skel;
+ __u64 bench_result;
+ int err;
+
+ skel = trace_bench__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "trace_bench__open_and_load"))
+ return;
+
+ err = trace_bench__attach(skel);
+ if (!ASSERT_OK(err, "trace_bench__attach"))
+ goto cleanup;
+
+ ASSERT_OK(trigger_module_test_read(1), "trigger_read");
+
+ bench_result = skel->bss->bench_result / 1000;
+ printf("bench time for %s: %lld.%03lldms\n", name, bench_result / 1000,
+ bench_result % 1000);
+cleanup:
+ trace_bench__destroy(skel);
+}
+
+static void test_fentry_multi(bool load_all, char *name)
+{
+ LIBBPF_OPTS(bpf_trace_multi_opts, opts);
+ struct fentry_multi_empty *skel;
+ char **syms = NULL;
+ struct bpf_link *link;
+ size_t cnt = 0;
+ int err;
+
+ skel = fentry_multi_empty__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_multi_empty__open_and_load"))
+ goto cleanup;
+
+ if (!load_all) {
+ err = fentry_multi_empty__attach(skel);
+ if (!ASSERT_OK(err, "fentry_multi_empty__attach"))
+ goto cleanup;
+ goto do_test;
+ }
+
+ if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, true), "get_syms"))
+ goto cleanup;
+ opts.syms = (const char **) syms;
+ opts.cnt = cnt;
+ opts.skip_invalid = true;
+ link = bpf_program__attach_trace_multi_opts(skel->progs.fentry_multi_empty,
+ &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_trace_multi_opts"))
+ goto cleanup;
+ skel->links.fentry_multi_empty = link;
+ printf("attach %d functions before testings\n", (int)opts.cnt);
+
+do_test:
+ test_bench_run(name);
+cleanup:
+ fentry_multi_empty__destroy(skel);
+ if (syms)
+ free(syms);
+}
+
+static void test_fentry_single(void)
+{
+ struct fentry_empty *skel;
+ int err;
+
+ skel = fentry_empty__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_empty__open_and_load"))
+ return;
+
+ err = fentry_empty__attach(skel);
+ if (!ASSERT_OK(err, "fentry_empty__attach"))
+ goto cleanup;
+
+ test_bench_run("fentry_single");
+cleanup:
+ fentry_empty__destroy(skel);
+}
+
+static void test_kprobe_multi(bool load_all, char *name)
+{
+ LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
+ char *test_func = "bpf_fentry_test1";
+ struct kprobe_multi_empty *skel;
+ struct bpf_link *link;
+ char **syms = NULL;
+ size_t cnt = 0;
+
+ if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, true), "get_syms"))
+ return;
+
+ skel = kprobe_multi_empty__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kprobe_multi_empty__open_and_load"))
+ goto cleanup;
+
+ if (load_all) {
+ opts.syms = (const char **) syms;
+ opts.cnt = cnt;
+ } else {
+ opts.syms = (const char **) &test_func;
+ opts.cnt = 1;
+ }
+ link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe_empty,
+ NULL, &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_kprobe_multi_opts"))
+ goto cleanup;
+ skel->links.test_kprobe_empty = link;
+
+ if (load_all)
+ printf("attach %d functions before testings\n", (int)opts.cnt);
+ test_bench_run(name);
+
+cleanup:
+ kprobe_multi_empty__destroy(skel);
+ if (syms)
+ free(syms);
+}
+
+void test_trace_bench(void)
+{
+ if (test__start_subtest("nop"))
+ test_bench_run("nop");
+
+ if (test__start_subtest("fentry_single"))
+ test_fentry_single();
+
+ if (test__start_subtest("fentry_multi_single"))
+ test_fentry_multi(false, "fentry_multi_single");
+ if (test__start_subtest("fentry_multi_all"))
+ test_fentry_multi(true, "fentry_multi_all");
+
+ if (test__start_subtest("kprobe_multi_single"))
+ test_kprobe_multi(false, "kprobe_multi_single");
+ if (test__start_subtest("kprobe_multi_all"))
+ test_kprobe_multi(true, "kprobe_multi_all");
+}
diff --git a/tools/testing/selftests/bpf/progs/fentry_empty.c b/tools/testing/selftests/bpf/progs/fentry_empty.c
new file mode 100644
index 000000000000..f2bfaf04d56a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fentry_empty.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(fentry_empty)
+{
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/trace_bench.c b/tools/testing/selftests/bpf/progs/trace_bench.c
new file mode 100644
index 000000000000..98373871414a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/trace_bench.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 bench_result;
+
+SEC("fexit.multi/bpf_testmod_bench_run")
+int BPF_PROG(fexit_bench_done)
+{
+ __u64 ret = 0;
+
+ bpf_get_func_ret(ctx, &ret);
+ bench_result = ret;
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index ebc4d5204136..d21775eba211 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -405,6 +405,20 @@ noinline int bpf_testmod_fentry_test11(u64 a, void *b, short c, int d,
return a + (long)b + c + d + (long)e + f + g + h + i + j + k;
}
+extern int bpf_fentry_test1(int a);
+noinline u64 bpf_testmod_bench_run(void)
+{
+ u64 start = ktime_get_boottime_ns();
+ u64 time;
+
+ for (int i = 0; i < 10000000; i++)
+ bpf_fentry_test1(i);
+
+ time = ktime_get_boottime_ns() - start;
+
+ return time;
+}
+
int bpf_testmod_fentry_ok;
noinline ssize_t
@@ -443,6 +457,8 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
(void)trace_bpf_testmod_test_raw_tp_null(NULL);
+ (void)bpf_testmod_bench_run();
+
bpf_testmod_test_struct_ops3();
struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
--
2.39.5
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 14/25] bpf: tracing: add multi-link support
2025-05-28 3:47 ` [PATCH bpf-next 14/25] bpf: tracing: add multi-link support Menglong Dong
@ 2025-05-28 11:34 ` kernel test robot
0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2025-05-28 11:34 UTC (permalink / raw)
To: Menglong Dong, alexei.starovoitov, rostedt, jolsa
Cc: llvm, oe-kbuild-all, bpf, Menglong Dong, linux-kernel
Hi Menglong,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/add-per-function-metadata-storage-support/20250528-115819
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250528034712.138701-15-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next 14/25] bpf: tracing: add multi-link support
config: arm-randconfig-002-20250528 (https://download.01.org/0day-ci/archive/20250528/202505281947.qIShGsJU-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250528/202505281947.qIShGsJU-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505281947.qIShGsJU-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/bpf/syscall.c:3727:2: error: call to undeclared function 'bpf_gtrampoline_unlink_prog'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
3727 | bpf_gtrampoline_unlink_prog(&multi_link->link);
| ^
kernel/bpf/syscall.c:3727:2: note: did you mean 'bpf_trampoline_unlink_prog'?
include/linux/bpf.h:1492:19: note: 'bpf_trampoline_unlink_prog' declared here
1492 | static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
| ^
>> kernel/bpf/syscall.c:3995:8: error: call to undeclared function 'bpf_gtrampoline_link_prog'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
3995 | err = bpf_gtrampoline_link_prog(&link->link);
| ^
kernel/bpf/syscall.c:3995:8: note: did you mean 'bpf_trampoline_link_prog'?
include/linux/bpf.h:1486:19: note: 'bpf_trampoline_link_prog' declared here
1486 | static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
| ^
kernel/bpf/syscall.c:4001:3: error: call to undeclared function 'bpf_gtrampoline_unlink_prog'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
4001 | bpf_gtrampoline_unlink_prog(&link->link);
| ^
3 errors generated.
vim +/bpf_gtrampoline_unlink_prog +3727 kernel/bpf/syscall.c
3721
3722 static void bpf_tracing_multi_link_release(struct bpf_link *link)
3723 {
3724 struct bpf_tracing_multi_link *multi_link =
3725 container_of(link, struct bpf_tracing_multi_link, link.link);
3726
> 3727 bpf_gtrampoline_unlink_prog(&multi_link->link);
3728 __bpf_tracing_multi_link_release(multi_link);
3729 }
3730
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 03/25] arm64: implement per-function metadata storage for arm64
2025-05-28 3:46 ` [PATCH bpf-next 03/25] arm64: implement per-function metadata storage for arm64 Menglong Dong
@ 2025-05-28 12:16 ` kernel test robot
0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2025-05-28 12:16 UTC (permalink / raw)
To: Menglong Dong, alexei.starovoitov, rostedt, jolsa
Cc: oe-kbuild-all, bpf, Menglong Dong, linux-kernel
Hi Menglong,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/add-per-function-metadata-storage-support/20250528-115819
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250528034712.138701-4-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next 03/25] arm64: implement per-function metadata storage for arm64
config: arm64-randconfig-002-20250528 (https://download.01.org/0day-ci/archive/20250528/202505282007.0CscfzXZ-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250528/202505282007.0CscfzXZ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505282007.0CscfzXZ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> aarch64-linux-gcc: error: unrecognized command line option '-fpatchable-function-entry=1,1'
make[3]: *** [scripts/Makefile.build:203: scripts/mod/empty.o] Error 1 shuffle=4239289662
>> aarch64-linux-gcc: error: unrecognized command line option '-fpatchable-function-entry=1,1'
make[3]: *** [scripts/Makefile.build:98: scripts/mod/devicetable-offsets.s] Error 1 shuffle=4239289662
make[3]: Target 'scripts/mod/' not remade because of errors.
make[2]: *** [Makefile:1281: prepare0] Error 2 shuffle=4239289662
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:248: __sub-make] Error 2 shuffle=4239289662
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:248: __sub-make] Error 2 shuffle=4239289662
make: Target 'prepare' not remade because of errors.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct
2025-05-28 3:47 ` [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct Menglong Dong
@ 2025-05-28 12:37 ` kernel test robot
0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2025-05-28 12:37 UTC (permalink / raw)
To: Menglong Dong, alexei.starovoitov, rostedt, jolsa
Cc: oe-kbuild-all, bpf, Menglong Dong, linux-kernel
Hi Menglong,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/add-per-function-metadata-storage-support/20250528-115819
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250528034712.138701-16-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct
config: sparc-randconfig-001-20250528 (https://download.01.org/0day-ci/archive/20250528/202505282037.xt8RiXkG-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 14.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250528/202505282037.xt8RiXkG-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505282037.xt8RiXkG-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> kernel/trace/ftrace.c:116:12: warning: '__unregister_ftrace_direct' declared 'static' but never defined [-Wunused-function]
116 | static int __unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
--
>> kernel/trace/ftrace.c:6053: warning: expecting prototype for unregister_ftrace_direct(). Prototype was for __unregister_ftrace_direct() instead
vim +116 kernel/trace/ftrace.c
114
115 static void ftrace_update_trampoline(struct ftrace_ops *ops);
> 116 static int __unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
117 bool free_filters);
118
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (24 preceding siblings ...)
2025-05-28 3:47 ` [PATCH bpf-next 25/25] selftests/bpf: add performance bench test for trace prog Menglong Dong
@ 2025-05-28 13:51 ` Steven Rostedt
2025-05-29 1:44 ` Menglong Dong
2025-06-11 3:32 ` Alexei Starovoitov
26 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2025-05-28 13:51 UTC (permalink / raw)
To: Menglong Dong; +Cc: alexei.starovoitov, jolsa, bpf, Menglong Dong, linux-kernel
On Wed, 28 May 2025 11:46:47 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:
> After four months, I finally finish the coding and testing of this series.
> This is my first time to write such a complex series, and it's so hard :/
> Anyway, I finished it.
> (I'm scared :/)
>
Note, sending out a complex series like this at the start of the merge
window is not good timing.
Most kernel maintainers will not be able to even look at this until the
merge window is closed (in two weeks).
That includes myself.
-- Steve
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-05-28 13:51 ` [PATCH bpf-next 00/25] bpf: tracing multi-link support Steven Rostedt
@ 2025-05-29 1:44 ` Menglong Dong
0 siblings, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-05-29 1:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: alexei.starovoitov, jolsa, bpf, Menglong Dong, linux-kernel
On Wed, May 28, 2025 at 9:50 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 28 May 2025 11:46:47 +0800
> Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> > After four months, I finally finish the coding and testing of this series.
> > This is my first time to write such a complex series, and it's so hard :/
> > Anyway, I finished it.
> > (I'm scared :/)
> >
>
> Note, sending out a complex series like this at the start of the merge
> window is not good timing.
>
> Most kernel maintainers will not be able to even look at this until the
> merge window is closed (in two weeks).
Hi Steven, thank you for letting me know.
(It seems that I need to spend some time learning more about the merge
window and related processes :/)
I plan to resend this patch series in two weeks, which also gives me a
chance to further improve it to make it less shit.
Thanks!
Menglong Dong
>
> That includes myself.
>
> -- Steve
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
` (25 preceding siblings ...)
2025-05-28 13:51 ` [PATCH bpf-next 00/25] bpf: tracing multi-link support Steven Rostedt
@ 2025-06-11 3:32 ` Alexei Starovoitov
2025-06-11 12:58 ` Menglong Dong
26 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2025-06-11 3:32 UTC (permalink / raw)
To: Menglong Dong; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Tue, May 27, 2025 at 8:49 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> 1. Add per-function metadata storage support.
> 2. Add bpf global trampoline support for x86_64.
> 3. Add bpf global trampoline link support.
> 4. Add tracing multi-link support.
> 5. Compatibility between tracing and tracing_multi.
...
> ... and I think it will be a
> liberation to split it out to another series :/
There are lots of interesting ideas here and you know
already what the next step should be...
Split it into small chunks.
As presented it's hard to review and even if maintainers take on
that challenge the set is unlandable, since it spans various
subsystems.
In a small reviewable patch set we can argue about
approach A vs B while the current set has too many angles
to argue about.
Like the new concept of global trampoline.
It's nice to write bpf_global_caller() in asm
compared to arch_prepare_bpf_trampoline() that emits asm
on the fly, but it seems the only thing where it truly
needs asm is register save/restore. The rest can be done in C.
I suspect the whole gtramp can be written in C.
There is an attribute(interrupt) that all compilers support...
or use no attributes and inline asm for regs save/restore ?
or attribute(naked) and more inline asm ?
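Roughly something like this, perhaps (an untested sketch, x86_64 only;
bpf_gcaller_run() is a made-up name for the C part that would look up the
kfunc_md, handle recursion protection and run the progs; the origin call
and the fexit path are left out, and a real version has to save more than
the six argument registers):

/* C side: everything except the register save/restore */
void bpf_gcaller_run(unsigned long *args, unsigned long ip);

asm(
"	.globl bpf_global_caller\n"
"bpf_global_caller:\n"
"	pushq %r9\n"
"	pushq %r8\n"
"	pushq %rcx\n"
"	pushq %rdx\n"
"	pushq %rsi\n"
"	pushq %rdi\n"
"	movq  %rsp, %rdi\n"		/* arg1: the saved argument registers */
"	movq  6*8(%rsp), %rsi\n"	/* arg2: return address, i.e. traced function ip */
"	callq bpf_gcaller_run\n"
"	popq  %rdi\n"
"	popq  %rsi\n"
"	popq  %rdx\n"
"	popq  %rcx\n"
"	popq  %r8\n"
"	popq  %r9\n"
"	retq\n"
);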
> no-mitigate + hash table mode
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> nop | fentry | fm_single | fm_all | km_single | km_all
> 9.014ms | 162.378ms | 180.511ms | 446.286ms | 220.634ms | 1465.133ms
> 9.038ms | 161.600ms | 178.757ms | 445.807ms | 220.656ms | 1463.714ms
> 9.048ms | 161.435ms | 180.510ms | 452.530ms | 220.943ms | 1487.494ms
> 9.030ms | 161.585ms | 178.699ms | 448.167ms | 220.107ms | 1463.785ms
> 9.056ms | 161.530ms | 178.947ms | 445.609ms | 221.026ms | 1560.584ms
...
> no-mitigate + function padding mode
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> nop | fentry | fm_single | fm_all | km_single | km_all
> 9.320ms | 166.454ms | 184.094ms | 193.884ms | 227.320ms | 1441.462ms
> 9.326ms | 166.651ms | 183.954ms | 193.912ms | 227.503ms | 1544.634ms
> 9.313ms | 170.501ms | 183.985ms | 191.738ms | 227.801ms | 1441.284ms
> 9.311ms | 166.957ms | 182.086ms | 192.063ms | 410.411ms | 1489.665ms
> 9.329ms | 166.332ms | 182.196ms | 194.154ms | 227.443ms | 1511.272ms
>
> The overhead of fentry_multi_all is a little higher than the
> fentry_multi_single. Maybe it is because the function
> ktime_get_boottime_ns(), which is used in bpf_testmod_bench_run(), is also
> traced? I haven't figured it out yet, but it doesn't matter :/
I think it matters a lot.
Looking at patch 25 the fm_all (in addition to fm_single) only
suppose to trigger from ktime_get_boottime,
but for hash table mode the difference is huge.
10M bpf_fentry_test1() calls are supposed to dominate 2 calls
to ktime_get and whatever else is called there,
but this is not what numbers tell.
Same discrepancy with kprobe_multi. 7x difference has to be understood,
since it's a sign that the benchmark is not really measuring
what it is supposed to measure. Which casts doubts on all numbers.
Another part is how come fentry is 20x slower than nop.
We don't see it in the existing bench-es. That's another red flag.
You need to rethink benchmarking strategy. The bench itself
should be spotless. Don't invent new stuff. Add to existing benchs.
They already measure nop, fentry, kprobe, kprobe-multi.
Then only introduce a global trampoline with a simple hash tab.
Compare against current numbers for fentry.
fm_single has to be within couple percents of fentry.
Then make fm_all attach to everything except funcs that bench trigger calls.
fm_all has to be exactly equal to fm_single.
If the difference is 2.5x like here (180 for fm_single vs 446 for fm_all)
something is wrong. Investigate it and don't proceed without full
understanding.
And only then introduce 5 byte special insn that indices into
an array for fast access to metadata.
Your numbers are a bit suspicious, but they show that fm_single
with hash tab is the same speed as the special kfunc_md_arch_support().
Which is expected.
With fm_all that triggers small set of kernel function
in a tight benchmark loop the performance of hashtab vs special
should _also_ be the same, because hashtab will perform O(1) lookup
that is hot in the cache (or hashtab has bad collisions and should be fixed).
fm_all should have the same speed as fm_single too,
because bench will only attach to things outside of the tight bench loop.
So attaching to thousands of kernel functions that are not being
triggered by the benchmark should not affect results.
The performance advantage of special kfunc_md_arch_support()
can probably only be seen in production when fentry.multi attaches
to thousands of kernel functions and random functions are called.
Then hash tab cache misses will be noticeable vs direct access.
There will be cache misses in both cases, but significantly more misses
for hash tab. Only then we can decide where special stuff is truly necessary.
So patches 2 and 3 are really last. After everything had already landed.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-11 3:32 ` Alexei Starovoitov
@ 2025-06-11 12:58 ` Menglong Dong
2025-06-11 16:11 ` Alexei Starovoitov
0 siblings, 1 reply; 38+ messages in thread
From: Menglong Dong @ 2025-06-11 12:58 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On 6/11/25 11:32, Alexei Starovoitov wrote:
> On Tue, May 27, 2025 at 8:49 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>>
>> 1. Add per-function metadata storage support.
>> 2. Add bpf global trampoline support for x86_64.
>> 3. Add bpf global trampoline link support.
>> 4. Add tracing multi-link support.
>> 5. Compatibility between tracing and tracing_multi.
>
> ...
>
>> ... and I think it will be a
>> liberation to split it out to another series :/
>
> There are lots of interesting ideas here and you know
> already what the next step should be...
> Split it into small chunks.
> As presented it's hard to review and even if maintainers take on
> that challenge the set is unlandable, since it spans various
> subsystems.
>
> In a small reviewable patch set we can argue about
> approach A vs B while the current set has too many angles
> to argue about.
Hi, Alexei.
You are right. In the very beginning, I planned to make the kernel function
metadata the first series. However, it's hard to judge whether the function
metadata is useful without the BPF tracing multi-link that uses it. So I
kneaded them together in this series.
The features in this series can be split into 4 part:
* kernel function metadata
* BPF global trampoline
* tracing multi-link support
* gtramp work together with trampoline
I was planning to split the 4th part out of this series. And now, I'm
not sure if we should split it in the following way:
* series 1: kernel function metadata
* series 2: BPF global trampoline + tracing multi-link support
* series 3: gtramp work together with trampoline
>
> Like the new concept of global trampoline.
> It's nice to write bpf_global_caller() in asm
> compared to arch_prepare_bpf_trampoline() that emits asm
> on the fly, but it seems the only thing where it truly
> needs asm is register save/restore. The rest can be done in C.
We also need to get the function ip from the stack and do the origin
call with asm.
>
> I suspect the whole gtramp can be written in C.
> There is an attribute(interrupt) that all compilers support...
> or use no attributes and inline asm for regs save/restore ?
> or attribute(naked) and more inline asm ?
That's a nice shot, which will make the bpf_global_caller() much easier.
I believe it's worth a try.
>
>> no-mitigate + hash table mode
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> nop | fentry | fm_single | fm_all | km_single | km_all
>> 9.014ms | 162.378ms | 180.511ms | 446.286ms | 220.634ms | 1465.133ms
>> 9.038ms | 161.600ms | 178.757ms | 445.807ms | 220.656ms | 1463.714ms
>> 9.048ms | 161.435ms | 180.510ms | 452.530ms | 220.943ms | 1487.494ms
>> 9.030ms | 161.585ms | 178.699ms | 448.167ms | 220.107ms | 1463.785ms
>> 9.056ms | 161.530ms | 178.947ms | 445.609ms | 221.026ms | 1560.584ms
>
> ...
>
>> no-mitigate + function padding mode
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> nop | fentry | fm_single | fm_all | km_single | km_all
>> 9.320ms | 166.454ms | 184.094ms | 193.884ms | 227.320ms | 1441.462ms
>> 9.326ms | 166.651ms | 183.954ms | 193.912ms | 227.503ms | 1544.634ms
>> 9.313ms | 170.501ms | 183.985ms | 191.738ms | 227.801ms | 1441.284ms
>> 9.311ms | 166.957ms | 182.086ms | 192.063ms | 410.411ms | 1489.665ms
>> 9.329ms | 166.332ms | 182.196ms | 194.154ms | 227.443ms | 1511.272ms
>>
>> The overhead of fentry_multi_all is a little higher than the
>> fentry_multi_single. Maybe it is because the function
>> ktime_get_boottime_ns(), which is used in bpf_testmod_bench_run(), is also
>> traced? I haven't figured it out yet, but it doesn't matter :/
>
> I think it matters a lot.
> Looking at patch 25 the fm_all (in addition to fm_single) only
> suppose to trigger from ktime_get_boottime,
> but for hash table mode the difference is huge.
> 10M bpf_fentry_test1() calls are supposed to dominate 2 calls
> to ktime_get and whatever else is called there,
> but this is not what numbers tell.
>
> Same discrepancy with kprobe_multi. 7x difference has to be understood,
> since it's a sign that the benchmark is not really measuring
> what it is supposed to measure. Which casts doubts on all numbers.
I think there is some misunderstanding here. In the hash table mode, we
trace all the kernel functions for fm_all and km_all. Compared to fm_single
and km_single, fm_all and km_all suffer from the overhead of the hash
lookup, as we traced 40k+ functions in these cases.
The overhead of kprobe_multi has a linear relation with the total number of
kernel functions in fprobe, so the 7x difference is reasonable. The same
goes for fentry_multi in hash table mode.
NOTE: the hash table lookup is not O(1) once the number of functions we
trace exceeds 1k. According to my research, the loop count needed to find
bpf_fentry_test1() with hlist_for_each_entry() is about 35 when there are
47k functions in the hash table.
BTW, the array length of the hash table that we use is 1024.
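To make the cost concrete, the lookup in this mode is essentially the
following (a simplified sketch; the field names and exact signature are not
what the series uses, only the shape of the loop matters):

struct kfunc_md {
	struct hlist_node hash;
	unsigned long func;
	/* bpf progs, flags, ... */
};

#define KFUNC_MD_HASH_BITS 10	/* 1024 buckets, fixed */
static struct hlist_head kfunc_md_table[1 << KFUNC_MD_HASH_BITS];

static struct kfunc_md *kfunc_md_get_noref(unsigned long ip)
{
	struct kfunc_md *md;
	u32 bucket = hash_ptr((void *)ip, KFUNC_MD_HASH_BITS);

	/* with ~47k entries in 1024 buckets, each bucket holds ~46
	 * entries on average, so this loop (and its cache misses) is
	 * what fm_all/km_all pay on every traced call
	 */
	hlist_for_each_entry_rcu(md, &kfunc_md_table[bucket], hash) {
		if (md->func == ip)
			return md;
	}
	return NULL;
}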
The CPU I used for the testing is:
AMD Ryzen 9 7940HX with Radeon Graphics
>
> Another part is how come fentry is 20x slower than nop.
> We don't see it in the existing bench-es. That's another red flag.
I think this has a strong relation with the Kconfig I use. When I do the
testing with "make tinyconfig" as the base, fentry is ~9x slower than
nop. I did this test with the Kconfig of debian12 (6.1 kernel), and I think
there is more overhead in rcu_read_lock, migrate_disable, etc., with this
Kconfig.
>
> You need to rethink benchmarking strategy. The bench itself
> should be spotless. Don't invent new stuff. Add to existing benchs.
> They already measure nop, fentry, kprobe, kprobe-multi.
Great! It seems that I did so much useless work on the bench testing :/
>
> Then only introduce a global trampoline with a simple hash tab.
> Compare against current numbers for fentry.
> fm_single has to be within couple percents of fentry.
> Then make fm_all attach to everything except funcs that bench trigger calls.
> fm_all has to be exactly equal to fm_single.
> If the difference is 2.5x like here (180 for fm_single vs 446 for fm_all)
> something is wrong. Investigate it and don't proceed without full
> understanding.
Emm......Like what I explained above, the 2.5X difference is reasonable, and
this is exactly the reason why we need the function padding based metadata,
which is able to keep fentry_multi and kprobe_multi (in the future) free of
the hash lookup overhead.
>
> And only then introduce 5 byte special insn that indices into
> an array for fast access to metadata.
> Your numbers are a bit suspicious, but they show that fm_single
> with hash tab is the same speed as the special kfunc_md_arch_support().
> Which is expected.
> With fm_all that triggers small set of kernel function
> in a tight benchmark loop the performance of hashtab vs special
> should _also_ be the same, because hashtab will perform O(1) lookup
> that is hot in the cache (or hashtab has bad collisions and should be fixed).
I think this is the problem. The number of kernel functions is much larger
than the array length, which makes the hash lookup not O(1) anymore.
Sorry, I wanted to show the performance of the function padding based
metadata, so I made the number of kernel functions that we traced huge,
which is ~47k.
When the function number is less than 2k, the performance of fm_single and
fm_all doesn't differ much, according to my previous testing :/
>
> fm_all should have the same speed as fm_single too,
> because bench will only attach to things outside of the tight bench loop.
> So attaching to thousands of kernel functions that are not being
> triggered by the benchmark should not affect results.
It's 47k kernel functions in this test :/
> The performance advantage of special kfunc_md_arch_support()
> can probably only be seen in production when fentry.multi attaches
> to thousands of kernel functions and random functions are called.
> Then hash tab cache misses will be noticeable vs direct access.
> There will be cache misses in both cases, but significantly more misses
> for hash tab. Only then we can decide where special stuff is truly necessary.
> So patches 2 and 3 are really last. After everything had already landed.
Emm......The cache misses are something I didn't expect. The only thing I
was concerned about before was the overhead of the hash lookup. To my utter
astonishment, this actually helps with cache misses as well!
BTW, should I still split out the function padding based metadata into
the last series?
Thanks!
Menglong Dong
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-11 12:58 ` Menglong Dong
@ 2025-06-11 16:11 ` Alexei Starovoitov
2025-06-12 0:07 ` Menglong Dong
0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2025-06-11 16:11 UTC (permalink / raw)
To: Menglong Dong; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Wed, Jun 11, 2025 at 5:59 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On 6/11/25 11:32, Alexei Starovoitov wrote:
> > On Tue, May 27, 2025 at 8:49 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >>
> >> 1. Add per-function metadata storage support.
> >> 2. Add bpf global trampoline support for x86_64.
> >> 3. Add bpf global trampoline link support.
> >> 4. Add tracing multi-link support.
> >> 5. Compatibility between tracing and tracing_multi.
> >
> > ...
> >
> >> ... and I think it will be a
> >> liberation to split it out to another series :/
> >
> > There are lots of interesting ideas here and you know
> > already what the next step should be...
> > Split it into small chunks.
> > As presented it's hard to review and even if maintainers take on
> > that challenge the set is unlandable, since it spans various
> > subsystems.
> >
> > In a small reviewable patch set we can argue about
> > approach A vs B while the current set has too many angles
> > to argue about.
>
>
> Hi, Alexei.
>
>
> You are right. In the very beginning, I planned to make the kernel function
> metadata to be the first series. However, it's hard to judge if the function
> metadata is useful without the usage of the BPF tracing multi-link. So I
> kneaded them together in this series.
>
>
> The features in this series can be split into 4 part:
> * kernel function metadata
> * BPF global trampoline
> * tracing multi-link support
> * gtramp work together with trampoline
>
>
> I was planning to split out the 4th part out of this series. And now, I'm
> not sure if we should split it in the following way:
>
> * series 1: kernel function metadata
> * series 2: BPF global trampoline + tracing multi-link support
> * series 3: gtramp work together with trampoline
Neither. First thing is to understand benchmark numbers.
We're not there yet.
> >
> > Like the new concept of global trampoline.
> > It's nice to write bpf_global_caller() in asm
> > compared to arch_prepare_bpf_trampoline() that emits asm
> > on the fly, but it seems the only thing where it truly
> > needs asm is register save/restore. The rest can be done in C.
>
>
> We also need to get the function ip from the stack and do the origin
> call with asm.
>
>
> >
> > I suspect the whole gtramp can be written in C.
> > There is an attribute(interrupt) that all compilers support...
> > or use no attributes and inline asm for regs save/restore ?
> > or attribute(naked) and more inline asm ?
>
>
> That's a nice shot, which will make the bpf_global_caller() much easier.
> I believe it worth a try.
>
>
> >
> >> no-mitigate + hash table mode
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> nop | fentry | fm_single | fm_all | km_single | km_all
> >> 9.014ms | 162.378ms | 180.511ms | 446.286ms | 220.634ms | 1465.133ms
> >> 9.038ms | 161.600ms | 178.757ms | 445.807ms | 220.656ms | 1463.714ms
> >> 9.048ms | 161.435ms | 180.510ms | 452.530ms | 220.943ms | 1487.494ms
> >> 9.030ms | 161.585ms | 178.699ms | 448.167ms | 220.107ms | 1463.785ms
> >> 9.056ms | 161.530ms | 178.947ms | 445.609ms | 221.026ms | 1560.584ms
> >
> > ...
> >
> >> no-mitigate + function padding mode
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> nop | fentry | fm_single | fm_all | km_single | km_all
> >> 9.320ms | 166.454ms | 184.094ms | 193.884ms | 227.320ms | 1441.462ms
> >> 9.326ms | 166.651ms | 183.954ms | 193.912ms | 227.503ms | 1544.634ms
> >> 9.313ms | 170.501ms | 183.985ms | 191.738ms | 227.801ms | 1441.284ms
> >> 9.311ms | 166.957ms | 182.086ms | 192.063ms | 410.411ms | 1489.665ms
> >> 9.329ms | 166.332ms | 182.196ms | 194.154ms | 227.443ms | 1511.272ms
> >>
> >> The overhead of fentry_multi_all is a little higher than the
> >> fentry_multi_single. Maybe it is because the function
> >> ktime_get_boottime_ns(), which is used in bpf_testmod_bench_run(), is also
> >> traced? I haven't figured it out yet, but it doesn't matter :/
> >
> > I think it matters a lot.
> > Looking at patch 25 the fm_all (in addition to fm_single) only
> > suppose to trigger from ktime_get_boottime,
> > but for hash table mode the difference is huge.
> > 10M bpf_fentry_test1() calls are supposed to dominate 2 calls
> > to ktime_get and whatever else is called there,
> > but this is not what numbers tell.
> >
> > Same discrepancy with kprobe_multi. 7x difference has to be understood,
> > since it's a sign that the benchmark is not really measuring
> > what it is supposed to measure. Which casts doubts on all numbers.
>
>
> I think there is some misunderstand here. In the hash table mode, we trace
> all the kernel function for fm_all and km_all. Compared to fm_single and
> km_single, the overhead of fm_all and km_all suffer from the hash lookup,
> as we traced 40k+ functions in these case.
>
>
> The overhead of kprobe_multi has a linear relation with the total kernel
> function number in fprobe, so the 7x difference is reasonable. The same
> to fentry_multi in hash table mode.
No, it's not. More below...
> NOTE: The hash table lookup is not O(1) if the function number that we
> traced more than 1k. According to my research, the loop count that we use
> to find bpf_fentry_test1() with hlist_for_each_entry() is about 35 when
> the functions number in the hash table is 47k.
>
> BTW, the array length of the hash table that we use is 1024.
and that's the bug.
You added 47k functions to a htab with 1k buckets and
argue its performance is slow?!
That's a pointless baseline.
Use rhashtable or size up buckets to match the number of functions
being traced, so that hash lookup is O(1).
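e.g. something as simple as this at link-creation time would do
(illustrative only, not what the patch has to look like):

	/* size the bucket array from the number of attached functions,
	 * instead of using a fixed 1024
	 */
	nbuckets = roundup_pow_of_two(max_t(size_t, nr_funcs, 1024));
	table = kvcalloc(nbuckets, sizeof(*table), GFP_KERNEL);
	/* 47k funcs -> 65536 buckets -> ~0.7 entries per bucket */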
>
> The CPU I used for the testing is:
> AMD Ryzen 9 7940HX with Radeon Graphics
>
>
> >
> > Another part is how come fentry is 20x slower than nop.
> > We don't see it in the existing bench-es. That's another red flag.
>
>
> I think this has a strong relation with the Kconfig I use. When I do the
> testing with "make tinyconfig" as the base, the fentry is ~9x slower than
> nop. I do this test with the Kconfig of debian12 (6.1 kernel), and I think
> there is more overhead to rcu_read_lock, migrate_disable, etc, in this
> Kconfig.
It shouldn't make any difference if hashtable is properly used.
>
>
> >
> > You need to rethink benchmarking strategy. The bench itself
> > should be spotless. Don't invent new stuff. Add to existing benchs.
> > They already measure nop, fentry, kprobe, kprobe-multi.
>
>
> Great! It seems that I did so many useless works on the bench testing :/
>
>
> >
> > Then only introduce a global trampoline with a simple hash tab.
> > Compare against current numbers for fentry.
> > fm_single has to be within couple percents of fentry.
> > Then make fm_all attach to everything except funcs that bench trigger calls.
> > fm_all has to be exactly equal to fm_single.
> > If the difference is 2.5x like here (180 for fm_single vs 446 for fm_all)
> > something is wrong. Investigate it and don't proceed without full
> > understanding.
>
>
> Emm...... Like I explained above, the 2.5x difference is reasonable, and
> this is exactly the reason why we need the function-padding-based metadata,
> which is able to free fentry_multi and kprobe_multi (in the future) from
> the overhead of the hash lookup.
Absolutely not. It only points into an implementation issue with hashtab.
>
> >
> > And only then introduce the 5 byte special insn that indexes into
> > an array for fast access to metadata.
> > Your numbers are a bit suspicious, but they show that fm_single
> > with hash tab is the same speed as the special kfunc_md_arch_support().
> > Which is expected.
> > With fm_all that triggers small set of kernel function
> > in a tight benchmark loop the performance of hashtab vs special
> > should _also_ be the same, because hashtab will perform O(1) lookup
> > that is hot in the cache (or hashtab has bad collisions and should be fixed).
>
>
> I think this is the problem. The number of kernel functions is much larger
> than the bucket array length, which makes the hash lookup not O(1) anymore.
>
> Sorry, I wanted to show the performance of the function-padding-based
> metadata, so I made the number of kernel functions that we traced huge,
> ~47k.
>
>
> When the number of functions is less than 2k, the performance of fm_single
> and fm_all doesn't differ much, according to my previous testing :/
Sigh. You should have said in the beginning that your hashtab
is fixed size. All the comparisons and reasons are bogus.
>
> >
> > fm_all should have the same speed as fm_single too,
> > because bench will only attach to things outside of the tight bench loop.
> > So attaching to thousands of kernel functions that are not being
> > triggered by the benchmark should not affect results.
>
>
> There are 47k kernel functions in this testing :/
>
>
> > The performance advantage of special kfunc_md_arch_support()
> > can probably only be seen in production when fentry.multi attaches
> > to thousands of kernel functions and random functions are called.
> > Then hash tab cache misses will be noticeable vs direct access.
> > There will be cache misses in both cases, but significantly more misses
> > for hash tab. Only then we can decide where special stuff is truly necessary.
> > So patches 2 and 3 are really last. After everything had already landed.
>
>
> Emm...... The cache miss is something I didn't expect. The only thing I
> was concerned about before was just the overhead of the hash lookup. To my
> utter astonishment, this actually helps with cache misses as well!
>
>
> BTW, should I still split the function-padding-based metadata out into
> the last series?
No. First make sure fm_single and fm_all have the same performance
with the hashtable and demonstrate that with the existing selftests/bpf/benchs/
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-11 16:11 ` Alexei Starovoitov
@ 2025-06-12 0:07 ` Menglong Dong
2025-06-12 0:58 ` Alexei Starovoitov
0 siblings, 1 reply; 38+ messages in thread
From: Menglong Dong @ 2025-06-12 0:07 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Thu, Jun 12, 2025 at 12:11 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jun 11, 2025 at 5:59 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > On 6/11/25 11:32, Alexei Starovoitov wrote:
> > > On Tue, May 27, 2025 at 8:49 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >>
> > >> 1. Add per-function metadata storage support.
> > >> 2. Add bpf global trampoline support for x86_64.
> > >> 3. Add bpf global trampoline link support.
> > >> 4. Add tracing multi-link support.
> > >> 5. Compatibility between tracing and tracing_multi.
> > >
> > > ...
> > >
> > >> ... and I think it will be a
> > >> liberation to split it out to another series :/
> > >
> > > There are lots of interesting ideas here and you know
> > > already what the next step should be...
> > > Split it into small chunks.
> > > As presented it's hard to review and even if maintainers take on
> > > that challenge the set is unlandable, since it spans various
> > > subsystems.
> > >
> > > In a small reviewable patch set we can argue about
> > > approach A vs B while the current set has too many angles
> > > to argue about.
> >
> >
> > Hi, Alexei.
> >
> >
> > You are right. In the very beginning, I planned to make the kernel function
> > metadata to be the first series. However, it's hard to judge if the function
> > metadata is useful without the usage of the BPF tracing multi-link. So I
> > kneaded them together in this series.
> >
> >
> > The features in this series can be split into 4 part:
> > * kernel function metadata
> > * BPF global trampoline
> > * tracing multi-link support
> > * gtramp work together with trampoline
> >
> >
> > I was planning to split out the 4th part out of this series. And now, I'm
> > not sure if we should split it in the following way:
> >
> > * series 1: kernel function metadata
> > * series 2: BPF global trampoline + tracing multi-link support
> > * series 3: gtramp work together with trampoline
>
> Neither. First thing is to understand benchmark numbers.
> We're not there yet.
>
> > >
> > > Like the new concept of global trampoline.
> > > It's nice to write bpf_global_caller() in asm
> > > compared to arch_prepare_bpf_trampoline() that emits asm
> > > on the fly, but it seems the only thing where it truly
> > > needs asm is register save/restore. The rest can be done in C.
> >
> >
> > We also need to get the function ip from the stack and do the origin
> > call with asm.
> >
> >
> > >
> > > I suspect the whole gtramp can be written in C.
> > > There is an attribute(interrupt) that all compilers support...
> > > or use no attributes and inline asm for regs save/restore ?
> > > or attribute(naked) and more inline asm ?
> >
> >
> > That's a nice shot, which will make the bpf_global_caller() much easier.
> > I believe it worth a try.
> >
> >
> > >
> > >> no-mitigate + hash table mode
> > >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> nop | fentry | fm_single | fm_all | km_single | km_all
> > >> 9.014ms | 162.378ms | 180.511ms | 446.286ms | 220.634ms | 1465.133ms
> > >> 9.038ms | 161.600ms | 178.757ms | 445.807ms | 220.656ms | 1463.714ms
> > >> 9.048ms | 161.435ms | 180.510ms | 452.530ms | 220.943ms | 1487.494ms
> > >> 9.030ms | 161.585ms | 178.699ms | 448.167ms | 220.107ms | 1463.785ms
> > >> 9.056ms | 161.530ms | 178.947ms | 445.609ms | 221.026ms | 1560.584ms
> > >
> > > ...
> > >
> > >> no-mitigate + function padding mode
> > >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> nop | fentry | fm_single | fm_all | km_single | km_all
> > >> 9.320ms | 166.454ms | 184.094ms | 193.884ms | 227.320ms | 1441.462ms
> > >> 9.326ms | 166.651ms | 183.954ms | 193.912ms | 227.503ms | 1544.634ms
> > >> 9.313ms | 170.501ms | 183.985ms | 191.738ms | 227.801ms | 1441.284ms
> > >> 9.311ms | 166.957ms | 182.086ms | 192.063ms | 410.411ms | 1489.665ms
> > >> 9.329ms | 166.332ms | 182.196ms | 194.154ms | 227.443ms | 1511.272ms
> > >>
> > >> The overhead of fentry_multi_all is a little higher than that of
> > >> fentry_multi_single. Maybe it is because the function
> > >> ktime_get_boottime_ns(), which is used in bpf_testmod_bench_run(), is also
> > >> traced? I haven't figured it out yet, but it doesn't matter :/
> > >
> > > I think it matters a lot.
> > > Looking at patch 25, fm_all (in addition to fm_single) is only
> > > supposed to trigger from ktime_get_boottime,
> > > but for hash table mode the difference is huge.
> > > 10M bpf_fentry_test1() calls are supposed to dominate the 2 calls
> > > to ktime_get and whatever else is called there,
> > > but this is not what the numbers tell.
> > >
> > > Same discrepancy with kprobe_multi. The 7x difference has to be understood,
> > > since it's a sign that the benchmark is not really measuring
> > > what it is supposed to measure. Which casts doubt on all the numbers.
> >
> >
> > I think there is some misunderstanding here. In the hash table mode, we trace
> > all the kernel functions for fm_all and km_all. Compared to fm_single and
> > km_single, the overhead of fm_all and km_all suffers from the hash lookup,
> > as we traced 40k+ functions in these cases.
> >
> >
> > The overhead of kprobe_multi has a linear relation with the total number of
> > kernel functions in fprobe, so the 7x difference is reasonable. The same
> > applies to fentry_multi in hash table mode.
>
> No, it's not. More below...
>
> > NOTE: The hash table lookup is not O(1) if the number of functions we
> > trace is more than 1k. According to my research, the loop count needed
> > to find bpf_fentry_test1() with hlist_for_each_entry() is about 35 when
> > the number of functions in the hash table is 47k.
> >
> > BTW, the bucket array length of the hash table that we use is 1024.
>
> and that's the bug.
> You added 47k functions to a htab with 1k buckets and
> argue its performance is slow?!
> That's a pointless baseline.
> Use rhashtable or size up buckets to match the number of functions
> being traced, so that hash lookup is O(1).
Hi Alexei, thank you for your explanation, and now I realize the
problem is my hash table :/

My hash table is modeled on ftrace and fprobe, whose
max bucket count is 1024.

It's interesting to make the hash table O(1) by using rhashtable
or sizing up the buckets, as you said. I suspect we may not even
need the function padding part if the hashing is random
enough.

I'll redesign the hash table part and do the testing with the existing
bench to make fm_single the same as fm_all, which it should be in
theory.

Thanks!
Menglong Dong
>
> >
> > The CPU I used for the testing is:
> > AMD Ryzen 9 7940HX with Radeon Graphics
> >
> >
> > >
> > > Another part is how come fentry is 20x slower than nop.
> > > We don't see it in the existing benches. That's another red flag.
> >
> >
> > I think this has a strong relation with the Kconfig I use. When I do the
> > testing with "make tinyconfig" as the base, fentry is ~9x slower than
> > nop. I did this test with the Kconfig of debian12 (6.1 kernel), and I think
> > there is more overhead in rcu_read_lock, migrate_disable, etc., with this
> > Kconfig.
>
> It shouldn't make any difference if hashtable is properly used.
>
> >
> >
> > >
> > > You need to rethink benchmarking strategy. The bench itself
> > > should be spotless. Don't invent new stuff. Add to existing benchs.
> > > They already measure nop, fentry, kprobe, kprobe-multi.
> >
> >
> > Great! It seems that I did so much useless work on the bench testing :/
> >
> >
> > >
> > > Then only introduce a global trampoline with a simple hash tab.
> > > Compare against current numbers for fentry.
> > > fm_single has to be within couple percents of fentry.
> > > Then make fm_all attach to everything except funcs that bench trigger calls.
> > > fm_all has to be exactly equal to fm_single.
> > > If the difference is 2.5x like here (180 for fm_single vs 446 for fm_all)
> > > something is wrong. Investigate it and don't proceed without full
> > > understanding.
> >
> >
> > Emm...... Like I explained above, the 2.5x difference is reasonable, and
> > this is exactly the reason why we need the function-padding-based metadata,
> > which is able to free fentry_multi and kprobe_multi (in the future) from
> > the overhead of the hash lookup.
>
> Absolutely not. It only points into an implementation issue with hashtab.
>
> >
> > >
> > > And only then introduce the 5 byte special insn that indexes into
> > > an array for fast access to metadata.
> > > Your numbers are a bit suspicious, but they show that fm_single
> > > with hash tab is the same speed as the special kfunc_md_arch_support().
> > > Which is expected.
> > > With fm_all that triggers small set of kernel function
> > > in a tight benchmark loop the performance of hashtab vs special
> > > should _also_ be the same, because hashtab will perform O(1) lookup
> > > that is hot in the cache (or hashtab has bad collisions and should be fixed).
> >
> >
> > I think this is the problem. The number of kernel functions is much larger
> > than the bucket array length, which makes the hash lookup not O(1) anymore.
> >
> > Sorry, I wanted to show the performance of the function-padding-based
> > metadata, so I made the number of kernel functions that we traced huge,
> > ~47k.
> >
> >
> > When the number of functions is less than 2k, the performance of fm_single
> > and fm_all doesn't differ much, according to my previous testing :/
>
> Sigh. You should have said in the beginning that your hashtab
> is fixed size. All the comparisons and reasons are bogus.
>
> >
> > >
> > > fm_all should have the same speed as fm_single too,
> > > because bench will only attach to things outside of the tight bench loop.
> > > So attaching to thousands of kernel functions that are not being
> > > triggered by the benchmark should not affect results.
> >
> >
> > There are 47k kernel functions in this testing :/
> >
> >
> > > The performance advantage of special kfunc_md_arch_support()
> > > can probably only be seen in production when fentry.multi attaches
> > > to thousands of kernel functions and random functions are called.
> > > Then hash tab cache misses will be noticeable vs direct access.
> > > There will be cache misses in both cases, but significantly more misses
> > > for hash tab. Only then we can decide where special stuff is truly necessary.
> > > So patches 2 and 3 are really last. After everything had already landed.
> >
> >
> > Emm...... The cache miss is something I didn't expect. The only thing I
> > was concerned about before was just the overhead of the hash lookup. To my
> > utter astonishment, this actually helps with cache misses as well!
> >
> >
> > BTW, should I still split the function-padding-based metadata out into
> > the last series?
>
> No. First make sure fm_single and fm_all have the same performance
> with the hashtable and demonstrate that with the existing selftests/bpf/benchs/
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-12 0:07 ` Menglong Dong
@ 2025-06-12 0:58 ` Alexei Starovoitov
2025-06-12 3:18 ` Menglong Dong
2025-06-15 8:35 ` Menglong Dong
0 siblings, 2 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2025-06-12 0:58 UTC (permalink / raw)
To: Menglong Dong; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Wed, Jun 11, 2025 at 5:07 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Hi Alexei, thank you for your explanation, and now I realize the
> problem is my hash table :/
>
> My hash table is modeled on ftrace and fprobe, whose
> max bucket count is 1024.
>
> It's interesting to make the hash table O(1) by using rhashtable
> or sizing up the buckets, as you said. I suspect we may not even
> need the function padding part if the hashing is random
> enough.
I suggest starting with rhashtable. It's used in many
performance critical places, and when rhashtable_params are
constant the compiler optimizes everything nicely.
lookup is lockless and only needs RCU, so safe to use
from fentry_multi.
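A minimal sketch of such a constant-params rhashtable, keyed by the
function entry address (struct fields and function names here are
illustrative, not the ones from the series):

#include <linux/rhashtable.h>

struct kfunc_md {
	unsigned long ip;		/* key: traced function address */
	struct rhash_head node;		/* rhashtable linkage */
	/* bpf progs, flags, ... */
};

static const struct rhashtable_params kfunc_md_ht_params = {
	.key_len	= sizeof_field(struct kfunc_md, ip),
	.key_offset	= offsetof(struct kfunc_md, ip),
	.head_offset	= offsetof(struct kfunc_md, node),
	.automatic_shrinking = true,
};

static struct rhashtable kfunc_md_ht;	/* rhashtable_init() once at init time */

/*
 * Lockless lookup; the caller only needs to be in an RCU read-side
 * critical section, which the global trampoline already guarantees.
 */
static struct kfunc_md *kfunc_md_find(unsigned long ip)
{
	return rhashtable_lookup_fast(&kfunc_md_ht, &ip, kfunc_md_ht_params);
}

static int kfunc_md_add(struct kfunc_md *md)
{
	return rhashtable_insert_fast(&kfunc_md_ht, &md->node,
				      kfunc_md_ht_params);
}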
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-12 0:58 ` Alexei Starovoitov
@ 2025-06-12 3:18 ` Menglong Dong
2025-06-15 8:35 ` Menglong Dong
1 sibling, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-06-12 3:18 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Thu, Jun 12, 2025 at 8:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jun 11, 2025 at 5:07 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Hi Alexei, thank you for your explanation, and now I realize the
> > problem is my hash table :/
> >
> > My hash table is modeled on ftrace and fprobe, whose
> > max bucket count is 1024.
> >
> > It's interesting to make the hash table O(1) by using rhashtable
> > or sizing up the buckets, as you said. I suspect we may not even
> > need the function padding part if the hashing is random
> > enough.
>
> I suggest starting with rhashtable. It's used in many
> performance critical places, and when rhashtable_params are
> constant the compiler optimizes everything nicely.
> lookup is lockless and only needs RCU, so safe to use
> from fentry_multi.
Thanks for the advice! rhashtable is a nice choice for fentry_multi,
and I'll redesign the function metadata with it.
Thanks!
Menglong Dong
* Re: [PATCH bpf-next 00/25] bpf: tracing multi-link support
2025-06-12 0:58 ` Alexei Starovoitov
2025-06-12 3:18 ` Menglong Dong
@ 2025-06-15 8:35 ` Menglong Dong
1 sibling, 0 replies; 38+ messages in thread
From: Menglong Dong @ 2025-06-15 8:35 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Steven Rostedt, Jiri Olsa, bpf, Menglong Dong, LKML
On Thu, Jun 12, 2025 at 8:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jun 11, 2025 at 5:07 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Hi Alexei, thank you for your explanation, and now I realize the
> > problem is my hash table :/
> >
> > My hash table is modeled on ftrace and fprobe, whose
> > max bucket count is 1024.
> >
> > It's interesting to make the hash table O(1) by using rhashtable
> > or sizing up the buckets, as you said. I suspect we may not even
> > need the function padding part if the hashing is random
> > enough.
>
> I suggest starting with rhashtable. It's used in many
> performance critical places, and when rhashtable_params are
> constant the compiler optimizes everything nicely.
> lookup is lockless and only needs RCU, so safe to use
> from fentry_multi.
Hi, Alexei. Sorry to bother you. I have implemented the hash table
with rhashtable and did the bench testing with the existing
framework.

You said before that "fm_single has to be within couple percents of fentry",
and I think that's a little difficult if we use the hashtable without
the function padding mode.
The extra overhead of the global trampoline can be as follows:

1. additional hash lookup. The rhashtable is O(1), but the hash key
   computation and memory reads can still have a slight overhead.

2. additional function call to kfunc_md_get_noref() in the asm, which is
   used to get the function metadata. We can inline it in the asm in the
   function padding mode, but that's hard if we are using the rhashtable
   (see the sketch after this list).

3. extra logic in the global trampoline. For example, we save and
   restore reg1 to reg6 on the stack for the function args even if
   the function we attached to doesn't have any args.
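To illustrate point 2, here is a hypothetical sketch of what the
padding-mode lookup boils down to (names, offsets and layout are made up,
not the series' code): the index stored in the function padding selects
an entry of a flat metadata array, so there is no hashing or chain walk,
and the few loads can be inlined, even in asm.

#include <linux/types.h>

struct kfunc_md {
	void *bpf_progs;	/* illustrative payload */
};

extern struct kfunc_md kfunc_md_array[];

/* offset of the 4-byte index stored in the padding, before the fentry ip */
#define KFUNC_MD_INDEX_OFFSET	8

static __always_inline struct kfunc_md *md_lookup_by_padding(unsigned long ip)
{
	u32 index = *(u32 *)(ip - KFUNC_MD_INDEX_OFFSET);

	return &kfunc_md_array[index];
}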
Following are the bench results in rhashtable mode; the performance
of fentry_multi is about 77.7% of fentry:
usermode-count : 893.357 ± 0.566M/s
kernel-count : 421.290 ± 0.159M/s
syscall-count : 21.018 ± 0.165M/s
fentry : 100.742 ± 0.065M/s
fexit : 51.283 ± 0.784M/s
fmodret : 55.410 ± 0.026M/s
fentry-multi : 78.237 ± 0.117M/s
fentry-multi-all: 80.090 ± 0.049M/s
rawtp : 161.496 ± 0.197M/s
tp : 70.021 ± 0.015M/s
kprobe : 54.693 ± 0.013M/s
kprobe-multi : 51.481 ± 0.023M/s
kretprobe : 22.504 ± 0.011M/s
kretprobe-multi: 27.221 ± 0.037M/s
(It's weird that the performance of fentry-multi-all is a little higher
than fentry-multi, but I'm sure that the bpf prog is attached to all the
kernel functions in the fentry-multi-all testcase.)
The overhead of part 1 can be eliminated by using the
function padding mode; following are the bench results:
usermode-count : 895.874 ± 2.472M/s
kernel-count : 423.882 ± 0.342M/s
syscall-count : 20.480 ± 0.009M/s
fentry : 105.191 ± 0.275M/s
fexit : 52.430 ± 0.050M/s
fmodret : 56.130 ± 0.062M/s
fentry-multi : 88.114 ± 0.108M/s
fentry-multi-all: 86.988 ± 0.024M/s
rawtp : 145.488 ± 0.043M/s
tp : 73.386 ± 0.095M/s
kprobe : 55.294 ± 0.046M/s
kprobe-multi : 50.457 ± 0.075M/s
kretprobe : 22.414 ± 0.020M/s
kretprobe-multi: 27.205 ± 0.044M/s
The performance of fentry_multi is now 83.7% of fentry. As the next
step, we inline the function metadata lookup in the asm, and
the performance of fentry_multi becomes 89.7% of fentry, which is
close to "within couple percents of fentry":
usermode-count : 886.836 ± 0.300M/s
kernel-count : 419.962 ± 1.252M/s
syscall-count : 20.715 ± 0.022M/s
fentry : 102.783 ± 0.166M/s
fexit : 52.502 ± 0.014M/s
fmodret : 55.822 ± 0.038M/s
fentry-multi : 92.201 ± 0.027M/s
fentry-multi-all: 89.831 ± 0.057M/s
rawtp : 158.337 ± 4.918M/s
tp : 72.883 ± 0.041M/s
kprobe : 54.963 ± 0.013M/s
kprobe-multi : 50.069 ± 0.079M/s
kretprobe : 22.260 ± 0.012M/s
kretprobe-multi: 27.211 ± 0.011M/s
For the overhead of part 3, I'm thinking of introducing a
dynamic global trampoline. We create a different global trampoline
for functions that have different features, and the features
can be:

* the function argument count
* whether bpf_get_func_ip() is ever called in the bpf progs
* whether FEXIT or MODIFY_RETURN progs exist
* etc.
Then we can generate a global trampoline for each function with the
minimum instructions. According to my estimation, the performance
of fentry_multi should be above 95% of fentry with
function padding, inlined function metadata and a dynamic global
trampoline.
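A hypothetical sketch of how such per-feature trampolines could be keyed
(the struct and field names are illustrative, not part of this series):

#include <linux/types.h>

struct gtramp_key {
	u8	nr_args;	/* how many argument registers to save  */
	bool	needs_func_ip;	/* any prog calls bpf_get_func_ip()     */
	bool	has_ret_progs;	/* FEXIT / MODIFY_RETURN progs attached */
};

Functions whose keys compare equal would share one generated trampoline,
so each trampoline only carries the instructions it really needs.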
In fact, I implemented the first version of this series with the dynamic
global trampoline. However, that makes the series very, very complex,
so I think it's not a good idea to include it in this series.
All in all, the performance of fentry_multi can't be within a couple
percent of fentry if we use the rhashtable only, according to my testing,
and I'm not sure if I should go ahead :/
BTW, the Kconfig I used in the testing comes from "make tinyconfig", and
I enabled some configs so that tools/testing/selftests/bpf can be compiled
successfully. I would appreciate it if someone could offer a better, more
authoritative Kconfig for the testing :/
Thanks, have a nice weekend!
Menglong Dong
Thread overview: 38+ messages
2025-05-28 3:46 [PATCH bpf-next 00/25] bpf: tracing multi-link support Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 01/25] add per-function metadata storage support Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 02/25] x86: implement per-function metadata storage for x86 Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 03/25] arm64: implement per-function metadata storage for arm64 Menglong Dong
2025-05-28 12:16 ` kernel test robot
2025-05-28 3:46 ` [PATCH bpf-next 04/25] bpf: make kfunc_md support global trampoline link Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 05/25] x86,bpf: add bpf_global_caller for global trampoline Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 06/25] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 07/25] ftrace: add reset_ftrace_direct_ips Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 08/25] bpf: introduce bpf_gtramp_link Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 09/25] bpf: tracing: add support to record and check the accessed args Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 10/25] bpf: refactor the modules_array to ptr_array Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 11/25] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
2025-05-28 3:46 ` [PATCH bpf-next 12/25] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 13/25] x86,bpf: factor out __arch_get_bpf_regs_nr Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 14/25] bpf: tracing: add multi-link support Menglong Dong
2025-05-28 11:34 ` kernel test robot
2025-05-28 3:47 ` [PATCH bpf-next 15/25] ftrace: factor out __unregister_ftrace_direct Menglong Dong
2025-05-28 12:37 ` kernel test robot
2025-05-28 3:47 ` [PATCH bpf-next 16/25] ftrace: supporting replace direct ftrace_ops Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 17/25] bpf: make trampoline compatible with global trampoline Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 18/25] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 19/25] libbpf: support tracing_multi Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 20/25] libbpf: add btf type hash lookup support Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 21/25] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 22/25] selftests/bpf: use the glob_match() from libbpf in test_progs.c Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 23/25] selftests/bpf: add get_ksyms and get_addrs to test_progs.c Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 24/25] selftests/bpf: add testcases for multi-link of tracing Menglong Dong
2025-05-28 3:47 ` [PATCH bpf-next 25/25] selftests/bpf: add performance bench test for trace prog Menglong Dong
2025-05-28 13:51 ` [PATCH bpf-next 00/25] bpf: tracing multi-link support Steven Rostedt
2025-05-29 1:44 ` Menglong Dong
2025-06-11 3:32 ` Alexei Starovoitov
2025-06-11 12:58 ` Menglong Dong
2025-06-11 16:11 ` Alexei Starovoitov
2025-06-12 0:07 ` Menglong Dong
2025-06-12 0:58 ` Alexei Starovoitov
2025-06-12 3:18 ` Menglong Dong
2025-06-15 8:35 ` Menglong Dong