From: Jiri Olsa <olsajiri@gmail.com>
To: Menglong Dong <menglong8.dong@gmail.com>
Cc: alexei.starovoitov@gmail.com, rostedt@goodmis.org,
bpf@vger.kernel.org, Menglong Dong <dongml2@chinatelecom.cn>
Subject: Re: [PATCH bpf-next v2 00/18] bpf: tracing multi-link support
Date: Fri, 4 Jul 2025 10:47:27 +0200
Message-ID: <aGeVH0VV_PRfOeZ9@krava>
In-Reply-To: <20250703121521.1874196-1-dongml2@chinatelecom.cn>
On Thu, Jul 03, 2025 at 08:15:03PM +0800, Menglong Dong wrote:
> (Thanks to Alexei's advice to implement the bpf global trampoline in C
> instead of asm, the performance of tracing-multi has improved
> significantly. The function metadata, which is implemented with a hash
> table, is also fast enough to satisfy our needs.)
>
> For now, a BPF program of type BPF_PROG_TYPE_TRACING cannot be attached
> to multiple hooks, so we have to create a BPF program for each kernel
> function that we want to trace, even though all the programs have the
> same (or similar) logic. This can consume extra memory and make program
> loading slow if we have many kernel functions to trace.
hi,
what tree did you base your patchset on? I can't apply it on
bpf-next/master and I tried several other trees
thanks,
jirka
>
> In this series, we add support for attaching a tracing BPF program to
> multiple hooks, which is similar to BPF_TRACE_KPROBE_MULTI. Generally
> speaking, this series can be divided into 4 parts:
>
> 1. Add per-function metadata storage support.
> 2. Add bpf global trampoline support for x86_64.
> 3. Add bpf global trampoline link support.
> 4. Add tracing multi-link support.
>
> per-function metadata storage
> -----------------------------
> The per-function metadata storage is the basis of the bpf global
> trampoline. In short, it's a hash table that stores some information
> about the kernel functions. The key of this hash table is the kernel
> function address, and the following data is stored in the hash value:
>
> * The BPF progs, whose type is FENTRY, FEXIT or MODIFY_RETURN. The struct
>   kfunc_md_tramp_prog is introduced to store the BPF prog and the cookie,
>   and it links the BPF progs of the same type into a list through the
>   "next" field.
> * The kernel function address
> * The kernel function argument count
> * Whether the origin call is needed
>
> The buckets of the hash table can grow and shrink when necessary. Alexei
> advised using rhashtable. However, the compiler is not clever enough and
> refused to inline the hash lookup for me, which brings additional
> overhead to the BPF global trampoline described below. I would have to
> replace "inline" with "__always_inline" for rhashtable_lookup_fast,
> rhashtable_lookup, __rhashtable_lookup and rht_key_get_hash to force it
> to inline the hash lookup. So I just implemented a hash table myself
> instead.
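
For readers following along, here is a minimal user-space sketch of the kind of table described above: keyed by function address, chained buckets, doubling when it fills up. It is not taken from the patch; md_entry, md_table and the md_* helpers are hypothetical names, and locking/RCU and the shrink path are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one per-function metadata entry. */
struct md_entry {
    struct md_entry *next;   /* bucket chain */
    unsigned long func;      /* key: function address */
    int nr_args;             /* number of function arguments */
    int need_origin_call;    /* set when the origin call is needed */
};

struct md_table {
    struct md_entry **buckets;
    size_t nr_buckets;       /* zero or a power of two */
    size_t nr_entries;
};

static size_t md_hash(unsigned long func, size_t nr_buckets)
{
    /* Multiplicative hash; use the upper bits, which are mixed best. */
    uint64_t h = (uint64_t)func * 0x9E3779B97F4A7C15ULL;

    return (size_t)(h >> 32) & (nr_buckets - 1);
}

struct md_entry *md_lookup(const struct md_table *t, unsigned long func)
{
    struct md_entry *e;

    if (!t->nr_buckets)
        return NULL;
    for (e = t->buckets[md_hash(func, t->nr_buckets)]; e; e = e->next)
        if (e->func == func)
            return e;
    return NULL;
}

/* Move every entry into a fresh bucket array. Returns 0 on success. */
static int md_resize(struct md_table *t, size_t new_size)
{
    struct md_entry **nb;
    size_t i;

    assert(new_size && !(new_size & (new_size - 1)));
    nb = calloc(new_size, sizeof(*nb));
    if (!nb)
        return -1;
    for (i = 0; i < t->nr_buckets; i++) {
        struct md_entry *e = t->buckets[i], *next;

        for (; e; e = next) {
            size_t b = md_hash(e->func, new_size);

            next = e->next;
            e->next = nb[b];
            nb[b] = e;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nr_buckets = new_size;
    return 0;
}

/* Return the entry for func, creating it (and growing the table) if needed. */
struct md_entry *md_get_or_create(struct md_table *t, unsigned long func,
                                  int nr_args)
{
    struct md_entry *e = md_lookup(t, func);
    size_t b;

    if (e)
        return e;
    if (t->nr_entries >= t->nr_buckets &&
        md_resize(t, t->nr_buckets ? t->nr_buckets * 2 : 8))
        return NULL;
    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->func = func;
    e->nr_args = nr_args;
    b = md_hash(func, t->nr_buckets);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nr_entries++;
    return e;
}
```

A zero-initialized struct md_table is a valid empty table. The lookup is a hash, a mask and a short chain walk, which is the part that has to stay cheap on the trampoline hot path.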
>
> bpf global trampoline
> ---------------------
> The bpf global trampoline is similar to the general bpf trampoline. The
> bpf trampoline stores the bpf progs and some metadata in the trampoline
> instructions directly. The bpf global trampoline, however, stores the
> metadata in the function metadata and gets it with kfunc_md_get_rcu().
> This makes the bpf global trampoline more flexible, and it can be used
> for all kernel functions.
>
> The bpf global trampoline is designed to implement the tracing multi-link
> for FENTRY, FEXIT and MODIFY_RETURN.
>
> The global trampoline is implemented mostly in C. We implement the entry
> of the trampoline with a "__naked" function, which saves the regs to an
> array on the stack and calls bpf_global_caller_run(). The entry passes
> the address of the array and the address of the rip to
> bpf_global_caller_run().
>
> The whole idea of implementing the trampoline in C is inspired by Alexei
> in [3]. Implementing it in C does have advantages. Some function calls,
> such as __bpf_prog_enter_recur, __bpf_prog_exit_recur, __bpf_tramp_enter
> and __bpf_tramp_exit, are inlined, which reduces some overhead. The
> performance of the global trampoline can be seen below.
>
> bpf global trampoline link
> --------------------------
> We reuse part of the code in [2] to implement the tracing multi-link. The
> struct bpf_gtramp_link is introduced for the bpf global trampoline link.
> Similar to the bpf trampoline link, the bpf global trampoline link has
> bpf_gtrampoline_link_prog() and bpf_gtrampoline_unlink_prog() to link and
> unlink the bpf progs.
>
> The "entries" in the bpf_gtramp_link is an array of struct
> bpf_gtramp_link_entry, which contains all the information about the
> functions that we trace, such as the address, the number of args, the
> cookie and so on.
>
> The bpf global trampoline is much simpler than the bpf trampoline, and we
> introduce the new struct bpf_global_trampoline for it. The "image" field
> is a pointer to bpf_global_caller_x. We introduce the global trampoline
> array, and a kernel function with argument count "x" is handled by the
> global trampoline global_tr_array[x]. We implement the global trampoline
> based on direct ftrace, and the "fops" field is for this purpose. This
> means bpf2bpf is not supported by the tracing multi-link.
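
To illustrate the "one trampoline per argument count" idea, a small user-space sketch follows. It is an assumption-laden illustration, not the patch code: MAX_ARGS, caller_run(), tr_array and tramp_for() are hypothetical names, the limit of 6 arguments is only assumed here, and the body of caller_run() is a stand-in for the real metadata lookup and prog dispatch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 6  /* assumption for this sketch only */

typedef int (*caller_fn)(unsigned long *args, unsigned long *ip);

/* Hypothetical counterpart of struct bpf_global_trampoline. */
struct global_trampoline {
    caller_fn image;  /* the caller specialised for one argument count */
};

/*
 * Shared body. Because every wrapper passes a constant nr_args, the
 * compiler can specialise the loop per argument count once inlined.
 */
static inline int caller_run(unsigned long *args, unsigned long *ip,
                             int nr_args)
{
    unsigned long sum = 0;
    int i;

    assert(args && ip);
    /* Stand-in work: a real caller would run the attached progs. */
    for (i = 0; i < nr_args; i++)
        sum += args[i];
    return (int)sum;
}

#define DEFINE_CALLER(n)                                              \
    static int caller_##n(unsigned long *args, unsigned long *ip)     \
    {                                                                 \
        return caller_run(args, ip, n);                               \
    }

DEFINE_CALLER(0)
DEFINE_CALLER(1)
DEFINE_CALLER(2)
DEFINE_CALLER(3)
DEFINE_CALLER(4)
DEFINE_CALLER(5)
DEFINE_CALLER(6)

/* Index x holds the trampoline for functions with x arguments. */
const struct global_trampoline tr_array[MAX_ARGS + 1] = {
    { caller_0 }, { caller_1 }, { caller_2 }, { caller_3 },
    { caller_4 }, { caller_5 }, { caller_6 },
};

/* Pick the trampoline for a function, or NULL if it has too many args. */
const struct global_trampoline *tramp_for(int nr_args)
{
    if (nr_args < 0 || nr_args > MAX_ARGS)
        return NULL;
    return &tr_array[nr_args];
}
```

The table is fixed at build time, so choosing a trampoline at attach time is a bounds check and an array index.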
>
> When we link the bpf prog, we add it to the kfunc_md of all the target
> functions. Then, we get all the function addresses that have bpf progs
> with kfunc_md_bpf_ips(), and reset the ftrace filter of the fops to them.
> Direct ftrace doesn't support resetting the filter functions yet, so we
> introduce reset_ftrace_direct_ips() to do this work.
>
> tracing multi-link
> ------------------
> Most of the code of this part comes from the series [2].
>
> In the 6th patch, we add support for recording the indexes of the
> function args of the target that the tracing program accesses. Meanwhile,
> we add the function btf_check_func_part_match() to compare the accessed
> function args of two function prototypes. This function will be used in
> a later commit.
>
> In the 7th patch, we refactor the struct modules_array to ptr_array, as
> we need a similar structure to hold the target btf, target program and
> kernel modules that we reference in the following commits.
>
> In the 11th patch, we implement the multi-link support for tracing, and
> the following new attach types are added:
>
> BPF_TRACE_FENTRY_MULTI
> BPF_TRACE_FEXIT_MULTI
> BPF_MODIFY_RETURN_MULTI
>
> We introduce the struct bpf_tracing_multi_link for this purpose, which
> can hold all the kernel modules, target bpf program (for attaching to bpf
> program) or target btf (for attaching to kernel function) that we
> referenced.
>
> During loading, the first target is used for verification by the verifier.
> And during attaching, we check the consistency of all the targets with
> the first target.
>
> performance comparison
> ----------------------
> We have implemented the following performance tests in the selftests in
> bench_trigger.c:
>
> - trig-fentry-multi
> - trig-fentry-multi-all
> - trig-fexit-multi
> - trig-fmodret-multi
>
> The "fentry_multi_all" test is used to measure the performance of the
> function metadata hash table, and all the kernel functions are hooked
> during the test.
>
> Mitigations are disabled during the tests. They are enabled by default
> in the kernel, and we can disable them with the "mitigations=off"
> cmdline option for testing.
>
> The tests are run with the command:
> ./run_bench_trigger.sh fentry fentry-multi fentry-multi-all fexit \
> fexit-multi fmodret fmodret-multi
>
> Following are the test results, and the unit is "M/s":
>
> fentry  | fm     | fm_all | fexit  | fexit-multi | fmodret | fmodret-multi
> 103.303 | 94.532 | 98.009 | 55.155 | 55.448      | 58.632  | 56.379
> 107.564 | 98.007 | 97.857 | 55.278 | 53.997      | 59.485  | 55.855
> 106.841 | 97.483 | 95.064 | 55.715 | 55.502      | 59.442  | 56.126
> 109.852 | 97.486 | 93.161 | 56.432 | 55.494      | 59.454  | 56.178
> 109.791 | 97.973 | 96.728 | 55.729 | 55.363      | 59.445  | 56.228
>
> * fm: fentry-multi, fm_all: fentry-multi-all
>
> Following are the results of running all the bench tests:
>
> usermode-count : 746.907 ± 0.323M/s
> kernel-count : 313.423 ± 0.031M/s
> syscall-count : 18.179 ± 0.013M/s
> fentry : 107.149 ± 0.051M/s
> fexit : 56.565 ± 0.019M/s
> fmodret : 59.495 ± 0.024M/s
> fentry-multi : 99.073 ± 0.087M/s
> fentry-multi-all: 97.920 ± 0.095M/s
> fexit-multi : 55.426 ± 0.045M/s
> fmodret-multi : 56.589 ± 0.163M/s
> rawtp : 166.774 ± 0.137M/s
> tp : 61.947 ± 0.035M/s
> kprobe : 43.719 ± 0.018M/s
> kprobe-multi : 47.451 ± 0.087M/s
> kretprobe : 18.358 ± 0.026M/s
> kretprobe-multi: 24.523 ± 0.016M/s
>
> From the above test data, it can be seen that the performance of
> fentry-multi is approximately 10% worse than that of fentry, fmodret-multi
> is ~5% worse than fmodret, and fexit-multi is almost the same as fexit.
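
As a quick cross-check of those percentages, the sketch below recomputes the relative slowdown from the means of the five-run table quoted above. The helper names (mean, slowdown_pct) are made up for this note; the numbers are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

#define RUNS 5

/* Throughput in M/s, copied from the five-run table. */
const double fentry[RUNS]        = { 103.303, 107.564, 106.841, 109.852, 109.791 };
const double fentry_multi[RUNS]  = {  94.532,  98.007,  97.483,  97.486,  97.973 };
const double fexit[RUNS]         = {  55.155,  55.278,  55.715,  56.432,  55.729 };
const double fexit_multi[RUNS]   = {  55.448,  53.997,  55.502,  55.494,  55.363 };
const double fmodret[RUNS]       = {  58.632,  59.485,  59.442,  59.454,  59.445 };
const double fmodret_multi[RUNS] = {  56.379,  55.855,  56.126,  56.178,  56.228 };

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* How much slower "multi" is than "base", in percent of base. */
double slowdown_pct(double base, double multi)
{
    return (base - multi) / base * 100.0;
}
```

Calling slowdown_pct(mean(fentry, RUNS), mean(fentry_multi, RUNS)) gives roughly 9.7%, the fmodret pair gives roughly 5.3%, and the fexit pair roughly 0.9%, which lines up with the figures quoted from the cover letter.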
>
> The bpf global trampoline has additional overhead in comparison with the
> bpf trampoline:
> 1. We do more checks. We check if the origin call is needed, if the prog
>    is sleepable, etc., in the global trampoline.
> 2. We do more memory reads and writes. We need to load the bpf progs from
>    memory, and save additional regs to the stack.
> 3. The function metadata lookup.
> 3. The function metadata lookup.
>
> However, we also have some optimizations:
> 1. For fentry, we avoid 2 function calls: __bpf_prog_enter_recur and
>    __bpf_prog_exit_recur, as we make them inline in our case.
> 2. For fexit/fmodret, we avoid another 2 function calls:
>    __bpf_tramp_enter and __bpf_tramp_exit, by inlining them.
>
> The performance of fentry-multi is close to that of fentry-multi-all,
> which suggests the hash table lookup is O(1) and fast enough.
>
> Further work
> ------------
> The performance of the global trampoline can be optimized further.
>
> First, we can avoid some checks by generating more bpf_global_caller
> variants, such as:
>
> static __always_inline notrace int
> bpf_global_caller_run(unsigned long *args, unsigned long *ip, int nr_args,
>                       bool sleepable, bool do_origin)
> {
>         xxxxxx
> }
>
> static __always_used __no_stack_protector notrace int
> bpf_global_caller_2_sleep_origin(unsigned long *args, unsigned long *ip)
> {
>         /* nr_args = 2, sleepable = 1, do_origin = 1 */
>         return bpf_global_caller_run(args, ip, 2, 1, 1);
> }
>
> The bpf global caller "bpf_global_caller_2_sleep_origin" can be used for
> functions that have 2 function args, have sleepable bpf progs, and have
> fexit or modify_return. The checks of sleepable and origin call will be
> optimized away by the compiler, as they are constants.
>
> Second, we can implement the function metadata with the function padding.
> The hash table lookup for metadata consumes ~15 instructions. With
> function padding, it needs only 5 instructions, and will be faster.
>
> Besides the performance, we also need to make the global trampoline
> cooperate with the bpf trampoline. For now, if FENTRY_MULTI is attached
> to a target that already has FENTRY on it, -EEXIST is returned. So we
> need another series to make them work together.
>
> Changes since V1:
>
> * remove the function metadata that is based on function padding, and
>   implement it with a resizable hash table.
> * rewrite the bpf global trampoline in C.
> * use the existing bpf bench framework for bench testing.
> * remove the part that makes tracing-multi compatible with tracing.
>
> Link: https://lore.kernel.org/all/20250303132837.498938-1-dongml2@chinatelecom.cn/ [1]
> Link: https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@bytedance.com/ [2]
> Link: https://lore.kernel.org/bpf/CAADnVQ+G+mQPJ+O1Oc9+UW=J17CGNC5B=usCmUDxBA-ze+gZGw@mail.gmail.com/ [3]
> Menglong Dong (18):
> bpf: add function hash table for tracing-multi
> x86,bpf: add bpf_global_caller for global trampoline
> ftrace: factor out ftrace_direct_update from register_ftrace_direct
> ftrace: add reset_ftrace_direct_ips
> bpf: introduce bpf_gtramp_link
> bpf: tracing: add support to record and check the accessed args
> bpf: refactor the modules_array to ptr_array
> bpf: verifier: add btf to the function args of bpf_check_attach_target
> bpf: verifier: move btf_id_deny to bpf_check_attach_target
> x86,bpf: factor out arch_bpf_get_regs_nr
> bpf: tracing: add multi-link support
> libbpf: don't free btf if tracing_multi progs existing
> libbpf: support tracing_multi
> libbpf: add btf type hash lookup support
> libbpf: add skip_invalid and attach_tracing for tracing_multi
> selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> selftests/bpf: add basic testcases for tracing_multi
> selftests/bpf: add bench tests for tracing_multi
>
> arch/x86/Kconfig | 4 +
> arch/x86/net/bpf_jit_comp.c | 290 ++++++++++++-
> include/linux/bpf.h | 59 +++
> include/linux/bpf_tramp.h | 72 ++++
> include/linux/bpf_types.h | 1 +
> include/linux/bpf_verifier.h | 1 +
> include/linux/btf.h | 3 +-
> include/linux/ftrace.h | 7 +
> include/linux/kfunc_md.h | 91 ++++
> include/uapi/linux/bpf.h | 10 +
> kernel/bpf/Makefile | 1 +
> kernel/bpf/btf.c | 113 ++++-
> kernel/bpf/kfunc_md.c | 352 ++++++++++++++++
> kernel/bpf/syscall.c | 395 +++++++++++++++++-
> kernel/bpf/trampoline.c | 220 +++++++++-
> kernel/bpf/verifier.c | 161 ++++---
> kernel/trace/bpf_trace.c | 48 +--
> kernel/trace/ftrace.c | 183 +++++---
> net/bpf/test_run.c | 3 +
> net/core/bpf_sk_storage.c | 2 +
> net/sched/bpf_qdisc.c | 2 +-
> tools/bpf/bpftool/common.c | 3 +
> tools/include/uapi/linux/bpf.h | 10 +
> tools/lib/bpf/bpf.c | 10 +
> tools/lib/bpf/bpf.h | 6 +
> tools/lib/bpf/btf.c | 102 +++++
> tools/lib/bpf/btf.h | 6 +
> tools/lib/bpf/libbpf.c | 296 ++++++++++++-
> tools/lib/bpf/libbpf.h | 25 ++
> tools/lib/bpf/libbpf.map | 5 +
> tools/testing/selftests/bpf/Makefile | 2 +-
> tools/testing/selftests/bpf/bench.c | 8 +
> .../selftests/bpf/benchs/bench_trigger.c | 72 ++++
> .../selftests/bpf/benchs/run_bench_trigger.sh | 1 +
> .../selftests/bpf/prog_tests/fentry_fexit.c | 22 +-
> .../selftests/bpf/prog_tests/fentry_test.c | 79 +++-
> .../selftests/bpf/prog_tests/fexit_test.c | 79 +++-
> .../bpf/prog_tests/kprobe_multi_test.c | 220 +---------
> .../selftests/bpf/prog_tests/modify_return.c | 60 +++
> .../bpf/prog_tests/tracing_multi_link.c | 210 ++++++++++
> .../selftests/bpf/progs/fentry_multi_empty.c | 13 +
> .../selftests/bpf/progs/tracing_multi_test.c | 181 ++++++++
> .../selftests/bpf/progs/trigger_bench.c | 22 +
> .../selftests/bpf/test_kmods/bpf_testmod.c | 24 ++
> tools/testing/selftests/bpf/test_progs.c | 50 +++
> tools/testing/selftests/bpf/test_progs.h | 3 +
> tools/testing/selftests/bpf/trace_helpers.c | 283 +++++++++++++
> tools/testing/selftests/bpf/trace_helpers.h | 3 +
> 48 files changed, 3349 insertions(+), 464 deletions(-)
> create mode 100644 include/linux/bpf_tramp.h
> create mode 100644 include/linux/kfunc_md.h
> create mode 100644 kernel/bpf/kfunc_md.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
> create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_empty.c
> create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
>
> --
> 2.39.5
>
>