From: Jiri Olsa <olsajiri@gmail.com>
To: Menglong Dong <menglong8.dong@gmail.com>
Cc: alexei.starovoitov@gmail.com, rostedt@goodmis.org,
bpf@vger.kernel.org, Menglong Dong <dongml2@chinatelecom.cn>
Subject: Re: [PATCH bpf-next v2 00/18] bpf: tracing multi-link support
Date: Fri, 4 Jul 2025 10:47:27 +0200
Message-ID: <aGeVH0VV_PRfOeZ9@krava>
In-Reply-To: <20250703121521.1874196-1-dongml2@chinatelecom.cn>
On Thu, Jul 03, 2025 at 08:15:03PM +0800, Menglong Dong wrote:
> (Thanks to Alexei's advice to implement the bpf global trampoline in C
> instead of asm, the performance of tracing-multi has improved
> significantly. The function metadata, which is implemented with a hash
> table, is also fast enough to satisfy our needs.)
>
> For now, a BPF program of type BPF_PROG_TYPE_TRACING is not allowed to
> be attached to multiple hooks, so we have to create a BPF program for
> each kernel function that we want to trace, even though all the
> programs have the same (or similar) logic. This can consume extra memory
> and make program loading slow if we have plenty of kernel functions to
> trace.
hi,
what tree did you base your patchset on? I can't apply it on
bpf-next/master and I tried several other trees
thanks,
jirka
>
> In this series, we add support for attaching a tracing BPF program to
> multiple hooks, which is similar to BPF_TRACE_KPROBE_MULTI.
> Generally speaking, this series can be divided into 4 parts:
>
> 1. Add per-function metadata storage support.
> 2. Add bpf global trampoline support for x86_64.
> 3. Add bpf global trampoline link support.
> 4. Add tracing multi-link support.
>
> per-function metadata storage
> -----------------------------
> The per-function metadata storage is the basis of the bpf global
> trampoline. In short, it's a hash table that stores some information
> about the kernel functions. The key of this hash table is the kernel
> function address, and the following data is stored in the hash value:
>
> * The BPF progs, whose type is FENTRY, FEXIT or MODIFY_RETURN. The struct
> kfunc_md_tramp_prog is introduced to store the BPF prog and the cookie,
> and it chains the BPF progs of the same type into a list via the "next"
> field.
> * The kernel function address
> * The kernel function arguments count
> * Whether the origin call is needed
>
> The buckets of the hash table can grow and shrink when necessary. Alexei
> advised using rhashtable. However, the compiler refused to inline the
> hash lookup, which brings additional overhead to the BPF global
> trampoline described below. I had to replace "inline" with
> "__always_inline" for rhashtable_lookup_fast, rhashtable_lookup,
> __rhashtable_lookup and rht_key_get_hash to force the hash lookup to be
> inlined. In the end, I just implemented a hash table myself instead.
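
A minimal user-space sketch of such an address-keyed metadata table, for
illustration only. It is not the code from the series: the names
(struct func_md, md_lookup(), md_insert()) and the hash function are
assumptions, there is no RCU or locking, and bucket resizing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-function metadata; field names are illustrative only. */
struct func_md {
	struct func_md *next;	/* next entry in the same bucket */
	unsigned long addr;	/* key: the function address */
	unsigned int nr_args;	/* number of function arguments */
	int need_origin_call;	/* whether the origin call is needed */
};

struct func_md_table {
	struct func_md **buckets;
	size_t nr_buckets;	/* must be a power of two */
};

/* Multiplicative hash: mixes the address and keeps the upper bits. */
static size_t md_hash(unsigned long addr, size_t nr_buckets)
{
	uint64_t h = (uint64_t)addr * 0x9E3779B97F4A7C15ULL;

	return (size_t)(h >> 32) & (nr_buckets - 1);
}

static int md_table_init(struct func_md_table *t, size_t nr_buckets)
{
	assert(nr_buckets && (nr_buckets & (nr_buckets - 1)) == 0);
	t->buckets = calloc(nr_buckets, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	t->nr_buckets = nr_buckets;
	return 0;
}

/* Walk one bucket chain; returns NULL if the address is not tracked. */
static struct func_md *md_lookup(const struct func_md_table *t,
				 unsigned long addr)
{
	struct func_md *md = t->buckets[md_hash(addr, t->nr_buckets)];

	while (md && md->addr != addr)
		md = md->next;
	return md;
}

/* Return the existing entry for addr, or create one at the bucket head. */
static struct func_md *md_insert(struct func_md_table *t,
				 unsigned long addr, unsigned int nr_args)
{
	struct func_md *md = md_lookup(t, addr);
	size_t i;

	if (md)
		return md;
	md = calloc(1, sizeof(*md));
	if (!md)
		return NULL;
	md->addr = addr;
	md->nr_args = nr_args;
	i = md_hash(addr, t->nr_buckets);
	md->next = t->buckets[i];
	t->buckets[i] = md;
	return md;
}
```

The lookup is a hash computation plus a short chain walk, which is the
part that has to be inlined into the trampoline for the overhead to stay
low.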
>
> bpf global trampoline
> ---------------------
> The bpf global trampoline is similar to the general bpf trampoline. The
> bpf trampoline stores the bpf progs and some metadata in the trampoline
> instructions directly. However, the bpf global trampoline keeps this data
> in the function metadata and fetches it with kfunc_md_get_rcu(). This
> makes the bpf global trampoline more flexible, and it can be used for all
> kernel functions.
>
> The bpf global trampoline is designed to implement the tracing multi-link
> for FENTRY, FEXIT and MODIFY_RETURN.
>
> The global trampoline is implemented mostly in C. We implement the entry
> of the trampoline with a "__naked" function, which saves the regs to
> an array on the stack and calls bpf_global_caller_run(). The entry
> passes the address of the array and the address of the rip to
> bpf_global_caller_run().
>
> The whole idea of implementing the trampoline in C is inspired by Alexei
> in [3]. Implementing it in C does have advantages. Some function calls,
> such as __bpf_prog_enter_recur, __bpf_prog_exit_recur, __bpf_tramp_enter
> and __bpf_tramp_exit, are inlined, which reduces some overhead. The
> performance of the global trampoline can be seen below.
>
> bpf global trampoline link
> --------------------------
> We reuse part of the code in [2] to implement the tracing multi-link. The
> struct bpf_gtramp_link is introduced for the bpf global trampoline link.
> Similar to the bpf trampoline link, the bpf global trampoline link has
> bpf_gtrampoline_link_prog() and bpf_gtrampoline_unlink_prog() to link and
> unlink the bpf progs.
>
> The "entries" in the bpf_gtramp_link is an array of struct
> bpf_gtramp_link_entry, which contains all the information of the
> functions that we trace, such as the address, the number of args, the
> cookie and so on.
>
> The bpf global trampoline is much simpler than the bpf trampoline, and we
> introduce the new struct bpf_global_trampoline for it. The "image" field
> is a pointer to bpf_global_caller_x. We introduce the global trampoline
> array, and a kernel function with "x" arguments is handled by the global
> trampoline global_tr_array[x]. We implement the global trampoline on top
> of direct ftrace, and the "fops" field is used for this purpose. This
> means bpf2bpf is not supported by the tracing multi-link.
>
> When we link the bpf prog, we add it to the kfunc_md of every target
> function. Then, we get all the function addresses that have bpf progs with
> kfunc_md_bpf_ips(), and reset the ftrace filter of the fops to them.
> Direct ftrace doesn't support resetting the filter functions yet, so we
> introduce reset_ftrace_direct_ips() to do this work.
>
> tracing multi-link
> ------------------
> Most of the code of this part comes from the series [2].
>
> In the 6th patch, we add support for recording the indexes of the target's
> function args that a tracing program accesses. Meanwhile, we add the
> function btf_check_func_part_match() to compare the accessed function args
> of two function prototypes. This function will be used in a later commit.
>
> In the 7th patch, we refactor the struct modules_array to ptr_array, as
> we need similar functionality to hold the target btf, target program and
> kernel modules that we reference in the following commits.
>
> In the 11th patch, we implement the multi-link support for tracing, and
> the following new attach types are added:
>
> BPF_TRACE_FENTRY_MULTI
> BPF_TRACE_FEXIT_MULTI
> BPF_MODIFY_RETURN_MULTI
>
> We introduce the struct bpf_tracing_multi_link for this purpose, which
> can hold all the kernel modules, target bpf program (for attaching to bpf
> program) or target btf (for attaching to kernel function) that we
> referenced.
>
> During loading, the first target is used for verification by the verifier.
> And during attaching, we check the consistency of all the targets with
> the first target.
>
> performance comparison
> ----------------------
> We have implemented the following performance tests in the selftests in
> bench_trigger.c:
>
> - trig-fentry-multi
> - trig-fentry-multi-all
> - trig-fexit-multi
> - trig-fmodret-multi
>
> The "fentry_multi_all" test measures the performance of the function
> metadata hash table, and all kernel functions are hooked during the test.
>
> Mitigations are disabled during testing. They are enabled by default
> in the kernel, and we can disable them with the "mitigations=off" cmdline
> to do the testing.
>
> The tests are run with the command:
> ./run_bench_trigger.sh fentry fentry-multi fentry-multi-all fexit \
> fexit-multi fmodret fmodret-multi
>
> The following are the test results, in "M/s":
>
> fentry | fm | fm_all | fexit | fexit-multi | fmodret | fmodret-multi
> 103.303 | 94.532 | 98.009 | 55.155 | 55.448 | 58.632 | 56.379
> 107.564 | 98.007 | 97.857 | 55.278 | 53.997 | 59.485 | 55.855
> 106.841 | 97.483 | 95.064 | 55.715 | 55.502 | 59.442 | 56.126
> 109.852 | 97.486 | 93.161 | 56.432 | 55.494 | 59.454 | 56.178
> 109.791 | 97.973 | 96.728 | 55.729 | 55.363 | 59.445 | 56.228
>
> * fm: fentry-multi, fm_all: fentry-multi-all
>
> The following are the results of running all the bench tests:
>
> usermode-count : 746.907 ± 0.323M/s
> kernel-count : 313.423 ± 0.031M/s
> syscall-count : 18.179 ± 0.013M/s
> fentry : 107.149 ± 0.051M/s
> fexit : 56.565 ± 0.019M/s
> fmodret : 59.495 ± 0.024M/s
> fentry-multi : 99.073 ± 0.087M/s
> fentry-multi-all: 97.920 ± 0.095M/s
> fexit-multi : 55.426 ± 0.045M/s
> fmodret-multi : 56.589 ± 0.163M/s
> rawtp : 166.774 ± 0.137M/s
> tp : 61.947 ± 0.035M/s
> kprobe : 43.719 ± 0.018M/s
> kprobe-multi : 47.451 ± 0.087M/s
> kretprobe : 18.358 ± 0.026M/s
> kretprobe-multi: 24.523 ± 0.016M/s
>
> From the above test data, it can be seen that the performance of fentry-multi
> is approximately 10% worse than that of fentry, fmodret-multi is ~5%
> worse than fmodret, and fexit-multi is almost the same as fexit.
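
The relative slowdown quoted above is (base - multi) / base. A small
helper that makes the arithmetic explicit (the function name is
hypothetical, not something from the series):

```c
#include <stdio.h>

/*
 * Relative slowdown of the "multi" variant against its baseline, in
 * percent. Both inputs are throughputs in M/s, so higher is better.
 */
static double slowdown_pct(double base, double multi)
{
	return (base - multi) / base * 100.0;
}

/* Print one comparison line, e.g. for "fentry" vs "fentry-multi". */
static void report(const char *name, double base, double multi)
{
	printf("%-8s: %.1f%% slower\n", name, slowdown_pct(base, multi));
}
```

Fed with the full-bench figures quoted above, this gives roughly 7.5% for
fentry-multi (107.149 vs 99.073), 4.9% for fmodret-multi (59.495 vs
56.589) and 2.0% for fexit-multi (56.565 vs 55.426); the five-run table
gives a figure closer to 10% for fentry-multi.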
>
> The bpf global trampoline has additional overhead in comparison with the
> bpf trampoline:
> 1. We do more checks. We check if the origin call is needed, if the prog
> is sleepable, etc, in the global trampoline.
> 2. We do more memory reads and writes. We need to load the bpf progs from
> memory, and save additional regs to the stack.
> 3. The function metadata lookup.
>
> However, we also have some optimizations:
> 1. For fentry, we avoid 2 function calls: __bpf_prog_enter_recur and
> __bpf_prog_exit_recur, as we make them inline in our case.
> 2. For fexit/fmodret, we avoid another 2 function calls: __bpf_tramp_enter
> and __bpf_tramp_exit, by inlining them.
>
> The performance of fentry-multi is close to that of fentry-multi-all,
> which suggests the hash table lookup is O(1) and fast enough.
>
> Further work
> ------------
> The performance of the global trampoline can be optimized further.
>
> First, we can avoid some checks by generating more bpf_global_caller
> variants, such as:
>
> static __always_inline notrace int
> bpf_global_caller_run(unsigned long *args, unsigned long *ip, int nr_args,
> bool sleepable, bool do_origin)
> {
> xxxxxx
> }
>
> static __always_used __no_stack_protector notrace int
> bpf_global_caller_2_sleep_origin(unsigned long *args, unsigned long *ip)
> {
> return bpf_global_caller_run(args, ip, 2, 1, 1);
> }
>
> The bpf global caller "bpf_global_caller_2_sleep_origin" can then be used
> for the functions that have 2 function args, have sleepable bpf progs,
> and have fexit or modify_return. The checks for sleepable and origin call
> will be optimized away by the compiler, as they are constants.
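
A user-space sketch of the same specialization pattern, assuming GCC or
Clang. caller_run() and the two wrappers are hypothetical stand-ins, not
the kernel functions; the body only marks which branches were taken so
that the effect of the constant arguments is observable.

```c
#include <stdbool.h>

/*
 * Generic runner. Because it is always inlined, a wrapper that passes
 * constant nr_args/sleepable/do_origin lets the compiler fold the
 * branches and the loop bound at compile time.
 */
static inline __attribute__((always_inline)) int
caller_run(const unsigned long *args, int nr_args, bool sleepable,
	   bool do_origin)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < nr_args; i++)
		sum += args[i];
	if (sleepable)		/* stands in for the sleepable path */
		sum += 1000;
	if (do_origin)		/* stands in for the origin-call path */
		sum += 100000;
	return (int)sum;
}

/* Specialized for 2 args, sleepable progs, origin call needed. */
static int caller_2_sleep_origin(const unsigned long *args)
{
	return caller_run(args, 2, true, true);
}

/* Specialized for 1 arg, non-sleepable progs, no origin call. */
static int caller_1_plain(const unsigned long *args)
{
	return caller_run(args, 1, false, false);
}
```

Each wrapper costs one extra copy of the runner's code, so the number of
generated variants trades text size against the checks that are removed.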
>
> Second, we can implement the function metadata with the function padding.
> The hash table lookup for metadata consumes ~15 instructions. With
> function padding, it needs only 5 instructions, and will be faster.
>
> Besides the performance, we also need to make the global trampoline
> cooperate with the bpf trampoline. For now, attaching FENTRY_MULTI to a
> target that already has FENTRY on it fails with -EEXIST. So we need
> another series to make them work together.
>
> Changes since V1:
>
> * remove the function metadata that is based on function padding, and
> implement it with a resizable hash table.
> * rewrite the bpf global trampoline in C.
> * use the existing bpf bench framework for bench testing.
> * remove the part that makes tracing-multi compatible with tracing.
>
> Link: https://lore.kernel.org/all/20250303132837.498938-1-dongml2@chinatelecom.cn/ [1]
> Link: https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@bytedance.com/ [2]
> Link: https://lore.kernel.org/bpf/CAADnVQ+G+mQPJ+O1Oc9+UW=J17CGNC5B=usCmUDxBA-ze+gZGw@mail.gmail.com/ [3]
> Menglong Dong (18):
> bpf: add function hash table for tracing-multi
> x86,bpf: add bpf_global_caller for global trampoline
> ftrace: factor out ftrace_direct_update from register_ftrace_direct
> ftrace: add reset_ftrace_direct_ips
> bpf: introduce bpf_gtramp_link
> bpf: tracing: add support to record and check the accessed args
> bpf: refactor the modules_array to ptr_array
> bpf: verifier: add btf to the function args of bpf_check_attach_target
> bpf: verifier: move btf_id_deny to bpf_check_attach_target
> x86,bpf: factor out arch_bpf_get_regs_nr
> bpf: tracing: add multi-link support
> libbpf: don't free btf if tracing_multi progs existing
> libbpf: support tracing_multi
> libbpf: add btf type hash lookup support
> libbpf: add skip_invalid and attach_tracing for tracing_multi
> selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> selftests/bpf: add basic testcases for tracing_multi
> selftests/bpf: add bench tests for tracing_multi
>
> arch/x86/Kconfig | 4 +
> arch/x86/net/bpf_jit_comp.c | 290 ++++++++++++-
> include/linux/bpf.h | 59 +++
> include/linux/bpf_tramp.h | 72 ++++
> include/linux/bpf_types.h | 1 +
> include/linux/bpf_verifier.h | 1 +
> include/linux/btf.h | 3 +-
> include/linux/ftrace.h | 7 +
> include/linux/kfunc_md.h | 91 ++++
> include/uapi/linux/bpf.h | 10 +
> kernel/bpf/Makefile | 1 +
> kernel/bpf/btf.c | 113 ++++-
> kernel/bpf/kfunc_md.c | 352 ++++++++++++++++
> kernel/bpf/syscall.c | 395 +++++++++++++++++-
> kernel/bpf/trampoline.c | 220 +++++++++-
> kernel/bpf/verifier.c | 161 ++++---
> kernel/trace/bpf_trace.c | 48 +--
> kernel/trace/ftrace.c | 183 +++++---
> net/bpf/test_run.c | 3 +
> net/core/bpf_sk_storage.c | 2 +
> net/sched/bpf_qdisc.c | 2 +-
> tools/bpf/bpftool/common.c | 3 +
> tools/include/uapi/linux/bpf.h | 10 +
> tools/lib/bpf/bpf.c | 10 +
> tools/lib/bpf/bpf.h | 6 +
> tools/lib/bpf/btf.c | 102 +++++
> tools/lib/bpf/btf.h | 6 +
> tools/lib/bpf/libbpf.c | 296 ++++++++++++-
> tools/lib/bpf/libbpf.h | 25 ++
> tools/lib/bpf/libbpf.map | 5 +
> tools/testing/selftests/bpf/Makefile | 2 +-
> tools/testing/selftests/bpf/bench.c | 8 +
> .../selftests/bpf/benchs/bench_trigger.c | 72 ++++
> .../selftests/bpf/benchs/run_bench_trigger.sh | 1 +
> .../selftests/bpf/prog_tests/fentry_fexit.c | 22 +-
> .../selftests/bpf/prog_tests/fentry_test.c | 79 +++-
> .../selftests/bpf/prog_tests/fexit_test.c | 79 +++-
> .../bpf/prog_tests/kprobe_multi_test.c | 220 +---------
> .../selftests/bpf/prog_tests/modify_return.c | 60 +++
> .../bpf/prog_tests/tracing_multi_link.c | 210 ++++++++++
> .../selftests/bpf/progs/fentry_multi_empty.c | 13 +
> .../selftests/bpf/progs/tracing_multi_test.c | 181 ++++++++
> .../selftests/bpf/progs/trigger_bench.c | 22 +
> .../selftests/bpf/test_kmods/bpf_testmod.c | 24 ++
> tools/testing/selftests/bpf/test_progs.c | 50 +++
> tools/testing/selftests/bpf/test_progs.h | 3 +
> tools/testing/selftests/bpf/trace_helpers.c | 283 +++++++++++++
> tools/testing/selftests/bpf/trace_helpers.h | 3 +
> 48 files changed, 3349 insertions(+), 464 deletions(-)
> create mode 100644 include/linux/bpf_tramp.h
> create mode 100644 include/linux/kfunc_md.h
> create mode 100644 kernel/bpf/kfunc_md.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
> create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_empty.c
> create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
>
> --
> 2.39.5
>
>
Thread overview: 86+ messages
2025-07-03 12:15 [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi Menglong Dong
2025-07-04 16:07 ` kernel test robot
2025-07-15 1:55 ` Alexei Starovoitov
2025-07-15 2:37 ` Menglong Dong
2025-07-15 2:49 ` Alexei Starovoitov
2025-07-15 3:13 ` Menglong Dong
2025-07-15 9:06 ` Menglong Dong
2025-07-15 16:22 ` Alexei Starovoitov
2025-07-03 12:15 ` [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline Menglong Dong
2025-07-15 2:25 ` Alexei Starovoitov
2025-07-15 8:36 ` Menglong Dong
2025-07-15 9:30 ` Menglong Dong
2025-07-16 16:56 ` Inlining migrate_disable/enable. Was: " Alexei Starovoitov
2025-07-16 18:24 ` Peter Zijlstra
2025-07-16 22:35 ` Alexei Starovoitov
2025-07-16 22:49 ` Steven Rostedt
2025-07-16 22:50 ` Steven Rostedt
2025-07-28 9:20 ` Menglong Dong
2025-07-31 16:15 ` Alexei Starovoitov
2025-08-01 1:42 ` Menglong Dong
2025-08-06 8:44 ` Menglong Dong
2025-08-08 0:58 ` Alexei Starovoitov
2025-08-08 5:48 ` Menglong Dong
2025-08-08 6:32 ` Menglong Dong
2025-08-08 15:47 ` Alexei Starovoitov
2025-07-15 16:35 ` Alexei Starovoitov
2025-07-16 13:05 ` Menglong Dong
2025-07-17 0:59 ` multi-fentry proposal. Was: " Alexei Starovoitov
2025-07-17 1:50 ` Menglong Dong
2025-07-17 2:13 ` Alexei Starovoitov
2025-07-17 2:37 ` Menglong Dong
2025-07-16 14:40 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 03/18] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
2025-07-05 2:41 ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 04/18] ftrace: add reset_ftrace_direct_ips Menglong Dong
2025-07-03 15:30 ` Steven Rostedt
2025-07-04 1:54 ` Menglong Dong
2025-07-07 18:52 ` Steven Rostedt
2025-07-08 1:26 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 05/18] bpf: introduce bpf_gtramp_link Menglong Dong
2025-07-04 7:00 ` kernel test robot
2025-07-04 7:52 ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 06/18] bpf: tracing: add support to record and check the accessed args Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-14 23:45 ` Menglong Dong
2025-07-15 17:11 ` Andrii Nakryiko
2025-07-16 12:50 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 07/18] bpf: refactor the modules_array to ptr_array Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 08/18] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 09/18] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 10/18] x86,bpf: factor out arch_bpf_get_regs_nr Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 11/18] bpf: tracing: add multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 12/18] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 1:15 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 13/18] libbpf: support tracing_multi Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 1:58 ` Menglong Dong
2025-07-15 17:20 ` Andrii Nakryiko
2025-07-16 12:43 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 14/18] libbpf: add btf type hash lookup support Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 4:40 ` Menglong Dong
2025-07-15 17:20 ` Andrii Nakryiko
2025-07-16 11:53 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 15/18] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 5:48 ` Menglong Dong
2025-07-15 17:23 ` Andrii Nakryiko
2025-07-16 11:46 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 16/18] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 17/18] selftests/bpf: add basic testcases for tracing_multi Menglong Dong
2025-07-08 20:07 ` Alexei Starovoitov
2025-07-09 1:33 ` Menglong Dong
2025-07-14 23:49 ` Ihor Solodrai
2025-07-16 0:26 ` Ihor Solodrai
2025-07-16 0:31 ` Alexei Starovoitov
2025-07-16 0:34 ` Ihor Solodrai
2025-07-03 12:15 ` [PATCH bpf-next v2 18/18] selftests/bpf: add bench tests " Menglong Dong
2025-07-04 8:47 ` Jiri Olsa [this message]
2025-07-04 8:52 ` [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Menglong Dong
2025-07-04 8:58 ` Menglong Dong
2025-07-04 9:12 ` Jiri Olsa
2025-07-15 2:31 ` Alexei Starovoitov
2025-07-15 2:44 ` Menglong Dong