From: Jiri Olsa <olsajiri@gmail.com>
To: Menglong Dong <menglong8.dong@gmail.com>
Cc: alexei.starovoitov@gmail.com, rostedt@goodmis.org,
	bpf@vger.kernel.org, Menglong Dong <dongml2@chinatelecom.cn>
Subject: Re: [PATCH bpf-next v2 00/18] bpf: tracing multi-link support
Date: Fri, 4 Jul 2025 10:47:27 +0200
Message-ID: <aGeVH0VV_PRfOeZ9@krava>
In-Reply-To: <20250703121521.1874196-1-dongml2@chinatelecom.cn>

On Thu, Jul 03, 2025 at 08:15:03PM +0800, Menglong Dong wrote:
> (Thanks to Alexei's advice to implement the bpf global trampoline in C
> instead of asm, the performance of tracing-multi has improved
> significantly. The function metadata, implemented with a hash table, is
> also fast enough to satisfy our needs.)
> 
> For now, a BPF program of type BPF_PROG_TYPE_TRACING is not allowed to
> be attached to multiple hooks, so we have to create a BPF program for
> each kernel function that we want to trace, even though all the
> programs have the same (or similar) logic. This can consume extra memory
> and make program loading slow if we have plenty of kernel functions to
> trace.

hi,
what tree did you base your patchset on? I can't apply it on
bpf-next/master and I tried several other trees

thanks,
jirka

> 
> In this series, we add support for attaching a tracing BPF
> program to multiple hooks, which is similar to BPF_TRACE_KPROBE_MULTI.
> Generally speaking, this series can be divided into 4 parts:
> 
> 1. Add per-function metadata storage support.
> 2. Add bpf global trampoline support for x86_64.
> 3. Add bpf global trampoline link support.
> 4. Add tracing multi-link support.
> 
> per-function metadata storage
> -----------------------------
> The per-function metadata storage is the basis of the bpf global
> trampoline. In short, it's a hash table that stores some information
> about kernel functions. The key of this hash table is the kernel function
> address, and the following data is stored in the hash value:
> 
> * The BPF progs, whose type is FENTRY, FEXIT or MODIFY_RETURN. The struct
>   kfunc_md_tramp_prog is introduced to store the BPF prog and the cookie,
>   and links the BPF progs of the same type into a list with the "next"
>   field.
> * The kernel function address
> * The kernel function argument count
> * Whether the origin call is needed
> 
> The buckets of the hash table can grow and shrink when necessary. Alexei
> advised using rhashtable. However, the compiler is not clever enough and
> refused to inline the hash lookup for me, which brings additional
> overhead to the BPF global trampoline described below. I had to replace
> "inline" with "__always_inline" for rhashtable_lookup_fast,
> rhashtable_lookup, __rhashtable_lookup and rht_key_get_hash to force it
> to inline the hash lookup. So I just implemented a hash table myself
> instead.
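
A minimal userspace sketch of the idea above: a chained hash table keyed by
function address, with the metadata kept in the value. All names here
(func_md, md_lookup, md_insert) are hypothetical and are not the kfunc_md
API from the series. Resizing and RCU protection, which the real
implementation needs, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-function metadata; field names are illustrative only. */
struct func_md {
	uintptr_t func;        /* function address, used as the hash key */
	unsigned int nr_args;  /* number of function arguments */
	int call_origin;       /* nonzero if the origin call is needed */
	struct func_md *next;  /* next entry in the same bucket */
};

#define MD_BITS    4
#define MD_BUCKETS (1u << MD_BITS)

struct func_md_table {
	struct func_md *buckets[MD_BUCKETS];
};

/* Multiplicative hash: keep the top MD_BITS bits of the product. */
static unsigned int md_hash(uintptr_t func)
{
	uint64_t h = (uint64_t)func * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(h >> (64 - MD_BITS));
}

/* Walk one bucket chain; returns NULL if the address is not present. */
static struct func_md *md_lookup(const struct func_md_table *t, uintptr_t func)
{
	struct func_md *md;

	assert(t != NULL);
	for (md = t->buckets[md_hash(func)]; md; md = md->next)
		if (md->func == func)
			return md;
	return NULL;
}

/* Insert a new entry, or return the existing one for this address. */
static struct func_md *md_insert(struct func_md_table *t, uintptr_t func,
				 unsigned int nr_args, int call_origin)
{
	struct func_md *md = md_lookup(t, func);
	unsigned int h = md_hash(func);

	if (md)
		return md;
	md = calloc(1, sizeof(*md));
	if (!md)
		return NULL;
	md->func = func;
	md->nr_args = nr_args;
	md->call_origin = call_origin;
	md->next = t->buckets[h];
	t->buckets[h] = md;
	return md;
}
```

The lookup is one hash computation plus a short chain walk, which is the
part that has to be inlined into the trampoline's hot path.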
> 
> bpf global trampoline
> ---------------------
> The bpf global trampoline is similar to the general bpf trampoline. The
> bpf trampoline stores the bpf progs and some metadata in the trampoline
> instructions directly. However, the bpf global trampoline stores the
> metadata in the function metadata and gets it with kfunc_md_get_rcu().
> This makes the bpf global trampoline more flexible, and it can be used
> for all kernel functions.
> 
> The bpf global trampoline is designed to implement the tracing multi-link
> for FENTRY, FEXIT and MODIFY_RETURN.
> 
> The global trampoline is implemented mostly in C. We implement the entry
> of the trampoline with a "__naked" function, which saves the regs to
> an array on the stack and calls bpf_global_caller_run(). The entry
> passes the address of the array and the address of the rip to
> bpf_global_caller_run().
> 
> The whole idea of implementing the trampoline in C is inspired by Alexei
> in [3]. It does have advantages to implement it in C. Some function
> calls, such as __bpf_prog_enter_recur, __bpf_prog_exit_recur,
> __bpf_tramp_enter and __bpf_tramp_exit, are inlined, which reduces some
> overhead. The performance of the global trampoline can be seen below.
> 
> bpf global trampoline link
> --------------------------
> We reuse part of the code in [2] to implement the tracing multi-link. The
> struct bpf_gtramp_link is introduced for the bpf global trampoline link.
> Similar to the bpf trampoline link, the bpf global trampoline link has
> bpf_gtrampoline_link_prog() and bpf_gtrampoline_unlink_prog() to link and
> unlink the bpf progs.
> 
> The "entries" in the bpf_gtramp_link is an array of struct
> bpf_gtramp_link_entry, which contains all the information about the
> functions that we trace, such as the address, the number of args, the
> cookie and so on.
> 
> The bpf global trampoline is much simpler than the bpf trampoline, and we
> introduce the new struct bpf_global_trampoline for it. The "image" field
> is a pointer to bpf_global_caller_x. We introduce the global trampoline
> array, and a kernel function with arguments count "x" is handled by the
> global trampoline global_tr_array[x]. We implement the global trampoline
> based on the direct ftrace, and the "fops" field is used for this
> purpose. This means bpf2bpf is not supported by the tracing multi-link.
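
The per-argument-count array described above can be sketched in userspace
as a table of function pointers indexed by the argument count. The names
(demo_caller_N, demo_tr_array, demo_pick_caller) and the limit of 3
arguments are hypothetical and only illustrate the indexing; they are not
taken from the series.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ARGS 3	/* hypothetical limit, for illustration only */

typedef int (*demo_caller_fn)(const unsigned long *args);

/* Each caller handles exactly N arguments; here it just reports N. */
static int demo_caller_0(const unsigned long *args) { (void)args; return 0; }
static int demo_caller_1(const unsigned long *args) { (void)args; return 1; }
static int demo_caller_2(const unsigned long *args) { (void)args; return 2; }
static int demo_caller_3(const unsigned long *args) { (void)args; return 3; }

/* Entry x serves every traced function that takes x arguments. */
static const demo_caller_fn demo_tr_array[DEMO_MAX_ARGS + 1] = {
	demo_caller_0, demo_caller_1, demo_caller_2, demo_caller_3,
};

/* Pick the caller for a function; NULL if the count is unsupported. */
static demo_caller_fn demo_pick_caller(unsigned int nr_args)
{
	if (nr_args > DEMO_MAX_ARGS)
		return NULL;
	return demo_tr_array[nr_args];
}
```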
> 
> When we link the bpf prog, we add it to the kfunc_md of all the target
> functions. Then, we get all the function addresses that have bpf progs
> with kfunc_md_bpf_ips(), and reset the ftrace filter of the fops to them.
> The direct ftrace doesn't support resetting the filter functions yet, so
> we introduce reset_ftrace_direct_ips() to do this work.
> 
> tracing multi-link
> ------------------
> Most of the code of this part comes from the series [2].
> 
> In the 6th patch, we add support for recording the indexes of the target
> function args that the tracing program accesses. Meanwhile, we add the
> function btf_check_func_part_match() to compare the accessed function
> args of two function prototypes. This function will be used in a later
> commit.
> 
> In the 7th patch, we refactor the struct modules_array to ptr_array, as
> we need a similar structure to hold the target btf, target program and
> kernel modules that we reference in the following commits.
> 
> In the 11th patch, we implement the multi-link support for tracing, and
> the following new attach types are added:
> 
>   BPF_TRACE_FENTRY_MULTI
>   BPF_TRACE_FEXIT_MULTI
>   BPF_MODIFY_RETURN_MULTI
> 
> We introduce the struct bpf_tracing_multi_link for this purpose, which
> can hold all the kernel modules, target bpf programs (for attaching to
> bpf programs) or target btfs (for attaching to kernel functions) that we
> reference.
> 
> During loading, the first target is used for verification by the verifier.
> And during attaching, we check the consistency of all the targets with
> the first target.
> 
> performance comparison
> ----------------------
> We have implemented the following performance tests in the selftests in
> bench_trigger.c:
> 
> - trig-fentry-multi
> - trig-fentry-multi-all
> - trig-fexit-multi
> - trig-fmodret-multi
> 
> The "fentry_multi_all" is used to test the performance of the function
> metadata hash table, and all the kernel functions are hooked during the
> test.
> 
> Mitigations are disabled during the tests. They are enabled by default
> in the kernel, and we can disable them with the "mitigations=off" cmdline
> to do the testing.
> 
> The tests are run with the command:
>   ./run_bench_trigger.sh fentry fentry-multi fentry-multi-all fexit \
>                          fexit-multi fmodret fmodret-multi
> 
> The following are the test results, and the unit is "M/s":
> 
> fentry  | fm     | fm_all | fexit  | fexit-multi | fmodret | fmodret-multi
> 103.303 | 94.532 | 98.009 | 55.155 | 55.448      | 58.632  | 56.379 
> 107.564 | 98.007 | 97.857 | 55.278 | 53.997      | 59.485  | 55.855 
> 106.841 | 97.483 | 95.064 | 55.715 | 55.502      | 59.442  | 56.126 
> 109.852 | 97.486 | 93.161 | 56.432 | 55.494      | 59.454  | 56.178 
> 109.791 | 97.973 | 96.728 | 55.729 | 55.363      | 59.445  | 56.228
> 
> * fm: fentry-multi, fm_all: fentry-multi-all
> 
> The following are the results of running all the bench tests:
> 
>   usermode-count :  746.907 ± 0.323M/s
>   kernel-count   :  313.423 ± 0.031M/s 
>   syscall-count  :   18.179 ± 0.013M/s 
>   fentry         :  107.149 ± 0.051M/s 
>   fexit          :   56.565 ± 0.019M/s 
>   fmodret        :   59.495 ± 0.024M/s 
>   fentry-multi   :   99.073 ± 0.087M/s 
>   fentry-multi-all:   97.920 ± 0.095M/s 
>   fexit-multi    :   55.426 ± 0.045M/s 
>   fmodret-multi  :   56.589 ± 0.163M/s 
>   rawtp          :  166.774 ± 0.137M/s 
>   tp             :   61.947 ± 0.035M/s 
>   kprobe         :   43.719 ± 0.018M/s 
>   kprobe-multi   :   47.451 ± 0.087M/s 
>   kretprobe      :   18.358 ± 0.026M/s 
>   kretprobe-multi:   24.523 ± 0.016M/s
> 
> From the above test data, it can be seen that the performance of
> fentry-multi is approximately 10% worse than that of fentry,
> fmodret-multi is ~5% worse than fmodret, and fexit-multi is almost the
> same as fexit.
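
For reference, the relative gap can be recomputed from the mean throughputs
in the "all bench" listing above. The helper name slowdown_pct is
hypothetical; the formula is simply (base - multi) / base * 100.

```c
#include <assert.h>

/*
 * Relative slowdown of the multi variant, in percent.
 * Both inputs are throughputs in M/s; base must be positive.
 */
static double slowdown_pct(double base, double multi)
{
	assert(base > 0.0);
	return (base - multi) / base * 100.0;
}
```

With the means listed above, slowdown_pct(107.149, 99.073) is about 7.5
for fentry, slowdown_pct(59.495, 56.589) is about 4.9 for fmodret, and
slowdown_pct(56.565, 55.426) is about 2.0 for fexit. The fentry gap comes
out somewhat smaller here than in the five individual runs of the table.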
> 
> The bpf global trampoline has additional overhead in comparison with the
> bpf trampoline:
> 1. We do more checks. We check if the origin call is needed, if the prog
>    is sleepable, etc, in the global trampoline.
> 2. We do more memory reads and writes. We need to load the bpf progs from
>    memory, and save additional regs to the stack.
> 3. The function metadata lookup.
> 
> However, we also have some optimizations:
> 1. For fentry, we avoid 2 function calls: __bpf_prog_enter_recur and
>    __bpf_prog_exit_recur, as we make them inline in our case.
> 2. For fexit/fmodret, we avoid another 2 function calls:
>    __bpf_tramp_enter and __bpf_tramp_exit, by inlining them.
> 
> The performance of fentry-multi is close to that of fentry-multi-all,
> which means the hash table lookup is O(1) and fast enough.
> 
> Further work
> ------------
> The performance of the global trampoline can be optimized further.
> 
> First, we can avoid some checks by generating more bpf_global_caller
> variants, such as:
> 
> static __always_inline notrace int
> bpf_global_caller_run(unsigned long *args, unsigned long *ip, int nr_args,
>                       bool sleepable, bool do_origin)
> {
>     xxxxxx
> }
> 
> static __always_used __no_stack_protector notrace int
> bpf_global_caller_2_sleep_origin(unsigned long *args, unsigned long *ip)
> {
>     return bpf_global_caller_run(args, ip, 2, 1, 1);
> }
> 
> And the bpf global caller "bpf_global_caller_2_sleep_origin" can be used
> for the functions that have 2 function args, have sleepable bpf progs,
> and have fexit or modify_return. The checks of sleepable and origin call
> will be optimized out by the compiler, as they are constants.
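
The specialization idea above can be tried in userspace with a small
sketch: one always-inline body takes the flags as parameters, and each
thin wrapper passes constants, so the compiler can drop the branches that
cannot be taken. The names demo_run, demo_caller_2_sleep_origin and
demo_caller_1_plain are hypothetical, and the arithmetic is a stand-in
for the real work, not the logic of bpf_global_caller_run().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Generic body. When nr_args, sleepable and do_origin are compile-time
 * constants at the call site, the loop bound and both branches can be
 * resolved by the compiler after inlining.
 */
static inline __attribute__((always_inline)) int
demo_run(const unsigned long *args, int nr_args, bool sleepable,
	 bool do_origin)
{
	int ret = 0;

	assert(args != NULL);
	for (int i = 0; i < nr_args; i++)
		ret += (int)args[i];
	if (sleepable)
		ret += 1000;	/* stand-in for the sleepable-only path */
	if (do_origin)
		ret += 100000;	/* stand-in for calling the origin function */
	return ret;
}

/* Specialized for 2 args, sleepable progs, origin call needed. */
static int demo_caller_2_sleep_origin(const unsigned long *args)
{
	return demo_run(args, 2, true, true);
}

/* Specialized for 1 arg, no sleepable progs, no origin call. */
static int demo_caller_1_plain(const unsigned long *args)
{
	return demo_run(args, 1, false, false);
}
```

The cost of this approach is one generated function per combination of
argument count and flags.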
> 
> Second, we can implement the function metadata with the function padding.
> The hash table lookup for metadata consumes ~15 instructions. With
> function padding, it needs only 5 instructions, and will be faster.
> 
> Besides the performance, we also need to make the global trampoline
> collaborate with the bpf trampoline. For now, FENTRY_MULTI cannot be
> attached to a target that already has FENTRY on it, and -EEXIST will be
> returned. So we need another series to make them work together.
> 
> Changes since V1:
> 
> * remove the function metadata that is based on function padding, and
>   implement it with a resizable hash table.
> * rewrite the bpf global trampoline in C.
> * use the existing bpf bench framework for bench testing.
> * remove the part that makes tracing-multi compatible with tracing.
> 
> Link: https://lore.kernel.org/all/20250303132837.498938-1-dongml2@chinatelecom.cn/ [1]
> Link: https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@bytedance.com/ [2]
> Link: https://lore.kernel.org/bpf/CAADnVQ+G+mQPJ+O1Oc9+UW=J17CGNC5B=usCmUDxBA-ze+gZGw@mail.gmail.com/ [3]
> Menglong Dong (18):
>   bpf: add function hash table for tracing-multi
>   x86,bpf: add bpf_global_caller for global trampoline
>   ftrace: factor out ftrace_direct_update from register_ftrace_direct
>   ftrace: add reset_ftrace_direct_ips
>   bpf: introduce bpf_gtramp_link
>   bpf: tracing: add support to record and check the accessed args
>   bpf: refactor the modules_array to ptr_array
>   bpf: verifier: add btf to the function args of bpf_check_attach_target
>   bpf: verifier: move btf_id_deny to bpf_check_attach_target
>   x86,bpf: factor out arch_bpf_get_regs_nr
>   bpf: tracing: add multi-link support
>   libbpf: don't free btf if tracing_multi progs existing
>   libbpf: support tracing_multi
>   libbpf: add btf type hash lookup support
>   libbpf: add skip_invalid and attach_tracing for tracing_multi
>   selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
>   selftests/bpf: add basic testcases for tracing_multi
>   selftests/bpf: add bench tests for tracing_multi
> 
>  arch/x86/Kconfig                              |   4 +
>  arch/x86/net/bpf_jit_comp.c                   | 290 ++++++++++++-
>  include/linux/bpf.h                           |  59 +++
>  include/linux/bpf_tramp.h                     |  72 ++++
>  include/linux/bpf_types.h                     |   1 +
>  include/linux/bpf_verifier.h                  |   1 +
>  include/linux/btf.h                           |   3 +-
>  include/linux/ftrace.h                        |   7 +
>  include/linux/kfunc_md.h                      |  91 ++++
>  include/uapi/linux/bpf.h                      |  10 +
>  kernel/bpf/Makefile                           |   1 +
>  kernel/bpf/btf.c                              | 113 ++++-
>  kernel/bpf/kfunc_md.c                         | 352 ++++++++++++++++
>  kernel/bpf/syscall.c                          | 395 +++++++++++++++++-
>  kernel/bpf/trampoline.c                       | 220 +++++++++-
>  kernel/bpf/verifier.c                         | 161 ++++---
>  kernel/trace/bpf_trace.c                      |  48 +--
>  kernel/trace/ftrace.c                         | 183 +++++---
>  net/bpf/test_run.c                            |   3 +
>  net/core/bpf_sk_storage.c                     |   2 +
>  net/sched/bpf_qdisc.c                         |   2 +-
>  tools/bpf/bpftool/common.c                    |   3 +
>  tools/include/uapi/linux/bpf.h                |  10 +
>  tools/lib/bpf/bpf.c                           |  10 +
>  tools/lib/bpf/bpf.h                           |   6 +
>  tools/lib/bpf/btf.c                           | 102 +++++
>  tools/lib/bpf/btf.h                           |   6 +
>  tools/lib/bpf/libbpf.c                        | 296 ++++++++++++-
>  tools/lib/bpf/libbpf.h                        |  25 ++
>  tools/lib/bpf/libbpf.map                      |   5 +
>  tools/testing/selftests/bpf/Makefile          |   2 +-
>  tools/testing/selftests/bpf/bench.c           |   8 +
>  .../selftests/bpf/benchs/bench_trigger.c      |  72 ++++
>  .../selftests/bpf/benchs/run_bench_trigger.sh |   1 +
>  .../selftests/bpf/prog_tests/fentry_fexit.c   |  22 +-
>  .../selftests/bpf/prog_tests/fentry_test.c    |  79 +++-
>  .../selftests/bpf/prog_tests/fexit_test.c     |  79 +++-
>  .../bpf/prog_tests/kprobe_multi_test.c        | 220 +---------
>  .../selftests/bpf/prog_tests/modify_return.c  |  60 +++
>  .../bpf/prog_tests/tracing_multi_link.c       | 210 ++++++++++
>  .../selftests/bpf/progs/fentry_multi_empty.c  |  13 +
>  .../selftests/bpf/progs/tracing_multi_test.c  | 181 ++++++++
>  .../selftests/bpf/progs/trigger_bench.c       |  22 +
>  .../selftests/bpf/test_kmods/bpf_testmod.c    |  24 ++
>  tools/testing/selftests/bpf/test_progs.c      |  50 +++
>  tools/testing/selftests/bpf/test_progs.h      |   3 +
>  tools/testing/selftests/bpf/trace_helpers.c   | 283 +++++++++++++
>  tools/testing/selftests/bpf/trace_helpers.h   |   3 +
>  48 files changed, 3349 insertions(+), 464 deletions(-)
>  create mode 100644 include/linux/bpf_tramp.h
>  create mode 100644 include/linux/kfunc_md.h
>  create mode 100644 kernel/bpf/kfunc_md.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
>  create mode 100644 tools/testing/selftests/bpf/progs/fentry_multi_empty.c
>  create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
> 
> -- 
> 2.39.5
> 
> 
