From: Menglong Dong <menglong.dong@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Menglong Dong <menglong8.dong@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
Menglong Dong <dongml2@chinatelecom.cn>,
"H. Peter Anvin" <hpa@zytor.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
LKML <linux-kernel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline
Date: Tue, 15 Jul 2025 16:36:57 +0800 [thread overview]
Message-ID: <45f4d349-7b08-45d3-9bec-3ab75217f9b6@linux.dev> (raw)
In-Reply-To: <CAADnVQKP1-gdmq1xkogFeRM6o3j2zf0Q8Atz=aCEkB0PkVx++A@mail.gmail.com>
On 7/15/25 10:25, Alexei Starovoitov wrote:
> On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>> +static __always_inline void
>> +do_origin_call(unsigned long *args, unsigned long *ip, int nr_args)
>> +{
>> + /* Following code will be optimized by the compiler, as nr_args
>> + * is a const, and there will be no condition here.
>> + */
>> + if (nr_args == 0) {
>> + asm volatile(
>> + RESTORE_ORIGIN_0 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + :
>> + );
>> + } else if (nr_args == 1) {
>> + asm volatile(
>> + RESTORE_ORIGIN_1 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi"
>> + );
>> + } else if (nr_args == 2) {
>> + asm volatile(
>> + RESTORE_ORIGIN_2 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi", "rsi"
>> + );
>> + } else if (nr_args == 3) {
>> + asm volatile(
>> + RESTORE_ORIGIN_3 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi", "rsi", "rdx"
>> + );
>> + } else if (nr_args == 4) {
>> + asm volatile(
>> + RESTORE_ORIGIN_4 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi", "rsi", "rdx", "rcx"
>> + );
>> + } else if (nr_args == 5) {
>> + asm volatile(
>> + RESTORE_ORIGIN_5 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi", "rsi", "rdx", "rcx", "r8"
>> + );
>> + } else if (nr_args == 6) {
>> + asm volatile(
>> + RESTORE_ORIGIN_6 CALL_NOSPEC "\n"
>> + "movq %%rax, %0\n"
>> + : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
>> + : [args]"r"(args), [thunk_target]"r"(*ip)
>> + : "rdi", "rsi", "rdx", "rcx", "r8", "r9"
>> + );
>> + }
>> +}
> What is the performance difference between 0-6 variants?
> I would think save/restore of regs shouldn't be that expensive.
> bpf trampoline saves only what's necessary because it can do
> this micro optimization, but for this one, I think, doing
> _one_ global trampoline that covers all cases will simplify the code
> a lot, but please benchmark the difference to understand
> the trade-off.
According to my benchmark, it has ~5% overhead to save/restore
*5* variants when compared with *0* variant. The save/restore of regs
is fast, but it still need 12 insn, which can produce ~6% overhead.
I think the performance is more import and we should keep this logic.
Should we? If you think the do_origin_call() is not simple enough, we
can recover all the 6 regs from the stack directly for the origin call,
which won't
introduce too much overhead, and keep the save/restore logic.
What do you think?
>
> The major simplification will be due to skipping nr_args.
> There won't be a need to do btf model and count the args.
> Just do one trampoline for them all.
>
> Also funcs with 7+ arguments need to be thought through
> from the start.
In the current version, the attachment will be rejected if any functions
have
7+ arguments.
> I think it's ok trade-off if we allow global trampoline
> to be safe to attach to a function with 7+ args (and
> it will not mess with the stack), but bpf prog can only
> access up to 6 args. The kfuncs to access arg 7 might be
> more complex and slower. It's ok trade off.
It's OK for fentry-multi, but we can't allow fexit-multi and
modify_return-multi
to be attached to the function with 7+ args, as we need to do the origin
call, and we can't recover the arguments in the stack for the origin
call for now.
So we can allow the functions with 7+ args to be attached as long as the
accessed
arguments are all in regs for fentry-multi. And I think we need one more
patch to
do the "all accessed arguments are in regs" checking, so maybe we can
put it in
the next series? As current series is a little complex :/
Anyway, I'll have a try to see if we can add this part in this series :)
>
>> +
>> +static __always_inline notrace void
>> +run_tramp_prog(struct kfunc_md_tramp_prog *tramp_prog,
>> + struct bpf_tramp_run_ctx *run_ctx, unsigned long *args)
>> +{
>> + struct bpf_prog *prog;
>> + u64 start_time;
>> +
>> + while (tramp_prog) {
>> + prog = tramp_prog->prog;
>> + run_ctx->bpf_cookie = tramp_prog->cookie;
>> + start_time = bpf_gtramp_enter(prog, run_ctx);
>> +
>> + if (likely(start_time)) {
>> + asm volatile(
>> + CALL_NOSPEC "\n"
>> + : : [thunk_target]"r"(prog->bpf_func), [args]"D"(args)
>> + );
> Why this cannot be "call *(prog->bpf_func)" ?
Do you mean "prog->bpf_func(args, NULL);"? In my previous testing, this
cause
bad performance, and I see others do the indirect call in this way. And
I just do
the benchmark again, it seems the performance is not affected in this
way anymore.
So I think I can replace it with "prog->bpf_func(args, NULL);" in the
next version.
>
>> + }
>> +
>> + bpf_gtramp_exit(prog, start_time, run_ctx);
>> + tramp_prog = tramp_prog->next;
>> + }
>> +}
>> +
>> +static __always_inline notrace int
>> +bpf_global_caller_run(unsigned long *args, unsigned long *ip, int nr_args)
> Pls share top 10 from "perf report" while running the bench.
> I'm curious about what's hot.
> Last time I benchmarked fentry/fexit migrate_disable/enable were
> one the hottest functions. I suspect it's the case here as well.
You are right, the migrate_disable/enable are the hottest functions in
both bpf trampoline and global trampoline. Following is the perf top
for fentry-multi:
36.36% bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi [k]
bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi 20.54% [kernel] [k]
migrate_enable 19.35% [kernel] [k] bpf_global_caller_5_run 6.52%
[kernel] [k] bpf_global_caller_5 3.58% libc.so.6 [.] syscall 2.88%
[kernel] [k] entry_SYSCALL_64 1.50% [kernel] [k] memchr_inv 1.39%
[kernel] [k] fput 1.04% [kernel] [k] migrate_disable 0.91% [kernel] [k]
_copy_to_user
And I also did the testing for fentry:
54.63% bpf_prog_2dcccf652aac1793_bench_trigger_fentry [k]
bpf_prog_2dcccf652aac1793_bench_trigger_fentry
10.43% [kernel] [k] migrate_enable
10.07% bpf_trampoline_6442517037 [k] bpf_trampoline_6442517037
8.06% [kernel] [k] __bpf_prog_exit_recur 4.11% libc.so.6 [.] syscall
2.15% [kernel] [k] entry_SYSCALL_64 1.48% [kernel] [k] memchr_inv 1.32%
[kernel] [k] fput 1.16% [kernel] [k] _copy_to_user 0.73% [kernel] [k]
bpf_prog_test_run_raw_tp
The migrate_enable/disable are used to do the recursive checking,
and I even wanted to perform recursive checks in the same way as
ftrace to eliminate this overhead :/
>
next prev parent reply other threads:[~2025-07-15 8:37 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-03 12:15 [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi Menglong Dong
2025-07-04 16:07 ` kernel test robot
2025-07-15 1:55 ` Alexei Starovoitov
2025-07-15 2:37 ` Menglong Dong
2025-07-15 2:49 ` Alexei Starovoitov
2025-07-15 3:13 ` Menglong Dong
2025-07-15 9:06 ` Menglong Dong
2025-07-15 16:22 ` Alexei Starovoitov
2025-07-03 12:15 ` [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline Menglong Dong
2025-07-15 2:25 ` Alexei Starovoitov
2025-07-15 8:36 ` Menglong Dong [this message]
2025-07-15 9:30 ` Menglong Dong
2025-07-16 16:56 ` Inlining migrate_disable/enable. Was: " Alexei Starovoitov
2025-07-16 18:24 ` Peter Zijlstra
2025-07-16 22:35 ` Alexei Starovoitov
2025-07-16 22:49 ` Steven Rostedt
2025-07-16 22:50 ` Steven Rostedt
2025-07-28 9:20 ` Menglong Dong
2025-07-31 16:15 ` Alexei Starovoitov
2025-08-01 1:42 ` Menglong Dong
2025-08-06 8:44 ` Menglong Dong
2025-08-08 0:58 ` Alexei Starovoitov
2025-08-08 5:48 ` Menglong Dong
2025-08-08 6:32 ` Menglong Dong
2025-08-08 15:47 ` Alexei Starovoitov
2025-07-15 16:35 ` Alexei Starovoitov
2025-07-16 13:05 ` Menglong Dong
2025-07-17 0:59 ` multi-fentry proposal. Was: " Alexei Starovoitov
2025-07-17 1:50 ` Menglong Dong
2025-07-17 2:13 ` Alexei Starovoitov
2025-07-17 2:37 ` Menglong Dong
2025-07-16 14:40 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 03/18] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
2025-07-05 2:41 ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 04/18] ftrace: add reset_ftrace_direct_ips Menglong Dong
2025-07-03 15:30 ` Steven Rostedt
2025-07-04 1:54 ` Menglong Dong
2025-07-07 18:52 ` Steven Rostedt
2025-07-08 1:26 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 05/18] bpf: introduce bpf_gtramp_link Menglong Dong
2025-07-04 7:00 ` kernel test robot
2025-07-04 7:52 ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 06/18] bpf: tracing: add support to record and check the accessed args Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-14 23:45 ` Menglong Dong
2025-07-15 17:11 ` Andrii Nakryiko
2025-07-16 12:50 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 07/18] bpf: refactor the modules_array to ptr_array Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 08/18] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 09/18] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 10/18] x86,bpf: factor out arch_bpf_get_regs_nr Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 11/18] bpf: tracing: add multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 12/18] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 1:15 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 13/18] libbpf: support tracing_multi Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 1:58 ` Menglong Dong
2025-07-15 17:20 ` Andrii Nakryiko
2025-07-16 12:43 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 14/18] libbpf: add btf type hash lookup support Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 4:40 ` Menglong Dong
2025-07-15 17:20 ` Andrii Nakryiko
2025-07-16 11:53 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 15/18] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
2025-07-14 22:07 ` Andrii Nakryiko
2025-07-15 5:48 ` Menglong Dong
2025-07-15 17:23 ` Andrii Nakryiko
2025-07-16 11:46 ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 16/18] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 17/18] selftests/bpf: add basic testcases for tracing_multi Menglong Dong
2025-07-08 20:07 ` Alexei Starovoitov
2025-07-09 1:33 ` Menglong Dong
2025-07-14 23:49 ` Ihor Solodrai
2025-07-16 0:26 ` Ihor Solodrai
2025-07-16 0:31 ` Alexei Starovoitov
2025-07-16 0:34 ` Ihor Solodrai
2025-07-03 12:15 ` [PATCH bpf-next v2 18/18] selftests/bpf: add bench tests " Menglong Dong
2025-07-04 8:47 ` [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Jiri Olsa
2025-07-04 8:52 ` Menglong Dong
2025-07-04 8:58 ` Menglong Dong
2025-07-04 9:12 ` Jiri Olsa
2025-07-15 2:31 ` Alexei Starovoitov
2025-07-15 2:44 ` Menglong Dong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45f4d349-7b08-45d3-9bec-3ab75217f9b6@linux.dev \
--to=menglong.dong@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=dongml2@chinatelecom.cn \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=hpa@zytor.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=menglong8.dong@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.