bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Menglong Dong <menglong.dong@linux.dev>
To: Jiri Olsa <jolsa@kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: multi-fentry proposal. Was: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline
Date: Thu, 17 Jul 2025 09:50:36 +0800	[thread overview]
Message-ID: <3364591.aeNJFYEL58@7940hx> (raw)
In-Reply-To: <CAADnVQJ47PJXxjqES8BvtWkPq3fj9D0oTF6qqeNNpG66-_MGCg@mail.gmail.com>

On Thursday, July 17, 2025 8:59 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> write:
> On Wed, Jul 16, 2025 at 6:06 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> >
> > On Wednesday, July 16, 2025 12:35 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> write:
> > > On Tue, Jul 15, 2025 at 1:37 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> > > >
> > > >
> > > > On 7/15/25 10:25, Alexei Starovoitov wrote:
> > [......]
> > > >
> > > > According to my benchmark, it has ~5% overhead to save/restore
> > > > *5* variants when compared with *0* variant. The save/restore of regs
> > > > is fast, but it still need 12 insn, which can produce ~6% overhead.
> > >
> > > I think it's an ok trade off, because with one global trampoline
> > > we do not need to call rhashtable lookup before entering bpf prog.
> > > bpf prog will do it on demand if/when it needs to access arguments.
> > > This will compensate for a bit of lost performance due to extra save/restore.
> >
> > I don't understand here :/
> >
> > The rhashtable lookup is done at the beginning of the global trampoline,
> > which is called before we enter bpf prog. The bpf progs is stored in the
> > kfunc_md, and we need get them from the hash table.
> 
> Ahh. Right.
> 
> Looking at the existing bpf trampoline... It has complicated logic
> to handle livepatching and tailcalls. Your global trampoline
> doesn't, and once that is added it's starting to feel that it will
> look just as complex as the current one.
> So I think we better repurpose what we have.
> Maybe we can rewrite the existing one in C too.

You are right, the tailcalls is not handled yet. But for the livepatching,
it is already handled, as we always get the origin ip from the stack
and call it, just like how the bpf trampoline handle the livepatching.
So no addition handling is needed here.

> 
> How about the following approach.
> I think we discussed something like this in the past
> and Jiri tried to implement something like this.
> Andrii reminded me recently about it.
> 
> Say, we need to attach prog A to 30k functions.
> 10k with 2 args, 10k with 3 args, and 10k with 7 args.
> We can generate 3 _existing_ bpf trampolines for 2,3,7 args
> with hard coded prog A in there (the cookies would need to be
> fetched via binary search similar to kprobe-multi).
> The arch_prepare_bpf_trampoline() supports BPF_TRAMP_F_ORIG_STACK.
> So one 2-arg trampoline will work to invoke prog A in all 10k 2-arg functions.
> We don't need to match types, but have to compare that btf_func_model-s
> are the same.
> 
> Menglong, your global trampoline for 0,1,..6 args works only for x86,
> because btf_func_model doesn't care about sizes of args,
> but it's not the correct mental model to use.
> 
> The above "10k with 2 args" is a simplified example.
> We will need an arch specific callback is_btf_func_model_equal()
> that will compare func models in arch specific ways.
> For x86-64 the number of args is all it needs.
> For other archs it will compare sizes and flags too.
> So 30k functions will be sorted into
> 10k with btf_func_model_1, 10k with btf_func_model_2 and so on.
> And the corresponding number of equivalent trampolines will be generated.
> 
> Note there will be no actual BTF types. All args will be untyped and
> untrusted unlike current fentry.
> We can go further and sort 30k functions by comparing BTFs
> instead of btf_func_model-s, but I suspect 30k funcs will be split
> into several thousands of exact BTFs. At that point multi-fentry
> benefits are diminishing and we might as well generate 30k unique
> bpf trampolines for 30k functions and avoid all the complexity.
> So I would sort by btf_func_model compared by arch specific comparator.
> 
> Now say prog B needs to be attached to another 30k functions.
> If all 30k+30k functions are different then it's the same as
> the previous step.
> Say, prog A is attached to 10k funcs with btf_func_model_1.
> If prog B wants to attach to the exact same func set then we
> just regenerate bpf trampoline with hard coded progs A and B
> and reattach.
> If not then we need to split the set into up to 3 sets.
> Say, prog B wants 5k funcs, but only 1k func are common:
> (prog_A, 9k func with btf_func_model_1) -> bpf trampoline X
> (prog_A, prog_B, 1k funcs with btf_func_model_1) -> bpf trampoline Y
> (prog_B, 4k funcs with btf_func_model_1) -> bpf trampoline Z
> 
> And so on when prog C needs to be attached.
> At detach time we can merge sets/trampolines,
> but for now we can leave it all fragmented.
> Unlike regular fentry progs the multi-fentry progs are not going to
> be attached for long time. So we can reduce the detach complexity.
> 
> The nice part of the algorithm is that coexistence of fentry
> and multi-fentry is easy.
> If fentry is already attached to some function we just
> attach multi-fentry prog to that bpf trampoline.
> If multi-fentry was attached first and fentry needs to be attached,
> we create a regular bpf trampoline and add both progs there.

This seems not easy, and it is exactly how I handle the
coexistence now:

  https://lore.kernel.org/bpf/20250528034712.138701-16-dongml2@chinatelecom.cn/
  https://lore.kernel.org/bpf/20250528034712.138701-17-dongml2@chinatelecom.cn/
  https://lore.kernel.org/bpf/20250528034712.138701-18-dongml2@chinatelecom.cn/

The most difficult part is that we need a way to replace the the
multi-fentry with fentry for the function in the ftrace atomically. Of
course, we can remove the global trampoline first, and then attach
the bpf trampoline, which will make things much easier. But a
short suspend will happen for the progs in fentry-multi.

> 
> The intersect and sorting by btf_func_model is not trivial,
> but we can hold global trampoline_mutex, so no concerns of races.
> 
> Example:
> bpf_link_A is a set of:
> (prog_A, funcs X,Y with btf_func_model_1)
> (prog_A, funcs N,M with btf_func_model_2)
> 
> To attach prog B via bpf_link_B that wants:
> (prog_B, funcs Y,Z with btf_func_model_1)
> (prog_B, funcs P,Q with btf_func_model_3)
> 
> walk all existing links, intersect and split, and update the links.
> At the end:
> 
> bpf_link_A:
> (prog_A, funcs X with btf_func_model_1)
> (prog_A, prog_B funcs Y with btf_func_model_1)
> (prog_A, funcs N,M with btf_func_model_2)
> 
> bpf_link_B:
> (prog_A, prog_B funcs Y with btf_func_model_1)
> (prog_B, funcs Z with btf_func_model_1)
> (prog_B, funcs P,Q with btf_func_model_3)
> 
> When link is detached: walk its own tuples, remove the prog,
> if nr_progs == 0 -> detach corresponding trampoline,
> if nr_progs > 0 -> remove prog and regenerate trampoline.
> 
> If fentry prog C needs to be attached to N it might split bpf_link_A:
> (prog_A, funcs X with btf_func_model_1)
> (prog_A, prog_B funcs Y with btf_func_model_1)
> (prog_A, funcs M with btf_func_model_2)
> (prog_A, prog_C funcs N with _fentry_)
> 
> Last time we gave up on it because we discovered that
> overlap support was too complicated, but I cannot recall now
> what it was :)
> Maybe all of the above repeating some old mistakes.

In my impression, this is exactly the solution of Jiri's, and this is
part of the discussion that I know:

  https://lore.kernel.org/bpf/ZfKY6E8xhSgzYL1I@krava/

> 
> Jiri,
> How does the above proposal look to you?
> 





  reply	other threads:[~2025-07-17  1:51 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-03 12:15 [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi Menglong Dong
2025-07-04 16:07   ` kernel test robot
2025-07-15  1:55   ` Alexei Starovoitov
2025-07-15  2:37     ` Menglong Dong
2025-07-15  2:49       ` Alexei Starovoitov
2025-07-15  3:13         ` Menglong Dong
2025-07-15  9:06           ` Menglong Dong
2025-07-15 16:22             ` Alexei Starovoitov
2025-07-03 12:15 ` [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline Menglong Dong
2025-07-15  2:25   ` Alexei Starovoitov
2025-07-15  8:36     ` Menglong Dong
2025-07-15  9:30       ` Menglong Dong
2025-07-16 16:56         ` Inlining migrate_disable/enable. Was: " Alexei Starovoitov
2025-07-16 18:24           ` Peter Zijlstra
2025-07-16 22:35             ` Alexei Starovoitov
2025-07-16 22:49               ` Steven Rostedt
2025-07-16 22:50                 ` Steven Rostedt
2025-07-28  9:20               ` Menglong Dong
2025-07-31 16:15                 ` Alexei Starovoitov
2025-08-01  1:42                   ` Menglong Dong
2025-08-06  8:44                   ` Menglong Dong
2025-08-08  0:58                     ` Alexei Starovoitov
2025-08-08  5:48                       ` Menglong Dong
2025-08-08  6:32                       ` Menglong Dong
2025-08-08 15:47                         ` Alexei Starovoitov
2025-07-15 16:35       ` Alexei Starovoitov
2025-07-16 13:05         ` Menglong Dong
2025-07-17  0:59           ` multi-fentry proposal. Was: " Alexei Starovoitov
2025-07-17  1:50             ` Menglong Dong [this message]
2025-07-17  2:13               ` Alexei Starovoitov
2025-07-17  2:37                 ` Menglong Dong
2025-07-16 14:40         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 03/18] ftrace: factor out ftrace_direct_update from register_ftrace_direct Menglong Dong
2025-07-05  2:41   ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 04/18] ftrace: add reset_ftrace_direct_ips Menglong Dong
2025-07-03 15:30   ` Steven Rostedt
2025-07-04  1:54     ` Menglong Dong
2025-07-07 18:52       ` Steven Rostedt
2025-07-08  1:26         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 05/18] bpf: introduce bpf_gtramp_link Menglong Dong
2025-07-04  7:00   ` kernel test robot
2025-07-04  7:52   ` kernel test robot
2025-07-03 12:15 ` [PATCH bpf-next v2 06/18] bpf: tracing: add support to record and check the accessed args Menglong Dong
2025-07-14 22:07   ` Andrii Nakryiko
2025-07-14 23:45     ` Menglong Dong
2025-07-15 17:11       ` Andrii Nakryiko
2025-07-16 12:50         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 07/18] bpf: refactor the modules_array to ptr_array Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 08/18] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 09/18] bpf: verifier: move btf_id_deny to bpf_check_attach_target Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 10/18] x86,bpf: factor out arch_bpf_get_regs_nr Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 11/18] bpf: tracing: add multi-link support Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 12/18] libbpf: don't free btf if tracing_multi progs existing Menglong Dong
2025-07-14 22:07   ` Andrii Nakryiko
2025-07-15  1:15     ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 13/18] libbpf: support tracing_multi Menglong Dong
2025-07-14 22:07   ` Andrii Nakryiko
2025-07-15  1:58     ` Menglong Dong
2025-07-15 17:20       ` Andrii Nakryiko
2025-07-16 12:43         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 14/18] libbpf: add btf type hash lookup support Menglong Dong
2025-07-14 22:07   ` Andrii Nakryiko
2025-07-15  4:40     ` Menglong Dong
2025-07-15 17:20       ` Andrii Nakryiko
2025-07-16 11:53         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 15/18] libbpf: add skip_invalid and attach_tracing for tracing_multi Menglong Dong
2025-07-14 22:07   ` Andrii Nakryiko
2025-07-15  5:48     ` Menglong Dong
2025-07-15 17:23       ` Andrii Nakryiko
2025-07-16 11:46         ` Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 16/18] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c Menglong Dong
2025-07-03 12:15 ` [PATCH bpf-next v2 17/18] selftests/bpf: add basic testcases for tracing_multi Menglong Dong
2025-07-08 20:07   ` Alexei Starovoitov
2025-07-09  1:33     ` Menglong Dong
2025-07-14 23:49     ` Ihor Solodrai
2025-07-16  0:26       ` Ihor Solodrai
2025-07-16  0:31         ` Alexei Starovoitov
2025-07-16  0:34           ` Ihor Solodrai
2025-07-03 12:15 ` [PATCH bpf-next v2 18/18] selftests/bpf: add bench tests " Menglong Dong
2025-07-04  8:47 ` [PATCH bpf-next v2 00/18] bpf: tracing multi-link support Jiri Olsa
2025-07-04  8:52   ` Menglong Dong
2025-07-04  8:58     ` Menglong Dong
2025-07-04  9:12       ` Jiri Olsa
2025-07-15  2:31 ` Alexei Starovoitov
2025-07-15  2:44   ` Menglong Dong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3364591.aeNJFYEL58@7940hx \
    --to=menglong.dong@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=menglong8.dong@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).