netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: 梦龙董 <dongmenglong.8@bytedance.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eddy Z" <eddyz87@gmail.com>, "Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@linux.ibm.com>,
	"Sven Schnelle" <svens@linux.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	"David Ahern" <dsahern@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"X86 ML" <x86@kernel.org>, "Steven Rostedt" <rostedt@goodmis.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Quentin Monnet" <quentin@isovalent.com>,
	bpf <bpf@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-riscv <linux-riscv@lists.infradead.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"Network Development" <netdev@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	"open list:KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	linux-stm32@st-md-mailman.stormreply.com
Subject: Re: [External] Re: [PATCH bpf-next v2 1/9] bpf: tracing: add support to record and check the accessed args
Date: Thu, 14 Mar 2024 07:27:52 +0100	[thread overview]
Message-ID: <ZfKY6E8xhSgzYL1I@krava> (raw)
In-Reply-To: <CAADnVQLHpi3J6cBJ0QBgCQ2aY6fWGnVvNGdfi3W-jmoa9d1eVQ@mail.gmail.com>

On Wed, Mar 13, 2024 at 05:25:35PM -0700, Alexei Starovoitov wrote:
> On Tue, Mar 12, 2024 at 6:53 PM 梦龙董 <dongmenglong.8@bytedance.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 12:42 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Mar 11, 2024 at 7:42 PM 梦龙董 <dongmenglong.8@bytedance.com> wrote:
> > > >
> > [......]
> > >
> > > I see.
> > > I thought you're sharing the trampoline across attachments.
> > > (since bpf prog is the same).
> >
> > That seems to be a good idea, which I hadn't thought before.
> >
> > > But above approach cannot possibly work with a shared trampoline.
> > > You need to create individual trampoline for all attachment
> > > and point them to single bpf prog.
> > >
> > > tbh I'm less excited about this feature now, since sharing
> > > the prog across different attachments is nice, but it won't scale
> > > to thousands of attachments.
> > > I assumed that there will be a single trampoline with max(argno)
> > > across attachments and attach/detach will scale to thousands.
> > >
> > > With individual trampoline this will work for up to a hundred
> > > attachments max.
> >
> > What does "a hundred attachments max" means? Can't I
> > trace thousands of kernel functions with a bpf program of
> > tracing multi-link?
> 
> I mean what time does it take to attach one program
> to 100 fentry-s ?
> What is the time for 1k and for 10k ?
> 
> The kprobe multi test attaches to pretty much all funcs in
> /sys/kernel/tracing/available_filter_functions
> and it's fast enough to run in test_progs on every commit in bpf CI.
> See get_syms() in prog_tests/kprobe_multi_test.c
> 
> Can this new multi fentry do that?
> and at what speed?
> The answer will decide how applicable this api is going to be.
> Generating different trampolines for every attach point
> is an approach as well. Pls benchmark it too.
> 
> > >
> > > Let's step back.
> > > What is the exact use case you're trying to solve?
> > > Not an artificial one as selftest in patch 9, but the real use case?
> >
> > I have a tool, which is used to diagnose network problems,
> > and its name is "nettrace". It will trace many kernel functions, whose
> > function args contain "skb", like this:
> >
> > ./nettrace -p icmp
> > begin trace...
> > ***************** ffff889be8fbd500,ffff889be8fbcd00 ***************
> > [1272349.614564] [dev_gro_receive     ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614579] [__netif_receive_skb_core] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614585] [ip_rcv              ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614592] [ip_rcv_core         ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614599] [skb_clone           ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614616] [nf_hook_slow        ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614629] [nft_do_chain        ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614635] [ip_rcv_finish       ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614643] [ip_route_input_slow ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614647] [fib_validate_source ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614652] [ip_local_deliver    ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614658] [nf_hook_slow        ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614663] [ip_local_deliver_finish] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614666] [icmp_rcv            ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614671] [icmp_echo           ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614675] [icmp_reply          ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614715] [consume_skb         ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614722] [packet_rcv          ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> > [1272349.614725] [consume_skb         ] ICMP: 169.254.128.15 ->
> > 172.27.0.6 ping request, seq: 48220
> >
> > For now, I have to create a bpf program for every kernel
> > function that I want to trace, which is up to 200.
> >
> > With this multi-link, I only need to create 5 bpf program,
> > like this:
> >
> > int BPF_PROG(trace_skb_1, struct *skb);
> > int BPF_PROG(trace_skb_2, u64 arg0, struct *skb);
> > int BPF_PROG(trace_skb_3, u64 arg0, u64 arg1, struct *skb);
> > int BPF_PROG(trace_skb_4, u64 arg0, u64 arg1, u64 arg2, struct *skb);
> > int BPF_PROG(trace_skb_5, u64 arg0, u64 arg1, u64 arg2, u64 arg3, struct *skb);
> >
> > Then, I can attach trace_skb_1 to all the kernel functions that
> > I want to trace and whose first arg is skb; attach trace_skb_2 to kernel
> > functions whose 2nd arg is skb, etc.
> >
> > Or, I can create only one bpf program and store the index
> > of skb to the attachment cookie, and attach this program to all
> > the kernel functions that I want to trace.
> >
> > This is my use case. With the multi-link, now I only have
> > 1 bpf program, 1 bpf link, 200 trampolines, instead of 200
> > bpf programs, 200 bpf link and 200 trampolines.
> 
> I see. The use case makes sense to me.
> Andrii's retsnoop is used to do similar thing before kprobe multi was
> introduced.
> 
> > The shared trampoline you mentioned seems to be a
> > wonderful idea, which can make the 200 trampolines
> > to one. Let me have a look, we create a trampoline and
> > record the max args count of all the target functions, let's
> > mark it as arg_count.
> >
> > During generating the trampoline, we assume that the
> > function args count is arg_count. During attaching, we
> > check the consistency of all the target functions, just like
> > what we do now.
> 
> For one trampoline to handle all attach points we might
> need some arch support, but we can start simple.
> Make btf_func_model with MAX_BPF_FUNC_REG_ARGS
> by calling btf_distill_func_proto() with func==NULL.
> And use that to build a trampoline.
> 
> The challenge is how to use minimal number of trampolines
> when bpf_progA is attached for func1, func2, func3
> and bpf_progB is attached to func3, func4, func5.
> We'd still need 3 trampolines:
> for func[12] to call bpf_progA,
> for func3 to call bpf_progA and bpf_progB,
> for func[45] to call bpf_progB.
> 
> Jiri was trying to solve it in the past. His slides from LPC:
> https://lpc.events/event/16/contributions/1350/attachments/1033/1983/plumbers.pdf
> 
> Pls study them and his prior patchsets to avoid stepping on the same rakes.

yep, I refrained from commenting not to take you down the same path
I did, but if you insist.. ;-) 

I managed to forgot almost all of it, but the IIRC the main pain point
was that at some point I had to split existing trampoline which caused
the whole trampolines management and error paths to become a mess

I tried to explain things in [1] changelog and the latest patchset is in [0]

feel free to use/take anything, but I advice strongly against it ;-)
please let me know if I can help

jirka


[0] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=bpf/batch
[1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf/batch&id=52a1d4acdf55df41e99ca2cea51865e6821036ce

  reply	other threads:[~2024-03-14  6:27 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-11  9:35 [PATCH bpf-next v2 0/9] bpf: make tracing program support multi-link Menglong Dong
2024-03-11  9:35 ` [PATCH bpf-next v2 1/9] bpf: tracing: add support to record and check the accessed args Menglong Dong
2024-03-12  1:46   ` Alexei Starovoitov
2024-03-12  2:01     ` [External] " 梦龙董
2024-03-12  2:09       ` Alexei Starovoitov
2024-03-12  2:42         ` 梦龙董
2024-03-12  2:49           ` 梦龙董
2024-03-12 16:42           ` Alexei Starovoitov
2024-03-13  1:53             ` 梦龙董
2024-03-14  0:25               ` Alexei Starovoitov
2024-03-14  6:27                 ` Jiri Olsa [this message]
2024-03-15  8:17                   ` 梦龙董
2024-03-15  8:00                 ` 梦龙董
2024-03-28 14:43                   ` 梦龙董
2024-03-28 15:13                     ` Steven Rostedt
2024-03-28 23:17                       ` Alexei Starovoitov
2024-03-30  3:36                         ` 梦龙董
2024-03-29 23:28                       ` Andrii Nakryiko
2024-03-29 23:28                         ` Andrii Nakryiko
2024-03-29 23:28                         ` Andrii Nakryiko
2024-03-30  4:16                         ` 梦龙董
2024-03-30 12:27                         ` Steven Rostedt
2024-03-30 17:52                           ` Jiri Olsa
2024-03-31  2:34                             ` Andrii Nakryiko
2024-03-31  2:34                               ` Andrii Nakryiko
2024-03-30  3:18                       ` 梦龙董
2024-03-30 19:37                         ` Steven Rostedt
2024-04-01  2:28                           ` 梦龙董
2024-04-01 15:59                             ` Steven Rostedt
2024-03-11  9:35 ` [PATCH bpf-next v2 2/9] bpf: refactor the modules_array to ptr_array Menglong Dong
2024-03-12  1:48   ` Alexei Starovoitov
2024-03-12  1:53     ` [External] " 梦龙董
2024-03-11  9:35 ` [PATCH bpf-next v2 3/9] bpf: trampoline: introduce struct bpf_tramp_link_conn Menglong Dong
2024-03-11  9:35 ` [PATCH bpf-next v2 4/9] bpf: trampoline: introduce bpf_tramp_multi_link Menglong Dong
2024-03-11  9:35 ` [PATCH bpf-next v2 5/9] bpf: verifier: add btf to the function args of bpf_check_attach_target Menglong Dong
2024-03-12  1:51   ` Alexei Starovoitov
2024-03-12  3:13     ` [External] " 梦龙董
2024-03-11  9:35 ` [PATCH bpf-next v2 6/9] bpf: tracing: add multi-link support Menglong Dong
2024-03-11 18:59   ` kernel test robot
2024-03-11 21:36   ` kernel test robot
2024-03-11  9:35 ` [PATCH bpf-next v2 7/9] libbpf: don't free btf if program of multi-link tracing existing Menglong Dong
2024-03-12  1:55   ` Alexei Starovoitov
2024-03-12  2:05     ` [External] " 梦龙董
2024-03-12  2:13       ` Alexei Starovoitov
2024-03-12  2:56         ` 梦龙董
2024-03-11  9:35 ` [PATCH bpf-next v2 8/9] libbpf: add support for the multi-link of tracing Menglong Dong
2024-03-11 15:29   ` Quentin Monnet
2024-03-12  1:43     ` [External] " 梦龙董
2024-03-12  1:56   ` Alexei Starovoitov
2024-03-12  2:44     ` [External] " 梦龙董
2024-03-12 16:11       ` Alexei Starovoitov
2024-03-13  1:14         ` 梦龙董
2024-03-11  9:35 ` [PATCH bpf-next v2 9/9] selftests/bpf: add testcases for " Menglong Dong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfKY6E8xhSgzYL1I@krava \
    --to=olsajiri@gmail.com \
    --cc=agordeev@linux.ibm.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=borntraeger@linux.ibm.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=dongmenglong.8@bytedance.com \
    --cc=dsahern@kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-stm32@st-md-mailman.stormreply.com \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=netdev@vger.kernel.org \
    --cc=quentin@isovalent.com \
    --cc=rostedt@goodmis.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=svens@linux.ibm.com \
    --cc=x86@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).