Re: [RFC bpf-next 00/15] support inline tracing with BTF


From: Alan Maguire <alan.maguire@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jiri Olsa <jolsa@kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thierry Treyer <ttreyer@meta.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Song Liu <song@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Quentin Monnet <qmo@kernel.org>,
	Ihor Solodrai <ihor.solodrai@linux.dev>,
	David Faust <david.faust@oracle.com>,
	"Jose E. Marchesi" <jose.marchesi@oracle.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [RFC bpf-next 00/15] support inline tracing with BTF
Date: Fri, 24 Oct 2025 12:53:55 +0100
Message-ID: <29e31824-5bf3-4bfa-a097-07d3bc36fa33@oracle.com>
In-Reply-To: <CAEf4BzZx=X6vGqcA8SPU6D+v6k+TR=ZewebXMuXtpmML058piw@mail.gmail.com>

On 23/10/2025 17:16, Andrii Nakryiko wrote:
> On Thu, Oct 23, 2025 at 7:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 16/10/2025 19:36, Andrii Nakryiko wrote:
>>> On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> On 14/10/2025 01:12, Alexei Starovoitov wrote:
>>>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> I was trying to avoid being specific about inlines since the same
>>>>>> approach works for function sites with optimized-out parameters and they
>>>>>> could be easily added to the representation (and probably should be in a
>>>>>> future version of this series). Another "extra" source of info
>>>>>> potentially is the (non per-cpu) global variables that Stephen sent
>>>>>> patches for a while back and the feeling was it was too big to add to
>>>>>> vmlinux BTF proper.
>>>>>>
>>>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>>>>
>>>>> aux is too abstract and doesn't convey any meaning.
>>>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>>>>
>>>>
>>>> Sure, works for me.
>>>>
>>>>> Thinking more about reuse of struct btf_type for these...
>>>>> After sleeping on it, it feels a bit awkward today, since if they're
>>>>> types they're supposed to be in one table with other types,
>>>>> searchable and so on, but we actually don't want them there.
>>>>> btf_find_*() isn't fast and people are trying to optimize it.
>>>>> Also if we teach the kernel to use these loc-s they probably
>>>>> should be in a separate table.
>>>>>
>>>>
>>>> The BTF with location info is a separate split BTF, so it won't regress
>>>> search times of vmlinux/module BTF. Searching by name isn't really a
>>>> need for the non-LOCSEC cases; none of the FUNC_PROTO, LOC_PROTO and
>>>> LOC_PARAM have names, so the searching that will be done to deal with
>>>> inlines will all be within the LOCSEC representations for the inlines,
>>>> and from there it'll just be id-based lookup.
>>>>
>>>> Currently the LOCSECs are sorted internally by address, but we could
>>>> change that to be by name given that name-based lookup is the much more
>>>> likely search mode.
>>>>
>>>> One limitation we hit is that the max BTF vlen number is not sufficient
>>>> to represent all the inlines in one LOCSEC; we max out at specifying a
>>>> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
>>>
>>> We have this, currently:
>>>
>>>
>>> /* Max # of struct/union/enum members or func args */
>>> #define BTF_MAX_VLEN    0xffff
>>>
>>> struct btf_type {
>>>         __u32 name_off;
>>>         /* "info" bits arrangement
>>>          * bits  0-15: vlen (e.g. # of struct's members)
>>>          * bits 16-23: unused
>>>          * bits 24-28: kind (e.g. int, ptr, array...etc)
>>>          * bits 29-30: unused
>>>          * bit     31: kind_flag, currently used by
>>>          *             struct, union, enum, fwd, enum64,
>>>          *             decl_tag and type_tag
>>>          */
>>>
>>>
>>> Note those unused 16-23 bits. We can use them to extend vlen to 24 bits,
>>> i.e. up to ~16 million, which should hopefully be good enough? This split by name
>>> prefix sounds unnecessarily convoluted, tbh.
>>>
>>
>> That would be great! Do you have a preference for how libbpf might
>> handle this? Currently we have
>>
>>
>> static inline __u16 btf_vlen(const struct btf_type *t)
>> {
>>         return BTF_INFO_VLEN(t->info);
>> }
>>
>> As a result many consumers (in libbpf and elsewhere) use a __u16 for the
>> vlen value.  Would it make sense to add
>>
>> static inline __u32 btf_extended_vlen(const struct btf_type *t)
>> {
>>         /* BTF_INFO_VLEN() masks to 16 bits; use bits 0-23 here */
>>         return t->info & 0xffffff;
>> }
>>
>> perhaps?
> 
> just update btf_vlen() to return __u32 and use more bits. Those bits
> should be all zeroes today, so all this should be backwards
> compatible.
> 
>>
>>
>>>
>>>
>>>> LOCSECs. That was just a workaround before, but for faster name-based
>>>> lookup we could perhaps make use of the multiple LOCSECs by grouping
>>>> them by sorted function names. So if the first LOCSEC was called
>>>> inline.a and the next LOCSEC inline.c or whatever we'd know locations
>>>> named a*, b* are in that first LOCSEC and then do a binary search within
>>>> it. We could limit the number of LOCSECs to some reasonable upper bound
>>>> like 1024; with ~400000 entries that gives ~400 entries per LOCSEC, so
>>>> we'd first binary search across the LOCSECs and then, once we'd found
>>>> the right one, within it, to optimize lookup time.
>>>>
>>>>> global non per-cpu vars fit into current BTF's datasec concept,
>>>>> so they can be another kernel module with a different name.
>>>>>
>>>>> I guess one can argue that LOCSEC is similar to DATASEC.
>>>>> Both need their own search tables separate from the main type table.
>>>>>
>>>>
>>>> Right, though we could use a hybrid approach of using the LOCSEC name +
>>>> multiple LOCSECs (which we need anyway) to speed things up.
>>>>>>
>>>>>>> The partially inlined functions were the biggest footgun so far.
>>>>>>> Missing fully inlined is painful, but it's not a footgun.
>>>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>>>>> user space is not enough. It's great and, probably, can be supported,
>>>>>>> but the kernel should use this "BTF.inline_info" as well to
>>>>>>> preserve "backward compatibility" for functions that were
>>>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>>>>
>>>>>>
>>>>>> That would be great; we'd need to teach the kernel to handle multi-split
>>>>>> BTF but I would hope that wouldn't be too tricky.
>>>>>>
>>>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>>>>> for the whole thing.
>>>>>>
>>>>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>>>>> is to provide this functionality in as straightforward a form as
>>>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>>>>> kprobe multi progress, but at one stage that was more kprobe-based
>>>>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>>>>> that didn't use fprobe and might work for this sort of use case? As you
>>>>>> say if we had that keeping a user-space based approach might be more
>>>>>> attractive as an option.
>>>>>
>>>>> Agree.
>>>>>
>>>>> Jiri,
>>>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>>>>>
>>>>>>
>>>>>>> I mean when the kernel processes SEC("fentry/foo") into partially
>>>>>>> inlined function "foo" it should use fentry for "foo" and
>>>>>>> automatically add kprobes at inlined callsites and automatically
>>>>>>> generate code that collects arguments from the appropriate registers
>>>>>>> and make "fentry/foo" behave like "foo" was not inlined at all.
>>>>>>> Arguably, we can use a new attach type.
>>>>>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
>>>>>>> of regular kprobes from libbpf is unnecessary.
>>>>>>> The kernel can do the same transparently and prepare the args
>>>>>>> depending on location.
>>>>>>> If some of the callsites are missing args it can fail the whole operation.
>>>>>>
>>>>>> There are a few options here, but I think selectable attach modes,
>>>>>> either best-effort or all-or-none, would both be needed.
>>>>>
>>>>> Exactly. For partially inlined we would need all-or-none,
>>>>> but I see a case where somebody would want to say:
>>>>> "pls attach to all places where foo() is called and since
>>>>> it's inlined the actual entry point may not be accurate and it's ok".
>>>>>
>>>>> The latter would probably need a flag in tracing tools like bpftrace.
>>>>> I think all-or-none is a better default.
>>>>>
>>>>
>>>> Yep, agree.
>>>>
>>>>>>> Of course, doing the whole thing from libbpf feels good,
>>>>>>> since we're not burdening the kernel with extra complexity,
>>>>>>> but lack of kprobe-multi changes the way to think about this trade-off.
>>>>>>>
>>>>>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
>>>>>>> the first few patches and pahole support can/should be landed first.
>>>>>>>
>>>>>>
>>>>>> Sounds great! Having patches 1-10 landed would be useful, as that would allow us
>>>>>> in turn to update pahole's libbpf submodule commit to generate location
>>>>>> data, which would then allow us to update kbuild and start using it for
>>>>>> attach. So we can focus on generating the inline info first, and then
>>>>>> think about how we want to present that info to consumers.
>>>>>
>>>>> Yep. Please post pahole patches for review. I doubt folks
>>>>> will look into your git tree ;)
>>>>>
>>>>
>>>
>>> BTW, what happened to the self-describing BTF patches? With these
>>> additions we are going to break all the BTF-based tooling one more
>>> time. Let's add a minimal amount of changes to BTF to allow tools to
>>> skip unknown BTF types and dump the rest? I don't remember all the
>>> details by now, was there any major blocker last time? I feel like
>>> that minimal approach of fixed size + vlen * vlen_size would still
>>> work even for all these newly added types (even with the alternative
>>> for LOC_PARAM I mention in the corresponding patch).
>>>
>>>
>>
>> Yep that scheme would still work. The reason I didn't prioritize it here
>> is that the BTF with new LOC kinds is separate from the BTF that legacy
>> tools would be looking at, but I'd be happy to revive it if it'd help.
> 
> We are coming up on another big BTF update, so I think it's time to
> add this minimal self-describing info and teach bpftool and other
> tools to understand this, so that going forward we can add new types
> without breaking anything. So yeah, I think we should revive and land
> it roughly in the same time frame.
>

Ok sounds good, I'll work on reviving that series as a prerequisite for
the location stuff ASAP.

One other BTF UAPI issue we should maybe look at: should we steal one
more bit for the BTF kind representation? We currently have room for 32
kinds and are using 19, with 3 more for the location stuff. It feels like
we should move to supporting 64 kinds by stealing a bit there too; what do
you think? That would still leave us with one unused bit in "info".

Alan


