Re: [RFC bpf-next 00/15] support inline tracing with BTF


From: Alan Maguire <alan.maguire@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jiri Olsa <jolsa@kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thierry Treyer <ttreyer@meta.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Song Liu <song@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Quentin Monnet <qmo@kernel.org>,
	Ihor Solodrai <ihor.solodrai@linux.dev>,
	David Faust <david.faust@oracle.com>,
	"Jose E. Marchesi" <jose.marchesi@oracle.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [RFC bpf-next 00/15] support inline tracing with BTF
Date: Thu, 23 Oct 2025 15:37:22 +0100
Message-ID: <1b7bd33c-1b50-421c-98be-4b6c41d89e1e@oracle.com>
In-Reply-To: <CAEf4Bza27n44nNcPUtQHMS9OR1BH_NafY1xcRqhKORJMNamP_w@mail.gmail.com>

On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 14/10/2025 01:12, Alexei Starovoitov wrote:
>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>>
>>>> I was trying to avoid being specific about inlines since the same
>>>> approach works for function sites with optimized-out parameters and they
>>>> could be easily added to the representation (and probably should be in a
>>>> future version of this series). Another "extra" source of info
>>>> potentially is the (non per-cpu) global variables that Stephen sent
>>>> patches for a while back and the feeling was it was too big to add to
>>>> vmlinux BTF proper.
>>>>
>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>>
>>> aux is too abstract and doesn't convey any meaning.
>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>>
>>
>> Sure, works for me.
>>
>>> Thinking more about reuse of struct btf_type for these...
>>> After sleeping on it, it feels a bit awkward today, since if they're
>>> types they're supposed to be in one table with other types,
>>> searchable and so on, but we actually don't want them there.
>>> btf_find_*() isn't fast and people are trying to optimize it.
>>> Also if we teach the kernel to use these loc-s they probably
>>> should be in a separate table.
>>>
>>
>> The BTF with location info is a separate split BTF, so it won't regress
>> search times of vmlinux/module BTF. Searching by name isn't really
>> needed for the non-LOCSEC cases; none of the FUNC_PROTO, LOC_PROTO and
>> LOC_PARAM have names, so the searching that will be done to deal with
>> inlines will all be within the LOCSEC representations for the inlines,
>> and from there it'll just be id-based lookup.
>>
>> Currently the LOCSECs are sorted internally by address, but we could
>> change that to be by name given that name-based lookup is the much more
>> likely search mode.
>>
>> One limitation we hit is that the max BTF vlen number is not sufficient
>> to represent all the inlines in one LOCSEC; we max out at specifying a
>> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
> 
> We have this, currently:
> 
> 
> /* Max # of struct/union/enum members or func args */
> #define BTF_MAX_VLEN    0xffff
> 
> struct btf_type {
>         __u32 name_off;
>         /* "info" bits arrangement
>          * bits  0-15: vlen (e.g. # of struct's members)
>          * bits 16-23: unused
>          * bits 24-28: kind (e.g. int, ptr, array...etc)
>          * bits 29-30: unused
>          * bit     31: kind_flag, currently used by
>          *             struct, union, enum, fwd, enum64,
>          *             decl_tag and type_tag
>          */
> 
> 
> Note those unused 16-23 bits. We can use them to extend vlen up to 16
> million, which should hopefully be good enough? This split by name
> prefix sounds unnecessarily convoluted, tbh.
>

That would be great! Do you have a preference for how libbpf might
handle this? Currently we have


static inline __u16 btf_vlen(const struct btf_type *t)
{
        return BTF_INFO_VLEN(t->info);
}

As a result many consumers (in libbpf and elsewhere) use a __u16 for the
vlen value.  Would it make sense to add

static inline __u32 btf_extended_vlen(const struct btf_type *t)
{
        /* BTF_INFO_VLEN() masks with 0xffff; extended vlen needs bits 0-23 */
        return t->info & 0xffffff;
}

perhaps?


> 
> 
>> LOCSECs. That was just a workaround before, but for faster name-based
>> lookup we could perhaps make use of the multiple LOCSECs by grouping
>> them by sorted function names. So if the first LOCSEC was called
>> inline.a and the next LOCSEC inline.c or whatever we'd know locations
>> named a*, b* are in that first LOCSEC and then do a binary search within
>> it. We could limit the number of LOCSECs to some reasonable upper bound
>> like 1024 (~400 entries each for 400000 locations), so we'd binary search
>> across the LOCSECs first and then - once we'd found the right one -
>> within it to optimize lookup time.
>>
>>> global non per-cpu vars fit into current BTF's datasec concept,
>>> so they can be another kernel module with a different name.
>>>
>>> I guess one can argue that LOCSEC is similar to DATASEC.
>>> Both need their own search tables separate from the main type table.
>>>
>>
>> Right though we could use a hybrid approach of using the LOCSEC name +
>> multiple LOCSECs (which we need anyway) to speed things up.
>>>>
>>>>> The partially inlined functions were the biggest footgun so far.
>>>>> Missing fully inlined is painful, but it's not a footgun.
>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>>> user space is not enough. It's great and, probably, can be supported,
>>>>> but the kernel should use this "BTF.inline_info" as well to
>>>>> preserve "backward compatibility" for functions that were
>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>>
>>>>
>>>> That would be great; we'd need to teach the kernel to handle multi-split
>>>> BTF but I would hope that wouldn't be too tricky.
>>>>
>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>>> for the whole thing.
>>>>
>>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>>> is to provide this functionality in as straightforward a form as
>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>>> kprobe multi progress, but at one stage that was more kprobe-based,
>>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>>> that didn't use fprobe and might work for this sort of use case? As you
>>>> say if we had that keeping a user-space based approach might be more
>>>> attractive as an option.
>>>
>>> Agree.
>>>
>>> Jiri,
>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>>>
>>>>
>>>>> I mean when the kernel processes SEC("fentry/foo") into partially
>>>>> inlined function "foo" it should use fentry for "foo" and
>>>>> automatically add kprobe into inlined callsites and automatically
>>>>> generated code that collects arguments from appropriate registers
>>>>> and make "fentry/foo" behave like "foo" was not inlined at all.
>>>>> Arguably, we can use a new attach type.
>>>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
>>>>> of regular kprobes from libbpf is unnecessary.
>>>>> The kernel can do the same transparently and prepare the args
>>>>> depending on location.
>>>>> If some of the callsites are missing args it can fail the whole operation.
>>>>
>>>> There are a few options here, but I think selectable attach modes
>>>> would be needed, with both best-effort and all-or-none available.
>>>
>>> Exactly. For partially inlined we would need all-or-none,
>>> but I see a case where somebody would want to say:
>>> "pls attach to all places where foo() is called and since
>>> it's inlined the actual entry point may not be accurate and it's ok".
>>>
>>> The latter would probably need a flag in tracing tools like bpftrace.
>>> I think all-or-none is a better default.
>>>
>>
>> Yep, agree.
>>
>>>>> Of course, doing the whole thing from libbpf feels good,
>>>>> since we're not burdening the kernel with extra complexity,
>>>>> but lack of kprobe-multi changes the way to think about this trade-off.
>>>>>
>>>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
>>>>> the first few patches and pahole support can/should be landed first.
>>>>>
>>>>
>>>> Sounds great! Having patches 1-10 would be useful as that would allow us
>>>> in turn to update pahole's libbpf submodule commit to generate location
>>>> data, which would then allow us to update kbuild and start using it for
>>>> attach. So we can focus on generating the inline info first, and then
>>>> think about how we want to present that info to consumers.
>>>
>>> Yep. Please post pahole patches for review. I doubt folks
>>> will look into your git tree ;)
>>>
>>
> 
> BTW, what happened to the self-described BTF patches? With these
> additions we are going to break all the BTF-based tooling one more
> time. Let's add minimal amount of changes to BTF to allow tools to
> skip unknown BTF types and dump the rest? I don't remember all the
> details by now, was there any major blocker last time? I feel like
> that minimal approach of fixed size + vlen * vlen_size would still
> work even for all these newly added types (even with the alternative
> for LOC_PARAM I mention in the corresponding patch).
> 
>

Yep that scheme would still work. The reason I didn't prioritize it here
is that the BTF with new LOC kinds is separate from the BTF that legacy
tools would be looking at, but I'd be happy to revive it if it'd help.

Thanks!

Alan


Thread overview: 63+ messages
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
2025-10-08 17:34 ` [RFC bpf-next 01/15] bpf: Extend UAPI to support location information Alan Maguire
2025-10-16 18:36   ` Andrii Nakryiko
2025-10-17  8:43     ` Alan Maguire
2025-10-20 20:57       ` Andrii Nakryiko
2025-10-23  8:17         ` Alan Maguire
2025-11-05  0:43           ` Andrii Nakryiko
2025-10-23  0:56   ` Eduard Zingerman
2025-10-23  8:35     ` Alan Maguire
2025-10-08 17:34 ` [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
2025-10-23  0:57   ` Eduard Zingerman
2025-10-23 19:18   ` Eduard Zingerman
2025-10-23 19:59     ` Eduard Zingerman
2025-10-08 17:34 ` [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup() Alan Maguire
2025-10-16 18:39   ` Andrii Nakryiko
2025-10-17  8:56     ` Alan Maguire
2025-10-20 21:03       ` Andrii Nakryiko
2025-10-23  8:25         ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF Alan Maguire
2025-10-16 18:36   ` Andrii Nakryiko
2025-10-17 13:47     ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
2025-10-23  0:57   ` Eduard Zingerman
2025-10-23  8:38     ` Alan Maguire
2025-10-23  8:50       ` Eduard Zingerman
2025-10-08 17:35 ` [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs Alan Maguire
2025-10-16 18:36   ` Andrii Nakryiko
2025-10-17 13:47     ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 07/15] selftests/bpf: Test helper support for BTF_KIND_LOC[_PARAM|_PROTO|SEC] Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 08/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to field iter tests Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 09/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to dedup split tests Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 10/15] selftests/bpf: BTF distill tests to ensure LOC[_PARAM|_PROTO] add to split BTF Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 11/15] kbuild: Add support for extra BTF Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m Alan Maguire
2025-10-16 18:37   ` Andrii Nakryiko
2025-10-17 13:54     ` Alan Maguire
2025-10-20 21:05       ` Andrii Nakryiko
2025-10-23  0:58   ` Eduard Zingerman
2025-10-23 12:00     ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 13/15] libbpf: add API to load extra BTF Alan Maguire
2025-10-16 18:37   ` Andrii Nakryiko
2025-10-17 13:55     ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 14/15] libbpf: add support for BTF location attachment Alan Maguire
2025-10-16 18:36   ` Andrii Nakryiko
2025-10-17 14:02     ` Alan Maguire
2025-10-20 21:07       ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 15/15] selftests/bpf: Add test tracing inline site using SEC("kloc") Alan Maguire
2025-10-12 23:45 ` [RFC bpf-next 00/15] support inline tracing with BTF Alexei Starovoitov
2025-10-13  7:38   ` Alan Maguire
2025-10-14  0:12     ` Alexei Starovoitov
2025-10-14  9:58       ` Alan Maguire
2025-10-16 18:36         ` Andrii Nakryiko
2025-10-23 14:37           ` Alan Maguire [this message]
2025-10-23 16:16             ` Andrii Nakryiko
2025-10-24 11:53               ` Alan Maguire
2025-10-14 11:52       ` Jiri Olsa
2025-10-14 14:55         ` Alan Maguire
2025-10-14 23:04           ` Masami Hiramatsu
2025-10-15 14:17           ` Jiri Olsa
2025-10-15 15:19             ` Alan Maguire
2025-10-15 18:35               ` Jiri Olsa
2025-10-23 22:32 ` Eduard Zingerman
2025-10-24 12:54   ` Alan Maguire
