bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alan Maguire <alan.maguire@oracle.com>
To: Thierry Treyer <ttreyer@meta.com>
Cc: "dwarves@vger.kernel.org" <dwarves@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"acme@kernel.org" <acme@kernel.org>,
	"ast@kernel.org" <ast@kernel.org>, Yonghong Song <yhs@meta.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"ihor.solodrai@linux.dev" <ihor.solodrai@linux.dev>,
	Song Liu <songliubraving@meta.com>,
	Mykola Lysenko <mykolal@meta.com>
Subject: Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline
Date: Fri, 2 May 2025 09:31:54 +0100	[thread overview]
Message-ID: <7e7318c4-1592-4080-b548-acf898aef7b4@oracle.com> (raw)
In-Reply-To: <7995874C-FE08-46CC-9CBF-AF337E7FB560@meta.com>

On 01/05/2025 20:38, Thierry Treyer wrote:
>> 1. multiple functions with the same name and different function
>> signatures. Since we have no mechanism currently to associate function
>> and site we simply refuse to encode them in BTF today
>> 2. functions with inconsistent representations. If a function does not
>> use the expected registers for its function signature due to
>> optimizations we leave it out of BTF representation; and of course
>> 3. inline functions are not currently represented at all.
>>
>> I think we can do a better job with 1 and 2 while solving 3 as well.
>> Here's my suggestion.
> 
> I see how your approach covers all these problems!
> I would also add the following issue, which is a variant of 2 and 3:
> 
> 4. Partially inlined functions: functions having a symbol, but is also
>         inlined at some call sites. Currently not represented either.
> 

exactly, should have mentioned that too as this is a real pain point for
tracing; we think we are seeing every time a function is hit, but since
it is partially inlined we do not.

>> First, we separate functions which have complicated relationships with
>> their parameters (cases 1, 2 and 3) into a separate .BTF.func_aux
>> section or similar. That can be delivered in vmlinux or via a
>> special-purpose module; for modules it would be just a separate ELF
>> section as it would likely be small. We can control access to ensure
>> unprivileged users cannot get address information, hence the separation
>> from vmlinux BTF. But it is just (split) BTF, so no new format required.
>>
>> The advantage of this is that tracers today can do the straightforward
>> tracing of functions from /sys/kernel/btf/vmlinux, and if a function is
>> not there and the tracer supports handling more complex cases, it can
>> know to look in /sys/kernel/btf/vmlinux.func_aux.
> 
> Sounds good to me!
> Laying out the format of this new .BTF.func_aux section:
> 
> +---------------------+
> | BTF.func_aux header |
> +---------------------+
> ~  type info section  ~
> +---------------------+
> ~   string section    ~
> +---------------------+
> ~  location section   ~
> +---------------------+
>

I'd say we could keep location info in the type section; that way we
benefit from existing dedup mechanisms (though we'd have to extend them
to cover locations).

>> In that section we have a split BTF representation in which function
>> signatures for cases 1, 2, and 3 are represented in the usual way (FUNC
>> pointing at FUNC_PROTO). However since we know that the relationship
>> between function and its site(s) is complex, we need some extra info.
> 
> We have the same base as the BTF section, so we can encode FUNC and
> FUNC_PROTO in the 'type info section'. The strings for the new functions'
> names get deduplicated and stored in the 'string section'.
> 
> The 'location section' lists location expressions to locate the arguments.
> As discussed with Alexei, _one_ LOC_* operation will describe the location
> of _one_ argument; there is no series of operations to carry out in order
> to retrieve the argument's value. This also makes re-using location
> expressions across multiple arguments/functions through de-duplication.
> 
>> I'd propose we add a DATASEC containing functions and their addresses. A
>> FUNC datasec it could be laid out as follows
>>
>> struct btf_func_secinfo {
>> __u32 type;
>> __u32 offset;
>> __u32 loc;
>> };
> 
> We'd have a new BTF_KIND_FUNCSEC type followed by 'vlen' btf_func_secinfo.
> I see how 'type' and 'offset' can be used to disambiguate between functions
> sharing the same name, but I'm confused by 'loc'. Functions with multiple
> arguments will need a location expression for each of them.
> How about having another 'vlen', followed by the offsets into the location
> section?
> 
> struct btf_func_secinfo {
> __u32 type;
> __u32 offset;
> __u32 vlen;
> // Followed by: __u32 locs[vlen];
> }
> 
> Or did you have something else in mind?
> 

Good point, didn't think this through. I guess we should probably
experiment here to find the most compact representation but could the
location be a BTF type BTF_KIND_LOC with a vlen specifying the number of
parameter location representations? Then we'd follow it with a location
representation for each (like what we do for struct members using a
vlen-length series of "struct btf_member"s in a BTF_KIND_STRUCT.
That would be easier I think than having variable length sec info entries.

The interesting question will be with deduplication of identical
location info, which representation is more compact? I'd say we should
try a few variations and see which one wins in practice.

>> In the case of 1 (multiple signatures for a function name) the DATASEC
>> will have entries for each site which tie it to its associated FUNC.
>> This allows us to clarify which function is associated with which
>> address. So the type is the BTF_KIND_FUNC, the offset the address and
>> the loc is 0 since we don't need it for this case since the functions
>> have parameters in expected locations.
>>
>> In the case of 2 (functions with inconsistent representations) we use
>> the type to point at the FUNC, the offset the address of the function
>> and the loc to represent location info for that site. By leaving out
>> caller/callee info from location data we could potentially exploit the
>> fact that a lot of locations have similar layouts in terms of where
>> parameters are available, making dedup of location info possible.
>> Caller/callee relationship can still be inferred via the site address.
>>
>> Finally in case 3 we have inlines which would be represented similarly
>> to case 2; i.e. we marry a representation of the function (the type) to
>> the associated inline site via location data in the loc field.
> 
> Here's how it could look like:
> 
> [1] FUNC_PROTO ...
>       ...args
> [2] FUNC 'foo' type_id=1   # 1. name collision with [4]
> [3] FUNC_PROTO ...
>       ...args
> [4] FUNC 'foo' type_id=3   # 1. name collision with [2]
> [5] FUNC_PROTO ...
>       ...args
> [6] FUNC 'bar' type_id=5   # 2. non-standard arguments location
> [7] FUNC_PROTO ...
>       ...args
> [8] FUNC 'baz' type_id=7   # 3-4. partially/fully inlined function
> [9] FUNCSEC '.text', vlen=5
>   - type_id=2, offset=0x1000, loc=0 # 1. share the same name, but
>   - type_id=4, offset=0x2000, loc=0 #    differentiate with the offset
>   - type_id=6, offset=0x3000, loc=??? # 2. non-standard args location
>     * offset of arg0 locexpr: 0x1234  #    each arg gets a loc offset
>     * offset of arg1 locexpr: 0x5678  #    or some other encoding?
>   - type_id=8, offset=0x4000, loc=0   # 4. non-inlined instance
>   - type_id=8, offset=0x1050, loc=??? # 3. inlined instance
>     * # ...args loc offsets
> 
>> If so, the question becomes what are we missing today? As far as I can
>> see we need
>>
>> - support for new kinds BTF_KIND_FUNC_DATASEC, or simply use the kind
>> flag for existing BTF datasec to indicate function info
>> - support for new location kind
>> - pahole support to generate address-based datasec and location separately
>> - for modules, we would eventually need multi-split BTF that would allow
>> the func aux section to be split BTF on top of existing module BTF, i.e.
>> a 3-level split BTF
> 
> Do you think locations should be part of the 'type info section'?
> Or should they have their own 'location section'?
>

I'd say part of the type info section as we can use existing dedup with
tweaks to support location info.

> For modules, I'm less familiar with them.
> Would you have some guidance about their requirements?
> 

For modules we use split BTF to represent the type/function info etc. So
we deduplicate the module type representation, removing redundant info
already in the vmlinux BTF. Then the split BTF that covers type info
only in the kernel starts at last_vmlinux_BTF_id + 1.

So if we are going to separate location/inline info, we'd probably use a
similar split BTF approach, but the challenge with modules is that we'd
be using split BTF on top of existing module BTF, which is itself split
BTF. So we'd need to support a 3-level split where the location-related
BTF can refer to types in the module BTF (split) or the base BTF
(vmlinux BTF). We haven't got any cases of such multi-split BTF but it's
something we've discussed in the past and would probably be quite
doable; would just need kernel and libbpf support.

>> As I think some of the challenges you ran into implementing this
>> indicate, the current approach of matching ELF and DWARF info via name
>> only is creaking at the seams, and needs to be reworked (in fact it is
>> the source of a bug Alexei ran into around missing kfuncs). So I'm
>> hoping to get a patch out this week that uses address info to aid the
>> matching between ELF/DWARF, and from there it's a short jump to using it
>> in DATASEC representations.
>>
>> Anyway let me know what you think. If it sounds workable we could
>> perhaps try prototyping the pieces and see if we can get them working
>> with location info.
> 
> I'll look into emitting functions that are currently not represented,
> because they fall in the pitfalls 1-4. That will give us the base for
> the new .BTF.func_aux section.

Sounds great!

> I'm looking forward to use your patch to simplify the linking between
> DWARF and BTF.
> 

I've sent a series to dwarves [1] that starts down that road storing
function addresses internally in pahole but not emitting them yet. Needs
more work but could be a basis for us to start working to emit function
datasec sections.

Thanks!

Alan

[1]
https://lore.kernel.org/dwarves/20250501145645.3317264-1-alan.maguire@oracle.com/

> Thanks for your time and have a great day,
> Thierry


  reply	other threads:[~2025-05-02  8:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-16 19:20 [PATCH RFC 0/3] list inline expansions in .BTF.inline Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 1/3] dwarf_loader: Add parameters list to inlined expansion Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 2/3] dwarf_loader: Add name to inline expansion Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 3/3] inline_encoder: Introduce inline encoder to emit BTF.inline Thierry Treyer via B4 Relay
2025-04-25 18:40 ` [PATCH RFC 0/3] list inline expansions in .BTF.inline Daniel Xu
2025-04-28 20:51   ` Alexei Starovoitov
2025-04-29 19:14     ` Thierry Treyer
2025-04-29 23:58       ` Alexei Starovoitov
2025-04-30 15:25 ` Alan Maguire
2025-05-01 19:38   ` Thierry Treyer
2025-05-02  8:31     ` Alan Maguire [this message]
2025-05-19 12:02 ` Alan Maguire
2025-05-22 17:56   ` Thierry Treyer
2025-05-22 20:03     ` Andrii Nakryiko
2025-05-22 20:16       ` Eduard Zingerman
2025-05-23 18:57         ` Thierry Treyer
2025-05-26 14:30           ` Alan Maguire
2025-05-27 21:41             ` Andrii Nakryiko
2025-05-28 10:14               ` Alan Maguire
2025-05-23 17:33     ` Alexei Starovoitov
2025-05-23 18:35       ` Thierry Treyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7e7318c4-1592-4080-b548-acf898aef7b4@oracle.com \
    --to=alan.maguire@oracle.com \
    --cc=acme@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dwarves@vger.kernel.org \
    --cc=ihor.solodrai@linux.dev \
    --cc=mykolal@meta.com \
    --cc=songliubraving@meta.com \
    --cc=ttreyer@meta.com \
    --cc=yhs@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).