bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alan Maguire <alan.maguire@oracle.com>
To: ttreyer@meta.com, dwarves@vger.kernel.org, bpf@vger.kernel.org
Cc: acme@kernel.org, ast@kernel.org, yhs@meta.com, andrii@kernel.org,
	ihor.solodrai@linux.dev, songliubraving@meta.com,
	mykolal@meta.com
Subject: Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline
Date: Wed, 30 Apr 2025 16:25:19 +0100	[thread overview]
Message-ID: <ab9d66e1-bf54-4aa9-9f11-a3a1835acd8a@oracle.com> (raw)
In-Reply-To: <20250416-btf_inline-v1-0-e4bd2f8adae5@meta.com>

On 16/04/2025 20:20, Thierry Treyer via B4 Relay wrote:
> This proposal extends BTF to list the locations of inlined functions and
> their arguments in a new '.BTF.inline` section.
> 
> == Background ==
> 
> Inline functions are often a blind spot for profiling and tracing tools:
> * They cannot probe fully inlined functions.
>   The BTF contains no data about them.
> * They miss calls to partially inlined functions,
>   where a function has a symbol, but is also inlined in some callers.
> * They cannot account for time spent in inlined calls.
>   Instead, they report the time to the caller.
> * They don't provide a way to access the arguments of an inlined call.
> 
> The issue is exacerbated by Link-Time Optimization, which enables more
> inlining across Object files. One workaround is to disable inlining for
> the profiled functions, but that requires a whole kernel compilation and
> doesn't allow for iterative exploration.
> 
> The information required to solve the above problems is not easily
> accessible. It requires parsing most of the DWARF's '.debug_info` section,
> which is time consuming and not trivial.
> Instead, this proposal leverages and extends the existing information
> contained in '.BTF` (for typing) and '.BTF.ext` (for caller location),
> with information from a new section called '.BTF.inline`,
> listing inlined instances.
> 
> == .BTF.inline Section ==
> 
> The new '.BTF.inline` section has a layout similar to '.BTF`.
> 
>  off |0-bit      |16-bits  |24-bits  |32-bits                           |
> -----+-----------+---------+---------+----------------------------------+
> 0x00 |   magic   | version |  flags  |          header length           |
> 0x08 |      inline info offset       |        inline info length        |
> 0x10 |        location offset        |          location length         |
> -----+------------------------------------------------------------------+
>      ~                        inline info section                       ~
> -----+------------------------------------------------------------------+
>      ~                          location section                        ~
> -----+------------------------------------------------------------------+
> 
> It starts with a header (see 'struct btf_inline_header`),
> followed by two subsections:
> 1. The 'Inline Info' section contains an entry for each inlined function.
>    Each entry describes the instance's location in its caller and is
>    followed by the offsets in the 'Location' section of the parameters
>    location expressions. See 'struct btf_inline_instance`.
> 2. The 'Location' section contains location expressions describing how
>    to retrieve the value of a parameter. The expressions are NULL-
>    terminated and are adressed similarly to '.BTF`'s string table.
> 
> struct btf_inline_header {
>   uint16_t magic;
>   uint8_t version, flags;
>   uint32_t header_length;
>   uint32_t inline_info_offset, inline_info_length;
>   uint32_t location_offset, location_length;
> };
> 
> struct btf_inline_instance {
>   type_id_t callee_id;     // BTF id of the inlined function
>   type_id_t caller_id;     // BTF id of the caller
>   uint32_t caller_offset;  // offset of the callee within the caller
>   uint16_t nr_parms;       // number of parameters
> //uint32_t parm_location[nr_parms];  // offset of the location expression
> };                                   // in 'Location' for each parameter
> 
> == Location Expressions ==
> 
> We looked at the DWARF location expressions for the arguments of inlined
> instances having <= 100 instances, on a production kernel v6.9.0. This
> yielded 176,800 instances with 269,327 arguments. We learned that most
> expressions are simple register access, perhaps with an offset. We would
> get access to 87% of the arguments by implementing literal and register.
> 
> Op. Category      Expr. Count    Expr. %
> ----------------------------------------
> literal                 10714      3.98%
> register+above         234698     87.14%
> arithmetic+above       239444     88.90%
> composite+above        240394     89.26%
> stack+above            242075     89.88%
> empty                   27252     10.12%
> 
> We propose to re-encode DWARF location expressions into a custom BTF
> location expression format. It operates on a stack of values, similar to
> DWARF's location expressions, but is stripped of unused operators,
> while allowing future expansions.
> 
> A location expression is composed of a series of operations, terminated
> by a NULL-byte/LOC_END_OF_EXPR operator. The very first expression in the
> 'Location' subsection must be an empty expression constisting only of
> LOC_END_OF_EXPR.
> 
> An operator is a tagged union: the tag describes the operation to carry
> out and the union contains the operands.
>  
>  ID | Operator Name        | Operands[...]
> ----+----------------------+-------------------------------------------
>   0 | LOC_END_OF_EXPR      | _none_
>   1 | LOC_SIGNED_CONST_1   |  s8: constant's value
>   2 | LOC_SIGNED_CONST_2   | s16: constant's value
>   3 | LOC_SIGNED_CONST_4   | s32: constant's value
>   4 | LOC_SIGNED_CONST_8   | s64: constant's value
>   5 | LOC_UNSIGNED_CONST_1 |  u8: constant's value
>   6 | LOC_UNSIGNED_CONST_2 | u16: constant's value
>   7 | LOC_UNSIGNED_CONST_4 | u32: constant's value
>   8 | LOC_UNSIGNED_CONST_8 | u64: constant's value
>   9 | LOC_REGISTER         |  u8: DWARF register number from the ABI
>  10 | LOC_REGISTER_OFFSET  |  u8: DWARF register number from the ABI
>                            | s64: offset added to the register's value
>  11 | LOC_DEREF            |  u8: size of the deref'd type
> 
> This list should be further expanded to include arithmetic operations.
> 
> Example: accessing a field at offset 12B from a struct whose adresse is
>          in the '%rdi` register, on amd64, has the following encoding:
> 
> [0x0a 0x05 0x000000000000000c] [0x0b 0x04] [0x00]
>  |    |    ` Offset Added       |    |      ` LOC_END_OF_EXPR
>  |    ` Register Number         |    ` Size of Deref.
>  ` LOC_REGISTER_OFFSET          ` LOC_DEREF
> 
> == Summary ==
> 
> Combining the new information from '.BTF.inline` with the existing data
> from '.BTF` and '.BTF.ext`, tools will be able to locate inline functions
> and their arguments. Symbolizer can also use the data to display the
> functions inlined at a given address.
> 
> Fully inlined functions are not part of the BTF and thus are not covered
> by this proposal. Adding them to the BTF would enable their coverage and
> should be considered.
> 
> Signed-off-by: Thierry Treyer <ttreyer@meta.com>


This is fantastic work; a huge step forward having a practical
implementation of inline handling! So while I have some suggestions
about how we might be able to somewhat rework things to tackle some
associated problems and integrate more completely with existing BTF
representations I want to make sure they do not detract from the huge
progress you've made here.

There are some existing problems we have in tracing functions that could
benefit from an approach like this.  In the goal to maximize the
traceable surface of the system by representing functions in BTF, we
currently have to make some compromises. These are:

1. multiple functions with the same name and different function
signatures. Since we have no mechanism currently to associate function
and site we simply refuse to encode them in BTF today
2. functions with inconsistent representations. If a function does not
use the expected registers for its function signature due to
optimizations we leave it out of BTF representation; and of course
3. inline functions are not currently represented at all.

I think we can do a better job with 1 and 2 while solving 3 as well.
Here's my suggestion.

First, we separate functions which have complicated relationships with
their parameters (cases 1, 2 and 3) into a separate .BTF.func_aux
section or similar. That can be delivered in vmlinux or via a
special-purpose module; for modules it would be just a separate ELF
section as it would likely be small. We can control access to ensure
unprivileged users cannot get address information, hence the separation
from vmlinux BTF. But it is just (split) BTF, so no new format required.

The advantage of this is that tracers today can do the straightforward
tracing of functions from /sys/kernel/btf/vmlinux, and if a function is
not there and the tracer supports handling more complex cases, it can
know to look in /sys/kernel/btf/vmlinux.func_aux.

In that section we have a split BTF representation in which function
signatures for cases 1, 2, and 3 are represented in the usual way (FUNC
pointing at FUNC_PROTO). However since we know that the relationship
between function and its site(s) is complex, we need some extra info.
I'd propose we add a DATASEC containing functions and their addresses. A
FUNC datasec it could be laid out as follows

struct btf_func_secinfo {
	__u32 type;
	__u32 offset;
	__u32 loc;
};

In the case of 1 (multiple signatures for a function name) the DATASEC
will have entries for each site which tie it to its associated FUNC.
This allows us to clarify which function is associated with which
address. So the type is the BTF_KIND_FUNC, the offset the address and
the loc is 0 since we don't need it for this case since the functions
have parameters in expected locations.

In the case of 2 (functions with inconsistent representations) we use
the type to point at the FUNC, the offset the address of the function
and the loc to represent location info for that site. By leaving out
caller/callee info from location data we could potentially exploit the
fact that a lot of locations have similar layouts in terms of where
parameters are available, making dedup of location info possible.
Caller/callee relationship can still be inferred via the site address.

Finally in case 3 we have inlines which would be represented similarly
to case 2; i.e. we marry a representation of the function (the type) to
the associated inline site via location data in the loc field.

Does this approach sound workable? I think from your side it gives you
what you need:

- signals to tracers that functions have a standard form (FUNC is
present in vmlinux BTF)
- signals that functions have a more complex, site-specific form (FUNC
is present in vmlinux.func_aux)
- privileged-only access to address info by controlling access to the
/sys/kernel/btf representation of the func aux info
- a way to trace an inlined function via secinfo mapping function
signature to location at a specific address

If so, the question becomes what are we missing today? As far as I can
see we need

- support for new kinds BTF_KIND_FUNC_DATASEC, or simply use the kind
flag for existing BTF datasec to indicate function info
- support for new location kind
- pahole support to generate address-based datasec and location separately
- for modules, we would eventually need multi-split BTF that would allow
the func aux section to be split BTF on top of existing module BTF, i.e.
a 3-level split BTF

As I think some of the challenges you ran into implementing this
indicate, the current approach of matching ELF and DWARF info via name
only is creaking at the seams, and needs to be reworked (in fact it is
the source of a bug Alexei ran into around missing kfuncs). So I'm
hoping to get a patch out this week that uses address info to aid the
matching between ELF/DWARF, and from there it's a short jump to using it
in DATASEC representations.

Anyway let me know what you think. If it sounds workable we could
perhaps try prototyping the pieces and see if we can get them working
with location info.

Thanks!

Alan


> ---
> Thierry Treyer (3):
>       dwarf_loader: Add parameters list to inlined expansion
>       dwarf_loader: Add name to inline expansion
>       inline_encoder: Introduce inline encoder to emit BTF.inline
> 
>  CMakeLists.txt   |   3 +-
>  btf_encoder.c    |   5 +
>  btf_encoder.h    |   2 +
>  btf_inline.pk    |  55 ++++++
>  dwarf_loader.c   | 176 ++++++++++++--------
>  dwarves.c        |  26 +++
>  dwarves.h        |   7 +
>  inline_encoder.c | 496 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  inline_encoder.h |  25 +++
>  pahole.c         |  40 ++++-
>  10 files changed, 765 insertions(+), 70 deletions(-)
> ---
> base-commit: 4ef47f84324e925051a55de10f9a4f44ef1da844
> change-id: 20250416-btf_inline-e5047eea9b6f
> 
> Best regards,


  parent reply	other threads:[~2025-04-30 15:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-16 19:20 [PATCH RFC 0/3] list inline expansions in .BTF.inline Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 1/3] dwarf_loader: Add parameters list to inlined expansion Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 2/3] dwarf_loader: Add name to inline expansion Thierry Treyer via B4 Relay
2025-04-16 19:20 ` [PATCH RFC 3/3] inline_encoder: Introduce inline encoder to emit BTF.inline Thierry Treyer via B4 Relay
2025-04-25 18:40 ` [PATCH RFC 0/3] list inline expansions in .BTF.inline Daniel Xu
2025-04-28 20:51   ` Alexei Starovoitov
2025-04-29 19:14     ` Thierry Treyer
2025-04-29 23:58       ` Alexei Starovoitov
2025-04-30 15:25 ` Alan Maguire [this message]
2025-05-01 19:38   ` Thierry Treyer
2025-05-02  8:31     ` Alan Maguire
2025-05-19 12:02 ` Alan Maguire
2025-05-22 17:56   ` Thierry Treyer
2025-05-22 20:03     ` Andrii Nakryiko
2025-05-22 20:16       ` Eduard Zingerman
2025-05-23 18:57         ` Thierry Treyer
2025-05-26 14:30           ` Alan Maguire
2025-05-27 21:41             ` Andrii Nakryiko
2025-05-28 10:14               ` Alan Maguire
2025-05-23 17:33     ` Alexei Starovoitov
2025-05-23 18:35       ` Thierry Treyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab9d66e1-bf54-4aa9-9f11-a3a1835acd8a@oracle.com \
    --to=alan.maguire@oracle.com \
    --cc=acme@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dwarves@vger.kernel.org \
    --cc=ihor.solodrai@linux.dev \
    --cc=mykolal@meta.com \
    --cc=songliubraving@meta.com \
    --cc=ttreyer@meta.com \
    --cc=yhs@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).