Re: [PATCH dwarves 0/3] pahole: Replace or add functions with true signatures in btf

From: Alan Maguire <alan.maguire@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>,
	Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	dwarves <dwarves@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	David Faust <david.faust@oracle.com>,
	"Jose E . Marchesi" <jose.marchesi@oracle.com>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH dwarves 0/3] pahole: Replace or add functions with true signatures in btf
Date: Fri, 14 Nov 2025 15:57:09 +0000
Message-ID: <3f95f01b-9cc4-499b-a18d-5c4975f0b0e5@oracle.com>
In-Reply-To: <CAADnVQKr+9gneG4ZZHBKWjTo-AiqPCf_Mxv_sCi9acqEKkKShw@mail.gmail.com>

On 13/11/2025 17:36, Alexei Starovoitov wrote:
> On Thu, Nov 13, 2025 at 8:45 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 11/11/2025 17:04, Yonghong Song wrote:
>>> Current vmlinux BTF encoding is based on source-level signatures.
>>> But the compiler may perform optimizations that change the signature.
>>> If users rely on the source-level signature, their initial implementation
>>> may produce wrong results; they then need to work out what the problem
>>> is and work around it, e.g. through kprobe, since kprobe does not
>>> need vmlinux BTF.
>>>
>>> The following is a concrete example for [1].
>>> The original source signature:
>>>   typedef struct {
>>>         union {
>>>                 void            *kernel;
>>>                 void __user     *user;
>>>         };
>>>         bool            is_kernel : 1;
>>>   } sockptr_t;
>>>   typedef sockptr_t bpfptr_t;
>>>   static int map_create(union bpf_attr *attr, bpfptr_t uattr) { ... }
>>> After compiler optimization, the signature becomes:
>>>   static int map_create(union bpf_attr *attr, bool uattr__coerce1) { ... }
>>>
>>> In the above, uattr__coerce1 corresponds to the 'is_kernel' field in sockptr_t.
>>> Here, the suffix '__coerce1' refers to the second 64-bit value in
>>> sockptr_t. The first 64-bit value would be '__coerce0' if that value
>>> were used instead.
>>>
>>> To do proper tracing, it would be good for the users to know the
>>> changed signature. With the actual signature, both kprobe and fentry
>>> should work as usual. This can avoid user surprise and improve
>>> developer productivity.
>>>
>>> The llvm compiler patch [1] collects the true signatures and encodes those
>>> functions in DWARF. pahole will process these functions and
>>> replace the old signatures with the true signatures. Additionally,
>>> new functions (e.g., foo.llvm.<hash>) can be encoded in
>>> vmlinux BTF as well.
>>>
>>> Patches 1 and 2 are refactoring patches. Patch 3 has the detailed explanation
>>> in its commit message and implements the logic to encode replaced or new
>>> signatures in vmlinux BTF. Please see Patch 3 for details.
>>>
>>
>>
>> Thanks for sending the series Yonghong! I think the thing we need to
>> discuss at a high level is this; what is the proposed relationship
>> between source code and BTF function encoding? The approach we have
>> taken thus far is to use source level as the basis for encoding, and as
>> part of that we attempt to identify cases where the source-level
>> expectations are violated by the compiled (optimized) code. We currently
>> do not encode those cases because, as with optimized-out parameters,
>> source-level expectations of parameter position could lead to bad
>> behaviour. There are probably cases we miss here, but that is the
>> intent at least.
>>
>> There are however cases where .isra-suffixed functions retain the
>> expected parameter representations; in such cases we encode with the
>> prefix name ("foo" not "foo.isra.0") as DWARF does.
>>
>> So in moving away from that, I think we need to make a clear decision
>> and have handling in place. My practical worry is that users trying to
>> write BPF progs cannot easily predict if a parameter is optimized out
>> and so on, so it's hard to write stable BPF programs for such
>> signatures. Less of a problem if using a high-level tracer I suppose.
>>
>> The approach I had been thinking about was to utilize BTF location
>> information for such cases, but the RFC [1] didn't get around to
>> implementing the support. So the idea would be to have location info with
>> parameter types and locations, but because we don't encode a function,
>> fentry can't be used (though kprobes still could, as for inline sites).
>> Under that scheme the foo.llvm.hash functions could still be called
>> "foo": since we have address information for the sites, we can match foo
>> to foo.llvm.hash.
>>
>> Anyway I'd appreciate other perspectives here. We have implicitly tied
>> BTF function encoding thus far to source-level representation for
>> reasons of fentry safety, but we could potentially move away from that.
>> Doing so, though, would I think at a minimum require machinery for fentry
>> safety to be preserved, but we could potentially find other ways to flag
>> this in the BTF function representation. Thanks!
> 
> Looks like we have a big disconnect here.
> To me BTF was never about the source, but about vmlinux final binary.
> Compile flags, configs change both types and functions significantly.
> For types it's easy to see in the vmlinux BTF how they got transformed
> from the original types in the source. Some source types disappear
> altogether. Similar situation with functions. They mutate.
> Partial inlining and function renames are all part of the same category.
> BTF has to describe the final result, so that tracers/users can
> actually debug/introspect the kernel they have and not an abstract
> kernel source. pahole was conservative and removed functions that
> don't match BTF. loc* set is going to bring back these functions
> into BTF with their arguments. True signature support is a complementary
> and mandatory part of the loc* set. We need both. The compiler has to
> store the true signature in DWARF and pahole has to pass it to BTF
> along with the location of arguments and the actual name of the function
> in the symbol table.
>

I don't object to having a representation tied to the final binary;
however I will say there is huge value in _knowing_ things changed from
source to final representation. Now if we encode function names with '.'
suffixes that is one way of knowing, and it may be enough, but I think
we should think about mechanisms to ease overall developer experience in
that new world.

When I write a BPF program for fentry(), it seems to me to be deeply
inconsistent that I can make it work across multiple kernel versions
from a data structure perspective via CO-RE while also having to worry
about the risk of compiler optimizations transforming or eliminating
function parameters. It is a step forward in some ways that we can trace
such functions at all, but I still think we will need a better story
there. For example it is often the case that a BPF program only uses one
parameter from a function signature; if we don't access transformed or
eliminated parameters, can the verifier accept the fentry() even if the
signature doesn't exactly match? We don't need to add these things
today, but I think it would be good to discuss some of the consequences
and how we would possibly handle them.
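
To make that last question concrete, here is a minimal sketch in plain C of
the kind of check I have in mind. It is not verifier or pahole code: the
struct param_desc and struct func_sig types and both helper names are
hypothetical, and comparing parameters by position and type name is a
deliberate simplification. An eliminated parameter shifts the positions of
those after it, so a real check would need the argument location information
as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified description of one function parameter. */
struct param_desc {
	const char *type_name;	/* e.g. "union bpf_attr *" */
};

/* Hypothetical, simplified function signature. */
struct func_sig {
	const char *name;
	size_t nr_params;
	const struct param_desc *params;
};

/*
 * A parameter counts as unchanged only if it exists at the same index in
 * both signatures and has the same type name (an assumption made for this
 * sketch; real matching would compare BTF types and locations).
 */
static bool param_unchanged(const struct func_sig *src,
			    const struct func_sig *bin, size_t idx)
{
	assert(src && bin);

	if (idx >= src->nr_params || idx >= bin->nr_params)
		return false;
	return strcmp(src->params[idx].type_name,
		      bin->params[idx].type_name) == 0;
}

/*
 * Return true if every parameter index the program reads is unchanged
 * between the source-level and the true (binary) signature.
 */
static bool fentry_accesses_unchanged(const struct func_sig *src,
				      const struct func_sig *bin,
				      const size_t *used, size_t nr_used)
{
	for (size_t i = 0; i < nr_used; i++) {
		if (!param_unchanged(src, bin, used[i]))
			return false;
	}
	return true;
}
```

With the map_create() example from the cover letter, a program that only
reads attr (index 0) would pass this check, while one that reads uattr
(index 1, bpfptr_t in the source but bool in the binary) would not.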


> Re: whether to strip .llvm or not, I think it's better to keep BTF
> matching the symbol table, which is kallsyms. If it has a .llvm suffix in
> kallsyms, it should have the same name in BTF. Tracing tools can attach
> with a "func_name.*" pattern. libbpf already supports it.
> And thanks to BTF the fentry prog should match the true
> kernel function signature. What the source signature was is secondary.
> Users cannot write their progs based on the source, since such
> source code doesn't exist in the binary, so there is nothing to trace.
> The true signature with actual parameters, by contrast, is traceable.

Yeah I think if we are passing through the changed function signatures
we definitely need a way to know such changes happened; the "." suffix
will tell us that.
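
As a rough illustration of what that looks like on the tool side, the sketch
below matches a kallsyms-style symbol name against a source-level function
name, accepting either the exact name or any "."-suffixed variant. It uses
POSIX fnmatch(3) purely for illustration and is not how libbpf implements
its pattern matching; sym_matches_func() is a hypothetical helper name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if sym is func itself or func followed by a "." suffix,
 * e.g. "foo", "foo.isra.0" or "foo.llvm.<hash>". A bare "func.*" pattern
 * does not match the unsuffixed name, hence the explicit strcmp().
 */
static bool sym_matches_func(const char *sym, const char *func)
{
	char pattern[256];
	int n;

	assert(sym && func);

	if (strcmp(sym, func) == 0)
		return true;

	/* Build "func.*"; give up if the name does not fit in the buffer. */
	n = snprintf(pattern, sizeof(pattern), "%s.*", func);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return false;

	/* fnmatch() returns 0 when the string matches the pattern. */
	return fnmatch(pattern, sym, 0) == 0;
}
```

A tool could use the presence of a suffix match (rather than an exact match)
as its cue that the signature may differ from the source and should be taken
from BTF.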

Alan

Thread overview: 8+ messages
2025-11-11 17:04 [PATCH dwarves 0/3] pahole: Replace or add functions with true signatures in btf Yonghong Song
2025-11-11 17:04 ` [PATCH dwarves 1/3] btf_encoder: Refactor elf_functions__new() with struct btf_encoder as argument Yonghong Song
2025-11-11 17:04 ` [PATCH dwarves 2/3] bpf_encoder: Refactor a helper elf_function__check_and_push_sym() Yonghong Song
2025-11-11 17:04 ` [PATCH dwarves 3/3] pahole: Replace or add functions with true signatures in btf Yonghong Song
2025-11-13 16:45 ` [PATCH dwarves 0/3] " Alan Maguire
2025-11-13 17:36   ` Alexei Starovoitov
2025-11-14 15:57     ` Alan Maguire [this message]
2025-11-14 20:11       ` Alexei Starovoitov
