All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Bobrowski <mattbobrowski@google.com>
To: Yonghong Song <yonghong.song@linux.dev>
Cc: Alan Maguire <alan.maguire@oracle.com>,
	Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	dwarves@vger.kernel.org, bpf@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	John Fastabend <john.fastabend@gmail.com>,
	Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH dwarves] btf_encoder: prefer strong function definitions for BTF generation
Date: Mon, 5 Jan 2026 11:47:33 +0000	[thread overview]
Message-ID: <aVuk1e73g7ZTHqMY@google.com> (raw)
In-Reply-To: <aVt139VXMTka-hYw@google.com>

On Mon, Jan 05, 2026 at 08:27:11AM +0000, Matt Bobrowski wrote:
> On Fri, Jan 02, 2026 at 10:46:00AM -0800, Yonghong Song wrote:
> > 
> > 
> > On 12/31/25 12:53 AM, Matt Bobrowski wrote:
> > > Currently, when a function has both a weak and a strong definition
> > > across different compilation units (CUs), the BTF encoder arbitrarily
> > > selects one to generate the BTF entry. This selection fundamentally is
> > > dependent on the order in which pahole processes the CUs.
> > > 
> > > This indifference often leads to a mismatch where the generated BTF
> > > reflects the weak definition's prototype, even though the linker
> > > selected the strong definition for the final vmlinux binary.
> > > 
> > > A notable example described in [0] involving function
> > > bpf_lsm_mmap_file(). Both weak and strong definitions exist,
> > > distinguished only by parameter names (e.g., file vs
> > > file__nullable). While the strong definition is linked into the
> > > vmlinux object, the generated BTF contained the prototype for the weak
> > > definition. This causes issues for BPF verifier (e.g., __nullable
> > > annotation semantics), or tools relying on accurate type information.
> > > 
> > > To fix this, ensure the BTF encoder selects the function definition
> > > corresponding to the actual code linked into the binary. This is
> > > achieved by comparing the DWARF function address (DW_AT_low_pc) with
> > > the ELF symbol address (st_value). Only the DWARF entry for the strong
> > > definition will match the final resolved ELF symbol address.
> > > 
> > > [0] https://lore.kernel.org/all/aVJY9H-e83T7ivT4@google.com/
> > > 
> > > Link: https://lore.kernel.org/all/aVJY9H-e83T7ivT4@google.com/
> > > Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
> > 
> > LGTM with some nits below.
> 
> Thanks for the review.
> 
> > Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > 
> > > ---
> > >   btf_encoder.c | 36 ++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 36 insertions(+)
> > > 
> > > diff --git a/btf_encoder.c b/btf_encoder.c
> > > index b37ee7f..0462094 100644
> > > --- a/btf_encoder.c
> > > +++ b/btf_encoder.c
> > > @@ -79,6 +79,7 @@ struct btf_encoder_func_annot {
> > >   /* state used to do later encoding of saved functions */
> > >   struct btf_encoder_func_state {
> > > +	uint64_t addr;
> > >   	struct elf_function *elf;
> > >   	uint32_t type_id_off;
> > >   	uint16_t nr_parms;
> > > @@ -1258,6 +1259,7 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
> > >   	if (!state)
> > >   		return -ENOMEM;
> > > +	state->addr = function__addr(fn);
> > >   	state->elf = func;
> > >   	state->nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
> > >   	state->ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type;
> > > @@ -1477,6 +1479,29 @@ static void btf_encoder__delete_saved_funcs(struct btf_encoder *encoder)
> > >   	encoder->func_states.cap = 0;
> > >   }
> > > +static struct btf_encoder_func_state *btf_encoder__select_canonical_state(struct btf_encoder_func_state *combined_states,
> > > +									  int combined_cnt)
> > > +{
> > > +	int i, j;
> > > +
> > > +	/*
> > > +	 * The same elf_function is shared amongst combined functions,
> > > +	 * as per saved_functions_combine().
> > > +	 */
> > > +	struct elf_function *elf = combined_states[0].elf;
> > 
> > The logic is okay. But can we limit elf->sym_cnt to be 1 here?
> > This will match the case where two functions (weak and strong)
> > co-exist in compiler and eventually only strong/global function
> > will survive.
> 
> In fact, checking again I believe that the loop is redundant because
> elf_function__has_ambiguous_address() ensures that if we reach this
> point, all symbols for the function share the same address. Therefore,
> checking the first symbol (elf->syms[0]) should be sufficient and
> equivalent to checking all of them.
> 
> Will send through a v2 with this amendment.

Hm, actually, no. I don't think the addresses stored within
elf->syms[#].addr should all be assumed to be the same at the point
which the new btf_encoder__select_canonical_state() function is called
(due to things like skip_encoding_inconsistent_proto possibly taking
effect). Therefore, I think it's best that we leave things as is and
exhaustively iterate through all elf->syms? I don't believe there's
any adverse effects in doing it this way anyway?

> > > +
> > > +	for (i = 0; i < combined_cnt; i++) {
> > > +		struct btf_encoder_func_state *state = &combined_states[i];
> > > +
> > > +		for (j = 0; j < elf->sym_cnt; j++) {
> > > +			if (state->addr == elf->syms[j].addr)
> > > +				return state;
> > > +		}
> > > +	}
> > > +
> > > +	return &combined_states[0];
> > > +}
> > > +
> > >   static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder, bool skip_encoding_inconsistent_proto)
> > >   {
> > >   	struct btf_encoder_func_state *saved_fns = encoder->func_states.array;
> > > @@ -1517,6 +1542,17 @@ static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder, bool skip_e
> > >   					0, 0);
> > >   		if (add_to_btf) {
> > > +			/*
> > > +			 * We're to add the current function within
> > > +			 * BTF. Although, from all functions that have
> > > +			 * possibly been combined via
> > > +			 * saved_functions_combine(), ensure to only
> > > +			 * select and emit BTF for the most canonical
> > > +			 * function definition.
> > > +			 */
> > > +			if (j - i > 1)
> > > +				state = btf_encoder__select_canonical_state(state, j - i);
> > > +
> > >   			if (is_kfunc_state(state))
> > >   				err = btf_encoder__add_bpf_kfunc(encoder, state);
> > >   			else
> > 

  reply	other threads:[~2026-01-05 11:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-31  8:53 [PATCH dwarves] btf_encoder: prefer strong function definitions for BTF generation Matt Bobrowski
2026-01-02 18:46 ` Yonghong Song
2026-01-05  8:27   ` Matt Bobrowski
2026-01-05 11:47     ` Matt Bobrowski [this message]
2026-01-05 16:23       ` Yonghong Song
2026-01-05 20:43         ` Matt Bobrowski
2026-01-07 15:50           ` Alan Maguire
2026-01-07 18:55             ` Matt Bobrowski
2026-01-08 18:18               ` Alan Maguire
2026-01-09  8:27                 ` Matt Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aVuk1e73g7ZTHqMY@google.com \
    --to=mattbobrowski@google.com \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=arnaldo.melo@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dwarves@vger.kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.