From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [209.85.208.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D683213FEE for ; Mon, 5 Jan 2026 20:43:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.54 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767645838; cv=none; b=bcvxN7JqZkAgi3OmXqFXRfYINg51MaKnXNfarbqwprvR0eB/OH16nMGnB88QiusAmTSeNcy37nTCxqkL1t5o5JrgJBDwkeP0Gm+Rg231eQqjmofyb6Y8DkMiHk/NJbNNNq47eI9EpgUZZFwEaKbkYVsJb9pA9SShx1Rz5FysspY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767645838; c=relaxed/simple; bh=WjJwmAfYdMf5SAaVPzbS5dFiogmjSrgtmAWrbrhoC3U=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=NDBjxpRIcjn1YNbdi1Un0QVwmBcYlSKNoMX4rg1SFFZAIRhrzYn4sOEy8mciQVb+lfBd88duD4co6sz3GMvTbEisyapOUn4APnSel2OSyLJMOi50PKhjeDJMVF024lEGL73Vkgz1Jat6xYo3EIww3f1ExsC8wikauuxXYfsmLmM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=srKcBCD1; arc=none smtp.client-ip=209.85.208.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="srKcBCD1" Received: by mail-ed1-f54.google.com with SMTP id 4fb4d7f45d1cf-64b8b5410a1so426096a12.2 for ; Mon, 05 Jan 2026 12:43:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1767645835; x=1768250635; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=lMqvStSvFSxUzkQX5q2v9C9qSg4usWmTmbHR+7n7Wmc=; b=srKcBCD1aMQReHlEK7wbuSuhaR07+4moAXXWyHVZBNnhKcP2T0+6mEsyPjNLAm6xyc XqFvA65IAJ65i8yZvfyiMcPZXFtLtTVyEQ9A03KoXHX2vgw8Tn6cbpIHELL0wBneaThl hnx4qpNsbwtDkOaciZZ+0xeCsRvtAd4fuM6iAPuQSQTPkU7IumGtZ0K4QqQWdkIz0nZ8 n/xXNoRZUqCzDG/SOSLcPdPUJfgkSJRQB/b+ITINmZ0GXW41X/g+tlhghUBKxPT/0nAA ARin5x8ZYIvQi+jwkqYUu8mdxMK7z+5ibOokqQhKrAn8Hc9Zi6X76nNRiRE5BbYBNHj4 sDFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1767645835; x=1768250635; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lMqvStSvFSxUzkQX5q2v9C9qSg4usWmTmbHR+7n7Wmc=; b=CVlOv0XOo9/Xbm+q2qqN1x32G2SSCAJHgFj4TijkbKUQl07BdvMG240z6bHFwpGzJl XvqC0/n+nj21VzQMC13M4IsX3adeCflq+waXN0epzTBiMwhAmCLEZ1YqR9sKHNWQgzHl R78rIaBsZabkOfD8162yyTe84S42gA7vN5bwK+tALpTIA+EgVpCTrNJVRHhf9hcmj9pj YMySFqO+CDu0We900/xYUEV3qwNlnT19Qo+e4FJ9RzSwEx99nbvMJnXsnNzuV+lso4CU 8+tCA3CEvFeKPAt6rYatEstqfunwOHpBxm9dkFXXmmB3fDSwwUBsBsmkMc2jP8qhOkfe akRg== X-Forwarded-Encrypted: i=1; AJvYcCU317ewAUWCTqTgSiVzxWbYF21vbD1vElMx/xGRbyxgdGRW1+qtrr4wbzIRPOjqj43XhOU=@vger.kernel.org X-Gm-Message-State: AOJu0YyfRPcIIa/BRhnIKbvqYkpdhuP9B0bJF6rT0RGRvhR4xqfBLH2Q wRjKkbdrLG7uMR7y3QT5Roga8tbY0i65tv3Fagm5j2aUySeRN3nDsu4ZdxxvWJ6nZw== X-Gm-Gg: AY/fxX7Q1PayI/3xp7dq909vpMhmXBqA6QzLfLYHPlvGokX484stxtZMZcGg9K+LaHw nGKaUN7K8dT7yPYCNU5T8+BLfkD1G4iY7yIZUzhKKhi66rvFBT3M9cX+gOSMUOZSGOEmturbWXN 4PdV7ZdaLEC0jLpjbiqtWaLbPh9vzX5eUbLxlA/CDypM2kGuGrJUXlJcwpvyIDngzL/xIUaGeI1 JE4798AkCl1V/PFCRKRivx6tLb2kK/TBImmOQrv3HDY+eT04m7wANwgE1GZs2l3UU0crKfDz9Cp lbRRwEbwFV8U27tmVoY+M8oX4e8QoftTl3EpPbsA4HnSrT0AFQUFuFFlq8nD+3+NceUQYlnY7d9 +q/qhogwgjIblVwRnZ1jCGIRT8O7w1sz9zepWn+A5d/mtSA/nB4f6yfetq4tu1uOyLMPdfFaq0c 0L8/osd2NKV5gCbqu/pFxfZpoMZaJQLgmmhkwSBrgVgw6gqmSfN4fM+0RXIiQonIj8 X-Google-Smtp-Source: AGHT+IHkrrJjP4SqkLTB3N9Q9NR+A8WhQu2Nw1SmwhMvzRKlTiMI7DXKDwEtpz3SCfqBVyG07RM/9A== X-Received: by 2002:a05:6402:274d:b0:64d:23ac:6ca7 with SMTP id 4fb4d7f45d1cf-6507967c2fdmr607422a12.20.1767645835067; Mon, 05 Jan 2026 12:43:55 -0800 (PST) Received: from google.com (14.59.147.34.bc.googleusercontent.com. [34.147.59.14]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-6507bf65c07sm332578a12.23.2026.01.05.12.43.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Jan 2026 12:43:53 -0800 (PST) Date: Mon, 5 Jan 2026 20:43:50 +0000 From: Matt Bobrowski To: Yonghong Song Cc: Alan Maguire , Arnaldo Carvalho de Melo , dwarves@vger.kernel.org, bpf@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , KP Singh , Stanislav Fomichev , John Fastabend , Jiri Olsa Subject: Re: [PATCH dwarves] btf_encoder: prefer strong function definitions for BTF generation Message-ID: References: <20251231085322.3248063-1-mattbobrowski@google.com> <926aca4a-d7d5-4e7d-9096-77b27374c5cd@linux.dev> <6b0968a3-406b-412f-acbb-c00ac2ad7c93@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <6b0968a3-406b-412f-acbb-c00ac2ad7c93@linux.dev> On Mon, Jan 05, 2026 at 08:23:29AM -0800, Yonghong Song wrote: > > > On 1/5/26 3:47 AM, Matt Bobrowski wrote: > > On Mon, Jan 05, 2026 at 08:27:11AM +0000, Matt Bobrowski wrote: > > > On Fri, Jan 02, 2026 at 10:46:00AM -0800, Yonghong Song wrote: > > > > > > > > On 12/31/25 12:53 AM, Matt Bobrowski wrote: > > > > > Currently, when a function has both a weak and a strong definition > > > > > across different compilation units (CUs), the BTF encoder arbitrarily > > > > > selects one to generate the BTF entry. This selection fundamentally is > > > > > dependent on the order in which pahole processes the CUs. > > > > > > > > > > This indifference often leads to a mismatch where the generated BTF > > > > > reflects the weak definition's prototype, even though the linker > > > > > selected the strong definition for the final vmlinux binary. > > > > > > > > > > A notable example described in [0] involving function > > > > > bpf_lsm_mmap_file(). Both weak and strong definitions exist, > > > > > distinguished only by parameter names (e.g., file vs > > > > > file__nullable). While the strong definition is linked into the > > > > > vmlinux object, the generated BTF contained the prototype for the weak > > > > > definition. This causes issues for BPF verifier (e.g., __nullable > > > > > annotation semantics), or tools relying on accurate type information. > > > > > > > > > > To fix this, ensure the BTF encoder selects the function definition > > > > > corresponding to the actual code linked into the binary. This is > > > > > achieved by comparing the DWARF function address (DW_AT_low_pc) with > > > > > the ELF symbol address (st_value). Only the DWARF entry for the strong > > > > > definition will match the final resolved ELF symbol address. > > > > > > > > > > [0] https://lore.kernel.org/all/aVJY9H-e83T7ivT4@google.com/ > > > > > > > > > > Link: https://lore.kernel.org/all/aVJY9H-e83T7ivT4@google.com/ > > > > > Signed-off-by: Matt Bobrowski > > > > LGTM with some nits below. > > > Thanks for the review. > > > > > > > Acked-by: Yonghong Song > > > > > > > > > --- > > > > > btf_encoder.c | 36 ++++++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 36 insertions(+) > > > > > > > > > > diff --git a/btf_encoder.c b/btf_encoder.c > > > > > index b37ee7f..0462094 100644 > > > > > --- a/btf_encoder.c > > > > > +++ b/btf_encoder.c > > > > > @@ -79,6 +79,7 @@ struct btf_encoder_func_annot { > > > > > /* state used to do later encoding of saved functions */ > > > > > struct btf_encoder_func_state { > > > > > + uint64_t addr; > > > > > struct elf_function *elf; > > > > > uint32_t type_id_off; > > > > > uint16_t nr_parms; > > > > > @@ -1258,6 +1259,7 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi > > > > > if (!state) > > > > > return -ENOMEM; > > > > > + state->addr = function__addr(fn); > > > > > state->elf = func; > > > > > state->nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0); > > > > > state->ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type; > > > > > @@ -1477,6 +1479,29 @@ static void btf_encoder__delete_saved_funcs(struct btf_encoder *encoder) > > > > > encoder->func_states.cap = 0; > > > > > } > > > > > +static struct btf_encoder_func_state *btf_encoder__select_canonical_state(struct btf_encoder_func_state *combined_states, > > > > > + int combined_cnt) > > > > > +{ > > > > > + int i, j; > > > > > + > > > > > + /* > > > > > + * The same elf_function is shared amongst combined functions, > > > > > + * as per saved_functions_combine(). > > > > > + */ > > > > > + struct elf_function *elf = combined_states[0].elf; > > > > The logic is okay. But can we limit elf->sym_cnt to be 1 here? > > > > This will match the case where two functions (weak and strong) > > > > co-exist in compiler and eventually only strong/global function > > > > will survive. > > > In fact, checking again I believe that the loop is redundant because > > > elf_function__has_ambiguous_address() ensures that if we reach this > > > point, all symbols for the function share the same address. Therefore, > > > checking the first symbol (elf->syms[0]) should be sufficient and > > > equivalent to checking all of them. > > > > > > Will send through a v2 with this amendment. > > Hm, actually, no. I don't think the addresses stored within > > elf->syms[#].addr should all be assumed to be the same at the point > > which the new btf_encoder__select_canonical_state() function is called > > (due to things like skip_encoding_inconsistent_proto possibly taking > > effect). Therefore, I think it's best that we leave things as is and > > exhaustively iterate through all elf->syms? I don't believe there's > > any adverse effects in doing it this way anyway? > > No. Your code is correct. For elf->sym_cnt, it covers both sym_cnt > is 1 or more than 1. My previous suggestion is to single out the > sym_cnt = 1 case since it is what you try to fix. > > I am okay with the current implementation since it is correct. > Maybe Alan and Arnaldo have additional comments about the code. Sure, sounds good. I think leaving it as is probably our best bet at this point. > > > > > + > > > > > + for (i = 0; i < combined_cnt; i++) { > > > > > + struct btf_encoder_func_state *state = &combined_states[i]; > > > > > + > > > > > + for (j = 0; j < elf->sym_cnt; j++) { > > > > > + if (state->addr == elf->syms[j].addr) > > > > > + return state; > > > > > + } > > > > > + } > > > > > + > > > > > + return &combined_states[0]; > > > > > +} > > > > > + > > > > > static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder, bool skip_encoding_inconsistent_proto) > > > > > { > > > > > struct btf_encoder_func_state *saved_fns = encoder->func_states.array; > > > > > @@ -1517,6 +1542,17 @@ static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder, bool skip_e > > > > > 0, 0); > > > > > if (add_to_btf) { > > > > > + /* > > > > > + * We're to add the current function within > > > > > + * BTF. Although, from all functions that have > > > > > + * possibly been combined via > > > > > + * saved_functions_combine(), ensure to only > > > > > + * select and emit BTF for the most canonical > > > > > + * function definition. > > > > > + */ > > > > > + if (j - i > 1) > > > > > + state = btf_encoder__select_canonical_state(state, j - i); > > > > > + > > > > > if (is_kfunc_state(state)) > > > > > err = btf_encoder__add_bpf_kfunc(encoder, state); > > > > > else >