From: Potnuri Bharat Teja <bharat@chelsio.com>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN)
Date: Thu, 20 Nov 2025 23:35:30 +0530
Message-ID: <aR9YasvOhnSI564i@chelsio.com>
In-Reply-To: <b8e8b560-bce5-414b-846d-0da6d22a9983@oracle.com>

On Thursday, November 20, 2025 at 23:23:39 +0530, Alan Maguire wrote:
> On 20/11/2025 14:20, Alan Maguire wrote:
> > On 18/11/2025 16:47, Bart Van Assche wrote:
> >> On 11/18/25 4:07 AM, Alan Maguire wrote:
> >>> hi Bart, thanks for the report! Not a known issue to me at least; I tried
> >>> to reproduce it with pahole v1.31 + gcc 12 and no luck. Would you mind
> >>> sharing a few additional details:
> >>>
> >>> - compiler version
> >>> - pahole version
> >>> - full .config
> >>
> >> Hi Alan,
> >>
> >> My answers to your questions are as follows:
> >> * Compiler version: gcc version 14.2.0 (Debian 14.2.0-19+build5)
> >> * pahole version: v1.30
> >> * Kernel config: has been attached to this email.
> >>
> >
> > thanks Bart! I've reproduced this now with gcc-14.2.1 + pahole 1.30 and
> > it is also observed with latest pahole 1.31. Investigating now, but if
> > you want to work around it in the short term, disabling CONFIG_WERROR
> > should allow resolve_btfids to proceed even where duplicate types are
> > present. Hopefully we will have a root cause/fix shortly though. Thanks
> > again for the report!
> >
>
> [adding cxgb4 maintainer, for reasons that will become clearer below.
> Context here is that Bart is seeing kernel builds fail at the
> resolve_btfids stage; resolve_btfids is finding the BPF Type Format
> representation of core kernel data structures has duplicate entries for
> key kernel data structures like task_struct]
>
> After adding some debug-only messaging to btf__dedup() in libbpf (which
> I will send as a patch as it makes debugging these situations much
> easier) I saw:
>
> libbpf: struct 'task_struct' (size 2560 vlen 194) appears equivalent but
> differs for 23-indexed cand/canon member 'sched_class'/'sched_class': 0
>
> Examining sched_class we see:
>
> [107] STRUCT 'task_struct' size=2560 vlen=194
> ...
>         'sched_class' type_id=480 bits_offset=5440
> ...
>
> [479] CONST '(anon)' type_id=8624
> [480] PTR '(anon)' type_id=479
>
> [8624] STRUCT 'sched_class' size=216 vlen=27
>         'enqueue_task' type_id=8844 bits_offset=0
>         'dequeue_task' type_id=8846 bits_offset=64
>         'yield_task' type_id=8823 bits_offset=128
>         'yield_to_task' type_id=8848 bits_offset=192
>         'wakeup_preempt' type_id=8844 bits_offset=256
>         'balance' type_id=8851 bits_offset=320
>         'pick_task' type_id=8853 bits_offset=384
>         'pick_next_task' type_id=8855 bits_offset=448
>         'put_prev_task' type_id=8857 bits_offset=512
>         'set_next_task' type_id=8859 bits_offset=576
>         'select_task_rq' type_id=8861 bits_offset=640
>         'migrate_task_rq' type_id=8863 bits_offset=704
>         'task_woken' type_id=8865 bits_offset=768
>         'set_cpus_allowed' type_id=8868 bits_offset=832
>         'rq_online' type_id=8823 bits_offset=896
>         'rq_offline' type_id=8823 bits_offset=960
>         'find_lock_rq' type_id=8870 bits_offset=1024
>         'task_tick' type_id=8844 bits_offset=1088
>         'task_fork' type_id=236 bits_offset=1152
>         'task_dead' type_id=236 bits_offset=1216
>         'switching_to' type_id=8865 bits_offset=1280
>         'switched_from' type_id=8865 bits_offset=1344
>         'switched_to' type_id=8865 bits_offset=1408
>         'reweight_task' type_id=8873 bits_offset=1472
>         'prio_changed' type_id=8844 bits_offset=1536
>         'get_rr_interval' type_id=8875 bits_offset=1600
>         'update_curr' type_id=8823 bits_offset=1664
>
>
> Now looking at the first duplicate:
>
> [36354] STRUCT 'task_struct' size=2560 vlen=194
> ...
>         'sched_class' type_id=36389 bits_offset=5440
> ...
>
>
> [36387] STRUCT 'sched_class' size=64 vlen=6
>         'state' type_id=28 bits_offset=0
>         'idx' type_id=28 bits_offset=8
>         'info' type_id=38195 bits_offset=32
>         'bind_type' type_id=38228 bits_offset=256
>         'entry_list' type_id=90 bits_offset=320
>         'refcnt' type_id=84 bits_offset=448
> [36388] CONST '(anon)' type_id=36387
> [36389] PTR '(anon)' type_id=36388
>
>
> sched_class looks totally different! The reason is cxgb4 declares its
> own sched_class while also #include'ing task_struct-related headers.
> Bart's config exposed this because he had CONFIG_CHELSIO_T4=y (I had 'm'
> in my config).
>
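
[Editorial sketch] The two dumps above can be followed by hand: in each copy of task_struct the 'sched_class' member points through PTR and CONST to a STRUCT, and the two STRUCTs differ in size and member count. A minimal toy model of that walk is below. It is not libbpf's btf__dedup() logic and uses no libbpf API; the toy_* names and the table layout are hypothetical, and only the type IDs, sizes and vlen values are taken from the dumps.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the BTF entries quoted above (not a libbpf data structure). */
enum toy_kind { TOY_PTR, TOY_CONST, TOY_STRUCT };

struct toy_type {
	unsigned int id;
	enum toy_kind kind;
	const char *name;
	unsigned int ref;   /* referenced type id for PTR/CONST, 0 for STRUCT */
	unsigned int size;  /* STRUCT only */
	unsigned int vlen;  /* STRUCT only: number of members */
};

static const struct toy_type toy_types[] = {
	{ 479,   TOY_CONST,  "(anon)",      8624,  0,   0  },
	{ 480,   TOY_PTR,    "(anon)",      479,   0,   0  },
	{ 8624,  TOY_STRUCT, "sched_class", 0,     216, 27 },
	{ 36387, TOY_STRUCT, "sched_class", 0,     64,  6  },
	{ 36388, TOY_CONST,  "(anon)",      36387, 0,   0  },
	{ 36389, TOY_PTR,    "(anon)",      36388, 0,   0  },
};

/* Linear lookup by type id; returns NULL if the id is not in the table. */
static const struct toy_type *toy_lookup(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(toy_types) / sizeof(toy_types[0]); i++)
		if (toy_types[i].id == id)
			return &toy_types[i];
	return NULL;
}

/* Follow PTR/CONST links until a STRUCT is reached (bounded hop count). */
static const struct toy_type *toy_resolve(unsigned int id)
{
	const struct toy_type *t = toy_lookup(id);
	int hops = 0;

	while (t && t->kind != TOY_STRUCT && hops++ < 8)
		t = toy_lookup(t->ref);
	return (t && t->kind == TOY_STRUCT) ? t : NULL;
}

/* Shallow comparison: same name, size and member count. */
static int toy_same_struct(unsigned int a, unsigned int b)
{
	const struct toy_type *x = toy_resolve(a);
	const struct toy_type *y = toy_resolve(b);

	return x && y && strcmp(x->name, y->name) == 0 &&
	       x->size == y->size && x->vlen == y->vlen;
}
```

With this table, toy_resolve(480) lands on type 8624 (size 216, 27 members) and toy_resolve(36389) lands on type 36387 (size 64, 6 members), so toy_same_struct(480, 36389) reports a mismatch even though both structs are named 'sched_class'.
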
> If we look at drivers/net/ethernet/chelsio/cxgb4/sched.h we indeed see:
>
> struct sched_class {
>         u8 state;
>         u8 idx;
>         struct ch_sched_params info;
>         enum sched_bind_type bind_type;
>         struct list_head entry_list;
>         atomic_t refcnt;
> };
>
> ..and cxgb4_main.c has #include <linux/sched.h> and #include <sched.h>
> with the clashing sched_class. Running "pahole cxgb4.ko" confirms that
> the BTF encoding simply reflects the underlying DWARF representation,
> so the same clash is present in DWARF. This will make life confusing
> for debuggers too.
>
> So although it is a bit of a pain, I would suggest the simplest approach
> is to perhaps look at renaming sched_class to be a bit more
> domain-specific - ch_sched_class perhaps? That way it will not clash
> with task_struct's sched_class.
>
> I can send a patch but it would be great to get cxgb4 maintainers' take
> here first.

Thanks for adding me and for the detailed debugging, Alan and Bart.
I will try this and let you know.

>
> Thanks!
>
> Alan
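
[Editorial sketch] The rename suggested above would only change the struct tag, not its layout. A minimal user-space sketch under stated assumptions: an LP64 target, stand-in definitions for the kernel types (atomic_t and list_head are modeled on their usual kernel shapes, while struct ch_sched_params is an opaque 28-byte placeholder chosen to be consistent with the dump, and the enumerator name is hypothetical). The name ch_sched_class is the one floated in the thread, not a merged change. The compile-time checks confirm that the renamed struct reproduces the size and bits_offset values from the BTF dump of the cxgb4 struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kernel types (assumptions for a user-space LP64 build). */
typedef uint8_t u8;
typedef struct { int counter; } atomic_t;
struct list_head { struct list_head *next, *prev; };
struct ch_sched_params { uint32_t opaque[7]; };   /* placeholder: 28 bytes */
enum sched_bind_type { SCHED_BIND_STANDIN };      /* hypothetical enumerator */

/* Same members as the cxgb4 struct quoted above, with a driver-specific tag. */
struct ch_sched_class {
	u8 state;
	u8 idx;
	struct ch_sched_params info;
	enum sched_bind_type bind_type;
	struct list_head entry_list;
	atomic_t refcnt;
};

/* BTF's bits_offset is the member's byte offset multiplied by 8. */
#define BITS_OFFSET(type, member) (offsetof(type, member) * 8)

_Static_assert(sizeof(struct ch_sched_class) == 64, "size=64");
_Static_assert(BITS_OFFSET(struct ch_sched_class, state) == 0, "state");
_Static_assert(BITS_OFFSET(struct ch_sched_class, idx) == 8, "idx");
_Static_assert(BITS_OFFSET(struct ch_sched_class, info) == 32, "info");
_Static_assert(BITS_OFFSET(struct ch_sched_class, bind_type) == 256, "bind_type");
_Static_assert(BITS_OFFSET(struct ch_sched_class, entry_list) == 320, "entry_list");
_Static_assert(BITS_OFFSET(struct ch_sched_class, refcnt) == 448, "refcnt");
```

Because the tag no longer matches the scheduler's struct sched_class, a translation unit that includes both headers would see two distinct, differently named types rather than two definitions sharing one name.
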

Thread overview: 10+ messages
2025-11-17 20:40 Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_KCSAN are enabled Bart Van Assche
2025-11-18 12:07 ` Alan Maguire
2025-11-18 16:47 ` Bart Van Assche
2025-11-20 14:20 ` Alan Maguire
2025-11-20 17:53 ` Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN) Alan Maguire
2025-11-20 18:05 ` Potnuri Bharat Teja [this message]
2025-11-20 22:18 ` Alan Maguire
2025-11-21 17:22 ` Bart Van Assche
2025-11-21 18:15 ` Alan Maguire
2025-11-24 11:48 ` Potnuri Bharat Teja