From: Potnuri Bharat Teja <bharat@chelsio.com>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN)
Date: Thu, 20 Nov 2025 23:35:30 +0530
Message-ID: <aR9YasvOhnSI564i@chelsio.com>
In-Reply-To: <b8e8b560-bce5-414b-846d-0da6d22a9983@oracle.com>

On Thursday, November 20, 2025 at 23:23:39 +0530, Alan Maguire wrote:
> On 20/11/2025 14:20, Alan Maguire wrote:
> > On 18/11/2025 16:47, Bart Van Assche wrote:
> >> On 11/18/25 4:07 AM, Alan Maguire wrote:
> >>> hi Bart, thanks for the report! Not a known issue to me at least; I tried
> >>> to reproduce it with pahole v1.31 + gcc 12 and no luck. Would you mind
> >>> sharing a few additional details:
> >>>
> >>> - compiler version
> >>> - pahole version
> >>> - full .config
> >>
> >> Hi Alan,
> >>
> >> My answers to your questions are as follows:
> >> * Compiler version: gcc version 14.2.0 (Debian 14.2.0-19+build5)
> >> * pahole version: v1.30
> >> * Kernel config: has been attached to this email.
> >>
> > 
> > thanks Bart! I've reproduced this now with gcc-14.2.1 + pahole 1.30 and
> > it is also observed with latest pahole 1.31. Investigating now, but if
> > you want to work around it in the short term, disabling CONFIG_WERROR
> > should allow resolve_btfids to proceed even where duplicate types are
> > present. Hopefully we will have a root cause/fix shortly though. Thanks
> > again for the report!
> >
> 
> [adding cxgb4 maintainer, for reasons that will become clearer below.
> Context here is that Bart is seeing kernel builds fail at the
> resolve_btfids stage; resolve_btfids is finding the BPF Type Format
> representation of core kernel data structures has duplicate entries for
> key kernel data structures like task_struct]
>
> After adding some debug-only messaging to btf__dedup() in libbpf (which
> I will send as a patch as it makes debugging these situations much
> easier) I saw:
> 
> libbpf: struct 'task_struct' (size 2560 vlen 194) appears equivalent but
> differs for 23-indexed cand/canon member 'sched_class'/'sched_class': 0
> 
> Examining sched_class we see:
> 
> [107] STRUCT 'task_struct' size=2560 vlen=194
> 	...
>         'sched_class' type_id=480 bits_offset=5440
> 	...
> 
> [479] CONST '(anon)' type_id=8624
> [480] PTR '(anon)' type_id=479
> 
> [8624] STRUCT 'sched_class' size=216 vlen=27
>         'enqueue_task' type_id=8844 bits_offset=0
>         'dequeue_task' type_id=8846 bits_offset=64
>         'yield_task' type_id=8823 bits_offset=128
>         'yield_to_task' type_id=8848 bits_offset=192
>         'wakeup_preempt' type_id=8844 bits_offset=256
>         'balance' type_id=8851 bits_offset=320
>         'pick_task' type_id=8853 bits_offset=384
>         'pick_next_task' type_id=8855 bits_offset=448
>         'put_prev_task' type_id=8857 bits_offset=512
>         'set_next_task' type_id=8859 bits_offset=576
>         'select_task_rq' type_id=8861 bits_offset=640
>         'migrate_task_rq' type_id=8863 bits_offset=704
>         'task_woken' type_id=8865 bits_offset=768
>         'set_cpus_allowed' type_id=8868 bits_offset=832
>         'rq_online' type_id=8823 bits_offset=896
>         'rq_offline' type_id=8823 bits_offset=960
>         'find_lock_rq' type_id=8870 bits_offset=1024
>         'task_tick' type_id=8844 bits_offset=1088
>         'task_fork' type_id=236 bits_offset=1152
>         'task_dead' type_id=236 bits_offset=1216
>         'switching_to' type_id=8865 bits_offset=1280
>         'switched_from' type_id=8865 bits_offset=1344
>         'switched_to' type_id=8865 bits_offset=1408
>         'reweight_task' type_id=8873 bits_offset=1472
>         'prio_changed' type_id=8844 bits_offset=1536
>         'get_rr_interval' type_id=8875 bits_offset=1600
>         'update_curr' type_id=8823 bits_offset=1664
> 
> 
> Now looking at the first duplicate:
> 
> [36354] STRUCT 'task_struct' size=2560 vlen=194
> 	...
>         'sched_class' type_id=36389 bits_offset=5440
> 	...
> 
> 
> [36387] STRUCT 'sched_class' size=64 vlen=6
>         'state' type_id=28 bits_offset=0
>         'idx' type_id=28 bits_offset=8
>         'info' type_id=38195 bits_offset=32
>         'bind_type' type_id=38228 bits_offset=256
>         'entry_list' type_id=90 bits_offset=320
>         'refcnt' type_id=84 bits_offset=448
> [36388] CONST '(anon)' type_id=36387
> [36389] PTR '(anon)' type_id=36388
> 
> 
> sched_class looks totally different! The reason is cxgb4 declares its
> own sched_class while also #include'ing task_struct-related headers.
> Bart's config exposed this because he had CONFIG_CHELSIO_T4=y (I had 'm'
> in my config).
> 
> If we look at drivers/net/ethernet/chelsio/cxgb4/sched.h we indeed see:
> 
> struct sched_class {
>         u8 state;
>         u8 idx;
>         struct ch_sched_params info;
>         enum sched_bind_type bind_type;
>         struct list_head entry_list;
>         atomic_t refcnt;
> };
> 
> ..and cxgb4_main.c has #include <linux/sched.h> and #include <sched.h>
> with the clashing sched_class. Using pahole we can establish that the
> BTF encoding simply reflects the underlying DWARF representation
> ("pahole cxgb4.ko" shows this), so the duplicate is not introduced by
> BTF generation. This will make life confusing for debuggers too.
> 
> So although it is a bit of a pain, I would suggest the simplest approach
> is to perhaps look at renaming sched_class to be a bit more
> domain-specific - ch_sched_class perhaps? That way it will not clash
> with task_struct's sched_class.
> 
> I can send a patch but it would be great to get cxgb4 maintainers' take
> here first.
Thanks for adding me and the detailed debug, Alan and Bart.
I will try this and let you know.
> 
> Thanks!
> 
> Alan

Thread overview: 10+ messages
2025-11-17 20:40 Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_KCSAN are enabled Bart Van Assche
2025-11-18 12:07 ` Alan Maguire
2025-11-18 16:47   ` Bart Van Assche
2025-11-20 14:20     ` Alan Maguire
2025-11-20 17:53       ` Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN) Alan Maguire
2025-11-20 18:05         ` Potnuri Bharat Teja [this message]
2025-11-20 22:18           ` Alan Maguire
2025-11-21 17:22             ` Bart Van Assche
2025-11-21 18:15               ` Alan Maguire
2025-11-24 11:48                 ` Potnuri Bharat Teja
