From: Potnuri Bharat Teja <bharat@chelsio.com>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN)
Date: Thu, 20 Nov 2025 23:35:30 +0530
Message-ID: <aR9YasvOhnSI564i@chelsio.com>
In-Reply-To: <b8e8b560-bce5-414b-846d-0da6d22a9983@oracle.com>

On Thursday, November 20, 2025 at 23:23:39 +0530, Alan Maguire wrote:
> On 20/11/2025 14:20, Alan Maguire wrote:
> > On 18/11/2025 16:47, Bart Van Assche wrote:
> >> On 11/18/25 4:07 AM, Alan Maguire wrote:
> >>> hi Bart, thanks for the report! Not a known issue to me at least; I tried
> >>> to reproduce it with pahole v1.31 + gcc 12 and no luck. Would you mind
> >>> sharing a few additional details:
> >>>
> >>> - compiler version
> >>> - pahole version
> >>> - full .config
> >>
> >> Hi Alan,
> >>
> >> My answers to your questions are as follows:
> >> * Compiler version: gcc version 14.2.0 (Debian 14.2.0-19+build5)
> >> * pahole version: v1.30
> >> * Kernel config: has been attached to this email.
> >>
> > 
> > thanks Bart! I've reproduced this now with gcc-14.2.1 + pahole 1.30 and
> > it is also observed with latest pahole 1.31. Investigating now, but if
> > you want to work around it in the short term, disabling CONFIG_WERROR
> > should allow resolve_btfids to proceed even where duplicate types are
> > present. Hopefully we will have a root cause/fix shortly though. Thanks
> > again for the report!
> >
> 
> [adding cxgb4 maintainer, for reasons that will become clearer below.
> Context here is that Bart is seeing kernel builds fail at the
> resolve_btfids stage; resolve_btfids finds that the BPF Type Format
> (BTF) representation of the kernel contains duplicate entries for key
> data structures like task_struct]
>
> After adding some debug-only messaging to btf__dedup() in libbpf (which
> I will send as a patch as it makes debugging these situations much
> easier) I saw:
> 
> libbpf: struct 'task_struct' (size 2560 vlen 194) appears equivalent but
> differs for 23-indexed cand/canon member 'sched_class'/'sched_class': 0
> 
> Examining sched_class we see:
> 
> [107] STRUCT 'task_struct' size=2560 vlen=194
> 	...
>         'sched_class' type_id=480 bits_offset=5440
> 	...
> 
> [479] CONST '(anon)' type_id=8624
> [480] PTR '(anon)' type_id=479
> 
> [8624] STRUCT 'sched_class' size=216 vlen=27
>         'enqueue_task' type_id=8844 bits_offset=0
>         'dequeue_task' type_id=8846 bits_offset=64
>         'yield_task' type_id=8823 bits_offset=128
>         'yield_to_task' type_id=8848 bits_offset=192
>         'wakeup_preempt' type_id=8844 bits_offset=256
>         'balance' type_id=8851 bits_offset=320
>         'pick_task' type_id=8853 bits_offset=384
>         'pick_next_task' type_id=8855 bits_offset=448
>         'put_prev_task' type_id=8857 bits_offset=512
>         'set_next_task' type_id=8859 bits_offset=576
>         'select_task_rq' type_id=8861 bits_offset=640
>         'migrate_task_rq' type_id=8863 bits_offset=704
>         'task_woken' type_id=8865 bits_offset=768
>         'set_cpus_allowed' type_id=8868 bits_offset=832
>         'rq_online' type_id=8823 bits_offset=896
>         'rq_offline' type_id=8823 bits_offset=960
>         'find_lock_rq' type_id=8870 bits_offset=1024
>         'task_tick' type_id=8844 bits_offset=1088
>         'task_fork' type_id=236 bits_offset=1152
>         'task_dead' type_id=236 bits_offset=1216
>         'switching_to' type_id=8865 bits_offset=1280
>         'switched_from' type_id=8865 bits_offset=1344
>         'switched_to' type_id=8865 bits_offset=1408
>         'reweight_task' type_id=8873 bits_offset=1472
>         'prio_changed' type_id=8844 bits_offset=1536
>         'get_rr_interval' type_id=8875 bits_offset=1600
>         'update_curr' type_id=8823 bits_offset=1664
> 
> 
> Now looking at the first duplicate:
> 
> [36354] STRUCT 'task_struct' size=2560 vlen=194
> 	...
>         'sched_class' type_id=36389 bits_offset=5440
> 	...
> 
> 
> [36387] STRUCT 'sched_class' size=64 vlen=6
>         'state' type_id=28 bits_offset=0
>         'idx' type_id=28 bits_offset=8
>         'info' type_id=38195 bits_offset=32
>         'bind_type' type_id=38228 bits_offset=256
>         'entry_list' type_id=90 bits_offset=320
>         'refcnt' type_id=84 bits_offset=448
> [36388] CONST '(anon)' type_id=36387
> [36389] PTR '(anon)' type_id=36388
> 
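
To illustrate how such duplicates can be spotted in a dump, here is a
minimal sketch that counts STRUCT records with a given name. It assumes
the text format shown in the dumps above (one "[id] STRUCT 'name'
size=N vlen=N" header per record, as printed for example by
"bpftool btf dump file <path>"). count_struct_defs() is a hypothetical
helper written for this example, not a libbpf or pahole function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Count how many STRUCT records named `name` appear in a BTF text dump.
 * Assumes one record header per line, in the form:
 *   [<id>] STRUCT '<name>' size=<n> vlen=<n>
 * Member lines and other kinds (PTR, CONST, ...) do not match and are
 * skipped.
 */
static int count_struct_defs(const char *dump, const char *name)
{
	int count = 0;
	const char *line = dump;

	while (line && *line) {
		char found[128];
		unsigned int id, size, vlen;

		/* All four fields must parse for a STRUCT header line. */
		if (sscanf(line, " [%u] STRUCT '%127[^']' size=%u vlen=%u",
			   &id, found, &size, &vlen) == 4 &&
		    strcmp(found, name) == 0)
			count++;

		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return count;
}
```

A result greater than 1 for a name such as sched_class means the dump
holds more than one definition under that name, which is the situation
described here.
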
> 
> sched_class looks totally different! The reason is cxgb4 declares its
> own sched_class while also #include'ing task_struct-related headers.
> Bart's config exposed this because he had CONFIG_CHELSIO_T4=y (I had 'm'
> in my config).
> 
> If we look at drivers/net/ethernet/chelsio/cxgb4/sched.h we indeed see:
> 
> struct sched_class {
>         u8 state;
>         u8 idx;
>         struct ch_sched_params info;
>         enum sched_bind_type bind_type;
>         struct list_head entry_list;
>         atomic_t refcnt;
> };
> 
> ...and cxgb4_main.c has both #include <linux/sched.h> and
> #include <sched.h>, the latter bringing in the clashing sched_class.
> Running "pahole cxgb4.ko" shows the same layout in DWARF, so the BTF
> encoding is faithfully reflecting the underlying DWARF representation
> rather than introducing the problem. This will make life confusing for
> debuggers too.
> 
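
The mechanism can be sketched in a few lines of userspace C. This is an
illustration only, under the assumption that the core header merely
forward-declares struct sched_class (which would explain why the
compiler accepts the driver's definition instead of reporting a
redefinition). task_struct_demo and pointee_size() are hypothetical
names, and the driver struct is abbreviated.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the core header: a forward declaration is enough to
 * embed a pointer, so the struct is still incomplete at this point.
 */
struct sched_class;

struct task_struct_demo {
	const struct sched_class *sched_class;
};

/* Stand-in for the driver's sched.h: same tag, unrelated layout. */
struct sched_class {
	uint8_t state;
	uint8_t idx;
	int refcnt;
};

/*
 * In this translation unit the task's pointer now resolves to the
 * driver struct, so sizeof reports the driver layout's size.
 */
static size_t pointee_size(const struct task_struct_demo *t)
{
	return sizeof(*t->sched_class);	/* operand is not evaluated */
}
```

In a translation unit like this one, the debug info for
task_struct_demo would point at the driver's layout, matching the
second task_struct/sched_class pair in the dump above.
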
> So although it is a bit of a pain, I would suggest the simplest approach
> is to perhaps look at renaming sched_class to be a bit more
> domain-specific - ch_sched_class perhaps? That way it will not clash
> with task_struct's sched_class.
> 
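
As a rough sketch of the effect of such a rename (not the actual cxgb4
patch): once the driver type has its own tag, both definitions can be
visible in one translation unit without ambiguity. The name
ch_sched_class is taken from the suggestion above; the core struct is a
heavily abbreviated stand-in, and task_struct_demo, rq_demo and
task_pointee_size() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Abbreviated stand-in for the core scheduler's struct sched_class. */
struct rq_demo;
struct sched_class {
	void (*yield_task)(struct rq_demo *rq);
};

struct task_struct_demo {
	const struct sched_class *sched_class;
};

/* Driver struct under the suggested name; remaining fields omitted. */
struct ch_sched_class {
	uint8_t state;
	uint8_t idx;
	int refcnt;
};

/* With distinct tags, the task pointer can only mean the core type. */
static size_t task_pointee_size(const struct task_struct_demo *t)
{
	return sizeof(*t->sched_class);	/* operand is not evaluated */
}
```
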
> I can send a patch but it would be great to get cxgb4 maintainers' take
> here first.

Thanks for adding me, and for the detailed debugging, Alan and Bart.
I will try this and let you know.

> 
> Thanks!
> 
> Alan

Thread overview: 10+ messages
2025-11-17 20:40 Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_KCSAN are enabled Bart Van Assche
2025-11-18 12:07 ` Alan Maguire
2025-11-18 16:47   ` Bart Van Assche
2025-11-20 14:20     ` Alan Maguire
2025-11-20 17:53       ` Kernel build fails if both CONFIG_DEBUG_INFO_BTF and CONFIG_CHELSIO_T4=y (was CONFIG_KCSAN) Alan Maguire
2025-11-20 18:05         ` Potnuri Bharat Teja [this message]
2025-11-20 22:18           ` Alan Maguire
2025-11-21 17:22             ` Bart Van Assche
2025-11-21 18:15               ` Alan Maguire
2025-11-24 11:48                 ` Potnuri Bharat Teja
