LinuxPPC-Dev Archive on lore.kernel.org
From: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>,
	Srikar Dronamraju <srikar@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-sched@vger.kernel.org, tim.c.chen@linux.intel.com,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
Date: Tue, 26 May 2026 10:54:25 +0530
Message-ID: <8675afde-5c7f-4717-be7a-a473bd1af381@linux.ibm.com>
In-Reply-To: <a0e3f4f4-03b6-4bb1-881f-06bf5abca1a3@intel.com>


On 26/05/26 9:38 am, Chen, Yu C wrote:
> Hi Venkat,
>
> On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
>> * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
>>
>>> Hi Venkat,
>>>
>>> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
>>>> Greetings!!!
>>>>
>>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).


This issue is seen on P11 as well.

[    0.006697] smp: Brought up 1 node, 16 CPUs
[    0.006702] Big cores detected but using small core scheduling
[    0.006752] BUG: Kernel NULL pointer dereference on read at 0x00000000
[    0.006755] Faulting instruction address: 0xc000000020adbb6c
[    0.006759] Oops: Kernel access of bad area, sig: 7 [#1]
[    0.006762] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[    0.006767] Modules linked in:
[    0.006772] CPU: 4 UID: 0 PID: 1 Comm: swapper/4 Not tainted 7.1.0-rc5-next-20260525 #1 PREEMPT(lazy)
[    0.006777] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[    0.006781] NIP:  c000000020adbb6c LR: c0000000202e5a58 CTR: 0000000000000000
[    0.006784] REGS: c0000000283d7890 TRAP: 0300   Not tainted (7.1.0-rc5-next-20260525)
[    0.006788] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44002242  XER: 20040003
[    0.006796] CFAR: c0000000202e5a54 DAR: 0000000000000000 DSISR: 00080000 IRQMASK: 0
[    0.006796] GPR00: 0000000000000000 c0000000283d7b50 c000000021abf100 0000000000000010
[    0.006796] GPR04: 0000000000000010 0000000000000030 0000000000000000 c000000028365500
[    0.006796] GPR08: 0000000000000000 c000000022213598 000000003b77d000 0000000000000000
[    0.006796] GPR12: c00000002005d8f0 c000000000008000 c0000000283cb578 c0000000283cb400
[    0.006796] GPR16: c0000000283c9000 c000000022218b20 c0000000222330e8 00000000ffffffff
[    0.006796] GPR20: fffffffffffffff6 0000000000000000 c000000022da36e0 0000000000000000
[    0.006796] GPR24: 0000000000000000 0000000000000000 c0000000283c9178 c0000000227b5f00
[    0.006796] GPR28: c00000002831c1e8 c000000022db5980 0000000000000000 0000000000000000
[    0.006835] NIP [c000000020adbb6c] _find_first_bit+0xc/0xc0
[    0.006842] LR [c0000000202e5a58] build_sched_domains+0x7d8/0xb40
[    0.006847] Call Trace:
[    0.006849] [c0000000283d7b50] [c0000000202e5408] build_sched_domains+0x188/0xb40 (unreliable)
[    0.006854] [c0000000283d7c90] [c000000022034380] sched_init_domains+0x118/0x168
[    0.006860] [c0000000283d7ce0] [c000000022032b14] sched_init_smp+0xa8/0x158
[    0.006865] [c0000000283d7d30] [c000000022005674] kernel_init_freeable+0x1ac/0x294
[    0.006870] [c0000000283d7dd0] [c000000020011718] kernel_init+0x2c/0x1c4
[    0.006874] [c0000000283d7e30] [c00000002000debc] ret_from_kernel_user_thread+0x14/0x1c
[    0.006878] ---- interrupt: 0 at 0x0
[    0.006881] Code: eb610038 7fc3f378 eb810040 eba10048 38210060 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c681b78 7c832379 4d820020 <e9280000> 38e3ffff 39400000 78e7d7e2
[    0.006895] ---[ end trace 0000000000000000 ]---
[    0.006898]


Regards,

Venkat.
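
For readers following along: the trace above faults in _find_first_bit with DAR 0, which is consistent with a first-set-bit scan being handed a NULL bitmap pointer. Below is a minimal user-space sketch of that failure mode and of a fallback guard. All toy_* names, the 16-CPU size and the "two CPUs per L2" layout are hypothetical stand-ins chosen for illustration; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 16

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
struct toy_cpumask {
	unsigned long bits;
};

/* Models a per-CPU coregroup map that was never allocated: all NULL. */
static struct toy_cpumask *toy_coregroup_map[TOY_NR_CPUS];

/* Models per-CPU L2 masks; toy_init_l2() fills them in. */
static struct toy_cpumask toy_l2_map[TOY_NR_CPUS];

/* Assumption for this sketch only: CPUs 2k and 2k+1 share an L2. */
static void toy_init_l2(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_l2_map[cpu].bits = 3UL << (cpu & ~1);
}

/*
 * First set bit in the mask, or TOY_NR_CPUS if the mask is NULL or empty.
 * Without the NULL check, mask->bits would read from address 0.
 */
static int toy_first_cpu(const struct toy_cpumask *mask)
{
	if (!mask)
		return TOY_NR_CPUS;
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask->bits & (1UL << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Prefer the coregroup mask; fall back to the L2 mask when it is absent. */
static const struct toy_cpumask *toy_llc_mask(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_coregroup_map[cpu])
		return toy_coregroup_map[cpu];
	return &toy_l2_map[cpu];
}
```

After calling toy_init_l2(), toy_first_cpu(toy_llc_mask(cpu)) yields a valid CPU even though the coregroup map is empty. Whether an L2 fallback is the right fix for the real code is debated in the quoted discussion below.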

>>>
>>> It seems that cpumask_first(llc_mask(i)) is accessing
>>> NULL cpu_coregroup_mask():
>>
>>> has_coregroup_support() is false, thus cpu_coregroup_map
>>> is never allocated in smp_prepare_cpus().
>>> This machine is a "shared system" VM. We should probably
>>> let the LLC id generation fall back to using L2 id if
>>> cpu_coregroup_mask is unavailable (which restores the
>>> behavior before this patch). I'm wondering if the following
>>> change would help (need IBM friends' help on this):
>>
>> Power9 and older systems don't have coregroup.
>> It's not because of shared LPAR; it's true for dedicated LPARs too.
>> Only Power10 and above systems have hemispheres, where we add MC/coregroup
>> support.
>>
>
> OK, thanks for the correction. Are you saying coregroup_enabled is false
> on Power9 and older hardware, and set to true on Power10? Power10 has a
> corresponding device-tree property, which is parsed to enable hemisphere
> support in find_possible_nodes(). This is why has_coregroup_support()
> returns true for Power10.
>
>>>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index 3467f86fd78f..cf6c2e4190ab 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
>>>   }
>>>   #endif
>>>
>>> -struct cpumask *cpu_coregroup_mask(int cpu)
>>> -{
>>> -       return per_cpu(cpu_coregroup_map, cpu);
>>> -}
>>> -
>>>   static bool has_coregroup_support(void)
>>>   {
>>>          /* Coregroup identification not available on shared systems */
>>> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
>>>          return coregroup_enabled;
>>>   }
>>>
>>> +struct cpumask *cpu_coregroup_mask(int cpu)
>>> +{
>>> +       if (!has_coregroup_support())
>>> +               return cpu_l2_cache_mask(cpu);
>>> +
>>> +       return per_cpu(cpu_coregroup_map, cpu);
>>> +}
>>> +
>>
>> While this is a workaround for the problem on Power9,
>> it will hurt Power10 and Power11 systems.
>> As Prateek alluded to, MC is not the LLC on Power.
>
> Could you please elaborate on the cache topology?
> Specifically, could you clarify what the LLC is for Power9
> and Power10 respectively? Is it always the L2 cache?
>
> I have checked the IBM documentation available at:
> https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf 
>
> According to the document, a hemisphere corresponds to a 64MB
> L3 cache shared by 8 cores. Since the MC domain spans a single
> hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
> for the MC domain?
>
>> So by using llc_mask as cpu_coregroup_mask() we run the risk of
>> assuming MC to be similar to LLC. So it will impact Power10/11 systems.
>>
>> In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
>> #define llc_mask(cpu) cpu_coregroup_mask(cpu)
>>
>> Defining llc_mask as cpu_coregroup_mask means MC should be the LLC.
>> This is not true for some architectures, at least not on Power.
>>
>
> OK.
>
>> So shouldn't it be using
>> #define llc_mask(cpu) per_cpu(sd_llc, cpu)
>>
>> This should work for systems where LLC is sub-coregroup, coregroup
>> (or super-coregroup: let's say some archs want LLC at PKG and cluster
>> at coregroup).
>>
>> If we do that, I don't think we even need the else case where we say
>> #define llc_mask(cpu) cpumask_of(cpu)
>>
>
> I suppose you are referring to
> sched_domain_span(per_cpu(sd_llc, cpu)).
>
> Indeed, deriving the LLC from the SD_SHARE_LLC level offers
> better scalability. However, this approach would involve scheduler
> domains, which can be truncated by cpuset partitions - a scenario we
> prefer to avoid.
>
> thanks,
> Chenyu
>
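
The cpuset concern raised above can be sketched in user space. The model below assumes, purely for illustration, that a scheduler-domain span is the hardware cache mask intersected with the CPUs of the enclosing partition, and that an ID is taken from the first CPU of a mask (as in cpumask_first(llc_mask(i)) quoted earlier). The toy_* names and the 32-bit masks are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption for this sketch: a partition confines the domain span to its
 * own CPUs, so the span is the hardware LLC mask ANDed with the partition.
 */
static uint32_t toy_domain_span(uint32_t hw_llc_mask, uint32_t partition_mask)
{
	return hw_llc_mask & partition_mask;
}

/* Index of the lowest set bit; the caller must pass a non-empty mask. */
static int toy_span_first_cpu(uint32_t mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}
```

With a hardware mask covering CPUs 0-7 (0xFF) and a partition holding only CPUs 4-7 (0xF0), the first CPU of the truncated span is 4, while the first CPU of the hardware mask is 0. In this toy model, an ID derived from the span therefore depends on how the cpusets are configured rather than on the cache topology alone.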

