From: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>,
Srikar Dronamraju <srikar@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
Ritesh Harjani <riteshh@linux.ibm.com>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
linux-sched@vger.kernel.org, tim.c.chen@linux.intel.com,
K Prateek Nayak <kprateek.nayak@amd.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
Date: Tue, 26 May 2026 10:54:25 +0530
Message-ID: <8675afde-5c7f-4717-be7a-a473bd1af381@linux.ibm.com>
In-Reply-To: <a0e3f4f4-03b6-4bb1-881f-06bf5abca1a3@intel.com>
On 26/05/26 9:38 am, Chen, Yu C wrote:
> Hi Venkat,
>
> On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
>> * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
>>
>>> Hi Venkat,
>>>
>>> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
>>>> Greetings!!!
>>>>
>>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
This issue is seen on Power11 as well; the log below is from next-20260525:
[ 0.006697] smp: Brought up 1 node, 16 CPUs
[ 0.006702] Big cores detected but using small core scheduling
[ 0.006752] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.006755] Faulting instruction address: 0xc000000020adbb6c
[ 0.006759] Oops: Kernel access of bad area, sig: 7 [#1]
[ 0.006762] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[ 0.006767] Modules linked in:
[ 0.006772] CPU: 4 UID: 0 PID: 1 Comm: swapper/4 Not tainted 7.1.0-rc5-next-20260525 #1 PREEMPT(lazy)
[ 0.006777] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 0.006781] NIP: c000000020adbb6c LR: c0000000202e5a58 CTR: 0000000000000000
[ 0.006784] REGS: c0000000283d7890 TRAP: 0300 Not tainted (7.1.0-rc5-next-20260525)
[ 0.006788] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44002242 XER: 20040003
[ 0.006796] CFAR: c0000000202e5a54 DAR: 0000000000000000 DSISR: 00080000 IRQMASK: 0
[ 0.006796] GPR00: 0000000000000000 c0000000283d7b50 c000000021abf100 0000000000000010
[ 0.006796] GPR04: 0000000000000010 0000000000000030 0000000000000000 c000000028365500
[ 0.006796] GPR08: 0000000000000000 c000000022213598 000000003b77d000 0000000000000000
[ 0.006796] GPR12: c00000002005d8f0 c000000000008000 c0000000283cb578 c0000000283cb400
[ 0.006796] GPR16: c0000000283c9000 c000000022218b20 c0000000222330e8 00000000ffffffff
[ 0.006796] GPR20: fffffffffffffff6 0000000000000000 c000000022da36e0 0000000000000000
[ 0.006796] GPR24: 0000000000000000 0000000000000000 c0000000283c9178 c0000000227b5f00
[ 0.006796] GPR28: c00000002831c1e8 c000000022db5980 0000000000000000 0000000000000000
[ 0.006835] NIP [c000000020adbb6c] _find_first_bit+0xc/0xc0
[ 0.006842] LR [c0000000202e5a58] build_sched_domains+0x7d8/0xb40
[ 0.006847] Call Trace:
[ 0.006849] [c0000000283d7b50] [c0000000202e5408] build_sched_domains+0x188/0xb40 (unreliable)
[ 0.006854] [c0000000283d7c90] [c000000022034380] sched_init_domains+0x118/0x168
[ 0.006860] [c0000000283d7ce0] [c000000022032b14] sched_init_smp+0xa8/0x158
[ 0.006865] [c0000000283d7d30] [c000000022005674] kernel_init_freeable+0x1ac/0x294
[ 0.006870] [c0000000283d7dd0] [c000000020011718] kernel_init+0x2c/0x1c4
[ 0.006874] [c0000000283d7e30] [c00000002000debc] ret_from_kernel_user_thread+0x14/0x1c
[ 0.006878] ---- interrupt: 0 at 0x0
[ 0.006881] Code: eb610038 7fc3f378 eb810040 eba10048 38210060 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c681b78 7c832379 4d820020 <e9280000> 38e3ffff 39400000 78e7d7e2
[ 0.006895] ---[ end trace 0000000000000000 ]---
[ 0.006898]
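
For anyone reading the oops: below is a rough user-space decode of the
words around the faulting instruction on the Code: line, assuming the
standard Power ISA encodings. The helper names are made up for
illustration and are not kernel functions. The marked word <e9280000>
decodes as "ld r9,0(r8)", and r8 was copied from r3 by 7c681b78
("mr r8,r3") two instructions earlier. That matches GPR08 = 0 and
DAR = 0 in the dump, so _find_first_bit() appears to have been handed
a NULL bitmap with a size of 0x10 (GPR03/GPR04).

```c
#include <stdint.h>

/* Power ISA instruction words are 32 bits; the primary opcode is the top 6 bits. */
static unsigned ppc_opcode(uint32_t insn) { return insn >> 26; }
static unsigned ppc_rt(uint32_t insn) { return (insn >> 21) & 0x1f; } /* RT or RS */
static unsigned ppc_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }
static unsigned ppc_rb(uint32_t insn) { return (insn >> 11) & 0x1f; }
static unsigned ppc_xo(uint32_t insn) { return (insn >> 1) & 0x3ff; }

/* DS-form "ld RT,DS(RA)": primary opcode 58 with the low two bits clear. */
static int is_ld(uint32_t insn)
{
	return ppc_opcode(insn) == 58 && (insn & 0x3) == 0;
}

/* "mr RA,RS" is the extended mnemonic for "or RA,RS,RS" (opcode 31, XO 444). */
static int is_mr(uint32_t insn)
{
	return ppc_opcode(insn) == 31 && ppc_xo(insn) == 444 &&
	       ppc_rt(insn) == ppc_rb(insn);
}

/* Signed byte displacement of a DS-form load. */
static int ld_disp(uint32_t insn)
{
	return (int16_t)(insn & 0xfffc);
}
```

Feeding 0x7c681b78, 0x7c832379 and 0xe9280000 through these helpers
gives the register numbers quoted above.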
Regards,
Venkat.
>>>
>>> It seems that cpumask_first(llc_mask(i)) is accessing
>>> NULL cpu_coregroup_mask():
>>
>>> has_coregroup_support() is false, thus cpu_coregroup_map
>>> is never allocated in smp_prepare_cpus().
>>> This machine is a "shared system" VM. We should probably
>>> let the LLC id generation fall back to using L2 id if
>>> cpu_coregroup_mask is unavailable (which restores the
>>> behavior before this patch). I'm wondering if the following
>>> change would help (need IBM friends' help on this):
>>
>> Power9 and older systems don't have coregroup.
>> It's not because of shared LPAR; it's true for dedicated LPARs too.
>> Only Power10 and above systems have hemispheres, where we add MC/coregroup
>> support.
>>
>
> OK, thanks for the correction. Are you saying coregroup_enabled is false
> on Power9 and older hardware, and set to true on Power10? Power10 has a
> corresponding device-tree property, which is parsed to enable hemisphere
> support in find_possible_nodes(). This is why has_coregroup_support()
> returns true for Power10.
>
>>>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index 3467f86fd78f..cf6c2e4190ab 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
>>>  }
>>>  #endif
>>>
>>> -struct cpumask *cpu_coregroup_mask(int cpu)
>>> -{
>>> -	return per_cpu(cpu_coregroup_map, cpu);
>>> -}
>>> -
>>>  static bool has_coregroup_support(void)
>>>  {
>>>  	/* Coregroup identification not available on shared systems */
>>> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
>>>  	return coregroup_enabled;
>>>  }
>>>
>>> +struct cpumask *cpu_coregroup_mask(int cpu)
>>> +{
>>> +	if (!has_coregroup_support())
>>> +		return cpu_l2_cache_mask(cpu);
>>> +
>>> +	return per_cpu(cpu_coregroup_map, cpu);
>>> +}
>>> +
>>
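
To make the effect of the quoted diff concrete, here is a toy
user-space model, not kernel code. Every toy_* name, the 16-CPU size
and the "4 CPUs per L2" layout are assumptions for illustration only.
It shows the old accessor returning a NULL mask when no coregroup map
was allocated, and the patched accessor falling back to the L2 mask so
that a cpumask_first()-style lookup has something valid to read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 16

/* Toy stand-in for struct cpumask: one bit per CPU. */
struct toy_cpumask {
	uint64_t bits;
};

/* L2 masks always exist in this model (4 CPUs per L2 is an assumption). */
static struct toy_cpumask toy_l2_map[TOY_NR_CPUS];

/* Coregroup masks are only allocated when coregroup support is present. */
static struct toy_cpumask *toy_coregroup_map[TOY_NR_CPUS];
static bool toy_coregroup_enabled;	/* false: models Power9 and older */

static void toy_init_l2(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_l2_map[cpu].bits = 0xfULL << (cpu & ~3);
}

/* Like cpumask_first(): dereferences the mask, so NULL would fault here. */
static int toy_cpumask_first(const struct toy_cpumask *mask)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask->bits & (1ULL << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Before the diff: hands back whatever the per-CPU pointer holds. */
static struct toy_cpumask *toy_coregroup_mask_old(int cpu)
{
	return toy_coregroup_map[cpu];
}

/* After the diff: fall back to the L2 mask when there is no coregroup. */
static struct toy_cpumask *toy_coregroup_mask_new(int cpu)
{
	if (!toy_coregroup_enabled)
		return &toy_l2_map[cpu];
	return toy_coregroup_map[cpu];
}
```

After toy_init_l2(), toy_coregroup_mask_old(5) is NULL, while
toy_cpumask_first(toy_coregroup_mask_new(5)) returns the first CPU of
that CPU's L2 group. The model says nothing about whether L2 is the
right LLC on Power10/11, which is the concern raised below.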
>> While this is a workaround for the problem on Power9,
>> it will hurt Power10 and Power11 systems.
>> As Prateek has alluded to, MC is not LLC on Power.
>
> Could you please elaborate on the cache topology?
> Specifically, could you clarify what the LLC is for Power9
> and Power10 respectively? Is it always the L2 cache?
>
> I have checked the IBM documentation available at:
> https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
>
> According to the document, a hemisphere corresponds to a 64MB
> L3 cache shared by 8 cores. Since the MC domain spans a single
> hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
> for the MC domain?
>
>> So by using cpu_coregroup_mask() as llc_mask we run into the trouble of
>> assuming MC to be similar to LLC. So it will impact Power10/11 systems.
>>
>> In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
>> #define llc_mask(cpu) cpu_coregroup_mask(cpu)
>>
>> Defining llc_mask as cpu_coregroup_mask means MC should be LLC.
>> This is not true for some architectures, at least on Power.
>>
>
> OK.
>
>> So shouldn't it be using
>> #define llc_mask(cpu) per_cpu(sd_llc, cpu)
>>
>> This should work for systems where LLC is sub-coregroup, coregroup
>> (or super coregroup: let's say some archs want LLC at PKG and cluster at
>> coregroup).
>>
>> If we do that, I don't think we even need the else case where we say
>> #define llc_mask(cpu) cpumask_of(cpu)
>>
>
> I suppose you are referring to
> sched_domain_span(per_cpu(sd_llc, cpu)).
>
> Indeed, deriving the LLC from the SD_SHARE_LLC level offers
> better scalability. However, this approach would involve scheduler
> domains, which can be truncated by cpuset partitions - a scenario we
> prefer to avoid.
>
> thanks,
> Chenyu
>
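
On the cpuset point above, a small sketch of why an id derived from a
sched domain span can differ from one derived from the topology mask.
This is a toy model with 64-bit masks. The function names, and the
choice of "first CPU in the mask" as the id, are assumptions for
illustration and are not taken from the patch.

```c
#include <stdint.h>

/* First set bit of a toy 64-CPU mask, or -1 if the mask is empty. */
static int toy_first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return -1;
}

/* LLC id taken from the topology mask: independent of cpusets. */
static int llc_id_from_topology(uint64_t llc_mask)
{
	return toy_first_cpu(llc_mask);
}

/* LLC id taken from a domain span restricted to the partition's CPUs. */
static int llc_id_from_domain(uint64_t llc_mask, uint64_t partition)
{
	return toy_first_cpu(llc_mask & partition);
}
```

With one LLC covering CPUs 0-7 (mask 0xff) split into partitions 0-3
(0x0f) and 4-7 (0xf0), the topology-based id is the same for every
CPU, while the domain-based id differs between the two partitions even
though they share the same physical cache.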