* [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
@ 2026-05-25 14:07 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
` (3 more replies)
0 siblings, 4 replies; 31+ messages in thread
From: Venkat Rao Bagalkote @ 2026-05-25 14:07 UTC (permalink / raw)
To: Peter Zijlstra, K Prateek Nayak, Chen, Yu C, tim.c.chen
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched
Greetings!!!
I am seeing an early boot kernel panic due to NULL pointer dereference
on a POWER9 (pSeries) system when testing linux-next (next-20260522).
Traces:
[ 0.038567] Big cores detected but using small core scheduling
[ 0.038796] BUG: Kernel NULL pointer dereference at 0x00000000
[ 0.038804] Faulting instruction address: 0xc000000000e58504
[ 0.038812] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.038819] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[ 0.038830] Modules linked in:
[ 0.038840] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
7.0.0-rc6+ #14 PREEMPTLAZY
[ 0.038851] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[ 0.038860] NIP: c000000000e58504 LR: c000000000e58500 CTR:
0000000000000000
[ 0.038869] REGS: c0000000090e78e0 TRAP: 0380 Not tainted (7.0.0-rc6+)
[ 0.038878] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR:
44002242 XER: 20040003
[ 0.038907] CFAR: c00000000093f3f0 IRQMASK: 0
[ 0.038907] GPR00: c00000000038b3b8 c0000000090e7b80 c00000000259a800
0000000000000000
[ 0.038907] GPR04: 0000000000000038 0000000000000038 c00000000c6e2560
0000000000000000
[ 0.038907] GPR08: 0000000000000000 0000000000000037 0000ffffffffffff
0000000000000000
[ 0.038907] GPR12: c000000000072730 c0000000051b0000 c00000000c6ee560
00000000ffffffff
[ 0.038907] GPR16: 0000000000000000 0000000000000038 c0000000032c6b08
fffffffffffffff6
[ 0.038907] GPR20: 0000000000000000 c000000004d1a6e0 0000000000000000
0000000000000000
[ 0.038907] GPR24: 0000000000000000 0000000000000000 00000000ffffffff
c00000000a3bf940
[ 0.038907] GPR28: 0000000000000038 0000000000000000 0000000000000000
0000000000000000
[ 0.039029] NIP [c000000000e58504] _find_first_bit+0x44/0x130
[ 0.039043] LR [c000000000e58500] _find_first_bit+0x40/0x130
[ 0.039054] Call Trace:
[ 0.039060] [c0000000090e7b80] [c00000000416af20]
schedutil_gov+0x0/0xa0 (unreliable)
[ 0.039076] [c0000000090e7bc0] [c00000000038b3b8]
build_sched_domains+0xad8/0xe50
[ 0.039089] [c0000000090e7ce0] [c000000003045d78]
sched_init_smp+0xa8/0x164
[ 0.039102] [c0000000090e7d30] [c00000000300f374]
kernel_init_freeable+0x250/0x370
[ 0.039117] [c0000000090e7de0] [c000000000011f90] kernel_init+0x34/0x1e4
[ 0.039129] [c0000000090e7e50] [c00000000000debc]
ret_from_kernel_user_thread+0x14/0x1c
[ 0.039142] ---- interrupt: 0 at 0x0
[ 0.039150] Code: 41820090 7c0802a6 393cffff fbe10038 7c7f1b78
fba10028 fbc10030 3bc00000 793dd7e2 f8010050 4bae6e9d 60000000
<e93f0000> 2c290000 408200bc 283c0040
[ 0.039196] ---[ end trace 0000000000000000 ]---
Git bisect is pointing to b5ea300a17e3 sched/cache: Make LLC id
continuous as first bad commit.
Git Bisect Logs:
# git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [c1ecb239fa3456529a32255359fc78b69eb9d847] Add linux-next
specific files for 20260522
git bisect bad c1ecb239fa3456529a32255359fc78b69eb9d847
# status: waiting for good commit(s), bad commit known
# good: [5200f5f493f79f14bbdc349e402a40dfb32f23c8] Linux 7.1-rc4
git bisect good 5200f5f493f79f14bbdc349e402a40dfb32f23c8
# good: [7cd27a0d57b8539366c98bb04fe48d1aff779ea9] Merge branch 'main'
of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 7cd27a0d57b8539366c98bb04fe48d1aff779ea9
# good: [efb3dd6031ec9858c7285fd673970320c86c01f3] Merge branch 'next'
of https://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect good efb3dd6031ec9858c7285fd673970320c86c01f3
# bad: [1a6066d1c1243fdc5ed464032bbdf12e6710c027] Merge branch
'driver-core-next' of
https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git
git bisect bad 1a6066d1c1243fdc5ed464032bbdf12e6710c027
# good: [409a99cbc316d912c999fd75b9df042b25900e50] Merge branch
'for-next' of
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 409a99cbc316d912c999fd75b9df042b25900e50
# bad: [af73f6b022c8c09a3234176892a18216be4cd984] Merge branch 'next' of
git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad af73f6b022c8c09a3234176892a18216be4cd984
# bad: [6a459eb254e4bff61546587eccd3091955123d24] Merge branch into
tip/master: 'sched/core'
git bisect bad 6a459eb254e4bff61546587eccd3091955123d24
# good: [71ba4bb66c3a9287245d0f5fcfb27d4b951ba402] Merge branch into
tip/master: 'locking/core'
git bisect good 71ba4bb66c3a9287245d0f5fcfb27d4b951ba402
# good: [f3b45696a160a2230d846de8f706e835984ae65b] Merge branch into
tip/master: 'objtool/core'
git bisect good f3b45696a160a2230d846de8f706e835984ae65b
# bad: [c99b8593b060931c5a0a4b701689f8d6a2c00dbf] sched/cache: Fix stale
preferred_llc for a new task
git bisect bad c99b8593b060931c5a0a4b701689f8d6a2c00dbf
# bad: [5b1d5e6db20a6c64ffb95d04578db8c4b0228eea] sched/cache: Respect
LLC preference in task migration and detach
git bisect bad 5b1d5e6db20a6c64ffb95d04578db8c4b0228eea
# bad: [46afe3af7ead57190b6d362e214814ec804e3b7b] sched/cache: Track
LLC-preferred tasks per runqueue
git bisect bad 46afe3af7ead57190b6d362e214814ec804e3b7b
# good: [f025ef275388742643a2c33f00a0d9c0af3112ee] sched/cache: Record
per LLC utilization to guide cache aware scheduling decisions
git bisect good f025ef275388742643a2c33f00a0d9c0af3112ee
# bad: [b5ea300a17e37eada7a98561fbd34a3054578713] sched/cache: Make LLC
id continuous
git bisect bad b5ea300a17e37eada7a98561fbd34a3054578713
# good: [23b2b5ccc45ce2a38b9336a916088fffdc4cdfb1] sched/cache:
Introduce helper functions to enforce LLC migration policy
git bisect good 23b2b5ccc45ce2a38b9336a916088fffdc4cdfb1
# first bad commit: [b5ea300a17e37eada7a98561fbd34a3054578713]
sched/cache: Make LLC id continuous
b5ea300a17e37eada7a98561fbd34a3054578713 is the first bad commit
commit b5ea300a17e37eada7a98561fbd34a3054578713
Author: Tim Chen <tim.c.chen@linux.intel.com>
Date: Wed Apr 1 14:52:17 2026 -0700
sched/cache: Make LLC id continuous
Introduce an index mapping between CPUs and their LLCs. This provides
a roughly continuous per LLC index needed for cache-aware load
balancing in
later patches.
The existing per_cpu llc_id usually points to the first CPU of the
LLC domain, which is sparse and unsuitable as an array index. Using
llc_id directly would waste memory.
With the new mapping, CPUs in the same LLC share an approximate
continuous id:
per_cpu(llc_id, CPU=0...15) = 0
per_cpu(llc_id, CPU=16...31) = 1
per_cpu(llc_id, CPU=32...47) = 2
...
Note that the LLC IDs are allocated via bitmask, so the IDs may be
reused during CPU offline->online transitions.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: K Prateek Nayak <kprateek.nayak@amd.com>
Co-developed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link:
https://patch.msgid.link/047ef46339e4db497b54a89940a7ebedf27fcf28.1775065312.git.tim.c.chen@linux.intel.com
kernel/sched/core.c | 2 ++
kernel/sched/sched.h | 3 ++
kernel/sched/topology.c | 90
+++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 93 insertions(+), 2 deletions(-)
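The bitmask-based id allocation mentioned in the commit message can be pictured with a small userspace sketch. This is only an illustration under stated assumptions: `MAX_LLCS`, `llc_id_bitmap`, `llc_id_alloc()` and `llc_id_free()` are hypothetical names, not the helpers added by the commit. It shows why ids stay compact and why a freed id can be handed out again after an offline->online transition.

```c
#include <stdint.h>

#define MAX_LLCS 64            /* hypothetical upper bound for this sketch */

static uint64_t llc_id_bitmap; /* bit n set => compact id n is in use */

/* Return the lowest free id and mark it used, or -1 if none is left. */
static int llc_id_alloc(void)
{
	for (int id = 0; id < MAX_LLCS; id++) {
		if (!(llc_id_bitmap & (UINT64_C(1) << id))) {
			llc_id_bitmap |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release an id, e.g. when the last CPU of an LLC goes offline. */
static void llc_id_free(int id)
{
	if (id >= 0 && id < MAX_LLCS)
		llc_id_bitmap &= ~(UINT64_C(1) << id);
}
```

Because the allocator always picks the lowest clear bit, an id released by one LLC is the first candidate for the next LLC that comes online, which is the reuse the commit message warns about.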
If you happen to fix this, please add the tag below.
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Regards,
Venkat.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
@ 2026-05-25 15:35 ` Chen, Yu C
2026-05-25 16:16 ` K Prateek Nayak
2026-05-26 3:14 ` Srikar Dronamraju
2026-05-28 6:54 ` Ritesh Harjani
` (2 subsequent siblings)
3 siblings, 2 replies; 31+ messages in thread
From: Chen, Yu C @ 2026-05-25 15:35 UTC (permalink / raw)
To: Venkat Rao Bagalkote
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched,
tim.c.chen, K Prateek Nayak, Peter Zijlstra
Hi Venkat,
On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> I am seeing an early boot kernel panic due to NULL pointer dereference
> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
Thanks for the test.
>
> Traces:
>
> [ 0.038567] Big cores detected but using small core scheduling
> [ 0.038796] BUG: Kernel NULL pointer dereference at 0x00000000
> [ 0.038804] Faulting instruction address: 0xc000000000e58504
> [ 0.038812] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 0.038819] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> [ 0.038830] Modules linked in:
> [ 0.038840] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 7.0.0-rc6+ #14 PREEMPTLAZY
> [ 0.038851] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
> 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
> [ 0.039029] NIP [c000000000e58504] _find_first_bit+0x44/0x130
> [ 0.039043] LR [c000000000e58500] _find_first_bit+0x40/0x130
> [ 0.039076] [c0000000090e7bc0] [c00000000038b3b8]
> build_sched_domains+0xad8/0xe50
It seems that cpumask_first(llc_mask(i)) is accessing
NULL cpu_coregroup_mask():
has_coregroup_support() is false, thus cpu_coregroup_map
is never allocated in smp_prepare_cpus().
This machine is a "shared system" VM. We should probably
let the LLC id generation fall back to using L2 id if
cpu_coregroup_mask is unavailable (which restores the
behavior before this patch). I'm wondering if the following
change would help (need IBM friends' help on this):
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..cf6c2e4190ab 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1042,11 +1042,6 @@ static const struct cpumask
*tl_smallcore_smt_mask(struct sched_domain_topology_
}
#endif
-struct cpumask *cpu_coregroup_mask(int cpu)
-{
- return per_cpu(cpu_coregroup_map, cpu);
-}
-
static bool has_coregroup_support(void)
{
/* Coregroup identification not available on shared systems */
@@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
return coregroup_enabled;
}
+struct cpumask *cpu_coregroup_mask(int cpu)
+{
+ if (!has_coregroup_support())
+ return cpu_l2_cache_mask(cpu);
+
+ return per_cpu(cpu_coregroup_map, cpu);
+}
+
static int __init init_big_cores(void)
{
int cpu;
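The failure mode and the fallback in the diff above can be modelled in userspace. The sketch below is an assumption-laden toy, not kernel code: `mask_t`, `MODEL_NR_CPUS`, `coregroup_map`, `l2_cache_map`, `coregroup_supported`, `model_coregroup_mask()` and `model_mask_first()` are all hypothetical stand-ins for the per-CPU cpumask machinery. It shows that a never-allocated map stays NULL, and that returning the L2 mask when coregroup support is absent gives the caller something valid to scan.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8   /* hypothetical small system */

typedef uint64_t mask_t;  /* one bit per CPU; stand-in for struct cpumask */

/* Pointers stay NULL when the platform never allocates coregroup maps. */
static mask_t *coregroup_map[MODEL_NR_CPUS];
static mask_t l2_cache_map[MODEL_NR_CPUS];
static int coregroup_supported;

/* Fall back to the L2 mask instead of handing out a NULL pointer. */
static const mask_t *model_coregroup_mask(int cpu)
{
	if (!coregroup_supported)
		return &l2_cache_map[cpu];
	return coregroup_map[cpu];
}

/* Index of the first set bit, or MODEL_NR_CPUS if the mask is empty.
 * Like a first-bit search on a real cpumask, this dereferences its
 * argument, so a NULL mask would fault here. */
static int model_mask_first(const mask_t *mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (*mask & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}
```

With `coregroup_supported` left at 0, `model_mask_first(model_coregroup_mask(cpu))` scans the L2 mask and never touches the unallocated `coregroup_map` entries.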
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 15:35 ` Chen, Yu C
@ 2026-05-25 16:16 ` K Prateek Nayak
2026-05-26 3:14 ` Chen, Yu C
2026-05-26 3:14 ` Srikar Dronamraju
1 sibling, 1 reply; 31+ messages in thread
From: K Prateek Nayak @ 2026-05-25 16:16 UTC (permalink / raw)
To: Chen, Yu C, Venkat Rao Bagalkote
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched,
tim.c.chen, Peter Zijlstra
Hello Chenyu, Venkat,
On 5/25/2026 9:05 PM, Chen, Yu C wrote:
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3467f86fd78f..cf6c2e4190ab 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
> }
> #endif
>
> -struct cpumask *cpu_coregroup_mask(int cpu)
> -{
> - return per_cpu(cpu_coregroup_map, cpu);
> -}
> -
> static bool has_coregroup_support(void)
> {
> /* Coregroup identification not available on shared systems */
> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
> return coregroup_enabled;
> }
>
> +struct cpumask *cpu_coregroup_mask(int cpu)
> +{
> + if (!has_coregroup_support())
> + return cpu_l2_cache_mask(cpu);
> +
> + return per_cpu(cpu_coregroup_map, cpu);
Interestingly, on powerpc, the MC domain doesn't have SD_SHARE_LLC flag
set but I believe there is still some benefit of keeping the tasks on
the same hemisphere?
If we are actually aiming for LLC, I think cpu_l2_cache_mask() is the
right cpumask for all cases since tl_cache_mask() also returns that
and the l2_cache_mask is set in all cases covered by update_mask_by_l2()
in the same file.
If consolidation on hemisphere is beneficial, then the above diff
looks correct.
Note: For has_coregroup_support(), with the above fix, the scheduler
side llc_id will now be based on MC domain's span which seems wrong
since it doesn't have SD_SHARE_LLC flag and might lead to other
behavioral changes now.
> +}
> +
> static int __init init_big_cores(void)
> {
> int cpu;
--
Thanks and Regards,
Prateek
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 15:35 ` Chen, Yu C
2026-05-25 16:16 ` K Prateek Nayak
@ 2026-05-26 3:14 ` Srikar Dronamraju
2026-05-26 4:08 ` Chen, Yu C
1 sibling, 1 reply; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-26 3:14 UTC (permalink / raw)
To: Chen, Yu C
Cc: Venkat Rao Bagalkote, Madhavan Srinivasan, Shrikanth Hegde,
Ritesh Harjani, Christophe Leroy (CS GROUP), LKML, linuxppc-dev,
linux-sched, tim.c.chen, K Prateek Nayak, Peter Zijlstra
* Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
> Hi Venkat,
>
> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
> > Greetings!!!
> >
> > I am seeing an early boot kernel panic due to NULL pointer dereference
> > on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
> It seems that cpumask_first(llc_mask(i)) is accessing
> NULL cpu_coregroup_mask():
> has_coregroup_support() is false, thus cpu_coregroup_map
> is never allocated in smp_prepare_cpus().
> This machine is a "shared system" VM. We should probably
> let the LLC id generation fall back to using L2 id if
> cpu_coregroup_mask is unavailable (which restores the
> behavior before this patch). I'm wondering if the following
> change would help(need IBM friends' help on this):
Power9 and earlier systems don't have coregroups.
It's not because of shared LPAR; it's true for dedicated LPARs too.
Only Power10 and later systems have hemispheres, which is where we add
MC/coregroup support.
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3467f86fd78f..cf6c2e4190ab 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1042,11 +1042,6 @@ static const struct cpumask
> *tl_smallcore_smt_mask(struct sched_domain_topology_
> }
> #endif
>
> -struct cpumask *cpu_coregroup_mask(int cpu)
> -{
> - return per_cpu(cpu_coregroup_map, cpu);
> -}
> -
> static bool has_coregroup_support(void)
> {
> /* Coregroup identification not available on shared systems */
> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
> return coregroup_enabled;
> }
>
> +struct cpumask *cpu_coregroup_mask(int cpu)
> +{
> + if (!has_coregroup_support())
> + return cpu_l2_cache_mask(cpu);
> +
> + return per_cpu(cpu_coregroup_map, cpu);
> +}
> +
While this is a workaround for the problem on Power9,
it will hurt Power10 and Power11 systems.
As Prateek alluded to, MC is not LLC on Power.
So by defining llc_mask as cpu_coregroup_mask() we run into the trouble of
assuming MC to be similar to LLC, which will impact Power10/11 systems.
In commit b5ea300a17e3 ("sched/cache: Make LLC id continuous"), we define
#define llc_mask(cpu) cpu_coregroup_mask(cpu)
Defining llc_mask as cpu_coregroup_mask means MC should be LLC.
This is not true for some architectures, at least on Power.
So shouldn't it be using
#define llc_mask(cpu) per_cpu(sd_llc, cpu)
This should work for systems where LLC is sub-coregroup, coregroup (or super
coregroup: let's say some archs want LLC at PKG and cluster at coregroup).
If we do that, I don't think we even need the else case where we say
#define llc_mask(cpu) cpumask_of(cpu)
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 16:16 ` K Prateek Nayak
@ 2026-05-26 3:14 ` Chen, Yu C
0 siblings, 0 replies; 31+ messages in thread
From: Chen, Yu C @ 2026-05-26 3:14 UTC (permalink / raw)
To: K Prateek Nayak, Venkat Rao Bagalkote
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched,
tim.c.chen, Peter Zijlstra
On 5/26/2026 12:16 AM, K Prateek Nayak wrote:
> Hello Chenyu, Venkat,
>
> On 5/25/2026 9:05 PM, Chen, Yu C wrote:
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 3467f86fd78f..cf6c2e4190ab 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1042,11 +1042,6 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
>> }
>> #endif
>>
>> -struct cpumask *cpu_coregroup_mask(int cpu)
>> -{
>> - return per_cpu(cpu_coregroup_map, cpu);
>> -}
>> -
>> static bool has_coregroup_support(void)
>> {
>> /* Coregroup identification not available on shared systems */
>> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
>> return coregroup_enabled;
>> }
>>
>> +struct cpumask *cpu_coregroup_mask(int cpu)
>> +{
>> + if (!has_coregroup_support())
>> + return cpu_l2_cache_mask(cpu);
>> +
>> + return per_cpu(cpu_coregroup_map, cpu);
>
> Interestingly, on powerpc, the MC domain doesn't have SD_SHARE_LLC flag
> set but I believe there is still some benefit of keeping the tasks on
> the same hemisphere?
>
You are right. I guess power9 reported here does not have hemisphere and
power10 has, according to commit fb2ff9fa72e2:
"From Power10 processors onwards, each chip has 2 hemispheres"
but yes, on both Power9 and Power10 the MC domain doesn't have SD_SHARE_LLC,
thus aggregating threads to one L2 domain might bring benefit. A side note:
cache-aware scheduling is disabled on Power for now because Power does not
use the generic cacheinfo framework, so its cache size is returned as 0 and
get_effective_llc_bytes() returns 0 (for now, unless we get help from IBM
friends to support it)
commit 7030513a0877 ("7030513a0877")
> If we are actually aiming for LLC, I think cpu_l2_cache_mask() is the
> right cpumask for all cases since tl_cache_mask() also returns that
> and the l2_cache_mask is set in all cases covered by update_mask_by_l2()
> in the same file.
>
> If consolidation on hemisphere is beneficial, then the above diff
> looks correct.
>
> Note: For has_coregroup_support(), with the above fix, the scheduler
> side llc_id will now be based on MC domain's span which seems wrong
> since it doesn't have SD_SHARE_LLC flag and might lead to other
> behavioral changes now.
>
You are right, it seems to be a bug when has_coregroup_support() returns true.
Maybe we can always return the L2 id for Power?
How about this (reverting the previous diff):
diff --git a/arch/powerpc/include/asm/topology.h
b/arch/powerpc/include/asm/topology.h
index 66ed5fe1b718..3b3b4156f418 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -130,13 +130,15 @@ static inline int cpu_to_coregroup_id(int cpu)
#ifdef CONFIG_SMP
#include <asm/cputable.h>
+#include <asm/smp.h>
struct cpumask *cpu_coregroup_mask(int cpu);
const struct cpumask *cpu_die_mask(int cpu);
int cpu_die_id(int cpu);
+#define arch_llc_mask(cpu) cpu_l2_cache_mask(cpu)
+
#ifdef CONFIG_PPC64
-#include <asm/smp.h>
#define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu))
#define topology_sibling_cpumask(cpu)
(per_cpu(cpu_sibling_map, cpu))
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index df2ceb54c970..6772eb0ce493 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2063,12 +2063,18 @@ const struct cpumask *tl_mc_mask(struct
sched_domain_topology_level *tl, int cpu
return cpu_coregroup_mask(cpu);
}
-#define llc_mask(cpu) cpu_coregroup_mask(cpu)
+#ifndef arch_llc_mask
+#define arch_llc_mask(cpu) cpu_coregroup_mask(cpu)
+#endif
#else
-#define llc_mask(cpu) cpumask_of(cpu)
+#ifndef arch_llc_mask
+#define arch_llc_mask(cpu) cpumask_of(cpu)
+#endif
#endif
+#define llc_mask(cpu) arch_llc_mask(cpu)
+
const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level
*tl, int cpu)
{
return cpu_node_mask(cpu);
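The override pattern used in the diff above (an architecture header defines `arch_llc_mask`, and generic code supplies a default only under `#ifndef`) can be tried out in isolation. In this sketch the mask helpers `model_l2_mask()` and `model_coregroup_mask()` and the `mask_t` type are hypothetical stand-ins with made-up group sizes (4 CPUs per L2, 8 per coregroup); only the macro layering mirrors the proposal.

```c
#include <stdint.h>

typedef uint64_t mask_t;  /* one bit per CPU; stand-in for struct cpumask */

/* Hypothetical topology: 4 CPUs share an L2, 8 CPUs form a coregroup. */
static mask_t model_l2_mask(int cpu)
{
	return UINT64_C(0xf) << (cpu & ~3);
}

static mask_t model_coregroup_mask(int cpu)
{
	return UINT64_C(0xff) << (cpu & ~7);
}

/* "Arch header": the architecture states which mask is its LLC. */
#define arch_llc_mask(cpu) model_l2_mask(cpu)

/* "Generic code": the default applies only if the arch gave no definition. */
#ifndef arch_llc_mask
#define arch_llc_mask(cpu) model_coregroup_mask(cpu)
#endif

#define llc_mask(cpu) arch_llc_mask(cpu)
```

Here `llc_mask(5)` expands to `model_l2_mask(5)` because the arch definition comes first; deleting that `#define` would silently switch every caller to the coregroup default.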
thanks,
Chenyu
>> +}
>> +
>> static int __init init_big_cores(void)
>> {
>> int cpu;
>
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-26 3:14 ` Srikar Dronamraju
@ 2026-05-26 4:08 ` Chen, Yu C
2026-05-26 4:58 ` Srikar Dronamraju
2026-05-26 5:24 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
0 siblings, 2 replies; 31+ messages in thread
From: Chen, Yu C @ 2026-05-26 4:08 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Venkat Rao Bagalkote, Madhavan Srinivasan, Shrikanth Hegde,
Ritesh Harjani, Christophe Leroy (CS GROUP), LKML, linuxppc-dev,
linux-sched, tim.c.chen, K Prateek Nayak, Peter Zijlstra
Hi Venkat,
On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
> * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
>
>> Hi Venkat,
>>
>> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
>>> Greetings!!!
>>>
>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>>
>> It seems that cpumask_first(llc_mask(i)) is accessing
>> NULL cpu_coregroup_mask():
>
>> has_coregroup_support() is false, thus cpu_coregroup_map
>> is never allocated in smp_prepare_cpus().
>> This machine is a "shared system" VM. We should probably
>> let the LLC id generation fall back to using L2 id if
>> cpu_coregroup_mask is unavailable (which restores the
>> behavior before this patch). I'm wondering if the following
>> change would help(need IBM friends' help on this):
>
> Power9 and below systems, dont have coregroup.
> Its not because of shared LPAR. But its true for dedicated LPARs too.
> Only Power10 and above systems have hemisphere where we add MC/coregroup
> support.
>
OK, thanks for the correction. Are you saying coregroup_enabled is false
on Power9 and older hardware, and set to true on Power10? Power10 has a
corresponding device-tree property, which is parsed to enable hemisphere
support in find_possible_nodes(). This is why has_coregroup_support()
returns true for Power10.
>>
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 3467f86fd78f..cf6c2e4190ab 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1042,11 +1042,6 @@ static const struct cpumask
>> *tl_smallcore_smt_mask(struct sched_domain_topology_
>> }
>> #endif
>>
>> -struct cpumask *cpu_coregroup_mask(int cpu)
>> -{
>> - return per_cpu(cpu_coregroup_map, cpu);
>> -}
>> -
>> static bool has_coregroup_support(void)
>> {
>> /* Coregroup identification not available on shared systems */
>> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
>> return coregroup_enabled;
>> }
>>
>> +struct cpumask *cpu_coregroup_mask(int cpu)
>> +{
>> + if (!has_coregroup_support())
>> + return cpu_l2_cache_mask(cpu);
>> +
>> + return per_cpu(cpu_coregroup_map, cpu);
>> +}
>> +
>
> While this is a work-around for the problem in Power9
> It will hurt Power10 and Power11 systems.
> As has been alluded by Prateek, MC is not LLC on Power.
Could you please elaborate on the cache topology?
Specifically, could you clarify what the LLC is for Power9
and Power10 respectively? Is it always the L2 cache?
I have checked the IBM documentation available at:
https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
According to the document, a hemisphere corresponds to a 64MB
L3 cache shared by 8 cores. Since the MC domain spans a single
hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
for the MC domain?
> So by using llc_mask as cpu_coregroup_mask() we run the trouble of assuming
> MC to be similar to LLC. So it will impact Power 10/11 Systems.
>
> In commit b5ea300a17e3 sched/cache: Make LLC id continuous, we define
> #define llc_mask(cpu) cpu_coregroup_mask(cpu)
>
> defining it llc_mask to cpu_coregroup_mask means MC should be LLC.
> This is not true for some architectures atleast on Power.
>
OK.
> So shouldn't it be using
> #define llc_mask(cpu) per_cpu(sd_llc, cpu)
>
> This should work for systems where LLC is sub-coregroup, coregroup (or super
> coregroup: Lets say some archs want LLC at PKG and cluster at coregroup).
>
> if we do that, I dont think we even need the else case where we say
> #define llc_mask(cpu) cpumask_of(cpu)
>
I suppose you are referring to
sched_domain_span(per_cpu(sd_llc, cpu)).
Indeed, deriving the LLC from the SD_SHARE_LLC level offers
better scalability. However, this approach would involve scheduler
domains, which can be truncated by cpuset partitions - a scenario we
prefer to avoid.
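The truncation concern can be shown with a deliberately simplified model: treat a domain span as the hardware LLC mask intersected with the cpuset partition's CPUs. The names `mask_t`, `domain_span()` and `mask_weight()` are hypothetical, and real sched-domain construction involves more than a single AND, so this is only a sketch of the effect.

```c
#include <stdint.h>

typedef uint64_t mask_t;  /* one bit per CPU; stand-in for struct cpumask */

/* Simplified model: a domain span never extends beyond its partition. */
static mask_t domain_span(mask_t hw_llc_mask, mask_t partition_mask)
{
	return hw_llc_mask & partition_mask;
}

/* Count the CPUs in a mask by clearing the lowest set bit each round. */
static int mask_weight(mask_t m)
{
	int n = 0;

	while (m) {
		m &= m - 1;
		n++;
	}
	return n;
}
```

For an 8-CPU LLC (`0xff`) and a partition covering only CPUs 0-3 (`0x0f`), the span has weight 4 while the hardware LLC has weight 8, so an id derived from the span would describe only half of the cache's CPUs.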
thanks,
Chenyu
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-26 4:08 ` Chen, Yu C
@ 2026-05-26 4:58 ` Srikar Dronamraju
2026-05-26 5:53 ` K Prateek Nayak
2026-05-26 5:24 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
1 sibling, 1 reply; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-26 4:58 UTC (permalink / raw)
To: Chen, Yu C
Cc: Venkat Rao Bagalkote, Madhavan Srinivasan, Shrikanth Hegde,
Ritesh Harjani, Christophe Leroy (CS GROUP), LKML, linuxppc-dev,
linux-sched, tim.c.chen, K Prateek Nayak, Peter Zijlstra
Hi Chen,
> > > It seems that cpumask_first(llc_mask(i)) is accessing
> > > NULL cpu_coregroup_mask():
> >
> > > has_coregroup_support() is false, thus cpu_coregroup_map
> > > is never allocated in smp_prepare_cpus().
> > > This machine is a "shared system" VM. We should probably
> > > let the LLC id generation fall back to using L2 id if
> > > cpu_coregroup_mask is unavailable (which restores the
> > > behavior before this patch). I'm wondering if the following
> > > change would help(need IBM friends' help on this):
> >
> > Power9 and below systems, dont have coregroup.
> > Its not because of shared LPAR. But its true for dedicated LPARs too.
> > Only Power10 and above systems have hemisphere where we add MC/coregroup
> > support.
> >
>
> OK, thanks for the correction. Are you saying coregroup_enabled is false
> on Power9 and older hardware, and set to true on Power10? Power10 has a
> corresponding device-tree property, which is parsed to enable hemisphere
> support in find_possible_nodes(). This is why has_coregroup_support()
> returns true for Power10.
>
Yes, Chen,
coregroup_enabled is true only on Power10 and later.
Yes, we decipher coregroup from the device-tree properties.
> > > +struct cpumask *cpu_coregroup_mask(int cpu)
> > > +{
> > > + if (!has_coregroup_support())
> > > + return cpu_l2_cache_mask(cpu);
> > > +
> > > + return per_cpu(cpu_coregroup_map, cpu);
> > > +}
> > > +
> >
> > While this is a work-around for the problem in Power9
> > It will hurt Power10 and Power11 systems.
> > As has been alluded by Prateek, MC is not LLC on Power.
>
> Could you please elaborate on the cache topology?
> Specifically, could you clarify what the LLC is for Power9
> and Power10 respectively? Is it always the L2 cache?
>
> I have checked the IBM documentation available at:
> https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
> According to the document, a hemisphere corresponds to a 64MB
> L3 cache shared by 8 cores. Since the MC domain spans a single
> hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
> for the MC domain?
If we look at the presentation you pointed to above, L2 is 2MB per SMT8 core.
L3 is a local 8MB per SMT8 core, which together form a 64MB L3 buffer per
hemisphere. L3 is a victim cache, and all L3s together form an L3.1 buffer.
In practice, we split the cache per small core, aka SMT4 core. So we have
1MB L2 per SMT4 core and 4MB L3 per SMT4 core. As above, L3 is a victim cache
and all L3s combine to form the L3.1 buffer. Hence for now we still consider
L2 to be the LLC.
On Power9, L2 is at the CACHE domain.
On all other Power systems (P7, P8, P10, P11), L2 is at the SMT domain.
On Power, we have taken L2 as the LLC.
lscpu (on Power 10)
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 480
On-line CPU(s) list: 0-479
Thread(s) per core: 8
Core(s) per socket: 15
Socket(s): 4
NUMA node(s): 4
Model: 2.0 (pvr 0080 0200)
Model name: POWER10, altivec supported
CPU max MHz: 3249.0000
CPU min MHz: 3249.0000
L1d cache: 32K
L1i cache: 48K
L2 cache: 1024K
L3 cache: 4096K
NUMA node0 CPU(s): 0-119
NUMA node1 CPU(s): 120-239
NUMA node2 CPU(s): 240-359
NUMA node3 CPU(s): 360-479
L2 Cache reported here is for SMT4 Core.
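The arithmetic behind these Power10 numbers can be written out as a short sketch. `struct cache_kib`, `per_smt4()` and `hemisphere_l3_kib()` are hypothetical helpers; the inputs (2MB L2 and 8MB L3 per SMT8 core, 8 cores per hemisphere) are the figures quoted earlier in this message.

```c
/* Cache sizes in KiB for one core. */
struct cache_kib {
	unsigned l2;
	unsigned l3;
};

/* An SMT8 core is split into two SMT4 cores, each seeing half the cache. */
static struct cache_kib per_smt4(struct cache_kib per_smt8)
{
	struct cache_kib half = { per_smt8.l2 / 2, per_smt8.l3 / 2 };

	return half;
}

/* The per-core L3 slices of a hemisphere add up to one L3 buffer. */
static unsigned hemisphere_l3_kib(unsigned l3_per_smt8_kib, unsigned cores)
{
	return l3_per_smt8_kib * cores;
}
```

Feeding in 2048K and 8192K gives 1024K and 4096K per SMT4 core, matching the lscpu output above, and 8 cores at 8192K each give the 65536K (64MB) hemisphere buffer.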
lscpu (on Power 9)
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 8
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Physical sockets: 2
Physical chips: 1
Physical cores/chip: 8
L2 Cache reported here is for SMT8 Core aka CACHE domain.
>
> > So by using llc_mask as cpu_coregroup_mask() we run the trouble of assuming
> > MC to be similar to LLC. So it will impact Power 10/11 Systems.
> >
> > In commit b5ea300a17e3 sched/cache: Make LLC id continuous, we define
> > #define llc_mask(cpu) cpu_coregroup_mask(cpu)
> >
> > defining it llc_mask to cpu_coregroup_mask means MC should be LLC.
> > This is not true for some architectures atleast on Power.
> >
>
> OK.
>
> > So shouldn't it be using
> > #define llc_mask(cpu) per_cpu(sd_llc, cpu)
> >
> > This should work for systems where LLC is sub-coregroup, coregroup (or super
> > coregroup: Lets say some archs want LLC at PKG and cluster at coregroup).
> >
> > if we do that, I dont think we even need the else case where we say
> > #define llc_mask(cpu) cpumask_of(cpu)
> >
>
> I suppose you are referring to
> sched_domain_span(per_cpu(sd_llc, cpu)).
>
> Indeed, deriving the LLC from the SD_SHARE_LLC level offers
> better scalability. However, this approach would involve scheduler
> domains, which can be truncated by cpuset partitions - a scenario we
> prefer to avoid.
>
Shouldn't cache-aware scheduling be worried about cpuset partitions too?
If a cpuset has a subset of the LLC's cores, should the scheduler assume it
can control the complete LLC?
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-26 4:08 ` Chen, Yu C
2026-05-26 4:58 ` Srikar Dronamraju
@ 2026-05-26 5:24 ` Venkat Rao Bagalkote
2026-05-27 7:05 ` Shrikanth Hegde
1 sibling, 1 reply; 31+ messages in thread
From: Venkat Rao Bagalkote @ 2026-05-26 5:24 UTC (permalink / raw)
To: Chen, Yu C, Srikar Dronamraju
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched,
tim.c.chen, K Prateek Nayak, Peter Zijlstra
On 26/05/26 9:38 am, Chen, Yu C wrote:
> Hi Venkat,
>
> On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
>> * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
>>
>>> Hi Venkat,
>>>
>>> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
>>>> Greetings!!!
>>>>
>>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
This issue is seen on P11 as well.
[ 0.006697] smp: Brought up 1 node, 16 CPUs
[ 0.006702] Big cores detected but using small core scheduling
[ 0.006752] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.006755] Faulting instruction address: 0xc000000020adbb6c
[ 0.006759] Oops: Kernel access of bad area, sig: 7 [#1]
[ 0.006762] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[ 0.006767] Modules linked in:
[ 0.006772] CPU: 4 UID: 0 PID: 1 Comm: swapper/4 Not tainted
7.1.0-rc5-next-20260525 #1 PREEMPT(lazy)
[ 0.006777] Hardware name: IBM,9080-HEX Power11 (architected)
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 0.006781] NIP: c000000020adbb6c LR: c0000000202e5a58 CTR:
0000000000000000
[ 0.006784] REGS: c0000000283d7890 TRAP: 0300 Not tainted
(7.1.0-rc5-next-20260525)
[ 0.006788] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR:
44002242 XER: 20040003
[ 0.006796] CFAR: c0000000202e5a54 DAR: 0000000000000000 DSISR:
00080000 IRQMASK: 0
[ 0.006796] GPR00: 0000000000000000 c0000000283d7b50 c000000021abf100
0000000000000010
[ 0.006796] GPR04: 0000000000000010 0000000000000030 0000000000000000
c000000028365500
[ 0.006796] GPR08: 0000000000000000 c000000022213598 000000003b77d000
0000000000000000
[ 0.006796] GPR12: c00000002005d8f0 c000000000008000 c0000000283cb578
c0000000283cb400
[ 0.006796] GPR16: c0000000283c9000 c000000022218b20 c0000000222330e8
00000000ffffffff
[ 0.006796] GPR20: fffffffffffffff6 0000000000000000 c000000022da36e0
0000000000000000
[ 0.006796] GPR24: 0000000000000000 0000000000000000 c0000000283c9178
c0000000227b5f00
[ 0.006796] GPR28: c00000002831c1e8 c000000022db5980 0000000000000000
0000000000000000
[ 0.006835] NIP [c000000020adbb6c] _find_first_bit+0xc/0xc0
[ 0.006842] LR [c0000000202e5a58] build_sched_domains+0x7d8/0xb40
[ 0.006847] Call Trace:
[ 0.006849] [c0000000283d7b50] [c0000000202e5408]
build_sched_domains+0x188/0xb40 (unreliable)
[ 0.006854] [c0000000283d7c90] [c000000022034380]
sched_init_domains+0x118/0x168
[ 0.006860] [c0000000283d7ce0] [c000000022032b14]
sched_init_smp+0xa8/0x158
[ 0.006865] [c0000000283d7d30] [c000000022005674]
kernel_init_freeable+0x1ac/0x294
[ 0.006870] [c0000000283d7dd0] [c000000020011718] kernel_init+0x2c/0x1c4
[ 0.006874] [c0000000283d7e30] [c00000002000debc]
ret_from_kernel_user_thread+0x14/0x1c
[ 0.006878] ---- interrupt: 0 at 0x0
[ 0.006881] Code: eb610038 7fc3f378 eb810040 eba10048 38210060
ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c681b78 7c832379 4d820020
<e9280000> 38e3ffff 39400000 78e7d7e2
[ 0.006895] ---[ end trace 0000000000000000 ]---
[ 0.006898]
Regards,
Venkat.
>>>
>>> It seems that cpumask_first(llc_mask(i)) is accessing
>>> NULL cpu_coregroup_mask():
>>
>>> has_coregroup_support() is false, thus cpu_coregroup_map
>>> is never allocated in smp_prepare_cpus().
>>> This machine is a "shared system" VM. We should probably
>>> let the LLC id generation fall back to using L2 id if
>>> cpu_coregroup_mask is unavailable (which restores the
>>> behavior before this patch). I'm wondering if the following
>>> change would help(need IBM friends' help on this):
>>
>> Power9 and below systems, dont have coregroup.
>> Its not because of shared LPAR. But its true for dedicated LPARs too.
>> Only Power10 and above systems have hemisphere where we add MC/coregroup
>> support.
>>
>
> OK, thanks for the correction. Are you saying coregroup_enabled is false
> on Power9 and older hardware, and set to true on Power10? Power10 has a
> corresponding device-tree property, which is parsed to enable hemisphere
> support in find_possible_nodes(). This is why has_coregroup_support()
> returns true for Power10.
>
>>>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index 3467f86fd78f..cf6c2e4190ab 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -1042,11 +1042,6 @@ static const struct cpumask
>>> *tl_smallcore_smt_mask(struct sched_domain_topology_
>>> }
>>> #endif
>>>
>>> -struct cpumask *cpu_coregroup_mask(int cpu)
>>> -{
>>> - return per_cpu(cpu_coregroup_map, cpu);
>>> -}
>>> -
>>> static bool has_coregroup_support(void)
>>> {
>>> /* Coregroup identification not available on shared systems */
>>> @@ -1056,6 +1051,14 @@ static bool has_coregroup_support(void)
>>> return coregroup_enabled;
>>> }
>>>
>>> +struct cpumask *cpu_coregroup_mask(int cpu)
>>> +{
>>> + if (!has_coregroup_support())
>>> + return cpu_l2_cache_mask(cpu);
>>> +
>>> + return per_cpu(cpu_coregroup_map, cpu);
>>> +}
>>> +
>>
>> While this is a work-around for the problem in Power9
>> It will hurt Power10 and Power11 systems.
>> As has been alluded by Prateek, MC is not LLC on Power.
>
> Could you please elaborate on the cache topology?
> Specifically, could you clarify what the LLC is for Power9
> and Power10 respectively? Is it always the L2 cache?
>
> I have checked the IBM documentation available at:
> https://hc32.hotchips.org/assets/program/conference/day1/HotChips2020_Server_Processors_IBM_Starke_POWER10_v33.pdf
>
> According to the document, a hemisphere corresponds to a 64MB
> L3 cache shared by 8 cores. Since the MC domain spans a single
> hemisphere, I wonder why the SD_SHARE_LLC flag is not enabled
> for the MC domain?
>
>> So by using llc_mask as cpu_coregroup_mask() we run the trouble of
>> assuming
>> MC to be similar to LLC. So it will impact Power 10/11 Systems.
>>
>> In commit b5ea300a17e3 sched/cache: Make LLC id continuous, we define
>> #define llc_mask(cpu) cpu_coregroup_mask(cpu)
>>
>> defining it llc_mask to cpu_coregroup_mask means MC should be LLC.
>> This is not true for some architectures atleast on Power.
>>
>
> OK.
>
>> So shouldn't it be using
>> #define llc_mask(cpu) per_cpu(sd_llc, cpu)
>>
>> This should work for systems where LLC is sub-coregroup, coregroup
>> (or super
>> coregroup: Lets say some archs want LLC at PKG and cluster at
>> coregroup).
>>
>> if we do that, I dont think we even need the else case where we say
>> #define llc_mask(cpu) cpumask_of(cpu)
>>
>
> I suppose you are referring to
> sched_domain_span(per_cpu(sd_llc, cpu)).
>
> Indeed, deriving the LLC from the SD_SHARE_LLC level offers
> better scalability. However, this approach would involve scheduler
> domains, which can be truncated by cpuset partitions - a scenario we
> prefer to avoid.
>
> thanks,
> Chenyu
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-26 4:58 ` Srikar Dronamraju
@ 2026-05-26 5:53 ` K Prateek Nayak
2026-05-26 14:08 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask Chen Yu
0 siblings, 1 reply; 31+ messages in thread
From: K Prateek Nayak @ 2026-05-26 5:53 UTC (permalink / raw)
To: Srikar Dronamraju, Chen, Yu C
Cc: Venkat Rao Bagalkote, Madhavan Srinivasan, Shrikanth Hegde,
Ritesh Harjani, Christophe Leroy (CS GROUP), LKML, linuxppc-dev,
linux-sched, tim.c.chen, Peter Zijlstra
Hello Srikar,
On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
> L2 Cache reported here is for SMT8 Core aka CACHE domain.
Apart from the scheduler, nothing in tree currently cares about
cpu_coregroup_mask() except for drivers/base/arch_topology.c, but
Power doesn't select GENERIC_ARCH_TOPOLOGY.
Why can't Power have an internal mask for the MC domain (tl_mc_mask) while
the scheduler uses cpu_coregroup_mask() for the actual LLC (the L2
mask in this case)?
Power adds its own topology via set_sched_topology() anyway, so the
default_topology from kernel/sched/topology.c remains unused.
...
> Shouldnt cache-aware scheduling be worried about cpuset partitions too.
> If a cpuset has subset of LLC cores, then we should Scheduler assume it can
> control complete LLC?
Well, regular scheduling takes care of partitions, while the cache-aware
scheduling bits take care of looking at the full-system perspective
for stats aggregation and for pointing to a particular LLC.
We don't compare llc_id across cpusets, so keeping one unique llc_id
per H/W LLC instance is feasible, and it lets us keep the llc_id space
limited, which helps optimize cache-aware scheduling.
Now, if threads of the same process are spread across partitions, we'll
still aggregate the utilization numbers across the full LLC, but
the load balancers in the individual partitions will each make their own
call based on that aggregate.
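To make the "one llc_id per H/W LLC instance" point concrete, here is a minimal userspace sketch, not the kernel implementation. It assumes a toy model in which each CPU's LLC mask is a 64-bit bitmap; llc_mask_of, llc_id_of, mask_first(), assign_llc_ids() and build_toy_topology() are hypothetical names used only for illustration. Ids are handed out in CPU order from the first CPU of each mask, so they stay contiguous and never depend on how cpusets partition the CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 16

/* Toy stand-ins (assumptions): bit n of llc_mask_of[cpu] means CPU n shares cpu's LLC. */
static uint64_t llc_mask_of[NR_CPUS];
static int llc_id_of[NR_CPUS];

/* Index of the lowest set bit, or -1 for an empty mask (GCC/Clang builtin). */
static int mask_first(uint64_t mask)
{
	return mask ? (int)__builtin_ctzll(mask) : -1;
}

/*
 * Walk CPUs in order. The first CPU of each LLC mask takes the next free id;
 * every other CPU copies the id of its mask's first CPU. The result is a
 * contiguous id space 0..n-1. Returns n, the number of ids handed out.
 */
static int assign_llc_ids(void)
{
	int next_id = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int first = mask_first(llc_mask_of[cpu]);

		/* The sketch assumes a non-empty mask always contains the CPU itself. */
		assert(first <= cpu);

		if (first < 0 || first == cpu)
			llc_id_of[cpu] = next_id++;
		else
			llc_id_of[cpu] = llc_id_of[first];
	}
	return next_id;
}

/* Build a toy topology in which every group of cpus_per_llc CPUs shares one LLC. */
static void build_toy_topology(int cpus_per_llc)
{
	assert(cpus_per_llc > 0 && cpus_per_llc <= NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int base = cpu - cpu % cpus_per_llc;

		llc_mask_of[cpu] = ((1ULL << cpus_per_llc) - 1) << base;
	}
}
```

In this model a CPU with an empty mask simply gets an id of its own, which is one possible fallback policy and not something the thread specifies.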
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-26 5:53 ` K Prateek Nayak
@ 2026-05-26 14:08 ` Chen Yu
2026-05-27 7:01 ` Shrikanth Hegde
0 siblings, 1 reply; 31+ messages in thread
From: Chen Yu @ 2026-05-26 14:08 UTC (permalink / raw)
To: kprateek.nayak
Cc: srikar, venkat88, maddy, sshegde, riteshh, chleroy, tim.c.chen,
peterz, linux-kernel, linuxppc-dev, linux-sched, Chen Yu
Hi Prateek,
On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> Hello Srikar,
>
> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
> > L2 Cache reported here is for SMT8 Core aka CACHE domain.
>
> Apart for the scheduler, nothing in tree currently cares about
> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>
> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
> mask in this case.)
>
> Power anyways adds its own topology via set_sched_topology() so the
> default_topology from kernel/sched/topology.c remains unused.
>
> ...
>
> > Shouldnt cache-aware scheduling be worried about cpuset partitions too.
> > If a cpuset has subset of LLC cores, then we should Scheduler assume it can
> > control complete LLC?
>
> Well, the scheduling takes care of partitions and the cache aware
> scheduling bits take care of looking at the full system perspective
> for stats aggregation and pointing to a particular LLc.
>
> We don't compare llc_id across cpusets so we keeping one unique llc_id
> per H/W LLC instance is feasible and it enables us to keep llc_id space
> limited for optimizing cache-aware scheduling.
>
> Now if we have threads of same process across partitions, we'll
> still aggregate the utilization numbers across the full LLC but
> the load balancers at individual partitions will make a call on
> the aggregation.
>
> --
> Thanks and Regards,
> Prateek
>
>
I suppose what you suggested looks like below:
powerpc/smp: make cpu_coregroup_mask() return the LLC
On pSeries shared LPARs (or when coregroup_enabled is false, as on
Power9 and earlier), the hemisphere map is not allocated, so
build_sched_domains() dereferences a NULL cpumask and crashes.
The generic scheduler expects cpu_coregroup_mask() to span the LLC.
On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
the hemisphere map. Use a coregroup_map() helper for the in-file
hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
sched-domain level.
Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1040,11 +1040,22 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
}
#endif
+static inline struct cpumask *coregroup_map(int cpu)
+{
+ return per_cpu(cpu_coregroup_map, cpu);
+}
+
struct cpumask *cpu_coregroup_mask(int cpu)
{
- return per_cpu(cpu_coregroup_map, cpu);
+ return cpu_l2_cache_mask(cpu);
+}
+
+static const struct cpumask *
+powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+ return coregroup_map(cpu);
}
static bool has_coregroup_support(void)
{
if (is_shared_processor())
@@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
if (has_coregroup_support())
- cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+ cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
init_big_cores();
if (has_big_cores) {
@@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_core_mask);
if (has_coregroup_support()) {
- for_each_cpu(i, cpu_coregroup_mask(cpu))
- set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
+ for_each_cpu(i, coregroup_map(cpu))
+ set_cpus_unrelated(cpu, i, coregroup_map);
}
}
#endif
@@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
if (!*mask) {
/* Assume only siblings are part of this CPU's coregroup */
for_each_cpu(i, submask_fn(cpu))
- set_cpus_related(cpu, i, cpu_coregroup_mask);
+ set_cpus_related(cpu, i, coregroup_map);
return;
}
@@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
/* Update coregroup mask with all the CPUs that are part of submask */
- or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
+ or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
/* Skip all CPUs already part of coregroup mask */
- cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
+ cpumask_andnot(*mask, *mask, coregroup_map(cpu));
for_each_cpu(i, *mask) {
/* Skip all CPUs not part of this coregroup */
if (coregroup_id == cpu_to_coregroup_id(i)) {
- or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
+ or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
cpumask_andnot(*mask, *mask, submask_fn(i));
} else {
- cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
+ cpumask_andnot(*mask, *mask, coregroup_map(i));
}
}
}
@@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
if (has_coregroup_support()) {
powerpc_topology[i++] =
- SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
+ SDTL_INIT(powerpc_tl_mc_mask, powerpc_shared_proc_flags, MC);
}
powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
--
2.43.0
Thanks,
Yu
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-26 14:08 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask Chen Yu
@ 2026-05-27 7:01 ` Shrikanth Hegde
2026-05-27 16:05 ` Chen, Yu C
2026-05-27 16:30 ` K Prateek Nayak
0 siblings, 2 replies; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-27 7:01 UTC (permalink / raw)
To: Chen Yu, kprateek.nayak
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched
Hi Chen, Prateek.
I got back to work today, sorry for the delay.
I am trying to go through the mails.
Apologies in case I have missed any bits.
On 5/26/26 7:38 PM, Chen Yu wrote:
> Hi Prateek,
>
> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> Hello Srikar,
>>
>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>
>> Apart for the scheduler, nothing in tree currently cares about
>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>
>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>> mask in this case.)
This seems wrong. There is no notion that coregroup_mask
(the MC domain) has to point at the LLC domain.
For example, on a shared LPAR there is no MC domain and the LLC is at the SMT core level.
In that case, making coregroup_mask point at the SMT mask is wrong.
If we need a mask that points to the LLC, which the arch has to return, then we would
need a new API, say cpu_llc_mask, that can point accordingly.
I don't like mixing the MC domain and the LLC into one bit.
>>
>> Power anyways adds its own topology via set_sched_topology() so the
>> default_topology from kernel/sched/topology.c remains unused.
>>
>> ...
>>
>>> Shouldnt cache-aware scheduling be worried about cpuset partitions too.
>>> If a cpuset has subset of LLC cores, then we should Scheduler assume it can
>>> control complete LLC?
>>
>> Well, the scheduling takes care of partitions and the cache aware
>> scheduling bits take care of looking at the full system perspective
>> for stats aggregation and pointing to a particular LLc.
>>
>> We don't compare llc_id across cpusets so we keeping one unique llc_id
>> per H/W LLC instance is feasible and it enables us to keep llc_id space
>> limited for optimizing cache-aware scheduling.
>>
>> Now if we have threads of same process across partitions, we'll
>> still aggregate the utilization numbers across the full LLC but
>> the load balancers at individual partitions will make a call on
>> the aggregation.
>>
>> --
>> Thanks and Regards,
>> Prateek
>>
>>
>
> I suppose what you suggested looks like below:
>
> powerpc/smp: make cpu_coregroup_mask() return the LLC
>
> On pSeries shared LPARs(or coregroup_enabled is false on
> Power9 and earlier) the hemisphere map is not allocated, so
> build_sched_domains() dereferences a NULL cpumask and crashes.
>
> The generic scheduler expects cpu_coregroup_mask() to span the LLC.
> On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
> the hemisphere map. Use a coregroup_map() helper for the in-file
> hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
> sched-domain level.
>
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
> 1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1040,11 +1040,22 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
> }
> #endif
>
> +static inline struct cpumask *coregroup_map(int cpu)
> +{
> + return per_cpu(cpu_coregroup_map, cpu);
> +}
> +
> struct cpumask *cpu_coregroup_mask(int cpu)
> {
> - return per_cpu(cpu_coregroup_map, cpu);
> + return cpu_l2_cache_mask(cpu);
> +}
This looks wrong to me too. In different hardware topologies
there may be a distinction between the coregroup and L2 masks.
Let me go through the code and see if there is a better way.
> +
> +static const struct cpumask *
> +powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> + return coregroup_map(cpu);
> }
>
> static bool has_coregroup_support(void)
> {
> if (is_shared_processor())
> @@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
>
> if (has_coregroup_support())
> - cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> + cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
>
> init_big_cores();
> if (has_big_cores) {
> @@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
> set_cpus_unrelated(cpu, i, cpu_core_mask);
>
> if (has_coregroup_support()) {
> - for_each_cpu(i, cpu_coregroup_mask(cpu))
> - set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
> + for_each_cpu(i, coregroup_map(cpu))
> + set_cpus_unrelated(cpu, i, coregroup_map);
> }
> }
> #endif
> @@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
> if (!*mask) {
> /* Assume only siblings are part of this CPU's coregroup */
> for_each_cpu(i, submask_fn(cpu))
> - set_cpus_related(cpu, i, cpu_coregroup_mask);
> + set_cpus_related(cpu, i, coregroup_map);
>
> return;
> }
> @@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
> cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
>
> /* Update coregroup mask with all the CPUs that are part of submask */
> - or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
> + or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
>
> /* Skip all CPUs already part of coregroup mask */
> - cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
> + cpumask_andnot(*mask, *mask, coregroup_map(cpu));
>
> for_each_cpu(i, *mask) {
> /* Skip all CPUs not part of this coregroup */
> if (coregroup_id == cpu_to_coregroup_id(i)) {
> - or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
> + or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
> cpumask_andnot(*mask, *mask, submask_fn(i));
> } else {
> - cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
> + cpumask_andnot(*mask, *mask, coregroup_map(i));
> }
> }
> }
> @@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
>
> if (has_coregroup_support()) {
> powerpc_topology[i++] =
> - SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
I would prefer not to do this rename. Keeping tl_mc_mask helps to find its usage across
the codebase.
> + SDTL_INIT(powerpc_tl_mc_mask, powerpc_shared_proc_flags, MC);
> }
>
> powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-26 5:24 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
@ 2026-05-27 7:05 ` Shrikanth Hegde
2026-05-28 16:01 ` Srikar Dronamraju
0 siblings, 1 reply; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-27 7:05 UTC (permalink / raw)
To: Venkat Rao Bagalkote
Cc: Madhavan Srinivasan, Ritesh Harjani, Christophe Leroy (CS GROUP),
LKML, linuxppc-dev, linux-sched, tim.c.chen, K Prateek Nayak,
Peter Zijlstra, Chen, Yu C, Srikar Dronamraju
On 5/26/26 10:54 AM, Venkat Rao Bagalkote wrote:
>
> On 26/05/26 9:38 am, Chen, Yu C wrote:
>> Hi Venkat,
>>
>> On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
>>> * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
>>>
>>>> Hi Venkat,
>>>>
>>>> On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
>>>>> Greetings!!!
>>>>>
>>>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
>
> This issue is seen on P11 as well.
>
> [ 0.006697] smp: Brought up 1 node, 16 CPUs
> [ 0.006702] Big cores detected but using small core scheduling
> [ 0.006752] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 0.006755] Faulting instruction address: 0xc000000020adbb6c
> [ 0.006759] Oops: Kernel access of bad area, sig: 7 [#1]
> [ 0.006762] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
> [ 0.006767] Modules linked in:
> [ 0.006772] CPU: 4 UID: 0 PID: 1 Comm: swapper/4 Not tainted 7.1.0-
> rc5-next-20260525 #1 PREEMPT(lazy)
> [ 0.006777] Hardware name: IBM,9080-HEX Power11 (architected)
> 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
> [ 0.006781] NIP: c000000020adbb6c LR: c0000000202e5a58 CTR:
> 0000000000000000
> [ 0.006784] REGS: c0000000283d7890 TRAP: 0300 Not tainted (7.1.0-
> rc5-next-20260525)
> [ 0.006788] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR:
> 44002242 XER: 20040003
> [ 0.006796] CFAR: c0000000202e5a54 DAR: 0000000000000000 DSISR:
> 00080000 IRQMASK: 0
> [ 0.006796] GPR00: 0000000000000000 c0000000283d7b50 c000000021abf100
> 0000000000000010
> [ 0.006796] GPR04: 0000000000000010 0000000000000030 0000000000000000
> c000000028365500
> [ 0.006796] GPR08: 0000000000000000 c000000022213598 000000003b77d000
> 0000000000000000
> [ 0.006796] GPR12: c00000002005d8f0 c000000000008000 c0000000283cb578
> c0000000283cb400
> [ 0.006796] GPR16: c0000000283c9000 c000000022218b20 c0000000222330e8
> 00000000ffffffff
> [ 0.006796] GPR20: fffffffffffffff6 0000000000000000 c000000022da36e0
> 0000000000000000
> [ 0.006796] GPR24: 0000000000000000 0000000000000000 c0000000283c9178
> c0000000227b5f00
> [ 0.006796] GPR28: c00000002831c1e8 c000000022db5980 0000000000000000
> 0000000000000000
> [ 0.006835] NIP [c000000020adbb6c] _find_first_bit+0xc/0xc0
> [ 0.006842] LR [c0000000202e5a58] build_sched_domains+0x7d8/0xb40
> [ 0.006847] Call Trace:
> [ 0.006849] [c0000000283d7b50] [c0000000202e5408]
> build_sched_domains+0x188/0xb40 (unreliable)
> [ 0.006854] [c0000000283d7c90] [c000000022034380]
> sched_init_domains+0x118/0x168
> [ 0.006860] [c0000000283d7ce0] [c000000022032b14]
> sched_init_smp+0xa8/0x158
> [ 0.006865] [c0000000283d7d30] [c000000022005674]
> kernel_init_freeable+0x1ac/0x294
> [ 0.006870] [c0000000283d7dd0] [c000000020011718] kernel_init+0x2c/0x1c4
> [ 0.006874] [c0000000283d7e30] [c00000002000debc]
> ret_from_kernel_user_thread+0x14/0x1c
> [ 0.006878] ---- interrupt: 0 at 0x0
> [ 0.006881] Code: eb610038 7fc3f378 eb810040 eba10048 38210060
> ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c681b78 7c832379 4d820020
> <e9280000> 38e3ffff 39400000 78e7d7e2
> [ 0.006895] ---[ end trace 0000000000000000 ]---
> [ 0.006898]
>
>
> Regards,
>
> Venkat.
>
Venkat,
Was this seen on a P11 shared LPAR?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-27 7:01 ` Shrikanth Hegde
@ 2026-05-27 16:05 ` Chen, Yu C
2026-05-27 18:07 ` Shrikanth Hegde
2026-05-27 16:30 ` K Prateek Nayak
1 sibling, 1 reply; 31+ messages in thread
From: Chen, Yu C @ 2026-05-27 16:05 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched, kprateek.nayak
Hi Shrikanth,
On 5/27/2026 3:01 PM, Shrikanth Hegde wrote:
> Hi Chen, Prateek.
>
> I got back to work today, sorry for delay.
> I am trying to go through the mails.
> Apologies in case i have missed any bits.
>
Thanks for taking a look at this!
> On 5/26/26 7:38 PM, Chen Yu wrote:
>> Hi Prateek,
>>
>> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak
>> <kprateek.nayak@amd.com> wrote:
>>> Hello Srikar,
>>>
>>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>>
>>> Apart for the scheduler, nothing in tree currently cares about
>>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>>
>>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>>> mask in this case.)
>
> This seems wrong. there is no notion that coregroup_mask
> (MC domain) has to point at LLC domain.
>
> For example, on Shared LPAR, there is no MC domain and LLC is at SMT
> core level.
> In that case coregroup_mask has point at SMT mask is wrong.
>
On a shared LPAR, highest_flag_domain(SD_SHARE_LLC) selected the
SMT domain (L2 shared) prior to commit b5ea300a17e3.
Prateek suggested changing cpu_coregroup_mask() to use
cpu_l2_cache_mask(), which makes the LLC mask cover the same range.
sd_llc, its size, and its grouping remain unchanged; only sd_llc_id becomes
contiguous, which aligns with the intent of this commit.
But yes, the naming is confusing: cpu_coregroup_mask suggests a
"group of cores", but after the change it only covers the threads
within a single SMT core.
> If we need a mask to point to the LLC mask which arch has to return,
> then we would
> need a new api say cpu_llc_mask ? that can point accordingly.
>
Do you mean something like this?
https://lore.kernel.org/lkml/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
> I don't like mixing MC domain and LLC into one bit.
>
[ ... ]
>> struct cpumask *cpu_coregroup_mask(int cpu)
>> {
>> - return per_cpu(cpu_coregroup_map, cpu);
>> + return cpu_l2_cache_mask(cpu);
>> +}
>
> This looks wrong to me too. In different hardware topologies
> there maybe distinction between coregroup and l2 mask.
>
> Let me go through the code and see if there is better way.
>
Sure, please go ahead - I'm on board with the direction
you settle on.
thanks,
Chenyu
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-27 7:01 ` Shrikanth Hegde
2026-05-27 16:05 ` Chen, Yu C
@ 2026-05-27 16:30 ` K Prateek Nayak
1 sibling, 0 replies; 31+ messages in thread
From: K Prateek Nayak @ 2026-05-27 16:30 UTC (permalink / raw)
To: Shrikanth Hegde, Chen Yu
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched
Hello Shrikanth,
On 5/27/2026 12:31 PM, Shrikanth Hegde wrote:
> Hi Chen, Prateek.
>
> I got back to work today, sorry for delay.
> I am trying to go through the mails.
> Apologies in case i have missed any bits.
>
> On 5/26/26 7:38 PM, Chen Yu wrote:
>> Hi Prateek,
>>
>> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>>> Hello Srikar,
>>>
>>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>>
>>> Apart for the scheduler, nothing in tree currently cares about
>>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>>
>>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>>> mask in this case.)
>
> This seems wrong. there is no notion that coregroup_mask
> (MC domain) has to point at LLC domain.
It seems that only PowerPC is special in that respect. Only 3 architectures
override the default topology via set_sched_topology(); of those, x86 and
s390 both still have SD_SHARE_LLC set for their MC domain, as is the
case for default_topology[] in topology.c with cpu_core_flags().
> For example, on Shared LPAR, there is no MC domain and LLC is at SMT core level.
> In that case coregroup_mask has point at SMT mask is wrong.
That is equivalent to MC degenerating onto the core domain, right?
cpu_coregroup_mask() pointing to a core shouldn't be problematic
in that case.
> If we need a mask to point to the LLC mask which arch has to return, then we would
> need a new api say cpu_llc_mask ? that can point accordingly.
>
> I don't like mixing MC domain and LLC into one bit.
The SCHED_CACHE bits assume cpu_coregroup_mask() points to the sd_llc
domain and use the sched_cpu_activate() path to assign the llc_id
independently of partitions and sched domain bits.
That assumption holds true for everything except powerpc. Is there
anything aside from the scheduler bits that uses
cpu_coregroup_mask()? We can always keep a big fat comment on top that
says it points to the sd_llc domain and may not be the MC domain
on Power.
>> I suppose what you suggested looks like below:
Hello Chenyu! Yes, this was pretty much what I had in mind! Thank you
for the patch.
>>
>> powerpc/smp: make cpu_coregroup_mask() return the LLC
>>
>> On pSeries shared LPARs(or coregroup_enabled is false on
>> Power9 and earlier) the hemisphere map is not allocated, so
>> build_sched_domains() dereferences a NULL cpumask and crashes.
>>
>> The generic scheduler expects cpu_coregroup_mask() to span the LLC.
>> On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
>> the hemisphere map. Use a coregroup_map() helper for the in-file
>> hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
>> sched-domain level.
>>
>> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>> ---
>> arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
>> 1 file changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1040,11 +1040,22 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
>> }
>> #endif
>> +static inline struct cpumask *coregroup_map(int cpu)
>> +{
>> + return per_cpu(cpu_coregroup_map, cpu);
>> +}
>> +
>> struct cpumask *cpu_coregroup_mask(int cpu)
>> {
>> - return per_cpu(cpu_coregroup_map, cpu);
>> + return cpu_l2_cache_mask(cpu);
>> +}
>
> This looks wrong to me too. In different hardware topologies
> there maybe distinction between coregroup and l2 mask.
>
> Let me go through the code and see if there is better way.
The other option was to add an arch_llc_mask macro that can be
optionally defined on the arch/ side if cpu_coregroup_mask()
doesn't point to the LLC:
https://lore.kernel.org/lkml/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-27 16:05 ` Chen, Yu C
@ 2026-05-27 18:07 ` Shrikanth Hegde
2026-05-28 4:58 ` Shrikanth Hegde
2026-05-28 15:58 ` Srikar Dronamraju
0 siblings, 2 replies; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-27 18:07 UTC (permalink / raw)
To: Chen, Yu C, kprateek.nayak
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched
Hi Chen, Prateek.
On 5/27/26 9:35 PM, Chen, Yu C wrote:
> Hi Shrikanth,
>
> On 5/27/2026 3:01 PM, Shrikanth Hegde wrote:
>> Hi Chen, Prateek.
>>
>> I got back to work today, sorry for delay.
>> I am trying to go through the mails.
>> Apologies in case i have missed any bits.
>>
>
> Thanks for taking a look at this!
>
>> On 5/26/26 7:38 PM, Chen Yu wrote:
>>> Hi Prateek,
>>>
>>> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak
>>> <kprateek.nayak@amd.com> wrote:
>>>> Hello Srikar,
>>>>
>>>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>>>
>>>> Apart for the scheduler, nothing in tree currently cares about
>>>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>>>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>>>
>>>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>>>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>>>> mask in this case.)
>>
>> This seems wrong. there is no notion that coregroup_mask
>> (MC domain) has to point at LLC domain.
>>
>> For example, on Shared LPAR, there is no MC domain and LLC is at SMT
>> core level.
>> In that case coregroup_mask has point at SMT mask is wrong.
>>
>
> On Shared LPAR, highest_flag_domain(SD_SHARE_LLC) selected the
> SMT domain(L2 shared)prior to commit b5ea300a17e3.
> Prateek suggested changing cpu_coregroup_mask() to use
> cpu_l2_cache_mask(), which makes the LLC mask cover the same range.
> sd_llc, size and grouping remain unchanged. Only sd_llc_id becomes
> contiguous, which aligns with the intent of this commit.
>
> But yes, the naming is confusing. cpu_coregroup_mask suggests a
> "group of cores", but after the change, it only covers threads
> within a single SMT core.
>
Yes. Though it might achieve the same effect, keeping it explicit may
help in maintaining it better.
On PowerPC, there are subtleties: on Shared LPAR the MC domain
per se doesn't exist, and Power9 and earlier have different
topologies. Maybe one could figure it all out and simplify them into
a few masks. But yes, it is slightly different as of today.
>> If we need a mask to point to the LLC mask which arch has to return,
>> then we would
>> need a new api say cpu_llc_mask ? that can point accordingly.
>>
>
> Do you mean something like this?
> https://lore.kernel.org/lkml/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
Yes. This is what I prefer, but without making it always point to the L2
mask. See the diff below, which makes it boot on a Shared Processor LPAR
where I see the panic without it.
Proper comments still need to be added where appropriate.
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 66ed5fe1b718..bd1db3b1dbb0 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
#ifdef CONFIG_SMP
#include <asm/cputable.h>
+const struct cpumask *arch_llc_mask(int cpu);
+#define arch_llc_mask arch_llc_mask
+
struct cpumask *cpu_coregroup_mask(int cpu);
const struct cpumask *cpu_die_mask(int cpu);
int cpu_die_id(int cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..26c15c786c55 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1101,6 +1101,13 @@ const struct cpumask *cpu_die_mask(int cpu)
}
EXPORT_SYMBOL_GPL(cpu_die_mask);
+const struct cpumask *arch_llc_mask(int cpu)
+{
+ if (has_coregroup_support())
+ return cpu_coregroup_mask(cpu);
+ return cpu_smallcore_mask(cpu);
+}
+
int cpu_die_id(int cpu)
{
if (has_coregroup_support())
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index df2ceb54c970..3b5155121276 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2063,7 +2063,11 @@ const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu
return cpu_coregroup_mask(cpu);
}
+#ifndef arch_llc_mask
#define llc_mask(cpu) cpu_coregroup_mask(cpu)
+#else
+#define llc_mask(cpu) arch_llc_mask(cpu)
+#endif
#else
#define llc_mask(cpu) cpumask_of(cpu)
(One more subtlety; crash would be seen only with NR_CPUS=8192 as
CPUMASK_OFFSTACK=y, but that's a different concern altogether.)
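The CPUMASK_OFFSTACK remark can be illustrated with a small user-space sketch. It is only a model under stated assumptions: the names cpumask_model, offstack_var_t, onstack_var_t, percpu_model and first_cpu() are hypothetical and are not kernel symbols. It mirrors the idea that an off-stack cpumask variable is a bare pointer that stays NULL until someone allocates it, while an on-stack one is the storage itself and so always has a valid address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 64	/* hypothetical, kept small for illustration */

struct cpumask_model {
	uint64_t bits[NR_CPUS_MODEL / 64];
};

/* Off-stack flavour: the variable is only a pointer, NULL until allocated. */
typedef struct cpumask_model *offstack_var_t;
/* On-stack flavour: the variable is the storage itself. */
typedef struct cpumask_model onstack_var_t[1];

/* Hypothetical per-CPU slot holding one mask in each flavour. */
struct percpu_model {
	offstack_var_t coregroup_off;
	onstack_var_t coregroup_on;
};

static const struct cpumask_model *offstack_mask(const struct percpu_model *p)
{
	return p->coregroup_off;	/* NULL if the allocation was skipped */
}

static const struct cpumask_model *onstack_mask(const struct percpu_model *p)
{
	return p->coregroup_on;		/* array decays to a valid pointer */
}

/*
 * First set bit, or NR_CPUS_MODEL if the mask is empty. Unlike this
 * model, a plain bit search does not check for NULL; the -1 return
 * marks the case that would fault.
 */
static int first_cpu(const struct cpumask_model *m)
{
	if (!m)
		return -1;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (m->bits[cpu / 64] & (1ULL << (cpu % 64)))
			return cpu;
	return NR_CPUS_MODEL;
}
```

With a zero-initialised slot, the off-stack pointer is NULL and a bit search on it is the faulting case, while the on-stack mask is merely empty.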
>
>> I don't like mixing MC domain and LLC into one bit.
>>
>
> [ ... ]
>
>>> struct cpumask *cpu_coregroup_mask(int cpu)
>>> {
>>> - return per_cpu(cpu_coregroup_map, cpu);
>>> + return cpu_l2_cache_mask(cpu);
>>> +}
>>
>> This looks wrong to me too. In different hardware topologies
>> there maybe distinction between coregroup and l2 mask.
>>
>> Let me go through the code and see if there is better way.
>>
>
> Sure, please go ahead - I'm on board with the direction
> you settle on.
>
> thanks,
> Chenyu
>
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-27 18:07 ` Shrikanth Hegde
@ 2026-05-28 4:58 ` Shrikanth Hegde
2026-05-28 9:12 ` Chen, Yu C
2026-05-28 15:58 ` Srikar Dronamraju
1 sibling, 1 reply; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-28 4:58 UTC (permalink / raw)
To: Chen, Yu C, kprateek.nayak
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched
On 5/27/26 11:37 PM, Shrikanth Hegde wrote:
> Hi Chen, Prateek.
>
> On 5/27/26 9:35 PM, Chen, Yu C wrote:
>> Hi Shrikanth,
>>
>> On 5/27/2026 3:01 PM, Shrikanth Hegde wrote:
>>> Hi Chen, Prateek.
>>>
>>> I got back to work today, sorry for delay.
>>> I am trying to go through the mails.
>>> Apologies in case i have missed any bits.
>>>
>>
>> Thanks for taking a look at this!
>>
>>> On 5/26/26 7:38 PM, Chen Yu wrote:
>>>> Hi Prateek,
>>>>
>>>> On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak
>>>> <kprateek.nayak@amd.com> wrote:
>>>>> Hello Srikar,
>>>>>
>>>>> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
>>>>>> L2 Cache reported here is for SMT8 Core aka CACHE domain.
>>>>>
>>>>> Apart for the scheduler, nothing in tree currently cares about
>>>>> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
>>>>> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>>>>>
>>>>> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
>>>>> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
>>>>> mask in this case.)
>>>
>>> This seems wrong. there is no notion that coregroup_mask
>>> (MC domain) has to point at LLC domain.
>>>
>>> For example, on Shared LPAR, there is no MC domain and LLC is at SMT
>>> core level.
>>> In that case coregroup_mask has point at SMT mask is wrong.
>>>
>>
>> On Shared LPAR, highest_flag_domain(SD_SHARE_LLC) selected the
>> SMT domain(L2 shared)prior to commit b5ea300a17e3.
>> Prateek suggested changing cpu_coregroup_mask() to use
>> cpu_l2_cache_mask(), which makes the LLC mask cover the same range.
>> sd_llc, size and grouping remain unchanged. Only sd_llc_id becomes
>> contiguous, which aligns with the intent of this commit.
>>
>> But yes, the naming is confusing. cpu_coregroup_mask suggests a
>> "group of cores", but after the change, it only covers threads
>> within a single SMT core.
>>
>
> Yes. Though it might achieve the same effect, keeping it explicit may
> help in maintaining it better.
>
> On PowerPC, there are these subtleties of Shared LPAR where MC domain
> per se doesn't exit etc, and on power9 and earlier has different
> topologies. Maybe one could figure it all out and simplifies them into
> a few masks. But yes, it is slightly different as of today.
>
>>> If we need a mask to point to the LLC mask which arch has to return,
>>> then we would
>>> need a new api say cpu_llc_mask ? that can point accordingly.
>>>
>>
>> Do you mean something like this?
>> https://lore.kernel.org/lkml/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
>
> Yes. This is something i prefer, but not to make it point to l2 mask
> always. See below diff which now make it boot on Shared Processor LPAR
> where i see panic without it.
>
> Need to add proper comments where appropriate.
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 66ed5fe1b718..bd1db3b1dbb0 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
> #ifdef CONFIG_SMP
> #include <asm/cputable.h>
>
> +const struct cpumask *arch_llc_mask(int cpu);
> +#define arch_llc_mask arch_llc_mask
> +
> struct cpumask *cpu_coregroup_mask(int cpu);
> const struct cpumask *cpu_die_mask(int cpu);
> int cpu_die_id(int cpu);
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3467f86fd78f..26c15c786c55 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1101,6 +1101,13 @@ const struct cpumask *cpu_die_mask(int cpu)
> }
> EXPORT_SYMBOL_GPL(cpu_die_mask);
>
> +const struct cpumask *arch_llc_mask(int cpu)
> +{
> + if (has_coregroup_support())
> + return cpu_coregroup_mask(cpu);
> + return cpu_smallcore_mask(cpu);
This function body needs to change, since the LLC is not at MC,
and I didn't account for Power9.
The rest of the structure is the direction I would prefer to go.
It will also help future architectures account for their specific
needs.
What do you think?
> +}
> +
> int cpu_die_id(int cpu)
> {
> if (has_coregroup_support())
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index df2ceb54c970..3b5155121276 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2063,7 +2063,11 @@ const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu
> return cpu_coregroup_mask(cpu);
> }
>
> +#ifndef arch_llc_mask
> #define llc_mask(cpu) cpu_coregroup_mask(cpu)
> +#else
> +#define llc_mask(cpu) arch_llc_mask(cpu)
> +#endif
>
> #else
> #define llc_mask(cpu) cpumask_of(cpu)
>
> (One more subtlety; crash would be seen only with NR_CPUS=8192 as
> CPUMASK_OFFSTACK=y, but that's a different concern altogether.)
>
>
>>
>>> I don't like mixing MC domain and LLC into one bit.
>>>
>>
>> [ ... ]
>>
>>>> struct cpumask *cpu_coregroup_mask(int cpu)
>>>> {
>>>> - return per_cpu(cpu_coregroup_map, cpu);
>>>> + return cpu_l2_cache_mask(cpu);
>>>> +}
>>>
>>> This looks wrong to me too. In different hardware topologies
>>> there maybe distinction between coregroup and l2 mask.
>>>
>>> Let me go through the code and see if there is better way.
>>>
>>
>> Sure, please go ahead - I'm on board with the direction
>> you settle on.
>>
>> thanks,
>> Chenyu
>>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
@ 2026-05-28 6:54 ` Ritesh Harjani
2026-05-28 16:06 ` Srikar Dronamraju
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-29 3:58 ` Shrikanth Hegde
3 siblings, 1 reply; 31+ messages in thread
From: Ritesh Harjani @ 2026-05-28 6:54 UTC (permalink / raw)
To: Venkat Rao Bagalkote, Peter Zijlstra, K Prateek Nayak, Chen, Yu C,
tim.c.chen
Cc: Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched
Venkat Rao Bagalkote <venkat88@linux.ibm.com> writes:
> Greetings!!!
>
> I am seeing an early boot kernel panic due to NULL pointer dereference
> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
>
> [ 0.039029] NIP [c000000000e58504] _find_first_bit+0x44/0x130
> [ 0.039043] LR [c000000000e58500] _find_first_bit+0x40/0x130
> [ 0.039054] Call Trace:
> [ 0.039060] [c0000000090e7b80] [c00000000416af20]
> schedutil_gov+0x0/0xa0 (unreliable)
> [ 0.039076] [c0000000090e7bc0] [c00000000038b3b8]
> build_sched_domains+0xad8/0xe50
> [ 0.039089] [c0000000090e7ce0] [c000000003045d78]
> sched_init_smp+0xa8/0x164
> [ 0.039102] [c0000000090e7d30] [c00000000300f374]
> kernel_init_freeable+0x250/0x370
> [ 0.039117] [c0000000090e7de0] [c000000000011f90] kernel_init+0x34/0x1e4
> [ 0.039129] [c0000000090e7e50] [c00000000000debc]
> ret_from_kernel_user_thread+0x14/0x1c
> [ 0.039142] ---- interrupt: 0 at 0x0
> [ 0.039150] Code: 41820090 7c0802a6 393cffff fbe10038 7c7f1b78
> fba10028 fbc10030 3bc00000 793dd7e2 f8010050 4bae6e9d 60000000
> <e93f0000> 2c290000 408200bc 283c0040
> [ 0.039196] ---[ end trace 0000000000000000 ]---
>
>
Well, I am hitting this on 7.1.0-rc5-next-20260526-00010-gbfac43765a97
with QEMU pSeries TCG power10/11.
[ 0.342868][ T1] smp: Bringing up secondary CPUs ...
[ 0.342868][ T1] smp: Bringing up secondary CPUs ...
[ 0.525419][ T1] smp: Brought up 1 node, 4 CPUs
[ 0.525419][ T1] smp: Brought up 1 node, 4 CPUs
[ 0.527992][ T1] numa: Node 0 CPUs: 0-3
[ 0.527992][ T1] numa: Node 0 CPUs: 0-3
[ 0.552787][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.552787][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 0.557446][ T1] Faulting instruction address: 0xc000000000fe3f1c
[ 0.557446][ T1] Faulting instruction address: 0xc000000000fe3f1c
cpu 0x0: Vector: 300 (Data Access) at [c000000006607800]
pc: c000000000fe3f1c: _find_first_bit+0xc/0xc0
lr: c00000000027b7d8: build_sched_domains+0xbb4/0x1938
sp: c000000006607ac0
msr: 8000000002009033
dar: 0
dsisr: 80000
current = 0xc000000006f9fb00
paca = 0xc000000005670000 irqmask: 0x03 irq_happened: 0x09
pid = 1, comm = swapper/0
Linux version 7.1.0-rc5-next-20260526-00010-gbfac43765a97-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #19 SMP PREEMPT Thu May 28 12:29:51 IST 2026
enter ? for help
[link register ] c00000000027b7d8 build_sched_domains+0xbb4/0x1938
[c000000006607ac0] c00000000027b0a8 build_sched_domains+0x484/0x1938 (unreliable)
[c000000006607c20] c000000004053180 sched_init_domains+0x114/0x1cc
[c000000006607c70] c0000000040515e0 sched_init_smp+0x5c/0x17c
[c000000006607cc0] c000000004012888 kernel_init_freeable+0x258/0x790
[c000000006607dc0] c000000000011f3c kernel_init+0x34/0x268
[c000000006607e30] c00000000000debc ret_from_kernel_user_thread+0x14/0x1c
---- Exception: 0 at 0000000000000000
-ritesh
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-28 4:58 ` Shrikanth Hegde
@ 2026-05-28 9:12 ` Chen, Yu C
2026-05-28 10:26 ` Shrikanth Hegde
2026-05-28 15:54 ` Srikar Dronamraju
0 siblings, 2 replies; 31+ messages in thread
From: Chen, Yu C @ 2026-05-28 9:12 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched, kprateek.nayak
On 5/28/2026 12:58 PM, Shrikanth Hegde wrote:
>
>
> On 5/27/26 11:37 PM, Shrikanth Hegde wrote:
[ ... ]
>> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
>> index 66ed5fe1b718..bd1db3b1dbb0 100644
>> --- a/arch/powerpc/include/asm/topology.h
>> +++ b/arch/powerpc/include/asm/topology.h
>> @@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
>> #ifdef CONFIG_SMP
>> #include <asm/cputable.h>
>>
>> +const struct cpumask *arch_llc_mask(int cpu);
>> +#define arch_llc_mask arch_llc_mask
>> +
>> struct cpumask *cpu_coregroup_mask(int cpu);
>> const struct cpumask *cpu_die_mask(int cpu);
>> int cpu_die_id(int cpu);
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 3467f86fd78f..26c15c786c55 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1101,6 +1101,13 @@ const struct cpumask *cpu_die_mask(int cpu)
>> }
>> EXPORT_SYMBOL_GPL(cpu_die_mask);
>>
>> +const struct cpumask *arch_llc_mask(int cpu)
>> +{
>> + if (has_coregroup_support())
>> + return cpu_coregroup_mask(cpu);
>> + return cpu_smallcore_mask(cpu);
>
>
> This function body needs change, since LLC is not at MC.
> and I didn't account for power9.
>
> Rest of the structure is what i would prefer the direction to go.
> This will help future architectures too to account for their specific
> needs.
>
> What do you think?
>
Yes, this direction looks good to me. Regarding arch_llc_mask(),
how about the following, per Srikar's description:
const struct cpumask *arch_llc_mask(int cpu)
{
	/* Power9, CACHE domain is the LLC */
	if (shared_caches)
		return cpu_l2_cache_mask(cpu);
	/* P7, P8, P10, P11, SMT domain is the LLC */
	return cpu_smt_mask(cpu);
}
thanks,
Chenyu
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-28 9:12 ` Chen, Yu C
@ 2026-05-28 10:26 ` Shrikanth Hegde
2026-05-28 15:54 ` Srikar Dronamraju
1 sibling, 0 replies; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-28 10:26 UTC (permalink / raw)
To: Chen, Yu C
Cc: srikar, venkat88, maddy, riteshh, chleroy, tim.c.chen, peterz,
linux-kernel, linuxppc-dev, linux-sched, kprateek.nayak
Hi Chenyu.
> Yes this direction look good to me. Regarding the arch_llc_mask(),
> how about the following per Srikar's description
>
> const struct cpumask *arch_llc_mask(int cpu)
> {
> 	/* Power9, CACHE domain is the LLC */
> 	if (shared_caches)
> 		return cpu_l2_cache_mask(cpu);
>
> 	/* P7, P8, P10, P11, SMT domain is the LLC */
> 	return cpu_smt_mask(cpu);
> }
>
> thanks,
> Chenyu
>
Yes, this is what I was thinking too. I have the patch; let me send it
shortly. It fixes the boot panic for me on SPLPAR. Hopefully
it works for others too.
I will send it as a reply to the main thread so that no one misses it.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
2026-05-28 6:54 ` Ritesh Harjani
@ 2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21 ` Chen, Yu C
` (2 more replies)
2026-05-29 3:58 ` Shrikanth Hegde
3 siblings, 3 replies; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-28 11:27 UTC (permalink / raw)
To: Venkat Rao Bagalkote, K Prateek Nayak, Chen, Yu C,
Srikar Dronamraju, Ritesh Harjani
Cc: Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, Peter Zijlstra, tim.c.chen
On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> I am seeing an early boot kernel panic due to NULL pointer dereference
> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
Hi Venkat, Ritesh,
Could you please try the diff below and see if it helps?
It fixes the boot problem on SPLPAR for me.
Hi Chenyu,
Let me know if I should send the patch. Or,
if you want to add more comments or change it, feel free to pick it up and send it.
Either way is fine. Let me know.
Hi Prateek, Srikar,
I hope the diff below makes sense. Please check.
nit: llc_mask is still under CONFIG_SCHED_MC; for ppc it is always set
for SMP systems, and for others it is the LLC domain. So not a concern, I guess.
---
From 10e9413cef063446d67dc02c2b44e1ea582e5d53 Mon Sep 17 00:00:00 2001
From: Shrikanth Hegde <sshegde@linux.ibm.com>
Date: Thu, 28 May 2026 06:16:44 -0400
Subject: [PATCH] topology: Provide arch_llc_mask for cache aware scheduling
Venkat reported a boot kernel panic on next-20260522. Git bisect pointed to
b5ea300a17e3 ("sched/cache: Make LLC id continuous")
The stack trace points to llc_mask being NULL.
NIP [c000000000e58504] _find_first_bit+0x44/0x130
LR [c000000000e58500] _find_first_bit+0x40/0x130
Call Trace:
build_sched_domains+0xad8/0xe50
sched_init_smp+0xa8/0x164
kernel_init_freeable+0x250/0x370
ret_from_kernel_user_thread+0x14/0x1c
On powerpc, cpu_coregroup_mask is available only when the underlying
hardware supports coregroup. On shared LPAR, QEMU guests, Power9 etc.,
coregroup isn't supported. In such cases llc_mask was being dereferenced
while it was NULL, leading to the panic.
On powerpc, the LLC is at SMT core level. So the assumption that the
coregroup (MC) domain points to the LLC is wrong. Provide a way for archs
to say where their LLC is if it is not at the MC domain.
Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
Suggested-by: Chen, Yu C <yu.c.chen@intel.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
arch/powerpc/include/asm/topology.h | 3 +++
arch/powerpc/kernel/smp.c | 10 ++++++++++
kernel/sched/topology.c | 9 +++++++++
3 files changed, 22 insertions(+)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 66ed5fe1b718..bd1db3b1dbb0 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
#ifdef CONFIG_SMP
#include <asm/cputable.h>
+const struct cpumask *arch_llc_mask(int cpu);
+#define arch_llc_mask arch_llc_mask
+
struct cpumask *cpu_coregroup_mask(int cpu);
const struct cpumask *cpu_die_mask(int cpu);
int cpu_die_id(int cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..cc8e87d6cae9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1101,6 +1101,16 @@ const struct cpumask *cpu_die_mask(int cpu)
}
EXPORT_SYMBOL_GPL(cpu_die_mask);
+const struct cpumask *arch_llc_mask(int cpu)
+{
+ /* Power9, CACHE domain is the LLC*/
+ if (shared_caches)
+ return cpu_l2_cache_mask(cpu);
+
+ /* For others, SMT domain is the LLC*/
+ return cpu_smt_mask(cpu);
+}
+
int cpu_die_id(int cpu)
{
if (has_coregroup_support())
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index df2ceb54c970..01af3d8f9eb9 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2063,7 +2063,16 @@ const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu
return cpu_coregroup_mask(cpu);
}
+/*
+ * Majority of architectures have LLC at MC domain level with exception
+ * such as powerpc. Provide a way for arch to specify where its LLC is
+ * if it falls in exception category
+ */
+# ifndef arch_llc_mask
#define llc_mask(cpu) cpu_coregroup_mask(cpu)
+# else
+#define llc_mask(cpu) arch_llc_mask(cpu)
+# endif
#else
#define llc_mask(cpu) cpumask_of(cpu)
--
2.47.3
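
The override idiom used by the patch (declare the function, then define a macro with the same name so generic code can test for it with #ifndef) can be sketched outside the kernel. This is a minimal illustration under stated assumptions: arch_llc_first_cpu(), coregroup_first_cpu() and llc_first_cpu() are hypothetical stand-ins that return a CPU number instead of a cpumask, and the group sizes 4 and 8 are made up.

```c
#include <assert.h>

/* --- what an architecture header would provide (hypothetical) --- */

/* Pretend the arch's LLC spans groups of 4 CPUs. */
static inline int arch_llc_first_cpu(int cpu)
{
	return cpu & ~3;
}
/* Self-referential define: lets generic code detect the override. */
#define arch_llc_first_cpu arch_llc_first_cpu

/* --- what generic code would do (hypothetical) --- */

/* Default assumption: the LLC matches the coregroup, here 8 CPUs. */
static inline int coregroup_first_cpu(int cpu)
{
	return cpu & ~7;
}

#ifndef arch_llc_first_cpu
#define llc_first_cpu(cpu) coregroup_first_cpu(cpu)
#else
#define llc_first_cpu(cpu) arch_llc_first_cpu(cpu)
#endif
```

Because the macro expands to its own name, calls still reach the real function, and an architecture that defines nothing silently keeps the coregroup default.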
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 11:27 ` Shrikanth Hegde
@ 2026-05-28 13:21 ` Chen, Yu C
2026-05-28 15:06 ` Ritesh Harjani
2026-05-28 15:56 ` Srikar Dronamraju
2 siblings, 0 replies; 31+ messages in thread
From: Chen, Yu C @ 2026-05-28 13:21 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, tim.c.chen, Srikar Dronamraju, Peter Zijlstra,
K Prateek Nayak, Venkat Rao Bagalkote, Ritesh Harjani
Hi Shrikanth,
On 5/28/2026 7:27 PM, Shrikanth Hegde wrote:
>
>
> From 10e9413cef063446d67dc02c2b44e1ea582e5d53 Mon Sep 17 00:00:00 2001
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> Date: Thu, 28 May 2026 06:16:44 -0400
> Subject: [PATCH] topology: Provide arch_llc_mask for cache aware scheduling
>
> Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
> b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>
> Stacktrace points to llc_mask being null.
>
> NIP [c000000000e58504] _find_first_bit+0x44/0x130
> LR [c000000000e58500] _find_first_bit+0x40/0x130
> Call Trace:
> build_sched_domains+0xad8/0xe50
> sched_init_smp+0xa8/0x164
> kernel_init_freeable+0x250/0x370
> ret_from_kernel_user_thread+0x14/0x1c
>
> On powerpc, cpu_coregroup_mask is available only when the underlying
> hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
> coregroup isn't supported. In such cases llc_mask was being referenced
> when it was null leading to panic.
>
> on powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
> domain point to LLC is wrong. Provide a way for archs to say where its
> LLC is if it not at MC domain.
>
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
> Suggested-by: Chen, Yu C <yu.c.chen@intel.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
It looks good to me. Let's see if this resolves the issue for Venkat
and Ritesh. Feel free to send the formal version :)
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
thanks,
Chenyu
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21 ` Chen, Yu C
@ 2026-05-28 15:06 ` Ritesh Harjani
2026-05-28 15:56 ` Srikar Dronamraju
2 siblings, 0 replies; 31+ messages in thread
From: Ritesh Harjani @ 2026-05-28 15:06 UTC (permalink / raw)
To: Shrikanth Hegde, Venkat Rao Bagalkote, K Prateek Nayak,
Chen, Yu C, Srikar Dronamraju, Ritesh Harjani
Cc: Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, Peter Zijlstra, tim.c.chen
Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>>
>> I am seeing an early boot kernel panic due to NULL pointer dereference
>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>>
>
> Hi Venkat, Ritesh,
> Could you please try the below diff and see if it helps.
> This helps to fix boot problem for SPLPAR for me.
>
> Hi Chenyu,
> Let me know if I have to send the patch. Or
> if you want to add more comments or change it feel free to pick it up and send it.
> Either way is fine. Let me know.
>
> Hi Prateek, Srikar,
> I hope the below diff makes sense. Please check.
>
> nit: llc_mask is still under CONFIG_SCHED_MC, for ppc it is set to true
> always for SMP systems, and for others it is LLC domain. So not a concern i guess.
> ---
>
> From 10e9413cef063446d67dc02c2b44e1ea582e5d53 Mon Sep 17 00:00:00 2001
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> Date: Thu, 28 May 2026 06:16:44 -0400
> Subject: [PATCH] topology: Provide arch_llc_mask for cache aware scheduling
>
> Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
> b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>
> Stacktrace points to llc_mask being null.
>
> NIP [c000000000e58504] _find_first_bit+0x44/0x130
> LR [c000000000e58500] _find_first_bit+0x40/0x130
> Call Trace:
> build_sched_domains+0xad8/0xe50
> sched_init_smp+0xa8/0x164
> kernel_init_freeable+0x250/0x370
> ret_from_kernel_user_thread+0x14/0x1c
>
> On powerpc, cpu_coregroup_mask is available only when the underlying
> hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
> coregroup isn't supported. In such cases llc_mask was being referenced
> when it was null leading to panic.
>
> on powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
> domain point to LLC is wrong. Provide a way for archs to say where its
> LLC is if it not at MC domain.
>
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
> Suggested-by: Chen, Yu C <yu.c.chen@intel.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> arch/powerpc/include/asm/topology.h | 3 +++
> arch/powerpc/kernel/smp.c | 10 ++++++++++
> kernel/sched/topology.c | 9 +++++++++
> 3 files changed, 22 insertions(+)
>
I had to apply this change manually, as it didn't apply cleanly
for me on linux-next. But since the changes were straightforward, it
wasn't a problem.
With this change I no longer see the panic I was observing
earlier.
Thanks for the quick fix. Feel free to add:
Tested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-28 9:12 ` Chen, Yu C
2026-05-28 10:26 ` Shrikanth Hegde
@ 2026-05-28 15:54 ` Srikar Dronamraju
1 sibling, 0 replies; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 15:54 UTC (permalink / raw)
To: Chen, Yu C
Cc: Shrikanth Hegde, venkat88, maddy, riteshh, chleroy, tim.c.chen,
peterz, linux-kernel, linuxppc-dev, linux-sched, kprateek.nayak
* Chen, Yu C <yu.c.chen@intel.com> [2026-05-28 17:12:41]:
> On 5/28/2026 12:58 PM, Shrikanth Hegde wrote:
> >
> >
> > On 5/27/26 11:37 PM, Shrikanth Hegde wrote:
>
> [ ... ]
>
> > > diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> > > index 66ed5fe1b718..bd1db3b1dbb0 100644
> > > --- a/arch/powerpc/include/asm/topology.h
> > > +++ b/arch/powerpc/include/asm/topology.h
> > > @@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
> > > #ifdef CONFIG_SMP
> > > #include <asm/cputable.h>
> > >
> > > +const struct cpumask *arch_llc_mask(int cpu);
> > > +#define arch_llc_mask arch_llc_mask
> > > +
> > > struct cpumask *cpu_coregroup_mask(int cpu);
> > > const struct cpumask *cpu_die_mask(int cpu);
> > > int cpu_die_id(int cpu);
> > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > > index 3467f86fd78f..26c15c786c55 100644
> > > --- a/arch/powerpc/kernel/smp.c
> > > +++ b/arch/powerpc/kernel/smp.c
> > > @@ -1101,6 +1101,13 @@ const struct cpumask *cpu_die_mask(int cpu)
> > > }
> > > EXPORT_SYMBOL_GPL(cpu_die_mask);
> > >
> > > +const struct cpumask *arch_llc_mask(int cpu)
> > > +{
> > > + if (has_coregroup_support())
> > > + return cpu_coregroup_mask(cpu);
> > > + return cpu_smallcore_mask(cpu);
> >
> >
> > This function body needs change, since LLC is not at MC.
> > and I didn't account for power9.
> >
> > Rest of the structure is what i would prefer the direction to go.
> > This will help future architectures too to account for their specific
> > needs.
> >
> > What do you think?
> >
>
> Yes this direction look good to me. Regarding the arch_llc_mask(),
> how about the following per Srikar's description
>
> const struct cpumask *arch_llc_mask(int cpu)
> {
> 	/* Power9, CACHE domain is the LLC */
> 	if (shared_caches)
> 		return cpu_l2_cache_mask(cpu);
>
> 	/* P7, P8, P10, P11, SMT domain is the LLC */
> 	return cpu_smt_mask(cpu);
On P7, P8, P10, P11, cpu_l2_cache_mask should be the same as cpu_smt_mask,
so I don't see the point of checking if (shared_caches).
We could just as well use
#define arch_llc_mask cpu_l2_cache_mask
--
Thanks and Regards
Srikar Dronamraju
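
To illustrate the argument above, here is a minimal userspace sketch (not
kernel code). It assumes a hypothetical topology table in which the L2 span
is never narrower than the SMT core, and that shared_caches is set exactly
when the L2 spans more than one SMT core. The names struct topo, fill() and
variants_agree() are made up for the illustration. Under those assumptions
the branchy helper and the plain L2 lookup return the same mask for every
CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical stand-in for one machine's topology tables. */
struct topo {
	bool shared_caches;		/* L2 spans more than one SMT core */
	uint64_t smt_mask[NR_CPUS];	/* SMT siblings of each CPU */
	uint64_t l2_mask[NR_CPUS];	/* CPUs sharing an L2 with each CPU */
};

/* Variant with the explicit shared_caches check. */
static inline uint64_t llc_mask_branchy(const struct topo *t, int cpu)
{
	if (t->shared_caches)
		return t->l2_mask[cpu];
	return t->smt_mask[cpu];
}

/* Variant that always returns the L2 mask. */
static inline uint64_t llc_mask_direct(const struct topo *t, int cpu)
{
	return t->l2_mask[cpu];
}

/* Build a regular topology from SMT and L2 widths (in CPUs). */
static inline void fill(struct topo *t, int smt_width, int l2_width)
{
	/* Assumption of this sketch: the L2 is never narrower than a core. */
	assert(smt_width > 0 && l2_width >= smt_width);

	t->shared_caches = l2_width > smt_width;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int smt_base = cpu / smt_width * smt_width;
		int l2_base = cpu / l2_width * l2_width;

		t->smt_mask[cpu] = ((1ULL << smt_width) - 1) << smt_base;
		t->l2_mask[cpu] = ((1ULL << l2_width) - 1) << l2_base;
	}
}

/* Returns 1 when both variants agree for every CPU, 0 otherwise. */
static inline int variants_agree(const struct topo *t)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (llc_mask_branchy(t, cpu) != llc_mask_direct(t, cpu))
			return 0;
	return 1;
}
```

Calling fill(&t, 2, 4) models the shared-L2 case and fill(&t, 4, 4) models
the case where the L2 is private to the SMT core; variants_agree(&t) holds
for both, which is why the conditional adds nothing in this model.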
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21 ` Chen, Yu C
2026-05-28 15:06 ` Ritesh Harjani
@ 2026-05-28 15:56 ` Srikar Dronamraju
2026-05-28 16:31 ` Shrikanth Hegde
2 siblings, 1 reply; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 15:56 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Venkat Rao Bagalkote, K Prateek Nayak, Chen, Yu C, Ritesh Harjani,
Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, Peter Zijlstra, tim.c.chen
* Shrikanth Hegde <sshegde@linux.ibm.com> [2026-05-28 16:57:54]:
>
>
> On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
> > Greetings!!!
> >
> > I am seeing an early boot kernel panic due to NULL pointer dereference
> > on a POWER9 (pSeries) system when testing linux-next (next-20260522).
> >
>
> Hi Venkat, Ritesh,
> Could you please try the below diff and see if it helps.
> This helps to fix boot problem for SPLPAR for me.
>
> Hi Chenyu,
> Let me know if I have to send the patch. Or
> if you want to add more comments or change it feel free to pick it up and send it.
> Either way is fine. Let me know.
>
> Hi Prateek, Srikar,
> I hope the below diff makes sense. Please check.
>
> nit: llc_mask is still under CONFIG_SCHED_MC, for ppc it is set to true
> always for SMP systems, and for others it is LLC domain. So not a concern i guess.
> ---
>
> From 10e9413cef063446d67dc02c2b44e1ea582e5d53 Mon Sep 17 00:00:00 2001
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> Date: Thu, 28 May 2026 06:16:44 -0400
> Subject: [PATCH] topology: Provide arch_llc_mask for cache aware scheduling
>
> Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
> b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>
> Stacktrace points to llc_mask being null.
>
> NIP [c000000000e58504] _find_first_bit+0x44/0x130
> LR [c000000000e58500] _find_first_bit+0x40/0x130
> Call Trace:
> build_sched_domains+0xad8/0xe50
> sched_init_smp+0xa8/0x164
> kernel_init_freeable+0x250/0x370
> ret_from_kernel_user_thread+0x14/0x1c
>
> On powerpc, cpu_coregroup_mask is available only when the underlying
> hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
> coregroup isn;t supported. In such cases llc_mask was being referrenced
> when it was null leading to panic.
>
> on powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
> domain point to LLC is wrong. Provide a way for archs to say where its
> LLC is if it not at MC domain.
>
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
> Suggested-by: Chen, Yu C <yu.c.chen@intel.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> arch/powerpc/include/asm/topology.h | 3 +++
> arch/powerpc/kernel/smp.c | 10 ++++++++++
> kernel/sched/topology.c | 9 +++++++++
> 3 files changed, 22 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 66ed5fe1b718..bd1db3b1dbb0 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
> #ifdef CONFIG_SMP
> #include <asm/cputable.h>
> +const struct cpumask *arch_llc_mask(int cpu);
> +#define arch_llc_mask arch_llc_mask
> +
> struct cpumask *cpu_coregroup_mask(int cpu);
> const struct cpumask *cpu_die_mask(int cpu);
> int cpu_die_id(int cpu);
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3467f86fd78f..cc8e87d6cae9 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1101,6 +1101,16 @@ const struct cpumask *cpu_die_mask(int cpu)
> }
> EXPORT_SYMBOL_GPL(cpu_die_mask);
> +const struct cpumask *arch_llc_mask(int cpu)
> +{
> + /* Power9, CACHE domain is the LLC*/
> + if (shared_caches)
> + return cpu_l2_cache_mask(cpu);
> +
> + /* For others, SMT domain is the LLC*/
> + return cpu_smt_mask(cpu);
> +}
Why don't we just do
#define arch_llc_mask cpu_l2_cache_mask
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
2026-05-27 18:07 ` Shrikanth Hegde
2026-05-28 4:58 ` Shrikanth Hegde
@ 2026-05-28 15:58 ` Srikar Dronamraju
1 sibling, 0 replies; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 15:58 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Chen, Yu C, kprateek.nayak, venkat88, maddy, riteshh, chleroy,
tim.c.chen, peterz, linux-kernel, linuxppc-dev, linux-sched
* Shrikanth Hegde <sshegde@linux.ibm.com> [2026-05-27 23:37:30]:
> Hi Chen, Prateek.
>
> EXPORT_SYMBOL_GPL(cpu_die_mask);
>
> +const struct cpumask *arch_llc_mask(int cpu)
> +{
> + if (has_coregroup_support())
> + return cpu_coregroup_mask(cpu);
> + return cpu_smallcore_mask(cpu);
> +}
> +
This is not correct. Why should we return the coregroup mask for the LLC again?
> int cpu_die_id(int cpu)
> {
> if (has_coregroup_support())
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index df2ceb54c970..3b5155121276 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2063,7 +2063,11 @@ const struct cpumask *tl_mc_mask(struct
> sched_domain_topology_level *tl, int cpu
> return cpu_coregroup_mask(cpu);
> }
>
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-27 7:05 ` Shrikanth Hegde
@ 2026-05-28 16:01 ` Srikar Dronamraju
0 siblings, 0 replies; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 16:01 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Venkat Rao Bagalkote, Madhavan Srinivasan, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched,
tim.c.chen, K Prateek Nayak, Peter Zijlstra, Chen, Yu C
* Shrikanth Hegde <sshegde@linux.ibm.com> [2026-05-27 12:35:20]:
> On 5/26/26 10:54 AM, Venkat Rao Bagalkote wrote:
> >
> > On 26/05/26 9:38 am, Chen, Yu C wrote:
> > > Hi Venkat,
> > >
> > > On 5/26/2026 11:14 AM, Srikar Dronamraju wrote:
> > > > * Chen, Yu C <yu.c.chen@intel.com> [2026-05-25 23:35:45]:
> > > >
> > > > > Hi Venkat,
> > > > >
> > > > > On 5/25/2026 10:07 PM, Venkat Rao Bagalkote wrote:
> > > > > > Greetings!!!
> > > > > >
> > > > > > I am seeing an early boot kernel panic due to NULL pointer dereference
> > > > > > on a POWER9 (pSeries) system when testing linux-next (next-20260522).
> >
> >
> > This issue is seen on P11 as well.
> >
> >
> > Regards,
> >
> > Venkat.
> >
>
> Venkat,
>
> Was it on P11 on Shared LPAR?
I thought the same.
But I checked with Venkat, and he said it was on a dedicated LPAR.
My multiple attempts on a dedicated LPAR didn't reproduce the problem.
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 6:54 ` Ritesh Harjani
@ 2026-05-28 16:06 ` Srikar Dronamraju
0 siblings, 0 replies; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 16:06 UTC (permalink / raw)
To: Ritesh Harjani
Cc: Venkat Rao Bagalkote, Peter Zijlstra, K Prateek Nayak, Chen, Yu C,
tim.c.chen, Madhavan Srinivasan, Shrikanth Hegde, Ritesh Harjani,
Christophe Leroy (CS GROUP), LKML, linuxppc-dev, linux-sched
* Ritesh Harjani <ritesh.list@gmail.com> [2026-05-28 12:24:39]:
> Venkat Rao Bagalkote <venkat88@linux.ibm.com> writes:
>
> > Greetings!!!
> >
> > I am seeing an early boot kernel panic due to NULL pointer dereference
> > on a POWER9 (pSeries) system when testing linux-next (next-20260522).
> >
> >
> > [ 0.039029] NIP [c000000000e58504] _find_first_bit+0x44/0x130
> > [ 0.039043] LR [c000000000e58500] _find_first_bit+0x40/0x130
> > [ 0.039054] Call Trace:
> > [ 0.039060] [c0000000090e7b80] [c00000000416af20]
> > schedutil_gov+0x0/0xa0 (unreliable)
> > [ 0.039076] [c0000000090e7bc0] [c00000000038b3b8]
> > build_sched_domains+0xad8/0xe50
> > [ 0.039089] [c0000000090e7ce0] [c000000003045d78]
> > sched_init_smp+0xa8/0x164
> > [ 0.039102] [c0000000090e7d30] [c00000000300f374]
> > kernel_init_freeable+0x250/0x370
> > [ 0.039117] [c0000000090e7de0] [c000000000011f90] kernel_init+0x34/0x1e4
> > [ 0.039129] [c0000000090e7e50] [c00000000000debc]
> > ret_from_kernel_user_thread+0x14/0x1c
> > [ 0.039142] ---- interrupt: 0 at 0x0
> > [ 0.039150] Code: 41820090 7c0802a6 393cffff fbe10038 7c7f1b78
> > fba10028 fbc10030 3bc00000 793dd7e2 f8010050 4bae6e9d 60000000
> > <e93f0000> 2c290000 408200bc 283c0040
> > [ 0.039196] ---[ end trace 0000000000000000 ]---
> >
> >
>
> Well, I am hitting this on 7.1.0-rc5-next-20260526-00010-gbfac43765a97
> with Qemu Pseries TCG power10/11.
>
A QEMU pSeries guest would be a shared LPAR, so that is expected to behave
like P9: we would not have a coregroup_map allocated.
> [ 0.342868][ T1] smp: Bringing up secondary CPUs ...
> [ 0.342868][ T1] smp: Bringing up secondary CPUs ...
> [ 0.525419][ T1] smp: Brought up 1 node, 4 CPUs
> [ 0.525419][ T1] smp: Brought up 1 node, 4 CPUs
> [ 0.527992][ T1] numa: Node 0 CPUs: 0-3
> [ 0.527992][ T1] numa: Node 0 CPUs: 0-3
> [ 0.552787][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 0.552787][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 0.557446][ T1] Faulting instruction address: 0xc000000000fe3f1c
> [ 0.557446][ T1] Faulting instruction address: 0xc000000000fe3f1c
> cpu 0x0: Vector: 300 (Data Access) at [c000000006607800]
> pc: c000000000fe3f1c: _find_first_bit+0xc/0xc0
> lr: c00000000027b7d8: build_sched_domains+0xbb4/0x1938
> sp: c000000006607ac0
> msr: 8000000002009033
> dar: 0
> dsisr: 80000
> current = 0xc000000006f9fb00
> paca = 0xc000000005670000 irqmask: 0x03 irq_happened: 0x09
> pid = 1, comm = swapper/0
> Linux version 7.1.0-rc5-next-20260526-00010-gbfac43765a97-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #19 SMP PREEMPT Thu May 28 12:29:51 IST 2026
> enter ? for help
> [link register ] c00000000027b7d8 build_sched_domains+0xbb4/0x1938
> [c000000006607ac0] c00000000027b0a8 build_sched_domains+0x484/0x1938 (unreliable)
> [c000000006607c20] c000000004053180 sched_init_domains+0x114/0x1cc
> [c000000006607c70] c0000000040515e0 sched_init_smp+0x5c/0x17c
> [c000000006607cc0] c000000004012888 kernel_init_freeable+0x258/0x790
> [c000000006607dc0] c000000000011f3c kernel_init+0x34/0x268
> [c000000006607e30] c00000000000debc ret_from_kernel_user_thread+0x14/0x1c
> ---- Exception: 0 at 0000000000000000
>
>
>
> -ritesh
--
Thanks and Regards
Srikar Dronamraju
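
As a rough userspace model of the failure discussed above (not kernel
code): if the coregroup map is never allocated, a helper that hands out
pointers into it returns NULL, and a find-first-bit routine that reads
through that pointer faults. All names below (model_coregroup_map,
model_l2_map, model_first_cpu and so on) are hypothetical stand-ins, and
the NULL check in model_first_cpu() exists only so the sketch can report
the bad case instead of crashing; the trace above shows a read fault
inside _find_first_bit.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4
#define NO_CPU (-1)

/* Always populated in this model, like the L2 cache map. */
static const unsigned long model_l2_map[MODEL_NR_CPUS] = { 0x3, 0x3, 0xc, 0xc };

/* NULL models "coregroup map was never allocated". */
static const unsigned long *model_coregroup_map = NULL;

/* May return NULL when coregroups are not supported. */
static inline const unsigned long *model_coregroup_mask(int cpu)
{
	return model_coregroup_map ? &model_coregroup_map[cpu] : NULL;
}

/* Never returns NULL for a valid CPU. */
static inline const unsigned long *model_l2_cache_mask(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return &model_l2_map[cpu];
}

/* First set bit of a one-word mask, or NO_CPU for an empty or NULL mask. */
static inline int model_first_cpu(const unsigned long *mask)
{
	if (!mask)
		return NO_CPU;	/* an unchecked read here would fault */

	for (int bit = 0; bit < MODEL_NR_CPUS; bit++)
		if (*mask & (1UL << bit))
			return bit;
	return NO_CPU;
}
```

In this model, model_first_cpu(model_coregroup_mask(0)) hits the NULL case,
while model_first_cpu(model_l2_cache_mask(2)) finds CPU 2, which mirrors
why pointing the LLC mask at a map that is always allocated avoids the
fault.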
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 15:56 ` Srikar Dronamraju
@ 2026-05-28 16:31 ` Shrikanth Hegde
2026-05-28 16:44 ` Srikar Dronamraju
0 siblings, 1 reply; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-28 16:31 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Venkat Rao Bagalkote, K Prateek Nayak, Chen, Yu C, Ritesh Harjani,
Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, Peter Zijlstra, tim.c.chen
On 5/28/26 9:26 PM, Srikar Dronamraju wrote:
> * Shrikanth Hegde <sshegde@linux.ibm.com> [2026-05-28 16:57:54]:
>
>>
>>
>> On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
>>> Greetings!!!
>>>
>>> I am seeing an early boot kernel panic due to NULL pointer dereference
>>> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>>>
>>
>> Hi Venkat, Ritesh,
>> Could you please try the below diff and see if it helps.
>> This helps to fix boot problem for SPLPAR for me.
>>
>> Hi Chenyu,
>> Let me know if I have to send the patch. Or
>> if you want to add more comments or change it feel free to pick it up and send it.
>> Either way is fine. Let me know.
>>
>> Hi Prateek, Srikar,
>> I hope the below diff makes sense. Please check.
>>
>> nit: llc_mask is still under CONFIG_SCHED_MC, for ppc it is set to true
>> always for SMP systems, and for others it is LLC domain. So not a concern i guess.
>> ---
>>
>> From 10e9413cef063446d67dc02c2b44e1ea582e5d53 Mon Sep 17 00:00:00 2001
>> From: Shrikanth Hegde <sshegde@linux.ibm.com>
>> Date: Thu, 28 May 2026 06:16:44 -0400
>> Subject: [PATCH] topology: Provide arch_llc_mask for cache aware scheduling
>>
>> Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
>> b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>>
>> Stacktrace points to llc_mask being null.
>>
>> NIP [c000000000e58504] _find_first_bit+0x44/0x130
>> LR [c000000000e58500] _find_first_bit+0x40/0x130
>> Call Trace:
>> build_sched_domains+0xad8/0xe50
>> sched_init_smp+0xa8/0x164
>> kernel_init_freeable+0x250/0x370
>> ret_from_kernel_user_thread+0x14/0x1c
>>
>> On powerpc, cpu_coregroup_mask is available only when the underlying
>> hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
>> coregroup isn;t supported. In such cases llc_mask was being referrenced
>> when it was null leading to panic.
>>
>> on powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
>> domain point to LLC is wrong. Provide a way for archs to say where its
>> LLC is if it not at MC domain.
>>
>> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
>> Suggested-by: Chen, Yu C <yu.c.chen@intel.com>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/topology.h | 3 +++
>> arch/powerpc/kernel/smp.c | 10 ++++++++++
>> kernel/sched/topology.c | 9 +++++++++
>> 3 files changed, 22 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
>> index 66ed5fe1b718..bd1db3b1dbb0 100644
>> --- a/arch/powerpc/include/asm/topology.h
>> +++ b/arch/powerpc/include/asm/topology.h
>> @@ -131,6 +131,9 @@ static inline int cpu_to_coregroup_id(int cpu)
>> #ifdef CONFIG_SMP
>> #include <asm/cputable.h>
>> +const struct cpumask *arch_llc_mask(int cpu);
>> +#define arch_llc_mask arch_llc_mask
>> +
>> struct cpumask *cpu_coregroup_mask(int cpu);
>> const struct cpumask *cpu_die_mask(int cpu);
>> int cpu_die_id(int cpu);
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 3467f86fd78f..cc8e87d6cae9 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1101,6 +1101,16 @@ const struct cpumask *cpu_die_mask(int cpu)
>> }
>> EXPORT_SYMBOL_GPL(cpu_die_mask);
>> +const struct cpumask *arch_llc_mask(int cpu)
>> +{
>> + /* Power9, CACHE domain is the LLC*/
>> + if (shared_caches)
>> + return cpu_l2_cache_mask(cpu);
>> +
>> + /* For others, SMT domain is the LLC*/
>> + return cpu_smt_mask(cpu);
>> +}
>
> Why dont we do
> #define arch_llc_mask cpu_l2_cache_mask
>
I would prefer to keep the abstraction. This leaves
room for implementation details.
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-28 16:31 ` Shrikanth Hegde
@ 2026-05-28 16:44 ` Srikar Dronamraju
0 siblings, 0 replies; 31+ messages in thread
From: Srikar Dronamraju @ 2026-05-28 16:44 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Venkat Rao Bagalkote, K Prateek Nayak, Chen, Yu C, Ritesh Harjani,
Madhavan Srinivasan, Christophe Leroy (CS GROUP), LKML,
linuxppc-dev, Peter Zijlstra, tim.c.chen
* Shrikanth Hegde <sshegde@linux.ibm.com> [2026-05-28 22:01:53]:
> > > index 3467f86fd78f..cc8e87d6cae9 100644
> > > --- a/arch/powerpc/kernel/smp.c
> > > +++ b/arch/powerpc/kernel/smp.c
> > > @@ -1101,6 +1101,16 @@ const struct cpumask *cpu_die_mask(int cpu)
> > > }
> > > EXPORT_SYMBOL_GPL(cpu_die_mask);
> > > +const struct cpumask *arch_llc_mask(int cpu)
> > > +{
> > > + /* Power9, CACHE domain is the LLC*/
> > > + if (shared_caches)
> > > + return cpu_l2_cache_mask(cpu);
> > > +
> > > + /* For others, SMT domain is the LLC*/
> > > + return cpu_smt_mask(cpu);
> > > +}
> >
> > Why dont we do
> > #define arch_llc_mask cpu_l2_cache_mask
> >
>
> I would prefer to keep the abstraction. This leaves
> room for implementation details.
>
We could always do that when we actually need it.
But doing it now will confuse the reader: why keep an if and an else when
both return the same mask?
--
Thanks and Regards
Srikar Dronamraju
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
` (2 preceding siblings ...)
2026-05-28 11:27 ` Shrikanth Hegde
@ 2026-05-29 3:58 ` Shrikanth Hegde
2026-05-29 6:59 ` Venkat Rao Bagalkote
3 siblings, 1 reply; 31+ messages in thread
From: Shrikanth Hegde @ 2026-05-29 3:58 UTC (permalink / raw)
To: Venkat Rao Bagalkote, Chen, Yu C, tim.c.chen, K Prateek Nayak,
Srikar Dronamraju
Cc: Madhavan Srinivasan, Ritesh Harjani, Christophe Leroy (CS GROUP),
LKML, linuxppc-dev, Peter Zijlstra
Hello. Sorry for too many mails.
On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> I am seeing an early boot kernel panic due to NULL pointer dereference
> on a POWER9 (pSeries) system when testing linux-next (next-20260522).
>
>
Based on Srikar's suggestion to keep the below,
#define arch_llc_mask(cpu) cpu_l2_cache_mask(cpu)
which makes it pretty much what Chenyu had here
https://lore.kernel.org/all/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
I added the changelog and comments, and removed the changes in the
!CONFIG_SCHED_MC case since powerpc always defines it. I have changed the
Chenyu tag to Co-developed-by: instead.
I have carried the Tested-by and Reviewed-by tags since the patch is
still more or less the same.
This is based on tip/master at
5c89783224e9 Merge branch into tip/master: 'x86/tdx'
I am planning to send it based on the tip tree.
Let me know if it has to be against a different tree.
Please let me know if there are any concerns.
I verified that the below also fixes the panic seen in shared LPAR.
============================================
From: Shrikanth Hegde <sshegde@linux.ibm.com>
Date: Thu, 28 May 2026 23:23:43 -0400
Subject: [PATCH] sched/topology: Provide arch_llc_mask for cache aware scheduling
Venkat reported a boot kernel panic on next-20260522. Git bisect pointed to
b5ea300a17e3 ("sched/cache: Make LLC id continuous")
The stack trace points to llc_mask being NULL.
NIP [c000000000e58504] _find_first_bit+0x44/0x130
LR [c000000000e58500] _find_first_bit+0x40/0x130
Call Trace:
build_sched_domains+0xad8/0xe50
sched_init_smp+0xa8/0x164
kernel_init_freeable+0x250/0x370
ret_from_kernel_user_thread+0x14/0x1c
On powerpc, cpu_coregroup_mask is available only when the underlying
hardware supports coregroup. In a shared LPAR, a QEMU guest, on power9 etc.,
coregroup isn't supported. In such cases llc_mask was being dereferenced
while it was NULL, leading to the panic.
On powerpc, the LLC is at the SMT core level, so the assumption that the
coregroup (MC) domain points to the LLC is wrong. Provide a way for archs
to say where their LLC is if it is not at the MC domain.
Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Co-developed-by: Chen, Yu C <yu.c.chen@intel.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
arch/powerpc/include/asm/topology.h | 6 ++++++
kernel/sched/topology.c | 13 +++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 66ed5fe1b718..e3de0f3d8b86 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -135,6 +135,12 @@ struct cpumask *cpu_coregroup_mask(int cpu);
const struct cpumask *cpu_die_mask(int cpu);
int cpu_die_id(int cpu);
+/* Points to where the LLC is. On power9 this will point at CACHE
+ * domain, On others it will point to SMT domain. In all cases
+ * cpu_l2_cache_mask points to where LLC is.
+ */
+#define arch_llc_mask(cpu) cpu_l2_cache_mask(cpu)
+
#ifdef CONFIG_PPC64
#include <asm/smp.h>
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index df2ceb54c970..622e2e01974c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2063,12 +2063,21 @@ const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu
return cpu_coregroup_mask(cpu);
}
-#define llc_mask(cpu) cpu_coregroup_mask(cpu)
+/*
+ * Majority of architectures have LLC at MC domain level with exception
+ * such as powerpc. Provide a way for arch to specify where its LLC is
+ * if it falls in exception category
+ */
+# ifndef arch_llc_mask
+#define arch_llc_mask(cpu) cpu_coregroup_mask(cpu)
+# endif
#else
-#define llc_mask(cpu) cpumask_of(cpu)
+#define arch_llc_mask(cpu) cpumask_of(cpu)
#endif
+#define llc_mask(cpu) arch_llc_mask(cpu)
+
const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
{
return cpu_node_mask(cpu);
--
2.47.3
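
The override mechanism in the patch above can be tried out in isolation
with a small userspace sketch. This is not the kernel code: the model_*
functions and the enum are hypothetical stand-ins that only report which
mask was selected. The macro names arch_llc_mask and llc_mask follow the
patch. The "arch" part defines arch_llc_mask, and the "generic" part
supplies a default only when no arch definition exists.

```c
#include <assert.h>

/* Hypothetical tags saying which mask a lookup resolved to. */
enum model_mask { MASK_COREGROUP, MASK_L2_CACHE };

static inline enum model_mask model_coregroup_mask(int cpu)
{
	assert(cpu >= 0);
	return MASK_COREGROUP;
}

static inline enum model_mask model_l2_cache_mask(int cpu)
{
	assert(cpu >= 0);
	return MASK_L2_CACHE;
}

/*
 * "Arch header" side. Removing this definition models an architecture
 * that does not override the default.
 */
#define arch_llc_mask(cpu) model_l2_cache_mask(cpu)

/* "Generic scheduler" side: fall back only if the arch said nothing. */
#ifndef arch_llc_mask
#define arch_llc_mask(cpu) model_coregroup_mask(cpu)
#endif

#define llc_mask(cpu) arch_llc_mask(cpu)
```

With the arch definition present, llc_mask(0) resolves to the L2 variant;
with it removed, the #ifndef block would select the coregroup variant
instead. The #ifndef test works because a function-like macro still counts
as defined.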
* Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9
2026-05-29 3:58 ` Shrikanth Hegde
@ 2026-05-29 6:59 ` Venkat Rao Bagalkote
0 siblings, 0 replies; 31+ messages in thread
From: Venkat Rao Bagalkote @ 2026-05-29 6:59 UTC (permalink / raw)
To: Shrikanth Hegde, Chen, Yu C, tim.c.chen, K Prateek Nayak,
Srikar Dronamraju
Cc: Madhavan Srinivasan, Ritesh Harjani, Christophe Leroy (CS GROUP),
LKML, linuxppc-dev, Peter Zijlstra
On 29/05/26 9:28 am, Shrikanth Hegde wrote:
> Hello. Sorry for too many mails.
>
> On 5/25/26 7:37 PM, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>>
>> I am seeing an early boot kernel panic due to NULL pointer
>> dereference on a POWER9 (pSeries) system when testing linux-next
>> (next-20260522).
>>
>>
>
> Based on srikar's suggestion to keep the below,
> #define arch_llc_mask(cpu) cpu_l2_cache_mask(cpu)
>
> which makes it pretty much what chenyu had here
> https://lore.kernel.org/all/8d14c844-b4a8-4af6-acab-2cfdd42225be@intel.com/
>
>
> I added the changelog and comments. removed the changes in !CONFIG_MC
> case since powerpc
> defines it always. I have changed the chenyu tag to Co-developed-by:
> instead.
>
> I have carried the tested-by and reviewed-by tags since patch is
> still more or less the same.
>
> This is based on tip/master at
> 5c89783224e9 Merge branch into tip/master: 'x86/tdx'
> I am planning to send it based on tip tree.
> Let me know if has to be against a any different tree.
>
> Please let me know if there are any concerns.
> verified below too fixes the panic seen in shared LPAR.
>
> ============================================
>
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> Date: Thu, 28 May 2026 23:23:43 -0400
> Subject: [PATCH] sched/topology: Provide arch_llc_mask for cache aware
> scheduling
>
> Venkat Reported a boot kernel panic next-20260522. Git bisect pointed to
> b5ea300a17e3 ("sched/cache: Make LLC id continuous")
>
> Stacktrace points to llc_mask being null.
>
> NIP [c000000000e58504] _find_first_bit+0x44/0x130
> LR [c000000000e58500] _find_first_bit+0x40/0x130
> Call Trace:
> build_sched_domains+0xad8/0xe50
> sched_init_smp+0xa8/0x164
> kernel_init_freeable+0x250/0x370
> ret_from_kernel_user_thread+0x14/0x1c
>
> On powerpc, cpu_coregroup_mask is available only when the underlying
> hardware support coregroup. In shared LPAR, QEMU guest or power9 etc
> coregroup isn't supported. In such cases llc_mask was being referenced
> when it was null leading to panic.
>
> on powerpc, LLC is at SMT core level. So assumption that coregroup(MC)
> domain point to LLC is wrong. Provide a way for archs to say where its
> LLC is if it not at MC domain.
>
> Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes:
> https://lore.kernel.org/all/51154de7-3700-4cb4-82f2-1b3a8fa427f7@linux.ibm.com/
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> Tested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Co-developed-by: Chen, Yu C <yu.c.chen@intel.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
Tested this fix by applying it on top of commit
b5ea300a17e37eada7a98561fbd34a3054578713, and on P9 it boots fine.
Please add the below tag.
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Regards,
Venkat.
> arch/powerpc/include/asm/topology.h | 6 ++++++
> kernel/sched/topology.c | 13 +++++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/topology.h
> b/arch/powerpc/include/asm/topology.h
> index 66ed5fe1b718..e3de0f3d8b86 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -135,6 +135,12 @@ struct cpumask *cpu_coregroup_mask(int cpu);
> const struct cpumask *cpu_die_mask(int cpu);
> int cpu_die_id(int cpu);
>
> +/* Points to where the LLC is. On power9 this will point at CACHE
> + * domain, On others it will point to SMT domain. In all cases
> + * cpu_l2_cache_mask points to where LLC is.
> + */
> +#define arch_llc_mask(cpu) cpu_l2_cache_mask(cpu)
> +
> #ifdef CONFIG_PPC64
> #include <asm/smp.h>
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index df2ceb54c970..622e2e01974c 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2063,12 +2063,21 @@ const struct cpumask *tl_mc_mask(struct
> sched_domain_topology_level *tl, int cpu
> return cpu_coregroup_mask(cpu);
> }
>
> -#define llc_mask(cpu) cpu_coregroup_mask(cpu)
> +/*
> + * Majority of architectures have LLC at MC domain level with exception
> + * such as powerpc. Provide a way for arch to specify where its LLC is
> + * if it falls in exception category
> + */
> +# ifndef arch_llc_mask
> +#define arch_llc_mask(cpu) cpu_coregroup_mask(cpu)
> +# endif
>
> #else
> -#define llc_mask(cpu) cpumask_of(cpu)
> +#define arch_llc_mask(cpu) cpumask_of(cpu)
> #endif
>
> +#define llc_mask(cpu) arch_llc_mask(cpu)
> +
> const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level
> *tl, int cpu)
> {
> return cpu_node_mask(cpu);
Thread overview: 31+ messages
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
2026-05-25 16:16 ` K Prateek Nayak
2026-05-26 3:14 ` Chen, Yu C
2026-05-26 3:14 ` Srikar Dronamraju
2026-05-26 4:08 ` Chen, Yu C
2026-05-26 4:58 ` Srikar Dronamraju
2026-05-26 5:53 ` K Prateek Nayak
2026-05-26 14:08 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask Chen Yu
2026-05-27 7:01 ` Shrikanth Hegde
2026-05-27 16:05 ` Chen, Yu C
2026-05-27 18:07 ` Shrikanth Hegde
2026-05-28 4:58 ` Shrikanth Hegde
2026-05-28 9:12 ` Chen, Yu C
2026-05-28 10:26 ` Shrikanth Hegde
2026-05-28 15:54 ` Srikar Dronamraju
2026-05-28 15:58 ` Srikar Dronamraju
2026-05-27 16:30 ` K Prateek Nayak
2026-05-26 5:24 ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-27 7:05 ` Shrikanth Hegde
2026-05-28 16:01 ` Srikar Dronamraju
2026-05-28 6:54 ` Ritesh Harjani
2026-05-28 16:06 ` Srikar Dronamraju
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21 ` Chen, Yu C
2026-05-28 15:06 ` Ritesh Harjani
2026-05-28 15:56 ` Srikar Dronamraju
2026-05-28 16:31 ` Shrikanth Hegde
2026-05-28 16:44 ` Srikar Dronamraju
2026-05-29 3:58 ` Shrikanth Hegde
2026-05-29 6:59 ` Venkat Rao Bagalkote