Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Chen Yu <yu.c.chen@intel.com>
To: kprateek.nayak@amd.com
Cc: srikar@linux.ibm.com, venkat88@linux.ibm.com,
	maddy@linux.ibm.com, sshegde@linux.ibm.com,
	riteshh@linux.ibm.com, chleroy@kernel.org,
	tim.c.chen@linux.intel.com, peterz@infradead.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-sched@vger.kernel.org, Chen Yu <yu.c.chen@intel.com>
Subject: Re: [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask
Date: Tue, 26 May 2026 22:08:56 +0800	[thread overview]
Message-ID: <20260526140856.139657-1-yu.c.chen@intel.com> (raw)
In-Reply-To: <058664ab-0982-4c13-9d4b-caa2f7616b0f@amd.com>

Hi Prateek,

On Tue, 26 May 2026 11:23:59 +0530, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> Hello Srikar,
>
> On 5/26/2026 10:28 AM, Srikar Dronamraju wrote:
> > L2 Cache reported here is for SMT8 Core aka CACHE domain.
>
> Apart for the scheduler, nothing in tree currently cares about
> cpu_coregroup_mask() except for drivers/base/arch_topology.c but
> Power doesn't select GENERIC_ARCH_TOPOLOGY.
>
> Why can't Power have an internal mask for MC domain (tl_mc_mask) and
> the scheduler can use cpu_coregroup_mask() for the actual LLc? (The L2
> mask in this case.)
>
> Power anyways adds its own topology via set_sched_topology() so the
> default_topology from kernel/sched/topology.c remains unused.
>
> ...
>
> > Shouldnt cache-aware scheduling be worried about cpuset partitions too.
> > If a cpuset has subset of LLC cores, then we should Scheduler assume it can
> > control complete LLC?
>
> Well, the scheduling takes care of partitions and the cache aware
> scheduling bits take care of looking at the full system perspective
> for stats aggregation and pointing to a particular LLc.
>
> We don't compare llc_id across cpusets so we keeping one unique llc_id
> per H/W LLC instance is feasible and it enables us to keep llc_id space
> limited for optimizing cache-aware scheduling.
>
> Now if we have threads of same process across partitions, we'll
> still aggregate the utilization numbers across the full LLC but
> the load balancers at individual partitions will make a call on
> the aggregation.
>
> -- 
> Thanks and Regards,
> Prateek
>
>

I suppose what you suggested looks like below:

powerpc/smp: make cpu_coregroup_mask() return the LLC

On pSeries shared LPARs(or coregroup_enabled is false on
Power9 and earlier) the hemisphere map is not allocated, so
build_sched_domains() dereferences a NULL cpumask and crashes.

The generic scheduler expects cpu_coregroup_mask() to span the LLC.
On powerpc the LLC is the L2. Return cpu_l2_cache_mask() instead of
the hemisphere map. Use a coregroup_map() helper for the in-file
hemisphere users, and a powerpc_tl_mc_mask() wrapper for the MC
sched-domain level.

Fixes: b5ea300a17e3 ("sched/cache: Make LLC id continuous")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1040,11 +1040,22 @@ static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_
 }
 #endif
 
+static inline struct cpumask *coregroup_map(int cpu)
+{
+	return per_cpu(cpu_coregroup_map, cpu);
+}
+
 struct cpumask *cpu_coregroup_mask(int cpu)
 {
-	return per_cpu(cpu_coregroup_map, cpu);
+	return cpu_l2_cache_mask(cpu);
+}
+
+static const struct cpumask *
+powerpc_tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return coregroup_map(cpu);
 }
 
 static bool has_coregroup_support(void)
 {
 	if (is_shared_processor())
@@ -1155,7 +1166,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
 	if (has_coregroup_support())
-		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+		cpumask_set_cpu(boot_cpuid, coregroup_map(boot_cpuid));
 
 	init_big_cores();
 	if (has_big_cores) {
@@ -1520,8 +1531,8 @@ static void remove_cpu_from_masks(int cpu)
 		set_cpus_unrelated(cpu, i, cpu_core_mask);
 
 	if (has_coregroup_support()) {
-		for_each_cpu(i, cpu_coregroup_mask(cpu))
-			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
+		for_each_cpu(i, coregroup_map(cpu))
+			set_cpus_unrelated(cpu, i, coregroup_map);
 	}
 }
 #endif
@@ -1553,7 +1564,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 	if (!*mask) {
 		/* Assume only siblings are part of this CPU's coregroup */
 		for_each_cpu(i, submask_fn(cpu))
-			set_cpus_related(cpu, i, cpu_coregroup_mask);
+			set_cpus_related(cpu, i, coregroup_map);
 
 		return;
 	}
@@ -1561,18 +1572,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update coregroup mask with all the CPUs that are part of submask */
-	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
+	or_cpumasks_related(cpu, cpu, submask_fn, coregroup_map);
 
 	/* Skip all CPUs already part of coregroup mask */
-	cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
+	cpumask_andnot(*mask, *mask, coregroup_map(cpu));
 
 	for_each_cpu(i, *mask) {
 		/* Skip all CPUs not part of this coregroup */
 		if (coregroup_id == cpu_to_coregroup_id(i)) {
-			or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
+			or_cpumasks_related(cpu, i, submask_fn, coregroup_map);
 			cpumask_andnot(*mask, *mask, submask_fn(i));
 		} else {
-			cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
+			cpumask_andnot(*mask, *mask, coregroup_map(i));
 		}
 	}
 }
@@ -1733,7 +1744,7 @@ static void __init build_sched_topology(void)
 
 	if (has_coregroup_support()) {
 		powerpc_topology[i++] =
-			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
+			SDTL_INIT(powerpc_tl_mc_mask, powerpc_shared_proc_flags, MC);
 	}
 
 	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
-- 
2.43.0

Thanks,
Yu

next prev parent reply	other threads:[~2026-05-26 14:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 14:07 [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-25 15:35 ` Chen, Yu C
2026-05-25 16:16   ` K Prateek Nayak
2026-05-26  3:14     ` Chen, Yu C
2026-05-26  3:14   ` Srikar Dronamraju
2026-05-26  4:08     ` Chen, Yu C
2026-05-26  4:58       ` Srikar Dronamraju
2026-05-26  5:53         ` K Prateek Nayak
2026-05-26 14:08           ` Chen Yu [this message]
2026-05-27  7:01             ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask Shrikanth Hegde
2026-05-27 16:05               ` Chen, Yu C
2026-05-27 18:07                 ` Shrikanth Hegde
2026-05-28  4:58                   ` Shrikanth Hegde
2026-05-28  9:12                     ` Chen, Yu C
2026-05-28 10:26                       ` Shrikanth Hegde
2026-05-28 15:54                       ` Srikar Dronamraju
2026-05-28 15:58                   ` Srikar Dronamraju
2026-05-27 16:30               ` K Prateek Nayak
2026-05-26  5:24       ` [BUG] sched/cache: "Make LLC id continuous" causes NULL cpumask dereference in build_sched_domains on POWER9 Venkat Rao Bagalkote
2026-05-27  7:05         ` Shrikanth Hegde
2026-05-28 16:01           ` Srikar Dronamraju
2026-05-28  6:54 ` Ritesh Harjani
2026-05-28 16:06   ` Srikar Dronamraju
2026-05-28 11:27 ` Shrikanth Hegde
2026-05-28 13:21   ` Chen, Yu C
2026-05-28 15:06   ` Ritesh Harjani
2026-05-28 15:56   ` Srikar Dronamraju
2026-05-28 16:31     ` Shrikanth Hegde
2026-05-28 16:44       ` Srikar Dronamraju
2026-05-29  3:58 ` Shrikanth Hegde
2026-05-29  6:59   ` Venkat Rao Bagalkote

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260526140856.139657-1-yu.c.chen@intel.com \
    --to=yu.c.chen@intel.com \
    --cc=chleroy@kernel.org \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sched@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riteshh@linux.ibm.com \
    --cc=srikar@linux.ibm.com \
    --cc=sshegde@linux.ibm.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=venkat88@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox