linux-s390.vger.kernel.org archive mirror
* [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
@ 2025-08-26  4:13 K Prateek Nayak
  2025-08-26  4:13 ` [PATCH v7 1/8] " K Prateek Nayak
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

This version uses Peter's suggestion from [1] as is and incrementally
adds cleanups on top for the arch/ bits. I've tested the x86 side, but
the PowerPC and s390 bits are only build tested. Review and feedback
are greatly appreciated.

[1] https://lore.kernel.org/lkml/20250825091910.GT3245006@noisy.programming.kicks-ass.net/

Patches are prepared on top of tip:master at commit 4628e5bbca91 ("Merge
branch into tip/master: 'x86/tdx'")
---
changelog v6..v7:

o Fix the s390 and ppc build errors (Intel test robot)

o Use Peter's diff as is and incrementally do the cleanup on top. The
  PowerPC part was slightly more extensive due to the lack of
  CONFIG_SCHED_MC in arch/powerpc/Kconfig.

v6: https://lore.kernel.org/lkml/20250825120244.11093-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (7):
  powerpc/smp: Rename cpu_coregroup_* to cpu_corgrp_*
  powerpc/smp: Export cpu_coregroup_mask()
  powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  sched/topology: Unify tl_smt_mask() across core and all arch
  sched/topology: Unify tl_cls_mask() across core and x86
  sched/topology: Unify tl_mc_mask() across core and all arch
  sched/topology: Unify tl_pkg_mask() across core and all arch

Peter Zijlstra (1):
  sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()

 arch/powerpc/Kconfig           |  9 ++++++
 arch/powerpc/include/asm/smp.h |  4 +++
 arch/powerpc/kernel/smp.c      | 51 +++++++++++++++++++---------------
 arch/s390/kernel/topology.c    | 16 ++++-------
 arch/x86/kernel/smpboot.c      |  9 +++---
 include/linux/sched/topology.h | 34 ++++++++++++++++++++---
 include/linux/topology.h       |  2 +-
 kernel/sched/topology.c        | 28 +++++++------------
 8 files changed, 93 insertions(+), 60 deletions(-)


base-commit: 4628e5bbca916edaf4ed55915ab399f9ba25519f
-- 
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v7 1/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-28 23:06   ` Tim Chen
  2025-08-26  4:13 ` [PATCH v7 2/8] powerpc/smp: Rename cpu_coregroup_* to cpu_corgrp_* K Prateek Nayak
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)

From: Peter Zijlstra <peterz@infradead.org>

Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
their testing, starting from v6.16-rc1. The debugging that followed
pointed to tl->mask() for the NODE domain being incorrectly resolved
to that of the highest NUMA domain.

tl->mask() for NODE is set to sd_numa_mask(), which depends on the
global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
set to "tl->numa_level" during the tl traversal in
build_sched_domains() calling sd_init(), but it was not reset before
topology_span_sane().

Since "sched_domains_curr_level" still reflected the old value from
build_sched_domains(), topology_span_sane() for the NODE domain trips
when the span of the last NUMA domain overlaps.

Instead of replicating the "sched_domains_curr_level" hack, get rid of
it entirely and instead pass the whole "sched_domain_topology_level"
object to the tl->cpumask() function to prevent such mishaps in the
future.

sd_numa_mask() now directly references "tl->numa_level" instead of
relying on the global "sched_domains_curr_level" hack to index into
sched_domains_numa_masks[].

The original warning was reproducible on the following NUMA topology
reported by Leon:

    $ sudo numactl -H
    available: 5 nodes (0-4)
    node 0 cpus: 0 1
    node 0 size: 2927 MB
    node 0 free: 1603 MB
    node 1 cpus: 2 3
    node 1 size: 3023 MB
    node 1 free: 3008 MB
    node 2 cpus: 4 5
    node 2 size: 3023 MB
    node 2 free: 3007 MB
    node 3 cpus: 6 7
    node 3 size: 3023 MB
    node 3 free: 3002 MB
    node 4 cpus: 8 9
    node 4 size: 3022 MB
    node 4 free: 2718 MB
    node distances:
    node   0   1   2   3   4
      0:  10  39  38  37  36
      1:  39  10  38  37  36
      2:  38  38  10  37  36
      3:  37  37  37  10  36
      4:  36  36  36  36  10

The above topology can be mimicked using the following QEMU command,
which was used to reproduce the warning and test the fix:

     sudo qemu-system-x86_64 -enable-kvm -cpu host \
     -m 20G -smp cpus=10,sockets=10 -machine q35 \
     -object memory-backend-ram,size=4G,id=m0 \
     -object memory-backend-ram,size=4G,id=m1 \
     -object memory-backend-ram,size=4G,id=m2 \
     -object memory-backend-ram,size=4G,id=m3 \
     -object memory-backend-ram,size=4G,id=m4 \
     -numa node,cpus=0-1,memdev=m0,nodeid=0 \
     -numa node,cpus=2-3,memdev=m1,nodeid=1 \
     -numa node,cpus=4-5,memdev=m2,nodeid=2 \
     -numa node,cpus=6-7,memdev=m3,nodeid=3 \
     -numa node,cpus=8-9,memdev=m4,nodeid=4 \
     -numa dist,src=0,dst=1,val=39 \
     -numa dist,src=0,dst=2,val=38 \
     -numa dist,src=0,dst=3,val=37 \
     -numa dist,src=0,dst=4,val=36 \
     -numa dist,src=1,dst=0,val=39 \
     -numa dist,src=1,dst=2,val=38 \
     -numa dist,src=1,dst=3,val=37 \
     -numa dist,src=1,dst=4,val=36 \
     -numa dist,src=2,dst=0,val=38 \
     -numa dist,src=2,dst=1,val=38 \
     -numa dist,src=2,dst=3,val=37 \
     -numa dist,src=2,dst=4,val=36 \
     -numa dist,src=3,dst=0,val=37 \
     -numa dist,src=3,dst=1,val=37 \
     -numa dist,src=3,dst=2,val=37 \
     -numa dist,src=3,dst=4,val=36 \
     -numa dist,src=4,dst=0,val=36 \
     -numa dist,src=4,dst=1,val=36 \
     -numa dist,src=4,dst=2,val=36 \
     -numa dist,src=4,dst=3,val=36 \
     ...

  [ prateek: Fixed build issues on s390 and ppc, put everything behind
    the respective CONFIG_SCHED_* ]

Reported-by: Leon Romanovsky <leon@kernel.org>
Closes: https://lore.kernel.org/lkml/20250610110701.GA256154@unreal/ [1]
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap") # ce29a7da84cd, f55dac1dafb3
Link: https://lore.kernel.org/lkml/a3de98387abad28592e6ab591f3ff6107fe01dc1.1755893468.git.tim.c.chen@linux.intel.com/ [2]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c      | 26 +++++++++++-----
 arch/s390/kernel/topology.c    | 20 +++++++++----
 arch/x86/kernel/smpboot.c      | 30 ++++++++++++++++---
 include/linux/sched/topology.h |  4 ++-
 include/linux/topology.h       |  2 +-
 kernel/sched/topology.c        | 54 ++++++++++++++++++++++------------
 6 files changed, 99 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f59e4b9cc207..862f50c09539 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1028,16 +1028,21 @@ static int powerpc_shared_proc_flags(void)
  * We can't just pass cpu_l2_cache_mask() directly because
  * returns a non-const pointer and the compiler barfs on that.
  */
-static const struct cpumask *shared_cache_mask(int cpu)
+static const struct cpumask *shared_cache_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return per_cpu(cpu_l2_cache_map, cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
+static const struct cpumask *smallcore_smt_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_smallcore_mask(cpu);
 }
+
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
 #endif
 
 static struct cpumask *cpu_coregroup_mask(int cpu)
@@ -1054,11 +1059,16 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
-static const struct cpumask *cpu_mc_mask(int cpu)
+static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_coregroup_mask(cpu);
 }
 
+static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 static int __init init_big_cores(void)
 {
 	int cpu;
@@ -1448,7 +1458,7 @@ static bool update_mask_by_l2(int cpu, cpumask_var_t *mask)
 		return false;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update l2-cache mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
@@ -1538,7 +1548,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 		return;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update coregroup mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
@@ -1601,7 +1611,7 @@ static void add_cpu_to_masks(int cpu)
 
 	/* If chip_id is -1; limit the cpu_core_mask to within PKG */
 	if (chip_id == -1)
-		cpumask_and(mask, mask, cpu_cpu_mask(cpu));
+		cpumask_and(mask, mask, cpu_node_mask(cpu));
 
 	for_each_cpu(i, mask) {
 		if (chip_id == cpu_to_chip_id(i)) {
@@ -1703,7 +1713,7 @@ static void __init build_sched_topology(void)
 		powerpc_topology[i++] =
 			SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
 	} else {
-		powerpc_topology[i++] = SDTL_INIT(cpu_smt_mask, powerpc_smt_flags, SMT);
+		powerpc_topology[i++] = SDTL_INIT(tl_smt_mask, powerpc_smt_flags, SMT);
 	}
 #endif
 	if (shared_caches) {
@@ -1716,7 +1726,7 @@ static void __init build_sched_topology(void)
 			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
 	}
 
-	powerpc_topology[i++] = SDTL_INIT(cpu_cpu_mask, powerpc_shared_proc_flags, PKG);
+	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
 
 	/* There must be one trailing NULL entry left.  */
 	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 46569b8e47dd..5129e3ffa7f5 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -509,7 +509,7 @@ int topology_cpu_init(struct cpu *cpu)
 	return rc;
 }
 
-static const struct cpumask *cpu_thread_mask(int cpu)
+static const struct cpumask *cpu_thread_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].thread_mask;
 }
@@ -520,22 +520,32 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return &cpu_topology[cpu].core_mask;
 }
 
-static const struct cpumask *cpu_book_mask(int cpu)
+static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return &cpu_topology[cpu].core_mask;
+}
+
+static const struct cpumask *cpu_book_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].book_mask;
 }
 
-static const struct cpumask *cpu_drawer_mask(int cpu)
+static const struct cpumask *cpu_drawer_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].drawer_mask;
 }
 
+static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 static struct sched_domain_topology_level s390_topology[] = {
 	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
 	SDTL_INIT(cpu_book_mask, NULL, BOOK),
 	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..4cd3d69741cf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -463,14 +463,36 @@ static int x86_core_flags(void)
 {
 	return cpu_core_flags() | x86_sched_itmt_flags();
 }
+
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
 #endif
+
 #ifdef CONFIG_SCHED_CLUSTER
 static int x86_cluster_flags(void)
 {
 	return cpu_cluster_flags() | x86_sched_itmt_flags();
 }
+static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
 #endif
 
+static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 /*
  * Set if a package/die has multiple NUMA nodes inside.
  * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
@@ -479,14 +501,14 @@ static int x86_cluster_flags(void)
 static bool x86_has_numa_in_package;
 
 static struct sched_domain_topology_level x86_topology[] = {
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, x86_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, x86_cluster_flags, CLS),
 #endif
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, x86_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, x86_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
+	SDTL_INIT(tl_pkg_mask, x86_sched_itmt_flags, PKG),
 	{ NULL },
 };
 
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 5263746b63e8..602508130c8a 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -30,6 +30,8 @@ struct sd_flag_debug {
 };
 extern const struct sd_flag_debug sd_flag_debug[];
 
+struct sched_domain_topology_level;
+
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
@@ -172,7 +174,7 @@ bool cpus_equal_capacity(int this_cpu, int that_cpu);
 bool cpus_share_cache(int this_cpu, int that_cpu);
 bool cpus_share_resources(int this_cpu, int that_cpu);
 
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
+typedef const struct cpumask *(*sched_domain_mask_f)(struct sched_domain_topology_level *tl, int cpu);
 typedef int (*sched_domain_flags_f)(void);
 
 struct sd_data {
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 33b7fda97d39..6575af39fd10 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -260,7 +260,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 
 #endif
 
-static inline const struct cpumask *cpu_cpu_mask(int cpu)
+static inline const struct cpumask *cpu_node_mask(int cpu)
 {
 	return cpumask_of_node(cpu_to_node(cpu));
 }
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 977e133bb8a4..dfc754e0668c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1591,7 +1591,6 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 enum numa_topology_type sched_numa_topology_type;
 
 static int			sched_domains_numa_levels;
-static int			sched_domains_curr_level;
 
 int				sched_max_numa_distance;
 static int			*sched_domains_numa_distance;
@@ -1632,14 +1631,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;
 
-#ifdef CONFIG_NUMA
-	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
-	 */
-	sched_domains_curr_level = tl->numa_level;
-#endif
-
-	sd_weight = cpumask_weight(tl->mask(cpu));
+	sd_weight = cpumask_weight(tl->mask(tl, cpu));
 
 	if (tl->sd_flags)
 		sd_flags = (*tl->sd_flags)();
@@ -1677,7 +1669,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	};
 
 	sd_span = sched_domain_span(sd);
-	cpumask_and(sd_span, cpu_map, tl->mask(cpu));
+	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
 	sd_id = cpumask_first(sd_span);
 
 	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
@@ -1732,22 +1724,48 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_CLUSTER
+static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
+#endif
+
+static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 /*
  * Topology list, bottom-up.
  */
 static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, cpu_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, cpu_cluster_flags, CLS),
 #endif
 
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(tl_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
@@ -1769,9 +1787,9 @@ void __init set_sched_topology(struct sched_domain_topology_level *tl)
 
 #ifdef CONFIG_NUMA
 
-static const struct cpumask *sd_numa_mask(int cpu)
+static const struct cpumask *sd_numa_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
+	return sched_domains_numa_masks[tl->numa_level][cpu_to_node(cpu)];
 }
 
 static void sched_numa_warn(const char *str)
@@ -2411,7 +2429,7 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
 		 * breaks the linking done for an earlier span.
 		 */
 		for_each_cpu(cpu, cpu_map) {
-			const struct cpumask *tl_cpu_mask = tl->mask(cpu);
+			const struct cpumask *tl_cpu_mask = tl->mask(tl, cpu);
 			int id;
 
 			/* lowest bit set in this mask is used as a unique id */
@@ -2419,7 +2437,7 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
 
 			if (cpumask_test_cpu(id, id_seen)) {
 				/* First CPU has already been seen, ensure identical spans */
-				if (!cpumask_equal(tl->mask(id), tl_cpu_mask))
+				if (!cpumask_equal(tl->mask(tl, id), tl_cpu_mask))
 					return false;
 			} else {
 				/* First CPU hasn't been seen before, ensure it's a completely new span */
-- 
2.34.1



* [PATCH v7 2/8] powerpc/smp: Rename cpu_coregroup_* to cpu_corgrp_*
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
  2025-08-26  4:13 ` [PATCH v7 1/8] " K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  5:02   ` Christophe Leroy
  2025-08-26  4:13 ` [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask() K Prateek Nayak
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)

Rename cpu_coregroup_{map,mask} to cpu_corgrp_{map,mask} to free up
the cpu_coregroup_* namespace. cpu_coregroup_mask() will be added back
in a subsequent commit for CONFIG_SCHED_MC enablement.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 862f50c09539..4f48262658cc 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -87,7 +87,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
-static DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
+static DEFINE_PER_CPU(cpumask_var_t, cpu_corgrp_map);
 
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
@@ -1045,9 +1045,9 @@ static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl,
 }
 #endif
 
-static struct cpumask *cpu_coregroup_mask(int cpu)
+static struct cpumask *cpu_corgrp_mask(int cpu)
 {
-	return per_cpu(cpu_coregroup_map, cpu);
+	return per_cpu(cpu_corgrp_map, cpu);
 }
 
 static bool has_coregroup_support(void)
@@ -1061,7 +1061,7 @@ static bool has_coregroup_support(void)
 
 static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-	return cpu_coregroup_mask(cpu);
+	return cpu_corgrp_mask(cpu);
 }
 
 static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
@@ -1124,7 +1124,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
 					GFP_KERNEL, cpu_to_node(cpu));
 		if (has_coregroup_support())
-			zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
+			zalloc_cpumask_var_node(&per_cpu(cpu_corgrp_map, cpu),
 						GFP_KERNEL, cpu_to_node(cpu));
 
 #ifdef CONFIG_NUMA
@@ -1145,7 +1145,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
 	if (has_coregroup_support())
-		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+		cpumask_set_cpu(boot_cpuid, cpu_corgrp_mask(boot_cpuid));
 
 	init_big_cores();
 	if (has_big_cores) {
@@ -1510,8 +1510,8 @@ static void remove_cpu_from_masks(int cpu)
 		set_cpus_unrelated(cpu, i, cpu_core_mask);
 
 	if (has_coregroup_support()) {
-		for_each_cpu(i, cpu_coregroup_mask(cpu))
-			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
+		for_each_cpu(i, cpu_corgrp_mask(cpu))
+			set_cpus_unrelated(cpu, i, cpu_corgrp_mask);
 	}
 }
 #endif
@@ -1543,7 +1543,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 	if (!*mask) {
 		/* Assume only siblings are part of this CPU's coregroup */
 		for_each_cpu(i, submask_fn(cpu))
-			set_cpus_related(cpu, i, cpu_coregroup_mask);
+			set_cpus_related(cpu, i, cpu_corgrp_mask);
 
 		return;
 	}
@@ -1551,18 +1551,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update coregroup mask with all the CPUs that are part of submask */
-	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
+	or_cpumasks_related(cpu, cpu, submask_fn, cpu_corgrp_mask);
 
 	/* Skip all CPUs already part of coregroup mask */
-	cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
+	cpumask_andnot(*mask, *mask, cpu_corgrp_mask(cpu));
 
 	for_each_cpu(i, *mask) {
 		/* Skip all CPUs not part of this coregroup */
 		if (coregroup_id == cpu_to_coregroup_id(i)) {
-			or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
+			or_cpumasks_related(cpu, i, submask_fn, cpu_corgrp_mask);
 			cpumask_andnot(*mask, *mask, submask_fn(i));
 		} else {
-			cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
+			cpumask_andnot(*mask, *mask, cpu_corgrp_mask(i));
 		}
 	}
 }
-- 
2.34.1



* [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask()
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
  2025-08-26  4:13 ` [PATCH v7 1/8] " K Prateek Nayak
  2025-08-26  4:13 ` [PATCH v7 2/8] powerpc/smp: Rename cpu_coregroup_* to cpu_corgrp_* K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  4:54   ` Christophe Leroy
  2025-08-26  4:13 ` [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits K Prateek Nayak
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)

Define cpu_coregroup_mask() to export the per-cpu cpu_corgrp_map when
coregroups are supported. When has_coregroup_support() returns false,
cpu_coregroup_mask() returns the mask used by the PKG domain.

Since this will only be used after CONFIG_SCHED_MC is added for PowerPC,
no functional changes are intended at this point.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/include/asm/smp.h | 2 ++
 arch/powerpc/kernel/smp.c      | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index b77927ccb0ab..86de4d0dd0aa 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -148,6 +148,8 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
 }
 #endif /* CONFIG_SCHED_SMT */
 
+extern const struct cpumask *cpu_coregroup_mask(int cpu);
+
 /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
  *
  * Make sure this matches openpic_request_IPIs in open_pic.c, or what shows up
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 4f48262658cc..e623f2864dc4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1059,6 +1059,14 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	if (has_coregroup_support())
+		return per_cpu(cpu_corgrp_map, cpu);
+
+	return cpu_node_mask(cpu);
+}
+
 static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_corgrp_mask(cpu);
-- 
2.34.1



* [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (2 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask() K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  4:49   ` Christophe Leroy
  2025-08-26  9:27   ` Shrikanth Hegde
  2025-08-26  4:13 ` [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch K Prateek Nayak
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)

PowerPC enables the MC scheduling domain by default on systems with
coregroup support without having a SCHED_MC config in Kconfig.

The scheduler uses CONFIG_SCHED_MC to introduce the MC domain in the
default topology (core) and to optimize the default CPU selection
routine (sched-ext).

Introduce CONFIG_SCHED_MC for powerpc and note that it should
preferably be enabled, given the current default behavior. This also
ensures PowerPC is exercised when future developments come to depend
on CONFIG_SCHED_MC.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/Kconfig           | 9 +++++++++
 arch/powerpc/include/asm/smp.h | 2 ++
 arch/powerpc/kernel/smp.c      | 4 ++++
 3 files changed, 15 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 93402a1d9c9f..e954ab3f635f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -971,6 +971,15 @@ config SCHED_SMT
 	  when dealing with POWER5 cpus at a cost of slightly increased
 	  overhead in some places. If unsure say N here.
 
+config SCHED_MC
+	bool "Multi-Core Cache (MC) scheduler support"
+	depends on PPC64 && SMP
+	default y
+	help
+	  MC scheduler support improves the CPU scheduler's decision making
+	  when dealing with POWER systems that contain multiple Last Level
+	  Cache instances on the same socket. If unsure say Y here.
+
 config PPC_DENORMALISATION
 	bool "PowerPC denormalisation exception handling"
 	depends on PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 86de4d0dd0aa..9a320d96e891 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -148,7 +148,9 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
 }
 #endif /* CONFIG_SCHED_SMT */
 
+#ifdef CONFIG_SCHED_MC
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
+#endif
 
 /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
  *
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e623f2864dc4..7f79b853b221 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1059,6 +1059,7 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
+#ifdef CONFIG_SCHED_MC
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
 	if (has_coregroup_support())
@@ -1071,6 +1072,7 @@ static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl,
 {
 	return cpu_corgrp_mask(cpu);
 }
+#endif
 
 static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
 {
@@ -1729,10 +1731,12 @@ static void __init build_sched_topology(void)
 			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
 	}
 
+#ifdef CONFIG_SCHED_MC
 	if (has_coregroup_support()) {
 		powerpc_topology[i++] =
 			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
 	}
+#endif
 
 	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (3 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  5:13   ` Christophe Leroy
  2025-08-26  8:01   ` Peter Zijlstra
  2025-08-26  4:13 ` [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86 K Prateek Nayak
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

Unify the tl_smt_mask() wrapper around cpu_smt_mask() across core, x86,
powerpc, and s390.

On s390, topology_sibling_cpumask() returns
&cpu_topology[cpu].thread_mask, and the generic include/linux/topology.h
defines cpu_smt_mask() as a wrapper around topology_sibling_cpumask()
when the arch/ bits do not define their own.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c      | 5 -----
 arch/s390/kernel/topology.c    | 8 +-------
 arch/x86/kernel/smpboot.c      | 5 -----
 include/linux/sched/topology.h | 8 +++++++-
 kernel/sched/topology.c        | 7 -------
 5 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 7f79b853b221..c58ddf84fe63 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1038,11 +1038,6 @@ static const struct cpumask *smallcore_smt_mask(struct sched_domain_topology_lev
 {
 	return cpu_smallcore_mask(cpu);
 }
-
-static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_smt_mask(cpu);
-}
 #endif
 
 static struct cpumask *cpu_corgrp_mask(int cpu)
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 5129e3ffa7f5..c88eda847309 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -509,12 +509,6 @@ int topology_cpu_init(struct cpu *cpu)
 	return rc;
 }
 
-static const struct cpumask *cpu_thread_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return &cpu_topology[cpu].thread_mask;
-}
-
-
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
 	return &cpu_topology[cpu].core_mask;
@@ -541,7 +535,7 @@ static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl
 }
 
 static struct sched_domain_topology_level s390_topology[] = {
-	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
 	SDTL_INIT(cpu_book_mask, NULL, BOOK),
 	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4cd3d69741cf..03ff6270966a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -463,11 +463,6 @@ static int x86_core_flags(void)
 {
 	return cpu_core_flags() | x86_sched_itmt_flags();
 }
-
-static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_smt_mask(cpu);
-}
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 602508130c8a..d75fbb7d9667 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -37,7 +37,13 @@ static inline int cpu_smt_flags(void)
 {
 	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
 }
-#endif
+
+static const __maybe_unused
+struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
+#endif /* CONFIG_SCHED_SMT */
 
 #ifdef CONFIG_SCHED_CLUSTER
 static inline int cpu_cluster_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index dfc754e0668c..92165fe56a2d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_smt_mask(cpu);
-}
-#endif
-
 #ifdef CONFIG_SCHED_CLUSTER
 static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (4 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  5:14   ` Christophe Leroy
  2025-08-26  4:13 ` [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch K Prateek Nayak
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

Unify the tl_cls_mask() used by both the scheduler core and x86.
No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/x86/kernel/smpboot.c      | 4 ----
 include/linux/sched/topology.h | 8 +++++++-
 kernel/sched/topology.c        | 7 -------
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 03ff6270966a..81a40d777d65 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -470,10 +470,6 @@ static int x86_cluster_flags(void)
 {
 	return cpu_cluster_flags() | x86_sched_itmt_flags();
 }
-static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_clustergroup_mask(cpu);
-}
 #endif
 
 #ifdef CONFIG_SCHED_MC
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index d75fbb7d9667..e54501cc8e47 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -50,7 +50,13 @@ static inline int cpu_cluster_flags(void)
 {
 	return SD_CLUSTER | SD_SHARE_LLC;
 }
-#endif
+
+static const __maybe_unused
+struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
+#endif /* CONFIG_SCHED_CLUSTER */
 
 #ifdef CONFIG_SCHED_MC
 static inline int cpu_core_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 92165fe56a2d..4530cbad41e1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
-#ifdef CONFIG_SCHED_CLUSTER
-static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_clustergroup_mask(cpu);
-}
-#endif
-
 #ifdef CONFIG_SCHED_MC
 static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (5 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86 K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  5:15   ` Christophe Leroy
  2025-08-26  4:13 ` [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() " K Prateek Nayak
  2025-08-26 10:05 ` [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() Shrikanth Hegde
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

Unify the tl_mc_mask() wrapper around cpu_coregroup_mask() used by core,
x86, powerpc, and s390.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c      | 7 +------
 arch/s390/kernel/topology.c    | 7 +------
 arch/x86/kernel/smpboot.c      | 7 -------
 include/linux/sched/topology.h | 8 +++++++-
 kernel/sched/topology.c        | 7 -------
 5 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c58ddf84fe63..40719679385b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1062,11 +1062,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 
 	return cpu_node_mask(cpu);
 }
-
-static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_corgrp_mask(cpu);
-}
 #endif
 
 static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
@@ -1729,7 +1724,7 @@ static void __init build_sched_topology(void)
 #ifdef CONFIG_SCHED_MC
 	if (has_coregroup_support()) {
 		powerpc_topology[i++] =
-			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
+			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
 	}
 #endif
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index c88eda847309..8dbf32f362e1 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -514,11 +514,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return &cpu_topology[cpu].core_mask;
 }
 
-static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return &cpu_topology[cpu].core_mask;
-}
-
 static const struct cpumask *cpu_book_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].book_mask;
@@ -536,7 +531,7 @@ static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl
 
 static struct sched_domain_topology_level s390_topology[] = {
 	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
-	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 	SDTL_INIT(cpu_book_mask, NULL, BOOK),
 	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
 	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 81a40d777d65..bfbcac9a73d1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -472,13 +472,6 @@ static int x86_cluster_flags(void)
 }
 #endif
 
-#ifdef CONFIG_SCHED_MC
-static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_coregroup_mask(cpu);
-}
-#endif
-
 static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_node_mask(cpu);
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index e54501cc8e47..075d1f063668 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -63,7 +63,13 @@ static inline int cpu_core_flags(void)
 {
 	return SD_SHARE_LLC;
 }
-#endif
+
+static const __maybe_unused
+struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
+#endif /* CONFIG_SCHED_MC */
 
 #ifdef CONFIG_NUMA
 static inline int cpu_numa_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 4530cbad41e1..77d14430c5e1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
-#ifdef CONFIG_SCHED_MC
-static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_coregroup_mask(cpu);
-}
-#endif
-
 static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_node_mask(cpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() across core and all arch
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (6 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch K Prateek Nayak
@ 2025-08-26  4:13 ` K Prateek Nayak
  2025-08-26  5:16   ` Christophe Leroy
  2025-08-26 10:05 ` [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() Shrikanth Hegde
  8 siblings, 1 reply; 36+ messages in thread
From: K Prateek Nayak @ 2025-08-26  4:13 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

Unify the tl_pkg_mask() wrapper around cpu_node_mask() across core, x86,
powerpc, and s390.

No functional changes intended.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 arch/powerpc/kernel/smp.c      | 7 +------
 arch/s390/kernel/topology.c    | 7 +------
 arch/x86/kernel/smpboot.c      | 5 -----
 include/linux/sched/topology.h | 6 ++++++
 kernel/sched/topology.c        | 5 -----
 5 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 40719679385b..8e869c13f7ed 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1064,11 +1064,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 }
 #endif
 
-static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_node_mask(cpu);
-}
-
 static int __init init_big_cores(void)
 {
 	int cpu;
@@ -1728,7 +1723,7 @@ static void __init build_sched_topology(void)
 	}
 #endif
 
-	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
+	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
 
 	/* There must be one trailing NULL entry left.  */
 	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 8dbf32f362e1..8f5b6ecc055f 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -524,17 +524,12 @@ static const struct cpumask *cpu_drawer_mask(struct sched_domain_topology_level
 	return &cpu_topology[cpu].drawer_mask;
 }
 
-static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_node_mask(cpu);
-}
-
 static struct sched_domain_topology_level s390_topology[] = {
 	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 	SDTL_INIT(cpu_book_mask, NULL, BOOK),
 	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
-	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
+	SDTL_INIT(tl_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bfbcac9a73d1..6c0ab30a80e2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -472,11 +472,6 @@ static int x86_cluster_flags(void)
 }
 #endif
 
-static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_node_mask(cpu);
-}
-
 /*
  * Set if a package/die has multiple NUMA nodes inside.
  * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 075d1f063668..807603bfe8ff 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -71,6 +71,12 @@ struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 }
 #endif /* CONFIG_SCHED_MC */
 
+static const __maybe_unused
+struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 #ifdef CONFIG_NUMA
 static inline int cpu_numa_flags(void)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 77d14430c5e1..18889bd97e22 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1724,11 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
-static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
-{
-	return cpu_node_mask(cpu);
-}
-
 /*
  * Topology list, bottom-up.
  */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  4:13 ` [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits K Prateek Nayak
@ 2025-08-26  4:49   ` Christophe Leroy
  2025-08-26  8:07     ` Peter Zijlstra
  2025-08-26  9:27   ` Shrikanth Hegde
  1 sibling, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  4:49 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> PowerPC enables the MC scheduling domain by default on systems with
> coregroup support without having a SCHED_MC config in Kconfig.
> 
> The scheduler uses CONFIG_SCHED_MC to introduce the MC domain in the
> default topology (core) and to optimize the default CPU selection
> routine (sched-ext).
> 
> Introduce CONFIG_SCHED_MC for powerpc and note that it should
> preferably be enabled given the current default behavior. This also
> ensures PowerPC is exercised by future developments that come to depend
> on CONFIG_SCHED_MC.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/Kconfig           | 9 +++++++++
>   arch/powerpc/include/asm/smp.h | 2 ++
>   arch/powerpc/kernel/smp.c      | 4 ++++
>   3 files changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 93402a1d9c9f..e954ab3f635f 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -971,6 +971,15 @@ config SCHED_SMT
>   	  when dealing with POWER5 cpus at a cost of slightly increased
>   	  overhead in some places. If unsure say N here.
>   
> +config SCHED_MC
> +	bool "Multi-Core Cache (MC) scheduler support"
> +	depends on PPC64 && SMP
> +	default y
> +	help
> +	  MC scheduler support improves the CPU scheduler's decision making
> +	  when dealing with POWER systems that contain multiple Last Level
> +	  Cache instances on the same socket. If unsure say Y here.
> +

You shouldn't duplicate CONFIG_SCHED_MC in every architecture, instead 
you should define a CONFIG_ARCH_HAS_SCHED_MC in arch/Kconfig that gets 
selected by architectures then have CONFIG_SCHED_MC defined in 
init/Kconfig or kernel/Kconfig or so.

>   config PPC_DENORMALISATION
>   	bool "PowerPC denormalisation exception handling"
>   	depends on PPC_BOOK3S_64
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 86de4d0dd0aa..9a320d96e891 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -148,7 +148,9 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
>   }
>   #endif /* CONFIG_SCHED_SMT */
>   
> +#ifdef CONFIG_SCHED_MC
>   extern const struct cpumask *cpu_coregroup_mask(int cpu);
> +#endif

Why do you need this #ifdef? Leaving it outside the #ifdef allows you to
do constructs like:

	if (IS_ENABLED(CONFIG_SCHED_MC))
		cpu_coregroup_mask(cpu);

Otherwise you'll need to ensure all calls to cpu_coregroup_mask() are 
also inside #ifdefs, which is not the recommended way nowadays.

>   
>   /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
>    *
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index e623f2864dc4..7f79b853b221 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1059,6 +1059,7 @@ static bool has_coregroup_support(void)
>   	return coregroup_enabled;
>   }
>   
> +#ifdef CONFIG_SCHED_MC
>   const struct cpumask *cpu_coregroup_mask(int cpu)
>   {
>   	if (has_coregroup_support())
> @@ -1071,6 +1072,7 @@ static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl,
>   {
>   	return cpu_corgrp_mask(cpu);
>   }
> +#endif
>   
>   static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
> @@ -1729,10 +1731,12 @@ static void __init build_sched_topology(void)
>   			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
>   	}
>   
> +#ifdef CONFIG_SCHED_MC

As I said above, define the function prototype at all times in smp.h and
use IS_ENABLED(CONFIG_SCHED_MC) here instead of an #ifdef.

>   	if (has_coregroup_support()) {
>   		powerpc_topology[i++] =
>   			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
>   	}
> +#endif
>   
>   	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
>   


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask()
  2025-08-26  4:13 ` [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask() K Prateek Nayak
@ 2025-08-26  4:54   ` Christophe Leroy
  0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  4:54 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Define cpu_coregroup_mask() to export the per-cpu cpu_corgrp_map when
> coregroups are supported. When has_coregroup_support() returns false,
> cpu_coregroup_mask() returns the mask used by the PKG domain.
> 
> Since this will only be used after CONFIG_SCHED_MC is added for PowerPC,
> no functional changes are intended at this point.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/include/asm/smp.h | 2 ++
>   arch/powerpc/kernel/smp.c      | 8 ++++++++
>   2 files changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index b77927ccb0ab..86de4d0dd0aa 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -148,6 +148,8 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
>   }
>   #endif /* CONFIG_SCHED_SMT */
>   
> +extern const struct cpumask *cpu_coregroup_mask(int cpu);

'extern' keyword is pointless for function prototypes, remove it.

See report from checkpatch:

$ ./scripts/checkpatch.pl --strict -g a064a30c52a5
CHECK: extern prototypes should be avoided in .h files
#28: FILE: arch/powerpc/include/asm/smp.h:151:
+extern const struct cpumask *cpu_coregroup_mask(int cpu);

total: 0 errors, 0 warnings, 1 checks, 22 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
       mechanically convert to the typical style using --fix or --fix-inplace.

Commit a064a30c52a5 ("powerpc/smp: Export cpu_coregroup_mask()") has 
style problems, please review.

NOTE: If any of the errors are false positives, please report
       them to the maintainer, see CHECKPATCH in MAINTAINERS.



> +
>   /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
>    *
>    * Make sure this matches openpic_request_IPIs in open_pic.c, or what shows up
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 4f48262658cc..e623f2864dc4 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1059,6 +1059,14 @@ static bool has_coregroup_support(void)
>   	return coregroup_enabled;
>   }
>   
> +const struct cpumask *cpu_coregroup_mask(int cpu)
> +{
> +	if (has_coregroup_support())
> +		return per_cpu(cpu_corgrp_map, cpu);
> +
> +	return cpu_node_mask(cpu);
> +}
> +
>   static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
>   	return cpu_corgrp_mask(cpu);


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 2/8] powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_*
  2025-08-26  4:13 ` [PATCH v7 2/8] powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_* K Prateek Nayak
@ 2025-08-26  5:02   ` Christophe Leroy
  2025-09-01  3:05     ` K Prateek Nayak
  0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  5:02 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Rename cpu_coregroup_{map,mask} to cpu_corgrp_{map,mask} to free up the
> cpu_coregroup_* namespace. cpu_coregroup_mask() will be added back in the
> subsequent commit for CONFIG_SCHED_MC enablement.

This renaming seems odd and incomplete. For instance,
update_coregroup_mask() should probably be renamed as well, shouldn't it?

When you say cpu_coregroup_mask() will be added back, do you mean the
same function or a completely different function with the same name?

What's really the difference between corgrp and coregroup?

Shouldn't has_coregroup_support() also be renamed to has_corgrp_support() now?

Christophe

> 
> No functional changes intended.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/kernel/smp.c | 26 +++++++++++++-------------
>   1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 862f50c09539..4f48262658cc 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -87,7 +87,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>   DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
>   DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
>   DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
> -static DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
> +static DEFINE_PER_CPU(cpumask_var_t, cpu_corgrp_map);
>   
>   EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
>   EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
> @@ -1045,9 +1045,9 @@ static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl,
>   }
>   #endif
>   
> -static struct cpumask *cpu_coregroup_mask(int cpu)
> +static struct cpumask *cpu_corgrp_mask(int cpu)
>   {
> -	return per_cpu(cpu_coregroup_map, cpu);
> +	return per_cpu(cpu_corgrp_map, cpu);
>   }
>   
>   static bool has_coregroup_support(void)
> @@ -1061,7 +1061,7 @@ static bool has_coregroup_support(void)
>   
>   static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
> -	return cpu_coregroup_mask(cpu);
> +	return cpu_corgrp_mask(cpu);
>   }
>   
>   static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> @@ -1124,7 +1124,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
>   					GFP_KERNEL, cpu_to_node(cpu));
>   		if (has_coregroup_support())
> -			zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
> +			zalloc_cpumask_var_node(&per_cpu(cpu_corgrp_map, cpu),
>   						GFP_KERNEL, cpu_to_node(cpu));
>   
>   #ifdef CONFIG_NUMA
> @@ -1145,7 +1145,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
>   
>   	if (has_coregroup_support())
> -		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> +		cpumask_set_cpu(boot_cpuid, cpu_corgrp_mask(boot_cpuid));
>   
>   	init_big_cores();
>   	if (has_big_cores) {
> @@ -1510,8 +1510,8 @@ static void remove_cpu_from_masks(int cpu)
>   		set_cpus_unrelated(cpu, i, cpu_core_mask);
>   
>   	if (has_coregroup_support()) {
> -		for_each_cpu(i, cpu_coregroup_mask(cpu))
> -			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
> +		for_each_cpu(i, cpu_corgrp_mask(cpu))
> +			set_cpus_unrelated(cpu, i, cpu_corgrp_mask);
>   	}
>   }
>   #endif
> @@ -1543,7 +1543,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
>   	if (!*mask) {
>   		/* Assume only siblings are part of this CPU's coregroup */
>   		for_each_cpu(i, submask_fn(cpu))
> -			set_cpus_related(cpu, i, cpu_coregroup_mask);
> +			set_cpus_related(cpu, i, cpu_corgrp_mask);
>   
>   		return;
>   	}
> @@ -1551,18 +1551,18 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
>   	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
>   
>   	/* Update coregroup mask with all the CPUs that are part of submask */
> -	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
> +	or_cpumasks_related(cpu, cpu, submask_fn, cpu_corgrp_mask);
>   
>   	/* Skip all CPUs already part of coregroup mask */
> -	cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
> +	cpumask_andnot(*mask, *mask, cpu_corgrp_mask(cpu));
>   
>   	for_each_cpu(i, *mask) {
>   		/* Skip all CPUs not part of this coregroup */
>   		if (coregroup_id == cpu_to_coregroup_id(i)) {
> -			or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
> +			or_cpumasks_related(cpu, i, submask_fn, cpu_corgrp_mask);
>   			cpumask_andnot(*mask, *mask, submask_fn(i));
>   		} else {
> -			cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
> +			cpumask_andnot(*mask, *mask, cpu_corgrp_mask(i));
>   		}
>   	}
>   }


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch
  2025-08-26  4:13 ` [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch K Prateek Nayak
@ 2025-08-26  5:13   ` Christophe Leroy
  2025-08-26  8:01   ` Peter Zijlstra
  1 sibling, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  5:13 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Unify the tl_smt_mask() wrapper around cpu_smt_mask() across core, x86,
> ppc, and s390.
> 
> On s390, include/linux/topology.c defines an explicit cpu_smt_mask()
> wrapper around topology_sibling_cpumask() when cpu_smt_mask() is not
> defined by the arch/ bits and topology_sibling_cpumask() on s390 returns
> &cpu_topology[cpu].thread_mask.
> 
> No functional changes intended.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/kernel/smp.c      | 5 -----
>   arch/s390/kernel/topology.c    | 8 +-------
>   arch/x86/kernel/smpboot.c      | 5 -----
>   include/linux/sched/topology.h | 8 +++++++-
>   kernel/sched/topology.c        | 7 -------
>   5 files changed, 8 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 7f79b853b221..c58ddf84fe63 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1038,11 +1038,6 @@ static const struct cpumask *smallcore_smt_mask(struct sched_domain_topology_lev
>   {
>   	return cpu_smallcore_mask(cpu);
>   }
> -
> -static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_smt_mask(cpu);
> -}
>   #endif
>   
>   static struct cpumask *cpu_corgrp_mask(int cpu)
> diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
> index 5129e3ffa7f5..c88eda847309 100644
> --- a/arch/s390/kernel/topology.c
> +++ b/arch/s390/kernel/topology.c
> @@ -509,12 +509,6 @@ int topology_cpu_init(struct cpu *cpu)
>   	return rc;
>   }
>   
> -static const struct cpumask *cpu_thread_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return &cpu_topology[cpu].thread_mask;
> -}
> -
> -
>   const struct cpumask *cpu_coregroup_mask(int cpu)
>   {
>   	return &cpu_topology[cpu].core_mask;
> @@ -541,7 +535,7 @@ static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl
>   }
>   
>   static struct sched_domain_topology_level s390_topology[] = {
> -	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
> +	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
>   	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
>   	SDTL_INIT(cpu_book_mask, NULL, BOOK),
>   	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 4cd3d69741cf..03ff6270966a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -463,11 +463,6 @@ static int x86_core_flags(void)
>   {
>   	return cpu_core_flags() | x86_sched_itmt_flags();
>   }
> -
> -static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_smt_mask(cpu);
> -}
>   #endif
>   
>   #ifdef CONFIG_SCHED_CLUSTER
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 602508130c8a..d75fbb7d9667 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -37,7 +37,13 @@ static inline int cpu_smt_flags(void)
>   {
>   	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
>   }
> -#endif
> +
> +static const __maybe_unused
> +struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)

__maybe_unused just sweeps the dust under the carpet.

Leave the function in kernel/sched/topology.c and make it non-static, 
with a prototype in linux/sched/topology.h.

> +{
> +	return cpu_smt_mask(cpu);
> +}
> +#endif /* CONFIG_SCHED_SMT */
>   
>   #ifdef CONFIG_SCHED_CLUSTER
>   static inline int cpu_cluster_flags(void)
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index dfc754e0668c..92165fe56a2d 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   	return sd;
>   }
>   
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_smt_mask(cpu);
> -}
> -#endif
> -
>   #ifdef CONFIG_SCHED_CLUSTER
>   static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
>   {


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86
  2025-08-26  4:13 ` [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86 K Prateek Nayak
@ 2025-08-26  5:14   ` Christophe Leroy
  0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  5:14 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Unify the tl_cls_mask() used by both the scheduler core and x86.
> No functional changes intended.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/x86/kernel/smpboot.c      | 4 ----
>   include/linux/sched/topology.h | 8 +++++++-
>   kernel/sched/topology.c        | 7 -------
>   3 files changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 03ff6270966a..81a40d777d65 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -470,10 +470,6 @@ static int x86_cluster_flags(void)
>   {
>   	return cpu_cluster_flags() | x86_sched_itmt_flags();
>   }
> -static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_clustergroup_mask(cpu);
> -}
>   #endif
>   
>   #ifdef CONFIG_SCHED_MC
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index d75fbb7d9667..e54501cc8e47 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -50,7 +50,13 @@ static inline int cpu_cluster_flags(void)
>   {
>   	return SD_CLUSTER | SD_SHARE_LLC;
>   }
> -#endif
> +
> +static const __maybe_unused

Same as the previous patch: don't sweep the dust under the carpet. If you 
need __maybe_unused, it means there is a problem with the construct.

> +struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return cpu_clustergroup_mask(cpu);
> +}
> +#endif /* CONFIG_SCHED_CLUSTER */
>   
>   #ifdef CONFIG_SCHED_MC
>   static inline int cpu_core_flags(void)
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 92165fe56a2d..4530cbad41e1 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   	return sd;
>   }
>   
> -#ifdef CONFIG_SCHED_CLUSTER
> -static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_clustergroup_mask(cpu);
> -}
> -#endif
> -
>   #ifdef CONFIG_SCHED_MC
>   static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
>   {


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch
  2025-08-26  4:13 ` [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch K Prateek Nayak
@ 2025-08-26  5:15   ` Christophe Leroy
  0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  5:15 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Unify the tl_mc_mask() wrapper around cpu_coregroup_mask() used by core,
> x86, powerpc, and s390.
> 
> No functional changes intended.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/kernel/smp.c      | 7 +------
>   arch/s390/kernel/topology.c    | 7 +------
>   arch/x86/kernel/smpboot.c      | 7 -------
>   include/linux/sched/topology.h | 8 +++++++-
>   kernel/sched/topology.c        | 7 -------
>   5 files changed, 9 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index c58ddf84fe63..40719679385b 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1062,11 +1062,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>   
>   	return cpu_node_mask(cpu);
>   }
> -
> -static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_corgrp_mask(cpu);
> -}
>   #endif
>   
>   static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> @@ -1729,7 +1724,7 @@ static void __init build_sched_topology(void)
>   #ifdef CONFIG_SCHED_MC
>   	if (has_coregroup_support()) {
>   		powerpc_topology[i++] =
> -			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
> +			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
>   	}
>   #endif
>   
> diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
> index c88eda847309..8dbf32f362e1 100644
> --- a/arch/s390/kernel/topology.c
> +++ b/arch/s390/kernel/topology.c
> @@ -514,11 +514,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>   	return &cpu_topology[cpu].core_mask;
>   }
>   
> -static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return &cpu_topology[cpu].core_mask;
> -}
> -
>   static const struct cpumask *cpu_book_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
>   	return &cpu_topology[cpu].book_mask;
> @@ -536,7 +531,7 @@ static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl
>   
>   static struct sched_domain_topology_level s390_topology[] = {
>   	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
> -	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
> +	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
>   	SDTL_INIT(cpu_book_mask, NULL, BOOK),
>   	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
>   	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 81a40d777d65..bfbcac9a73d1 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -472,13 +472,6 @@ static int x86_cluster_flags(void)
>   }
>   #endif
>   
> -#ifdef CONFIG_SCHED_MC
> -static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_coregroup_mask(cpu);
> -}
> -#endif
> -
>   static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
>   	return cpu_node_mask(cpu);
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index e54501cc8e47..075d1f063668 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -63,7 +63,13 @@ static inline int cpu_core_flags(void)
>   {
>   	return SD_SHARE_LLC;
>   }
> -#endif
> +
> +static const __maybe_unused

Same as the two previous patches: __maybe_unused shouldn't be required.

> +struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return cpu_coregroup_mask(cpu);
> +}
> +#endif /* CONFIG_SCHED_MC */
>   
>   #ifdef CONFIG_NUMA
>   static inline int cpu_numa_flags(void)
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 4530cbad41e1..77d14430c5e1 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1724,13 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   	return sd;
>   }
>   
> -#ifdef CONFIG_SCHED_MC
> -static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_coregroup_mask(cpu);
> -}
> -#endif
> -
>   static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
>   	return cpu_node_mask(cpu);


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() across core and all arch
  2025-08-26  4:13 ` [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() " K Prateek Nayak
@ 2025-08-26  5:16   ` Christophe Leroy
  0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  5:16 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 06:13, K Prateek Nayak wrote:
> Unify the tl_pkg_mask() wrapper around cpu_nod_mask() across core, x86,
> powerpc, and s390.
> 
> No functional changes intended.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/kernel/smp.c      | 7 +------
>   arch/s390/kernel/topology.c    | 7 +------
>   arch/x86/kernel/smpboot.c      | 5 -----
>   include/linux/sched/topology.h | 6 ++++++
>   kernel/sched/topology.c        | 5 -----
>   5 files changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 40719679385b..8e869c13f7ed 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1064,11 +1064,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>   }
>   #endif
>   
> -static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_node_mask(cpu);
> -}
> -
>   static int __init init_big_cores(void)
>   {
>   	int cpu;
> @@ -1728,7 +1723,7 @@ static void __init build_sched_topology(void)
>   	}
>   #endif
>   
> -	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
> +	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
>   
>   	/* There must be one trailing NULL entry left.  */
>   	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
> diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
> index 8dbf32f362e1..8f5b6ecc055f 100644
> --- a/arch/s390/kernel/topology.c
> +++ b/arch/s390/kernel/topology.c
> @@ -524,17 +524,12 @@ static const struct cpumask *cpu_drawer_mask(struct sched_domain_topology_level
>   	return &cpu_topology[cpu].drawer_mask;
>   }
>   
> -static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_node_mask(cpu);
> -}
> -
>   static struct sched_domain_topology_level s390_topology[] = {
>   	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
>   	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
>   	SDTL_INIT(cpu_book_mask, NULL, BOOK),
>   	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
> -	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
> +	SDTL_INIT(tl_pkg_mask, NULL, PKG),
>   	{ NULL, },
>   };
>   
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index bfbcac9a73d1..6c0ab30a80e2 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -472,11 +472,6 @@ static int x86_cluster_flags(void)
>   }
>   #endif
>   
> -static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_node_mask(cpu);
> -}
> -
>   /*
>    * Set if a package/die has multiple NUMA nodes inside.
>    * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 075d1f063668..807603bfe8ff 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -71,6 +71,12 @@ struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
>   }
>   #endif /* CONFIG_SCHED_MC */
>   
> +static const __maybe_unused

Same as the three previous patches: __maybe_unused shouldn't be required.

> +struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return cpu_node_mask(cpu);
> +}
> +
>   #ifdef CONFIG_NUMA
>   static inline int cpu_numa_flags(void)
>   {
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 77d14430c5e1..18889bd97e22 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1724,11 +1724,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   	return sd;
>   }
>   
> -static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> -{
> -	return cpu_node_mask(cpu);
> -}
> -
>   /*
>    * Topology list, bottom-up.
>    */


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch
  2025-08-26  4:13 ` [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch K Prateek Nayak
  2025-08-26  5:13   ` Christophe Leroy
@ 2025-08-26  8:01   ` Peter Zijlstra
  2025-08-26  8:11     ` Christophe Leroy
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26  8:01 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, thomas.weissschuh,
	Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 602508130c8a..d75fbb7d9667 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -37,7 +37,13 @@ static inline int cpu_smt_flags(void)
>  {
>  	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
>  }
> -#endif
> +
> +static const __maybe_unused
> +struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return cpu_smt_mask(cpu);
> +}
> +#endif /* CONFIG_SCHED_SMT */

The problem with that __maybe_unused is that you forgot inline.

static inline const
struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
{
	return cpu_smt_mask(cpu);
}

seems to make it happy.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  4:49   ` Christophe Leroy
@ 2025-08-26  8:07     ` Peter Zijlstra
  2025-08-26  9:43       ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26  8:07 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Tue, Aug 26, 2025 at 06:49:29AM +0200, Christophe Leroy wrote:
> 
> 
> Le 26/08/2025 à 06:13, K Prateek Nayak a écrit :
> > PowerPC enables the MC scheduling domain by default on systems with
> > coregroup support without having a SCHED_MC config in Kconfig.
> > 
> > The scheduler uses CONFIG_SCHED_MC to introduce the MC domain in the
> > default topology (core) and to optimize the default CPU selection
> > routine (sched-ext).
> > 
> > Introduce CONFIG_SCHED_MC for powerpc and note that it should be
> > preferably enabled given the current default behavior. This also ensures
> > PowerPC is tested during future developments that come to depend on
> > CONFIG_SCHED_MC.
> > 
> > Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > ---
> >   arch/powerpc/Kconfig           | 9 +++++++++
> >   arch/powerpc/include/asm/smp.h | 2 ++
> >   arch/powerpc/kernel/smp.c      | 4 ++++
> >   3 files changed, 15 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 93402a1d9c9f..e954ab3f635f 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -971,6 +971,15 @@ config SCHED_SMT
> >   	  when dealing with POWER5 cpus at a cost of slightly increased
> >   	  overhead in some places. If unsure say N here.
> > +config SCHED_MC
> > +	bool "Multi-Core Cache (MC) scheduler support"
> > +	depends on PPC64 && SMP
> > +	default y
> > +	help
> > +	  MC scheduler support improves the CPU scheduler's decision making
> > +	  when dealing with POWER systems that contain multiple Last Level
> > +	  Cache instances on the same socket. If unsure say Y here.
> > +
> 
> You shouldn't duplicate CONFIG_SCHED_MC in every architecture, instead you
> should define a CONFIG_ARCH_HAS_SCHED_MC in arch/Kconfig that gets selected
> by architectures then have CONFIG_SCHED_MC defined in init/Kconfig or
> kernel/Kconfig or so.

Let me add this first -- it is currently duplicated. Then I'll see about
merging the thing across architectures.
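
For reference, the ARCH_HAS_SCHED_MC split suggested above would look roughly like the sketch below. The option names and file placement are assumptions for illustration; this is not the diff that was eventually merged.

```kconfig
# arch/Kconfig (hypothetical): an opt-in capability flag
config ARCH_HAS_SCHED_MC
	bool

# init/Kconfig (hypothetical): one shared definition for all architectures
config SCHED_MC
	bool "Multi-Core Cache (MC) scheduler support"
	depends on ARCH_HAS_SCHED_MC
	default y
	help
	  MC scheduler support improves the CPU scheduler's decision
	  making on systems with multiple Last Level Cache instances
	  per socket. If unsure say Y here.
```

Each architecture would then drop its private SCHED_MC entry and instead add something like `select ARCH_HAS_SCHED_MC if PPC64 && SMP` under its top-level Kconfig symbol.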

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch
  2025-08-26  8:01   ` Peter Zijlstra
@ 2025-08-26  8:11     ` Christophe Leroy
  2025-08-26  8:24       ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2025-08-26  8:11 UTC (permalink / raw)
  To: Peter Zijlstra, K Prateek Nayak
  Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes



On 26/08/2025 at 10:01, Peter Zijlstra wrote:
>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>> index 602508130c8a..d75fbb7d9667 100644
>> --- a/include/linux/sched/topology.h
>> +++ b/include/linux/sched/topology.h
>> @@ -37,7 +37,13 @@ static inline int cpu_smt_flags(void)
>>   {
>>   	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
>>   }
>> -#endif
>> +
>> +static const __maybe_unused
>> +struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
>> +{
>> +	return cpu_smt_mask(cpu);
>> +}
>> +#endif /* CONFIG_SCHED_SMT */
> 
> Problem with that __maybe_unused is that you forgot inline.
> 
> static inline const
> struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> {
> 	return cpu_smt_mask(cpu);
> }
> 
> seems to make it happy.
> 

But the function is referenced by the SDTL_INIT() macro, so there is no 
real point in declaring it inline. It would be cleaner to have it defined 
in a C file.

Christophe

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch
  2025-08-26  8:11     ` Christophe Leroy
@ 2025-08-26  8:24       ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26  8:24 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Tue, Aug 26, 2025 at 10:11:40AM +0200, Christophe Leroy wrote:
> 
> 
> Le 26/08/2025 à 10:01, Peter Zijlstra a écrit :
> > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > index 602508130c8a..d75fbb7d9667 100644
> > > --- a/include/linux/sched/topology.h
> > > +++ b/include/linux/sched/topology.h
> > > @@ -37,7 +37,13 @@ static inline int cpu_smt_flags(void)
> > >   {
> > >   	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
> > >   }
> > > -#endif
> > > +
> > > +static const __maybe_unused
> > > +struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> > > +{
> > > +	return cpu_smt_mask(cpu);
> > > +}
> > > +#endif /* CONFIG_SCHED_SMT */
> > 
> > Problem with that __maybe_unused is that you forgot inline.
> > 
> > static inline const
> > struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> > {
> > 	return cpu_smt_mask(cpu);
> > }
> > 
> > seems to make it happy.
> > 
> 
> But the function is referenced by SDTL_INIT() macro so there is no real
> point in declaring it inline. Would be cleaner to have it defined in a C
> file.

Ah, that's what you mean. I was more focused on getting rid of that
horrible __maybe_unused, and either fix works for that. But yes, perhaps
just having them in a .c file is best.
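
Concretely, the shape both reviewers converge on here would be something like the following sketch, shown for tl_smt_mask() only; the same pattern would apply to tl_cls_mask(), tl_mc_mask(), and tl_pkg_mask(). This is an illustration of the suggestion, not necessarily the diff that was merged.

```c
/* include/linux/sched/topology.h */
#ifdef CONFIG_SCHED_SMT
const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu);
#endif

/* kernel/sched/topology.c: non-static so the arch topology tables can
 * reference it from their SDTL_INIT() entries. */
#ifdef CONFIG_SCHED_SMT
const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
{
	return cpu_smt_mask(cpu);
}
#endif
```

This avoids __maybe_unused entirely: the symbol has exactly one out-of-line definition, and translation units that never reference it simply don't emit a call.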

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  4:13 ` [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits K Prateek Nayak
  2025-08-26  4:49   ` Christophe Leroy
@ 2025-08-26  9:27   ` Shrikanth Hegde
  2025-09-01  4:50     ` K Prateek Nayak
  1 sibling, 1 reply; 36+ messages in thread
From: Shrikanth Hegde @ 2025-08-26  9:27 UTC (permalink / raw)
  To: K Prateek Nayak, Andrea Righi
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Yicong Yang,
	Ricardo Neri, Tim Chen, Vinicius Costa Gomes, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Peter Zijlstra



On 8/26/25 9:43 AM, K Prateek Nayak wrote:
> PowerPC enables the MC scheduling domain by default on systems with
> coregroup support without having a SCHED_MC config in Kconfig.
> 
> The scheduler uses CONFIG_SCHED_MC to introduce the MC domain in the
> default topology (core) and to optimize the default CPU selection
> routine (sched-ext).

Curious to know about the sched_ext usage. I see the below code.

if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc))

scx_selcpu_topo_llc is set to true if there is an sd_llc. One can have an LLC domain without an MC domain.
I am wondering what's the reason behind the clubbing.

> 
> Introduce CONFIG_SCHED_MC for powerpc and note that it should be
> preferably enabled given the current default behavior. This also ensures
> PowerPC is tested during future developments that come to depend on
> CONFIG_SCHED_MC.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>   arch/powerpc/Kconfig           | 9 +++++++++
>   arch/powerpc/include/asm/smp.h | 2 ++
>   arch/powerpc/kernel/smp.c      | 4 ++++
>   3 files changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 93402a1d9c9f..e954ab3f635f 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -971,6 +971,15 @@ config SCHED_SMT
>   	  when dealing with POWER5 cpus at a cost of slightly increased
>   	  overhead in some places. If unsure say N here.
>   
> +config SCHED_MC
> +	bool "Multi-Core Cache (MC) scheduler support"
> +	depends on PPC64 && SMP
> +	default y
> +	help
> +	  MC scheduler support improves the CPU scheduler's decision making
> +	  when dealing with POWER systems that contain multiple Last Level
> +	  Cache instances on the same socket. If unsure say Y here.
> +
>   config PPC_DENORMALISATION
>   	bool "PowerPC denormalisation exception handling"
>   	depends on PPC_BOOK3S_64
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 86de4d0dd0aa..9a320d96e891 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -148,7 +148,9 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
>   }
>   #endif /* CONFIG_SCHED_SMT */
>   
> +#ifdef CONFIG_SCHED_MC
>   extern const struct cpumask *cpu_coregroup_mask(int cpu);
> +#endif
>   

Is ifdef necessary here?

>   /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
>    *
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index e623f2864dc4..7f79b853b221 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1059,6 +1059,7 @@ static bool has_coregroup_support(void)
>   	return coregroup_enabled;
>   }
>   
> +#ifdef CONFIG_SCHED_MC
>   const struct cpumask *cpu_coregroup_mask(int cpu)
>   {
>   	if (has_coregroup_support())
> @@ -1071,6 +1072,7 @@ static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl,
>   {
>   	return cpu_corgrp_mask(cpu);
>   }
> +#endif
>   

The previous patch says cpu_coregroup_mask() is exported. Is it exported in any way to userspace or modules?

Also I don't see similar gating in other archs. It may be unnecessary.

>   static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
>   {
> @@ -1729,10 +1731,12 @@ static void __init build_sched_topology(void)
>   			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
>   	}
>   
> +#ifdef CONFIG_SCHED_MC
>   	if (has_coregroup_support()) {
>   		powerpc_topology[i++] =
>   			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
>   	}
> +#endif

Just this gating should suffice IMO.
> 
>   	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
>   


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  8:07     ` Peter Zijlstra
@ 2025-08-26  9:43       ` Peter Zijlstra
  2025-08-26  9:59         ` Peter Zijlstra
  2025-08-28 14:43         ` Shrikanth Hegde
  0 siblings, 2 replies; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26  9:43 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Tue, Aug 26, 2025 at 10:07:06AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 26, 2025 at 06:49:29AM +0200, Christophe Leroy wrote:
> > 
> > 
> > Le 26/08/2025 à 06:13, K Prateek Nayak a écrit :
> > > PowerPC enables the MC scheduling domain by default on systems with
> > > coregroup support without having a SCHED_MC config in Kconfig.
> > > 
> > > The scheduler uses CONFIG_SCHED_MC to introduce the MC domain in the
> > > default topology (core) and to optimize the default CPU selection
> > > routine (sched-ext).
> > > 
> > > Introduce CONFIG_SCHED_MC for powerpc and note that it should be
> > > preferably enabled given the current default behavior. This also ensures
> > > PowerPC is tested during future developments that come to depend on
> > > CONFIG_SCHED_MC.
> > > 
> > > Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > > ---
> > >   arch/powerpc/Kconfig           | 9 +++++++++
> > >   arch/powerpc/include/asm/smp.h | 2 ++
> > >   arch/powerpc/kernel/smp.c      | 4 ++++
> > >   3 files changed, 15 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index 93402a1d9c9f..e954ab3f635f 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -971,6 +971,15 @@ config SCHED_SMT
> > >   	  when dealing with POWER5 cpus at a cost of slightly increased
> > >   	  overhead in some places. If unsure say N here.
> > > +config SCHED_MC
> > > +	bool "Multi-Core Cache (MC) scheduler support"
> > > +	depends on PPC64 && SMP
> > > +	default y
> > > +	help
> > > +	  MC scheduler support improves the CPU scheduler's decision making
> > > +	  when dealing with POWER systems that contain multiple Last Level
> > > +	  Cache instances on the same socket. If unsure say Y here.
> > > +
> > 
> > You shouldn't duplicate CONFIG_SCHED_MC in every architecture, instead you
> > should define a CONFIG_ARCH_HAS_SCHED_MC in arch/Kconfig that gets selected
> > by architectures then have CONFIG_SCHED_MC defined in init/Kconfig or
> > kernel/Kconfig or so.
> 
> Let me add this first -- it is currently duplicated. Then I'll see about
> merging the thing across architectures.

So what I added to power was:

config SCHED_MC
	def_bool y
	depends on PPC64 && SMP

because that is more or less the behaviour that was there, per the
existing SDTL_INIT().

---

Now, when I look at unifying those config options (there's a metric ton
of crap that's duplicated in the arch/*/Kconfig), I end up with something
like the below.

And while that isn't exact, it is the closest I could make it without
making a giant mess of things.

WDYT?

---
 Kconfig           |   38 ++++++++++++++++++++++++++++++++++++++
 arm/Kconfig       |   18 ++----------------
 arm64/Kconfig     |   26 +++-----------------------
 loongarch/Kconfig |   19 ++-----------------
 mips/Kconfig      |   16 ++--------------
 parisc/Kconfig    |    9 +--------
 powerpc/Kconfig   |   15 +++------------
 riscv/Kconfig     |    9 +--------
 s390/Kconfig      |    8 ++------
 sparc/Kconfig     |   20 ++------------------
 x86/Kconfig       |   27 ++++-----------------------
 11 files changed, 60 insertions(+), 145 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -41,6 +41,44 @@ config HOTPLUG_SMT
 config SMT_NUM_THREADS_DYNAMIC
 	bool
 
+config ARCH_SUPPORTS_SCHED_SMT
+	bool
+
+config ARCH_SUPPORTS_SCHED_CLUSTER
+	bool
+
+config ARCH_SUPPORTS_SCHED_MC
+	bool
+
+config SCHED_SMT
+	bool "SMT (Hyperthreading) scheduler support"
+	depends on ARCH_SUPPORTS_SCHED_SMT
+	default y
+	help
+	  Improves the CPU scheduler's decision making when dealing with
+	  MultiThreading at a cost of slightly increased overhead in some
+	  places. If unsure say N here.
+
+config SCHED_CLUSTER
+	bool "Cluster scheduler support"
+	depends on ARCH_SUPPORTS_SCHED_CLUSTER
+	default y
+	help
+	  Cluster scheduler support improves the CPU scheduler's decision
+	  making when dealing with machines that have clusters of CPUs.
+	  Cluster usually means a couple of CPUs which are placed closely
+	  by sharing mid-level caches, last-level cache tags or internal
+	  busses.
+
+config SCHED_MC
+	bool "Multi-Core Cache (MC) scheduler support"
+	depends on ARCH_SUPPORTS_SCHED_MC
+	default y
+	help
+	  Multi-core scheduler support improves the CPU scheduler's decision
+	  making when dealing with multi-core CPU chips at a cost of slightly
+	  increased overhead in some places. If unsure say N here.
+
 # Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
 config HOTPLUG_CORE_SYNC
 	bool
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -941,28 +941,14 @@ config IRQSTACKS
 config ARM_CPU_TOPOLOGY
 	bool "Support cpu topology definition"
 	depends on SMP && CPU_V7
+	select ARCH_SUPPORTS_SCHED_MC
+	select ARCH_SUPPORTS_SCHED_SMT
 	default y
 	help
 	  Support ARM cpu topology definition. The MPIDR register defines
 	  affinity between processors which is then used to describe the cpu
 	  topology of an ARM System.
 
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	depends on ARM_CPU_TOPOLOGY
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
-config SCHED_SMT
-	bool "SMT scheduler support"
-	depends on ARM_CPU_TOPOLOGY
-	help
-	  Improves the CPU scheduler's decision making when dealing with
-	  MultiThreading at a cost of slightly increased overhead in some
-	  places. If unsure say N here.
-
 config HAVE_ARM_SCU
 	bool
 	help
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -108,6 +108,9 @@ config ARM64
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_RT
+	select ARCH_SUPPORTS_SCHED_SMT
+	select ARCH_SUPPORTS_SCHED_CLUSTER
+	select ARCH_SUPPORTS_SCHED_MC
 	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
@@ -1505,29 +1508,6 @@ config CPU_LITTLE_ENDIAN
 
 endchoice
 
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
-config SCHED_CLUSTER
-	bool "Cluster scheduler support"
-	help
-	  Cluster scheduler support improves the CPU scheduler's decision
-	  making when dealing with machines that have clusters of CPUs.
-	  Cluster usually means a couple of CPUs which are placed closely
-	  by sharing mid-level caches, last-level cache tags or internal
-	  busses.
-
-config SCHED_SMT
-	bool "SMT scheduler support"
-	help
-	  Improves the CPU scheduler's decision making when dealing with
-	  MultiThreading at a cost of slightly increased overhead in some
-	  places. If unsure say N here.
-
 config NR_CPUS
 	int "Maximum number of CPUs (2-4096)"
 	range 2 4096
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -70,6 +70,8 @@ config LOONGARCH
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_RT
+	select ARCH_SUPPORTS_SCHED_SMT if SMP
+	select ARCH_SUPPORTS_SCHED_MC  if SMP
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_MEMTEST
@@ -448,23 +450,6 @@ config EFI_STUB
 	  This kernel feature allows the kernel to be loaded directly by
 	  EFI firmware without the use of a bootloader.
 
-config SCHED_SMT
-	bool "SMT scheduler support"
-	depends on SMP
-	default y
-	help
-	  Improves scheduler's performance when there are multiple
-	  threads in one physical core.
-
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	depends on SMP
-	default y
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places.
-
 config SMP
 	bool "Multi-Processing support"
 	help
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2223,7 +2223,7 @@ config MIPS_MT_SMP
 	select SMP
 	select SMP_UP
 	select SYS_SUPPORTS_SMP
-	select SYS_SUPPORTS_SCHED_SMT
+	select ARCH_SUPPORTS_SCHED_SMT
 	select MIPS_PERF_SHARED_TC_COUNTERS
 	help
 	  This is a kernel model which is known as SMVP. This is supported
@@ -2235,18 +2235,6 @@ config MIPS_MT_SMP
 config MIPS_MT
 	bool
 
-config SCHED_SMT
-	bool "SMT (multithreading) scheduler support"
-	depends on SYS_SUPPORTS_SCHED_SMT
-	default n
-	help
-	  SMT scheduler support improves the CPU scheduler's decision making
-	  when dealing with MIPS MT enabled cores at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
-config SYS_SUPPORTS_SCHED_SMT
-	bool
-
 config SYS_SUPPORTS_MULTITHREADING
 	bool
 
@@ -2318,7 +2306,7 @@ config MIPS_CPS
 	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
 	select SYS_SUPPORTS_HOTPLUG_CPU
-	select SYS_SUPPORTS_SCHED_SMT if CPU_MIPSR6
+	select ARCH_SUPPORTS_SCHED_SMT if CPU_MIPSR6
 	select SYS_SUPPORTS_SMP
 	select WEAK_ORDERING
 	select GENERIC_IRQ_MIGRATION if HOTPLUG_CPU
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -44,6 +44,7 @@ config PARISC
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_ARCH_TOPOLOGY if SMP
+	select ARCH_SUPPORTS_SCHED_MC if SMP && PA8X00
 	select GENERIC_CPU_DEVICES if !SMP
 	select GENERIC_LIB_DEVMEM_IS_ALLOWED
 	select SYSCTL_ARCH_UNALIGN_ALLOW
@@ -319,14 +320,6 @@ config SMP
 
 	  If you don't know what to do here, say N.
 
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	depends on GENERIC_ARCH_TOPOLOGY && PA8X00
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
 config IRQSTACKS
 	bool "Use separate kernel stacks when processing interrupts"
 	default y
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -170,6 +170,9 @@ config PPC
 	select ARCH_STACKWALK
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
+	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
+	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
+	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
 	select ARCH_USE_MEMTEST
@@ -963,18 +966,6 @@ config PPC_PROT_SAO_LPAR
 config PPC_COPRO_BASE
 	bool
 
-config SCHED_SMT
-	bool "SMT (Hyperthreading) scheduler support"
-	depends on PPC64 && SMP
-	help
-	  SMT scheduler support improves the CPU scheduler's decision making
-	  when dealing with POWER5 cpus at a cost of slightly increased
-	  overhead in some places. If unsure say N here.
-
-config SCHED_MC
-	def_bool y
-	depends on PPC64 && SMP
-
 config PPC_DENORMALISATION
 	bool "PowerPC denormalisation exception handling"
 	depends on PPC_BOOK3S_64
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -72,6 +72,7 @@ config RISCV
 	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
+	select ARCH_SUPPORTS_SCHED_MC if SMP
 	select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
@@ -453,14 +454,6 @@ config SMP
 
 	  If you don't know what to do here, say N.
 
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	depends on SMP
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
 config NR_CPUS
 	int "Maximum number of CPUs (2-512)"
 	depends on SMP
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -547,15 +547,11 @@ config NODES_SHIFT
 	depends on NUMA
 	default "1"
 
-config SCHED_SMT
-	def_bool n
-
-config SCHED_MC
-	def_bool n
-
 config SCHED_TOPOLOGY
 	def_bool y
 	prompt "Topology scheduler support"
+	select ARCH_SUPPORTS_SCHED_SMT
+	select ARCH_SUPPORTS_SCHED_MC
 	select SCHED_SMT
 	select SCHED_MC
 	help
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -110,6 +110,8 @@ config SPARC64
 	select HAVE_SETUP_PER_CPU_AREA
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK
 	select NEED_PER_CPU_PAGE_FIRST_CHUNK
+	select ARCH_SUPPORTS_SCHED_SMT if SMP
+	select ARCH_SUPPORTS_SCHED_MC  if SMP
 
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
@@ -288,24 +290,6 @@ if SPARC64 || COMPILE_TEST
 source "kernel/power/Kconfig"
 endif
 
-config SCHED_SMT
-	bool "SMT (Hyperthreading) scheduler support"
-	depends on SPARC64 && SMP
-	default y
-	help
-	  SMT scheduler support improves the CPU scheduler's decision making
-	  when dealing with SPARC cpus at a cost of slightly increased overhead
-	  in some places. If unsure say N here.
-
-config SCHED_MC
-	bool "Multi-core scheduler support"
-	depends on SPARC64 && SMP
-	default y
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
 config CMDLINE_BOOL
 	bool "Default bootloader kernel arguments"
 	depends on SPARC64
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -330,6 +330,10 @@ config X86
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
 	select ARCH_SUPPORTS_PT_RECLAIM		if X86_64
+	select ARCH_SUPPORTS_SCHED_SMT		if SMP
+	select SCHED_SMT			if SMP
+	select ARCH_SUPPORTS_SCHED_CLUSTER	if SMP
+	select ARCH_SUPPORTS_SCHED_MC		if SMP
 
 config INSTRUCTION_DECODER
 	def_bool y
@@ -1036,29 +1040,6 @@ config NR_CPUS
 	  This is purely to save memory: each supported CPU adds about 8KB
 	  to the kernel image.
 
-config SCHED_CLUSTER
-	bool "Cluster scheduler support"
-	depends on SMP
-	default y
-	help
-	  Cluster scheduler support improves the CPU scheduler's decision
-	  making when dealing with machines that have clusters of CPUs.
-	  Cluster usually means a couple of CPUs which are placed closely
-	  by sharing mid-level caches, last-level cache tags or internal
-	  busses.
-
-config SCHED_SMT
-	def_bool y if SMP
-
-config SCHED_MC
-	def_bool y
-	prompt "Multi-core scheduler support"
-	depends on SMP
-	help
-	  Multi-core scheduler support improves the CPU scheduler's decision
-	  making when dealing with multi-core CPU chips at a cost of slightly
-	  increased overhead in some places. If unsure say N here.
-
 config SCHED_MC_PRIO
 	bool "CPU core priorities scheduler support"
 	depends on SCHED_MC

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  9:43       ` Peter Zijlstra
@ 2025-08-26  9:59         ` Peter Zijlstra
  2025-08-28 14:43         ` Shrikanth Hegde
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26  9:59 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Tue, Aug 26, 2025 at 11:43:58AM +0200, Peter Zijlstra wrote:

> Now, when I look at unifying those config options (there's a metric ton
> of crap that's duplicated in the arch/*/Kconfig), I end up with something
> like the below.
> 
> And while that isn't exact, it is the closest I could make it without
> making a giant mess of things.
> 
> WDYT?

Anyway, enough tinkering with this for a little bit. Things are here:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core

For the robots to provide feedback :-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
                   ` (7 preceding siblings ...)
  2025-08-26  4:13 ` [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() " K Prateek Nayak
@ 2025-08-26 10:05 ` Shrikanth Hegde
  2025-08-26 10:13   ` Peter Zijlstra
  8 siblings, 1 reply; 36+ messages in thread
From: Shrikanth Hegde @ 2025-08-26 10:05 UTC (permalink / raw)
  To: K Prateek Nayak, Peter Zijlstra
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390



On 8/26/25 9:43 AM, K Prateek Nayak wrote:
> This version uses Peter's suggestion from [1] as is and incrementally
> adds cleanup on top to the arch/ bits. I've tested the x86 side but the
> PowerPC and the s390 bits are only build tested. Review and feedback are
> greatly appreciated.
> 
> [1] https://lore.kernel.org/lkml/20250825091910.GT3245006@noisy.programming.kicks-ass.net/
> 
> Patches are prepared on top of tip:master at commit 4628e5bbca91 ("Merge
> branch into tip/master: 'x86/tdx'")
> ---
> changelog v6..v7:
> 
> o Fix the s390 and ppc build errors (Intel test robot)
> 
> o Use Peter's diff as is and incrementally do the cleanup on top. The
>    PowerPC part was slightly more extensive due to the lack of
>    CONFIG_SCHED_MC in arch/powerpc/Kconfig.
> 
> v6: https://lore.kernel.org/lkml/20250825120244.11093-1-kprateek.nayak@amd.com/
> ---
> K Prateek Nayak (7):
>    powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_*
>    powerpc/smp: Export cpu_coregroup_mask()
>    powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
>    sched/topology: Unify tl_smt_mask() across core and all arch
>    sched/topology: Unify tl_cls_mask() across core and x86
>    sched/topology: Unify tl_mc_mask() across core and all arch
>    sched/topology: Unify tl_pkg_mask() across core and all arch
> 
> Peter Zijlstra (1):
>    sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
> 
Can the names be standardized to begin with tl_ ?

arch/powerpc/kernel/smp.c:			SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
arch/powerpc/kernel/smp.c:			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
arch/s390/kernel/topology.c:	SDTL_INIT(cpu_book_mask, NULL, BOOK),
arch/s390/kernel/topology.c:	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
kernel/sched/topology.c:	tl[i++] = SDTL_INIT(sd_numa_mask, NULL, NODE);
kernel/sched/topology.c:		tl[i] = SDTL_INIT(sd_numa_mask, cpu_numa_flags, NUMA);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-26 10:05 ` [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() Shrikanth Hegde
@ 2025-08-26 10:13   ` Peter Zijlstra
  2025-08-29  7:53     ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2025-08-26 10:13 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: K Prateek Nayak, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390

On Tue, Aug 26, 2025 at 03:35:03PM +0530, Shrikanth Hegde wrote:
> 
> 
> On 8/26/25 9:43 AM, K Prateek Nayak wrote:
> > This version uses Peter's suggestion from [1] as is and incrementally
> > adds cleanup on top to the arch/ bits. I've tested the x86 side but the
> > PowerPC and the s390 bits are only build tested. Review and feedback are
> > greatly appreciated.
> > 
> > [1] https://lore.kernel.org/lkml/20250825091910.GT3245006@noisy.programming.kicks-ass.net/
> > 
> > Patches are prepared on top of tip:master at commit 4628e5bbca91 ("Merge
> > branch into tip/master: 'x86/tdx'")
> > ---
> > changelog v6..v7:
> > 
> > o Fix the s390 and ppc build errors (Intel test robot)
> > 
> > o Use Peter's diff as is and incrementally do the cleanup on top. The
> >    PowerPC part was slightly more extensive due to the lack of
> >    CONFIG_SCHED_MC in arch/powerpc/Kconfig.
> > 
> > v6: https://lore.kernel.org/lkml/20250825120244.11093-1-kprateek.nayak@amd.com/
> > ---
> > K Prateek Nayak (7):
> >    powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_*
> >    powerpc/smp: Export cpu_coregroup_mask()
> >    powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
> >    sched/topology: Unify tl_smt_mask() across core and all arch
> >    sched/topology: Unify tl_cls_mask() across core and x86
> >    sched/topology: Unify tl_mc_mask() across core and all arch
> >    sched/topology: Unify tl_pkg_mask() across core and all arch
> > 
> > Peter Zijlstra (1):
> >    sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
> > 
> Can the names be standardized to begin with tl_ ?
> 
> arch/powerpc/kernel/smp.c:			SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
> arch/powerpc/kernel/smp.c:			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
> arch/s390/kernel/topology.c:	SDTL_INIT(cpu_book_mask, NULL, BOOK),
> arch/s390/kernel/topology.c:	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
> kernel/sched/topology.c:	tl[i++] = SDTL_INIT(sd_numa_mask, NULL, NODE);
> kernel/sched/topology.c:		tl[i] = SDTL_INIT(sd_numa_mask, cpu_numa_flags, NUMA);

Already done :-) My version looks like the below.

I picked up v6 yesterday and this morning started poking at the robot
fail reported before seeing this v7 thing.

Current pile lives here for the robots:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core

I thought about doing a /cpu_*_flags/tl_*_flags/ patch as well, but
figured this was enough for now.

---
Subject: sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 25 Aug 2025 12:02:44 +0000

Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
their testing, starting from v6.16-rc1. The debugging that followed
pointed to the tl->mask() for the NODE domain being incorrectly resolved
to that of the highest NUMA domain.

tl->mask() for NODE is set to the sd_numa_mask() which depends on the
global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
set to the "tl->numa_level" during tl traversal in build_sched_domains()
calling sd_init() but was not reset before topology_span_sane().

Since "tl->numa_level" still reflected the old value from
build_sched_domains(), topology_span_sane() for the NODE domain trips
when the span of the last NUMA domain overlaps.

Instead of replicating the "sched_domains_curr_level" hack, get rid of
it entirely and instead, pass the entire "sched_domain_topology_level"
object to tl->cpumask() function to prevent such mishap in the future.

sd_numa_mask() now directly references "tl->numa_level" instead of
relying on the global "sched_domains_curr_level" hack to index into
sched_domains_numa_masks[].

The original warning was reproducible on the following NUMA topology
reported by Leon:

    $ sudo numactl -H
    available: 5 nodes (0-4)
    node 0 cpus: 0 1
    node 0 size: 2927 MB
    node 0 free: 1603 MB
    node 1 cpus: 2 3
    node 1 size: 3023 MB
    node 1 free: 3008 MB
    node 2 cpus: 4 5
    node 2 size: 3023 MB
    node 2 free: 3007 MB
    node 3 cpus: 6 7
    node 3 size: 3023 MB
    node 3 free: 3002 MB
    node 4 cpus: 8 9
    node 4 size: 3022 MB
    node 4 free: 2718 MB
    node distances:
    node   0   1   2   3   4
      0:  10  39  38  37  36
      1:  39  10  38  37  36
      2:  38  38  10  37  36
      3:  37  37  37  10  36
      4:  36  36  36  36  10

The above topology can be mimicked using the following QEMU cmd that was
used to reproduce the warning and test the fix:

     sudo qemu-system-x86_64 -enable-kvm -cpu host \
     -m 20G -smp cpus=10,sockets=10 -machine q35 \
     -object memory-backend-ram,size=4G,id=m0 \
     -object memory-backend-ram,size=4G,id=m1 \
     -object memory-backend-ram,size=4G,id=m2 \
     -object memory-backend-ram,size=4G,id=m3 \
     -object memory-backend-ram,size=4G,id=m4 \
     -numa node,cpus=0-1,memdev=m0,nodeid=0 \
     -numa node,cpus=2-3,memdev=m1,nodeid=1 \
     -numa node,cpus=4-5,memdev=m2,nodeid=2 \
     -numa node,cpus=6-7,memdev=m3,nodeid=3 \
     -numa node,cpus=8-9,memdev=m4,nodeid=4 \
     -numa dist,src=0,dst=1,val=39 \
     -numa dist,src=0,dst=2,val=38 \
     -numa dist,src=0,dst=3,val=37 \
     -numa dist,src=0,dst=4,val=36 \
     -numa dist,src=1,dst=0,val=39 \
     -numa dist,src=1,dst=2,val=38 \
     -numa dist,src=1,dst=3,val=37 \
     -numa dist,src=1,dst=4,val=36 \
     -numa dist,src=2,dst=0,val=38 \
     -numa dist,src=2,dst=1,val=38 \
     -numa dist,src=2,dst=3,val=37 \
     -numa dist,src=2,dst=4,val=36 \
     -numa dist,src=3,dst=0,val=37 \
     -numa dist,src=3,dst=1,val=37 \
     -numa dist,src=3,dst=2,val=37 \
     -numa dist,src=3,dst=4,val=36 \
     -numa dist,src=4,dst=0,val=36 \
     -numa dist,src=4,dst=1,val=36 \
     -numa dist,src=4,dst=2,val=36 \
     -numa dist,src=4,dst=3,val=36 \
     ...

  [ prateek: Moved common functions to include/linux/sched/topology.h,
    reuse the common bits for s390 and ppc, commit message ]

Closes: https://lore.kernel.org/lkml/20250610110701.GA256154@unreal/ [1]
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap") # ce29a7da84cd, f55dac1dafb3
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/a3de98387abad28592e6ab591f3ff6107fe01dc1.1755893468.git.tim.c.chen@linux.intel.com/ [2]
---
 arch/powerpc/Kconfig                |    4 ++++
 arch/powerpc/include/asm/topology.h |    2 ++
 arch/powerpc/kernel/smp.c           |   27 +++++++++++----------------
 arch/s390/kernel/topology.c         |   20 +++++++-------------
 arch/x86/kernel/smpboot.c           |    8 ++++----
 include/linux/sched/topology.h      |   28 +++++++++++++++++++++++++++-
 include/linux/topology.h            |    2 +-
 kernel/sched/topology.c             |   28 ++++++++++------------------
 8 files changed, 66 insertions(+), 53 deletions(-)

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -971,6 +971,10 @@ config SCHED_SMT
 	  when dealing with POWER5 cpus at a cost of slightly increased
 	  overhead in some places. If unsure say N here.
 
+config SCHED_MC
+	def_bool y
+	depends on PPC64 && SMP
+
 config PPC_DENORMALISATION
 	bool "PowerPC denormalisation exception handling"
 	depends on PPC_BOOK3S_64
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -134,6 +134,8 @@ static inline int cpu_to_coregroup_id(in
 #ifdef CONFIG_PPC64
 #include <asm/smp.h>
 
+struct cpumask *cpu_coregroup_mask(int cpu);
+
 #define topology_physical_package_id(cpu)	(cpu_to_chip_id(cpu))
 
 #define topology_sibling_cpumask(cpu)	(per_cpu(cpu_sibling_map, cpu))
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1028,19 +1028,19 @@ static int powerpc_shared_proc_flags(voi
  * We can't just pass cpu_l2_cache_mask() directly because
 * it returns a non-const pointer and the compiler barfs on that.
  */
-static const struct cpumask *shared_cache_mask(int cpu)
+static const struct cpumask *tl_cache_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return per_cpu(cpu_l2_cache_map, cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
+static const struct cpumask *tl_smallcore_smt_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_smallcore_mask(cpu);
 }
 #endif
 
-static struct cpumask *cpu_coregroup_mask(int cpu)
+struct cpumask *cpu_coregroup_mask(int cpu)
 {
 	return per_cpu(cpu_coregroup_map, cpu);
 }
@@ -1054,11 +1054,6 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
-static const struct cpumask *cpu_mc_mask(int cpu)
-{
-	return cpu_coregroup_mask(cpu);
-}
-
 static int __init init_big_cores(void)
 {
 	int cpu;
@@ -1448,7 +1443,7 @@ static bool update_mask_by_l2(int cpu, c
 		return false;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update l2-cache mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
@@ -1538,7 +1533,7 @@ static void update_coregroup_mask(int cp
 		return;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update coregroup mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
@@ -1601,7 +1596,7 @@ static void add_cpu_to_masks(int cpu)
 
 	/* If chip_id is -1; limit the cpu_core_mask to within PKG */
 	if (chip_id == -1)
-		cpumask_and(mask, mask, cpu_cpu_mask(cpu));
+		cpumask_and(mask, mask, cpu_node_mask(cpu));
 
 	for_each_cpu(i, mask) {
 		if (chip_id == cpu_to_chip_id(i)) {
@@ -1701,22 +1696,22 @@ static void __init build_sched_topology(
 	if (has_big_cores) {
 		pr_info("Big cores detected but using small core scheduling\n");
 		powerpc_topology[i++] =
-			SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
+			SDTL_INIT(tl_smallcore_smt_mask, powerpc_smt_flags, SMT);
 	} else {
-		powerpc_topology[i++] = SDTL_INIT(cpu_smt_mask, powerpc_smt_flags, SMT);
+		powerpc_topology[i++] = SDTL_INIT(tl_smt_mask, powerpc_smt_flags, SMT);
 	}
 #endif
 	if (shared_caches) {
 		powerpc_topology[i++] =
-			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
+			SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
 	}
 
 	if (has_coregroup_support()) {
 		powerpc_topology[i++] =
-			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
+			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
 	}
 
-	powerpc_topology[i++] = SDTL_INIT(cpu_cpu_mask, powerpc_shared_proc_flags, PKG);
+	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
 
 	/* There must be one trailing NULL entry left.  */
 	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -509,33 +509,27 @@ int topology_cpu_init(struct cpu *cpu)
 	return rc;
 }
 
-static const struct cpumask *cpu_thread_mask(int cpu)
-{
-	return &cpu_topology[cpu].thread_mask;
-}
-
-
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
 	return &cpu_topology[cpu].core_mask;
 }
 
-static const struct cpumask *cpu_book_mask(int cpu)
+static const struct cpumask *tl_book_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].book_mask;
 }
 
-static const struct cpumask *cpu_drawer_mask(int cpu)
+static const struct cpumask *tl_drawer_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].drawer_mask;
 }
 
 static struct sched_domain_topology_level s390_topology[] = {
-	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
-	SDTL_INIT(cpu_book_mask, NULL, BOOK),
-	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
+	SDTL_INIT(tl_book_mask, NULL, BOOK),
+	SDTL_INIT(tl_drawer_mask, NULL, DRAWER),
+	SDTL_INIT(tl_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -479,14 +479,14 @@ static int x86_cluster_flags(void)
 static bool x86_has_numa_in_package;
 
 static struct sched_domain_topology_level x86_topology[] = {
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, x86_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, x86_cluster_flags, CLS),
 #endif
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, x86_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, x86_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
+	SDTL_INIT(tl_pkg_mask, x86_sched_itmt_flags, PKG),
 	{ NULL },
 };
 
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -30,11 +30,19 @@ struct sd_flag_debug {
 };
 extern const struct sd_flag_debug sd_flag_debug[];
 
+struct sched_domain_topology_level;
+
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
 	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
 }
+
+static inline const
+struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
@@ -42,6 +50,12 @@ static inline int cpu_cluster_flags(void
 {
 	return SD_CLUSTER | SD_SHARE_LLC;
 }
+
+static inline const
+struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
 #endif
 
 #ifdef CONFIG_SCHED_MC
@@ -49,8 +63,20 @@ static inline int cpu_core_flags(void)
 {
 	return SD_SHARE_LLC;
 }
+
+static inline const
+struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
 #endif
 
+static inline const
+struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 #ifdef CONFIG_NUMA
 static inline int cpu_numa_flags(void)
 {
@@ -172,7 +198,7 @@ bool cpus_equal_capacity(int this_cpu, i
 bool cpus_share_cache(int this_cpu, int that_cpu);
 bool cpus_share_resources(int this_cpu, int that_cpu);
 
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
+typedef const struct cpumask *(*sched_domain_mask_f)(struct sched_domain_topology_level *tl, int cpu);
 typedef int (*sched_domain_flags_f)(void);
 
 struct sd_data {
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -260,7 +260,7 @@ static inline bool topology_is_primary_t
 
 #endif
 
-static inline const struct cpumask *cpu_cpu_mask(int cpu)
+static inline const struct cpumask *cpu_node_mask(int cpu)
 {
 	return cpumask_of_node(cpu_to_node(cpu));
 }
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1591,7 +1591,6 @@ static void claim_allocations(int cpu, s
 enum numa_topology_type sched_numa_topology_type;
 
 static int			sched_domains_numa_levels;
-static int			sched_domains_curr_level;
 
 int				sched_max_numa_distance;
 static int			*sched_domains_numa_distance;
@@ -1632,14 +1631,7 @@ sd_init(struct sched_domain_topology_lev
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;
 
-#ifdef CONFIG_NUMA
-	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
-	 */
-	sched_domains_curr_level = tl->numa_level;
-#endif
-
-	sd_weight = cpumask_weight(tl->mask(cpu));
+	sd_weight = cpumask_weight(tl->mask(tl, cpu));
 
 	if (tl->sd_flags)
 		sd_flags = (*tl->sd_flags)();
@@ -1677,7 +1669,7 @@ sd_init(struct sched_domain_topology_lev
 	};
 
 	sd_span = sched_domain_span(sd);
-	cpumask_and(sd_span, cpu_map, tl->mask(cpu));
+	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
 	sd_id = cpumask_first(sd_span);
 
 	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
@@ -1737,17 +1729,17 @@ sd_init(struct sched_domain_topology_lev
  */
 static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, cpu_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, cpu_cluster_flags, CLS),
 #endif
 
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(tl_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
@@ -1769,9 +1761,9 @@ void __init set_sched_topology(struct sc
 
 #ifdef CONFIG_NUMA
 
-static const struct cpumask *sd_numa_mask(int cpu)
+static const struct cpumask *sd_numa_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
+	return sched_domains_numa_masks[tl->numa_level][cpu_to_node(cpu)];
 }
 
 static void sched_numa_warn(const char *str)
@@ -2411,7 +2403,7 @@ static bool topology_span_sane(const str
 		 * breaks the linking done for an earlier span.
 		 */
 		for_each_cpu(cpu, cpu_map) {
-			const struct cpumask *tl_cpu_mask = tl->mask(cpu);
+			const struct cpumask *tl_cpu_mask = tl->mask(tl, cpu);
 			int id;
 
 			/* lowest bit set in this mask is used as a unique id */
@@ -2419,7 +2411,7 @@ static bool topology_span_sane(const str
 
 			if (cpumask_test_cpu(id, id_seen)) {
 				/* First CPU has already been seen, ensure identical spans */
-				if (!cpumask_equal(tl->mask(id), tl_cpu_mask))
+				if (!cpumask_equal(tl->mask(tl, id), tl_cpu_mask))
 					return false;
 			} else {
 				/* First CPU hasn't been seen before, ensure it's a completely new span */


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  9:43       ` Peter Zijlstra
  2025-08-26  9:59         ` Peter Zijlstra
@ 2025-08-28 14:43         ` Shrikanth Hegde
  2025-09-01  8:35           ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Shrikanth Hegde @ 2025-08-28 14:43 UTC (permalink / raw)
  To: Peter Zijlstra, Christophe Leroy
  Cc: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, thomas.weissschuh, Li Chen,
	Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes


Hi Peter.

Looking at this,
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=9d710c5b2bb37cedf5f09ce988884fb5795e1a76

> 
> WDYT?
> 
> ---
>   Kconfig           |   38 ++++++++++++++++++++++++++++++++++++++
>   arm/Kconfig       |   18 ++----------------
>   arm64/Kconfig     |   26 +++-----------------------
>   loongarch/Kconfig |   19 ++-----------------
>   mips/Kconfig      |   16 ++--------------
>   parisc/Kconfig    |    9 +--------
>   powerpc/Kconfig   |   15 +++------------
>   riscv/Kconfig     |    9 +--------
>   s390/Kconfig      |    8 ++------
>   sparc/Kconfig     |   20 ++------------------
>   x86/Kconfig       |   27 ++++-----------------------
>   11 files changed, 60 insertions(+), 145 deletions(-)
> 
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -41,6 +41,44 @@ config HOTPLUG_SMT
>   config SMT_NUM_THREADS_DYNAMIC
>   	bool
>   
> +config ARCH_SUPPORTS_SCHED_SMT
> +	bool
> +
> +config ARCH_SUPPORTS_SCHED_CLUSTER
> +	bool
> +
> +config ARCH_SUPPORTS_SCHED_MC
> +	bool
> +
> +config SCHED_SMT
> +	bool "SMT (Hyperthreading) scheduler support"
> +	depends on ARCH_SUPPORTS_SCHED_SMT
> +	default y
> +	help
> +	  Improves the CPU scheduler's decision making when dealing with
> +	  MultiThreading at a cost of slightly increased overhead in some
> +	  places. If unsure say N here.
> +
> +config SCHED_CLUSTER
> +	bool "Cluster scheduler support"
> +	depends on ARCH_SUPPORTS_SCHED_CLUSTER
> +	default y
> +	help
> +	  Cluster scheduler support improves the CPU scheduler's decision
> +	  making when dealing with machines that have clusters of CPUs.
> +	  Cluster usually means a couple of CPUs which are placed closely
> +	  by sharing mid-level caches, last-level cache tags or internal
> +	  busses.
> +
> +config SCHED_MC
> +	bool "Multi-Core Cache (MC) scheduler support"
> +	depends on ARCH_SUPPORTS_SCHED_MC
> +	default y
> +	help
> +	  Multi-core scheduler support improves the CPU scheduler's decision
> +	  making when dealing with multi-core CPU chips at a cost of slightly
> +	  increased overhead in some places. If unsure say N here.
> +
>   # Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
>   config HOTPLUG_CORE_SYNC

...

> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -170,6 +170,9 @@ config PPC
>   	select ARCH_STACKWALK
>   	select ARCH_SUPPORTS_ATOMIC_RMW
>   	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
> +	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
> +	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> +	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC

Wondering if this SCHED_MC is necessary here? Shouldn't it be set by arch/Kconfig?

nit: Also, can we keep these sorted?
	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP

>   	select ARCH_USE_BUILTIN_BSWAP
>   	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
>   	select ARCH_USE_MEMTEST
> @@ -963,18 +966,6 @@ config PPC_PROT_SAO_LPAR
>   config PPC_COPRO_BASE
>   	bool
>   
> -config SCHED_SMT
> -	bool "SMT (Hyperthreading) scheduler support"
> -	depends on PPC64 && SMP
> -	help
> -	  SMT scheduler support improves the CPU scheduler's decision making
> -	  when dealing with POWER5 cpus at a cost of slightly increased
> -	  overhead in some places. If unsure say N here.
> -
> -config SCHED_MC
> -	def_bool y
> -	depends on PPC64 && SMP
> -
>   config PPC_DENORMALISATION
>   	bool "PowerPC denormalisation exception handling"
>   	depends on PPC_BOOK3S_64
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -72,6 +72,7 @@ config RISCV
>   	select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
>   	select ARCH_SUPPORTS_RT
>   	select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
> +	select ARCH_SUPPORTS_SCHED_MC if SMP
>   	select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
>   	select ARCH_USE_MEMTEST
>   	select ARCH_USE_QUEUED_RWLOCKS
> @@ -453,14 +454,6 @@ config SMP
>   
>   	  If you don't know what to do here, say N.
>   
> -config SCHED_MC
> -	bool "Multi-core scheduler support"
> -	depends on SMP
> -	help
> -	  Multi-core scheduler support improves the CPU scheduler's decision
> -	  making when dealing with multi-core CPU chips at a cost of slightly
> -	  increased overhead in some places. If unsure say N here.
> -
>   config NR_CPUS
>   	int "Maximum number of CPUs (2-512)"
>   	depends on SMP
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -547,15 +547,11 @@ config NODES_SHIFT
>   	depends on NUMA
>   	default "1"
>   
> -config SCHED_SMT
> -	def_bool n
> -
> -config SCHED_MC
> -	def_bool n
> -
>   config SCHED_TOPOLOGY
>   	def_bool y
>   	prompt "Topology scheduler support"
> +	select ARCH_SUPPORTS_SCHED_SMT
> +	select ARCH_SUPPORTS_SCHED_MC
>   	select SCHED_SMT
>   	select SCHED_MC
Same here. Above two are needed?

>   	help
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -110,6 +110,8 @@ config SPARC64
>   	select HAVE_SETUP_PER_CPU_AREA
>   	select NEED_PER_CPU_EMBED_FIRST_CHUNK
>   	select NEED_PER_CPU_PAGE_FIRST_CHUNK
> +	select ARCH_SUPPORTS_SCHED_SMT if SMP
> +	select ARCH_SUPPORTS_SCHED_MC  if SMP
>   
>   config ARCH_PROC_KCORE_TEXT
>   	def_bool y
> @@ -288,24 +290,6 @@ if SPARC64 || COMPILE_TEST
>   source "kernel/power/Kconfig"
>   endif
>   
> -config SCHED_SMT
> -	bool "SMT (Hyperthreading) scheduler support"
> -	depends on SPARC64 && SMP
> -	default y
> -	help
> -	  SMT scheduler support improves the CPU scheduler's decision making
> -	  when dealing with SPARC cpus at a cost of slightly increased overhead
> -	  in some places. If unsure say N here.
> -
> -config SCHED_MC
> -	bool "Multi-core scheduler support"
> -	depends on SPARC64 && SMP
> -	default y
> -	help
> -	  Multi-core scheduler support improves the CPU scheduler's decision
> -	  making when dealing with multi-core CPU chips at a cost of slightly
> -	  increased overhead in some places. If unsure say N here.
> -
>   config CMDLINE_BOOL
>   	bool "Default bootloader kernel arguments"
>   	depends on SPARC64
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -330,6 +330,10 @@ config X86
>   	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>   	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>   	select ARCH_SUPPORTS_PT_RECLAIM		if X86_64
> +	select ARCH_SUPPORTS_SCHED_SMT		if SMP
> +	select SCHED_SMT			if SMP
Is this SCHED_SMT needed here?

> +	select ARCH_SUPPORTS_SCHED_CLUSTER	if SMP
> +	select ARCH_SUPPORTS_SCHED_MC		if SMP
>   
>   config INSTRUCTION_DECODER
>   	def_bool y
> @@ -1036,29 +1040,6 @@ config NR_CPUS
>   	  This is purely to save memory: each supported CPU adds about 8KB
>   	  to the kernel image.
>   
> -config SCHED_CLUSTER
> -	bool "Cluster scheduler support"
> -	depends on SMP
> -	default y
> -	help
> -	  Cluster scheduler support improves the CPU scheduler's decision
> -	  making when dealing with machines that have clusters of CPUs.
> -	  Cluster usually means a couple of CPUs which are placed closely
> -	  by sharing mid-level caches, last-level cache tags or internal
> -	  busses.
> -
> -config SCHED_SMT
> -	def_bool y if SMP
> -
> -config SCHED_MC
> -	def_bool y
> -	prompt "Multi-core scheduler support"
> -	depends on SMP
> -	help
> -	  Multi-core scheduler support improves the CPU scheduler's decision
> -	  making when dealing with multi-core CPU chips at a cost of slightly
> -	  increased overhead in some places. If unsure say N here.
> -
>   config SCHED_MC_PRIO
>   	bool "CPU core priorities scheduler support"
>   	depends on SCHED_MC

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 1/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-26  4:13 ` [PATCH v7 1/8] " K Prateek Nayak
@ 2025-08-28 23:06   ` Tim Chen
  0 siblings, 0 replies; 36+ messages in thread
From: Tim Chen @ 2025-08-28 23:06 UTC (permalink / raw)
  To: K Prateek Nayak, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	linuxppc-dev, linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Vinicius Costa Gomes

On Tue, 2025-08-26 at 04:13 +0000, K Prateek Nayak wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> 
...snip...

>  
> -static const struct cpumask *cpu_mc_mask(int cpu)
> +static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
>  {
>  	return cpu_coregroup_mask(cpu);
>  }
>  
> +static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
> +{
> +	return cpu_node_mask(cpu);
> +}
> +

I suggest that we rename this function to tl_node_mask since it returns
the node mask. We could have multiple nodes in a package with SNC, and
the mask may not actually be the package mask in such a case.

Rest of the patch looks good.

Thanks.

Tim


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-26 10:13   ` Peter Zijlstra
@ 2025-08-29  7:53     ` Valentin Schneider
  2025-08-29  8:53       ` Shrikanth Hegde
  0 siblings, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2025-08-29  7:53 UTC (permalink / raw)
  To: Peter Zijlstra, Shrikanth Hegde
  Cc: K Prateek Nayak, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, thomas.weissschuh, Li Chen, Bibo Mao, Mete Durlu,
	Tobias Huschle, Easwar Hariharan, Guo Weikang, Rafael J. Wysocki,
	Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal, Yury Norov [NVIDIA],
	Sudeep Holla, Jonathan Cameron, Andrea Righi, Yicong Yang,
	Ricardo Neri, Tim Chen, Vinicius Costa Gomes, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390

On 26/08/25 12:13, Peter Zijlstra wrote:
> Subject: sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon, 25 Aug 2025 12:02:44 +0000
>
> Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
> their testing starting from v6.16-rc1. Debug that followed pointed to
> the tl->mask() for the NODE domain being incorrectly resolved to that of
> the highest NUMA domain.
>
> tl->mask() for NODE is set to the sd_numa_mask() which depends on the
> global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
> set to the "tl->numa_level" during tl traversal in build_sched_domains()
> calling sd_init() but was not reset before topology_span_sane().
>
> Since "tl->numa_level" still reflected the old value from
> build_sched_domains(), topology_span_sane() for the NODE domain trips
> when the span of the last NUMA domain overlaps.
>
> Instead of replicating the "sched_domains_curr_level" hack, get rid of
> it entirely and instead, pass the entire "sched_domain_topology_level"
> object to tl->cpumask() function to prevent such mishap in the future.
>
> sd_numa_mask() now directly references "tl->numa_level" instead of
> relying on the global "sched_domains_curr_level" hack to index into
> sched_domains_numa_masks[].
>

Eh, of course I see this *after* looking at the v6 patch.

I tested this again for good measure, but given I only test this under
x86 and the changes with v6 are in s390/ppc, I didn't expect to see much
change :-)

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-29  7:53     ` Valentin Schneider
@ 2025-08-29  8:53       ` Shrikanth Hegde
  2025-09-01  4:39         ` K Prateek Nayak
  2025-09-01  8:58         ` Peter Zijlstra
  0 siblings, 2 replies; 36+ messages in thread
From: Shrikanth Hegde @ 2025-08-29  8:53 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra, Madhavan Srinivasan
  Cc: K Prateek Nayak, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, thomas.weissschuh, Li Chen, Bibo Mao, Mete Durlu,
	Tobias Huschle, Easwar Hariharan, Guo Weikang, Rafael J. Wysocki,
	Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal, Yury Norov [NVIDIA],
	Sudeep Holla, Jonathan Cameron, Andrea Righi, Yicong Yang,
	Ricardo Neri, Tim Chen, Vinicius Costa Gomes, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390



On 8/29/25 1:23 PM, Valentin Schneider wrote:
> On 26/08/25 12:13, Peter Zijlstra wrote:
>> Subject: sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Mon, 25 Aug 2025 12:02:44 +0000
>>
>> Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
>> their testing starting from v6.16-rc1. Debug that followed pointed to
>> the tl->mask() for the NODE domain being incorrectly resolved to that of
>> the highest NUMA domain.
>>
>> tl->mask() for NODE is set to the sd_numa_mask() which depends on the
>> global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
>> set to the "tl->numa_level" during tl traversal in build_sched_domains()
>> calling sd_init() but was not reset before topology_span_sane().
>>
>> Since "tl->numa_level" still reflected the old value from
>> build_sched_domains(), topology_span_sane() for the NODE domain trips
>> when the span of the last NUMA domain overlaps.
>>
>> Instead of replicating the "sched_domains_curr_level" hack, get rid of
>> it entirely and instead, pass the entire "sched_domain_topology_level"
>> object to tl->cpumask() function to prevent such mishap in the future.
>>
>> sd_numa_mask() now directly references "tl->numa_level" instead of
>> relying on the global "sched_domains_curr_level" hack to index into
>> sched_domains_numa_masks[].
>>
> 
> Eh, of course I see this *after* looking at the v6 patch.
> 
> I tested this again for good measure, but given I only test this under
> x86 and the changes with v6 are in s390/ppc, I didn't expect to see much
> change :-)
> 
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Tested-by: Valentin Schneider <vschneid@redhat.com>
> 

I was looking at: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core

The current code doesn't allow one to enable/disable SCHED_MC on ppc since it is always set in Kconfig.
I used the patch below:

I think since the config is there, it would be good to provide an option to disable it, no?

---

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fc0d1c19f5a1..da5b2f8d3686 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -170,9 +170,8 @@ config PPC
  	select ARCH_STACKWALK
  	select ARCH_SUPPORTS_ATOMIC_RMW
  	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
-	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
  	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
-	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
+	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
  	select ARCH_USE_BUILTIN_BSWAP
  	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
  	select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 68edb66c2964..458ec5bd859e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1706,10 +1706,12 @@ static void __init build_sched_topology(void)
  			SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
  	}
  
+#ifdef CONFIG_SCHED_MC
  	if (has_coregroup_support()) {
  		powerpc_topology[i++] =
  			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
  	}
+#endif
  
  	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
  


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 2/8] powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_*
  2025-08-26  5:02   ` Christophe Leroy
@ 2025-09-01  3:05     ` K Prateek Nayak
  0 siblings, 0 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-09-01  3:05 UTC (permalink / raw)
  To: Christophe Leroy, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Andrea Righi,
	Yicong Yang, Ricardo Neri, Tim Chen, Vinicius Costa Gomes

Hello Christophe,

On 8/26/2025 10:32 AM, Christophe Leroy wrote:
> 
> 
> Le 26/08/2025 à 06:13, K Prateek Nayak a écrit :
>> Rename cpu_corgroup_{map,mask} to cpu_corgrp_{map,mask} to free up the
>> cpu_corgroup_* namespace. cpu_corgroup_mask() will be added back in the
>> subsequent commit for CONFIG_SCHED_MC enablement.
> 
> This renaming seems odd and incomplete. For instance update_coregroup_mask() should probably be renamed as well, shouldn't it?

So this was a bad copypasta on my part! It should have been
s/cpu_coregroup_*/cpu_coregrp_*/

> 
> When you say cpu_corgroup_mask() will be added back, you mean the same function or a completely different function but with the same name ?
> 
> What's really the difference between corgrp and coregroup ?
> 
> Shouldn't also has_coregroup_support() now be renamed has_corgrp_support() ?


The main intention was that kernel/sched/topology.c uses
cpu_coregroup_mask() as the default function to derive the mask for the
MC domain in the default topology, while PPC uses it internally in this
file only.

Peter just exposed cpu_coregroup_mask() as-is in
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=6e890353ce7e983a621d30413d4fc6d228ae1b4f
which should be fine too since the PPC side overrides the default
topology and can decide to add or omit the MC bits.

I was erring on the side of caution by allowing cpu_coregroup_mask() to
return the node mask if has_coregroup_support() returns false but, given
that the MC domain is never added when has_coregroup_support() returns
false, we don't need all this.

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-29  8:53       ` Shrikanth Hegde
@ 2025-09-01  4:39         ` K Prateek Nayak
  2025-09-01  8:58         ` Peter Zijlstra
  1 sibling, 0 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-09-01  4:39 UTC (permalink / raw)
  To: Shrikanth Hegde, Valentin Schneider, Peter Zijlstra,
	Madhavan Srinivasan
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	thomas.weissschuh, Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle,
	Easwar Hariharan, Guo Weikang, Rafael J. Wysocki, Brian Gerst,
	Patryk Wlazlyn, Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390

Hello Shrikanth,

On 8/29/2025 2:23 PM, Shrikanth Hegde wrote:
> I was looking at: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core
> 
> Current code doesn't allow one to enable/disable SCHED_MC on ppc since it is always set in Kconfig.
> Used the below patch:
> 
> I think since the config is there, it would be good to provide an option to disable it, no?

I think this makes sense.

FWIW, Peter added the "select SCHED_MC" to keep it consistent with the
current behavior.

> 
> ---
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index fc0d1c19f5a1..da5b2f8d3686 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -170,9 +170,8 @@ config PPC
>      select ARCH_STACKWALK
>      select ARCH_SUPPORTS_ATOMIC_RMW
>      select ARCH_SUPPORTS_DEBUG_PAGEALLOC    if PPC_BOOK3S || PPC_8xx
> -    select ARCH_SUPPORTS_SCHED_SMT        if PPC64 && SMP
>      select ARCH_SUPPORTS_SCHED_MC        if PPC64 && SMP
> -    select SCHED_MC                if ARCH_SUPPORTS_SCHED_MC
> +    select ARCH_SUPPORTS_SCHED_SMT        if PPC64 && SMP
>      select ARCH_USE_BUILTIN_BSWAP
>      select ARCH_USE_CMPXCHG_LOCKREF        if PPC64
>      select ARCH_USE_MEMTEST
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 68edb66c2964..458ec5bd859e 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1706,10 +1706,12 @@ static void __init build_sched_topology(void)
>              SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
>      }
>  
> +#ifdef CONFIG_SCHED_MC
>      if (has_coregroup_support()) {
>          powerpc_topology[i++] =
>              SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
>      }
> +#endif

When I was looking at this, all of the coregroup-related bits in smp.c
could technically go behind CONFIG_SCHED_MC too, but that would be a
much larger cleanup, and perhaps an unnecessary one, so this looks good.

-- 
Thanks and Regards,
Prateek



* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-26  9:27   ` Shrikanth Hegde
@ 2025-09-01  4:50     ` K Prateek Nayak
  0 siblings, 0 replies; 36+ messages in thread
From: K Prateek Nayak @ 2025-09-01  4:50 UTC (permalink / raw)
  To: Shrikanth Hegde, Andrea Righi
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, thomas.weissschuh, Li Chen, Bibo Mao,
	Mete Durlu, Tobias Huschle, Easwar Hariharan, Guo Weikang,
	Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal,
	Yury Norov [NVIDIA], Sudeep Holla, Jonathan Cameron, Yicong Yang,
	Ricardo Neri, Tim Chen, Vinicius Costa Gomes, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Juri Lelli, Vincent Guittot, linuxppc-dev, linux-kernel,
	linux-s390, Peter Zijlstra



On 8/26/2025 2:57 PM, Shrikanth Hegde wrote:
>> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
>> index 86de4d0dd0aa..9a320d96e891 100644
>> --- a/arch/powerpc/include/asm/smp.h
>> +++ b/arch/powerpc/include/asm/smp.h
>> @@ -148,7 +148,9 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
>>   }
>>   #endif /* CONFIG_SCHED_SMT */
>>   +#ifdef CONFIG_SCHED_MC
>>   extern const struct cpumask *cpu_coregroup_mask(int cpu);
>> +#endif
>>   
> 
> Is ifdef necessary here?

This is gone in Peter's squash but I added it just to
remain consistent with cpu_smt_mask() above.

> 
>>   /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
>>    *
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index e623f2864dc4..7f79b853b221 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1059,6 +1059,7 @@ static bool has_coregroup_support(void)
>>       return coregroup_enabled;
>>   }
>>   +#ifdef CONFIG_SCHED_MC
>>   const struct cpumask *cpu_coregroup_mask(int cpu)
>>   {
>>       if (has_coregroup_support())
>> @@ -1071,6 +1072,7 @@ static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl,
>>   {
>>       return cpu_corgrp_mask(cpu);
>>   }
>> +#endif
>>   
> 
> Previous patch says cpu_coregroup_mask is exported. Is it exported in any way to user or modules?

Just "exposed" to kernel/sched/topology.c bits :)

I don't think this is used by any generic module / exported to
userspace.

> 
> Also i don't see similar gating in other archs. It maybe unnecessary.
> 
>>   static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
>>   {
>> @@ -1729,10 +1731,12 @@ static void __init build_sched_topology(void)
>>               SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
>>       }
>>   +#ifdef CONFIG_SCHED_MC
>>       if (has_coregroup_support()) {
>>           powerpc_topology[i++] =
>>               SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
>>       }
>> +#endif
> 
> Just this gating should suffice IMO.

Ack. Your suggested diff to have CONFIG_SCHED_MC configurable on powerpc
looks good.

-- 
Thanks and Regards,
Prateek



* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-08-28 14:43         ` Shrikanth Hegde
@ 2025-09-01  8:35           ` Peter Zijlstra
  2025-09-01  8:52             ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2025-09-01  8:35 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Christophe Leroy, K Prateek Nayak, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, thomas.weissschuh,
	Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Thu, Aug 28, 2025 at 08:13:51PM +0530, Shrikanth Hegde wrote:

> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -170,6 +170,9 @@ config PPC
> >   	select ARCH_STACKWALK
> >   	select ARCH_SUPPORTS_ATOMIC_RMW
> >   	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
> > +	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
> > +	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> > +	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
> 
> Wondering if this SCHED_MC is necessary here? shouldn't it be set by arch/Kconfig?

Ah, so without this SCHED_MC becomes a user-selectable option; with this
it is an always-on option (for ppc64) -- no user prompt.

That is, this is the only way I found to have similar semantics to this:

> > -config SCHED_MC
> > -	def_bool y
> > -	depends on PPC64 && SMP
> > -

Which is also not a user selectable option.
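For reference, the pattern being described can be sketched as a Kconfig
fragment (the symbol names are the ones from this thread; the exact file
layout of the generic side is an assumption):

```kconfig
# Generic side (e.g. kernel/Kconfig.* -- placement assumed here):
config ARCH_SUPPORTS_SCHED_MC
	bool

config SCHED_MC
	bool "Multi-core scheduler support"
	depends on ARCH_SUPPORTS_SCHED_MC

# Arch side, e.g. arch/powerpc/Kconfig:
#   - selecting only ARCH_SUPPORTS_SCHED_MC makes the prompt visible;
#   - additionally selecting SCHED_MC forces it on, hiding the prompt.
config PPC
	select ARCH_SUPPORTS_SCHED_MC	if PPC64 && SMP
	select SCHED_MC			if ARCH_SUPPORTS_SCHED_MC
```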

> nit: Also, can we have so they are still sorted?
> 	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> 	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP

Sure, let me flip them. I need to prod that patch anyway, the build
robot still ain't happy.


> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -547,15 +547,11 @@ config NODES_SHIFT
> >   	depends on NUMA
> >   	default "1"
> > -config SCHED_SMT
> > -	def_bool n
> > -
> > -config SCHED_MC
> > -	def_bool n
> > -
> >   config SCHED_TOPOLOGY
> >   	def_bool y
> >   	prompt "Topology scheduler support"
> > +	select ARCH_SUPPORTS_SCHED_SMT
> > +	select ARCH_SUPPORTS_SCHED_MC
> >   	select SCHED_SMT
> >   	select SCHED_MC
> Same here. Above two are needed?

Same issue; previously neither was a user-selectable symbol. By only
selecting the ARCH_SUPPORTS_$FOO variants, the $FOO options become
user-selectable. By then explicitly selecting $FOO as well, that user
option is taken away again.


> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -330,6 +330,10 @@ config X86
> >   	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
> >   	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
> >   	select ARCH_SUPPORTS_PT_RECLAIM		if X86_64
> > +	select ARCH_SUPPORTS_SCHED_SMT		if SMP
> > +	select SCHED_SMT			if SMP
> Is this SCHED_SMT needed here?

Same again...

> > +	select ARCH_SUPPORTS_SCHED_CLUSTER	if SMP
> > +	select ARCH_SUPPORTS_SCHED_MC		if SMP
> >   config INSTRUCTION_DECODER
> >   	def_bool y
> > @@ -1036,29 +1040,6 @@ config NR_CPUS
> >   	  This is purely to save memory: each supported CPU adds about 8KB
> >   	  to the kernel image.
> > -config SCHED_CLUSTER
> > -	bool "Cluster scheduler support"
> > -	depends on SMP
> > -	default y
> > -	help
> > -	  Cluster scheduler support improves the CPU scheduler's decision
> > -	  making when dealing with machines that have clusters of CPUs.
> > -	  Cluster usually means a couple of CPUs which are placed closely
> > -	  by sharing mid-level caches, last-level cache tags or internal
> > -	  busses.
> > -
> > -config SCHED_SMT
> > -	def_bool y if SMP
> > -
> > -config SCHED_MC
> > -	def_bool y
> > -	prompt "Multi-core scheduler support"
> > -	depends on SMP
> > -	help
> > -	  Multi-core scheduler support improves the CPU scheduler's decision
> > -	  making when dealing with multi-core CPU chips at a cost of slightly
> > -	  increased overhead in some places. If unsure say N here.

See how SCHED_SMT is not a user option for x86.


* Re: [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits
  2025-09-01  8:35           ` Peter Zijlstra
@ 2025-09-01  8:52             ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2025-09-01  8:52 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Christophe Leroy, K Prateek Nayak, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, thomas.weissschuh,
	Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle, Easwar Hariharan,
	Guo Weikang, Rafael J. Wysocki, Brian Gerst, Patryk Wlazlyn,
	Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes

On Mon, Sep 01, 2025 at 10:35:07AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 28, 2025 at 08:13:51PM +0530, Shrikanth Hegde wrote:
> 
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -170,6 +170,9 @@ config PPC
> > >   	select ARCH_STACKWALK
> > >   	select ARCH_SUPPORTS_ATOMIC_RMW
> > >   	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
> > > +	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
> > > +	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> > > +	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
> > 
> > Wondering if this SCHED_MC is necessary here? shouldn't it be set by arch/Kconfig?
> 
> Ah, so without this SCHED_MC becomes a user selectable option, with this
> it is an always on option (for ppc64) -- no user prompt.
> 
> That is, this is the only way I found to have similar semantics to this:
> 
> > > -config SCHED_MC
> > > -	def_bool y
> > > -	depends on PPC64 && SMP
> > > -
> 
> Which is also not a user selectable option.
> 
> > nit: Also, can we have so they are still sorted?
> > 	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> > 	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
> 
> Sure, let me flip them. I need to prod that that patch anyway, built
> robot still ain'ted happy.

Looks like 44x/iss476-smp_defconfig (iow 32bit power) also wants
SCHED_MC, so it should be:

config SCHED_MC
	def_bool y
	depends on SMP

It's just SMT that's a PPC64 special.


* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-08-29  8:53       ` Shrikanth Hegde
  2025-09-01  4:39         ` K Prateek Nayak
@ 2025-09-01  8:58         ` Peter Zijlstra
  2025-09-01 17:06           ` Shrikanth Hegde
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2025-09-01  8:58 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Valentin Schneider, Madhavan Srinivasan, K Prateek Nayak,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	thomas.weissschuh, Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle,
	Easwar Hariharan, Guo Weikang, Rafael J. Wysocki, Brian Gerst,
	Patryk Wlazlyn, Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390

On Fri, Aug 29, 2025 at 02:23:06PM +0530, Shrikanth Hegde wrote:

> I was looking at: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core
> 
> Current code doesn't allow one to enable/disable SCHED_MC on ppc since it is set always in kconfig.
> Used the below patch:
> 
> I think since the config is there, it would be good to provide a option to disable. no?

So current PPC code has this MC thing unconditional. I've been
preserving that behaviour. If PPC maintainers feel they want this
selectable, I'm happy to include something like the below, but as a
separate patch with a separate changelog that states this explicit
choice.

> ---
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index fc0d1c19f5a1..da5b2f8d3686 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -170,9 +170,8 @@ config PPC
>  	select ARCH_STACKWALK
>  	select ARCH_SUPPORTS_ATOMIC_RMW
>  	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
> -	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
>  	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
> -	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
> +	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
>  	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
>  	select ARCH_USE_MEMTEST
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 68edb66c2964..458ec5bd859e 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1706,10 +1706,12 @@ static void __init build_sched_topology(void)
>  			SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
>  	}
> +#ifdef CONFIG_SCHED_MC
>  	if (has_coregroup_support()) {
>  		powerpc_topology[i++] =
>  			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
>  	}
> +#endif
>  	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
> 


* Re: [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  2025-09-01  8:58         ` Peter Zijlstra
@ 2025-09-01 17:06           ` Shrikanth Hegde
  0 siblings, 0 replies; 36+ messages in thread
From: Shrikanth Hegde @ 2025-09-01 17:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, Madhavan Srinivasan, K Prateek Nayak,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	thomas.weissschuh, Li Chen, Bibo Mao, Mete Durlu, Tobias Huschle,
	Easwar Hariharan, Guo Weikang, Rafael J. Wysocki, Brian Gerst,
	Patryk Wlazlyn, Swapnil Sapkal, Yury Norov [NVIDIA], Sudeep Holla,
	Jonathan Cameron, Andrea Righi, Yicong Yang, Ricardo Neri,
	Tim Chen, Vinicius Costa Gomes, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Juri Lelli, Vincent Guittot, linuxppc-dev,
	linux-kernel, linux-s390



On 9/1/25 2:28 PM, Peter Zijlstra wrote:
> On Fri, Aug 29, 2025 at 02:23:06PM +0530, Shrikanth Hegde wrote:
> 
>> I was looking at: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core
>>
>> Current code doesn't allow one to enable/disable SCHED_MC on ppc since it is set always in kconfig.
>> Used the below patch:
>>
>> I think since the config is there, it would be good to provide a option to disable. no?
> 
> So current PPC code has this MC thing unconditional. I've been
> preserving that behaviour. If PPC maintainers feel they want this
> selectable, I'm happy to include something like the below, but as a
> separate patch with a separate changelog that states this explicit
> choice.
> 

Fair enough. Will send it as separate patch.

>> ---
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index fc0d1c19f5a1..da5b2f8d3686 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -170,9 +170,8 @@ config PPC
>>   	select ARCH_STACKWALK
>>   	select ARCH_SUPPORTS_ATOMIC_RMW
>>   	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx
>> -	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
>>   	select ARCH_SUPPORTS_SCHED_MC		if PPC64 && SMP
>> -	select SCHED_MC				if ARCH_SUPPORTS_SCHED_MC
>> +	select ARCH_SUPPORTS_SCHED_SMT		if PPC64 && SMP
>>   	select ARCH_USE_BUILTIN_BSWAP
>>   	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
>>   	select ARCH_USE_MEMTEST
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 68edb66c2964..458ec5bd859e 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -1706,10 +1706,12 @@ static void __init build_sched_topology(void)
>>   			SDTL_INIT(tl_cache_mask, powerpc_shared_cache_flags, CACHE);
>>   	}
>> +#ifdef CONFIG_SCHED_MC
>>   	if (has_coregroup_support()) {
>>   		powerpc_topology[i++] =
>>   			SDTL_INIT(tl_mc_mask, powerpc_shared_proc_flags, MC);
>>   	}
>> +#endif
>>   	powerpc_topology[i++] = SDTL_INIT(tl_pkg_mask, powerpc_shared_proc_flags, PKG);
>>


If possible, please consider tags for the two commits below.

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=496d4cc3d478a662f90cce3a3e3be4af56f78a02
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=a912f3e2c6d91f7ea7b294c02796b59af4f50078

Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>

for powerpc bits:
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>


end of thread, other threads:[~2025-09-01 17:08 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-26  4:13 [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() K Prateek Nayak
2025-08-26  4:13 ` [PATCH v7 1/8] " K Prateek Nayak
2025-08-28 23:06   ` Tim Chen
2025-08-26  4:13 ` [PATCH v7 2/8] powerpc/smp: Rename cpu_corgroup_* to cpu_corgrp_* K Prateek Nayak
2025-08-26  5:02   ` Christophe Leroy
2025-09-01  3:05     ` K Prateek Nayak
2025-08-26  4:13 ` [PATCH v7 3/8] powerpc/smp: Export cpu_coregroup_mask() K Prateek Nayak
2025-08-26  4:54   ` Christophe Leroy
2025-08-26  4:13 ` [PATCH v7 4/8] powerpc/smp: Introduce CONFIG_SCHED_MC to guard MC scheduling bits K Prateek Nayak
2025-08-26  4:49   ` Christophe Leroy
2025-08-26  8:07     ` Peter Zijlstra
2025-08-26  9:43       ` Peter Zijlstra
2025-08-26  9:59         ` Peter Zijlstra
2025-08-28 14:43         ` Shrikanth Hegde
2025-09-01  8:35           ` Peter Zijlstra
2025-09-01  8:52             ` Peter Zijlstra
2025-08-26  9:27   ` Shrikanth Hegde
2025-09-01  4:50     ` K Prateek Nayak
2025-08-26  4:13 ` [PATCH v7 5/8] sched/topology: Unify tl_smt_mask() across core and all arch K Prateek Nayak
2025-08-26  5:13   ` Christophe Leroy
2025-08-26  8:01   ` Peter Zijlstra
2025-08-26  8:11     ` Christophe Leroy
2025-08-26  8:24       ` Peter Zijlstra
2025-08-26  4:13 ` [PATCH v7 6/8] sched/topology: Unify tl_cls_mask() across core and x86 K Prateek Nayak
2025-08-26  5:14   ` Christophe Leroy
2025-08-26  4:13 ` [PATCH v7 7/8] sched/topology: Unify tl_mc_mask() across core and all arch K Prateek Nayak
2025-08-26  5:15   ` Christophe Leroy
2025-08-26  4:13 ` [PATCH v7 8/8] sched/topology: Unify tl_pkg_mask() " K Prateek Nayak
2025-08-26  5:16   ` Christophe Leroy
2025-08-26 10:05 ` [PATCH v7 0/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() Shrikanth Hegde
2025-08-26 10:13   ` Peter Zijlstra
2025-08-29  7:53     ` Valentin Schneider
2025-08-29  8:53       ` Shrikanth Hegde
2025-09-01  4:39         ` K Prateek Nayak
2025-09-01  8:58         ` Peter Zijlstra
2025-09-01 17:06           ` Shrikanth Hegde
