public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/6] x86/topo: SNC Divination
@ 2026-02-26 10:49 Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 1/6] x86/topo: Store extra copy of SRAT table Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

Hi!

So we once again ran head-first into the fact that the CPUs fail to enumerate
useful state. This time it is SNC (again).

Thomas recently rewrote much of the topology code to use MADT and CPUID to
derive many of the useful measures of the system *before* SMP bringup, removing
much broken magic.

Inspired by that, I wondered if we might do the same for SNC. Clearly MADT
alone is not sufficient, but combined with SRAT we should have enough.

As luck would have it, by the time MADT gets parsed, SRAT already has been, so
integrating them is mostly straightforward. The only caveat is that
numa_emulation can mess things up in between.

Combining all this gives a straightforward measure of nodes-per-package, which
should reflect the SNC mode. All before SMP bringup.

Use this to 'fix' various SNC snafus.

Compile and boot tested on non-SNC hardware only for now; I'll see if I can
convince qemu to play along.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 1/6] x86/topo: Store extra copy of SRAT table
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

Because numa_emulate() will wreck the __apicid_to_node[] table, keep
an extra copy.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/numa.h |    5 +++++
 arch/x86/mm/numa.c          |    4 ++++
 2 files changed, 9 insertions(+)

--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -19,13 +19,18 @@ extern int numa_off;
  * The mapping may be overridden by apic->numa_cpu_node() on 32bit and thus
  * should be accessed by the accessors - set_apicid_to_node() and
  * numa_cpu_node().
+ *
+ * __apicid_to_node[] is affected by numa_emulation(), while
+ * __apicid_to_phys_node[] is not.
  */
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
+extern s16 __apicid_to_phys_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
 
 static inline void set_apicid_to_node(int apicid, s16 node)
 {
 	__apicid_to_node[apicid] = node;
+	__apicid_to_phys_node[apicid] = node;
 }
 
 extern int numa_cpu_node(int cpu);
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -48,6 +48,10 @@ s16 __apicid_to_node[MAX_LOCAL_APIC] = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
+s16 __apicid_to_phys_node[MAX_LOCAL_APIC] = {
+	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
+};
+
 int numa_cpu_node(int cpu)
 {
 	u32 apicid = early_per_cpu(x86_cpu_to_apicid, cpu);



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 1/6] x86/topo: Store extra copy of SRAT table Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-27 13:19   ` K Prateek Nayak
  2026-02-26 10:49 ` [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

Use the SRAT data to add an extra NUMA domain. Since the SLIT table is
a matrix, the SRAT proximity domains 'must' form a dense set and will
not exceed MAX_LOCAL_APIC.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/topology.h |    3 +++
 arch/x86/kernel/cpu/topology.c  |    9 +++++++++
 2 files changed, 12 insertions(+)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -111,6 +111,9 @@ enum x86_topology_domains {
 	TOPO_DIE_DOMAIN,
 	TOPO_DIEGRP_DOMAIN,
 	TOPO_PKG_DOMAIN,
+#ifdef CONFIG_NUMA
+	TOPO_NUMA_DOMAIN,
+#endif
 	TOPO_MAX_DOMAIN,
 };
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -31,6 +31,7 @@
 #include <asm/mpspec.h>
 #include <asm/msr.h>
 #include <asm/smp.h>
+#include <asm/numa.h>
 
 #include "cpu.h"
 
@@ -88,6 +89,14 @@ static inline u32 topo_apicid(u32 apicid
 {
 	if (dom == TOPO_SMT_DOMAIN)
 		return apicid;
+#ifdef CONFIG_NUMA
+	if (dom == TOPO_NUMA_DOMAIN) {
+		int nid = __apicid_to_phys_node[apicid];
+		if (nid == NUMA_NO_NODE)
+			nid = 0;
+		return nid;
+	}
+#endif
 	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
 }
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 1/6] x86/topo: Store extra copy of SRAT table Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-26 17:46   ` Kyle Meyer
  2026-02-26 10:49 ` [RFC][PATCH 4/6] x86/topo: Replace x86_has_numa_in_package Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

Use the MADT and SRAT table data to compute __num_nodes_per_package.

This number is useful to divinate the various Intel CoD/SNC modes,
since the platforms are failing to provide this otherwise.

Doing it this way is independent of the number of online CPUs and
other such shenanigans.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/topology.h |    1 +
 arch/x86/kernel/cpu/common.c    |    3 +++
 arch/x86/kernel/cpu/topology.c  |    9 ++++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -158,6 +158,7 @@ extern unsigned int __max_logical_packag
 extern unsigned int __max_threads_per_core;
 extern unsigned int __num_threads_per_package;
 extern unsigned int __num_cores_per_package;
+extern unsigned int __num_nodes_per_package;
 
 const char *get_topology_cpu_type_name(struct cpuinfo_x86 *c);
 enum x86_topology_cpu_type get_topology_cpu_type(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -95,6 +95,9 @@ EXPORT_SYMBOL(__max_dies_per_package);
 unsigned int __max_logical_packages __ro_after_init = 1;
 EXPORT_SYMBOL(__max_logical_packages);
 
+unsigned int __num_nodes_per_package __ro_after_init = 1;
+EXPORT_SYMBOL(__num_nodes_per_package);
+
 unsigned int __num_cores_per_package __ro_after_init = 1;
 EXPORT_SYMBOL(__num_cores_per_package);
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -497,11 +497,18 @@ void __init topology_init_possible_cpus(
 	set_nr_cpu_ids(allowed);
 
 	cnta = domain_weight(TOPO_PKG_DOMAIN);
+	cntb = domain_weight(TOPO_NUMA_DOMAIN);
+
+	__num_nodes_per_package = DIV_ROUND_UP(cntb, cnta);
+
+	pr_info("Max. logical packages: %3u\n", cnta);
+	pr_info("Max. logical nodes:    %3u\n", cntb);
+	pr_info("Num. nodes per package:%3u\n", __num_nodes_per_package);
+
 	cntb = domain_weight(TOPO_DIE_DOMAIN);
 	__max_logical_packages = cnta;
 	__max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
 
-	pr_info("Max. logical packages: %3u\n", cnta);
 	pr_info("Max. logical dies:     %3u\n", cntb);
 	pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 4/6] x86/topo: Replace x86_has_numa_in_package
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
                   ` (2 preceding siblings ...)
  2026-02-26 10:49 ` [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-26 10:49 ` [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

.. with the brand spanking new __num_nodes_per_package.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/smpboot.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -468,13 +468,6 @@ static int x86_cluster_flags(void)
 }
 #endif
 
-/*
- * Set if a package/die has multiple NUMA nodes inside.
- * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
- * Sub-NUMA Clustering have this.
- */
-static bool x86_has_numa_in_package;
-
 static struct sched_domain_topology_level x86_topology[] = {
 	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #ifdef CONFIG_SCHED_CLUSTER
@@ -496,7 +489,7 @@ static void __init build_sched_topology(
 	 * PKG domain since the NUMA domains will auto-magically create the
 	 * right spanning domains based on the SLIT.
 	 */
-	if (x86_has_numa_in_package) {
+	if (__num_nodes_per_package > 1) {
 		unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
 
 		memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
@@ -550,7 +543,7 @@ int arch_sched_node_distance(int from, i
 	case INTEL_GRANITERAPIDS_X:
 	case INTEL_ATOM_DARKMONT_X:
 
-		if (!x86_has_numa_in_package || topology_max_packages() == 1 ||
+		if (topology_max_packages() == 1 || __num_nodes_per_package == 1 ||
 		    d < REMOTE_DISTANCE)
 			return d;
 
@@ -606,7 +599,7 @@ void set_cpu_sibling_map(int cpu)
 		o = &cpu_data(i);
 
 		if (match_pkg(c, o) && !topology_same_node(c, o))
-			x86_has_numa_in_package = true;
+			WARN_ON_ONCE(__num_nodes_per_package == 1);
 
 		if ((i == cpu) || (has_smt && match_smt(c, o)))
 			link_mask(topology_sibling_cpumask, cpu, i);



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
                   ` (3 preceding siblings ...)
  2026-02-26 10:49 ` [RFC][PATCH 4/6] x86/topo: Replace x86_has_numa_in_package Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-26 17:07   ` Chen, Yu C
  2026-02-26 10:49 ` [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

So per 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode")

The original crazy SNC-3 SLIT table was:

node distances:
node     0    1    2    3    4    5
    0:   10   15   17   21   28   26
    1:   15   10   15   23   26   23
    2:   17   15   10   26   23   21
    3:   21   28   26   10   15   17
    4:   23   26   23   15   10   15
    5:   26   23   21   17   15   10

And per:

  https://lore.kernel.org/lkml/20250825075642.GQ3245006@noisy.programming.kicks-ass.net/

My suggestion was to average the off-trace clusters to restore sanity.

However, 4d6dd05d07d0 implements this under various assumptions:

 - there will never be more than 2 packages;
 - the off-trace cluster will have distance >20

And then HPE shows up with a machine that matches the
Vendor-Family-Model checks but looks like this:

Here's an 8 socket (2 chassis) HPE system with SNC enabled:

node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  12  16  16  16  16  18  18  40  40  40  40  40  40  40  40
  1:  12  10  16  16  16  16  18  18  40  40  40  40  40  40  40  40
  2:  16  16  10  12  18  18  16  16  40  40  40  40  40  40  40  40
  3:  16  16  12  10  18  18  16  16  40  40  40  40  40  40  40  40
  4:  16  16  18  18  10  12  16  16  40  40  40  40  40  40  40  40
  5:  16  16  18  18  12  10  16  16  40  40  40  40  40  40  40  40
  6:  18  18  16  16  16  16  10  12  40  40  40  40  40  40  40  40
  7:  18  18  16  16  16  16  12  10  40  40  40  40  40  40  40  40
  8:  40  40  40  40  40  40  40  40  10  12  16  16  16  16  18  18
  9:  40  40  40  40  40  40  40  40  12  10  16  16  16  16  18  18
 10:  40  40  40  40  40  40  40  40  16  16  10  12  18  18  16  16
 11:  40  40  40  40  40  40  40  40  16  16  12  10  18  18  16  16
 12:  40  40  40  40  40  40  40  40  16  16  18  18  10  12  16  16
 13:  40  40  40  40  40  40  40  40  16  16  18  18  12  10  16  16
 14:  40  40  40  40  40  40  40  40  18  18  16  16  16  16  10  12
 15:  40  40  40  40  40  40  40  40  18  18  16  16  16  16  12  10

 10 = Same chassis and socket
 12 = Same chassis and socket (SNC)
 16 = Same chassis and adjacent socket
 18 = Same chassis and non-adjacent socket
 40 = Different chassis

*However* this is SNC-2.

This completely invalidates all the earlier assumptions and trips
WARNs.

Now that the topology code has a sensible measure of
nodes-per-package, we can use that to divinate the SNC mode at hand,
and only fix up SNC-3 topologies.

The only remaining assumption is that there are no CPU-less nodes -- is
that a valid assumption?

Fixes: 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/smpboot.c |   64 +++++++++++++++++-----------------------------
 1 file changed, 25 insertions(+), 39 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -506,33 +506,32 @@ static void __init build_sched_topology(
 }
 
 #ifdef CONFIG_NUMA
-static int sched_avg_remote_distance;
-static int avg_remote_numa_distance(void)
+static int slit_cluster_distance(int i, int j)
 {
-	int i, j;
-	int distance, nr_remote, total_distance;
-
-	if (sched_avg_remote_distance > 0)
-		return sched_avg_remote_distance;
-
-	nr_remote = 0;
-	total_distance = 0;
-	for_each_node_state(i, N_CPU) {
-		for_each_node_state(j, N_CPU) {
-			distance = node_distance(i, j);
-
-			if (distance >= REMOTE_DISTANCE) {
-				nr_remote++;
-				total_distance += distance;
-			}
+	int u = __num_nodes_per_package;
+	long d = 0;
+	int x, y;
+
+	/*
+	 * Is this a unit cluster on the trace?
+	 */
+	if ((i / u) == (j / u))
+		return node_distance(i, j);
+
+	/*
+	 * Off-trace cluster, return average of the cluster to force symmetry.
+	 */
+	x = i - (i % u);
+	y = j - (j % u);
+
+	for (i = x; i < x + u; i++) {
+		for (j = y; j < y + u; j++) {
+			d += node_distance(i, j);
+			d += node_distance(j, i);
 		}
 	}
-	if (nr_remote)
-		sched_avg_remote_distance = total_distance / nr_remote;
-	else
-		sched_avg_remote_distance = REMOTE_DISTANCE;
 
-	return sched_avg_remote_distance;
+	return d / (2*u*u);
 }
 
 int arch_sched_node_distance(int from, int to)
@@ -542,13 +541,11 @@ int arch_sched_node_distance(int from, i
 	switch (boot_cpu_data.x86_vfm) {
 	case INTEL_GRANITERAPIDS_X:
 	case INTEL_ATOM_DARKMONT_X:
-
-		if (topology_max_packages() == 1 || __num_nodes_per_package == 1 ||
-		    d < REMOTE_DISTANCE)
+		if (topology_max_packages() == 1 || __num_nodes_per_package < 3)
 			return d;
 
 		/*
-		 * With SNC enabled, there could be too many levels of remote
+		 * With SNC-3 enabled, there could be too many levels of remote
 		 * NUMA node distances, creating NUMA domain levels
 		 * including local nodes and partial remote nodes.
 		 *
@@ -557,19 +554,8 @@ int arch_sched_node_distance(int from, i
 		 * in the remote package in the same sched group.
 		 * Simplify NUMA domains and avoid extra NUMA levels including
 		 * different remote NUMA nodes and local nodes.
-		 *
-		 * GNR and CWF don't expect systems with more than 2 packages
-		 * and more than 2 hops between packages. Single average remote
-		 * distance won't be appropriate if there are more than 2
-		 * packages as average distance to different remote packages
-		 * could be different.
 		 */
-		WARN_ONCE(topology_max_packages() > 2,
-			  "sched: Expect only up to 2 packages for GNR or CWF, "
-			  "but saw %d packages when building sched domains.",
-			  topology_max_packages());
-
-		d = avg_remote_numa_distance();
+		return slit_cluster_distance(from, to);
 	}
 	return d;
 }



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
                   ` (4 preceding siblings ...)
  2026-02-26 10:49 ` [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess Peter Zijlstra
@ 2026-02-26 10:49 ` Peter Zijlstra
  2026-02-26 19:42   ` Luck, Tony
  2026-02-26 19:16 ` [RFC][PATCH 0/6] x86/topo: SNC Divination Luck, Tony
  2026-03-02 18:21 ` Kyle Meyer
  7 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-26 10:49 UTC (permalink / raw)
  To: x86, tglx
  Cc: linux-kernel, peterz, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu, tony.luck

Now that the x86 topology code has a sensible nodes-per-package
measure, that does not depend on the online status of CPUs, use this
to divinate the SNC mode.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/resctrl/monitor.c |   44 ----------------------------------
 1 file changed, 1 insertion(+), 43 deletions(-)

--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -364,51 +364,9 @@ void arch_mon_domain_online(struct rdt_r
 		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
 }
 
-/* CPU models that support MSR_RMID_SNC_CONFIG */
-static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
-	X86_MATCH_VFM(INTEL_ICELAKE_X, 0),
-	X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, 0),
-	X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, 0),
-	X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, 0),
-	X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X, 0),
-	X86_MATCH_VFM(INTEL_ATOM_DARKMONT_X, 0),
-	{}
-};
-
-/*
- * There isn't a simple hardware bit that indicates whether a CPU is running
- * in Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the
- * number of CPUs sharing the L3 cache with CPU0 to the number of CPUs in
- * the same NUMA node as CPU0.
- * It is not possible to accurately determine SNC state if the system is
- * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes
- * to L3 caches. It will be OK if system is booted with hyperthreading
- * disabled (since this doesn't affect the ratio).
- */
 static __init int snc_get_config(void)
 {
-	struct cacheinfo *ci = get_cpu_cacheinfo_level(0, RESCTRL_L3_CACHE);
-	const cpumask_t *node0_cpumask;
-	int cpus_per_node, cpus_per_l3;
-	int ret;
-
-	if (!x86_match_cpu(snc_cpu_ids) || !ci)
-		return 1;
-
-	cpus_read_lock();
-	if (num_online_cpus() != num_present_cpus())
-		pr_warn("Some CPUs offline, SNC detection may be incorrect\n");
-	cpus_read_unlock();
-
-	node0_cpumask = cpumask_of_node(cpu_to_node(0));
-
-	cpus_per_node = cpumask_weight(node0_cpumask);
-	cpus_per_l3 = cpumask_weight(&ci->shared_cpu_map);
-
-	if (!cpus_per_node || !cpus_per_l3)
-		return 1;
-
-	ret = cpus_per_l3 / cpus_per_node;
+	int ret = __num_nodes_per_package;
 
 	/* sanity check: Only valid results are 1, 2, 3, 4, 6 */
 	switch (ret) {



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 10:49 ` [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess Peter Zijlstra
@ 2026-02-26 17:07   ` Chen, Yu C
  2026-02-26 19:00     ` Tim Chen
  2026-02-27 11:56     ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Chen, Yu C @ 2026-02-26 17:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, tim.c.chen, kyle.meyer, vinicius.gomes, brgerst,
	hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck, x86, tglx

Hi Peter,

On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> +	int u = __num_nodes_per_package;

Yes, this is much simpler, thanks for the patch!

> +	long d = 0;
> +	int x, y;
> +
> +	/*
> +	 * Is this a unit cluster on the trace?
> +	 */
> +	if ((i / u) == (j / u))
> +		return node_distance(i, j);

If the number of nodes per package is 3, we assume that
every 3 consecutive nodes are SNC siblings (on the same
trace): node0, node1, and node2 are SNC siblings, while
node3, node4, and node5 form another group of SNC siblings.

I have a curious thought: could it be possible that
node0, node2, and node4 are SNC siblings, and node1,
node3, and node5 are another set of SNC siblings instead?

Then I studied the code a little more, node ids are dynamically
allocated via the acpi_map_pxm_to_node, so the assignment of node
ids depends on the order in which each processor affinity structure
is listed in the SRAT table. For example, suppose CPU0 belongs to
package0 and CPU1 belongs to package1, but their entries are placed
consecutively in the SRAT. In this case, the Proximity Domain of
CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
Proximity Domain would be assigned node1. The logic above would
then treat them as belonging to the same package, even though they
are physically in different packages. However, I believe such a
scenario is unlikely to occur in practice in the BIOS and if it
happens it should be a BIOS bug if I understand correctly.

thanks,
Chenyu


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package
  2026-02-26 10:49 ` [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package Peter Zijlstra
@ 2026-02-26 17:46   ` Kyle Meyer
  2026-02-27 11:57     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Kyle Meyer @ 2026-02-26 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, vinicius.gomes,
	brgerst, hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Thu, Feb 26, 2026 at 11:49:12AM +0100, Peter Zijlstra wrote:
> Use the MADT and SRAT table data to compute __num_nodes_per_package.
> 
> This number is useful to divinate the various Intel CoD/SNC modes,
> since the platforms are failing to provide this otherwise.
> 
> Doing it this way is independent of the number of online CPUs and
> other such shenanigans.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/topology.h |    1 +
>  arch/x86/kernel/cpu/common.c    |    3 +++
>  arch/x86/kernel/cpu/topology.c  |    9 ++++++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -158,6 +158,7 @@ extern unsigned int __max_logical_packag
>  extern unsigned int __max_threads_per_core;
>  extern unsigned int __num_threads_per_package;
>  extern unsigned int __num_cores_per_package;
> +extern unsigned int __num_nodes_per_package;
>  
>  const char *get_topology_cpu_type_name(struct cpuinfo_x86 *c);
>  enum x86_topology_cpu_type get_topology_cpu_type(struct cpuinfo_x86 *c);
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -95,6 +95,9 @@ EXPORT_SYMBOL(__max_dies_per_package);
>  unsigned int __max_logical_packages __ro_after_init = 1;
>  EXPORT_SYMBOL(__max_logical_packages);
>  
> +unsigned int __num_nodes_per_package __ro_after_init = 1;
> +EXPORT_SYMBOL(__num_nodes_per_package);
> +
>  unsigned int __num_cores_per_package __ro_after_init = 1;
>  EXPORT_SYMBOL(__num_cores_per_package);
>  
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -497,11 +497,18 @@ void __init topology_init_possible_cpus(
>  	set_nr_cpu_ids(allowed);
>  
>  	cnta = domain_weight(TOPO_PKG_DOMAIN);
> +	cntb = domain_weight(TOPO_NUMA_DOMAIN);

We'll need to check CONFIG_NUMA here.

TOPO_NUMA_DOMAIN is undeclared when CONFIG_NUMA is not set.

> +
> +	__num_nodes_per_package = DIV_ROUND_UP(cntb, cnta);
> +
> +	pr_info("Max. logical packages: %3u\n", cnta);
> +	pr_info("Max. logical nodes:    %3u\n", cntb);
> +	pr_info("Num. nodes per package:%3u\n", __num_nodes_per_package);
> +
>  	cntb = domain_weight(TOPO_DIE_DOMAIN);
>  	__max_logical_packages = cnta;
>  	__max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
>  
> -	pr_info("Max. logical packages: %3u\n", cnta);
>  	pr_info("Max. logical dies:     %3u\n", cntb);
>  	pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
>  
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 17:07   ` Chen, Yu C
@ 2026-02-26 19:00     ` Tim Chen
  2026-02-26 22:11       ` Tim Chen
  2026-02-27 13:01       ` Peter Zijlstra
  2026-02-27 11:56     ` Peter Zijlstra
  1 sibling, 2 replies; 32+ messages in thread
From: Tim Chen @ 2026-02-26 19:00 UTC (permalink / raw)
  To: Chen, Yu C, Peter Zijlstra
  Cc: linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> Hi Peter,
> 
> On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > +	int u = __num_nodes_per_package;
> 
> Yes, this is much simpler, thanks for the patch!
> 
> > +	long d = 0;
> > +	int x, y;
> > +
> > +	/*
> > +	 * Is this a unit cluster on the trace?
> > +	 */
> > +	if ((i / u) == (j / u))
> > +		return node_distance(i, j);
> 
> If the number of nodes per package is 3, we assume that
> every 3 consecutive nodes are SNC siblings (on the same
> trace): node0, node1, and node2 are SNC siblings, while
> node3, node4, and node5 form another group of SNC siblings.
> 
> I have a curious thought: could it be possible that
> node0, node2, and node4 are SNC siblings, and node1,
> node3, and node5 are another set of SNC siblings instead?
> 
> Then I studied the code a little more, node ids are dynamically
> allocated via the acpi_map_pxm_to_node, so the assignment of node
> ids depends on the order in which each processor affinity structure
> is listed in the SRAT table. For example, suppose CPU0 belongs to
> package0 and CPU1 belongs to package1, but their entries are placed
> consecutively in the SRAT. In this case, the Proximity Domain of
> CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> Proximity Domain would be assigned node1. The logic above would
> then treat them as belonging to the same package, even though they
> are physically in different packages. However, I believe such a
> scenario is unlikely to occur in practice in the BIOS and if it
> happens it should be a BIOS bug if I understand correctly.
> 
> 

It may be a good idea to sanity check that the nodes in the first unit cluster
have the same package id and give a WARNING if that's not the case.

Tim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 0/6] x86/topo: SNC Divination
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
                   ` (5 preceding siblings ...)
  2026-02-26 10:49 ` [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection Peter Zijlstra
@ 2026-02-26 19:16 ` Luck, Tony
  2026-03-02 18:21 ` Kyle Meyer
  7 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2026-02-26 19:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu

On Thu, Feb 26, 2026 at 11:49:09AM +0100, Peter Zijlstra wrote:
> Compile and boot tested on non-SNC hardware only for now; I'll see if I can
> convince qemu to play along.

Tried it on an SNC 3 system. The resctrl patch still gets the right value
with these patches applied.

Tested-by: Tony Luck <tony.luck@intel.com>

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection
  2026-02-26 10:49 ` [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection Peter Zijlstra
@ 2026-02-26 19:42   ` Luck, Tony
  2026-02-26 20:47     ` Luck, Tony
  0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2026-02-26 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu

On Thu, Feb 26, 2026 at 11:49:15AM +0100, Peter Zijlstra wrote:
> Now that the x86 topology code has a sensible nodes-per-package
> measure, that does not depend on the online status of CPUs, use this
> to divinate the SNC mode.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c |   44 ----------------------------------
>  1 file changed, 1 insertion(+), 43 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -364,51 +364,9 @@ void arch_mon_domain_online(struct rdt_r
>  		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
>  }
>  
> -/* CPU models that support MSR_RMID_SNC_CONFIG */
> -static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
> -	X86_MATCH_VFM(INTEL_ICELAKE_X, 0),
> -	X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, 0),
> -	X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, 0),
> -	X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, 0),
> -	X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X, 0),
> -	X86_MATCH_VFM(INTEL_ATOM_DARKMONT_X, 0),
> -	{}
> -};

It isn't safe to drop this and the x86_match_cpu() check.

These are the CPUs that implement SNC and MSR_RMID_SNC_CONFIG. So if you
set __num_nodes_per_package > 1 on an older CoD system Linux will
think this is SNC and poke this MSR (and get #GP).

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection
  2026-02-26 19:42   ` Luck, Tony
@ 2026-02-26 20:47     ` Luck, Tony
  2026-02-27  9:26       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2026-02-26 20:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu

I think this resctrl patch should look like this:

-Tony

From c1a1b6c681a3a466c6c0bb4ba5a2cdd4dbefdb60 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Thu, 26 Feb 2026 12:35:54 -0800
Subject: [PATCH] x86/resctrl: Fix SNC detection

Now that the x86 topology code has a sensible nodes-per-package
measure, that does not depend on the online status of CPUs, use this
to divinate the SNC mode.

Note that when Cluster on Die (CoD) is configured on older systems this
will also show multiple NUMA nodes per package. Intel Resource Director
Technology is incompatible with CoD. Print a warning and do not use the
fixup MSR_RMID_SNC_CONFIG.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 36 ++++-----------------------
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e6a154240b8d..8ff0f78b8658 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -364,7 +364,7 @@ void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
 }
 
-/* CPU models that support MSR_RMID_SNC_CONFIG */
+/* CPU models that support SNC and MSR_RMID_SNC_CONFIG */
 static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
 	X86_MATCH_VFM(INTEL_ICELAKE_X, 0),
 	X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, 0),
@@ -375,40 +375,14 @@ static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
 	{}
 };
 
-/*
- * There isn't a simple hardware bit that indicates whether a CPU is running
- * in Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the
- * number of CPUs sharing the L3 cache with CPU0 to the number of CPUs in
- * the same NUMA node as CPU0.
- * It is not possible to accurately determine SNC state if the system is
- * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes
- * to L3 caches. It will be OK if system is booted with hyperthreading
- * disabled (since this doesn't affect the ratio).
- */
 static __init int snc_get_config(void)
 {
-	struct cacheinfo *ci = get_cpu_cacheinfo_level(0, RESCTRL_L3_CACHE);
-	const cpumask_t *node0_cpumask;
-	int cpus_per_node, cpus_per_l3;
-	int ret;
-
-	if (!x86_match_cpu(snc_cpu_ids) || !ci)
-		return 1;
+	int ret = __num_nodes_per_package;
 
-	cpus_read_lock();
-	if (num_online_cpus() != num_present_cpus())
-		pr_warn("Some CPUs offline, SNC detection may be incorrect\n");
-	cpus_read_unlock();
-
-	node0_cpumask = cpumask_of_node(cpu_to_node(0));
-
-	cpus_per_node = cpumask_weight(node0_cpumask);
-	cpus_per_l3 = cpumask_weight(&ci->shared_cpu_map);
-
-	if (!cpus_per_node || !cpus_per_l3)
+	if (__num_nodes_per_package > 1 && !x86_match_cpu(snc_cpu_ids)) {
+		pr_warn("CoD enabled system? Resctrl not supported\n");
 		return 1;
-
-	ret = cpus_per_l3 / cpus_per_node;
+	}
 
 	/* sanity check: Only valid results are 1, 2, 3, 4, 6 */
 	switch (ret) {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 19:00     ` Tim Chen
@ 2026-02-26 22:11       ` Tim Chen
  2026-02-26 22:25         ` Tim Chen
  2026-02-27 13:01       ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Tim Chen @ 2026-02-26 22:11 UTC (permalink / raw)
  To: Chen, Yu C, Peter Zijlstra
  Cc: linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > Hi Peter,
> > 
> > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > +	int u = __num_nodes_per_package;
> > 
> > Yes, this is much simpler, thanks for the patch!
> > 
> > > +	long d = 0;
> > > +	int x, y;
> > > +
> > > +	/*
> > > +	 * Is this a unit cluster on the trace?
> > > +	 */
> > > +	if ((i / u) == (j / u))
> > > +		return node_distance(i, j);
> > 
> > If the number of nodes per package is 3, we assume that
> > every 3 consecutive nodes are SNC siblings (on the same
> > trace):node0, node1, and node2 are SNC siblings, while
> > node3, node4, and node5 form another group of SNC siblings.
> > 
> > I have a curious thought: could it be possible that
> > node0, node2, and node4 are SNC siblings, and node1,
> > node3, and node5 are another set of SNC siblings instead?
> > 
> > Then I studied the code a little more, node ids are dynamically
> > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > ids depends on the order in which each processor affinity structure
> > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > package0 and CPU1 belongs to package1, but their entries are placed
> > consecutively in the SRAT. In this case, the Proximity Domain of
> > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > Proximity Domain would be assigned node1. The logic above would
> > then treat them as belonging to the same package, even though they
> > are physically in different packages. However, I believe such a
> > scenario is unlikely to occur in practice in the BIOS and if it
> > happens it should be a BIOS bug if I understand correctly.
> > 
> > 
> 
> Maybe a good idea to sanity check that the nodes in the first unit cluster
> have the same package id and give a WARNING if that's not the case.
> 
Perhaps something like below

Tim

---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d97f8f4e014c..38384ea5253a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
 	int u = __num_nodes_per_package;
 	long d = 0;
 	int x, y;
+	static int valid_slit = 0;
+
+	if (valid_slit == -1)
+		return node_distance(i, j);
+
+	if (valid_slit == 0) {
+		/* Check first nodes in package are grouped together consecutively */
+		for (x = 0; x < u-1 ; x++) {
+			if (topology_physical_package_id(x) !=
+			    topology_physical_package_id(x+1)) {
+				pr_warn("Expect nodes %d and %d to be in the same package\n",
+						x, x+1);
+				valid_slit = -1;
+				return node_distance(i, j);
+			}
+		}
+		valid_slit = 1;
+	}
 
 	/*
 	 * Is this a unit cluster on the trace?


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 22:11       ` Tim Chen
@ 2026-02-26 22:25         ` Tim Chen
  0 siblings, 0 replies; 32+ messages in thread
From: Tim Chen @ 2026-02-26 22:25 UTC (permalink / raw)
  To: Chen, Yu C, Peter Zijlstra
  Cc: linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On Thu, 2026-02-26 at 14:11 -0800, Tim Chen wrote:
> On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> > On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > > Hi Peter,
> > > 
> > > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > > +	int u = __num_nodes_per_package;
> > > 
> > > Yes, this is much simpler, thanks for the patch!
> > > 
> > > > +	long d = 0;
> > > > +	int x, y;
> > > > +
> > > > +	/*
> > > > +	 * Is this a unit cluster on the trace?
> > > > +	 */
> > > > +	if ((i / u) == (j / u))
> > > > +		return node_distance(i, j);
> > > 
> > > If the number of nodes per package is 3, we assume that
> > > every 3 consecutive nodes are SNC siblings (on the same
> > > trace):node0, node1, and node2 are SNC siblings, while
> > > node3, node4, and node5 form another group of SNC siblings.
> > > 
> > > I have a curious thought: could it be possible that
> > > node0, node2, and node4 are SNC siblings, and node1,
> > > node3, and node5 are another set of SNC siblings instead?
> > > 
> > > Then I studied the code a little more, node ids are dynamically
> > > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > > ids depends on the order in which each processor affinity structure
> > > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > > package0 and CPU1 belongs to package1, but their entries are placed
> > > consecutively in the SRAT. In this case, the Proximity Domain of
> > > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > > Proximity Domain would be assigned node1. The logic above would
> > > then treat them as belonging to the same package, even though they
> > > are physically in different packages. However, I believe such a
> > > scenario is unlikely to occur in practice in the BIOS and if it
> > > happens it should be a BIOS bug if I understand correctly.
> > > 
> > > 
> > 
> > Maybe a good idea to sanity check that the nodes in the first unit cluster
> > have the same package id and give a WARNING if that's not the case.
> > 
> Perhaps something like below
> 
> Tim
> 
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d97f8f4e014c..38384ea5253a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
>  	int u = __num_nodes_per_package;
>  	long d = 0;
>  	int x, y;
> +	static int valid_slit = 0;
> +
> +	if (valid_slit == -1)
> +		return node_distance(i, j);
> +
> +	if (valid_slit == 0) {
> +		/* Check first nodes in package are grouped together consecutively */
> +		for (x = 0; x < u-1 ; x++) {
> +			if (topology_physical_package_id(x) !=
> +			    topology_physical_package_id(x+1)) {

This won't work because topology_physical_package_id() takes a cpu
as its argument.  Will need to find the first cpu of nodes x and x+1 and pass those to it.

> +				pr_warn("Expect nodes %d and %d to be in the same package\n",
> +						x, x+1);
> +				valid_slit = -1;
> +				return node_distance(i, j);
> +			}
> +		}
> +		valid_slit = 1;
> +	}
>  
>  	/*
>  	 * Is this a unit cluster on the trace?
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/6] x86/resctrl: Fix SNC detection
  2026-02-26 20:47     ` Luck, Tony
@ 2026-02-27  9:26       ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-27  9:26 UTC (permalink / raw)
  To: Luck, Tony
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, kprateek.nayak, patryk.wlazlyn,
	rafael.j.wysocki, russ.anderson, zhao1.liu

On Thu, Feb 26, 2026 at 12:47:41PM -0800, Luck, Tony wrote:
> I think this resctrl patch should look like this:


Ah, great. I was a little heavy on the delete button indeed.

Also, I hope you told the RDT guys what you think about being required
to know the SNC number, but them not providing it anywhere ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 17:07   ` Chen, Yu C
  2026-02-26 19:00     ` Tim Chen
@ 2026-02-27 11:56     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-27 11:56 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: linux-kernel, tim.c.chen, kyle.meyer, vinicius.gomes, brgerst,
	hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck, x86, tglx

On Fri, Feb 27, 2026 at 01:07:40AM +0800, Chen, Yu C wrote:
> Hi Peter,
> 
> On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > +	int u = __num_nodes_per_package;
> 
> Yes, this is much simpler, thanks for the patch!
> 
> > +	long d = 0;
> > +	int x, y;
> > +
> > +	/*
> > +	 * Is this a unit cluster on the trace?
> > +	 */
> > +	if ((i / u) == (j / u))
> > +		return node_distance(i, j);
> 
> If the number of nodes per package is 3, we assume that
> every 3 consecutive nodes are SNC siblings (on the same
> trace):node0, node1, and node2 are SNC siblings, while
> node3, node4, and node5 form another group of SNC siblings.
> 
> I have a curious thought: could it be possible that
> node0, node2, and node4 are SNC siblings, and node1,
> node3, and node5 are another set of SNC siblings instead?

Yes, give a BIOS guy enough bong-hits and this can be.

That said (and knock on wood), I've so far never seen this (and please
people, don't take this as a challenge).

> Then I studied the code a little more, node ids are dynamically
> allocated via the acpi_map_pxm_to_node, so the assignment of node
> ids depends on the order in which each processor affinity structure
> is listed in the SRAT table. For example, suppose CPU0 belongs to
> package0 and CPU1 belongs to package1, but their entries are placed
> consecutively in the SRAT. In this case, the Proximity Domain of
> CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> Proximity Domain would be assigned node1. The logic above would
> then treat them as belonging to the same package, even though they
> are physically in different packages. However, I believe such a
> scenario is unlikely to occur in practice in the BIOS and if it
> happens it should be a BIOS bug if I understand correctly.

Just so.

The thing I worried about is getting memory-only nodes iterated in
between or something. But as long as the CPU enumeration happens before
the 'other' crud, then the CPU node mappings should be the consecutive
low numbers and it all just works.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/6] x86/topo: Add __num_nodes_per_package
  2026-02-26 17:46   ` Kyle Meyer
@ 2026-02-27 11:57     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-27 11:57 UTC (permalink / raw)
  To: Kyle Meyer
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, vinicius.gomes,
	brgerst, hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Thu, Feb 26, 2026 at 11:46:15AM -0600, Kyle Meyer wrote:

> > --- a/arch/x86/kernel/cpu/topology.c
> > +++ b/arch/x86/kernel/cpu/topology.c
> > @@ -497,11 +497,18 @@ void __init topology_init_possible_cpus(
> >  	set_nr_cpu_ids(allowed);
> >  
> >  	cnta = domain_weight(TOPO_PKG_DOMAIN);
> > +	cntb = domain_weight(TOPO_NUMA_DOMAIN);
> 
> We'll need to check CONFIG_NUMA here.
> 
> TOPO_NUMA_DOMAIN is undeclared when CONFIG_NUMA is not set.

Indeed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-26 19:00     ` Tim Chen
  2026-02-26 22:11       ` Tim Chen
@ 2026-02-27 13:01       ` Peter Zijlstra
  2026-02-27 19:23         ` Tim Chen
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-27 13:01 UTC (permalink / raw)
  To: Tim Chen
  Cc: Chen, Yu C, linux-kernel, kyle.meyer, vinicius.gomes, brgerst,
	hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck, x86, tglx

On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
> > Maybe a good idea to sanity check that the nodes in the first unit cluster
> > have the same package id and give a WARNING if that's not the case.

But then we'd also have check the second cluster is another package. And
if we're checking that, we might as well check they're symmetric.

Is this sufficiently paranoid for you? :-)


---
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -61,6 +61,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/mc146818rtc.h>
 #include <linux/acpi.h>
+#include <linux/once_lite.h>
 
 #include <asm/acpi.h>
 #include <asm/cacheinfo.h>
@@ -506,12 +507,58 @@ static void __init build_sched_topology(
 }
 
 #ifdef CONFIG_NUMA
+static bool slit_cluster_symmetric(int N)
+{
+	for (int k = 0; k < __num_nodes_per_package; k++) {
+		for (int l = k; l < __num_nodes_per_package; l++) {
+			if (node_distance(N + k, N + l) != 
+			    node_distance(N + l, N + k))
+				return false;
+		}
+	}
+
+	return true;
+}
+
+static u32 slit_cluster_package(int N)
+{
+	u32 pkg_id = ~0;
+
+	for (int n = 0; n < __num_nodes_per_package; n++) {
+		const struct cpumask *cpus = cpumask_of_node(N + n);
+		int cpu;
+
+		for_each_cpu(cpu, cpus) {
+			u32 id = topology_logical_package_id(cpu);
+			if (pkg_id == ~0)
+				pkg_id = id;
+			if (pkg_id != id)
+				return ~0;
+		}
+	}
+
+	return pkg_id;
+}
+
+/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
+static void slit_validate(void)
+{
+	u32 pkg1 = slit_cluster_package(0);
+	u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
+	WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
+
+	WARN_ON(!slit_cluster_symmetric(0));
+	WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
+}
+
 static int slit_cluster_distance(int i, int j)
 {
 	int u = __num_nodes_per_package;
 	long d = 0;
 	int x, y;
 
+	DO_ONCE_LITE(slit_validate);
+
 	/*
 	 * Is this a unit cluster on the trace?
 	 */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-02-26 10:49 ` [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN Peter Zijlstra
@ 2026-02-27 13:19   ` K Prateek Nayak
  2026-02-27 14:06     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: K Prateek Nayak @ 2026-02-27 13:19 UTC (permalink / raw)
  To: Peter Zijlstra, x86, tglx
  Cc: linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer, vinicius.gomes,
	brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck

Hello Peter,

On 2/26/2026 4:19 PM, Peter Zijlstra wrote:
> @@ -88,6 +89,14 @@ static inline u32 topo_apicid(u32 apicid
>  {
>  	if (dom == TOPO_SMT_DOMAIN)
>  		return apicid;
> +#ifdef CONFIG_NUMA
> +	if (dom == TOPO_NUMA_DOMAIN) {
> +		int nid = __apicid_to_phys_node[apicid];
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return nid;
> +	}
> +#endif

I'm not digging this override - simply because topo_apicid() was not
meant to handle these kinds of cases where we cannot derive a topology
ID by simply shifting and masking the APICID.

Looking at the series, all we need is an equivalent of:

  domain_weight(TOPO_NUMA_DOMAIN)

so can we do something like the following on top of the changes in this
series:

  (!CONFIG_NUMA has only been build tested)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 5d6da2ad84e5..05461e2cd931 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -53,6 +53,8 @@ extern void __init init_cpu_to_node(void);
 extern void numa_add_cpu(unsigned int cpu);
 extern void numa_remove_cpu(unsigned int cpu);
 extern void init_gi_nodes(void);
+extern void __init topo_register_apic_phys_node(int apicid);
+extern int num_phys_nodes(void);
 #else	/* CONFIG_NUMA */
 static inline void numa_set_node(int cpu, int node)	{ }
 static inline void numa_clear_node(int cpu)		{ }
@@ -60,6 +62,11 @@ static inline void init_cpu_to_node(void)		{ }
 static inline void numa_add_cpu(unsigned int cpu)	{ }
 static inline void numa_remove_cpu(unsigned int cpu)	{ }
 static inline void init_gi_nodes(void)			{ }
+static inline void __init topo_register_apic_phys_node(int apicid) { }
+static inline int num_phys_nodes(void)
+{
+	return 1;
+}
 #endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 7fe9ea4ee1e7..9b3f92c5f0e0 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -111,9 +111,6 @@ enum x86_topology_domains {
 	TOPO_DIE_DOMAIN,
 	TOPO_DIEGRP_DOMAIN,
 	TOPO_PKG_DOMAIN,
-#ifdef CONFIG_NUMA
-	TOPO_NUMA_DOMAIN,
-#endif
 	TOPO_MAX_DOMAIN,
 };
 
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 399388213bc0..1d3bed3ae40e 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -89,14 +89,6 @@ static inline u32 topo_apicid(u32 apicid, enum x86_topology_domains dom)
 {
 	if (dom == TOPO_SMT_DOMAIN)
 		return apicid;
-#ifdef CONFIG_NUMA
-	if (dom == TOPO_NUMA_DOMAIN) {
-		int nid = __apicid_to_phys_node[apicid];
-		if (nid == NUMA_NO_NODE)
-			nid = 0;
-		return nid;
-	}
-#endif
 	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
 }
 
@@ -254,6 +246,8 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
 	 */
 	for (dom = TOPO_SMT_DOMAIN; dom < TOPO_MAX_DOMAIN; dom++)
 		set_bit(topo_apicid(apic_id, dom), apic_maps[dom].map);
+
+	topo_register_apic_phys_node(apic_id);
 }
 
 /**
@@ -501,7 +495,7 @@ void __init topology_init_possible_cpus(void)
 	set_nr_cpu_ids(allowed);
 
 	cnta = domain_weight(TOPO_PKG_DOMAIN);
-	cntb = domain_weight(TOPO_NUMA_DOMAIN);
+	cntb = num_phys_nodes();
 
 	__num_nodes_per_package = DIV_ROUND_UP(cntb, cnta);
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 4556a1561aa0..b60076745a32 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -52,6 +52,8 @@ s16 __apicid_to_phys_node[MAX_LOCAL_APIC] = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
+static nodemask_t apic_phys_node_map __ro_after_init;
+
 int numa_cpu_node(int cpu)
 {
 	u32 apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
@@ -61,6 +63,24 @@ int numa_cpu_node(int cpu)
 	return NUMA_NO_NODE;
 }
 
+static int topo_apicid_to_node(int apicid)
+{
+	int nid = __apicid_to_phys_node[apicid];
+	if (nid == NUMA_NO_NODE)
+		nid = 0;
+	return nid;
+}
+
+void __init topo_register_apic_phys_node(int apicid)
+{
+	set_bit(topo_apicid_to_node(apicid), apic_phys_node_map.bits);
+}
+
+int __init num_phys_nodes(void)
+{
+	return bitmap_weight(apic_phys_node_map.bits, MAX_NUMNODES);
+}
+
 cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
 EXPORT_SYMBOL(node_to_cpumask_map);
 
---

Slightly larger diffstat but all the NUMA bits are together.

Thoughts?

>  	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
>  }
>  

-- 
Thanks and Regards,
Prateek


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-02-27 13:19   ` K Prateek Nayak
@ 2026-02-27 14:06     ` Peter Zijlstra
  2026-03-02  4:16       ` K Prateek Nayak
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-02-27 14:06 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Fri, Feb 27, 2026 at 06:49:36PM +0530, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 2/26/2026 4:19 PM, Peter Zijlstra wrote:
> > @@ -88,6 +89,14 @@ static inline u32 topo_apicid(u32 apicid
> >  {
> >  	if (dom == TOPO_SMT_DOMAIN)
> >  		return apicid;
> > +#ifdef CONFIG_NUMA
> > +	if (dom == TOPO_NUMA_DOMAIN) {
> > +		int nid = __apicid_to_phys_node[apicid];
> > +		if (nid == NUMA_NO_NODE)
> > +			nid = 0;
> > +		return nid;
> > +	}
> > +#endif
> 
> I'm not digging this override - simply because topo_apicid() was not
> meant to handle these kinds of cases where we cannot derive a topology
> ID by simply shifting and masking the APICID.
> 
> Looking at the series, all we need is an equivalent of:
> 
>   domain_weight(TOPO_NUMA_DOMAIN)

Fair enough; but then let's replace patch 1 and 2 with something like
that.

But I must note that the nodemask API is crap; both node_set() and
__node_set() are the atomic version :-(

Let me go rework the other patches to fit on this.

---
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 53ba39ce010c..a9063f332fa6 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -22,6 +22,7 @@ extern int numa_off;
  */
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
+extern nodemask_t numa_phys_nodes_parsed __initdata;
 
 static inline void set_apicid_to_node(int apicid, s16 node)
 {
@@ -48,6 +49,7 @@ extern void __init init_cpu_to_node(void);
 extern void numa_add_cpu(unsigned int cpu);
 extern void numa_remove_cpu(unsigned int cpu);
 extern void init_gi_nodes(void);
+extern int num_phys_nodes(void);
 #else	/* CONFIG_NUMA */
 static inline void numa_set_node(int cpu, int node)	{ }
 static inline void numa_clear_node(int cpu)		{ }
@@ -55,6 +57,10 @@ static inline void init_cpu_to_node(void)		{ }
 static inline void numa_add_cpu(unsigned int cpu)	{ }
 static inline void numa_remove_cpu(unsigned int cpu)	{ }
 static inline void init_gi_nodes(void)			{ }
+static inline int num_phys_nodes(void)
+{
+	return 1;
+}
 #endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 23190a786d31..bfcd33127789 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -31,6 +31,7 @@
 #include <asm/mpspec.h>
 #include <asm/msr.h>
 #include <asm/smp.h>
+#include <asm/numa.h>
 
 #include "cpu.h"
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 7a97327140df..99d0a9332c14 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -48,6 +48,8 @@ s16 __apicid_to_node[MAX_LOCAL_APIC] = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
+nodemask_t numa_phys_nodes_parsed __initdata;
+
 int numa_cpu_node(int cpu)
 {
 	u32 apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
@@ -57,6 +59,11 @@ int numa_cpu_node(int cpu)
 	return NUMA_NO_NODE;
 }
 
+int __init num_phys_nodes(void)
+{
+	return bitmap_weight(numa_phys_nodes_parsed.bits, MAX_NUMNODES);
+}
+
 cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
 EXPORT_SYMBOL(node_to_cpumask_map);
 
@@ -210,6 +217,7 @@ static int __init dummy_numa_init(void)
 	       0LLU, PFN_PHYS(max_pfn) - 1);
 
 	node_set(0, numa_nodes_parsed);
+	node_set(0, numa_phys_nodes_parsed);
 	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
 
 	return 0;
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 6f8e0f21c710..44ca66651756 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -57,6 +57,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
 	}
 	set_apicid_to_node(apic_id, node);
 	node_set(node, numa_nodes_parsed);
+	node_set(node, numa_phys_nodes_parsed);
 	pr_debug("SRAT: PXM %u -> APIC 0x%04x -> Node %u\n", pxm, apic_id, node);
 }
 
@@ -97,6 +98,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 
 	set_apicid_to_node(apic_id, node);
 	node_set(node, numa_nodes_parsed);
+	node_set(node, numa_phys_nodes_parsed);
 	pr_debug("SRAT: PXM %u -> APIC 0x%02x -> Node %u\n", pxm, apic_id, node);
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-27 13:01       ` Peter Zijlstra
@ 2026-02-27 19:23         ` Tim Chen
  2026-02-28  7:35           ` Chen, Yu C
  0 siblings, 1 reply; 32+ messages in thread
From: Tim Chen @ 2026-02-27 19:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chen, Yu C, linux-kernel, kyle.meyer, vinicius.gomes, brgerst,
	hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck, x86, tglx

On Fri, 2026-02-27 at 14:01 +0100, Peter Zijlstra wrote:
> On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
> > Maybe a good idea to sanity check that the nodes in the first unit cluster
> > have the same package id and give a WARNING if that's not the case.
> 
> But then we'd also have check the second cluster is another package. And
> if we're checking that, we might as well check they're symmetric.
> 
> Is this sufficiently paranoid for you? :-)
> 

Thanks. Looks pretty good and hopefully those warnings will
never be triggered.

Tim

> 
> ---
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -61,6 +61,7 @@
>  #include <linux/cpuhotplug.h>
>  #include <linux/mc146818rtc.h>
>  #include <linux/acpi.h>
> +#include <linux/once_lite.h>
>  
>  #include <asm/acpi.h>
>  #include <asm/cacheinfo.h>
> @@ -506,12 +507,58 @@ static void __init build_sched_topology(
>  }
>  
>  #ifdef CONFIG_NUMA
> +static bool slit_cluster_symmetric(int N)
> +{
> +	for (int k = 0; k < __num_nodes_per_package; k++) {
> +		for (int l = k; l < __num_nodes_per_package; l++) {
> +			if (node_distance(N + k, N + l) != 
> +			    node_distance(N + l, N + k))
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +static u32 slit_cluster_package(int N)
> +{
> +	u32 pkg_id = ~0;
> +
> +	for (int n = 0; n < __num_nodes_per_package; n++) {
> +		const struct cpumask *cpus = cpumask_of_node(N + n);
> +		int cpu;
> +
> +		for_each_cpu(cpu, cpus) {
> +			u32 id = topology_logical_package_id(cpu);
> +			if (pkg_id == ~0)
> +				pkg_id = id;
> +			if (pkg_id != id)
> +				return ~0;
> +		}
> +	}
> +
> +	return pkg_id;
> +}
> +
> +/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
> +static void slit_validate(void)
> +{
> +	u32 pkg1 = slit_cluster_package(0);
> +	u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
> +	WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
> +
> +	WARN_ON(!slit_cluster_symmetric(0));
> +	WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
> +}
> +
>  static int slit_cluster_distance(int i, int j)
>  {
>  	int u = __num_nodes_per_package;
>  	long d = 0;
>  	int x, y;
>  
> +	DO_ONCE_LITE(slit_validate);
> +
>  	/*
>  	 * Is this a unit cluster on the trace?
>  	 */
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-27 19:23         ` Tim Chen
@ 2026-02-28  7:35           ` Chen, Yu C
  2026-03-02 16:43             ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Chen, Yu C @ 2026-02-28  7:35 UTC (permalink / raw)
  To: Tim Chen, Peter Zijlstra
  Cc: linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On 2/28/2026 3:23 AM, Tim Chen wrote:
> On Fri, 2026-02-27 at 14:01 +0100, Peter Zijlstra wrote:
>> On Thu, Feb 26, 2026 at 11:00:38AM -0800, Tim Chen wrote:
>>> Maybe a good idea to sanity check that the nodes in the first unit cluster
>>> have the same package id and give a WARNING if that's not the case.
>>
>> But then we'd also have check the second cluster is another package. And
>> if we're checking that, we might as well check they're symmetric.
>>
>> Is this sufficiently paranoid for you? :-)
>>
> 
> Thanks. Looks pretty good and hopefully those warnings will
> never be triggered.
> 
>> +
>> +/* If you NUMA_EMU on top of SNC, you get to keep the pieces */
>> +static void slit_validate(void)
>> +{
>> +	u32 pkg1 = slit_cluster_package(0);
>> +	u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
>> +	WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
>> +
>> +	WARN_ON(!slit_cluster_symmetric(0));
>> +	WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
>> +}
>> +

Here we check packages0 and 1, should we check all the packages?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8735f1968b00..91bac3e2e7fd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -543,12 +543,13 @@ static u32 slit_cluster_package(int N)
  /* If you NUMA_EMU on top of SNC, you get to keep the pieces */
  static void slit_validate(void)
  {
-       u32 pkg1 = slit_cluster_package(0);
-       u32 pkg2 = slit_cluster_package(__num_nodes_per_package);
-       WARN_ON(pkg1 == ~0 || pkg2 == ~0 || pkg1 == pkg2);
+       for (int pkg = 0; pkg < topology_max_packages(); pkg++) {
+               int node = pkg * __num_nodes_per_package;
+               u32 pkg_id = slit_cluster_package(node);

-       WARN_ON(!slit_cluster_symmetric(0));
-       WARN_ON(!slit_cluster_symmetric(__num_nodes_per_package));
+               WARN_ON(pkg_id == ~0);
+               WARN_ON(!slit_cluster_symmetric(node));
+       }
  }


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-02-27 14:06     ` Peter Zijlstra
@ 2026-03-02  4:16       ` K Prateek Nayak
  2026-03-02 15:10         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: K Prateek Nayak @ 2026-03-02  4:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

Hello Peter,

On 2/27/2026 7:36 PM, Peter Zijlstra wrote:
>> Looking at the series, all we need is an equivalent of:
>>
>>   domain_weight(TOPO_NUMA_DOMAIN)
> 
> Fair enough; but then let's replace patch 1 and 2 with something like
> that.
> 
> But I must note that the nodemask API is crap; it has both node_set() and
> __node_set() be the atomic version :-(
> 
> Let me go rework the other patches to fit on this.

Boots fine with a s/domain_weight(TOPO_NUMA_DOMAIN)/num_phys_nodes()/
applied to Patch 3.

Topology looks fine for NPS4 on my 3rd Generation EPYC with 2 sockets,
and I haven't triggered any warning even with "L3 as NUMA" turned on.
Feel free to include:

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

-- 
Thanks and Regards,
Prateek



* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-03-02  4:16       ` K Prateek Nayak
@ 2026-03-02 15:10         ` Peter Zijlstra
  2026-03-02 15:35           ` K Prateek Nayak
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-03-02 15:10 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Mon, Mar 02, 2026 at 09:46:57AM +0530, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 2/27/2026 7:36 PM, Peter Zijlstra wrote:
> >> Looking at the series, all we need is an equivalent of:
> >>
> >>   domain_weight(TOPO_NUMA_DOMAIN)
> > 
> > Fair enough; but then lets replace patch 1 and 2 with something like
> > that.
> > 
> > But I must note that the nodemask API is crap; it has both node_set() and
> > __node_set() be the atomic version :-(
> > 
> > Let me go rework the other patches to fit on this.
> 
> Boots fine with a s/domain_weight(TOPO_NUMA_DOMAIN)/num_phys_nodes()/
> applied to Patch 3.
> 
> Topology looks fine for NPS4 on my 3rd Generation EPYC with 2 sockets,
> and I haven't triggered any warning even with "L3 as NUMA" turned on.
> Feel free to include:
> 
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

Thanks!

I had a quick look at this NPS stuff, and that is more or less the same
as the intel SNC thing. With two notable exceptions:

 - you've stuck to power-of-two numbers (good!)

 - NPS0; I don't think Intel has anything like that (although I could be
   mistaken).

Now, the __num_nodes_per_package is obviously not going to work for
NPS0 (it bottoms out at 1).

Should we look at adding something for NPS0, or has that not been needed
(yet) ?


* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-03-02 15:10         ` Peter Zijlstra
@ 2026-03-02 15:35           ` K Prateek Nayak
  2026-03-02 16:28             ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: K Prateek Nayak @ 2026-03-02 15:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

Hello Peter,

On 3/2/2026 8:40 PM, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 09:46:57AM +0530, K Prateek Nayak wrote:
>> Hello Peter,
>>
>> On 2/27/2026 7:36 PM, Peter Zijlstra wrote:
>>>> Looking at the series, all we need is an equivalent of:
>>>>
>>>>   domain_weight(TOPO_NUMA_DOMAIN)
>>>
>>> Fair enough; but then lets replace patch 1 and 2 with something like
>>> that.
>>>
>>> But I must note that the nodemask API is crap; it has both node_set() and
>>> __node_set() be the atomic version :-(
>>>
>>> Let me go rework the other patches to fit on this.
>>
>> Boots fine with a s/domain_weight(TOPO_NUMA_DOMAIN)/num_phys_nodes()/
>> applied to Patch 3.
>>
>> Topology looks fine for NPS4 on my 3rd Generation EPYC with 2 sockets,
>> and I haven't triggered any warning even with "L3 as NUMA" turned on.
>> Feel free to include:
>>
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 
> Thanks!
> 
> I had a quick look at this NPS stuff, and that is more or less the same
> as the intel SNC thing. With two notable exceptions:
> 
>  - you've stuck to power-of-two numbers (good!)

Yeah, but "L3 as NUMA" on 6CCX machines doesn't follow that :-(
Is there any implicit dependency there?

P.S. All these configs are symmetric so those divisions should give the
correct results.

> 
>  - NPS0; I don't think Intel has anything like that (although I could be
>    mistaken).
> 
> Now, the __num_nodes_per_package is obviously not going to work for
> NPS0 (it bottoms out at 1).
> 
> Should we look at adding something for NPS0, or has that not been needed
> (yet) ?

Let me go boot into NPS0 to see what my machine thinks. But it shouldn't
do any harm because of the DIV_ROUND_UP(), right?

__num_nodes_per_package will be 1 (which is technically correct since
the whole package is indeed one node) and then we retain the PKG domain
so as far as those bits are concerned, it should be fine.

This is from my dual socket Zen3 booted into NPS0:

    CPU topo: Max. logical packages:   2
    CPU topo: Max. logical nodes:      1
    CPU topo: Num. nodes per package:  1
    CPU topo: Max. logical dies:       2
    CPU topo: Max. dies per package:   1
    CPU topo: Max. threads per core:   2
    CPU topo: Num. cores per package:    64
    CPU topo: Num. threads per package: 128
    CPU topo: Allowing 256 present CPUs plus 0 hotplug CPUs


CPU0's scheduler topology looks like:

    CPU0 attaching sched-domain(s):
     domain-0: span=0,128 level=SMT
      groups: 0:{ span=0 }, 128:{ span=128 }
      domain-1: span=0-7,128-135 level=MC
       groups: 0:{ span=0,128 cap=2048 },
               1:{ span=1,129 cap=2048 },
               2:{ span=2,130 cap=2048 },
               3:{ span=3,131 cap=2048 },
               4:{ span=4,132 cap=2048 },
               5:{ span=5,133 cap=2048 },
               6:{ span=6,134 cap=2048 },
               7:{ span=7,135 cap=2048 }
       domain-2: span=0-255 level=PKG
        groups:  0:{ span=0-7,128-135 cap=16384 },
                 8:{ span=8-15,136-143 cap=16384 },
                16:{ span=16-23,144-151 cap=16384 },
                24:{ span=24-31,152-159 cap=16384 },
                32:{ span=32-39,160-167 cap=16384 },
                40:{ span=40-47,168-175 cap=16384 },
                48:{ span=48-55,176-183 cap=16384 },
                56:{ span=56-63,184-191 cap=16384 },
                64:{ span=64-71,192-199 cap=16384 },
                72:{ span=72-79,200-207 cap=16384 },
                80:{ span=80-87,208-215 cap=16384 },
                88:{ span=88-95,216-223 cap=16384 },
                96:{ span=96-103,224-231 cap=16384 },
               104:{ span=104-111,232-239 cap=16384 },
               112:{ span=112-119,240-247 cap=16384 },
               120:{ span=120-127,248-255 cap=16384 }
    ...
    root domain span: 0-255


The PKG domain covers both the sockets since it uses the node mask which
covers the entire system.

-- 
Thanks and Regards,
Prateek



* Re: [RFC][PATCH 2/6] x86/topo: Add TOPO_NUMA_DOMAIN
  2026-03-02 15:35           ` K Prateek Nayak
@ 2026-03-02 16:28             ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-03-02 16:28 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, kyle.meyer,
	vinicius.gomes, brgerst, hpa, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Mon, Mar 02, 2026 at 09:05:03PM +0530, K Prateek Nayak wrote:

> > I had a quick look at this NPS stuff, and that is more or less the same
> > as the intel SNC thing. With two notable exceptions:
> > 
> >  - you've stuck to power-of-two numbers (good!)
> 
> Yeah, but "L3 as NUMA" on 6CCX machines doesn't follow that :-(
> Is there any implicit dependency there?
> 
> P.S. All these configs are symmetric so those divisions should give the
> correct results.

Nah, the code here doesn't care. Specifically the case at hand was
SNC-3, where we make one package have 3 nodes.

> >  - NPS0; I don't think Intel has anything like that (although I could be
> >    mistaken).
> > 
> > Now, the __num_nodes_per_package is obviously not going to work for
> > NPS0 (it bottoms out at 1).
> > 
> > Should we look at adding something for NPS0, or has that not been needed
> > (yet) ?
> 
> Let me go boot into NPS0 to see what my machine thinks. But it shouldn't
> do any harm because of the DIV_ROUND_UP(), right?

Right, no harm. And I've since realized you can detect it by:

	num_phys_nodes() == 1 && topology_max_packages() == 2



* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-02-28  7:35           ` Chen, Yu C
@ 2026-03-02 16:43             ` Peter Zijlstra
  2026-03-03  6:31               ` Zhang Rui
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2026-03-02 16:43 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: Tim Chen, linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:

> Here we check packages 0 and 1; should we check all the packages?

Might as well I suppose.

Could you boot queue/x86/topo on an snc-3 machine to verify it doesn't
explode with all the paranoia on?


* Re: [RFC][PATCH 0/6] x86/topo: SNC Divination
  2026-02-26 10:49 [RFC][PATCH 0/6] x86/topo: SNC Divination Peter Zijlstra
                   ` (6 preceding siblings ...)
  2026-02-26 19:16 ` [RFC][PATCH 0/6] x86/topo: SNC Divination Luck, Tony
@ 2026-03-02 18:21 ` Kyle Meyer
  7 siblings, 0 replies; 32+ messages in thread
From: Kyle Meyer @ 2026-03-02 18:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, tglx, linux-kernel, tim.c.chen, yu.c.chen, vinicius.gomes,
	brgerst, hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck

On Thu, Feb 26, 2026 at 11:49:09AM +0100, Peter Zijlstra wrote:
> Hi!
> 
> So we once again ran head-first into the fact that the CPUs fail to enumerate
> useful state. This time it is SNC (again).
> 
> Thomas recently rewrote much of the topology code to use MADT and CPUID to
> derive many of the useful measure of the system *before* SMP bringup. Removing
> much broken magic.
> 
> Inspired by that, I wondered if we might do the same for SNC. Clearly MADT is
> not sufficient, however combined with SRAT we should have enough.
> 
> Further luck will have it that by the time MADT gets parsed, SRAT is already
> parsed, so integrating them is mostly straight forward. The only caveat is that
> numa_emulation can mess things up in between.
> 
> Combining all this gives a straight forward measure of nodes-per-package, which
> should reflect the SNC mode. All before SMP bringup.
> 
> Use this to 'fix' various SNC snafus.
> 
> Compile and boot tested on non SNC only for now; I'll see if I can convince
> qemu to play along.

Thank you for the series!

Tested on a Sapphire Rapids system with SNC-4, SNC-2, and SNC disabled.
Tested on a Granite Rapids system with SNC-2 and SNC disabled.

The nodes per package were correct.

I don't have access to a system with CXL attached memory.

Tested-by: Kyle Meyer <kyle.meyer@hpe.com>


* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-03-02 16:43             ` Peter Zijlstra
@ 2026-03-03  6:31               ` Zhang Rui
  2026-03-03  6:39                 ` Chen, Yu C
  2026-03-03  8:44                 ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Zhang Rui @ 2026-03-03  6:31 UTC (permalink / raw)
  To: Peter Zijlstra, Chen, Yu C
  Cc: Tim Chen, linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
> On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
> 
> > Here we check packages 0 and 1; should we check all the packages?
> 
> Might as well I suppose.
> 
> Could you boot queue/x86/topo on an snc-3 machine to verify it
> doesn't
> explode with all the paranoia on?

Hi, Peter,

regarding slit_validate() in
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df

I suppose we want to use 
	WARN_ON_ONCE(!slit_cluster_symmetric(n))
rather than
	WARN_ON_ONCE(slit_cluster_symmetric(n));
right?

I tested the queue/x86/topo plus above change,

1. on GNR 4 sockets with SNC2, no difference about sched_domains in
/proc/schedstat compared with upstream 7.0-rc1, plus the below warning
is also gone
[   11.439633] ------------[ cut here ]------------
[   11.440491] sched: Expect only up to 2 packages for GNR or CWF, but
saw 4 packages when building sched domains.
[   11.440493] WARNING: arch/x86/kernel/smpboot.c:574 at
arch_sched_node_distance+0x133/0x140, CPU#0: swapper/0/1

2. on CWF 2 sockets with SNC3 and CWF 1 socket with SNC3, no difference
about sched_domains in /proc/schedstat compared with upstream 7.0-rc1

So

Tested-by: Zhang Rui <rui.zhang@intel.com>

-rui


* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-03-03  6:31               ` Zhang Rui
@ 2026-03-03  6:39                 ` Chen, Yu C
  2026-03-03  8:44                 ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Chen, Yu C @ 2026-03-03  6:39 UTC (permalink / raw)
  To: Zhang Rui, Peter Zijlstra
  Cc: Tim Chen, linux-kernel, kyle.meyer, vinicius.gomes, brgerst, hpa,
	kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki, russ.anderson,
	zhao1.liu, tony.luck, x86, tglx

On 3/3/2026 2:31 PM, Zhang Rui wrote:
> On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
>> On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
>>
>>> Here we check packages 0 and 1; should we check all the packages?
>>
>> Might as well I suppose.
>>
>> Could you boot queue/x86/topo on an snc-3 machine to verify it
>> doesn't
>> explode with all the paranoia on?
> 
> Hi, Peter,
> 
> regarding slit_validate() in
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df
> 
> I suppose we want to use
> 	WARN_ON_ONCE(!slit_cluster_symmetric(n))
> rather than
> 	WARN_ON_ONCE(slit_cluster_symmetric(n));

With Rui's fix, tested on a GNR, 2-socket, SNC3, NUMA-distance-symmetric
platform; it works as expected:
  smp: Brought up 6 nodes, 384 CPUs
   domain-0: span=0,192 level=SMT
   domain-1: span=0-31,192-223 level=MC
   domain-2: span=0-95,192-287 level=NUMA
     groups: 0:{ span=0-31,192-223 cap=65536 }, 32:{ span=32-63,224-255 cap=65536 }, 64:{ span=64-95,256-287 cap=65536 }
   domain-3: span=0-383 level=NUMA
     groups: 0:{ span=0-95,192-287 cap=196608 }, 96:{ span=96-191,288-383 cap=196608 }

resctrl also detects SNC3
root@a4bf018d7604:/sys/fs/resctrl/mon_data# tree mon_L3_00/
mon_L3_00/
├── llc_occupancy
├── mbm_local_bytes
├── mbm_total_bytes
├── mon_sub_L3_00
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
├── mon_sub_L3_01
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
└── mon_sub_L3_02
     ├── llc_occupancy
     ├── mbm_local_bytes
     └── mbm_total_bytes

> So
> 
> Tested-by: Zhang Rui <rui.zhang@intel.com>
> 

For this series,
Tested-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Chenyu



* Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
  2026-03-03  6:31               ` Zhang Rui
  2026-03-03  6:39                 ` Chen, Yu C
@ 2026-03-03  8:44                 ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2026-03-03  8:44 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Chen, Yu C, Tim Chen, linux-kernel, kyle.meyer, vinicius.gomes,
	brgerst, hpa, kprateek.nayak, patryk.wlazlyn, rafael.j.wysocki,
	russ.anderson, zhao1.liu, tony.luck, x86, tglx

On Tue, Mar 03, 2026 at 02:31:37PM +0800, Zhang Rui wrote:
> On Mon, 2026-03-02 at 17:43 +0100, Peter Zijlstra wrote:
> > On Sat, Feb 28, 2026 at 03:35:26PM +0800, Chen, Yu C wrote:
> > 
> > > Here we check packages 0 and 1; should we check all the packages?
> > 
> > Might as well I suppose.
> > 
> > Could you boot queue/x86/topo on an snc-3 machine to verify it
> > doesn't
> > explode with all the paranoia on?
> 
> Hi, Peter,
> 
> regarding slit_validate() in
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=x86/topo&id=24ca94ac4b72803a7164b7ad84f06f0e9f0c49df
> 
> I suppose we want to use 
> 	WARN_ON_ONCE(!slit_cluster_symmetric(n))
> rather than
> 	WARN_ON_ONCE(slit_cluster_symmetric(n));
> right?

D'oh, yes, very much so!

Thanks!

Let me go expand the Changelogs some, repost, and then probably stick it
in tip somewhere.
