[PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
@ 2009-05-04 17:33 Andreas Herrmann
  2009-05-04 17:34 ` [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU Andreas Herrmann
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-04 17:33 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel

Hi,

Following patches add support for AMD Magny-Cours CPU.

I slightly change struct cpuinfo where I'd like to introduce
cpu_node_id to reflect CPU topology for AMD Magny-Cours CPU which
consists of two internal-nodes.

For all cores on the same multi-node CPU (Magny-Cours) /proc/cpuinfo
will show:
- same phys_proc_id
- cpu_node_id of the internal node (0 or 1)
- cpu_core_id (e.g. in range of 0 to 5)

I also change identification of core siblings (and thread siblings)
which will also be based on cpu_node_id in addition to phys_proc_id.

Furthermore I adapt the L3 cache information to reflect the cache
characteristics of one internal node instead of the entire package.

Primarily this changes are needed to correct core sibling information
for Magny-Cours. This CPU has two NBs on one physical package -- each
internal node has its own processor configuration space (i.e. set of
northbridge PCI functions).

Patches are against tip/master as of today.
Please consider patches 1 and 2 for .30.
Patch 3 should be applied to tip/x86/cpu where all the recent
cacheinfo changes reside (e.g. commit
6265ff19ca08df0d96c859ae5e4dc2d9ad07070e).

Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
@ 2009-05-04 17:34 ` Andreas Herrmann
  2009-05-06 11:44   ` Ingo Molnar
  2009-05-04 17:36 ` [PATCH 2/3] x86: fixup topology detection for AMD " Andreas Herrmann
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-04 17:34 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel


Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
---
 arch/x86/include/asm/processor.h |    2 ++
 arch/x86/kernel/cpu/common.c     |    2 ++
 arch/x86/kernel/cpu/proc.c       |    1 +
 arch/x86/kernel/smpboot.c        |    5 ++++-
 4 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0b2fab0..b49d72b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -106,6 +106,8 @@ struct cpuinfo_x86 {
 	u16			booted_cores;
 	/* Physical processor id: */
 	u16			phys_proc_id;
+	/* Node id in case of multi-node processor: */
+	u16			cpu_node_id;
 	/* Core id: */
 	u16			cpu_core_id;
 	/* Index into per_cpu list: */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 591012f..9bff330 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -470,6 +470,8 @@ out:
 	if ((c->x86_max_cores * smp_num_siblings) > 1) {
 		printk(KERN_INFO  "CPU: Physical Processor ID: %d\n",
 		       c->phys_proc_id);
+		printk(KERN_INFO  "CPU: Processor Node ID: %d\n",
+		       c->cpu_node_id);
 		printk(KERN_INFO  "CPU: Processor Core ID: %d\n",
 		       c->cpu_core_id);
 	}
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index f93047f..e761c90 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -13,6 +13,7 @@ static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
 #ifdef CONFIG_SMP
 	if (c->x86_max_cores * smp_num_siblings > 1) {
 		seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+		seq_printf(m, "node id\t\t: %d\n", c->cpu_node_id);
 		seq_printf(m, "siblings\t: %d\n",
 			   cpumask_weight(cpu_sibling_mask(cpu)));
 		seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d2e8de9..dc0e735 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -374,6 +374,7 @@ void __cpuinit set_cpu_sibling_map(int cpu)
 			struct cpuinfo_x86 *o = &cpu_data(i);
 
 			if (c->phys_proc_id == o->phys_proc_id &&
+			    c->cpu_node_id == o->cpu_node_id &&
 			    c->cpu_core_id == o->cpu_core_id) {
 				cpumask_set_cpu(i, cpu_sibling_mask(cpu));
 				cpumask_set_cpu(cpu, cpu_sibling_mask(i));
@@ -401,7 +402,8 @@ void __cpuinit set_cpu_sibling_map(int cpu)
 			cpumask_set_cpu(i, c->llc_shared_map);
 			cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
 		}
-		if (c->phys_proc_id == cpu_data(i).phys_proc_id) {
+		if (c->phys_proc_id == cpu_data(i).phys_proc_id &&
+		    c->cpu_node_id == cpu_data(i).cpu_node_id) {
 			cpumask_set_cpu(i, cpu_core_mask(cpu));
 			cpumask_set_cpu(cpu, cpu_core_mask(i));
 			/*
@@ -1218,6 +1220,7 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_sibling_mask(cpu));
 	cpumask_clear(cpu_core_mask(cpu));
 	c->phys_proc_id = 0;
+	c->cpu_node_id = 0;
 	c->cpu_core_id = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 }
-- 
1.6.2




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU
  2009-05-04 17:34 ` [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU Andreas Herrmann
@ 2009-05-06 11:44   ` Ingo Molnar
  2009-05-06 16:14     ` Andreas Herrmann
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2009-05-06 11:44 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: H. Peter Anvin, Thomas Gleixner, linux-kernel


* Andreas Herrmann <andreas.herrmann3@amd.com> wrote:

> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
> ---
>  arch/x86/include/asm/processor.h |    2 ++
>  arch/x86/kernel/cpu/common.c     |    2 ++
>  arch/x86/kernel/cpu/proc.c       |    1 +
>  arch/x86/kernel/smpboot.c        |    5 ++++-
>  4 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 0b2fab0..b49d72b 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -106,6 +106,8 @@ struct cpuinfo_x86 {
>  	u16			booted_cores;
>  	/* Physical processor id: */
>  	u16			phys_proc_id;
> +	/* Node id in case of multi-node processor: */
> +	u16			cpu_node_id;

btw., do you have any plans to propagate this information into the 
scheduler domains tree?

Another level of domains, to cover the two internal nodes, would do 
the trick nicely and automatically. This would work even if the BIOS 
does not provide information and we have to go to lowlevel registers 
or CPUID to recover it.

This can be done even if there's no SRAT. (there's no SRAT because 
say due to interleaving there's no real NUMA structure of memory. 
But there's still CPU scheduling differences worth expressing.)

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU
  2009-05-06 11:44   ` Ingo Molnar
@ 2009-05-06 16:14     ` Andreas Herrmann
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-06 16:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: H. Peter Anvin, Thomas Gleixner, linux-kernel

On Wed, May 06, 2009 at 01:44:23PM +0200, Ingo Molnar wrote:
> 
> * Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> 
> > Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
> > ---
> >  arch/x86/include/asm/processor.h |    2 ++
> >  arch/x86/kernel/cpu/common.c     |    2 ++
> >  arch/x86/kernel/cpu/proc.c       |    1 +
> >  arch/x86/kernel/smpboot.c        |    5 ++++-
> >  4 files changed, 9 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 0b2fab0..b49d72b 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -106,6 +106,8 @@ struct cpuinfo_x86 {
> >  	u16			booted_cores;
> >  	/* Physical processor id: */
> >  	u16			phys_proc_id;
> > +	/* Node id in case of multi-node processor: */
> > +	u16			cpu_node_id;
> 
> btw., do you have any plans to propagate this information into the 
> scheduler domains tree?

No plans yet -- as I don't know much about the scheduler code so far.
But it's worth it and I'll do it.

> Another level of domains, to cover the two internal nodes, would do 
> the trick nicely and automatically. This would work even if the BIOS 
> does not provide information and we have to go to lowlevel registers 
> or CPUID to recover it.

Yes.

(And maybe for powersaving reasons it might be worth to use nodes on
same socket instead of nodes on different sockets. But I have to check
that in-depth first.)

> This can be done even if there's no SRAT. (there's no SRAT because 
> say due to interleaving there's no real NUMA structure of memory. 
> But there's still CPU scheduling differences worth expressing.)

Seconded.


Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/3] x86: fixup topology detection for AMD multi-node CPU
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
  2009-05-04 17:34 ` [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU Andreas Herrmann
@ 2009-05-04 17:36 ` Andreas Herrmann
  2009-05-04 17:37 ` [PATCH 3/3] x86: cacheinfo: fixup L3 cache information " Andreas Herrmann
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-04 17:36 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel


Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
---
 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/kernel/cpu/amd.c         |   63 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb83b1c..17b59ec 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -94,6 +94,7 @@
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC	(3*32+24) /* TSC does not stop in C states */
 #define X86_FEATURE_CLFLUSH_MONITOR (3*32+25) /* "" clflush reqd with monitor */
+#define X86_FEATURE_AMD_DCM     (3*32+26) /* multi-node processor */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	(4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 7e4a459..3be0645 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
+#include <asm/pci-direct.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/numa_64.h>
@@ -250,6 +251,59 @@ static int __cpuinit nearby_node(int apicid)
 #endif
 
 /*
+ * Fixup core topology information for AMD multi-node processors.
+ * Assumption 1: Number of cores in each internal node is the same.
+ * Assumption 2: Mixed systems with both single-node and dual-node
+ *               processors are not supported.
+ */
+static void __cpuinit amd_fixup_dcm(struct cpuinfo_x86 *c)
+{
+	u32 t;
+	u8 n, ni;
+
+	/* Fixup topology information only once for a core. */
+	if (cpu_has(c, X86_FEATURE_AMD_DCM))
+		return;
+
+	/* Check for multi-node processor on boot cpu. */
+	t = read_pci_config(0, 24, 3, 0xe8);
+	if (!(t & (1 << 29)))
+		return;
+
+	set_cpu_cap(c, X86_FEATURE_AMD_DCM);
+
+	/* each internal node has half the number of cores */
+	c->x86_max_cores = c->x86_max_cores >> 1;
+
+	/* ids of both nodes of this dual-node processor */
+	n = c->phys_proc_id << 1;
+
+	/*
+	 * determine internal node id and assign cores
+	 * fifty-fifty to each node of the dual-node processor
+	 */
+	t = read_pci_config(0, 24 + n, 3, 0xe8);
+	ni = (t>>30) & 0x3;
+	if (ni == 0) {
+		if (c->cpu_core_id < c->x86_max_cores)
+			c->cpu_node_id = 0;
+		else
+			c->cpu_node_id = 1;
+	} else {
+		if (c->cpu_core_id < c->x86_max_cores)
+			c->cpu_node_id = 1;
+		else
+			c->cpu_node_id = 0;
+	}
+
+	/*
+	 * fixup core id as each internal node has half the number of
+	 * original max_cores
+	 */
+	c->cpu_core_id = c->cpu_core_id % c->x86_max_cores;
+}
+
+/*
  * On a AMD dual core setup the lower bits of the APIC id distingush the cores.
  * Assumes number of cores is a power of two.
  */
@@ -264,6 +318,9 @@ static void __cpuinit amd_detect_cmp(struct cpuinfo_x86 *c)
 	c->cpu_core_id = c->initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
 	c->phys_proc_id = c->initial_apicid >> bits;
+	/* fixup topology information on multi-node processors */
+	if ((c->x86 == 0x10) && (c->x86_model == 9))
+		amd_fixup_dcm(c);
 #endif
 }
 
@@ -274,7 +331,11 @@ static void __cpuinit srat_detect_node(struct cpuinfo_x86 *c)
 	int node;
 	unsigned apicid = hard_smp_processor_id();
 
-	node = c->phys_proc_id;
+	if (cpu_has(c, X86_FEATURE_AMD_DCM))
+		node = (c->phys_proc_id << 1) + c->cpu_node_id;
+	else
+		node = c->phys_proc_id;
+
 	if (apicid_to_node[apicid] != NUMA_NO_NODE)
 		node = apicid_to_node[apicid];
 	if (!node_online(node)) {
-- 
1.6.2




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] x86: cacheinfo: fixup L3 cache information for AMD multi-node CPU
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
  2009-05-04 17:34 ` [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU Andreas Herrmann
  2009-05-04 17:36 ` [PATCH 2/3] x86: fixup topology detection for AMD " Andreas Herrmann
@ 2009-05-04 17:37 ` Andreas Herrmann
  2009-05-04 17:44 ` [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-04 17:37 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel


L3 cache size, associativity and shared_cpu information need to be
adapted to show information for one internal node instead of the
entire physical package.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
---
 arch/x86/kernel/cpu/intel_cacheinfo.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index 789efe2..4918fc9 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -241,7 +241,7 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	case 0:
 		if (!l1->val)
 			return;
-		assoc = l1->assoc;
+		assoc = assocs[l1->assoc];
 		line_size = l1->line_size;
 		lines_per_tag = l1->lines_per_tag;
 		size_in_kb = l1->size_in_kb;
@@ -249,7 +249,7 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	case 2:
 		if (!l2.val)
 			return;
-		assoc = l2.assoc;
+		assoc = assocs[l2.assoc];
 		line_size = l2.line_size;
 		lines_per_tag = l2.lines_per_tag;
 		/* cpu_data has errata corrections for K7 applied */
@@ -258,10 +258,14 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	case 3:
 		if (!l3.val)
 			return;
-		assoc = l3.assoc;
+		assoc = assocs[l3.assoc];
 		line_size = l3.line_size;
 		lines_per_tag = l3.lines_per_tag;
 		size_in_kb = l3.size_encoded * 512;
+		if (boot_cpu_has(X86_FEATURE_AMD_DCM)) {
+			size_in_kb = size_in_kb >> 1;
+			assoc = assoc >> 1;
+		}
 		break;
 	default:
 		return;
@@ -278,10 +282,10 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	eax->split.num_cores_on_die = current_cpu_data.x86_max_cores - 1;
 
 
-	if (assoc == 0xf)
+	if (assoc == 0xffff)
 		eax->split.is_fully_associative = 1;
 	ebx->split.coherency_line_size = line_size - 1;
-	ebx->split.ways_of_associativity = assocs[assoc] - 1;
+	ebx->split.ways_of_associativity = assoc - 1;
 	ebx->split.physical_line_partition = lines_per_tag - 1;
 	ecx->split.number_of_sets = (size_in_kb * 1024) / line_size /
 		(ebx->split.ways_of_associativity + 1) - 1;
@@ -598,7 +602,11 @@ static void __cpuinit get_cpu_leaves(void *_retval)
 				cache_remove_shared_cpu_map(cpu, i);
 			break;
 		}
-		cache_shared_cpu_map_setup(cpu, j);
+		if (boot_cpu_has(X86_FEATURE_AMD_DCM))
+			cpumask_copy(to_cpumask(this_leaf->shared_cpu_map),
+				     topology_core_cpumask(cpu));
+		else
+			cache_shared_cpu_map_setup(cpu, j);
 	}
 }
 
-- 
1.6.2




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
                   ` (2 preceding siblings ...)
  2009-05-04 17:37 ` [PATCH 3/3] x86: cacheinfo: fixup L3 cache information " Andreas Herrmann
@ 2009-05-04 17:44 ` Andreas Herrmann
  2009-05-04 20:16 ` Andi Kleen
  2009-05-08 14:28 ` Andreas Herrmann
  5 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-04 17:44 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel

On Mon, May 04, 2009 at 07:33:30PM +0200, Andreas Herrmann wrote:
> Hi,
> 
> Following patches add support for AMD Magny-Cours CPU.
> 
> I slightly change struct cpuinfo where I'd like to introduce
> cpu_node_id to reflect CPU topology for AMD Magny-Cours CPU which
> consists of two internal-nodes.
> 
> For all cores on the same multi-node CPU (Magny-Cours) /proc/cpuinfo
> will show:
> - same phys_proc_id
> - cpu_node_id of the internal node (0 or 1)
> - cpu_core_id (e.g. in range of 0 to 5)
> 
> I also change identification of core siblings (and thread siblings)
> which will also be based on cpu_node_id in addition to phys_proc_id.
> 
> Furthermore I adapt the L3 cache information to reflect the cache
> characteristics of one internal node instead of the entire package.
> 
> Primarily this changes are needed to correct core sibling information
> for Magny-Cours. This CPU has two NBs on one physical package -- each
> internal node has its own processor configuration space (i.e. set of
> northbridge PCI functions).

I should have mentioned that first two patches fix topology information
provided in /sys/devices/cpu/cpuX/topology (core_sibling information).

I think, the internal node information should also be exposed
there. E.g. introducing cpu_node_siblings and cpu_node_sibling_list.

But that affects other architectures (the code in
drivers/base/topology is not x86-specific) and probably needs some
more discussion. Patches to address this will follow.



Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
                   ` (3 preceding siblings ...)
  2009-05-04 17:44 ` [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
@ 2009-05-04 20:16 ` Andi Kleen
  2009-05-05  9:22   ` Andreas Herrmann
  2009-05-08 14:28 ` Andreas Herrmann
  5 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-05-04 20:16 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

Andreas Herrmann <andreas.herrmann3@amd.com> writes:

> Following patches add support for AMD Magny-Cours CPU.
>
> I slightly change struct cpuinfo where I'd like to introduce
> cpu_node_id to reflect CPU topology for AMD Magny-Cours CPU which
> consists of two internal-nodes.

It's unclear to me why you need this special case versus 
just using the normal NUMA distances to represent the internal
nodes as "nearby nodes" versus sockets as "farer away".

Essentially it should be just two level NUMA which can be already
described fine for the scheduler or VM using the existing SRAT
mechanism.

Who needs this additional information?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-04 20:16 ` Andi Kleen
@ 2009-05-05  9:22   ` Andreas Herrmann
  2009-05-05  9:35     ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-05  9:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Mon, May 04, 2009 at 10:16:15PM +0200, Andi Kleen wrote:
> Andreas Herrmann <andreas.herrmann3@amd.com> writes:
> 
> > Following patches add support for AMD Magny-Cours CPU.
> >
> > I slightly change struct cpuinfo where I'd like to introduce
> > cpu_node_id to reflect CPU topology for AMD Magny-Cours CPU which
> > consists of two internal-nodes.
> 
> It's unclear to me why you need this special case versus 
> just using the normal NUMA distances to represent the internal
> nodes as "nearby nodes" versus sockets as "farer away".

NUMA memory allocation and scheduling based on SRAT works just fine
w/o that fix. But the patches are _not_ for NUMA node detection.

This is for CPU topology detection -- which cores are using the same
northbridge.

> Essentially it should be just two level NUMA which can be already
> described fine for the scheduler or VM using the existing SRAT
> mechanism.

It doesn't suffice to rely on SRAT here.

Best example is node interleaving. Usually you won't get a SRAT table
on such a system. Thus you see just one NUMA node in
/sys/devices/system/node.  But on such a configuration you still see
(and you want to see) the correct CPU topology information in
/sys/devices/system/cpu/cpuX/topology. Based on that you always can
figure out which cores are on the same physical package independent of
availability and contents of SRAT and even with kernels that are
compiled w/o NUMA support.

A big change with Magny-Cours in contrast to other AMD
CPUs is that instead of

   physical package == one northbridge (one node)

we have

   physical package == two northbridges (two nodes)

and this needs to be represented somehow in the kernel.

> Who needs this additional information?

The kernel needs to know this when accessing processor configuration
space, when accessing shared MSRs or for counting northbridge specific
events.

Regards,
Andreas 

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05  9:22   ` Andreas Herrmann
@ 2009-05-05  9:35     ` Andi Kleen
  2009-05-05 10:48       ` Andreas Herrmann
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-05-05  9:35 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Andi Kleen, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

> Best example is node interleaving. Usually you won't get a SRAT table
> on such a system.

That sounds like a BIOS bug. It should supply a suitable SLIT/SRAT
even for this case. Or perhaps if the BIOS are really that broken
add a suitable quirk that provides distances, but better fix the BIOSes.

 Thus you see just one NUMA node in
> /sys/devices/system/node.  But on such a configuration you still see
> (and you want to see) the correct CPU topology information in
> /sys/devices/system/cpu/cpuX/topology. Based on that you always can
> figure out which cores are on the same physical package independent of
> availability and contents of SRAT and even with kernels that are
> compiled w/o NUMA support.

So you're adding a x86 specific mini NUMA for kernels without NUMA
(which btw becomes more and more an exotic case -- modern distros
are normally unconditionally NUMA) Doesn't seem very useful.

My problem with that is that imho the x86 topology information is already
too complicated -- i suspect very few people can make sense of it --
and you're making it even worse, adding another strange special case.

On the other hand NUMA topology is comparatively straight forward and well 
understood and it's flexible enough to express your case too.

>    physical package == two northbridges (two nodes)
> 
> and this needs to be represented somehow in the kernel.

It's just two nodes with a very fast interconnect.

> 
> > Who needs this additional information?
> 
> The kernel needs to know this when accessing processor configuration
> space, when accessing shared MSRs or for counting northbridge specific
> events.

You're saying there are MSRs shared between the two in package nodes?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05  9:35     ` Andi Kleen
@ 2009-05-05 10:48       ` Andreas Herrmann
  2009-05-05 12:02         ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-05 10:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Tue, May 05, 2009 at 11:35:20AM +0200, Andi Kleen wrote:
> > Best example is node interleaving. Usually you won't get a SRAT table
> > on such a system.
> 
> That sounds like a BIOS bug. It should supply a suitable SLIT/SRAT
> even for this case. Or perhaps if the BIOS are really that broken
> add a suitable quirk that provides distances, but better fix the BIOSes.

How do you define SRAT when node interleaving is enabled?
(Defining same distances between all nodes, describing only one node,
or omitting SRAT entirely? I've observed that the latter is common
behavior.)

> > Thus you see just one NUMA node in
> > /sys/devices/system/node.  But on such a configuration you still see
> > (and you want to see) the correct CPU topology information in
> > /sys/devices/system/cpu/cpuX/topology. Based on that you always can
> > figure out which cores are on the same physical package independent of
> > availability and contents of SRAT and even with kernels that are
> > compiled w/o NUMA support.
> 
> So you're adding a x86 specific mini NUMA for kernels without NUMA
> (which btw becomes more and more an exotic case -- modern distros
> are normally unconditionally NUMA) Doesn't seem very useful.

No, I just tried to give an example why you can't derive CPU topology
from NUMA topology.

IMHO we have two sorts of topology information:
(1) CPU topology (physical package, core siblings, thread siblings)
(2) NUMA topology

Of course also for non-NUMA systems the kernel detects and provides (1).

> My problem with that is that imho the x86 topology information is already
> too complicated --

Well, it won't be simpler in the future. But it shouldn't be too complicate
to understand it if its' properly represented and documented.

> i suspect very few people can make sense of it --
> and you're making it even worse, adding another strange special case.

It's an abstraction -- I think of it just as another level in the CPU
hierarchy -- where existing CPUs and multi-node CPUs fit in:

  physical package --> processor node --> processor core --> thread

I guess the problem is that you are associating node always with NUMA.
Would it help to rename cpu_node_id to something else?

I suggested to introduce

 cpu_node_id (in style of AMD specs)

How about

 cpu_chip_id (in the style of MCM - multi-chip module ;-)
 cpu_nb_id   (nb == northbridge, introducing kind of northbridge domain)
 cpu_die_id

or something entirely different?

> On the other hand NUMA topology is comparatively straight forward and well 
> understood and it's flexible enough to express your case too.
> 
> >    physical package == two northbridges (two nodes)
> > 
> > and this needs to be represented somehow in the kernel.
> 
> It's just two nodes with a very fast interconnect.

In fact, I also thought about representing each internal node as one
physical package. But that is even worse as you can't figure out which
node is on the same socket. And "physical package id" is used as
socket information.

The best solution is to reflect the correct CPU topology (all levels
of the hierarchy) in the kernel. As another use case: for power
management you might want to know both which cores are on which
internal node _and_ which nodes are on the same physical package.

> > > Who needs this additional information?
> > 
> > The kernel needs to know this when accessing processor configuration
> > space, when accessing shared MSRs or for counting northbridge specific
> > events.
> 
> You're saying there are MSRs shared between the two in package nodes?

No. I referred to NB MSRs that are shared between the cores on the
same (internal) node.

Regards,

Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05 10:48       ` Andreas Herrmann
@ 2009-05-05 12:02         ` Andi Kleen
  2009-05-05 14:40           ` Andreas Herrmann
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-05-05 12:02 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Andi Kleen, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

On Tue, May 05, 2009 at 12:48:48PM +0200, Andreas Herrmann wrote:
> On Tue, May 05, 2009 at 11:35:20AM +0200, Andi Kleen wrote:
> > > Best example is node interleaving. Usually you won't get a SRAT table
> > > on such a system.
> > 
> > That sounds like a BIOS bug. It should supply a suitable SLIT/SRAT
> > even for this case. Or perhaps if the BIOS are really that broken
> > add a suitable quirk that provides distances, but better fix the BIOSes.
> 
> How do you define SRAT when node interleaving is enabled?
> (Defining same distances between all nodes, describing only one node,
> or omitting SRAT entirely? I've observed that the latter is common
> behavior.)

Either a memory less node with 10 distance (which seems to be envogue 
recently for some reason) or multiple nodes with 10 distance.

> 
> > > Thus you see just one NUMA node in
> > > /sys/devices/system/node.  But on such a configuration you still see
> > > (and you want to see) the correct CPU topology information in
> > > /sys/devices/system/cpu/cpuX/topology. Based on that you always can
> > > figure out which cores are on the same physical package independent of
> > > availability and contents of SRAT and even with kernels that are
> > > compiled w/o NUMA support.
> > 
> > So you're adding a x86 specific mini NUMA for kernels without NUMA
> > (which btw becomes more and more an exotic case -- modern distros
> > are normally unconditionally NUMA) Doesn't seem very useful.
> 
> No, I just tried to give an example why you can't derive CPU topology

First I must say it's unclear to me if CPU topology is really generally
useful to export to the user. If they want to know how far cores
are away they should look at cache sharing and NUMA distances (especially
cache topology gives a very good approximation anyways). For other
purposes like power management just having arbitary sets (x is shared
with y in a set without hierarchy) seems to work just fine.

Then traditionally there were special cases for SMT and for packages
(for error containment etc.) and some hacks for licensing, but these
don't really apply in your case or can be already expressed in other
ways.

If there is really a good use case for exporting CPU topology I would
argue for not adding another adhoc level, but just export a SLIT
style arbitary distance table somewhere in sys. That would support to express 
any possible future hierarchies too. But again I have doubts that's really
needed at all.


> > and you're making it even worse, adding another strange special case.
> 
> It's an abstraction -- I think of it just as another level in the CPU

It's not a general abstraction, just another ad-hoc hack.

> hierarchy -- where existing CPUs and multi-node CPUs fit in:
> 
>   physical package --> processor node --> processor core --> thread
> 
> I guess the problem is that you are associating node always with NUMA.
> Would it help to rename cpu_node_id to something else?

Nope. It's a general problem, renaming it won't make it better.

> or something entirely different?
> 
> > On the other hand NUMA topology is comparatively straight forward and well 
> > understood and it's flexible enough to express your case too.
> > 
> > >    physical package == two northbridges (two nodes)
> > > 
> > > and this needs to be represented somehow in the kernel.
> > 
> > It's just two nodes with a very fast interconnect.
> 
> In fact, I also thought about representing each internal node as one
> physical package. But that is even worse as you can't figure out which
> node is on the same socket. 

That's what the physical id is for.

> The best solution is to reflect the correct CPU topology (all levels
> of the hierarchy) in the kernel. As another use case: for power
> management you might want to know both which cores are on which
> internal node _and_ which nodes are on the same physical package.

The powernow driver needs to know this?

The question is really if it needs to be generally known. That
seems doubtful.

> 
> > > > Who needs this additional information?
> > > 
> > > The kernel needs to know this when accessing processor configuration
> > > space, when accessing shared MSRs or for counting northbridge specific
> > > events.
> > 
> > You're saying there are MSRs shared between the two in package nodes?
> 
> No. I referred to NB MSRs that are shared between the cores on the
> same (internal) node.

Just check siblings then.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05 12:02         ` Andi Kleen
@ 2009-05-05 14:40           ` Andreas Herrmann
  2009-05-05 15:31             ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-05 14:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Tue, May 05, 2009 at 02:02:06PM +0200, Andi Kleen wrote:
> On Tue, May 05, 2009 at 12:48:48PM +0200, Andreas Herrmann wrote:
> > On Tue, May 05, 2009 at 11:35:20AM +0200, Andi Kleen wrote:
> > > > Best example is node interleaving. Usually you won't get a SRAT table
> > > > on such a system.
> > > 
> > > That sounds like a BIOS bug. It should supply a suitable SLIT/SRAT
> > > even for this case. Or perhaps if the BIOS are really that broken
> > > add a suitable quirk that provides distances, but better fix the BIOSes.
> > 
> > How do you define SRAT when node interleaving is enabled?
> > (Defining same distances between all nodes, describing only one node,
> > or omitting SRAT entirely? I've observed that the latter is common
> > behavior.)
> 
> Either a memory less node with 10 distance (which seems to be envogue 
> recently for some reason) or multiple nodes with 10 distance.

See below -- (a) and (b).

> > > > Thus you see just one NUMA node in
> > > > /sys/devices/system/node.  But on such a configuration you still see
> > > > (and you want to see) the correct CPU topology information in
> > > > /sys/devices/system/cpu/cpuX/topology. Based on that you always can
> > > > figure out which cores are on the same physical package independent of
> > > > availability and contents of SRAT and even with kernels that are
> > > > compiled w/o NUMA support.
> > > 
> > > So you're adding a x86 specific mini NUMA for kernels without NUMA
> > > (which btw becomes more and more an exotic case -- modern distros
> > > are normally unconditionally NUMA) Doesn't seem very useful.
> > 
> > No, I just tried to give an example why you can't derive CPU topology
> 
> First I must say it's unclear to me if CPU topology is really generally
> useful to export to the user.

I think it is useful.

(Linux already provides this kind of information.  But it should pass
only useful/correct information to user space. For Magny-Cours sibling
information provided by current Linux kernel is not very useful.)

> If they want to know how far cores
> are away they should look at cache sharing and NUMA distances (especially
> cache topology gives a very good approximation anyways). For other
> purposes like power management just having arbitary sets (x is shared
> with y in a set without hierarchy) seems to work just fine.

(a) Are you saying that users have to check NUMA distances when they
    want to pin tasks on certain CPUs?

(b) Just an example, if SRAT describes

    "Either a memory less node with 10 distance (which seems to be
     envogue recently for some reason) or multiple nodes with 10
     distance."

    how would you do (a) to pin tasks say on the same internal node or
    on the same physical package? That is not straightforward. But
    representing entire CPU topology in sysfs makes this obvious.

> Then traditionally there were special cases for SMT and for packages
> (for error containment etc.) and some hacks for licensing, but these
> don't really apply in your case or can be already expressed in other
> ways.

Yes, it's a new "special case" which can't be expressed in other ways.
SLIT and SRAT are not sufficient.

The kernel needs to know which cores are on same internal node. The
way my patches do it is to fix core_siblings to refer to siblings on
same internal node instead of physical processor.

Another approach would have been:
- to keep core_siblings as is (to specify siblings on same physical
  processor)
- and to introduce a new cpu mask to specify siblings on same internal
  node

> If there is really a good use case for exporting CPU topology I would
> argue for not adding another adhoc level, but just export a SLIT
> style arbitary distance table somewhere in sys. That would support to express 
> any possible future hierarchies too. But again I have doubts that's really
> needed at all.

I don't agree on this.

> > > and you're making it even worse, adding another strange special case.
> > 
> > It's an abstraction -- I think of it just as another level in the CPU
> 
> It's not a general abstraction, just another ad-hoc hack.

Fine.
But do you have any constructive and usable suggestion how Linux
should handle topology information for multi-node processors?

> > hierarchy -- where existing CPUs and multi-node CPUs fit in:
> > 
> >   physical package --> processor node --> processor core --> thread
> > 
> > I guess the problem is that you are associating node always with NUMA.
> > Would it help to rename cpu_node_id to something else?
> 
> Nope. It's a general problem, renaming it won't make it better.

I don't see "a general problem". There is just a new CPU introducing a
topology that slightly differs from what we had so far. Adapting the
kernel shouldn't be that problematic.

> > or something entirely different?
> > 
> > > On the other hand NUMA topology is comparatively straight forward and well 
> > > understood and it's flexible enough to express your case too.
> > > 
> > > >    physical package == two northbridges (two nodes)
> > > > 
> > > > and this needs to be represented somehow in the kernel.
> > > 
> > > It's just two nodes with a very fast interconnect.
> > 
> > In fact, I also thought about representing each internal node as one
> > physical package. But that is even worse as you can't figure out which
> > node is on the same socket. 
> 
> That's what the physical id is for.

Seconded. The physical id is for identifying the socket. That
implies that phys_proc_id != id_of_internal_node, right?

  <snip>

> > > You're saying there are MSRs shared between the two in package nodes?
> > 
> > No. I referred to NB MSRs that are shared between the cores on the
> > same (internal) node.
> 
> Just check siblings then.

I can't "just check siblings" if core_siblings represent all cores on
the same physical package.


Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05 14:40           ` Andreas Herrmann
@ 2009-05-05 15:31             ` Andi Kleen
  2009-05-05 16:47               ` Andreas Herrmann
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-05-05 15:31 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Andi Kleen, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

On Tue, May 05, 2009 at 04:40:27PM +0200, Andreas Herrmann wrote:

I think the general problem with your patchset is that it seems
more like a solution in search of a problem, instead of the proper other
way around.

> > > > > Thus you see just one NUMA node in
> > > > > /sys/devices/system/node.  But on such a configuration you still see
> > > > > (and you want to see) the correct CPU topology information in
> > > > > /sys/devices/system/cpu/cpuX/topology. Based on that you always can
> > > > > figure out which cores are on the same physical package independent of
> > > > > availability and contents of SRAT and even with kernels that are
> > > > > compiled w/o NUMA support.
> > > > 
> > > > So you're adding a x86 specific mini NUMA for kernels without NUMA
> > > > (which btw becomes more and more an exotic case -- modern distros
> > > > are normally unconditionally NUMA) Doesn't seem very useful.
> > > 
> > > No, I just tried to give an example why you can't derive CPU topology
> > 
> > First I must say it's unclear to me if CPU topology is really generally
> > useful to export to the user.
> 
> I think it is useful.

You forgot to state for what?

> 
> (Linux already provides this kind of information.  But it should pass
> only useful/correct information to user space. For Magny-Cours sibling
> information provided by current Linux kernel is not very useful.)
> 
> > If they want to know how far cores
> > are away they should look at cache sharing and NUMA distances (especially
> > cache topology gives a very good approximation anyways). For other
> > purposes like power management just having arbitary sets (x is shared
> > with y in a set without hierarchy) seems to work just fine.
> 
> (a) Are you saying that users have to check NUMA distances when they
>     want to pin tasks on certain CPUs?

CPU == core ? No you just bind to that CPU. Was that a trick question?

If you mean package -- they should probably just bind to one of the two nodes.

> 
> (b) Just an example, if SRAT describes
> 
>     "Either a memory less node with 10 distance (which seems to be
>      envogue recently for some reason) or multiple nodes with 10
>      distance."
> 
>     how would you do (a) to pin tasks say on the same internal node or

Even if they have 10 distance the internal nodes would be still
different. So you can just say "bind to node N" and that would
be one of the internal nodes.

>     on the same physical package? That is not straightforward. But

Bind to all nodes with zero distance from me.

>     representing entire CPU topology in sysfs makes this obvious.

You didn't do that.

> 
> > Then traditionally there were special cases for SMT and for packages
> > (for error containment etc.) and some hacks for licensing, but these
> > don't really apply in your case or can be already expressed in other
> > ways.
> 
> Yes, it's a new "special case" which can't be expressed in other ways.

You seem to assume that it's obviously useful to express. Perhaps
I'm dumb, but can you explain for what?

> SLIT and SRAT are not sufficient.
> 
> The kernel 

Which part of the kernel?

> needs to know which cores are on same internal node. The

If internal core is a north bridge then that's just a NUMA node.

> > > > and you're making it even worse, adding another strange special case.
> > > 
> > > It's an abstraction -- I think of it just as another level in the CPU
> > 
> > It's not a general abstraction, just another ad-hoc hack.
> 
> Fine.
> But do you have any coNSTructive and usable suggestion how Linux

Yes, sysfs and arbitary SLIT like topology like above. But of course a use 
case would need to be established for it first. If there's no use case
there shouldn't be any code either.

> should handle topology information for multi-node processors?
> 
> > > hierarchy -- where existing CPUs and multi-node CPUs fit in:
> > > 
> > >   physical package --> processor node --> processor core --> thread
> > > 
> > > I guess the problem is that you are associating node always with NUMA.
> > > Would it help to rename cpu_node_id to something else?
> > 
> > Nope. It's a general problem, renaming it won't make it better.
> 
> I don't see "a general problem". There is just a new CPU introducing a
> topology that slightly differs from what we had so far. Adapting the
> kernel shouldn't be that problematic.

The general problem is that you're trying to stretch an old ad-hoc
hack (siblings etc. in /proc/cpuinfo) to a complicated new graph structure
and the interface is just not up to that.

As a exercise you can try to write a user space program that uses
these fields to query the topology. It will be a mess.

We went through the same with the caches. Initially /proc/cpuinfo
tried to express the caches. At some point it just became too messy
and cpuinfo now only gives a very simplified "legacy" field and
the real information is in /sys. It went into /sys because there 
was a clear use case. If there's a clear use case for exposing
complex CPU topology (so far that's still an open question
due to lack of good examples) then also a new proper non hacky
interface in /sys would make sense.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05 15:31             ` Andi Kleen
@ 2009-05-05 16:47               ` Andreas Herrmann
  2009-05-05 17:54                 ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-05 16:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, linux-kernel

On Tue, May 05, 2009 at 05:31:59PM +0200, Andi Kleen wrote:
> On Tue, May 05, 2009 at 04:40:27PM +0200, Andreas Herrmann wrote:
> 
> I think the general problem with your patchset is that it seems
> more like a solution in search of a problem, instead of the proper other
> way around.

   <snip>

> > > First I must say it's unclear to me if CPU topology is really generally
> > > useful to export to the user.
> > 
> > I think it is useful.
> 
> You forgot to state for what?

You are kidding, aren't you? Isn't this obvious?
Why shouldn't a user be interested in stuff like core_siblings and
thread_siblings? Maybe a user want to make scheduling decisions based
on that and pin tasks accordingly?

  <snip>

> > (a) Are you saying that users have to check NUMA distances when they
> >     want to pin tasks on certain CPUs?
> 
> CPU == core ? No you just bind to that CPU. Was that a trick question?

Of course that was no trick question -- at most a stupid typo. (Sometimes
in Linux CPU == core.) So, no, sorry I meant "certain cores".
(And I meant not pinning to one core but to a set of cores).

  <snip>

> >     representing entire CPU topology in sysfs makes this obvious.
> 
> You didn't do that.

I partially did that and mentioned what else might be needed, see

http://marc.info/?l=linux-kernel&m=124145920830040

  <snip>

> > Yes, it's a new "special case" which can't be expressed in other ways.
> 
> You seem to assume that it's obviously useful to express. Perhaps
> I'm dumb, but can you explain for what?
> 
> > SLIT and SRAT are not sufficient.
> > 
> > The kernel 
> 
> Which part of the kernel?

I provided this info in my first reply to you. Here it is again:

  "The kernel needs to know this when accessing processor
   configuration space, when accessing shared MSRs or for counting
   northbridge specific events."

To translate this for you. Potential users are
- EDAC ;-)
- other MCA related stuff (e.g. L3 cache index disable)
- performance monitoring
- most probably everything that accesses processor configuration
  space and shared MSRs

I think it is best to store this information centrally instead of
letting each component figuring out which cores are on the
same node.

   <snip>

> The general problem is that you're trying to stretch an old ad-hoc
> hack (siblings etc. in /proc/cpuinfo) to a complicated new graph structure
> and the interface is just not up to that.

You didn't read all my mails regarding this topic.
The patches fixup sibling information for Magny-Cours. This info is
not only exposed to /proc/cpuinfo but also with cpu-topology
information in sysfs. I don't see why
/sys/devices/system/cpu/cpuX/topology is an old ad-hoc hack.

> As a exercise you can try to write a user space program that uses
> these fields to query the topology. It will be a mess.

You shouldn't try to use /proc/cpuinfo for that but instead look up
topology in /sys/devices/system/cpu/cpuX/topology. BTW, I have already
such a script and I am using it.

> We went through the same with the caches. Initially /proc/cpuinfo
> tried to express the caches. At some point it just became too messy
> and cpuinfo now only gives a very simplified "legacy" field and
> the real information is in /sys. It went into /sys because there 
> was a clear use case. If there's a clear use case for exposing
> complex CPU topology (so far that's still an open question
> due to lack of good examples) then also a new proper non hacky
> interface in /sys would make sense.

CPU topology is already in /sys.


Regards,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-05 16:47               ` Andreas Herrmann
@ 2009-05-05 17:54                 ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2009-05-05 17:54 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Andi Kleen, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	linux-kernel

On Tue, May 05, 2009 at 06:47:33PM +0200, Andreas Herrmann wrote:
> > > > First I must say it's unclear to me if CPU topology is really generally
> > > > useful to export to the user.
> > > 
> > > I think it is useful.
> > 
> > You forgot to state for what?
> 
> You are kidding, aren't you? Isn't this obvious?
> Why shouldn't a user be interested in stuff like core_siblings and
> thread_siblings? Maybe a user want to make scheduling decisions based
> on that and pin tasks accordingly?

my earlier point was in this case they should pin based on cache topology
(= cost of cache line transfer in cache) or on memory topology (cost of 
sharing data out of cache). cpu topology seems comparatively unimportant 
compared to the first two. 

> > > (a) Are you saying that users have to check NUMA distances when they
> > >     want to pin tasks on certain CPUs?
> > 
> > CPU == core ? No you just bind to that CPU. Was that a trick question?
> 
> Of course that was no trick question -- at most a stupid typo. (Sometimes
> in Linux CPU == core.) So, no, sorry I meant "certain cores".
> (And I meant not pinning to one core but to a set of cores).

I suspect you meant pinning to a node :) numactl --cpunodebind=...

> 
> > > SLIT and SRAT are not sufficient.
> > > 
> > > The kernel 
> > 
> > Which part of the kernel?
> 
> I provided this info in my first reply to you. Here it is again:
> 
>   "The kernel needs to know this when accessing processor
>    configuration space, when accessing shared MSRs or for counting
>    northbridge specific events."

Wait, but only a "internal node" shares MSRs and that is just
a node anyways isn't it? So that code just needs to look for nodes.

Ok I think you were worried about NUMA off, but still providing
a "fake mini NUMA" seems inferior to me than just always providing
a basic NUMA (even if it's all the same distance) for this case.

Anyways if it's really not working to use nodes for this internally
(although I must admit it's not fully clear to me why not)
I think it's ok to add this information internally; the part
i just object is extending the already stretched cpuinfo interface
for it and exporting such information without designing a proper
flexible future proof interface. 

> To translate this for you. Potential users are
> - EDAC ;-)
> - other MCA related stuff (e.g. L3 cache index disable)

Surely that's handled with the existing cache topology.

> - performance monitoring
> - most probably everything that accesses processor configuration
>   space and shared MSRs

It's just the same as a NUMA node. Not different from old systems.
The code can just look that up.

> You didn't read all my mails regarding this topic.
> The patches fixup sibling information for Magny-Cours. This info is
> not only exposed to /proc/cpuinfo but also with cpu-topology
> information in sysfs. I don't see why
> /sys/devices/system/cpu/cpuX/topology is an old ad-hoc hack.

Ok need to check that. I hope it didn't do the graph hardcoding
like your cpuinfo patch though.  After all sysfs is flexible
enough to express arbitary graphs and if something is moved
there it should be a flexible interface. Again I'm not sure
it's really needed though. Having three different topologies
for scheduling is likely not a very good idea in any case.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours
  2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
                   ` (4 preceding siblings ...)
  2009-05-04 20:16 ` Andi Kleen
@ 2009-05-08 14:28 ` Andreas Herrmann
  5 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2009-05-08 14:28 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner; +Cc: linux-kernel

On Mon, May 04, 2009 at 07:33:30PM +0200, Andreas Herrmann wrote:
> For all cores on the same multi-node CPU (Magny-Cours) /proc/cpuinfo
> will show:
> - same phys_proc_id
> - cpu_node_id of the internal node (0 or 1)
> - cpu_core_id (e.g. in range of 0 to 5)

Are there further objections (besides Andi's) to show cpu_node_id
in /proc/cpuinfo?

As I've written in another mail I also plan to introduce
cpu_node_siblings and expose this information in
/sys/devices/system/cpu/cpuX/topology

That would mean:
- for single-node processors node_siblings are equal to
  core_siblings.
- for dual-node processors node_siblings is the union of
  two sets of core_siblings (one set on each node).

Any objections/comments? Should I use some other naming instead of node?
Node means a set of cores plus a northbirdge and memory controller.
But maybe I should use chip instead of node (chip_siblings, cpu_chip_id,
etc.)? This would be in dependence on multi-CHIP module.

> Patches are against tip/master as of today.
> Please consider patches 1 and 2 for .30.

I guess it's way too late to add something like this for .30, right?
So I'll prepare some new patches and try to hit .31.


Thanks,
Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-05-08 14:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-04 17:33 [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
2009-05-04 17:34 ` [PATCH 1/3] x86: introduce cpuinfo->cpu_node_id to reflect topology of multi-node CPU Andreas Herrmann
2009-05-06 11:44   ` Ingo Molnar
2009-05-06 16:14     ` Andreas Herrmann
2009-05-04 17:36 ` [PATCH 2/3] x86: fixup topology detection for AMD " Andreas Herrmann
2009-05-04 17:37 ` [PATCH 3/3] x86: cacheinfo: fixup L3 cache information " Andreas Herrmann
2009-05-04 17:44 ` [PATCH 0/3] x86: adapt CPU topology detection for AMD Magny-Cours Andreas Herrmann
2009-05-04 20:16 ` Andi Kleen
2009-05-05  9:22   ` Andreas Herrmann
2009-05-05  9:35     ` Andi Kleen
2009-05-05 10:48       ` Andreas Herrmann
2009-05-05 12:02         ` Andi Kleen
2009-05-05 14:40           ` Andreas Herrmann
2009-05-05 15:31             ` Andi Kleen
2009-05-05 16:47               ` Andreas Herrmann
2009-05-05 17:54                 ` Andi Kleen
2009-05-08 14:28 ` Andreas Herrmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox