* [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
Changes for version v2:
> > Note the additional change of the cpu_llc_id type from u8
> > to int for ARCH x86_64 to correspond with ARCH i386.
> At least currently it cannot be more than 8 bits. So why
> waste memory? It would be better to change i386
Done. (i386 type => u8).
> > Fix four instances where cpu_to_node is referenced
> > by array instead of via the cpu_to_node macro. This
> > is preparation for moving it to the per_cpu data area.
> Shouldn't this patch be logically before the per cpu
> conversion (which is 3/6). This way the result would
> be git bisectable.
Done. (Moved to PATCH 1/6).
> > processor_core.c currently tries to determine the apicid by special casing
> > for IA64 and x86. The desired information is readily available via
> >
> > cpu_physical_id()
> >
> > on IA64, i386 and x86_64.
>
> Have you tried this with a !CONFIG_SMP build? The drivers/dma code was doing
> the same and running into problems because it wasn't defined there.
Fixed. (New export in PATCH 6/6).
Previous Intro:
In x86_64 and i386 architectures most arrays that are sized
using NR_CPUS lie in local memory on node 0. Not only will most
(99%?) of the systems not use all the slots in these arrays,
particularly when NR_CPUS is increased to accommodate future
very high cpu count systems, but a number of cache lines are
passed unnecessarily on the system bus when these arrays are
referenced by cpus on other nodes.
Typically, the values in these arrays are referenced by each cpu
accessing its own values, though when passing IPI interrupts,
the cpu does access the data relevant to the targeted cpu/node.
Of course, if the referencing cpu is not on node 0, then the
reference will still require cross node exchanges of cache
lines. A common use of this is for an interrupt service
routine to pass the interrupt to other cpus local to that node.
Ideally, all the elements in these arrays should be moved to the
per_cpu data area. In some cases (such as x86_cpu_to_apicid)
the array is referenced before the per_cpu data areas are set up.
In this case, a static array is declared in the __initdata
area and initialized by the booting cpu (BSP). The values are
then moved to the per_cpu area after it is initialized and the
original static array is freed with the rest of the __initdata.
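As a concrete illustration, that staging mechanism condenses to roughly
the following (a sketch based on PATCH 4/6, which applies it to
x86_cpu_to_apicid; see that patch for the real code):

	/* boot-time copy, freed with the rest of __initdata */
	u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata =
		{ [0 ... NR_CPUS-1] = BAD_APICID };
	void *x86_cpu_to_apicid_ptr;	/* non-NULL only during early boot */
	DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;

	/* called from smp_prepare_cpus(), after setup_per_cpu_areas() */
	void __init smp_set_apicids(void)
	{
		int cpu;

		for_each_cpu_mask(cpu, cpu_possible_map)
			if (per_cpu_offset(cpu))
				per_cpu(x86_cpu_to_apicid, cpu) =
					x86_cpu_to_apicid_init[cpu];

		/* early code went through the pointer; it is gone now */
		x86_cpu_to_apicid_ptr = NULL;
	}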
* [PATCH 1/6] x86: fix cpu_to_node references (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: fix-cpu_to_node-refs --]
[-- Type: text/plain, Size: 1924 bytes --]
Fix four instances where cpu_to_node is referenced
by array instead of via the cpu_to_node macro. This
is preparation for moving it to the per_cpu data area.
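Both spellings compile at this point only because the x86_64 macro is
layered over the array of the same name; funneling everything through the
macro lets the backing store change later without touching callers.
Roughly (a sketch of the current definitions; the per_cpu variant in the
comment is hypothetical here):

	extern int cpu_to_node[NR_CPUS];
	#define cpu_to_node(cpu)	(cpu_to_node[cpu])
	/* later, unchanged callers could instead get something like
	 * (hypothetical): per_cpu(_cpu_to_node, cpu)
	 */

Note that numa_set_node() assigns through cpu_to_node(cpu), so whatever
the macro expands to must remain usable as an lvalue.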
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/x86_64/kernel/vsyscall.c | 2 +-
arch/x86_64/mm/numa.c | 4 ++--
arch/x86_64/mm/srat.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -283,7 +283,7 @@
unsigned long *d;
unsigned long node = 0;
#ifdef CONFIG_NUMA
- node = cpu_to_node[cpu];
+ node = cpu_to_node(cpu);
#endif
if (cpu_has(&cpu_data[cpu], X86_FEATURE_RDTSCP))
write_rdtscp_aux((node << 12) | cpu);
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -264,7 +264,7 @@
We round robin the existing nodes. */
rr = first_node(node_online_map);
for (i = 0; i < NR_CPUS; i++) {
- if (cpu_to_node[i] != NUMA_NO_NODE)
+ if (cpu_to_node(i) != NUMA_NO_NODE)
continue;
numa_set_node(i, rr);
rr = next_node(rr, node_online_map);
@@ -546,7 +546,7 @@
void __cpuinit numa_set_node(int cpu, int node)
{
cpu_pda(cpu)->nodenumber = node;
- cpu_to_node[cpu] = node;
+ cpu_to_node(cpu) = node;
}
unsigned long __init numa_free_all_bootmem(void)
--- a/arch/x86_64/mm/srat.c
+++ b/arch/x86_64/mm/srat.c
@@ -431,9 +431,9 @@
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
for (i = 0; i < NR_CPUS; i++) {
- if (cpu_to_node[i] == NUMA_NO_NODE)
+ if (cpu_to_node(i) == NUMA_NO_NODE)
continue;
- if (!node_isset(cpu_to_node[i], node_possible_map))
+ if (!node_isset(cpu_to_node(i), node_possible_map))
numa_set_node(i, NUMA_NO_NODE);
}
numa_init_array();
* [PATCH 2/6] x86: Convert cpu_core_map to be a per cpu variable (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: convert-cpu_core_map-to-per_cpu_data --]
[-- Type: text/plain, Size: 13467 bytes --]
This is from an earlier message from Christoph Lameter:
cpu_core_map is currently an array defined using NR_CPUS. This means that
we overallocate since we will rarely use the maximum number of configured cpus.
If we put the cpu_core_map into the per cpu area then it will be allocated
for each processor as it comes online.
This means that the core map cannot be accessed until the per cpu area
has been allocated. Xen does a weird thing here: it loops over all processors,
zeroing masks that are not yet allocated and that will be zeroed anyway
when they are allocated. I commented that code out.
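Every hunk below has the same mechanical shape (sketch):

	/* before: NR_CPUS slots, all in node 0 memory */
	cpumask_t cpu_core_map[NR_CPUS] __read_mostly;
	EXPORT_SYMBOL(cpu_core_map);
		... cpu_core_map[cpu] ...

	/* after: storage comes with each cpu's per cpu area */
	DEFINE_PER_CPU(cpumask_t, cpu_core_map);
	EXPORT_PER_CPU_SYMBOL(cpu_core_map);
		... per_cpu(cpu_core_map, cpu) ...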
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c | 2 -
arch/i386/kernel/cpu/cpufreq/powernow-k8.c | 10 ++++----
arch/i386/kernel/cpu/proc.c | 3 +-
arch/i386/kernel/smpboot.c | 34 ++++++++++++++--------------
arch/i386/xen/smp.c | 14 +++++++++--
arch/x86_64/kernel/mce_amd.c | 6 ++--
arch/x86_64/kernel/setup.c | 3 +-
arch/x86_64/kernel/smpboot.c | 24 +++++++++----------
include/asm-i386/smp.h | 2 -
include/asm-i386/topology.h | 2 -
include/asm-x86_64/smp.h | 8 +++++-
include/asm-x86_64/topology.h | 2 -
12 files changed, 64 insertions(+), 46 deletions(-)
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -39,7 +39,13 @@
extern void smp_send_reschedule(int cpu);
extern cpumask_t cpu_sibling_map[NR_CPUS];
-extern cpumask_t cpu_core_map[NR_CPUS];
+/*
+ * cpu_core_map lives in a per cpu area
+ *
+ * extern cpumask_t cpu_core_map[NR_CPUS];
+ */
+DECLARE_PER_CPU(cpumask_t, cpu_core_map);
+
extern u8 cpu_llc_id[NR_CPUS];
#define SMP_TRAMPOLINE_BASE 0x6000
--- a/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -595,7 +595,7 @@
dmi_check_system(sw_any_bug_dmi_table);
if (bios_with_sw_any_bug && cpus_weight(policy->cpus) == 1) {
policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
- policy->cpus = cpu_core_map[cpu];
+ policy->cpus = per_cpu(cpu_core_map, cpu);
}
#endif
--- a/arch/i386/kernel/cpu/cpufreq/powernow-k8.c
+++ b/arch/i386/kernel/cpu/cpufreq/powernow-k8.c
@@ -57,7 +57,7 @@
static int cpu_family = CPU_OPTERON;
#ifndef CONFIG_SMP
-static cpumask_t cpu_core_map[1];
+DEFINE_PER_CPU(cpumask_t, cpu_core_map);
#endif
/* Return a frequency in MHz, given an input fid */
@@ -667,7 +667,7 @@
dprintk("cfid 0x%x, cvid 0x%x\n", data->currfid, data->currvid);
data->powernow_table = powernow_table;
- if (first_cpu(cpu_core_map[data->cpu]) == data->cpu)
+ if (first_cpu(per_cpu(cpu_core_map, data->cpu)) == data->cpu)
print_basics(data);
for (j = 0; j < data->numps; j++)
@@ -821,7 +821,7 @@
/* fill in data */
data->numps = data->acpi_data.state_count;
- if (first_cpu(cpu_core_map[data->cpu]) == data->cpu)
+ if (first_cpu(per_cpu(cpu_core_map, data->cpu)) == data->cpu)
print_basics(data);
powernow_k8_acpi_pst_values(data, 0);
@@ -1214,7 +1214,7 @@
if (cpu_family == CPU_HW_PSTATE)
pol->cpus = cpumask_of_cpu(pol->cpu);
else
- pol->cpus = cpu_core_map[pol->cpu];
+ pol->cpus = per_cpu(cpu_core_map, pol->cpu);
data->available_cores = &(pol->cpus);
/* Take a crude guess here.
@@ -1281,7 +1281,7 @@
cpumask_t oldmask = current->cpus_allowed;
unsigned int khz = 0;
- data = powernow_data[first_cpu(cpu_core_map[cpu])];
+ data = powernow_data[first_cpu(per_cpu(cpu_core_map, cpu))];
if (!data)
return -EINVAL;
--- a/arch/i386/kernel/cpu/proc.c
+++ b/arch/i386/kernel/cpu/proc.c
@@ -122,7 +122,8 @@
#ifdef CONFIG_X86_HT
if (c->x86_max_cores * smp_num_siblings > 1) {
seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
- seq_printf(m, "siblings\t: %d\n", cpus_weight(cpu_core_map[n]));
+ seq_printf(m, "siblings\t: %d\n",
+ cpus_weight(per_cpu(cpu_core_map, n)));
seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
}
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -74,8 +74,8 @@
EXPORT_SYMBOL(cpu_sibling_map);
/* representing HT and core siblings of each logical CPU */
-cpumask_t cpu_core_map[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(cpu_core_map);
+DEFINE_PER_CPU(cpumask_t, cpu_core_map);
+EXPORT_PER_CPU_SYMBOL(cpu_core_map);
/* bitmap of online cpus */
cpumask_t cpu_online_map __read_mostly;
@@ -300,7 +300,7 @@
* And for power savings, we return cpu_core_map
*/
if (sched_mc_power_savings || sched_smt_power_savings)
- return cpu_core_map[cpu];
+ return per_cpu(cpu_core_map, cpu);
else
return c->llc_shared_map;
}
@@ -321,8 +321,8 @@
c[cpu].cpu_core_id == c[i].cpu_core_id) {
cpu_set(i, cpu_sibling_map[cpu]);
cpu_set(cpu, cpu_sibling_map[i]);
- cpu_set(i, cpu_core_map[cpu]);
- cpu_set(cpu, cpu_core_map[i]);
+ cpu_set(i, per_cpu(cpu_core_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_core_map, i));
cpu_set(i, c[cpu].llc_shared_map);
cpu_set(cpu, c[i].llc_shared_map);
}
@@ -334,7 +334,7 @@
cpu_set(cpu, c[cpu].llc_shared_map);
if (current_cpu_data.x86_max_cores == 1) {
- cpu_core_map[cpu] = cpu_sibling_map[cpu];
+ per_cpu(cpu_core_map, cpu) = cpu_sibling_map[cpu];
c[cpu].booted_cores = 1;
return;
}
@@ -346,8 +346,8 @@
cpu_set(cpu, c[i].llc_shared_map);
}
if (c[cpu].phys_proc_id == c[i].phys_proc_id) {
- cpu_set(i, cpu_core_map[cpu]);
- cpu_set(cpu, cpu_core_map[i]);
+ cpu_set(i, per_cpu(cpu_core_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_core_map, i));
/*
* Does this new cpu bringup a new core?
*/
@@ -984,7 +984,7 @@
" Using dummy APIC emulation.\n");
map_cpu_to_logical_apicid();
cpu_set(0, cpu_sibling_map[0]);
- cpu_set(0, cpu_core_map[0]);
+ cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1009,7 +1009,7 @@
smpboot_clear_io_apic_irqs();
phys_cpu_present_map = physid_mask_of_physid(0);
cpu_set(0, cpu_sibling_map[0]);
- cpu_set(0, cpu_core_map[0]);
+ cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1024,7 +1024,7 @@
smpboot_clear_io_apic_irqs();
phys_cpu_present_map = physid_mask_of_physid(0);
cpu_set(0, cpu_sibling_map[0]);
- cpu_set(0, cpu_core_map[0]);
+ cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1107,11 +1107,11 @@
*/
for (cpu = 0; cpu < NR_CPUS; cpu++) {
cpus_clear(cpu_sibling_map[cpu]);
- cpus_clear(cpu_core_map[cpu]);
+ cpus_clear(per_cpu(cpu_core_map, cpu));
}
cpu_set(0, cpu_sibling_map[0]);
- cpu_set(0, cpu_core_map[0]);
+ cpu_set(0, per_cpu(cpu_core_map, 0));
smpboot_setup_io_apic();
@@ -1148,9 +1148,9 @@
int sibling;
struct cpuinfo_x86 *c = cpu_data;
- for_each_cpu_mask(sibling, cpu_core_map[cpu]) {
- cpu_clear(cpu, cpu_core_map[sibling]);
- /*
+ for_each_cpu_mask(sibling, per_cpu(cpu_core_map, cpu)) {
+ cpu_clear(cpu, per_cpu(cpu_core_map, sibling));
+ /*
* last thread sibling in this cpu core going down
*/
if (cpus_weight(cpu_sibling_map[cpu]) == 1)
@@ -1160,7 +1160,7 @@
for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
cpu_clear(cpu, cpu_sibling_map[sibling]);
cpus_clear(cpu_sibling_map[cpu]);
- cpus_clear(cpu_core_map[cpu]);
+ cpus_clear(per_cpu(cpu_core_map, cpu));
c[cpu].phys_proc_id = 0;
c[cpu].cpu_core_id = 0;
cpu_clear(cpu, cpu_sibling_setup_map);
--- a/arch/i386/xen/smp.c
+++ b/arch/i386/xen/smp.c
@@ -148,7 +148,12 @@
for (cpu = 0; cpu < NR_CPUS; cpu++) {
cpus_clear(cpu_sibling_map[cpu]);
- cpus_clear(cpu_core_map[cpu]);
+ /*
+ * cpu_core_map lives in a per cpu area that is cleared
+ * when the per cpu area is allocated.
+ *
+ * cpus_clear(per_cpu(cpu_core_map, cpu));
+ */
}
xen_setup_vcpu_info_placement();
@@ -160,7 +165,12 @@
for (cpu = 0; cpu < NR_CPUS; cpu++) {
cpus_clear(cpu_sibling_map[cpu]);
- cpus_clear(cpu_core_map[cpu]);
+ /*
+ * cpu_core_map will be zeroed when the per
+ * cpu area is allocated.
+ *
+ * cpus_clear(per_cpu(cpu_core_map, cpu));
+ */
}
smp_store_cpu_info(0);
--- a/arch/x86_64/kernel/mce_amd.c
+++ b/arch/x86_64/kernel/mce_amd.c
@@ -473,7 +473,7 @@
#ifdef CONFIG_SMP
if (cpu_data[cpu].cpu_core_id && shared_bank[bank]) { /* symlink */
- i = first_cpu(cpu_core_map[cpu]);
+ i = first_cpu(per_cpu(cpu_core_map, cpu));
/* first core not up yet */
if (cpu_data[i].cpu_core_id)
@@ -493,7 +493,7 @@
if (err)
goto out;
- b->cpus = cpu_core_map[cpu];
+ b->cpus = per_cpu(cpu_core_map, cpu);
per_cpu(threshold_banks, cpu)[bank] = b;
goto out;
}
@@ -510,7 +510,7 @@
#ifndef CONFIG_SMP
b->cpus = CPU_MASK_ALL;
#else
- b->cpus = cpu_core_map[cpu];
+ b->cpus = per_cpu(cpu_core_map, cpu);
#endif
err = kobject_register(&b->kobj);
if (err)
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -1089,7 +1089,8 @@
if (smp_num_siblings * c->x86_max_cores > 1) {
int cpu = c - cpu_data;
seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
- seq_printf(m, "siblings\t: %d\n", cpus_weight(cpu_core_map[cpu]));
+ seq_printf(m, "siblings\t: %d\n",
+ cpus_weight(per_cpu(cpu_core_map, cpu)));
seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
}
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -95,8 +95,8 @@
EXPORT_SYMBOL(cpu_sibling_map);
/* representing HT and core siblings of each logical CPU */
-cpumask_t cpu_core_map[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(cpu_core_map);
+DEFINE_PER_CPU(cpumask_t, cpu_core_map);
+EXPORT_PER_CPU_SYMBOL(cpu_core_map);
/*
* Trampoline 80x86 program as an array.
@@ -245,7 +245,7 @@
* And for power savings, we return cpu_core_map
*/
if (sched_mc_power_savings || sched_smt_power_savings)
- return cpu_core_map[cpu];
+ return per_cpu(cpu_core_map, cpu);
else
return c->llc_shared_map;
}
@@ -266,8 +266,8 @@
c[cpu].cpu_core_id == c[i].cpu_core_id) {
cpu_set(i, cpu_sibling_map[cpu]);
cpu_set(cpu, cpu_sibling_map[i]);
- cpu_set(i, cpu_core_map[cpu]);
- cpu_set(cpu, cpu_core_map[i]);
+ cpu_set(i, per_cpu(cpu_core_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_core_map, i));
cpu_set(i, c[cpu].llc_shared_map);
cpu_set(cpu, c[i].llc_shared_map);
}
@@ -279,7 +279,7 @@
cpu_set(cpu, c[cpu].llc_shared_map);
if (current_cpu_data.x86_max_cores == 1) {
- cpu_core_map[cpu] = cpu_sibling_map[cpu];
+ per_cpu(cpu_core_map, cpu) = cpu_sibling_map[cpu];
c[cpu].booted_cores = 1;
return;
}
@@ -291,8 +291,8 @@
cpu_set(cpu, c[i].llc_shared_map);
}
if (c[cpu].phys_proc_id == c[i].phys_proc_id) {
- cpu_set(i, cpu_core_map[cpu]);
- cpu_set(cpu, cpu_core_map[i]);
+ cpu_set(i, per_cpu(cpu_core_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_core_map, i));
/*
* Does this new cpu bringup a new core?
*/
@@ -742,7 +742,7 @@
else
phys_cpu_present_map = physid_mask_of_physid(0);
cpu_set(0, cpu_sibling_map[0]);
- cpu_set(0, cpu_core_map[0]);
+ cpu_set(0, per_cpu(cpu_core_map, 0));
}
#ifdef CONFIG_HOTPLUG_CPU
@@ -977,8 +977,8 @@
int sibling;
struct cpuinfo_x86 *c = cpu_data;
- for_each_cpu_mask(sibling, cpu_core_map[cpu]) {
- cpu_clear(cpu, cpu_core_map[sibling]);
+ for_each_cpu_mask(sibling, per_cpu(cpu_core_map, cpu)) {
+ cpu_clear(cpu, per_cpu(cpu_core_map, sibling));
/*
* last thread sibling in this cpu core going down
*/
@@ -989,7 +989,7 @@
for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
cpu_clear(cpu, cpu_sibling_map[sibling]);
cpus_clear(cpu_sibling_map[cpu]);
- cpus_clear(cpu_core_map[cpu]);
+ cpus_clear(per_cpu(cpu_core_map, cpu));
c[cpu].phys_proc_id = 0;
c[cpu].cpu_core_id = 0;
cpu_clear(cpu, cpu_sibling_setup_map);
--- a/include/asm-i386/smp.h
+++ b/include/asm-i386/smp.h
@@ -31,7 +31,7 @@
extern int pic_mode;
extern int smp_num_siblings;
extern cpumask_t cpu_sibling_map[];
-extern cpumask_t cpu_core_map[];
+DECLARE_PER_CPU(cpumask_t, cpu_core_map);
extern void (*mtrr_hook) (void);
extern void zap_low_mappings (void);
--- a/include/asm-i386/topology.h
+++ b/include/asm-i386/topology.h
@@ -30,7 +30,7 @@
#ifdef CONFIG_X86_HT
#define topology_physical_package_id(cpu) (cpu_data[cpu].phys_proc_id)
#define topology_core_id(cpu) (cpu_data[cpu].cpu_core_id)
-#define topology_core_siblings(cpu) (cpu_core_map[cpu])
+#define topology_core_siblings(cpu) (per_cpu(cpu_core_map, cpu))
#define topology_thread_siblings(cpu) (cpu_sibling_map[cpu])
#endif
--- a/include/asm-x86_64/topology.h
+++ b/include/asm-x86_64/topology.h
@@ -71,7 +71,7 @@
#ifdef CONFIG_SMP
#define topology_physical_package_id(cpu) (cpu_data[cpu].phys_proc_id)
#define topology_core_id(cpu) (cpu_data[cpu].cpu_core_id)
-#define topology_core_siblings(cpu) (cpu_core_map[cpu])
+#define topology_core_siblings(cpu) (per_cpu(cpu_core_map, cpu))
#define topology_thread_siblings(cpu) (cpu_sibling_map[cpu])
#define mc_capable() (boot_cpu_data.x86_max_cores > 1)
#define smt_capable() (smp_num_siblings > 1)
* [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: convert-cpu_sibling_map-to-per_cpu_data --]
[-- Type: text/plain, Size: 13360 bytes --]
Convert cpu_sibling_map from a static array sized by NR_CPUS to a
per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused cpu.
Access is mostly from startup and CPU HOTPLUG functions.
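(Rough numbers, assuming NR_CPUS = 4096 so a cpumask_t is 512 bytes: the
static cpu_sibling_map[] array pins 2MB of node 0 memory, while on a
64-cpu machine the per cpu version costs 32KB total, placed on the nodes
that actually use it.)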
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/i386/kernel/cpu/cpufreq/p4-clockmod.c | 2 -
arch/i386/kernel/cpu/cpufreq/speedstep-ich.c | 2 -
arch/i386/kernel/io_apic.c | 4 +--
arch/i386/kernel/smpboot.c | 36 +++++++++++++--------------
arch/i386/oprofile/op_model_p4.c | 2 -
arch/i386/xen/smp.c | 4 +--
arch/x86_64/kernel/smpboot.c | 26 +++++++++----------
block/blktrace.c | 2 -
include/asm-i386/smp.h | 2 -
include/asm-i386/topology.h | 2 -
include/asm-x86_64/smp.h | 6 +++-
include/asm-x86_64/topology.h | 2 -
kernel/sched.c | 8 +++---
13 files changed, 50 insertions(+), 48 deletions(-)
--- a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
+++ b/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
@@ -200,7 +200,7 @@
unsigned int i;
#ifdef CONFIG_SMP
- policy->cpus = cpu_sibling_map[policy->cpu];
+ policy->cpus = per_cpu(cpu_sibling_map, policy->cpu);
#endif
/* Errata workaround */
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c
+++ b/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c
@@ -322,7 +322,7 @@
/* only run on CPU to be set, or on its sibling */
#ifdef CONFIG_SMP
- policy->cpus = cpu_sibling_map[policy->cpu];
+ policy->cpus = per_cpu(cpu_sibling_map, policy->cpu);
#endif
cpus_allowed = current->cpus_allowed;
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -378,7 +378,7 @@
#define IRQ_ALLOWED(cpu, allowed_mask) cpu_isset(cpu, allowed_mask)
-#define CPU_TO_PACKAGEINDEX(i) (first_cpu(cpu_sibling_map[i]))
+#define CPU_TO_PACKAGEINDEX(i) (first_cpu(per_cpu(cpu_sibling_map, i)))
static cpumask_t balance_irq_affinity[NR_IRQS] = {
[0 ... NR_IRQS-1] = CPU_MASK_ALL
@@ -598,7 +598,7 @@
* (A+B)/2 vs B
*/
load = CPU_IRQ(min_loaded) >> 1;
- for_each_cpu_mask(j, cpu_sibling_map[min_loaded]) {
+ for_each_cpu_mask(j, per_cpu(cpu_sibling_map, min_loaded)) {
if (load > CPU_IRQ(j)) {
/* This won't change cpu_sibling_map[min_loaded] */
load = CPU_IRQ(j);
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -70,8 +70,8 @@
int cpu_llc_id[NR_CPUS] __cpuinitdata = {[0 ... NR_CPUS-1] = BAD_APICID};
/* representing HT siblings of each logical CPU */
-cpumask_t cpu_sibling_map[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(cpu_sibling_map);
+DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);
+EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
/* representing HT and core siblings of each logical CPU */
DEFINE_PER_CPU(cpumask_t, cpu_core_map);
@@ -319,8 +319,8 @@
for_each_cpu_mask(i, cpu_sibling_setup_map) {
if (c[cpu].phys_proc_id == c[i].phys_proc_id &&
c[cpu].cpu_core_id == c[i].cpu_core_id) {
- cpu_set(i, cpu_sibling_map[cpu]);
- cpu_set(cpu, cpu_sibling_map[i]);
+ cpu_set(i, per_cpu(cpu_sibling_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_sibling_map, i));
cpu_set(i, per_cpu(cpu_core_map, cpu));
cpu_set(cpu, per_cpu(cpu_core_map, i));
cpu_set(i, c[cpu].llc_shared_map);
@@ -328,13 +328,13 @@
}
}
} else {
- cpu_set(cpu, cpu_sibling_map[cpu]);
+ cpu_set(cpu, per_cpu(cpu_sibling_map, cpu));
}
cpu_set(cpu, c[cpu].llc_shared_map);
if (current_cpu_data.x86_max_cores == 1) {
- per_cpu(cpu_core_map, cpu) = cpu_sibling_map[cpu];
+ per_cpu(cpu_core_map, cpu) = per_cpu(cpu_sibling_map, cpu);
c[cpu].booted_cores = 1;
return;
}
@@ -351,12 +351,12 @@
/*
* Does this new cpu bringup a new core?
*/
- if (cpus_weight(cpu_sibling_map[cpu]) == 1) {
+ if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1) {
/*
* for each core in package, increment
* the booted_cores for this new cpu
*/
- if (first_cpu(cpu_sibling_map[i]) == i)
+ if (first_cpu(per_cpu(cpu_sibling_map, i)) == i)
c[cpu].booted_cores++;
/*
* increment the core count for all
@@ -983,7 +983,7 @@
printk(KERN_NOTICE "Local APIC not detected."
" Using dummy APIC emulation.\n");
map_cpu_to_logical_apicid();
- cpu_set(0, cpu_sibling_map[0]);
+ cpu_set(0, per_cpu(cpu_sibling_map, 0));
cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1008,7 +1008,7 @@
printk(KERN_ERR "... forcing use of dummy APIC emulation. (tell your hw vendor)\n");
smpboot_clear_io_apic_irqs();
phys_cpu_present_map = physid_mask_of_physid(0);
- cpu_set(0, cpu_sibling_map[0]);
+ cpu_set(0, per_cpu(cpu_sibling_map, 0));
cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1023,7 +1023,7 @@
printk(KERN_INFO "SMP mode deactivated, forcing use of dummy APIC emulation.\n");
smpboot_clear_io_apic_irqs();
phys_cpu_present_map = physid_mask_of_physid(0);
- cpu_set(0, cpu_sibling_map[0]);
+ cpu_set(0, per_cpu(cpu_sibling_map, 0));
cpu_set(0, per_cpu(cpu_core_map, 0));
return;
}
@@ -1102,15 +1102,15 @@
Dprintk("Boot done.\n");
/*
- * construct cpu_sibling_map[], so that we can tell sibling CPUs
+ * construct cpu_sibling_map, so that we can tell sibling CPUs
* efficiently.
*/
for (cpu = 0; cpu < NR_CPUS; cpu++) {
- cpus_clear(cpu_sibling_map[cpu]);
+ cpus_clear(per_cpu(cpu_sibling_map, cpu));
cpus_clear(per_cpu(cpu_core_map, cpu));
}
- cpu_set(0, cpu_sibling_map[0]);
+ cpu_set(0, per_cpu(cpu_sibling_map, 0));
cpu_set(0, per_cpu(cpu_core_map, 0));
smpboot_setup_io_apic();
@@ -1153,13 +1153,13 @@
/*
* last thread sibling in this cpu core going down
*/
- if (cpus_weight(cpu_sibling_map[cpu]) == 1)
+ if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1)
c[sibling].booted_cores--;
}
- for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
- cpu_clear(cpu, cpu_sibling_map[sibling]);
- cpus_clear(cpu_sibling_map[cpu]);
+ for_each_cpu_mask(sibling, per_cpu(cpu_sibling_map, cpu))
+ cpu_clear(cpu, per_cpu(cpu_sibling_map, sibling));
+ cpus_clear(per_cpu(cpu_sibling_map, cpu));
cpus_clear(per_cpu(cpu_core_map, cpu));
c[cpu].phys_proc_id = 0;
c[cpu].cpu_core_id = 0;
--- a/arch/i386/oprofile/op_model_p4.c
+++ b/arch/i386/oprofile/op_model_p4.c
@@ -379,7 +379,7 @@
{
#ifdef CONFIG_SMP
int cpu = smp_processor_id();
- return (cpu != first_cpu(cpu_sibling_map[cpu]));
+ return (cpu != first_cpu(per_cpu(cpu_sibling_map, cpu)));
#endif
return 0;
}
--- a/arch/i386/xen/smp.c
+++ b/arch/i386/xen/smp.c
@@ -147,7 +147,7 @@
make_lowmem_page_readwrite(&per_cpu__gdt_page);
for (cpu = 0; cpu < NR_CPUS; cpu++) {
- cpus_clear(cpu_sibling_map[cpu]);
+ cpus_clear(per_cpu(cpu_sibling_map, cpu));
/*
* cpu_core_map lives in a per cpu area that is cleared
* when the per cpu area is allocated.
@@ -164,7 +164,7 @@
unsigned cpu;
for (cpu = 0; cpu < NR_CPUS; cpu++) {
- cpus_clear(cpu_sibling_map[cpu]);
+ cpus_clear(per_cpu(cpu_sibling_map, cpu));
/*
* cpu_core_map will be zeroed when the per
* cpu area is allocated.
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -91,8 +91,8 @@
int smp_threads_ready;
/* representing HT siblings of each logical CPU */
-cpumask_t cpu_sibling_map[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(cpu_sibling_map);
+DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);
+EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
/* representing HT and core siblings of each logical CPU */
DEFINE_PER_CPU(cpumask_t, cpu_core_map);
@@ -264,8 +264,8 @@
for_each_cpu_mask(i, cpu_sibling_setup_map) {
if (c[cpu].phys_proc_id == c[i].phys_proc_id &&
c[cpu].cpu_core_id == c[i].cpu_core_id) {
- cpu_set(i, cpu_sibling_map[cpu]);
- cpu_set(cpu, cpu_sibling_map[i]);
+ cpu_set(i, per_cpu(cpu_sibling_map, cpu));
+ cpu_set(cpu, per_cpu(cpu_sibling_map, i));
cpu_set(i, per_cpu(cpu_core_map, cpu));
cpu_set(cpu, per_cpu(cpu_core_map, i));
cpu_set(i, c[cpu].llc_shared_map);
@@ -273,13 +273,13 @@
}
}
} else {
- cpu_set(cpu, cpu_sibling_map[cpu]);
+ cpu_set(cpu, per_cpu(cpu_sibling_map, cpu));
}
cpu_set(cpu, c[cpu].llc_shared_map);
if (current_cpu_data.x86_max_cores == 1) {
- per_cpu(cpu_core_map, cpu) = cpu_sibling_map[cpu];
+ per_cpu(cpu_core_map, cpu) = per_cpu(cpu_sibling_map, cpu);
c[cpu].booted_cores = 1;
return;
}
@@ -296,12 +296,12 @@
/*
* Does this new cpu bringup a new core?
*/
- if (cpus_weight(cpu_sibling_map[cpu]) == 1) {
+ if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1) {
/*
* for each core in package, increment
* the booted_cores for this new cpu
*/
- if (first_cpu(cpu_sibling_map[i]) == i)
+ if (first_cpu(per_cpu(cpu_sibling_map, i)) == i)
c[cpu].booted_cores++;
/*
* increment the core count for all
@@ -741,7 +741,7 @@
phys_cpu_present_map = physid_mask_of_physid(boot_cpu_id);
else
phys_cpu_present_map = physid_mask_of_physid(0);
- cpu_set(0, cpu_sibling_map[0]);
+ cpu_set(0, per_cpu(cpu_sibling_map, 0));
cpu_set(0, per_cpu(cpu_core_map, 0));
}
@@ -982,13 +982,13 @@
/*
* last thread sibling in this cpu core going down
*/
- if (cpus_weight(cpu_sibling_map[cpu]) == 1)
+ if (cpus_weight(per_cpu(cpu_sibling_map, cpu)) == 1)
c[sibling].booted_cores--;
}
- for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
- cpu_clear(cpu, cpu_sibling_map[sibling]);
- cpus_clear(cpu_sibling_map[cpu]);
+ for_each_cpu_mask(sibling, per_cpu(cpu_sibling_map, cpu))
+ cpu_clear(cpu, per_cpu(cpu_sibling_map, sibling));
+ cpus_clear(per_cpu(cpu_sibling_map, cpu));
cpus_clear(per_cpu(cpu_core_map, cpu));
c[cpu].phys_proc_id = 0;
c[cpu].cpu_core_id = 0;
--- a/block/blktrace.c
+++ b/block/blktrace.c
@@ -536,7 +536,7 @@
for_each_online_cpu(cpu) {
unsigned long long *cpu_off, *sibling_off;
- for_each_cpu_mask(i, cpu_sibling_map[cpu]) {
+ for_each_cpu_mask(i, per_cpu(cpu_sibling_map, cpu)) {
if (i == cpu)
continue;
--- a/include/asm-i386/smp.h
+++ b/include/asm-i386/smp.h
@@ -30,7 +30,7 @@
extern void smp_alloc_memory(void);
extern int pic_mode;
extern int smp_num_siblings;
-extern cpumask_t cpu_sibling_map[];
+DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
DECLARE_PER_CPU(cpumask_t, cpu_core_map);
extern void (*mtrr_hook) (void);
--- a/include/asm-i386/topology.h
+++ b/include/asm-i386/topology.h
@@ -31,7 +31,7 @@
#define topology_physical_package_id(cpu) (cpu_data[cpu].phys_proc_id)
#define topology_core_id(cpu) (cpu_data[cpu].cpu_core_id)
#define topology_core_siblings(cpu) (per_cpu(cpu_core_map, cpu))
-#define topology_thread_siblings(cpu) (cpu_sibling_map[cpu])
+#define topology_thread_siblings(cpu) (per_cpu(cpu_sibling_map, cpu))
#endif
#ifdef CONFIG_NUMA
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -38,12 +38,14 @@
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
-extern cpumask_t cpu_sibling_map[NR_CPUS];
/*
- * cpu_core_map lives in a per cpu area
+ * cpu_sibling_map and cpu_core_map now live
+ * in the per cpu area
*
+ * extern cpumask_t cpu_sibling_map[NR_CPUS];
* extern cpumask_t cpu_core_map[NR_CPUS];
*/
+DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
DECLARE_PER_CPU(cpumask_t, cpu_core_map);
extern u8 cpu_llc_id[NR_CPUS];
--- a/include/asm-x86_64/topology.h
+++ b/include/asm-x86_64/topology.h
@@ -72,7 +72,7 @@
#define topology_physical_package_id(cpu) (cpu_data[cpu].phys_proc_id)
#define topology_core_id(cpu) (cpu_data[cpu].cpu_core_id)
#define topology_core_siblings(cpu) (per_cpu(cpu_core_map, cpu))
-#define topology_thread_siblings(cpu) (cpu_sibling_map[cpu])
+#define topology_thread_siblings(cpu) (per_cpu(cpu_sibling_map, cpu))
#define mc_capable() (boot_cpu_data.x86_max_cores > 1)
#define smt_capable() (smp_num_siblings > 1)
#endif
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5822,7 +5822,7 @@
struct sched_group **sg)
{
int group;
- cpumask_t mask = cpu_sibling_map[cpu];
+ cpumask_t mask = per_cpu(cpu_sibling_map, cpu);
cpus_and(mask, mask, *cpu_map);
group = first_cpu(mask);
if (sg)
@@ -5851,7 +5851,7 @@
cpus_and(mask, mask, *cpu_map);
group = first_cpu(mask);
#elif defined(CONFIG_SCHED_SMT)
- cpumask_t mask = cpu_sibling_map[cpu];
+ cpumask_t mask = per_cpu(cpu_sibling_map, cpu);
cpus_and(mask, mask, *cpu_map);
group = first_cpu(mask);
#else
@@ -6086,7 +6086,7 @@
p = sd;
sd = &per_cpu(cpu_domains, i);
*sd = SD_SIBLING_INIT;
- sd->span = cpu_sibling_map[i];
+ sd->span = per_cpu(cpu_sibling_map, i);
cpus_and(sd->span, sd->span, *cpu_map);
sd->parent = p;
p->child = sd;
@@ -6097,7 +6097,7 @@
#ifdef CONFIG_SCHED_SMT
/* Set up CPU (sibling) groups */
for_each_cpu_mask(i, *cpu_map) {
- cpumask_t this_sibling_map = cpu_sibling_map[i];
+ cpumask_t this_sibling_map = per_cpu(cpu_sibling_map, i);
cpus_and(this_sibling_map, this_sibling_map, *cpu_map);
if (i != first_cpu(this_sibling_map))
continue;
* [PATCH 4/6] x86: Convert x86_cpu_to_apicid to be a per cpu variable (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: convert-x86_cpu_to_apicid-to-per_cpu_data --]
[-- Type: text/plain, Size: 9218 bytes --]
This patch converts the x86_cpu_to_apicid array to be a per
cpu variable. This saves sizeof(apicid) bytes for each unused cpu.
Access is mostly from startup and CPU HOTPLUG functions.
MP_processor_info() is one of the functions that require access
to the x86_cpu_to_apicid array before the per_cpu data area is
set up. For this case, a pointer to the __initdata array is
initialized in setup_arch() and cleared in smp_prepare_cpus()
after the per_cpu data area is initialized.
A second change is included to change the initial array value
of ARCH i386 from 0xff to BAD_APICID to be consistent with
ARCH x86_64.
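The resulting boot ordering, condensed (x86_64 shown; details are in the
hunks below):

	setup_arch()
		x86_cpu_to_apicid_ptr = (void *)&x86_cpu_to_apicid_init;
		... MP table / ACPI parsing -> MP_processor_info()
			writes each apicid through x86_cpu_to_apicid_ptr
	setup_per_cpu_areas()
	smp_prepare_cpus()
		smp_set_apicids()	/* copy into per cpu area, NULL the ptr */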
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/i386/kernel/acpi/boot.c | 2 +-
arch/i386/kernel/smp.c | 2 +-
arch/i386/kernel/smpboot.c | 22 +++++++++++++++-------
arch/x86_64/kernel/genapic.c | 15 ++++++++++++---
arch/x86_64/kernel/genapic_flat.c | 2 +-
arch/x86_64/kernel/mpparse.c | 15 +++++++++++++--
arch/x86_64/kernel/setup.c | 5 +++++
arch/x86_64/kernel/smpboot.c | 23 ++++++++++++++++++++++-
arch/x86_64/mm/numa.c | 2 +-
include/asm-i386/smp.h | 6 ++++--
include/asm-x86_64/ipi.h | 2 +-
include/asm-x86_64/smp.h | 6 ++++--
12 files changed, 80 insertions(+), 22 deletions(-)
--- a/arch/i386/kernel/acpi/boot.c
+++ b/arch/i386/kernel/acpi/boot.c
@@ -555,7 +555,7 @@
int acpi_unmap_lsapic(int cpu)
{
- x86_cpu_to_apicid[cpu] = -1;
+ per_cpu(x86_cpu_to_apicid, cpu) = -1;
cpu_clear(cpu, cpu_present_map);
num_processors--;
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -676,7 +676,7 @@
int i;
for (i = 0; i < NR_CPUS; i++) {
- if (x86_cpu_to_apicid[i] == apic_id)
+ if (per_cpu(x86_cpu_to_apicid, i) == apic_id)
return i;
}
return -1;
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -92,9 +92,17 @@
struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
EXPORT_SYMBOL(cpu_data);
-u8 x86_cpu_to_apicid[NR_CPUS] __read_mostly =
- { [0 ... NR_CPUS-1] = 0xff };
-EXPORT_SYMBOL(x86_cpu_to_apicid);
+/*
+ * The following static array is used during kernel startup
+ * and the x86_cpu_to_apicid_ptr contains the address of the
+ * array during this time. It is zeroed after the per_cpu
+ * data area is set up.
+ */
+u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata =
+ { [0 ... NR_CPUS-1] = BAD_APICID };
+void *x86_cpu_to_apicid_ptr;
+DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
+EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
u8 apicid_2_node[MAX_APICID];
@@ -804,7 +812,7 @@
irq_ctx_init(cpu);
- x86_cpu_to_apicid[cpu] = apicid;
+ per_cpu(x86_cpu_to_apicid, cpu) = apicid;
/*
* This grunge runs the startup process for
* the targeted processor.
@@ -866,7 +874,7 @@
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
} else {
- x86_cpu_to_apicid[cpu] = apicid;
+ per_cpu(x86_cpu_to_apicid, cpu) = apicid;
cpu_set(cpu, cpu_present_map);
}
@@ -915,7 +923,7 @@
struct warm_boot_cpu_info info;
int apicid, ret;
- apicid = x86_cpu_to_apicid[cpu];
+ apicid = per_cpu(x86_cpu_to_apicid, cpu);
if (apicid == BAD_APICID) {
ret = -ENODEV;
goto exit;
@@ -965,7 +973,7 @@
boot_cpu_physical_apicid = GET_APIC_ID(apic_read(APIC_ID));
boot_cpu_logical_apicid = logical_smp_processor_id();
- x86_cpu_to_apicid[0] = boot_cpu_physical_apicid;
+ per_cpu(x86_cpu_to_apicid, 0) = boot_cpu_physical_apicid;
current_thread_info()->cpu = 0;
--- a/arch/x86_64/kernel/mpparse.c
+++ b/arch/x86_64/kernel/mpparse.c
@@ -86,7 +86,7 @@
return sum & 0xFF;
}
-static void __cpuinit MP_processor_info (struct mpc_config_processor *m)
+static void __cpuinit MP_processor_info(struct mpc_config_processor *m)
{
int cpu;
cpumask_t tmp_map;
@@ -123,7 +123,18 @@
cpu = 0;
}
bios_cpu_apicid[cpu] = m->mpc_apicid;
- x86_cpu_to_apicid[cpu] = m->mpc_apicid;
+ /*
+ * We get called early in the start_kernel initialization
+ * process when the per_cpu data area is not yet setup, so we
+ * use a static array that is removed after the per_cpu data
+ * area is created.
+ */
+ if (x86_cpu_to_apicid_ptr) {
+ u8 *x86_cpu_to_apicid = (u8 *)x86_cpu_to_apicid_ptr;
+ x86_cpu_to_apicid[cpu] = m->mpc_apicid;
+ } else {
+ per_cpu(x86_cpu_to_apicid, cpu) = m->mpc_apicid;
+ }
cpu_set(cpu, cpu_possible_map);
cpu_set(cpu, cpu_present_map);
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -701,7 +701,7 @@
clear_node_cpumask(cpu); /* was set by numa_add_cpu */
cpu_clear(cpu, cpu_present_map);
cpu_clear(cpu, cpu_possible_map);
- x86_cpu_to_apicid[cpu] = BAD_APICID;
+ per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
return -EIO;
}
@@ -848,6 +848,26 @@
}
/*
+ * Copy apicid's found by MP_processor_info from initial array to the per cpu
+ * data area. The x86_cpu_to_apicid_init array is then expendable and the
+ * x86_cpu_to_apicid_ptr is zeroed indicating that the static array is no
+ * longer available.
+ */
+void __init smp_set_apicids(void)
+{
+ int cpu;
+
+ for_each_cpu_mask(cpu, cpu_possible_map) {
+ if (per_cpu_offset(cpu))
+ per_cpu(x86_cpu_to_apicid, cpu) =
+ x86_cpu_to_apicid_init[cpu];
+ }
+
+ /* indicate the static array will be going away soon */
+ x86_cpu_to_apicid_ptr = NULL;
+}
+
+/*
* Prepare for SMP bootup. The MP table or ACPI has been read
* earlier. Just do some sanity checking here and enable APIC mode.
*/
@@ -856,6 +876,7 @@
nmi_watchdog_default();
current_cpu_data = boot_cpu_data;
current_thread_info()->cpu = 0; /* needed? */
+ smp_set_apicids();
set_cpu_sibling_map(0);
if (smp_sanity_check(max_cpus) < 0) {
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -615,7 +615,7 @@
{
int i;
for (i = 0; i < NR_CPUS; i++) {
- u8 apicid = x86_cpu_to_apicid[i];
+ u8 apicid = x86_cpu_to_apicid_init[i];
if (apicid == BAD_APICID)
continue;
if (apicid_to_node[apicid] == NUMA_NO_NODE)
--- a/include/asm-i386/smp.h
+++ b/include/asm-i386/smp.h
@@ -39,9 +39,11 @@
extern void unlock_ipi_call_lock(void);
#define MAX_APICID 256
-extern u8 x86_cpu_to_apicid[];
+extern u8 __initdata x86_cpu_to_apicid_init[];
+extern void *x86_cpu_to_apicid_ptr;
+DECLARE_PER_CPU(u8, x86_cpu_to_apicid);
-#define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
+#define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu)
extern void set_cpu_sibling_map(int cpu);
--- a/include/asm-x86_64/ipi.h
+++ b/include/asm-x86_64/ipi.h
@@ -119,7 +119,7 @@
*/
local_irq_save(flags);
for_each_cpu_mask(query_cpu, mask) {
- __send_IPI_dest_field(x86_cpu_to_apicid[query_cpu],
+ __send_IPI_dest_field(per_cpu(x86_cpu_to_apicid, query_cpu),
vector, APIC_DEST_PHYSICAL);
}
local_irq_restore(flags);
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -85,7 +85,9 @@
* Some lowlevel functions might want to know about
* the real APIC ID <-> CPU # mapping.
*/
-extern u8 x86_cpu_to_apicid[NR_CPUS]; /* physical ID */
+extern u8 __initdata x86_cpu_to_apicid_init[];
+extern void *x86_cpu_to_apicid_ptr;
+DECLARE_PER_CPU(u8, x86_cpu_to_apicid); /* physical ID */
extern u8 bios_cpu_apicid[];
static inline int cpu_present_to_apicid(int mps_cpu)
@@ -116,7 +118,7 @@
}
#ifdef CONFIG_SMP
-#define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
+#define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu)
#else
#define cpu_physical_id(cpu) boot_cpu_id
#endif /* !CONFIG_SMP */
--- a/arch/x86_64/kernel/genapic_flat.c
+++ b/arch/x86_64/kernel/genapic_flat.c
@@ -172,7 +172,7 @@
*/
cpu = first_cpu(cpumask);
if ((unsigned)cpu < NR_CPUS)
- return x86_cpu_to_apicid[cpu];
+ return per_cpu(x86_cpu_to_apicid, cpu);
else
return BAD_APICID;
}
--- a/arch/x86_64/kernel/genapic.c
+++ b/arch/x86_64/kernel/genapic.c
@@ -24,10 +24,19 @@
#include <acpi/acpi_bus.h>
#endif
-/* which logical CPU number maps to which CPU (physical APIC ID) */
-u8 x86_cpu_to_apicid[NR_CPUS] __read_mostly
+/*
+ * which logical CPU number maps to which CPU (physical APIC ID)
+ *
+ * The following static array is used during kernel startup
+ * and the x86_cpu_to_apicid_ptr contains the address of the
+ * array during this time. It is zeroed after the per_cpu
+ * data area is set up.
+ */
+u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata
= { [0 ... NR_CPUS-1] = BAD_APICID };
-EXPORT_SYMBOL(x86_cpu_to_apicid);
+void *x86_cpu_to_apicid_ptr;
+DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
+EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
struct genapic __read_mostly *genapic = &apic_flat;
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -276,6 +276,11 @@
dmi_scan_machine();
+#ifdef CONFIG_SMP
+ /* setup to use the static apicid table during kernel startup */
+ x86_cpu_to_apicid_ptr = (void *)&x86_cpu_to_apicid_init;
+#endif
+
#ifdef CONFIG_ACPI
/*
* Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
* [PATCH 5/6] x86: Convert cpu_llc_id to be a per cpu variable (v2)
From: travis @ 2007-08-24 22:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: convert-cpu_llc_id-to-per_cpu_data --]
[-- Type: text/plain, Size: 3950 bytes --]
Convert cpu_llc_id from a static array sized by NR_CPUS to a
per_cpu variable. This saves sizeof(cpu_llc_id) bytes for each
unused cpu. Access is mostly from startup and CPU HOTPLUG functions.
Note there's an additional change of the type of cpu_llc_id
from int to u8 for ARCH i386 to correspond with the same
type in ARCH x86_64.
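(Illustrative sizing: with NR_CPUS = 4096 the old i386 int array occupied
16KB of node 0 memory; the per cpu u8 costs one byte per possible cpu,
resident on that cpu's own node.)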
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/i386/kernel/cpu/intel_cacheinfo.c | 4 ++--
arch/i386/kernel/smpboot.c | 6 +++---
arch/x86_64/kernel/smpboot.c | 6 +++---
include/asm-i386/processor.h | 6 +++++-
include/asm-x86_64/smp.h | 9 ++++-----
5 files changed, 17 insertions(+), 14 deletions(-)
--- a/arch/i386/kernel/cpu/intel_cacheinfo.c
+++ b/arch/i386/kernel/cpu/intel_cacheinfo.c
@@ -417,14 +417,14 @@
if (new_l2) {
l2 = new_l2;
#ifdef CONFIG_X86_HT
- cpu_llc_id[cpu] = l2_id;
+ per_cpu(cpu_llc_id, cpu) = l2_id;
#endif
}
if (new_l3) {
l3 = new_l3;
#ifdef CONFIG_X86_HT
- cpu_llc_id[cpu] = l3_id;
+ per_cpu(cpu_llc_id, cpu) = l3_id;
#endif
}
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -67,7 +67,7 @@
EXPORT_SYMBOL(smp_num_siblings);
/* Last level cache ID of each logical CPU */
-int cpu_llc_id[NR_CPUS] __cpuinitdata = {[0 ... NR_CPUS-1] = BAD_APICID};
+DEFINE_PER_CPU(u8, cpu_llc_id) = BAD_APICID;
/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);
@@ -348,8 +348,8 @@
}
for_each_cpu_mask(i, cpu_sibling_setup_map) {
- if (cpu_llc_id[cpu] != BAD_APICID &&
- cpu_llc_id[cpu] == cpu_llc_id[i]) {
+ if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
+ per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
cpu_set(i, c[cpu].llc_shared_map);
cpu_set(cpu, c[i].llc_shared_map);
}
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -65,7 +65,7 @@
EXPORT_SYMBOL(smp_num_siblings);
/* Last level cache ID of each logical CPU */
-u8 cpu_llc_id[NR_CPUS] __cpuinitdata = {[0 ... NR_CPUS-1] = BAD_APICID};
+DEFINE_PER_CPU(u8, cpu_llc_id) = BAD_APICID;
/* Bitmask of currently online CPUs */
cpumask_t cpu_online_map __read_mostly;
@@ -285,8 +285,8 @@
}
for_each_cpu_mask(i, cpu_sibling_setup_map) {
- if (cpu_llc_id[cpu] != BAD_APICID &&
- cpu_llc_id[cpu] == cpu_llc_id[i]) {
+ if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
+ per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
cpu_set(i, c[cpu].llc_shared_map);
cpu_set(cpu, c[i].llc_shared_map);
}
--- a/include/asm-i386/processor.h
+++ b/include/asm-i386/processor.h
@@ -110,7 +110,11 @@
#define current_cpu_data boot_cpu_data
#endif
-extern int cpu_llc_id[NR_CPUS];
+/*
+ * the following now lives in the per cpu area:
+ * extern int cpu_llc_id[NR_CPUS];
+ */
+DECLARE_PER_CPU(u8, cpu_llc_id);
extern char ignore_fpu_irq;
void __init cpu_detect(struct cpuinfo_x86 *c);
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -39,16 +39,14 @@
extern void smp_send_reschedule(int cpu);
/*
- * cpu_sibling_map and cpu_core_map now live
- * in the per cpu area
- *
+ * the following now live in the per cpu area:
* extern cpumask_t cpu_sibling_map[NR_CPUS];
* extern cpumask_t cpu_core_map[NR_CPUS];
+ * extern u8 cpu_llc_id[NR_CPUS];
*/
DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
DECLARE_PER_CPU(cpumask_t, cpu_core_map);
-
-extern u8 cpu_llc_id[NR_CPUS];
+DECLARE_PER_CPU(u8, cpu_llc_id);
#define SMP_TRAMPOLINE_BASE 0x6000
@@ -120,6 +118,7 @@
#ifdef CONFIG_SMP
#define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu)
#else
+extern unsigned int boot_cpu_id;
#define cpu_physical_id(cpu) boot_cpu_id
#endif /* !CONFIG_SMP */
#endif
* [PATCH 6/6] x86: acpi-use-cpu_physical_id (v2)
From: travis @ 2007-08-24 22:27 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter
[-- Attachment #1: acpi-use-cpu_physical_id --]
[-- Type: text/plain, Size: 1622 bytes --]
This is from an earlier message from Christoph Lameter:
processor_core.c currently tries to determine the apicid by special casing
for IA64 and x86. The desired information is readily available via
cpu_physical_id()
on IA64, i386 and x86_64.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Additionally, boot_cpu_id needed to be exported to fix compile errors in
dma code when !CONFIG_SMP.
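For reference, the !CONFIG_SMP failure falls out of how cpu_physical_id()
is defined once the earlier patches are applied (a sketch of the
asm-x86_64 definitions from PATCH 4/6 and 5/6):

	#ifdef CONFIG_SMP
	#define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
	#else
	extern unsigned int boot_cpu_id;
	#define cpu_physical_id(cpu)	boot_cpu_id
	#endif

On a UP kernel every modular user of cpu_physical_id() therefore resolves
to boot_cpu_id, hence the EXPORT_SYMBOL below.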
Signed-off-by: Mike Travis <travis@sgi.com>
---
arch/x86_64/kernel/mpparse.c | 2 ++
drivers/acpi/processor_core.c | 8 +-------
2 files changed, 3 insertions(+), 7 deletions(-)
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -420,12 +420,6 @@
return 0;
}
-#ifdef CONFIG_IA64
-#define arch_cpu_to_apicid ia64_cpu_to_sapicid
-#else
-#define arch_cpu_to_apicid x86_cpu_to_apicid
-#endif
-
static int map_madt_entry(u32 acpi_id)
{
unsigned long madt_end, entry;
@@ -499,7 +493,7 @@
return apic_id;
for (i = 0; i < NR_CPUS; ++i) {
- if (arch_cpu_to_apicid[i] == apic_id)
+ if (cpu_physical_id(i) == apic_id)
return i;
}
return -1;
--- a/arch/x86_64/kernel/mpparse.c
+++ b/arch/x86_64/kernel/mpparse.c
@@ -57,6 +57,8 @@
/* Processor that is doing the boot up */
unsigned int boot_cpu_id = -1U;
+EXPORT_SYMBOL(boot_cpu_id);
+
/* Internal processor count */
unsigned int num_processors __cpuinitdata = 0;
* Re: [PATCH 1/6] x86: fix cpu_to_node references (v2)
From: Siddha, Suresh B @ 2007-08-25 0:23 UTC (permalink / raw)
To: travis; +Cc: Andi Kleen, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Fri, Aug 24, 2007 at 03:26:55PM -0700, travis@sgi.com wrote:
> Fix four instances where cpu_to_node is referenced
> by array instead of via the cpu_to_node macro. This
> is preparation for moving it to the per_cpu data area.
>
...
> unsigned long __init numa_free_all_bootmem(void)
> --- a/arch/x86_64/mm/srat.c
> +++ b/arch/x86_64/mm/srat.c
> @@ -431,9 +431,9 @@
> setup_node_bootmem(i, nodes[i].start, nodes[i].end);
>
> for (i = 0; i < NR_CPUS; i++) {
> - if (cpu_to_node[i] == NUMA_NO_NODE)
> + if (cpu_to_node(i) == NUMA_NO_NODE)
> continue;
> - if (!node_isset(cpu_to_node[i], node_possible_map))
> + if (!node_isset(cpu_to_node(i), node_possible_map))
> numa_set_node(i, NUMA_NO_NODE);
> }
> numa_init_array();
During this particular routine's execution, per cpu areas are not yet set up. In
the future, when we make cpu_to_node(i) use the per cpu area, this code will break.
And actually setup_per_cpu_areas() uses cpu_to_node(). So...
thanks,
suresh
* Re: [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2)
From: Siddha, Suresh B @ 2007-08-25 0:50 UTC (permalink / raw)
To: travis; +Cc: Andi Kleen, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Fri, Aug 24, 2007 at 03:26:54PM -0700, travis@sgi.com wrote:
> Previous Intro:
Thanks for doing this.
> In x86_64 and i386 architectures most arrays that are sized
> using NR_CPUS lie in local memory on node 0. Not only will most
> (99%?) of the systems not use all the slots in these arrays,
> particularly when NR_CPUS is increased to accommodate future
> very high cpu count systems, but a number of cache lines are
> passed unnecessarily on the system bus when these arrays are
> referenced by cpus on other nodes.
Can we move cpuinfo_x86 also to per cpu area? Though critical run
time code doesn't access this area, it would be nice to move cpuinfo_x86
into the per cpu area as well.
Perhaps the current cpuinfo_x86 layout might cause confusion and make people
add arch specific per cpu elements into cpuinfo_x86 (thinking that it uses
the per cpu area).
Wonder if this confusion is the reason for git commit f3fa8ebc
thanks,
suresh
* Re: [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2)
From: Andi Kleen @ 2007-08-25 9:24 UTC (permalink / raw)
To: Siddha, Suresh B
Cc: travis, Andi Kleen, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Fri, Aug 24, 2007 at 05:50:18PM -0700, Siddha, Suresh B wrote:
> On Fri, Aug 24, 2007 at 03:26:54PM -0700, travis@sgi.com wrote:
> > Previous Intro:
>
> Thanks for doing this.
>
> > In x86_64 and i386 architectures most arrays that are sized
> > using NR_CPUS lie in local memory on node 0. Not only will most
> > (99%?) of the systems not use all the slots in these arrays,
> > particularly when NR_CPUS is increased to accommodate future
> > very high cpu count systems, but a number of cache lines are
> > passed unnecessarily on the system bus when these arrays are
> > referenced by cpus on other nodes.
>
> Can we move cpuinfo_x86 also to per cpu area? Though critical run
I worry how much impact that would be? boot_cpu_data is quite
widely used.
> Wonder if this confusion is the reason for git commit f3fa8ebc
What git commit (full id) ?
-Andi
* Re: [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2)
From: Randy Dunlap @ 2007-08-25 16:52 UTC (permalink / raw)
To: Andi Kleen
Cc: Siddha, Suresh B, travis, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Sat, 25 Aug 2007 11:24:35 +0200 Andi Kleen wrote:
> On Fri, Aug 24, 2007 at 05:50:18PM -0700, Siddha, Suresh B wrote:
> > On Fri, Aug 24, 2007 at 03:26:54PM -0700, travis@sgi.com wrote:
> > > Previous Intro:
> >
> > Thanks for doing this.
> >
> > > In x86_64 and i386 architectures most arrays that are sized
> > > using NR_CPUS lie in local memory on node 0. Not only will most
> > > (99%?) of the systems not use all the slots in these arrays,
> > > particularly when NR_CPUS is increased to accommodate future
> > > very high cpu count systems, but a number of cache lines are
> > > passed unnecessarily on the system bus when these arrays are
> > > referenced by cpus on other nodes.
> >
> > Can we move cpuinfo_x86 also to per cpu area? Though critical run
>
> I worry how much impact that would be? boot_cpu_data is quite
> widely used.
>
> > Wonder if this confusion is the reason for git commit f3fa8ebc
>
> What git commit (full id) ?
Looks like it's
commit f3fa8ebc25129bb69929e20b0c84049c39029d8d
Author: Rohit Seth <rohitseth@google.com>
Date: Mon Jun 26 13:58:17 2006 +0200
[PATCH] x86_64: moving phys_proc_id and cpu_core_id to cpuinfo_x86
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
* Re: [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2)
From: Mike Travis @ 2007-08-27 18:46 UTC (permalink / raw)
To: Andi Kleen
Cc: Siddha, Suresh B, travis, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Sat, 25 Aug 2007, Andi Kleen wrote:
> On Fri, Aug 24, 2007 at 05:50:18PM -0700, Siddha, Suresh B wrote:
> > On Fri, Aug 24, 2007 at 03:26:54PM -0700, travis@sgi.com wrote:
> > > Previous Intro:
> >
> > Thanks for doing this.
> >
> > > In x86_64 and i386 architectures most arrays that are sized
> > > using NR_CPUS lie in local memory on node 0. Not only will most
> > > (99%?) of the systems not use all the slots in these arrays,
> > > particularly when NR_CPUS is increased to accommodate future
> > > very high cpu count systems, but a number of cache lines are
> > > passed unnecessarily on the system bus when these arrays are
> > > referenced by cpus on other nodes.
> >
> > Can we move cpuinfo_x86 also to per cpu area? Though critical run
>
> I worry how much impact that would be? boot_cpu_data is quite
> widely used.
>
I looked at this and it would be a big memory savings. But I haven't
yet analyzed the various accesses to verify that we can cleanly move
the structure, and that we don't suffer a bunch of tlb misses because
accesses are primarily from node 0.
More info soon.
Thanks,
Mike
* Re: [PATCH 1/6] x86: fix cpu_to_node references (v2)
2007-08-25 0:23 ` Siddha, Suresh B
@ 2007-08-27 19:47 ` Mike Travis
0 siblings, 0 replies; 18+ messages in thread
From: Mike Travis @ 2007-08-27 19:47 UTC (permalink / raw)
To: Siddha, Suresh B
Cc: travis, Andi Kleen, linux-mm, linux-kernel, Andrew Morton,
Christoph Lameter
On Fri, 24 Aug 2007, Siddha, Suresh B wrote:
> On Fri, Aug 24, 2007 at 03:26:55PM -0700, travis@sgi.com wrote:
> > Fix four instances where cpu_to_node is referenced
> > as an array instead of via the cpu_to_node macro. This
> > is preparation for moving it to the per_cpu data area.
> >
> ...
>
> > unsigned long __init numa_free_all_bootmem(void)
> > --- a/arch/x86_64/mm/srat.c
> > +++ b/arch/x86_64/mm/srat.c
> > @@ -431,9 +431,9 @@
> > setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> >
> > for (i = 0; i < NR_CPUS; i++) {
> > - if (cpu_to_node[i] == NUMA_NO_NODE)
> > + if (cpu_to_node(i) == NUMA_NO_NODE)
> > continue;
> > - if (!node_isset(cpu_to_node[i], node_possible_map))
> > + if (!node_isset(cpu_to_node(i), node_possible_map))
> > numa_set_node(i, NUMA_NO_NODE);
> > }
> > numa_init_array();
>
> When this particular routine executes, the per cpu areas are not yet
> set up. In the future, when we make cpu_to_node(i) use the per cpu
> area, this code will break.
>
> And setup_per_cpu_areas() itself uses cpu_to_node(). So...
>
I have a scheme that uses an __initdata array during __init processing,
which is then freed after the per cpu data area is set up; see the sketch
below. I'm looking more closely at all the various node <--> cpu tables.
Thanks,
Mike
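Mike's scheme is not spelled out in the thread; what follows is a minimal
sketch of the idea, with hypothetical names (x86_cpu_to_node_map_init,
setup_per_cpu_node_map) and NUMA_NO_NODE reused from the hunk above.

/* Early boot: the per-cpu areas do not exist yet, so the BSP records
 * the cpu -> node mapping in an __initdata array. */
static int __initdata x86_cpu_to_node_map_init[NR_CPUS] =
	{ [0 ... NR_CPUS-1] = NUMA_NO_NODE };

DEFINE_PER_CPU(int, x86_cpu_to_node_map) = NUMA_NO_NODE;

/* Once the per-cpu areas are live, copy the early values over; the
 * __initdata array is then freed along with the rest of the init
 * sections. */
static void __init setup_per_cpu_node_map(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		per_cpu(x86_cpu_to_node_map, cpu) =
			x86_cpu_to_node_map_init[cpu];
}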
* Re: [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
2007-08-24 22:26 ` [PATCH 3/6] x86: Convert cpu_sibling_map " travis
@ 2007-09-01 2:49 ` Andrew Morton
2007-09-01 11:34 ` Kamalesh Babulal
2007-09-01 14:06 ` Andi Kleen
0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2007-09-01 2:49 UTC (permalink / raw)
To: travis; +Cc: Andi Kleen, linux-mm, linux-kernel, Christoph Lameter
On Fri, 24 Aug 2007 15:26:57 -0700 travis@sgi.com wrote:
> Convert cpu_sibling_map from a static array sized by NR_CPUS to a
> per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused
> cpu slot. Access is mostly from startup and CPU hotplug functions.
ia64 allmodconfig:
kernel/sched.c: In function `cpu_to_phys_group':
kernel/sched.c:5937: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
kernel/sched.c:5937: error: (Each undeclared identifier is reported only once
kernel/sched.c:5937: error: for each function it appears in.)
kernel/sched.c:5937: warning: type defaults to `int' in declaration of `type name'
kernel/sched.c:5937: error: invalid type argument of `unary *'
kernel/sched.c: In function `build_sched_domains':
kernel/sched.c:6172: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
kernel/sched.c:6172: warning: type defaults to `int' in declaration of `type name'
kernel/sched.c:6172: error: invalid type argument of `unary *'
kernel/sched.c:6183: warning: type defaults to `int' in declaration of `type name'
kernel/sched.c:6183: error: invalid type argument of `unary *'
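For context, the conversion that triggers these errors swaps a global
array for a per-cpu variable. A schematic of the x86 side and of the
consequence for generic code (the accessor shape is assumed, not quoted
from the patch):

/* Before: one node-0 array covering all NR_CPUS slots.
 *	cpumask_t cpu_sibling_map[NR_CPUS];
 * After: each cpu's sibling mask lives in its own per-cpu area. */
DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);

/* Generic code such as kernel/sched.c must then use the per_cpu()
 * accessor -- so every architecture that compiles it needs the
 * DEFINE_PER_CPU above.  ia64 and ppc64 had not been converted,
 * hence `per_cpu__cpu_sibling_map' undeclared. */
cpumask_t sib = per_cpu(cpu_sibling_map, cpu);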
* Re: [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
2007-09-01 2:49 ` Andrew Morton
@ 2007-09-01 11:34 ` Kamalesh Babulal
2007-09-01 16:10 ` Andrew Morton
2007-09-01 14:06 ` Andi Kleen
1 sibling, 1 reply; 18+ messages in thread
From: Kamalesh Babulal @ 2007-09-01 11:34 UTC (permalink / raw)
To: Andrew Morton
Cc: travis, Andi Kleen, linux-mm, linux-kernel, Christoph Lameter
Andrew Morton wrote:
> On Fri, 24 Aug 2007 15:26:57 -0700 travis@sgi.com wrote:
>
>
>> Convert cpu_sibling_map from a static array sized by NR_CPUS to a
>> per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused
>> cpu slot. Access is mostly from startup and CPU hotplug functions.
>>
>
> ia64 allmodconfig:
>
> kernel/sched.c: In function `cpu_to_phys_group':
> kernel/sched.c:5937: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
> kernel/sched.c:5937: error: (Each undeclared identifier is reported only once
> kernel/sched.c:5937: error: for each function it appears in.)
> kernel/sched.c:5937: warning: type defaults to `int' in declaration of `type name'
> kernel/sched.c:5937: error: invalid type argument of `unary *'
> kernel/sched.c: In function `build_sched_domains':
> kernel/sched.c:6172: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
> kernel/sched.c:6172: warning: type defaults to `int' in declaration of `type name'
> kernel/sched.c:6172: error: invalid type argument of `unary *'
> kernel/sched.c:6183: warning: type defaults to `int' in declaration of `type name'
> kernel/sched.c:6183: error: invalid type argument of `unary *'
Hi Andrew,
I get the exact same build failure on a ppc64 machine with 2.6.23-rc4-mm1.
-
Kamalesh Babulal.
* Re: [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
2007-09-01 2:49 ` Andrew Morton
2007-09-01 11:34 ` Kamalesh Babulal
@ 2007-09-01 14:06 ` Andi Kleen
1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2007-09-01 14:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: travis, linux-mm, linux-kernel, Christoph Lameter
On Saturday 01 September 2007 04:49, Andrew Morton wrote:
> On Fri, 24 Aug 2007 15:26:57 -0700 travis@sgi.com wrote:
> > Convert cpu_sibling_map from a static array sized by NR_CPUS to a
> > per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused
> > cpu slot. Access is mostly from startup and CPU hotplug functions.
The patchset was broken anyway, even on x86-64, because of the
early-boot ordering issues Suresh pointed out.
-Andi
* Re: [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
2007-09-01 11:34 ` Kamalesh Babulal
@ 2007-09-01 16:10 ` Andrew Morton
2007-09-02 11:48 ` Kamalesh Babulal
0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2007-09-01 16:10 UTC (permalink / raw)
To: Kamalesh Babulal; +Cc: travis, ak, linux-mm, linux-kernel, clameter
> On Sat, 01 Sep 2007 17:04:06 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Andrew Morton wrote:
> > On Fri, 24 Aug 2007 15:26:57 -0700 travis@sgi.com wrote:
> >
> >
> >> Convert cpu_sibling_map from a static array sized by NR_CPUS to a
> >> per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused
> >> cpu slot. Access is mostly from startup and CPU hotplug functions.
> >>
> >
> > ia64 allmodconfig:
> >
> > kernel/sched.c: In function `cpu_to_phys_group':
> > kernel/sched.c:5937: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
> > kernel/sched.c:5937: error: (Each undeclared identifier is reported only once
> > kernel/sched.c:5937: error: for each function it appears in.)
> > kernel/sched.c:5937: warning: type defaults to `int' in declaration of `type name'
> > kernel/sched.c:5937: error: invalid type argument of `unary *'
> > kernel/sched.c: In function `build_sched_domains':
> > kernel/sched.c:6172: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
> > kernel/sched.c:6172: warning: type defaults to `int' in declaration of `type name'
> > kernel/sched.c:6172: error: invalid type argument of `unary *'
> > kernel/sched.c:6183: warning: type defaults to `int' in declaration of `type name'
> > kernel/sched.c:6183: error: invalid type argument of `unary *'
> Hi Andrew,
>
> I get the exact same build failure on a ppc64 machine with 2.6.23-rc4-mm1.
>
The ia64 workaround was to disable SCHED_SMT.
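Disabling SCHED_SMT works because the generic references to
cpu_sibling_map in kernel/sched.c are only compiled in when that option
is set; schematically (a sketch of the shape of the code, not an exact
quote):

#ifdef CONFIG_SCHED_SMT
	/* Only built with SCHED_SMT=y, so architectures that have not
	 * been converted never reference the per-cpu symbol. */
	cpumask_t mask = per_cpu(cpu_sibling_map, cpu);
	cpus_and(mask, mask, *cpu_map);
	group = first_cpu(mask);
#endif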
* Re: [PATCH 3/6] x86: Convert cpu_sibling_map to be a per cpu variable (v2)
2007-09-01 16:10 ` Andrew Morton
@ 2007-09-02 11:48 ` Kamalesh Babulal
0 siblings, 0 replies; 18+ messages in thread
From: Kamalesh Babulal @ 2007-09-02 11:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: travis, ak, linux-mm, linux-kernel, clameter
Andrew Morton wrote:
>> On Sat, 01 Sep 2007 17:04:06 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>> Andrew Morton wrote:
>>
>>> On Fri, 24 Aug 2007 15:26:57 -0700 travis@sgi.com wrote:
>>>
>>>
>>>
>>>> Convert cpu_sibling_map from a static array sized by NR_CPUS to a
>>>> per_cpu variable. This saves sizeof(cpumask_t) bytes for each unused
>>>> cpu slot. Access is mostly from startup and CPU hotplug functions.
>>>>
>>>>
>>> ia64 allmodconfig:
>>>
>>> kernel/sched.c: In function `cpu_to_phys_group':
>>> kernel/sched.c:5937: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
>>> kernel/sched.c:5937: error: (Each undeclared identifier is reported only once
>>> kernel/sched.c:5937: error: for each function it appears in.)
>>> kernel/sched.c:5937: warning: type defaults to `int' in declaration of `type name'
>>> kernel/sched.c:5937: error: invalid type argument of `unary *'
>>> kernel/sched.c: In function `build_sched_domains':
>>> kernel/sched.c:6172: error: `per_cpu__cpu_sibling_map' undeclared (first use in this function)
>>> kernel/sched.c:6172: warning: type defaults to `int' in declaration of `type name'
>>> kernel/sched.c:6172: error: invalid type argument of `unary *'
>>> kernel/sched.c:6183: warning: type defaults to `int' in declaration of `type name'
>>> kernel/sched.c:6183: error: invalid type argument of `unary *'
>> Hi Andrew,
>>
>> I get the exact same build failure on a ppc64 machine with 2.6.23-rc4-mm1.
>>
>>
>
> The ia64 workaround was to disable SCHED_SMT.
>
Hi Andrew,
The same workaround works on ppc64 as well.
Thanks & Regards,
Kamalesh Babulal.
Thread overview: 18+ messages
2007-08-24 22:26 [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2) travis
2007-08-24 22:26 ` [PATCH 1/6] x86: fix cpu_to_node references (v2) travis
2007-08-25 0:23 ` Siddha, Suresh B
2007-08-27 19:47 ` Mike Travis
2007-08-24 22:26 ` [PATCH 2/6] x86: Convert cpu_core_map to be a per cpu variable (v2) travis
2007-08-24 22:26 ` [PATCH 3/6] x86: Convert cpu_sibling_map " travis
2007-09-01 2:49 ` Andrew Morton
2007-09-01 11:34 ` Kamalesh Babulal
2007-09-01 16:10 ` Andrew Morton
2007-09-02 11:48 ` Kamalesh Babulal
2007-09-01 14:06 ` Andi Kleen
2007-08-24 22:26 ` [PATCH 4/6] x86: Convert x86_cpu_to_apicid " travis
2007-08-24 22:26 ` [PATCH 5/6] x86: Convert cpu_llc_id " travis
2007-08-24 22:27 ` [PATCH 6/6] x86: acpi-use-cpu_physical_id (v2) travis
2007-08-25 0:50 ` [PATCH 0/6] x86: Reduce Memory Usage and Inter-Node message traffic (v2) Siddha, Suresh B
2007-08-25 9:24 ` Andi Kleen
2007-08-25 16:52 ` Randy Dunlap
2007-08-27 18:46 ` Mike Travis