* [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
@ 2008-04-26  0:15 Mike Travis
  2008-04-26  0:15 ` [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus Mike Travis
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel


1/11:	Increase the limit of NR_CPUS to 4096 and introduce a boolean
	called "MAXSMP" which, when set (e.g. by "allyesconfig"), sets
	NR_CPUS = 4096 and NODES_SHIFT = 9 (512 nodes).  The maximum
	setting for NODES_SHIFT is changed from 15 to 9 to accurately
	reflect the real limit.

2/11:	Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
	used by some per_cpu variables that are initialized and accessed
	before there are per_cpu areas allocated.

	Add a flag "arch_provides_topology_pointers" that indicates pointers
	to topology cpumask_t maps are available.  Otherwise, use the function
	returning the cpumask_t value.  This is useful if cpumask_t set size
	is very large to avoid copying data on to/off of the stack.

3/11:	Restore the nodenumber field in the x86_64 pda.  This field is slightly
	different from the x86_cpu_to_node_map, mainly because it's a static
	indication of which node the cpu is on, while the cpu to node map is a
	dynamic mapping that may get reset if the cpu goes offline.  This also
	simplifies the numa_node_id() macro.

4/11:	Consolidate node_to_cpumask operations and remove the 256k
	byte node_to_cpumask_map.  This is done by allocating the
	node_to_cpumask_map array after the number of possible
	nodes (nr_node_ids) is known.

5/11:	Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
	where appropriate.  This saves some allocated space as well as many
	wasted cycles going through node entries that are non-existent.

6/11:	Changed some global definitions in drivers/base/cpu.c to static.

7/11:	Remove 544k bytes from the kernel by removing the boot_cpu_pda
	array from the data section and allocating it during startup.

8/11:	Increase performance on systems with a large NR_CPUS count by
	limiting the range of the cpumask operators that loop over
	the bits in a cpumask_t variable.  This eliminates a large
	number of wasted cpu cycles.  (A sketch of the idea follows
	this list.)

9/11:	Change references from for_each_cpu_mask to for_each_cpu_mask_ptr
	in all cases for x86_64 and generic code.

10/11:	Change references from next_cpu to next_cpu_nr (or for_each_cpu_mask_ptr
	if applicable), in all cases for x86_64 and generic code.

11/11:  Pass a reference to the cpumask variable in net/sunrpc/svc.c to
	avoid copying it onto the stack.  (See the second sketch below.)
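
As a minimal sketch of the idea behind patches 8-10 (these are not the
actual definitions from this series; the my_* names are hypothetical),
a range-limited cpumask operator bounds its bit scan at nr_cpu_ids
instead of NR_CPUS:

	/*
	 * Illustrative only: with NR_CPUS = 4096 but nr_cpu_ids = 8 on a
	 * small system, stopping the scan at nr_cpu_ids avoids testing
	 * ~4088 bits that can never be set.
	 */
	static inline int my_next_cpu_nr(int n, const cpumask_t *srcp)
	{
		return min_t(int, nr_cpu_ids,
			     find_next_bit(srcp->bits, nr_cpu_ids, n + 1));
	}

	#define my_for_each_cpu_mask_nr(cpu, mask)			\
		for ((cpu) = my_next_cpu_nr(-1, &(mask));		\
		     (cpu) < nr_cpu_ids;				\
		     (cpu) = my_next_cpu_nr((cpu), &(mask)))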
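
And as a hedged sketch of the pass-by-reference pattern referred to in
patches 2 and 11 (node_to_cpumask_ptr() is defined in the topology.h
changes below; the node_cpu_count() helper is made up for illustration),
taking a pointer into node_to_cpumask_map[] avoids copying a 512-byte
cpumask_t (at NR_CPUS = 4096) onto the stack:

	/* hypothetical caller -- not part of this series */
	static int node_cpu_count(int node)
	{
		node_to_cpumask_ptr(mask, node);   /* cpumask_t *mask = ... */

		return cpus_weight(*mask);         /* no on-stack copy */
	}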


For inclusion into sched-devel/latest tree.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Signed-off-by: Mike Travis <travis@sgi.com>
---
v5-folded:
Folded in patches 7 - 11 above.
Fixed some warnings in NONUMA config build.

v4-folded:
Folded in Kconfig changes to increase NR_CPU limit to 4096 and add new
config option MAXSMP.

v3-folded:
Folded in follow-on "fix" patches to consolidate changes into one place.
Includes change to drivers/base/topology.c to fix s390 build error.
Includes change to fix preemption warning when numa_node_id is used.
checkpatch.pl errors/warnings checked and fixed where possible.

v2: remerged PATCH 2/2 with latest x86.git/latest and sched-devel/latest,
rebuilt and retested 4k-akpm2, 4k-defconfig, nonuma, & nosmp configs.

-- 


* [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-27 10:39   ` Pavel Machek
  2008-04-26  0:15 ` [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4 Mike Travis
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: MAXSMP-Kconfig --]
[-- Type: text/plain, Size: 2515 bytes --]

  * Increase the limit of NR_CPUS to 4096 and introduce a boolean
    called "MAXSMP" which, when set (e.g. by "allyesconfig"), sets
    NR_CPUS = 4096 and NODES_SHIFT = 9 (512 nodes).

  * Change the maximum setting for NODES_SHIFT from 15 to 9 to
    accurately reflect the real limit.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/Kconfig |   37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

--- linux-2.6.sched.orig/arch/x86/Kconfig
+++ linux-2.6.sched/arch/x86/Kconfig
@@ -513,20 +513,35 @@ config SWIOTLB
 	  access 32-bits of memory can be used on systems with more than
 	  3 GB of memory. If unsure, say Y.
 
+config MAXSMP
+	bool "Configure Maximum number of SMP Processors and NUMA Nodes"
+	depends on X86_64 && SMP
+	default n
+	help
+	  Configure maximum number of CPUS and NUMA Nodes for this architecture.
+	  If unsure, say N.
+
+if MAXSMP
+config NR_CPUS
+	int
+	default "4096"
+endif
 
+if !MAXSMP
 config NR_CPUS
-	int "Maximum number of CPUs (2-255)"
-	range 2 255
+	int "Maximum number of CPUs (2-4096)"
+	range 2 4096
 	depends on SMP
 	default "32" if X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000
 	default "8"
 	help
 	  This allows you to specify the maximum number of CPUs which this
-	  kernel will support.  The maximum supported value is 255 and the
+	  kernel will support.  The maximum supported value is 4096 and the
 	  minimum value which makes sense is 2.
 
 	  This is purely to save memory - each supported CPU adds
 	  approximately eight kilobytes to the kernel image.
+endif
 
 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
@@ -917,13 +932,25 @@ config NUMA_EMU
 	  into virtual nodes when booted with "numa=fake=N", where N is the
 	  number of nodes. This is only useful for debugging.
 
+if MAXSMP
+
 config NODES_SHIFT
-	int "Max num nodes shift(1-15)"
-	range 1 15  if X86_64
+	int
+	default "9"
+endif
+
+if !MAXSMP
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 9   if X86_64
 	default "6" if X86_64
 	default "4" if X86_NUMAQ
 	default "3"
 	depends on NEED_MULTIPLE_NODES
+	help
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+endif
 
 config HAVE_ARCH_BOOTMEM_NODE
 	def_bool y

-- 


* [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
  2008-04-26  0:15 ` [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 03/11] x86: restore pda nodenumber field Mike Travis
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: early_ptr --]
[-- Type: text/plain, Size: 28110 bytes --]

  * Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
    used by some per_cpu variables that are initialized and accessed
    before there are per_cpu areas allocated.

    ["Early" with respect to per_cpu variables means "before the per_cpu
    areas have been set up".]

    This patchset adds these new macros:

	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
	DECLARE_EARLY_PER_CPU(_type, _name)

	early_per_cpu_ptr(_name)
	early_per_cpu_map(_name, _idx)
	early_per_cpu(_name, _cpu)

    The DEFINE macro defines the per_cpu variable as well as the early
    map and pointer.  It also initializes the per_cpu variable and map
    elements to "_initvalue".  The early_* macros provide access to
    the initial map (usually set up during system init) and the early
    pointer.  This pointer is initialized to point to the early map
    but is then NULL'ed when the actual per_cpu areas are set up.  After
    that, the per_cpu variable is the correct way to access the value.

    The early_per_cpu() macro is not very efficient, but it does show how
    to access the variable from a function that can be called both
    "early" and "late".  It tests whether the early pointer is still
    non-NULL (and therefore still valid); otherwise, the per_cpu
    variable is used instead:

	#define early_per_cpu(_name, _cpu) 			\
		(early_per_cpu_ptr(_name) ?			\
			early_per_cpu_ptr(_name)[_cpu] :	\
			per_cpu(_name, _cpu))


    A better method is to actually check the pointer manually.  In the
    case below, numa_set_node can be called both "early" and "late":

	void __cpuinit numa_set_node(int cpu, int node)
	{
	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);

	    if (cpu_to_node_map)
		    cpu_to_node_map[cpu] = node;
	    else 
		    per_cpu(x86_cpu_to_node_map, cpu) = node;
	}

  * Add a flag "arch_provides_topology_pointers" that indicates pointers
    to topology cpumask_t maps are available.  Otherwise, use the function
    returning the cpumask_t value.  This is useful if cpumask_t set size
    is very large to avoid copying data on to/off of the stack.

  * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
    the non-debug case has been optimized a bit.

  * Remove a compiler warning about an unreferenced function in
    drivers/base/topology.c

  * Clean up #ifdef in setup.c


For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
v4:
    Some more minor cleanups (last 2 items above).

v3-folded:
    Folded in follow-on "fix" patches to consolidate changes into one place.
    Includes change to drivers/base/topology.c to fix s390 build error.
    Includes change to fix preemption warning when numa_node_id is used.

v2: Remerged PATCH 2/2 with latest x86.git/latest and sched-devel/latest,
    rebuilt and retested 4k-akpm2, 4k-defconfig, nonuma, & nosmp configs.
    Cleaned up asm-x86/topology.h a little bit.

    (Removed checkpatch warnings and Cc list.)
---
 arch/x86/Kconfig           |    2 
 arch/x86/Kconfig.debug     |    2 
 arch/x86/kernel/apic_32.c  |    9 --
 arch/x86/kernel/apic_64.c  |   11 +--
 arch/x86/kernel/setup.c    |   99 +++++++++++++++++++++++++++----
 arch/x86/kernel/setup_32.c |   24 -------
 arch/x86/kernel/setup_64.c |    9 --
 arch/x86/kernel/smpboot.c  |   20 ------
 arch/x86/mm/numa_64.c      |   43 +++----------
 arch/x86/mm/srat_64.c      |    2 
 drivers/base/topology.c    |   25 +++++++
 include/asm-x86/numa_64.h  |   19 ++----
 include/asm-x86/percpu.h   |   46 ++++++++++++++
 include/asm-x86/smp.h      |   15 ----
 include/asm-x86/topology.h |  141 +++++++++++++++++++++++++--------------------
 15 files changed, 271 insertions(+), 196 deletions(-)

--- linux-2.6.sched.orig/arch/x86/Kconfig
+++ linux-2.6.sched/arch/x86/Kconfig
@@ -122,7 +122,7 @@ config ARCH_HAS_CPU_RELAX
 	def_bool y
 
 config HAVE_SETUP_PER_CPU_AREA
-	def_bool X86_64 || (X86_SMP && !X86_VOYAGER)
+	def_bool X86_64_SMP || (X86_SMP && !X86_VOYAGER)
 
 config HAVE_CPUMASK_OF_CPU_MAP
 	def_bool X86_64_SMP
--- linux-2.6.sched.orig/arch/x86/Kconfig.debug
+++ linux-2.6.sched/arch/x86/Kconfig.debug
@@ -56,7 +56,7 @@ config DEBUG_PAGEALLOC
 config DEBUG_PER_CPU_MAPS
 	bool "Debug access to per_cpu maps"
 	depends on DEBUG_KERNEL
-	depends on X86_64_SMP
+	depends on X86_SMP
 	default n
 	help
 	  Say Y to verify that the per_cpu map being accessed has
--- linux-2.6.sched.orig/arch/x86/kernel/apic_32.c
+++ linux-2.6.sched/arch/x86/kernel/apic_32.c
@@ -52,9 +52,6 @@
 
 unsigned long mp_lapic_addr;
 
-DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
-EXPORT_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
-
 /*
  * Knob to control our willingness to enable the local APIC.
  *
@@ -1538,9 +1535,9 @@ void __cpuinit generic_processor_info(in
 	}
 #ifdef CONFIG_SMP
 	/* are we being called early in kernel startup? */
-	if (x86_cpu_to_apicid_early_ptr) {
-		u16 *cpu_to_apicid = x86_cpu_to_apicid_early_ptr;
-		u16 *bios_cpu_apicid = x86_bios_cpu_apicid_early_ptr;
+	if (early_per_cpu_ptr(x86_cpu_to_apicid)) {
+		u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
+		u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
 
 		cpu_to_apicid[cpu] = apicid;
 		bios_cpu_apicid[cpu] = apicid;
--- linux-2.6.sched.orig/arch/x86/kernel/apic_64.c
+++ linux-2.6.sched/arch/x86/kernel/apic_64.c
@@ -87,9 +87,6 @@ static unsigned long apic_phys;
 
 unsigned long mp_lapic_addr;
 
-DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
-EXPORT_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
-
 unsigned int __cpuinitdata maxcpus = NR_CPUS;
 /*
  * Get the LAPIC version
@@ -1091,9 +1088,9 @@ void __cpuinit generic_processor_info(in
 		cpu = 0;
 	}
 	/* are we being called early in kernel startup? */
-	if (x86_cpu_to_apicid_early_ptr) {
-		u16 *cpu_to_apicid = x86_cpu_to_apicid_early_ptr;
-		u16 *bios_cpu_apicid = x86_bios_cpu_apicid_early_ptr;
+	if (early_per_cpu_ptr(x86_cpu_to_apicid)) {
+		u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
+		u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
 
 		cpu_to_apicid[cpu] = apicid;
 		bios_cpu_apicid[cpu] = apicid;
@@ -1269,7 +1266,7 @@ __cpuinit int apic_is_clustered_box(void
 	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !is_vsmp_box())
 		return 0;
 
-	bios_cpu_apicid = x86_bios_cpu_apicid_early_ptr;
+	bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
 	bitmap_zero(clustermap, NUM_APIC_CLUSTERS);
 
 	for (i = 0; i < NR_CPUS; i++) {
--- linux-2.6.sched.orig/arch/x86/kernel/setup.c
+++ linux-2.6.sched/arch/x86/kernel/setup.c
@@ -14,17 +14,28 @@
 
 unsigned int num_processors;
 unsigned disabled_cpus __cpuinitdata;
+
 /* Processor that is doing the boot up */
 unsigned int boot_cpu_physical_apicid = -1U;
 EXPORT_SYMBOL(boot_cpu_physical_apicid);
 
-DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
-EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
-
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map;
 
-#if defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) && defined(CONFIG_SMP)
+/* map cpu index to physical APIC ID */
+DEFINE_EARLY_PER_CPU(u16, x86_cpu_to_apicid, BAD_APICID);
+DEFINE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid, BAD_APICID);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
+
+#if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
+#define	X86_64_NUMA	1
+
+DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
+#endif
+
+#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 /*
  * Copy data used in early init routines from the initial arrays to the
  * per cpu data areas.  These arrays then become expendable and the
@@ -35,20 +46,21 @@ static void __init setup_per_cpu_maps(vo
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		per_cpu(x86_cpu_to_apicid, cpu) = x86_cpu_to_apicid_init[cpu];
+		per_cpu(x86_cpu_to_apicid, cpu) =
+				early_per_cpu_map(x86_cpu_to_apicid, cpu);
 		per_cpu(x86_bios_cpu_apicid, cpu) =
-						x86_bios_cpu_apicid_init[cpu];
-#ifdef CONFIG_NUMA
+				early_per_cpu_map(x86_bios_cpu_apicid, cpu);
+#ifdef X86_64_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
-						x86_cpu_to_node_map_init[cpu];
+				early_per_cpu_map(x86_cpu_to_node_map, cpu);
 #endif
 	}
 
 	/* indicate the early static arrays will soon be gone */
-	x86_cpu_to_apicid_early_ptr = NULL;
-	x86_bios_cpu_apicid_early_ptr = NULL;
-#ifdef CONFIG_NUMA
-	x86_cpu_to_node_map_early_ptr = NULL;
+	early_per_cpu_ptr(x86_cpu_to_apicid) = NULL;
+	early_per_cpu_ptr(x86_bios_cpu_apicid) = NULL;
+#ifdef X86_64_NUMA
+	early_per_cpu_ptr(x86_cpu_to_node_map) = NULL;
 #endif
 }
 
@@ -107,7 +119,8 @@ void __init setup_per_cpu_areas(void)
 		if (!node_online(node) || !NODE_DATA(node)) {
 			ptr = alloc_bootmem_pages(size);
 			printk(KERN_INFO
-			       "cpu %d has no node or node-local memory\n", i);
+			       "cpu %d has no node %d or node-local memory\n",
+				i, node);
 		}
 		else
 			ptr = alloc_bootmem_pages_node(NODE_DATA(node), size);
@@ -135,3 +148,63 @@ void __init setup_per_cpu_areas(void)
 }
 
 #endif
+
+#ifdef X86_64_NUMA
+void __cpuinit numa_set_node(int cpu, int node)
+{
+	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
+
+	if (cpu_to_node_map)
+		cpu_to_node_map[cpu] = node;
+
+	else if (per_cpu_offset(cpu))
+		per_cpu(x86_cpu_to_node_map, cpu) = node;
+
+	else
+		Dprintk(KERN_INFO "Setting node for non-present cpu %d\n", cpu);
+}
+
+void __cpuinit numa_clear_node(int cpu)
+{
+	numa_set_node(cpu, NUMA_NO_NODE);
+}
+
+void __cpuinit numa_add_cpu(int cpu)
+{
+	cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+}
+
+void __cpuinit numa_remove_cpu(int cpu)
+{
+	cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
+}
+#endif /* CONFIG_NUMA */
+
+#if defined(CONFIG_DEBUG_PER_CPU_MAPS) && defined(CONFIG_X86_64)
+
+int cpu_to_node(int cpu)
+{
+	if (early_per_cpu_ptr(x86_cpu_to_node_map)) {
+		printk(KERN_WARNING
+			"cpu_to_node(%d): usage too early!\n", cpu);
+		dump_stack();
+		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
+	}
+	return per_cpu(x86_cpu_to_node_map, cpu);
+}
+EXPORT_SYMBOL(cpu_to_node);
+
+int early_cpu_to_node(int cpu)
+{
+	if (early_per_cpu_ptr(x86_cpu_to_node_map))
+		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
+
+	if (!per_cpu_offset(cpu)) {
+		printk(KERN_WARNING
+			"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
+			dump_stack();
+		return NUMA_NO_NODE;
+	}
+	return per_cpu(x86_cpu_to_node_map, cpu);
+}
+#endif
--- linux-2.6.sched.orig/arch/x86/kernel/setup_32.c
+++ linux-2.6.sched/arch/x86/kernel/setup_32.c
@@ -725,18 +725,6 @@ char * __init __attribute__((weak)) memo
 	return machine_specific_memory_setup();
 }
 
-#ifdef CONFIG_NUMA
-/*
- * In the golden day, when everything among i386 and x86_64 will be
- * integrated, this will not live here
- */
-void *x86_cpu_to_node_map_early_ptr;
-int x86_cpu_to_node_map_init[NR_CPUS] = {
-	[0 ... NR_CPUS-1] = NUMA_NO_NODE
-};
-DEFINE_PER_CPU(int, x86_cpu_to_node_map) = NUMA_NO_NODE;
-#endif
-
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -870,18 +858,6 @@ void __init setup_arch(char **cmdline_p)
 
 	io_delay_init();
 
-#ifdef CONFIG_X86_SMP
-	/*
-	 * setup to use the early static init tables during kernel startup
-	 * X86_SMP will exclude sub-arches that don't deal well with it.
-	 */
-	x86_cpu_to_apicid_early_ptr = (void *)x86_cpu_to_apicid_init;
-	x86_bios_cpu_apicid_early_ptr = (void *)x86_bios_cpu_apicid_init;
-#ifdef CONFIG_NUMA
-	x86_cpu_to_node_map_early_ptr = (void *)x86_cpu_to_node_map_init;
-#endif
-#endif
-
 #ifdef CONFIG_X86_GENERICARCH
 	generic_apic_probe();
 #endif
--- linux-2.6.sched.orig/arch/x86/kernel/setup_64.c
+++ linux-2.6.sched/arch/x86/kernel/setup_64.c
@@ -398,15 +398,6 @@ void __init setup_arch(char **cmdline_p)
 
 	io_delay_init();
 
-#ifdef CONFIG_SMP
-	/* setup to use the early static init tables during kernel startup */
-	x86_cpu_to_apicid_early_ptr = (void *)x86_cpu_to_apicid_init;
-	x86_bios_cpu_apicid_early_ptr = (void *)x86_bios_cpu_apicid_init;
-#ifdef CONFIG_NUMA
-	x86_cpu_to_node_map_early_ptr = (void *)x86_cpu_to_node_map_init;
-#endif
-#endif
-
 #ifdef CONFIG_ACPI
 	/*
 	 * Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
--- linux-2.6.sched.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.sched/arch/x86/kernel/smpboot.c
@@ -68,22 +68,6 @@
 #include <mach_wakecpu.h>
 #include <smpboot_hooks.h>
 
-/*
- * FIXME: For x86_64, those are defined in other files. But moving them here,
- * would make the setup areas dependent on smp, which is a loss. When we
- * integrate apic between arches, we can probably do a better job, but
- * right now, they'll stay here -- glommer
- */
-
-/* which logical CPU number maps to which CPU (physical APIC ID) */
-u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata =
-			{ [0 ... NR_CPUS-1] = BAD_APICID };
-void *x86_cpu_to_apicid_early_ptr;
-
-u16 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
-				= { [0 ... NR_CPUS-1] = BAD_APICID };
-void *x86_bios_cpu_apicid_early_ptr;
-
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
 #endif
@@ -985,7 +969,7 @@ do_rest:
 		/* Try to put things back the way they were before ... */
 		unmap_cpu_to_logical_apicid(cpu);
 #ifdef CONFIG_X86_64
-		clear_node_cpumask(cpu); /* was set by numa_add_cpu */
+		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
 #endif
 		cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
@@ -1364,7 +1348,7 @@ static void __ref remove_cpu_from_maps(i
 	cpu_clear(cpu, cpu_callin_map);
 	/* was set by cpu_init() */
 	clear_bit(cpu, (unsigned long *)&cpu_initialized);
-	clear_node_cpumask(cpu);
+	numa_remove_cpu(cpu);
 #endif
 }
 
--- linux-2.6.sched.orig/arch/x86/mm/numa_64.c
+++ linux-2.6.sched/arch/x86/mm/numa_64.c
@@ -31,16 +31,6 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
 
 struct memnode memnode;
 
-#ifdef CONFIG_SMP
-int x86_cpu_to_node_map_init[NR_CPUS] = {
-	[0 ... NR_CPUS-1] = NUMA_NO_NODE
-};
-void *x86_cpu_to_node_map_early_ptr;
-EXPORT_SYMBOL(x86_cpu_to_node_map_early_ptr);
-#endif
-DEFINE_PER_CPU(int, x86_cpu_to_node_map) = NUMA_NO_NODE;
-EXPORT_PER_CPU_SYMBOL(x86_cpu_to_node_map);
-
 s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
@@ -578,24 +568,6 @@ void __init numa_initmem_init(unsigned l
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
 
-__cpuinit void numa_add_cpu(int cpu)
-{
-	set_bit(cpu,
-		(unsigned long *)&node_to_cpumask_map[early_cpu_to_node(cpu)]);
-}
-
-void __cpuinit numa_set_node(int cpu, int node)
-{
-	int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
-
-	if(cpu_to_node_map)
-		cpu_to_node_map[cpu] = node;
-	else if(per_cpu_offset(cpu))
-		per_cpu(x86_cpu_to_node_map, cpu) = node;
-	else
-		Dprintk(KERN_INFO "Setting node for non-present cpu %d\n", cpu);
-}
-
 unsigned long __init numa_free_all_bootmem(void)
 {
 	unsigned long pages = 0;
@@ -642,6 +614,7 @@ static __init int numa_setup(char *opt)
 }
 early_param("numa", numa_setup);
 
+#ifdef CONFIG_NUMA
 /*
  * Setup early cpu_to_node.
  *
@@ -653,14 +626,19 @@ early_param("numa", numa_setup);
  * is already initialized in a round robin manner at numa_init_array,
  * prior to this call, and this initialization is good enough
  * for the fake NUMA cases.
+ *
+ * Called before the per_cpu areas are setup.
  */
 void __init init_cpu_to_node(void)
 {
-	int i;
+	int cpu;
+	u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
 
-	for (i = 0; i < NR_CPUS; i++) {
+	BUG_ON(cpu_to_apicid == NULL);
+
+	for_each_possible_cpu(cpu) {
 		int node;
-		u16 apicid = x86_cpu_to_apicid_init[i];
+		u16 apicid = cpu_to_apicid[cpu];
 
 		if (apicid == BAD_APICID)
 			continue;
@@ -669,8 +647,9 @@ void __init init_cpu_to_node(void)
 			continue;
 		if (!node_online(node))
 			continue;
-		numa_set_node(i, node);
+		numa_set_node(cpu, node);
 	}
 }
+#endif
 
 
--- linux-2.6.sched.orig/arch/x86/mm/srat_64.c
+++ linux-2.6.sched/arch/x86/mm/srat_64.c
@@ -403,7 +403,7 @@ int __init acpi_scan_nodes(unsigned long
 		if (node == NUMA_NO_NODE)
 			continue;
 		if (!node_isset(node, node_possible_map))
-			numa_set_node(i, NUMA_NO_NODE);
+			numa_clear_node(i);
 	}
 	numa_init_array();
 	return 0;
--- linux-2.6.sched.orig/drivers/base/topology.c
+++ linux-2.6.sched/drivers/base/topology.c
@@ -40,6 +40,7 @@ static ssize_t show_##name(struct sys_de
 	return sprintf(buf, "%d\n", topology_##name(cpu));	\
 }
 
+#if defined(topology_thread_siblings) || defined(topology_core_siblings)
 static ssize_t show_cpumap(int type, cpumask_t *mask, char *buf)
 {
 	ptrdiff_t len = PTR_ALIGN(buf + PAGE_SIZE - 1, PAGE_SIZE) - buf;
@@ -54,21 +55,41 @@ static ssize_t show_cpumap(int type, cpu
 	}
 	return n;
 }
+#endif
 
+#ifdef arch_provides_topology_pointers
 #define define_siblings_show_map(name)					\
-static inline ssize_t show_##name(struct sys_device *dev, char *buf)	\
+static ssize_t show_##name(struct sys_device *dev, char *buf)	\
 {									\
 	unsigned int cpu = dev->id;					\
 	return show_cpumap(0, &(topology_##name(cpu)), buf);		\
 }
 
 #define define_siblings_show_list(name)					\
-static inline ssize_t show_##name##_list(struct sys_device *dev, char *buf) \
+static ssize_t show_##name##_list(struct sys_device *dev, char *buf) \
 {									\
 	unsigned int cpu = dev->id;					\
 	return show_cpumap(1, &(topology_##name(cpu)), buf);		\
 }
 
+#else
+#define define_siblings_show_map(name)					\
+static ssize_t show_##name(struct sys_device *dev, char *buf)	\
+{									\
+	unsigned int cpu = dev->id;					\
+	cpumask_t mask = topology_##name(cpu);				\
+	return show_cpumap(0, &mask, buf);				\
+}
+
+#define define_siblings_show_list(name)					\
+static ssize_t show_##name##_list(struct sys_device *dev, char *buf) \
+{									\
+	unsigned int cpu = dev->id;					\
+	cpumask_t mask = topology_##name(cpu);				\
+	return show_cpumap(1, &mask, buf);				\
+}
+#endif
+
 #define define_siblings_show_func(name)		\
 	define_siblings_show_map(name); define_siblings_show_list(name)
 
--- linux-2.6.sched.orig/include/asm-x86/numa_64.h
+++ linux-2.6.sched/include/asm-x86/numa_64.h
@@ -14,11 +14,9 @@ extern int compute_hash_shift(struct boo
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
-extern void numa_add_cpu(int cpu);
 extern void numa_init_array(void);
 extern int numa_off;
 
-extern void numa_set_node(int cpu, int node);
 extern void srat_reserve_add_area(int nodeid);
 extern int hotadd_percent;
 
@@ -31,15 +29,16 @@ extern void setup_node_bootmem(int nodei
 
 #ifdef CONFIG_NUMA
 extern void __init init_cpu_to_node(void);
-
-static inline void clear_node_cpumask(int cpu)
-{
-	clear_bit(cpu, (unsigned long *)&node_to_cpumask_map[cpu_to_node(cpu)]);
-}
-
+extern void __cpuinit numa_set_node(int cpu, int node);
+extern void __cpuinit numa_clear_node(int cpu);
+extern void __cpuinit numa_add_cpu(int cpu);
+extern void __cpuinit numa_remove_cpu(int cpu);
 #else
-#define init_cpu_to_node() do {} while (0)
-#define clear_node_cpumask(cpu) do {} while (0)
+static inline void init_cpu_to_node(void)		{ }
+static inline void numa_set_node(int cpu, int node)	{ }
+static inline void numa_clear_node(int cpu)		{ }
+static inline void numa_add_cpu(int cpu, int node)	{ }
+static inline void numa_remove_cpu(int cpu)		{ }
 #endif
 
 #endif
--- linux-2.6.sched.orig/include/asm-x86/percpu.h
+++ linux-2.6.sched/include/asm-x86/percpu.h
@@ -143,4 +143,50 @@ do {							\
 #define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
 #endif /* !__ASSEMBLY__ */
 #endif /* !CONFIG_X86_64 */
+
+#ifdef CONFIG_SMP
+
+/*
+ * Define the "EARLY_PER_CPU" macros.  These are used for some per_cpu
+ * variables that are initialized and accessed before there are per_cpu
+ * areas allocated.
+ */
+
+#define	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)			\
+	DEFINE_PER_CPU(_type, _name) = _initvalue;			\
+	__typeof__(_type) _name##_early_map[NR_CPUS] __initdata =	\
+				{ [0 ... NR_CPUS-1] = _initvalue };	\
+	__typeof__(_type) *_name##_early_ptr = _name##_early_map
+
+#define EXPORT_EARLY_PER_CPU_SYMBOL(_name)			\
+	EXPORT_PER_CPU_SYMBOL(_name)
+
+#define DECLARE_EARLY_PER_CPU(_type, _name)			\
+	DECLARE_PER_CPU(_type, _name);				\
+	extern __typeof__(_type) *_name##_early_ptr;		\
+	extern __typeof__(_type)  _name##_early_map[]
+
+#define	early_per_cpu_ptr(_name) (_name##_early_ptr)
+#define	early_per_cpu_map(_name, _idx) (_name##_early_map[_idx])
+#define	early_per_cpu(_name, _cpu) 				\
+	(early_per_cpu_ptr(_name) ?				\
+		early_per_cpu_ptr(_name)[_cpu] :		\
+		per_cpu(_name, _cpu))
+
+#else	/* !CONFIG_SMP */
+#define	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)		\
+	DEFINE_PER_CPU(_type, _name) = _initvalue
+
+#define EXPORT_EARLY_PER_CPU_SYMBOL(_name)			\
+	EXPORT_PER_CPU_SYMBOL(_name)
+
+#define DECLARE_EARLY_PER_CPU(_type, _name)			\
+	DECLARE_PER_CPU(_type, _name)
+
+#define	early_per_cpu(_name, _cpu) per_cpu(_name, _cpu)
+#define	early_per_cpu_ptr(_name) NULL
+/* no early_per_cpu_map() */
+
+#endif	/* !CONFIG_SMP */
+
 #endif /* _ASM_X86_PERCPU_H_ */
--- linux-2.6.sched.orig/include/asm-x86/smp.h
+++ linux-2.6.sched/include/asm-x86/smp.h
@@ -29,21 +29,12 @@ extern int smp_num_siblings;
 extern unsigned int num_processors;
 extern cpumask_t cpu_initialized;
 
-#ifdef CONFIG_SMP
-extern u16 x86_cpu_to_apicid_init[];
-extern u16 x86_bios_cpu_apicid_init[];
-extern void *x86_cpu_to_apicid_early_ptr;
-extern void *x86_bios_cpu_apicid_early_ptr;
-#else
-#define x86_cpu_to_apicid_early_ptr NULL
-#define x86_bios_cpu_apicid_early_ptr NULL
-#endif
-
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
 DECLARE_PER_CPU(u16, cpu_llc_id);
-DECLARE_PER_CPU(u16, x86_cpu_to_apicid);
-DECLARE_PER_CPU(u16, x86_bios_cpu_apicid);
+
+DECLARE_EARLY_PER_CPU(u16, x86_cpu_to_apicid);
+DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid);
 
 /* Static state in head.S used to set up a CPU */
 extern struct {
--- linux-2.6.sched.orig/include/asm-x86/topology.h
+++ linux-2.6.sched/include/asm-x86/topology.h
@@ -25,87 +25,67 @@
 #ifndef _ASM_X86_TOPOLOGY_H
 #define _ASM_X86_TOPOLOGY_H
 
+/* Node not present */
+#define NUMA_NO_NODE	(-1)
+
 #ifdef CONFIG_NUMA
 #include <linux/cpumask.h>
 #include <asm/mpspec.h>
 
-/* Mappings between logical cpu number and node number */
 #ifdef CONFIG_X86_32
-extern int cpu_to_node_map[];
-#else
-/* Returns the number of the current Node. */
-#define numa_node_id()		(early_cpu_to_node(raw_smp_processor_id()))
-#endif
-
-DECLARE_PER_CPU(int, x86_cpu_to_node_map);
-
-#ifdef CONFIG_SMP
-extern int x86_cpu_to_node_map_init[];
-extern void *x86_cpu_to_node_map_early_ptr;
-#else
-#define x86_cpu_to_node_map_early_ptr NULL
-#endif
 
+/* Mappings between node number and cpus on that node. */
 extern cpumask_t node_to_cpumask_map[];
 
-#define NUMA_NO_NODE	(-1)
+/* Mappings between logical cpu number and node number */
+extern int cpu_to_node_map[];
 
 /* Returns the number of the node containing CPU 'cpu' */
-#ifdef CONFIG_X86_32
-#define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 static inline int cpu_to_node(int cpu)
 {
 	return cpu_to_node_map[cpu];
 }
+#define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
 #else /* CONFIG_X86_64 */
 
-#ifdef CONFIG_SMP
-static inline int early_cpu_to_node(int cpu)
-{
-	int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
+/* Mappings between node number and cpus on that node. */
+extern cpumask_t node_to_cpumask_map[];
 
-	if (cpu_to_node_map)
-		return cpu_to_node_map[cpu];
-	else if (per_cpu_offset(cpu))
-		return per_cpu(x86_cpu_to_node_map, cpu);
-	else
-		return NUMA_NO_NODE;
-}
-#else
-#define	early_cpu_to_node(cpu)	cpu_to_node(cpu)
-#endif
+/* Mappings between logical cpu number and node number */
+DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
+
+/* Returns the number of the current Node. */
+#define numa_node_id()	(per_cpu(x86_cpu_to_node_map, raw_smp_processor_id()))
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+extern int cpu_to_node(int cpu);
+extern int early_cpu_to_node(int cpu);
+extern cpumask_t *_node_to_cpumask_ptr(int node);
+extern cpumask_t node_to_cpumask(int node);
+
+#else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
+/* Returns the number of the node containing CPU 'cpu' */
 static inline int cpu_to_node(int cpu)
 {
-#ifdef CONFIG_DEBUG_PER_CPU_MAPS
-	if (x86_cpu_to_node_map_early_ptr) {
-		printk("KERN_NOTICE cpu_to_node(%d): usage too early!\n",
-		       (int)cpu);
-		dump_stack();
-		return ((int *)x86_cpu_to_node_map_early_ptr)[cpu];
-	}
-#endif
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
 
-#ifdef	CONFIG_NUMA
-
-/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
-#define node_to_cpumask_ptr(v, node)		\
-		cpumask_t *v = &(node_to_cpumask_map[node])
-
-#define node_to_cpumask_ptr_next(v, node)	\
-			   v = &(node_to_cpumask_map[node])
-#endif
+/* Same function but used if called before per_cpu areas are setup */
+static inline int early_cpu_to_node(int cpu)
+{
+	if (early_per_cpu_ptr(x86_cpu_to_node_map))
+		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
 
-#endif /* CONFIG_X86_64 */
+	return per_cpu(x86_cpu_to_node_map, cpu);
+}
 
-/*
- * Returns the number of the node containing Node 'node'. This
- * architecture is flat, so it is a pretty simple function!
- */
-#define parent_node(node) (node)
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline cpumask_t *_node_to_cpumask_ptr(int node)
+{
+	return &node_to_cpumask_map[node];
+}
 
 /* Returns a bitmask of CPUs on Node 'node'. */
 static inline cpumask_t node_to_cpumask(int node)
@@ -113,14 +93,29 @@ static inline cpumask_t node_to_cpumask(
 	return node_to_cpumask_map[node];
 }
 
+#endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
+#endif /* CONFIG_X86_64 */
+
+/* Replace default node_to_cpumask_ptr with optimized version */
+#define node_to_cpumask_ptr(v, node)		\
+		cpumask_t *v = _node_to_cpumask_ptr(node)
+
+#define node_to_cpumask_ptr_next(v, node)	\
+			   v = _node_to_cpumask_ptr(node)
+
 /* Returns the number of the first CPU on Node 'node'. */
 static inline int node_to_first_cpu(int node)
 {
-	cpumask_t mask = node_to_cpumask(node);
-
-	return first_cpu(mask);
+	node_to_cpumask_ptr(mask, node);
+	return first_cpu(*mask);
 }
 
+/*
+ * Returns the number of the node containing Node 'node'. This
+ * architecture is flat, so it is a pretty simple function!
+ */
+#define parent_node(node) (node)
+
 #define pcibus_to_node(bus) __pcibus_to_node(bus)
 #define pcibus_to_cpumask(bus) __pcibus_to_cpumask(bus)
 
@@ -178,8 +173,31 @@ extern int __node_distance(int, int);
 #define node_distance(a, b) __node_distance(a, b)
 #endif
 
-#else /* CONFIG_NUMA */
+#else /* !CONFIG_NUMA */
+
+#define numa_node_id()		0
+#define	cpu_to_node(cpu)	0
+#define	early_cpu_to_node(cpu)	0
+
+static inline cpumask_t *_node_to_cpumask_ptr(int node)
+{
+	return &cpu_online_map;
+}
+static inline cpumask_t node_to_cpumask(int node)
+{
+	return cpu_online_map;
+}
+static inline int node_to_first_cpu(int node)
+{
+	return first_cpu(cpu_online_map);
+}
+
+/* Replace default node_to_cpumask_ptr with optimized version */
+#define node_to_cpumask_ptr(v, node)		\
+		cpumask_t *v = _node_to_cpumask_ptr(node)
 
+#define node_to_cpumask_ptr_next(v, node)	\
+			   v = _node_to_cpumask_ptr(node)
 #endif
 
 #include <asm-generic/topology.h>
@@ -191,6 +209,9 @@ extern cpumask_t cpu_coregroup_map(int c
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 #define topology_core_siblings(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_thread_siblings(cpu)		(per_cpu(cpu_sibling_map, cpu))
+
+/* indicates that pointers to the topology cpumask_t maps are valid */
+#define arch_provides_topology_pointers		yes
 #endif
 
 struct pci_bus;
@@ -214,4 +235,4 @@ static inline void set_mp_bus_to_node(in
 }
 #endif
 
-#endif
+#endif /* _ASM_X86_TOPOLOGY_H */

-- 


* [PATCH 03/11] x86: restore pda nodenumber field
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
  2008-04-26  0:15 ` [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus Mike Travis
  2008-04-26  0:15 ` [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4 Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map Mike Travis
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: restore-nodenumber --]
[-- Type: text/plain, Size: 2215 bytes --]

  * Restore the nodenumber field in the x86_64 pda.  This field is slightly
    different from the x86_cpu_to_node_map, mainly because it's a static
    indication of which node the cpu is on, while the cpu to node map is a
    dynamic mapping that may get reset if the cpu goes offline.  This also
    simplifies the numa_node_id() macro.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/setup.c    |    4 ++++
 include/asm-x86/pda.h      |    1 +
 include/asm-x86/topology.h |    2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

--- linux-2.6.sched.orig/arch/x86/kernel/setup.c
+++ linux-2.6.sched/arch/x86/kernel/setup.c
@@ -31,6 +31,7 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu
 #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
 #define	X86_64_NUMA	1
 
+/* map cpu index to node index */
 DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
 #endif
@@ -154,6 +155,9 @@ void __cpuinit numa_set_node(int cpu, in
 {
 	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
 
+	if (node != NUMA_NO_NODE)
+		cpu_pda(cpu)->nodenumber = node;
+
 	if (cpu_to_node_map)
 		cpu_to_node_map[cpu] = node;
 
--- linux-2.6.sched.orig/include/asm-x86/pda.h
+++ linux-2.6.sched/include/asm-x86/pda.h
@@ -20,6 +20,7 @@ struct x8664_pda {
 					/* gcc-ABI: this canary MUST be at
 					   offset 40!!! */
 	char *irqstackptr;
+	int nodenumber;			/* number of current node */
 	unsigned int __softirq_pending;
 	unsigned int __nmi_count;	/* number of NMI on this CPUs */
 	short mmu_state;
--- linux-2.6.sched.orig/include/asm-x86/topology.h
+++ linux-2.6.sched/include/asm-x86/topology.h
@@ -56,7 +56,7 @@ extern cpumask_t node_to_cpumask_map[];
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
 
 /* Returns the number of the current Node. */
-#define numa_node_id()	(per_cpu(x86_cpu_to_node_map, raw_smp_processor_id()))
+#define numa_node_id()		read_pda(nodenumber)
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
 extern int cpu_to_node(int cpu);

-- 


* [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (2 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 03/11] x86: restore pda nodenumber field Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c Mike Travis
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: node_to_cpumask --]
[-- Type: text/plain, Size: 8088 bytes --]

  * Consolidate node_to_cpumask operations and remove the 256k
    byte node_to_cpumask_map.  This is done by allocating the
    node_to_cpumask_map array after the number of possible nodes
    (nr_node_ids) is known.

  * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
    been increased.  It now shows faults when calling node_to_cpumask()
    and node_to_cpumask_ptr().

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/setup.c    |  132 +++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/mm/numa_64.c      |    6 --
 include/asm-x86/topology.h |   25 +++++---
 3 files changed, 144 insertions(+), 19 deletions(-)

--- linux-2.6.sched.orig/arch/x86/kernel/setup.c
+++ linux-2.6.sched/arch/x86/kernel/setup.c
@@ -34,6 +34,16 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu
 /* map cpu index to node index */
 DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
+
+/* which logical CPUs are on which nodes */
+cpumask_t *node_to_cpumask_map;
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+/* setup node_to_cpumask_map */
+static void __init setup_node_to_cpumask_map(void);
+
+#else
+static inline void setup_node_to_cpumask_map(void) { }
 #endif
 
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
@@ -139,11 +149,15 @@ void __init setup_per_cpu_areas(void)
 	}
 
 	nr_cpu_ids = highest_cpu + 1;
-	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d\n", NR_CPUS, nr_cpu_ids);
+	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
+		NR_CPUS, nr_cpu_ids, nr_node_ids);
 
 	/* Setup percpu data maps */
 	setup_per_cpu_maps();
 
+	/* Setup node to cpumask map */
+	setup_node_to_cpumask_map();
+
 	/* Setup cpumask_of_cpu map */
 	setup_cpumask_of_cpu();
 }
@@ -151,6 +165,35 @@ void __init setup_per_cpu_areas(void)
 #endif
 
 #ifdef X86_64_NUMA
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: node_to_cpumask() is not valid until after this is done.
+ */
+static void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node, num = 0;
+	cpumask_t *map;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES) {
+		for_each_node_mask(node, node_possible_map)
+			num = node;
+		nr_node_ids = num + 1;
+	}
+
+	/* allocate the map */
+	map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
+
+	Dprintk(KERN_DEBUG "Node to cpumask map at %p for %d nodes\n",
+		map, nr_node_ids);
+
+	/* node_to_cpumask() will now work */
+	node_to_cpumask_map = map;
+}
+
 void __cpuinit numa_set_node(int cpu, int node)
 {
 	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
@@ -173,6 +216,8 @@ void __cpuinit numa_clear_node(int cpu)
 	numa_set_node(cpu, NUMA_NO_NODE);
 }
 
+#ifndef CONFIG_DEBUG_PER_CPU_MAPS
+
 void __cpuinit numa_add_cpu(int cpu)
 {
 	cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
@@ -182,9 +227,44 @@ void __cpuinit numa_remove_cpu(int cpu)
 {
 	cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
 }
-#endif /* CONFIG_NUMA */
 
-#if defined(CONFIG_DEBUG_PER_CPU_MAPS) && defined(CONFIG_X86_64)
+#else /* CONFIG_DEBUG_PER_CPU_MAPS */
+
+/*
+ * --------- debug versions of the numa functions ---------
+ */
+static void __cpuinit numa_set_cpumask(int cpu, int enable)
+{
+	int node = cpu_to_node(cpu);
+	cpumask_t *mask;
+	char buf[64];
+
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_ERR "node_to_cpumask_map NULL\n");
+		dump_stack();
+		return;
+	}
+
+	mask = &node_to_cpumask_map[node];
+	if (enable)
+		cpu_set(cpu, *mask);
+	else
+		cpu_clear(cpu, *mask);
+
+	cpulist_scnprintf(buf, sizeof(buf), *mask);
+	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
+		enable? "numa_add_cpu":"numa_remove_cpu", cpu, node, buf);
+ }
+
+void __cpuinit numa_add_cpu(int cpu)
+{
+	numa_set_cpumask(cpu, 1);
+}
+
+void __cpuinit numa_remove_cpu(int cpu)
+{
+	numa_set_cpumask(cpu, 0);
+}
 
 int cpu_to_node(int cpu)
 {
@@ -198,6 +278,10 @@ int cpu_to_node(int cpu)
 }
 EXPORT_SYMBOL(cpu_to_node);
 
+/*
+ * Same function as cpu_to_node() but used if called before the
+ * per_cpu areas are setup.
+ */
 int early_cpu_to_node(int cpu)
 {
 	if (early_per_cpu_ptr(x86_cpu_to_node_map))
@@ -206,9 +290,47 @@ int early_cpu_to_node(int cpu)
 	if (!per_cpu_offset(cpu)) {
 		printk(KERN_WARNING
 			"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
-			dump_stack();
+		dump_stack();
 		return NUMA_NO_NODE;
 	}
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
-#endif
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+cpumask_t *_node_to_cpumask_ptr(int node)
+{
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_WARNING
+			"_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return &cpu_online_map;
+	}
+	return &node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(_node_to_cpumask_ptr);
+
+/*
+ * Returns a bitmask of CPUs on Node 'node'.
+ */
+cpumask_t node_to_cpumask(int node)
+{
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_WARNING
+			"node_to_cpumask(%d): no node_to_cpumask_map!\n", node);
+		dump_stack();
+		return cpu_online_map;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(node_to_cpumask);
+
+/*
+ * --------- end of debug versions of the numa functions ---------
+ */
+
+#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+
+#endif /* X86_64_NUMA */
--- linux-2.6.sched.orig/arch/x86/mm/numa_64.c
+++ linux-2.6.sched/arch/x86/mm/numa_64.c
@@ -35,9 +35,6 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpu
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
-cpumask_t node_to_cpumask_map[MAX_NUMNODES] __read_mostly;
-EXPORT_SYMBOL(node_to_cpumask_map);
-
 int numa_off __initdata;
 unsigned long __initdata nodemap_addr;
 unsigned long __initdata nodemap_size;
@@ -561,9 +558,6 @@ void __init numa_initmem_init(unsigned l
 	node_set(0, node_possible_map);
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
-	/* cpumask_of_cpu() may not be available during early startup */
-	memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
-	cpu_set(0, node_to_cpumask_map[0]);
 	e820_register_active_regions(0, start_pfn, end_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
--- linux-2.6.sched.orig/include/asm-x86/topology.h
+++ linux-2.6.sched/include/asm-x86/topology.h
@@ -47,10 +47,16 @@ static inline int cpu_to_node(int cpu)
 }
 #define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
+/* Returns a bitmask of CPUs on Node 'node'. */
+static inline cpumask_t node_to_cpumask(int node)
+{
+	return node_to_cpumask_map[node];
+}
+
 #else /* CONFIG_X86_64 */
 
 /* Mappings between node number and cpus on that node. */
-extern cpumask_t node_to_cpumask_map[];
+extern cpumask_t *node_to_cpumask_map;
 
 /* Mappings between logical cpu number and node number */
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
@@ -94,7 +100,6 @@ static inline cpumask_t node_to_cpumask(
 }
 
 #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
-#endif /* CONFIG_X86_64 */
 
 /* Replace default node_to_cpumask_ptr with optimized version */
 #define node_to_cpumask_ptr(v, node)		\
@@ -103,12 +108,7 @@ static inline cpumask_t node_to_cpumask(
 #define node_to_cpumask_ptr_next(v, node)	\
 			   v = _node_to_cpumask_ptr(node)
 
-/* Returns the number of the first CPU on Node 'node'. */
-static inline int node_to_first_cpu(int node)
-{
-	node_to_cpumask_ptr(mask, node);
-	return first_cpu(*mask);
-}
+#endif /* CONFIG_X86_64 */
 
 /*
  * Returns the number of the node containing Node 'node'. This
@@ -195,6 +195,15 @@ static inline int node_to_first_cpu(int 
 
 #include <asm-generic/topology.h>
 
+#ifdef CONFIG_NUMA
+/* Returns the number of the first CPU on Node 'node'. */
+static inline int node_to_first_cpu(int node)
+{
+	node_to_cpumask_ptr(mask, node);
+	return first_cpu(*mask);
+}
+#endif
+
 extern cpumask_t cpu_coregroup_map(int cpu);
 
 #ifdef ENABLE_TOPO_DEFINES

-- 


* [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (3 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2 Mike Travis
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: mod_kernel_sched --]
[-- Type: text/plain, Size: 2779 bytes --]

  * Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
    where appropriate.  This saves some allocated space as well as many
    wasted cycles going through node entries that are non-existent.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Signed-off-by: Mike Travis <travis@sgi.com>
---
 kernel/sched.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- linux-2.6.sched.orig/kernel/sched.c
+++ linux-2.6.sched/kernel/sched.c
@@ -7056,9 +7056,9 @@ static int find_next_best_node(int node,
 
 	min_val = INT_MAX;
 
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		/* Start at @node */
-		n = (node + i) % MAX_NUMNODES;
+		n = (node + i) % nr_node_ids;
 
 		if (!nr_cpus_node(n))
 			continue;
@@ -7252,7 +7252,7 @@ static void free_sched_groups(const cpum
 		if (!sched_group_nodes)
 			continue;
 
-		for (i = 0; i < MAX_NUMNODES; i++) {
+		for (i = 0; i < nr_node_ids; i++) {
 			struct sched_group *oldsg, *sg = sched_group_nodes[i];
 
 			*nodemask = node_to_cpumask(i);
@@ -7440,7 +7440,7 @@ static int __build_sched_domains(const c
 	/*
 	 * Allocate the per-node list of sched groups
 	 */
-	sched_group_nodes = kcalloc(MAX_NUMNODES, sizeof(struct sched_group *),
+	sched_group_nodes = kcalloc(nr_node_ids, sizeof(struct sched_group *),
 				    GFP_KERNEL);
 	if (!sched_group_nodes) {
 		printk(KERN_WARNING "Can not alloc sched group node list\n");
@@ -7584,7 +7584,7 @@ static int __build_sched_domains(const c
 #endif
 
 	/* Set up physical groups */
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		SCHED_CPUMASK_VAR(nodemask, allmasks);
 		SCHED_CPUMASK_VAR(send_covered, allmasks);
 
@@ -7608,7 +7608,7 @@ static int __build_sched_domains(const c
 					send_covered, tmpmask);
 	}
 
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		/* Set up node groups */
 		struct sched_group *sg, *prev;
 		SCHED_CPUMASK_VAR(nodemask, allmasks);
@@ -7647,9 +7647,9 @@ static int __build_sched_domains(const c
 		cpus_or(*covered, *covered, *nodemask);
 		prev = sg;
 
-		for (j = 0; j < MAX_NUMNODES; j++) {
+		for (j = 0; j < nr_node_ids; j++) {
 			SCHED_CPUMASK_VAR(notcovered, allmasks);
-			int n = (i + j) % MAX_NUMNODES;
+			int n = (i + j) % nr_node_ids;
 			node_to_cpumask_ptr(pnodemask, n);
 
 			cpus_complement(*notcovered, *covered);
@@ -7702,7 +7702,7 @@ static int __build_sched_domains(const c
 	}
 
 #ifdef CONFIG_NUMA
-	for (i = 0; i < MAX_NUMNODES; i++)
+	for (i = 0; i < nr_node_ids; i++)
 		init_numa_sched_groups_power(sched_group_nodes[i]);
 
 	if (sd_allnodes) {

-- 


* [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (4 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 07/11] x86: remove static boot_cpu_pda array Mike Travis
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	Adrian Bunk

[-- Attachment #1: mk-static --]
[-- Type: text/plain, Size: 1195 bytes --]

This patch makes the following needlessly global code static:
- attr_online_map
- attr_possible_map
- attr_present_map

- cpu_state_attr [v2]

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mike Travis <travis@sgi.com>

---
v2: changed cpu_state_attr to static

8ab09fe4313384faa5e3577d99845cedccb245bc
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 6fe4174..da6f4ae 100644
---
 drivers/base/cpu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.sched.orig/drivers/base/cpu.c
+++ linux-2.6.sched/drivers/base/cpu.c
@@ -119,7 +119,7 @@ static ssize_t print_cpus_##type(struct 
 {									\
 	return print_cpus_map(buf, &cpu_##type##_map);			\
 }									\
-struct sysdev_class_attribute attr_##type##_map = 			\
+static struct sysdev_class_attribute attr_##type##_map = 		\
 	_SYSDEV_CLASS_ATTR(type, 0444, print_cpus_##type, NULL)
 
 print_cpus_func(online);
@@ -127,7 +127,7 @@ print_cpus_func(possible);
 print_cpus_func(present);
 print_cpus_func(system);
 
-struct sysdev_class_attribute *cpu_state_attr[] = {
+static struct sysdev_class_attribute *cpu_state_attr[] = {
 	&attr_online_map,
 	&attr_possible_map,
 	&attr_present_map,

-- 


* [PATCH 07/11] x86: remove static boot_cpu_pda array
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (5 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2 Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 08/11] x86: Add performance variants of cpumask operators Mike Travis
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: rm-boot_pdas --]
[-- Type: text/plain, Size: 9459 bytes --]

  * Remove the boot_cpu_pda array and pointer table from the data section.
    Allocate the pointer table during init.  If CONFIG_HOTPLUG_CPU is set
    then also allocate the cpu_pda array during init.  In either case,
    allocate the cpu_pda from node local memory when the cpu is started
    in do_boot_cpu().

    This removes 512k + 32k bytes from the data section.


For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/head64.c  |   26 ++++++++++++--
 arch/x86/kernel/setup.c   |   83 +++++++++++++++++++++++++++++++++++++---------
 arch/x86/kernel/setup64.c |    8 ++--
 arch/x86/kernel/smpboot.c |   54 ++++++++++++++++++++++-------
 include/asm-x86/pda.h     |    6 +--
 5 files changed, 139 insertions(+), 38 deletions(-)

--- linux-2.6.sched.orig/arch/x86/kernel/head64.c
+++ linux-2.6.sched/arch/x86/kernel/head64.c
@@ -25,6 +25,24 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
+/* boot cpu pda */
+static struct x8664_pda _boot_cpu_pda __read_mostly;
+
+#ifdef CONFIG_SMP
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+/*
+ * We install an empty cpu_pda pointer table to trap references before
+ * the actual cpu_pda pointer table is created in setup_cpu_pda_map().
+ */
+static struct x8664_pda *__cpu_pda[NR_CPUS] __initdata;
+#else
+static struct x8664_pda *__cpu_pda[1] __read_mostly;
+#endif
+
+#else /* !CONFIG_SMP (NR_CPUS will be 1) */
+static struct x8664_pda *__cpu_pda[NR_CPUS] __read_mostly;
+#endif
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
@@ -156,10 +174,12 @@ void __init x86_64_start_kernel(char * r
 
 	early_printk("Kernel alive\n");
 
- 	for (i = 0; i < NR_CPUS; i++)
- 		cpu_pda(i) = &boot_cpu_pda[i];
-
+	_cpu_pda = __cpu_pda;
+	cpu_pda(0) = &_boot_cpu_pda;
 	pda_init(0);
+
+	early_printk("Kernel really alive\n");
+
 	copy_bootdata(__va(real_mode_data));
 
 	reserve_early(__pa_symbol(&_text), __pa_symbol(&_end), "TEXT DATA BSS");
--- linux-2.6.sched.orig/arch/x86/kernel/setup.c
+++ linux-2.6.sched/arch/x86/kernel/setup.c
@@ -100,6 +100,60 @@ static inline void setup_cpumask_of_cpu(
  */
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(__per_cpu_offset);
+static inline void setup_cpu_pda_map(void) { }
+
+#elif !defined(CONFIG_SMP)
+static inline void setup_cpu_pda_map(void) { }
+
+#else /* CONFIG_SMP && CONFIG_X86_64 */
+
+/*
+ * Allocate cpu_pda pointer table via alloc_bootmem
+ * and if CONFIG_HOTPLUG_CPU also allocate a static cpu_pda array.
+ *
+ * Note that the boot cpu's pda is left in place.
+ * Note also that cpu_pda(0) may be the only valid reference.
+ */
+static void __init setup_cpu_pda_map(void)
+{
+	char *pda;
+	struct x8664_pda **new_cpu_pda;
+	unsigned long size;
+	int cpu;
+
+#ifdef CONFIG_HOTPLUG_CPU
+	size = roundup(sizeof(struct x8664_pda), cache_line_size());
+#else
+	/* no need to preallocate the cpu_pda array */
+	size = 0;
+#endif
+
+	/* allocate cpu_pda array and pointer table */
+	{
+		unsigned long asize = size * (nr_cpu_ids - 1);
+		unsigned long tsize = nr_cpu_ids * sizeof(void *);
+
+		pda = alloc_bootmem(asize + tsize);
+		new_cpu_pda = (struct x8664_pda **)(pda + asize);
+	}
+
+#ifdef CONFIG_HOTPLUG_CPU
+	/* initialize pointer table to static pda's */
+	for_each_possible_cpu(cpu) {
+		if (cpu == 0) {
+			/* leave boot cpu pda in place */
+			new_cpu_pda[0] = cpu_pda(0);
+			continue;
+		}
+		new_cpu_pda[cpu] = (struct x8664_pda *)pda;
+		new_cpu_pda[cpu]->in_bootmem = 1;
+		pda += size;
+	}
+#endif
+
+	/* point to new pointer table */
+	_cpu_pda = new_cpu_pda;
+}
 #endif
 
 /*
@@ -109,46 +163,43 @@ EXPORT_SYMBOL(__per_cpu_offset);
  */
 void __init setup_per_cpu_areas(void)
 {
-	int i, highest_cpu = 0;
-	unsigned long size;
+	ssize_t size = PERCPU_ENOUGH_ROOM;
+	char *ptr;
+	int cpu;
 
 #ifdef CONFIG_HOTPLUG_CPU
 	prefill_possible_map();
+#else
+	nr_cpu_ids = num_processors;
 #endif
 
+	/* Setup cpu_pda map */
+	setup_cpu_pda_map();
+
 	/* Copy section for each CPU (we discard the original) */
 	size = PERCPU_ENOUGH_ROOM;
 	printk(KERN_INFO "PERCPU: Allocating %lu bytes of per cpu data\n",
 			  size);
 
-	for_each_possible_cpu(i) {
-		char *ptr;
+	for_each_possible_cpu(cpu) {
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 		ptr = alloc_bootmem_pages(size);
 #else
-		int node = early_cpu_to_node(i);
+		int node = early_cpu_to_node(cpu);
 		if (!node_online(node) || !NODE_DATA(node)) {
 			ptr = alloc_bootmem_pages(size);
 			printk(KERN_INFO
 			       "cpu %d has no node %d or node-local memory\n",
-				i, node);
+				cpu, node);
 		}
 		else
 			ptr = alloc_bootmem_pages_node(NODE_DATA(node), size);
 #endif
-		if (!ptr)
-			panic("Cannot allocate cpu data for CPU %d\n", i);
-#ifdef CONFIG_X86_64
-		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
-#else
-		__per_cpu_offset[i] = ptr - __per_cpu_start;
-#endif
+		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 
-		highest_cpu = i;
 	}
 
-	nr_cpu_ids = highest_cpu + 1;
 	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
 		NR_CPUS, nr_cpu_ids, nr_node_ids);
 
@@ -198,7 +249,7 @@ void __cpuinit numa_set_node(int cpu, in
 {
 	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
 
-	if (node != NUMA_NO_NODE)
+	if (cpu_pda(cpu) && node != NUMA_NO_NODE)
 		cpu_pda(cpu)->nodenumber = node;
 
 	if (cpu_to_node_map)
--- linux-2.6.sched.orig/arch/x86/kernel/setup64.c
+++ linux-2.6.sched/arch/x86/kernel/setup64.c
@@ -12,6 +12,7 @@
 #include <linux/bitops.h>
 #include <linux/module.h>
 #include <linux/kgdb.h>
+#include <linux/topology.h>
 #include <asm/pda.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -34,9 +35,8 @@ struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
-struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
+struct x8664_pda **_cpu_pda __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -114,8 +114,10 @@ void pda_init(int cpu)
 			__get_free_pages(GFP_ATOMIC, IRQSTACK_ORDER);
 		if (!pda->irqstackptr)
 			panic("cannot allocate irqstack for cpu %d", cpu); 
-	}
 
+		if (pda->nodenumber == 0 && cpu_to_node(cpu) != NUMA_NO_NODE)
+			pda->nodenumber = cpu_to_node(cpu);
+	}
 
 	pda->irqstackptr += IRQSTACKSIZE-64;
 } 
--- linux-2.6.sched.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.sched/arch/x86/kernel/smpboot.c
@@ -809,6 +809,39 @@ static void __cpuinit do_fork_idle(struc
 	complete(&c_idle->done);
 }
 
+/*
+ * Allocate node local memory for the AP pda.
+ *
+ * Must be called after the _cpu_pda pointer table is initialized.
+ */
+static int __cpuinit get_local_pda(int cpu)
+{
+	struct x8664_pda *newpda;
+	unsigned long size = sizeof(struct x8664_pda);
+	int node = cpu_to_node(cpu);
+
+	if (cpu_pda(cpu) && !cpu_pda(cpu)->in_bootmem)
+		return 0;
+
+	newpda = kmalloc_node(size, GFP_ATOMIC, node);
+	if (!newpda) {
+		printk(KERN_ERR "Could not allocate node local PDA "
+			"for CPU %d on node %d\n", cpu, node);
+
+		if (cpu_pda(cpu))
+			return 0;	/* have a usable pda */
+		else
+			return -1;
+	}
+
+	if (cpu_pda(cpu))
+		memcpy(newpda, cpu_pda(cpu), size);
+
+	cpu_pda(cpu) = newpda;
+	cpu_pda(cpu)->in_bootmem = 0;
+	return 0;
+}
+
 static int __cpuinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -834,19 +867,10 @@ static int __cpuinit do_boot_cpu(int api
 	}
 
 	/* Allocate node local memory for AP pdas */
-	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
-		pda = cpu_pda(cpu);
-		newpda = kmalloc_node(sizeof(struct x8664_pda), GFP_ATOMIC,
-				      node);
-		if (newpda) {
-			memcpy(newpda, pda, sizeof(struct x8664_pda));
-			cpu_pda(cpu) = newpda;
-		} else
-			printk(KERN_ERR
-		"Could not allocate node local PDA for CPU %d on node %d\n",
-				cpu, node);
+	if (cpu > 0 ) {
+		boot_error = get_local_pda(cpu);
+		if (boot_error)
+			goto restore_state; /* can't get memory, can't start cpu */
 	}
 #endif
 
@@ -965,6 +989,8 @@ do_rest:
 		}
 	}
 
+restore_state:
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		unmap_cpu_to_logical_apicid(cpu);
@@ -1338,6 +1364,8 @@ __init void prefill_possible_map(void)
 
 	for (i = 0; i < possible; i++)
 		cpu_set(i, cpu_possible_map);
+
+	nr_cpu_ids = possible;
 }
 
 static void __ref remove_cpu_from_maps(int cpu)
--- linux-2.6.sched.orig/include/asm-x86/pda.h
+++ linux-2.6.sched/include/asm-x86/pda.h
@@ -20,7 +20,8 @@ struct x8664_pda {
 					/* gcc-ABI: this canary MUST be at
 					   offset 40!!! */
 	char *irqstackptr;
-	int nodenumber;			/* number of current node */
+	short nodenumber;		/* number of current node (32k max) */
+	short in_bootmem;		/* pda lives in bootmem */
 	unsigned int __softirq_pending;
 	unsigned int __nmi_count;	/* number of NMI on this CPUs */
 	short mmu_state;
@@ -36,8 +37,7 @@ struct x8664_pda {
 	unsigned irq_spurious_count;
 } ____cacheline_aligned_in_smp;
 
-extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
+extern struct x8664_pda **_cpu_pda;
 extern void pda_init(int);
 
 #define cpu_pda(i) (_cpu_pda[i])

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 08/11] x86: Add performance variants of cpumask operators
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (6 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 07/11] x86: remove static boot_cpu_pda array Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr Mike Travis
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	Paul Jackson, Christoph Lameter

[-- Attachment #1: add-nr_cpu_ids --]
[-- Type: text/plain, Size: 8667 bytes --]

  * Increase performance for systems with a large NR_CPUS count by limiting
    the range of the cpumask operators that loop over the bits in a cpumask_t
    variable.  This eliminates a large number of wasted cpu cycles.

  * Add performance variants of the cpumask operators:

    int cpus_weight_nr(mask)	     Same using nr_cpu_ids instead of NR_CPUS
    int first_cpu_nr(mask)	     Number lowest set bit, or nr_cpu_ids
    int next_cpu_nr(cpu, mask)	     Next cpu past 'cpu', or nr_cpu_ids
    for_each_cpu_mask_nr(cpu, mask)  for-loop cpu over mask using nr_cpu_ids

  * Modify the following to use the performance variants:

    #define num_online_cpus()	cpus_weight_nr(cpu_online_map)
    #define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
    #define num_present_cpus()	cpus_weight_nr(cpu_present_map)

    #define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
    #define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), ...)
    #define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), ...)

  * Comment added to include/linux/cpumask.h:

    Note: The alternate operations with the suffix "_nr" are used
	  to limit the range of the loop to nr_cpu_ids instead of
	  NR_CPUS when NR_CPUS > 64 for performance reasons.
	  If NR_CPUS is <= 64 then most assembler bitmask
	  operators execute faster with a constant range, so
	  the operator will continue to use NR_CPUS.

	  Another consideration is that nr_cpu_ids is initialized
	  to NR_CPUS and isn't lowered until the possible cpus are
	  discovered (including any disabled cpus).  So early uses
	  will span the entire range of NR_CPUS.

    (The net effect is that for systems with 64 or fewer CPUs there are no
     functional changes.)
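
    (A minimal usage sketch -- the function below is hypothetical, not part
     of the patch; the real conversions follow in patches 09 and 10.  The
     loop only scans bits below nr_cpu_ids instead of the full NR_CPUS-bit
     mask, which matters when NR_CPUS is 4096 but only a few cpus exist.)

	static void show_online_cpus(void)
	{
		int cpu, n = 0;

		for_each_cpu_mask_nr(cpu, cpu_online_map)
			n++;

		printk(KERN_DEBUG "%d cpus online, nr_cpu_ids = %d\n",
			n, nr_cpu_ids);
	}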

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 include/linux/cpumask.h |   92 ++++++++++++++++++++++++++++++++----------------
 lib/cpumask.c           |    9 ++++
 2 files changed, 71 insertions(+), 30 deletions(-)

--- linux-2.6.sched.orig/include/linux/cpumask.h
+++ linux-2.6.sched/include/linux/cpumask.h
@@ -15,6 +15,20 @@
  * For details of cpu_remap(), see bitmap_bitremap in lib/bitmap.c
  * For details of cpus_remap(), see bitmap_remap in lib/bitmap.c.
  *
+ * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
+ * Note: The alternate operations with the suffix "_nr" are used
+ *       to limit the range of the loop to nr_cpu_ids instead of
+ *       NR_CPUS when NR_CPUS > 64 for performance reasons.
+ *       If NR_CPUS is <= 64 then most assembler bitmask
+ *       operators execute faster with a constant range, so
+ *       the operator will continue to use NR_CPUS.
+ *
+ *       Another consideration is that nr_cpu_ids is initialized
+ *       to NR_CPUS and isn't lowered until the possible cpus are
+ *       discovered (including any disabled cpus).  So early uses
+ *       will span the entire range of NR_CPUS.
+ * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
+ *
  * The available cpumask operations are:
  *
  * void cpu_set(cpu, mask)		turn on bit 'cpu' in mask
@@ -36,12 +50,14 @@
  * int cpus_empty(mask)			Is mask empty (no bits sets)?
  * int cpus_full(mask)			Is mask full (all bits sets)?
  * int cpus_weight(mask)		Hamming weigh - number of set bits
+ * int cpus_weight_nr(mask)		Same using nr_cpu_ids instead of NR_CPUS
  *
  * void cpus_shift_right(dst, src, n)	Shift right
  * void cpus_shift_left(dst, src, n)	Shift left
  *
  * int first_cpu(mask)			Number lowest set bit, or NR_CPUS
  * int next_cpu(cpu, mask)		Next cpu past 'cpu', or NR_CPUS
+ * int next_cpu_nr(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
  *
  * cpumask_t cpumask_of_cpu(cpu)	Return cpumask with bit 'cpu' set
  * CPU_MASK_ALL				Initializer - all bits set
@@ -55,7 +71,8 @@
  * int cpu_remap(oldbit, old, new)	newbit = map(old, new)(oldbit)
  * int cpus_remap(dst, src, old, new)	*dst = map(old, new)(src)
  *
- * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask
+ * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask using NR_CPUS
+ * for_each_cpu_mask_nr(cpu, mask)	for-loop cpu over mask using nr_cpu_ids
  *
  * int num_online_cpus()		Number of online CPUs
  * int num_possible_cpus()		Number of all possible CPUs
@@ -212,15 +229,6 @@ static inline void __cpus_shift_left(cpu
 	bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
 }
 
-#ifdef CONFIG_SMP
-int __first_cpu(const cpumask_t *srcp);
-#define first_cpu(src) __first_cpu(&(src))
-int __next_cpu(int n, const cpumask_t *srcp);
-#define next_cpu(n, src) __next_cpu((n), &(src))
-#else
-#define first_cpu(src)		({ (void)(src); 0; })
-#define next_cpu(n, src)	({ (void)(src); 1; })
-#endif
 
 #ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
 extern cpumask_t *cpumask_of_cpu_map;
@@ -330,15 +338,48 @@ static inline void __cpus_remap(cpumask_
 	bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
 }
 
-#if NR_CPUS > 1
+#if NR_CPUS == 1
+
+#define nr_cpu_ids		1
+#define first_cpu(src)		({ (void)(src); 0; })
+#define next_cpu(n, src)	({ (void)(src); 1; })
+#define any_online_cpu(mask)	0
+#define for_each_cpu_mask(cpu, mask)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+
+#else /* NR_CPUS > 1 */
+
+extern int nr_cpu_ids;
+int __first_cpu(const cpumask_t *srcp);
+int __next_cpu(int n, const cpumask_t *srcp);
+int __any_online_cpu(const cpumask_t *mask);
+
+#define first_cpu(src)		__first_cpu(&(src))
+#define next_cpu(n, src)	__next_cpu((n), &(src))
+#define any_online_cpu(mask) __any_online_cpu(&(mask))
 #define for_each_cpu_mask(cpu, mask)		\
 	for ((cpu) = first_cpu(mask);		\
 		(cpu) < NR_CPUS;		\
 		(cpu) = next_cpu((cpu), (mask)))
-#else /* NR_CPUS == 1 */
-#define for_each_cpu_mask(cpu, mask)		\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
-#endif /* NR_CPUS */
+#endif
+
+#if NR_CPUS <= 64
+
+#define next_cpu_nr(n, src)		next_cpu(n, src)
+#define cpus_weight_nr(cpumask)		cpus_weight(cpumask)
+#define for_each_cpu_mask_nr(cpu, mask)	for_each_cpu_mask(cpu, mask)
+
+#else /* NR_CPUS > 64 */
+
+int __next_cpu_nr(int n, const cpumask_t *srcp);
+#define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
+#define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
+#define for_each_cpu_mask_nr(cpu, mask)		\
+	for ((cpu) = first_cpu(mask);		\
+		(cpu) < nr_cpu_ids;		\
+		(cpu) = next_cpu_nr((cpu), (mask)))
+
+#endif /* NR_CPUS > 64 */
 
 /*
  * The following particular system cpumasks and operations manage
@@ -402,9 +443,9 @@ extern cpumask_t cpu_present_map;
 extern cpumask_t cpu_system_map;
 
 #if NR_CPUS > 1
-#define num_online_cpus()	cpus_weight(cpu_online_map)
-#define num_possible_cpus()	cpus_weight(cpu_possible_map)
-#define num_present_cpus()	cpus_weight(cpu_present_map)
+#define num_online_cpus()	cpus_weight_nr(cpu_online_map)
+#define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
+#define num_present_cpus()	cpus_weight_nr(cpu_present_map)
 #define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
 #define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
 #define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
@@ -423,17 +464,8 @@ extern int cpus_match_system(cpumask_t m
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
-#ifdef CONFIG_SMP
-extern int nr_cpu_ids;
-#define any_online_cpu(mask) __any_online_cpu(&(mask))
-int __any_online_cpu(const cpumask_t *mask);
-#else
-#define nr_cpu_ids			1
-#define any_online_cpu(mask)		0
-#endif
-
-#define for_each_possible_cpu(cpu)  for_each_cpu_mask((cpu), cpu_possible_map)
-#define for_each_online_cpu(cpu)  for_each_cpu_mask((cpu), cpu_online_map)
-#define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map)
+#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), cpu_possible_map)
+#define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), cpu_online_map)
+#define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), cpu_present_map)
 
 #endif /* __LINUX_CPUMASK_H */
--- linux-2.6.sched.orig/lib/cpumask.c
+++ linux-2.6.sched/lib/cpumask.c
@@ -15,6 +15,15 @@ int __next_cpu(int n, const cpumask_t *s
 }
 EXPORT_SYMBOL(__next_cpu);
 
+#if NR_CPUS > 64
+int __next_cpu_nr(int n, const cpumask_t *srcp)
+{
+	return min_t(int, nr_cpu_ids,
+				find_next_bit(srcp->bits, nr_cpu_ids, n+1));
+}
+EXPORT_SYMBOL(__next_cpu_nr);
+#endif
+
 int __any_online_cpu(const cpumask_t *mask)
 {
 	int cpu;

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (7 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 08/11] x86: Add performance variants of cpumask operators Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 10/11] x86: Use performance variant next_cpu_nr Mike Travis
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: use-for_each_cpu_mask_nr --]
[-- Type: text/plain, Size: 31353 bytes --]

  * Change references from for_each_cpu_mask to for_each_cpu_mask_nr
    where appropriate.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c       |    6 +--
 arch/x86/kernel/cpu/cpufreq/p4-clockmod.c        |    6 +--
 arch/x86/kernel/cpu/cpufreq/powernow-k8.c        |    8 ++--
 arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c |   10 ++---
 arch/x86/kernel/cpu/cpufreq/speedstep-ich.c      |    4 +-
 arch/x86/kernel/cpu/intel_cacheinfo.c            |    2 -
 arch/x86/kernel/cpu/mcheck/mce_amd_64.c          |    4 +-
 arch/x86/kernel/io_apic_64.c                     |    8 ++--
 arch/x86/kernel/smpboot.c                        |    8 ++--
 arch/x86/xen/smp.c                               |    4 +-
 drivers/acpi/processor_throttling.c              |    6 +--
 drivers/cpufreq/cpufreq.c                        |   14 +++----
 drivers/cpufreq/cpufreq_conservative.c           |    2 -
 drivers/cpufreq/cpufreq_ondemand.c               |    4 +-
 include/asm-x86/ipi.h                            |    2 -
 kernel/cpu.c                                     |    2 -
 kernel/rcuclassic.c                              |    2 -
 kernel/rcupreempt.c                              |   10 ++---
 kernel/sched.c                                   |   44 +++++++++++------------
 kernel/sched_fair.c                              |    2 -
 kernel/sched_rt.c                                |    6 +--
 kernel/taskstats.c                               |    4 +-
 kernel/trace/trace.c                             |    8 ++--
 kernel/workqueue.c                               |    6 +--
 mm/allocpercpu.c                                 |    4 +-
 mm/vmstat.c                                      |    2 -
 net/core/dev.c                                   |    4 +-
 net/iucv/iucv.c                                  |    2 -
 28 files changed, 92 insertions(+), 92 deletions(-)

--- linux-2.6.sched.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -202,7 +202,7 @@ static void drv_write(struct drv_cmd *cm
 	cpumask_t saved_mask = current->cpus_allowed;
 	unsigned int i;
 
-	for_each_cpu_mask(i, cmd->mask) {
+	for_each_cpu_mask_nr(i, cmd->mask) {
 		set_cpus_allowed_ptr(current, &cpumask_of_cpu(i));
 		do_drv_write(cmd);
 	}
@@ -441,7 +441,7 @@ static int acpi_cpufreq_target(struct cp
 
 	freqs.old = perf->states[perf->state].core_frequency * 1000;
 	freqs.new = data->freq_table[next_state].frequency;
-	for_each_cpu_mask(i, cmd.mask) {
+	for_each_cpu_mask_nr(i, cmd.mask) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -456,7 +456,7 @@ static int acpi_cpufreq_target(struct cp
 		}
 	}
 
-	for_each_cpu_mask(i, cmd.mask) {
+	for_each_cpu_mask_nr(i, cmd.mask) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
@@ -122,7 +122,7 @@ static int cpufreq_p4_target(struct cpuf
 		return 0;
 
 	/* notifiers */
-	for_each_cpu_mask(i, policy->cpus) {
+	for_each_cpu_mask_nr(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -130,11 +130,11 @@ static int cpufreq_p4_target(struct cpuf
 	/* run on each logical CPU, see section 13.15.3 of IA32 Intel Architecture Software
 	 * Developer's Manual, Volume 3
 	 */
-	for_each_cpu_mask(i, policy->cpus)
+	for_each_cpu_mask_nr(i, policy->cpus)
 		cpufreq_p4_setdc(i, p4clockmod_table[newstate].index);
 
 	/* notifiers */
-	for_each_cpu_mask(i, policy->cpus) {
+	for_each_cpu_mask_nr(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -966,7 +966,7 @@ static int transition_frequency_fidvid(s
 	freqs.old = find_khz_freq_from_fid(data->currfid);
 	freqs.new = find_khz_freq_from_fid(fid);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask_nr(i, *(data->available_cores)) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -974,7 +974,7 @@ static int transition_frequency_fidvid(s
 	res = transition_fid_vid(data, fid, vid);
 	freqs.new = find_khz_freq_from_fid(data->currfid);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask_nr(i, *(data->available_cores)) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
@@ -997,7 +997,7 @@ static int transition_frequency_pstate(s
 	freqs.old = find_khz_freq_from_pstate(data->powernow_table, data->currpstate);
 	freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask_nr(i, *(data->available_cores)) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -1005,7 +1005,7 @@ static int transition_frequency_pstate(s
 	res = transition_pstate(data, pstate);
 	freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask_nr(i, *(data->available_cores)) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -476,7 +476,7 @@ static int centrino_target (struct cpufr
 	saved_mask = current->cpus_allowed;
 	first_cpu = 1;
 	cpus_clear(covered_cpus);
-	for_each_cpu_mask(j, online_policy_cpus) {
+	for_each_cpu_mask_nr(j, online_policy_cpus) {
 		/*
 		 * Support for SMP systems.
 		 * Make sure we are running on CPU that wants to change freq
@@ -517,7 +517,7 @@ static int centrino_target (struct cpufr
 			dprintk("target=%dkHz old=%d new=%d msr=%04x\n",
 				target_freq, freqs.old, freqs.new, msr);
 
-			for_each_cpu_mask(k, online_policy_cpus) {
+			for_each_cpu_mask_nr(k, online_policy_cpus) {
 				freqs.cpu = k;
 				cpufreq_notify_transition(&freqs,
 					CPUFREQ_PRECHANGE);
@@ -540,7 +540,7 @@ static int centrino_target (struct cpufr
 		preempt_enable();
 	}
 
-	for_each_cpu_mask(k, online_policy_cpus) {
+	for_each_cpu_mask_nr(k, online_policy_cpus) {
 		freqs.cpu = k;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
@@ -554,7 +554,7 @@ static int centrino_target (struct cpufr
 		 */
 
 		if (!cpus_empty(covered_cpus)) {
-			for_each_cpu_mask(j, covered_cpus) {
+			for_each_cpu_mask_nr(j, covered_cpus) {
 				set_cpus_allowed_ptr(current,
 						     &cpumask_of_cpu(j));
 				wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
@@ -564,7 +564,7 @@ static int centrino_target (struct cpufr
 		tmp = freqs.new;
 		freqs.new = freqs.old;
 		freqs.old = tmp;
-		for_each_cpu_mask(j, online_policy_cpus) {
+		for_each_cpu_mask_nr(j, online_policy_cpus) {
 			freqs.cpu = j;
 			cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 			cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/cpufreq/speedstep-ich.c
@@ -279,7 +279,7 @@ static int speedstep_target (struct cpuf
 
 	cpus_allowed = current->cpus_allowed;
 
-	for_each_cpu_mask(i, policy->cpus) {
+	for_each_cpu_mask_nr(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -292,7 +292,7 @@ static int speedstep_target (struct cpuf
 	/* allow to be run on all CPUs */
 	set_cpus_allowed_ptr(current, &cpus_allowed);
 
-	for_each_cpu_mask(i, policy->cpus) {
+	for_each_cpu_mask_nr(i, policy->cpus) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -488,7 +488,7 @@ static void __cpuinit cache_remove_share
 	int sibling;
 
 	this_leaf = CPUID4_INFO_IDX(cpu, index);
-	for_each_cpu_mask(sibling, this_leaf->shared_cpu_map) {
+	for_each_cpu_mask_nr(sibling, this_leaf->shared_cpu_map) {
 		sibling_leaf = CPUID4_INFO_IDX(sibling, index);	
 		cpu_clear(cpu, sibling_leaf->shared_cpu_map);
 	}
--- linux-2.6.sched.orig/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ linux-2.6.sched/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -527,7 +527,7 @@ static __cpuinit int threshold_create_ba
 	if (err)
 		goto out_free;
 
-	for_each_cpu_mask(i, b->cpus) {
+	for_each_cpu_mask_nr(i, b->cpus) {
 		if (i == cpu)
 			continue;
 
@@ -617,7 +617,7 @@ static void threshold_remove_bank(unsign
 #endif
 
 	/* remove all sibling symlinks before unregistering */
-	for_each_cpu_mask(i, b->cpus) {
+	for_each_cpu_mask_nr(i, b->cpus) {
 		if (i == cpu)
 			continue;
 
--- linux-2.6.sched.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6.sched/arch/x86/kernel/io_apic_64.c
@@ -722,7 +722,7 @@ static int __assign_irq_vector(int irq, 
 			return 0;
 	}
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, mask) {
 		cpumask_t domain, new_mask;
 		int new_cpu;
 		int vector, offset;
@@ -743,7 +743,7 @@ next:
 			continue;
 		if (vector == IA32_SYSCALL_VECTOR)
 			goto next;
-		for_each_cpu_mask(new_cpu, new_mask)
+		for_each_cpu_mask_nr(new_cpu, new_mask)
 			if (per_cpu(vector_irq, new_cpu)[vector] != -1)
 				goto next;
 		/* Found one! */
@@ -753,7 +753,7 @@ next:
 			cfg->move_in_progress = 1;
 			cfg->old_domain = cfg->domain;
 		}
-		for_each_cpu_mask(new_cpu, new_mask)
+		for_each_cpu_mask_nr(new_cpu, new_mask)
 			per_cpu(vector_irq, new_cpu)[vector] = irq;
 		cfg->vector = vector;
 		cfg->domain = domain;
@@ -785,7 +785,7 @@ static void __clear_irq_vector(int irq)
 
 	vector = cfg->vector;
 	cpus_and(mask, cfg->domain, cpu_online_map);
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask_nr(cpu, mask)
 		per_cpu(vector_irq, cpu)[vector] = -1;
 
 	cfg->vector = 0;
--- linux-2.6.sched.orig/arch/x86/kernel/smpboot.c
+++ linux-2.6.sched/arch/x86/kernel/smpboot.c
@@ -464,7 +464,7 @@ void __cpuinit set_cpu_sibling_map(int c
 	cpu_set(cpu, cpu_sibling_setup_map);
 
 	if (smp_num_siblings > 1) {
-		for_each_cpu_mask(i, cpu_sibling_setup_map) {
+		for_each_cpu_mask_nr(i, cpu_sibling_setup_map) {
 			if (c->phys_proc_id == cpu_data(i).phys_proc_id &&
 			    c->cpu_core_id == cpu_data(i).cpu_core_id) {
 				cpu_set(i, per_cpu(cpu_sibling_map, cpu));
@@ -487,7 +487,7 @@ void __cpuinit set_cpu_sibling_map(int c
 		return;
 	}
 
-	for_each_cpu_mask(i, cpu_sibling_setup_map) {
+	for_each_cpu_mask_nr(i, cpu_sibling_setup_map) {
 		if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
 		    per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
 			cpu_set(i, c->llc_shared_map);
@@ -1301,7 +1301,7 @@ void remove_siblinginfo(int cpu)
 	int sibling;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	for_each_cpu_mask(sibling, per_cpu(cpu_core_map, cpu)) {
+	for_each_cpu_mask_nr(sibling, per_cpu(cpu_core_map, cpu)) {
 		cpu_clear(cpu, per_cpu(cpu_core_map, sibling));
 		/*/
 		 * last thread sibling in this cpu core going down
@@ -1310,7 +1310,7 @@ void remove_siblinginfo(int cpu)
 			cpu_data(sibling).booted_cores--;
 	}
 
-	for_each_cpu_mask(sibling, per_cpu(cpu_sibling_map, cpu))
+	for_each_cpu_mask_nr(sibling, per_cpu(cpu_sibling_map, cpu))
 		cpu_clear(cpu, per_cpu(cpu_sibling_map, sibling));
 	cpus_clear(per_cpu(cpu_sibling_map, cpu));
 	cpus_clear(per_cpu(cpu_core_map, cpu));
--- linux-2.6.sched.orig/arch/x86/xen/smp.c
+++ linux-2.6.sched/arch/x86/xen/smp.c
@@ -345,7 +345,7 @@ static void xen_send_IPI_mask(cpumask_t 
 
 	cpus_and(mask, mask, cpu_online_map);
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask_nr(cpu, mask)
 		xen_send_IPI_one(cpu, vector);
 }
 
@@ -413,7 +413,7 @@ int xen_smp_call_function_mask(cpumask_t
 
 	/* Make sure other vcpus get a chance to run if they need to. */
 	yield = false;
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask_nr(cpu, mask)
 		if (xen_vcpu_stolen(cpu))
 			yield = true;
 
--- linux-2.6.sched.orig/drivers/acpi/processor_throttling.c
+++ linux-2.6.sched/drivers/acpi/processor_throttling.c
@@ -1013,7 +1013,7 @@ int acpi_processor_set_throttling(struct
 	 * affected cpu in order to get one proper T-state.
 	 * The notifier event is THROTTLING_PRECHANGE.
 	 */
-	for_each_cpu_mask(i, online_throttling_cpus) {
+	for_each_cpu_mask_nr(i, online_throttling_cpus) {
 		t_state.cpu = i;
 		acpi_processor_throttling_notifier(THROTTLING_PRECHANGE,
 							&t_state);
@@ -1034,7 +1034,7 @@ int acpi_processor_set_throttling(struct
 		 * it is necessary to set T-state for every affected
 		 * cpus.
 		 */
-		for_each_cpu_mask(i, online_throttling_cpus) {
+		for_each_cpu_mask_nr(i, online_throttling_cpus) {
 			match_pr = processors[i];
 			/*
 			 * If the pointer is invalid, we will report the
@@ -1068,7 +1068,7 @@ int acpi_processor_set_throttling(struct
 	 * affected cpu to update the T-states.
 	 * The notifier event is THROTTLING_POSTCHANGE
 	 */
-	for_each_cpu_mask(i, online_throttling_cpus) {
+	for_each_cpu_mask_nr(i, online_throttling_cpus) {
 		t_state.cpu = i;
 		acpi_processor_throttling_notifier(THROTTLING_POSTCHANGE,
 							&t_state);
--- linux-2.6.sched.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.sched/drivers/cpufreq/cpufreq.c
@@ -590,7 +590,7 @@ static ssize_t show_affected_cpus (struc
 	ssize_t i = 0;
 	unsigned int cpu;
 
-	for_each_cpu_mask(cpu, policy->cpus) {
+	for_each_cpu_mask_nr(cpu, policy->cpus) {
 		if (i)
 			i += scnprintf(&buf[i], (PAGE_SIZE - i - 2), " ");
 		i += scnprintf(&buf[i], (PAGE_SIZE - i - 2), "%u", cpu);
@@ -816,7 +816,7 @@ static int cpufreq_add_dev (struct sys_d
 	}
 #endif
 
-	for_each_cpu_mask(j, policy->cpus) {
+	for_each_cpu_mask_nr(j, policy->cpus) {
 		if (cpu == j)
 			continue;
 
@@ -889,14 +889,14 @@ static int cpufreq_add_dev (struct sys_d
 	}
 
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
-	for_each_cpu_mask(j, policy->cpus) {
+	for_each_cpu_mask_nr(j, policy->cpus) {
 		cpufreq_cpu_data[j] = policy;
 		per_cpu(policy_cpu, j) = policy->cpu;
 	}
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	/* symlink affected CPUs */
-	for_each_cpu_mask(j, policy->cpus) {
+	for_each_cpu_mask_nr(j, policy->cpus) {
 		if (j == cpu)
 			continue;
 		if (!cpu_online(j))
@@ -938,7 +938,7 @@ static int cpufreq_add_dev (struct sys_d
 
 err_out_unregister:
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
-	for_each_cpu_mask(j, policy->cpus)
+	for_each_cpu_mask_nr(j, policy->cpus)
 		cpufreq_cpu_data[j] = NULL;
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
@@ -1020,7 +1020,7 @@ static int __cpufreq_remove_dev (struct 
 	 * links afterwards.
 	 */
 	if (unlikely(cpus_weight(data->cpus) > 1)) {
-		for_each_cpu_mask(j, data->cpus) {
+		for_each_cpu_mask_nr(j, data->cpus) {
 			if (j == cpu)
 				continue;
 			cpufreq_cpu_data[j] = NULL;
@@ -1030,7 +1030,7 @@ static int __cpufreq_remove_dev (struct 
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (unlikely(cpus_weight(data->cpus) > 1)) {
-		for_each_cpu_mask(j, data->cpus) {
+		for_each_cpu_mask_nr(j, data->cpus) {
 			if (j == cpu)
 				continue;
 			dprintk("removing link for cpu %u\n", j);
--- linux-2.6.sched.orig/drivers/cpufreq/cpufreq_conservative.c
+++ linux-2.6.sched/drivers/cpufreq/cpufreq_conservative.c
@@ -497,7 +497,7 @@ static int cpufreq_governor_dbs(struct c
 			return rc;
 		}
 
-		for_each_cpu_mask(j, policy->cpus) {
+		for_each_cpu_mask_nr(j, policy->cpus) {
 			struct cpu_dbs_info_s *j_dbs_info;
 			j_dbs_info = &per_cpu(cpu_dbs_info, j);
 			j_dbs_info->cur_policy = policy;
--- linux-2.6.sched.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.sched/drivers/cpufreq/cpufreq_ondemand.c
@@ -367,7 +367,7 @@ static void dbs_check_cpu(struct cpu_dbs
 
 	/* Get Idle Time */
 	idle_ticks = UINT_MAX;
-	for_each_cpu_mask(j, policy->cpus) {
+	for_each_cpu_mask_nr(j, policy->cpus) {
 		cputime64_t total_idle_ticks;
 		unsigned int tmp_idle_ticks;
 		struct cpu_dbs_info_s *j_dbs_info;
@@ -521,7 +521,7 @@ static int cpufreq_governor_dbs(struct c
 			return rc;
 		}
 
-		for_each_cpu_mask(j, policy->cpus) {
+		for_each_cpu_mask_nr(j, policy->cpus) {
 			struct cpu_dbs_info_s *j_dbs_info;
 			j_dbs_info = &per_cpu(cpu_dbs_info, j);
 			j_dbs_info->cur_policy = policy;
--- linux-2.6.sched.orig/include/asm-x86/ipi.h
+++ linux-2.6.sched/include/asm-x86/ipi.h
@@ -121,7 +121,7 @@ static inline void send_IPI_mask_sequenc
 	 * - mbligh
 	 */
 	local_irq_save(flags);
-	for_each_cpu_mask(query_cpu, mask) {
+	for_each_cpu_mask_nr(query_cpu, mask) {
 		__send_IPI_dest_field(per_cpu(x86_cpu_to_apicid, query_cpu),
 				      vector, APIC_DEST_PHYSICAL);
 	}
--- linux-2.6.sched.orig/kernel/cpu.c
+++ linux-2.6.sched/kernel/cpu.c
@@ -400,7 +400,7 @@ void __ref enable_nonboot_cpus(void)
 		goto out;
 
 	printk("Enabling non-boot CPUs ...\n");
-	for_each_cpu_mask(cpu, frozen_cpus) {
+	for_each_cpu_mask_nr(cpu, frozen_cpus) {
 		error = _cpu_up(cpu, 1);
 		if (!error) {
 			printk("CPU%d is up\n", cpu);
--- linux-2.6.sched.orig/kernel/rcuclassic.c
+++ linux-2.6.sched/kernel/rcuclassic.c
@@ -92,7 +92,7 @@ static void force_quiescent_state(struct
 		 */
 		cpumask = rcp->cpumask;
 		cpu_clear(rdp->cpu, cpumask);
-		for_each_cpu_mask(cpu, cpumask)
+		for_each_cpu_mask_nr(cpu, cpumask)
 			smp_send_reschedule(cpu);
 	}
 }
--- linux-2.6.sched.orig/kernel/rcupreempt.c
+++ linux-2.6.sched/kernel/rcupreempt.c
@@ -758,7 +758,7 @@ rcu_try_flip_idle(void)
 
 	/* Now ask each CPU for acknowledgement of the flip. */
 
-	for_each_cpu_mask(cpu, rcu_cpu_online_map) {
+	for_each_cpu_mask_nr(cpu, rcu_cpu_online_map) {
 		per_cpu(rcu_flip_flag, cpu) = rcu_flipped;
 		dyntick_save_progress_counter(cpu);
 	}
@@ -776,7 +776,7 @@ rcu_try_flip_waitack(void)
 	int cpu;
 
 	RCU_TRACE_ME(rcupreempt_trace_try_flip_a1);
-	for_each_cpu_mask(cpu, rcu_cpu_online_map)
+	for_each_cpu_mask_nr(cpu, rcu_cpu_online_map)
 		if (rcu_try_flip_waitack_needed(cpu) &&
 		    per_cpu(rcu_flip_flag, cpu) != rcu_flip_seen) {
 			RCU_TRACE_ME(rcupreempt_trace_try_flip_ae1);
@@ -808,7 +808,7 @@ rcu_try_flip_waitzero(void)
 	/* Check to see if the sum of the "last" counters is zero. */
 
 	RCU_TRACE_ME(rcupreempt_trace_try_flip_z1);
-	for_each_cpu_mask(cpu, rcu_cpu_online_map)
+	for_each_cpu_mask_nr(cpu, rcu_cpu_online_map)
 		sum += RCU_DATA_CPU(cpu)->rcu_flipctr[lastidx];
 	if (sum != 0) {
 		RCU_TRACE_ME(rcupreempt_trace_try_flip_ze1);
@@ -823,7 +823,7 @@ rcu_try_flip_waitzero(void)
 	smp_mb();  /*  ^^^^^^^^^^^^ */
 
 	/* Call for a memory barrier from each CPU. */
-	for_each_cpu_mask(cpu, rcu_cpu_online_map) {
+	for_each_cpu_mask_nr(cpu, rcu_cpu_online_map) {
 		per_cpu(rcu_mb_flag, cpu) = rcu_mb_needed;
 		dyntick_save_progress_counter(cpu);
 	}
@@ -843,7 +843,7 @@ rcu_try_flip_waitmb(void)
 	int cpu;
 
 	RCU_TRACE_ME(rcupreempt_trace_try_flip_m1);
-	for_each_cpu_mask(cpu, rcu_cpu_online_map)
+	for_each_cpu_mask_nr(cpu, rcu_cpu_online_map)
 		if (rcu_try_flip_waitmb_needed(cpu) &&
 		    per_cpu(rcu_mb_flag, cpu) != rcu_mb_done) {
 			RCU_TRACE_ME(rcupreempt_trace_try_flip_me1);
--- linux-2.6.sched.orig/kernel/sched.c
+++ linux-2.6.sched/kernel/sched.c
@@ -1682,7 +1682,7 @@ void aggregate_group_weight(struct task_
 	unsigned long task_weight = 0;
 	int i;
 
-	for_each_cpu_mask(i, sd->span) {
+	for_each_cpu_mask_nr(i, sd->span) {
 		rq_weight += tg->cfs_rq[i]->load.weight;
 		task_weight += tg->cfs_rq[i]->task_weight;
 	}
@@ -1737,7 +1737,7 @@ void aggregate_group_shares(struct task_
 	int i;
 
 again:
-	for_each_cpu_mask(i, sd->span)
+	for_each_cpu_mask_nr(i, sd->span)
 		shares += tg->cfs_rq[i]->shares;
 
 	/*
@@ -1765,7 +1765,7 @@ void aggregate_group_load(struct task_gr
 		int i;
 
 		load = 0;
-		for_each_cpu_mask(i, sd->span)
+		for_each_cpu_mask_nr(i, sd->span)
 			load += cpu_rq(i)->load.weight;
 
 	} else {
@@ -1874,7 +1874,7 @@ void aggregate_group_set_shares(struct t
 	unsigned long shares = aggregate(tg, sd)->shares;
 	int i;
 
-	for_each_cpu_mask(i, sd->span) {
+	for_each_cpu_mask_nr(i, sd->span) {
 		struct rq *rq = cpu_rq(i);
 		unsigned long flags;
 
@@ -2408,7 +2408,7 @@ find_idlest_group(struct sched_domain *s
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
-		for_each_cpu_mask(i, group->cpumask) {
+		for_each_cpu_mask_nr(i, group->cpumask) {
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = source_load(i, load_idx);
@@ -2450,7 +2450,7 @@ find_idlest_cpu(struct sched_group *grou
 	/* Traverse only the allowed CPUs */
 	cpus_and(*tmp, group->cpumask, p->cpus_allowed);
 
-	for_each_cpu_mask(i, *tmp) {
+	for_each_cpu_mask_nr(i, *tmp) {
 		load = weighted_cpuload(i);
 
 		if (load < min_load || (load == min_load && i == this_cpu)) {
@@ -3436,7 +3436,7 @@ find_busiest_group(struct sched_domain *
 		max_cpu_load = 0;
 		min_cpu_load = ~0UL;
 
-		for_each_cpu_mask(i, group->cpumask) {
+		for_each_cpu_mask_nr(i, group->cpumask) {
 			struct rq *rq;
 
 			if (!cpu_isset(i, *cpus))
@@ -3700,7 +3700,7 @@ find_busiest_queue(struct sched_group *g
 	unsigned long max_load = 0;
 	int i;
 
-	for_each_cpu_mask(i, group->cpumask) {
+	for_each_cpu_mask_nr(i, group->cpumask) {
 		unsigned long wl;
 
 		if (!cpu_isset(i, *cpus))
@@ -4240,7 +4240,7 @@ static void run_rebalance_domains(struct
 		int balance_cpu;
 
 		cpu_clear(this_cpu, cpus);
-		for_each_cpu_mask(balance_cpu, cpus) {
+		for_each_cpu_mask_nr(balance_cpu, cpus) {
 			/*
 			 * If this cpu gets work to do, stop the load balancing
 			 * work being done for other cpus. Next load
@@ -7009,7 +7009,7 @@ init_sched_build_groups(const cpumask_t 
 
 	cpus_clear(*covered);
 
-	for_each_cpu_mask(i, *span) {
+	for_each_cpu_mask_nr(i, *span) {
 		struct sched_group *sg;
 		int group = group_fn(i, cpu_map, &sg, tmpmask);
 		int j;
@@ -7020,7 +7020,7 @@ init_sched_build_groups(const cpumask_t 
 		cpus_clear(sg->cpumask);
 		sg->__cpu_power = 0;
 
-		for_each_cpu_mask(j, *span) {
+		for_each_cpu_mask_nr(j, *span) {
 			if (group_fn(j, cpu_map, NULL, tmpmask) != group)
 				continue;
 
@@ -7220,7 +7220,7 @@ static void init_numa_sched_groups_power
 	if (!sg)
 		return;
 	do {
-		for_each_cpu_mask(j, sg->cpumask) {
+		for_each_cpu_mask_nr(j, sg->cpumask) {
 			struct sched_domain *sd;
 
 			sd = &per_cpu(phys_domains, j);
@@ -7245,7 +7245,7 @@ static void free_sched_groups(const cpum
 {
 	int cpu, i;
 
-	for_each_cpu_mask(cpu, *cpu_map) {
+	for_each_cpu_mask_nr(cpu, *cpu_map) {
 		struct sched_group **sched_group_nodes
 			= sched_group_nodes_bycpu[cpu];
 
@@ -7479,7 +7479,7 @@ static int __build_sched_domains(const c
 	/*
 	 * Set up domains for cpus specified by the cpu_map.
 	 */
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		struct sched_domain *sd = NULL, *p;
 		SCHED_CPUMASK_VAR(nodemask, allmasks);
 
@@ -7551,7 +7551,7 @@ static int __build_sched_domains(const c
 
 #ifdef CONFIG_SCHED_SMT
 	/* Set up CPU (sibling) groups */
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		SCHED_CPUMASK_VAR(this_sibling_map, allmasks);
 		SCHED_CPUMASK_VAR(send_covered, allmasks);
 
@@ -7568,7 +7568,7 @@ static int __build_sched_domains(const c
 
 #ifdef CONFIG_SCHED_MC
 	/* Set up multi-core groups */
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		SCHED_CPUMASK_VAR(this_core_map, allmasks);
 		SCHED_CPUMASK_VAR(send_covered, allmasks);
 
@@ -7635,7 +7635,7 @@ static int __build_sched_domains(const c
 			goto error;
 		}
 		sched_group_nodes[i] = sg;
-		for_each_cpu_mask(j, *nodemask) {
+		for_each_cpu_mask_nr(j, *nodemask) {
 			struct sched_domain *sd;
 
 			sd = &per_cpu(node_domains, j);
@@ -7681,21 +7681,21 @@ static int __build_sched_domains(const c
 
 	/* Calculate CPU power for physical packages and nodes */
 #ifdef CONFIG_SCHED_SMT
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		struct sched_domain *sd = &per_cpu(cpu_domains, i);
 
 		init_sched_groups_power(i, sd);
 	}
 #endif
 #ifdef CONFIG_SCHED_MC
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		struct sched_domain *sd = &per_cpu(core_domains, i);
 
 		init_sched_groups_power(i, sd);
 	}
 #endif
 
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		struct sched_domain *sd = &per_cpu(phys_domains, i);
 
 		init_sched_groups_power(i, sd);
@@ -7715,7 +7715,7 @@ static int __build_sched_domains(const c
 #endif
 
 	/* Attach the domains */
-	for_each_cpu_mask(i, *cpu_map) {
+	for_each_cpu_mask_nr(i, *cpu_map) {
 		struct sched_domain *sd;
 #ifdef CONFIG_SCHED_SMT
 		sd = &per_cpu(cpu_domains, i);
@@ -7798,7 +7798,7 @@ static void detach_destroy_domains(const
 
 	unregister_sched_domain_sysctl();
 
-	for_each_cpu_mask(i, *cpu_map)
+	for_each_cpu_mask_nr(i, *cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
 	synchronize_sched();
 	arch_destroy_sched_domains(cpu_map, &tmpmask);
--- linux-2.6.sched.orig/kernel/sched_fair.c
+++ linux-2.6.sched/kernel/sched_fair.c
@@ -1015,7 +1015,7 @@ static int wake_idle(int cpu, struct tas
 		    || ((sd->flags & SD_WAKE_IDLE_FAR)
 			&& !task_hot(p, task_rq(p)->clock, sd))) {
 			cpus_and(tmp, sd->span, p->cpus_allowed);
-			for_each_cpu_mask(i, tmp) {
+			for_each_cpu_mask_nr(i, tmp) {
 				if (idle_cpu(i)) {
 					if (i != task_cpu(p)) {
 						schedstat_inc(p,
--- linux-2.6.sched.orig/kernel/sched_rt.c
+++ linux-2.6.sched/kernel/sched_rt.c
@@ -231,7 +231,7 @@ static int do_sched_rt_period_timer(stru
 		return 1;
 
 	span = sched_rt_period_mask();
-	for_each_cpu_mask(i, span) {
+	for_each_cpu_mask_nr(i, span) {
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
 		struct rq *rq = rq_of_rt_rq(rt_rq);
@@ -272,7 +272,7 @@ static int balance_runtime(struct rt_rq 
 
 	spin_lock(&rt_b->rt_runtime_lock);
 	rt_period = ktime_to_ns(rt_b->rt_period);
-	for_each_cpu_mask(i, rd->span) {
+	for_each_cpu_mask_nr(i, rd->span) {
 		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
 		s64 diff;
 
@@ -960,7 +960,7 @@ static int pull_rt_task(struct rq *this_
 
 	next = pick_next_task_rt(this_rq);
 
-	for_each_cpu_mask(cpu, this_rq->rd->rto_mask) {
+	for_each_cpu_mask_nr(cpu, this_rq->rd->rto_mask) {
 		if (this_cpu == cpu)
 			continue;
 
--- linux-2.6.sched.orig/kernel/taskstats.c
+++ linux-2.6.sched/kernel/taskstats.c
@@ -301,7 +301,7 @@ static int add_del_listener(pid_t pid, c
 		return -EINVAL;
 
 	if (isadd == REGISTER) {
-		for_each_cpu_mask(cpu, mask) {
+		for_each_cpu_mask_nr(cpu, mask) {
 			s = kmalloc_node(sizeof(struct listener), GFP_KERNEL,
 					 cpu_to_node(cpu));
 			if (!s)
@@ -320,7 +320,7 @@ static int add_del_listener(pid_t pid, c
 
 	/* Deregister or cleanup */
 cleanup:
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, mask) {
 		listeners = &per_cpu(listener_array, cpu);
 		down_write(&listeners->sem);
 		list_for_each_entry_safe(s, tmp, &listeners->list, list) {
--- linux-2.6.sched.orig/kernel/trace/trace.c
+++ linux-2.6.sched/kernel/trace/trace.c
@@ -40,7 +40,7 @@ static unsigned long __read_mostly	traci
 static cpumask_t __read_mostly		tracing_buffer_mask;
 
 #define for_each_tracing_cpu(cpu)	\
-	for_each_cpu_mask(cpu, tracing_buffer_mask)
+	for_each_cpu_mask_nr(cpu, tracing_buffer_mask)
 
 /* dummy trace to disable tracing */
 static struct tracer no_tracer __read_mostly = {
@@ -2529,7 +2529,7 @@ tracing_read_pipe(struct file *filp, cha
 		cpu_set(cpu, mask);
 	}
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, mask) {
 		data = iter->tr->data[cpu];
 		__raw_spin_lock(&data->lock);
 
@@ -2555,12 +2555,12 @@ tracing_read_pipe(struct file *filp, cha
 			break;
 	}
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, mask) {
 		data = iter->tr->data[cpu];
 		__raw_spin_unlock(&data->lock);
 	}
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, mask) {
 		data = iter->tr->data[cpu];
 		atomic_dec(&data->disabled);
 	}
--- linux-2.6.sched.orig/kernel/workqueue.c
+++ linux-2.6.sched/kernel/workqueue.c
@@ -397,7 +397,7 @@ void flush_workqueue(struct workqueue_st
 	might_sleep();
 	lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
 	lock_release(&wq->lockdep_map, 1, _THIS_IP_);
-	for_each_cpu_mask(cpu, *cpu_map)
+	for_each_cpu_mask_nr(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -477,7 +477,7 @@ static void wait_on_work(struct work_str
 	wq = cwq->wq;
 	cpu_map = wq_cpu_map(wq);
 
-	for_each_cpu_mask(cpu, *cpu_map)
+	for_each_cpu_mask_nr(cpu, *cpu_map)
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
@@ -817,7 +817,7 @@ void destroy_workqueue(struct workqueue_
 	spin_unlock(&workqueue_lock);
 	put_online_cpus();
 
-	for_each_cpu_mask(cpu, *cpu_map) {
+	for_each_cpu_mask_nr(cpu, *cpu_map) {
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 		cleanup_workqueue_thread(cwq, cpu);
 	}
--- linux-2.6.sched.orig/mm/allocpercpu.c
+++ linux-2.6.sched/mm/allocpercpu.c
@@ -35,7 +35,7 @@ EXPORT_SYMBOL_GPL(percpu_depopulate);
 void __percpu_depopulate_mask(void *__pdata, cpumask_t *mask)
 {
 	int cpu;
-	for_each_cpu_mask(cpu, *mask)
+	for_each_cpu_mask_nr(cpu, *mask)
 		percpu_depopulate(__pdata, cpu);
 }
 EXPORT_SYMBOL_GPL(__percpu_depopulate_mask);
@@ -86,7 +86,7 @@ int __percpu_populate_mask(void *__pdata
 	int cpu;
 
 	cpus_clear(populated);
-	for_each_cpu_mask(cpu, *mask)
+	for_each_cpu_mask_nr(cpu, *mask)
 		if (unlikely(!percpu_populate(__pdata, size, gfp, cpu))) {
 			__percpu_depopulate_mask(__pdata, &populated);
 			return -ENOMEM;
--- linux-2.6.sched.orig/mm/vmstat.c
+++ linux-2.6.sched/mm/vmstat.c
@@ -26,7 +26,7 @@ static void sum_vm_events(unsigned long 
 
 	memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
 
-	for_each_cpu_mask(cpu, *cpumask) {
+	for_each_cpu_mask_nr(cpu, *cpumask) {
 		struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
 
 		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
--- linux-2.6.sched.orig/net/core/dev.c
+++ linux-2.6.sched/net/core/dev.c
@@ -2231,7 +2231,7 @@ out:
 	 */
 	if (!cpus_empty(net_dma.channel_mask)) {
 		int chan_idx;
-		for_each_cpu_mask(chan_idx, net_dma.channel_mask) {
+		for_each_cpu_mask_nr(chan_idx, net_dma.channel_mask) {
 			struct dma_chan *chan = net_dma.channels[chan_idx];
 			if (chan)
 				dma_async_memcpy_issue_pending(chan);
@@ -4290,7 +4290,7 @@ static void net_dma_rebalance(struct net
 	i = 0;
 	cpu = first_cpu(cpu_online_map);
 
-	for_each_cpu_mask(chan_idx, net_dma->channel_mask) {
+	for_each_cpu_mask_nr(chan_idx, net_dma->channel_mask) {
 		chan = net_dma->channels[chan_idx];
 
 		n = ((num_online_cpus() / cpus_weight(net_dma->channel_mask))
--- linux-2.6.sched.orig/net/iucv/iucv.c
+++ linux-2.6.sched/net/iucv/iucv.c
@@ -497,7 +497,7 @@ static void iucv_setmask_up(void)
 	/* Disable all cpu but the first in cpu_irq_cpumask. */
 	cpumask = iucv_irq_cpumask;
 	cpu_clear(first_cpu(iucv_irq_cpumask), cpumask);
-	for_each_cpu_mask(cpu, cpumask)
+	for_each_cpu_mask_nr(cpu, cpumask)
 		smp_call_function_single(cpu, iucv_block_cpu, NULL, 0, 1);
 }
 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 10/11] x86: Use performance variant next_cpu_nr
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (8 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-26  0:15 ` [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c Mike Travis
  2008-04-28 13:42 ` [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Ingo Molnar
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: use-next_cpu_nr --]
[-- Type: text/plain, Size: 2271 bytes --]

  * Change references from next_cpu to next_cpu_nr (or for_each_cpu_mask_nr
    where applicable).
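
    (Illustrative sketch of the round-robin idiom being converted -- the
     helper below is hypothetical; the hunks that follow show the real
     call sites.  The search result is now compared against nr_cpu_ids
     rather than NR_CPUS, wrapping back to the first online cpu.)

	static int next_online_cpu(int cpu)
	{
		cpu = next_cpu_nr(cpu, cpu_online_map);
		if (cpu >= nr_cpu_ids)
			cpu = first_cpu(cpu_online_map);
		return cpu;
	}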

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git


Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 drivers/infiniband/hw/ehca/ehca_irq.c |    4 ++--
 kernel/time/clocksource.c             |    4 ++--
 kernel/time/tick-broadcast.c          |    3 +--
 3 files changed, 5 insertions(+), 6 deletions(-)

--- linux-2.6.sched.orig/drivers/infiniband/hw/ehca/ehca_irq.c
+++ linux-2.6.sched/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -637,8 +637,8 @@ static inline int find_next_online_cpu(s
 		ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");
 
 	spin_lock_irqsave(&pool->last_cpu_lock, flags);
-	cpu = next_cpu(pool->last_cpu, cpu_online_map);
-	if (cpu == NR_CPUS)
+	cpu = next_cpu_nr(pool->last_cpu, cpu_online_map);
+	if (cpu >= nr_cpu_ids)
 		cpu = first_cpu(cpu_online_map);
 	pool->last_cpu = cpu;
 	spin_unlock_irqrestore(&pool->last_cpu_lock, flags);
--- linux-2.6.sched.orig/kernel/time/clocksource.c
+++ linux-2.6.sched/kernel/time/clocksource.c
@@ -145,9 +145,9 @@ static void clocksource_watchdog(unsigne
 		 * Cycle through CPUs to check if the CPUs stay
 		 * synchronized to each other.
 		 */
-		int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
+		int next_cpu = next_cpu_nr(raw_smp_processor_id(), cpu_online_map);
 
-		if (next_cpu >= NR_CPUS)
+		if (next_cpu >= nr_cpu_ids)
 			next_cpu = first_cpu(cpu_online_map);
 		watchdog_timer.expires += WATCHDOG_INTERVAL;
 		add_timer_on(&watchdog_timer, next_cpu);
--- linux-2.6.sched.orig/kernel/time/tick-broadcast.c
+++ linux-2.6.sched/kernel/time/tick-broadcast.c
@@ -397,8 +397,7 @@ again:
 	mask = CPU_MASK_NONE;
 	now = ktime_get();
 	/* Find all expired events */
-	for (cpu = first_cpu(tick_broadcast_oneshot_mask); cpu != NR_CPUS;
-	     cpu = next_cpu(cpu, tick_broadcast_oneshot_mask)) {
+	for_each_cpu_mask_nr(cpu, tick_broadcast_oneshot_mask) {
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
 			cpu_set(cpu, mask);

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (9 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 10/11] x86: Use performance variant next_cpu_nr Mike Travis
@ 2008-04-26  0:15 ` Mike Travis
  2008-04-28 13:42 ` [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Ingo Molnar
  11 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-26  0:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

[-- Attachment #1: cpumask-in-sunrpc_svc --]
[-- Type: text/plain, Size: 726 bytes --]

  * Pass a reference to the cpumask variable instead of copying it onto the stack.
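
    (Illustrative sketch of the pattern, not from the patch -- the function
     name is hypothetical.  Save the current affinity, pin to one cpu, then
     restore it by passing a pointer so the large cpumask_t is not copied
     by value onto the stack.)

	static void run_on_cpu_example(int cpu)
	{
		cpumask_t oldmask = current->cpus_allowed;	/* save affinity */

		set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
		/* ... do the per-cpu work here ... */
		set_cpus_allowed_ptr(current, &oldmask);	/* restore via pointer */
	}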

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
---
 net/sunrpc/svc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.sched.orig/net/sunrpc/svc.c
+++ linux-2.6.sched/net/sunrpc/svc.c
@@ -604,7 +604,7 @@ __svc_create_thread(svc_thread_fn func, 
 	error = kernel_thread((int (*)(void *)) func, rqstp, 0);
 
 	if (have_oldmask)
-		set_cpus_allowed(current, oldmask);
+		set_cpus_allowed_ptr(current, &oldmask);
 
 	if (error < 0)
 		goto out_thread;

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-26  0:15 ` [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus Mike Travis
@ 2008-04-27 10:39   ` Pavel Machek
  2008-04-28 13:38     ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Machek @ 2008-04-27 10:39 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	linux-kernel

On Fri 2008-04-25 17:15:49, Mike Travis wrote:
>   * Increase the limit of NR_CPUS to 4096 and introduce a boolean
>     called "MAXSMP" which when set (e.g. "allyesconfig"), will set
>     NR_CPUS = 4096 and NODES_SHIFT = 9 (512).

Why is the redundant option 'maxsmp' a good idea?
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-27 10:39   ` Pavel Machek
@ 2008-04-28 13:38     ` Ingo Molnar
  2008-04-28 13:54       ` Pavel Machek
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-04-28 13:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mike Travis, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	linux-kernel


* Pavel Machek <pavel@ucw.cz> wrote:

> On Fri 2008-04-25 17:15:49, Mike Travis wrote:
> >   * Increase the limit of NR_CPUS to 4096 and introduce a boolean
> >     called "MAXSMP" which when set (e.g. "allyesconfig"), will set
> >     NR_CPUS = 4096 and NODES_SHIFT = 9 (512).
> 
> Why is redundant option 'maxsmp' a good idea?

because this way randconfig can trigger it and can configure all the 
otherwise randconfig-invisible [or just plain unlikely] numerics and 
options up to their max.

I found 2-3 "large box" bugs that way already.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
  2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
                   ` (10 preceding siblings ...)
  2008-04-26  0:15 ` [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c Mike Travis
@ 2008-04-28 13:42 ` Ingo Molnar
  2008-04-28 15:13   ` Ingo Molnar
  11 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-04-28 13:42 UTC (permalink / raw)
  To: Mike Travis; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel


* Mike Travis <travis@sgi.com> wrote:

> 1/11:	Increase the limit of NR_CPUS to 4096 and introduce a boolean
> 	called "MAXSMP" which when set (e.g. "allyesconfig"), will set
> 	NR_CPUS = 4096 and NODES_SHIFT = 9 (512).  Changed max setting
> 	for NODES_SHIFT from 15 to 9 to accurately reflect the real limit.
> 
> 2/11:	Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
> 	used by some per_cpu variables that are initialized and accessed
> 	before there are per_cpu areas allocated.
> 
> 	Add a flag "arch_provides_topology_pointers" that indicates pointers
> 	to topology cpumask_t maps are available.  Otherwise, use the function
> 	returning the cpumask_t value.  This is useful if cpumask_t set size
> 	is very large to avoid copying data on to/off of the stack.
> 
> 3/11:	Restore the nodenumber field in the x86_64 pda.  This field is slightly
> 	different than the x86_cpu_to_node_map mainly because it's a static
> 	indication of which node the cpu is on while the cpu to node map is a
> 	dyanamic mapping that may get reset if the cpu goes offline.  This also
> 	simplifies the numa_node_id() macro.
> 
> 4/11:	Consolidate node_to_cpumask operations and remove the 256k
> 	byte node_to_cpumask_map.  This is done by allocating the
> 	node_to_cpumask_map array after the number of possible
> 	nodes (nr_node_ids) is known.
> 
> 5/11:	Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
> 	where appropriate.  This saves some allocated space as well as many
> 	wasted cycles going through node entries that are non-existent.
> 
> 6/11:	Changed some global definitions in drivers/base/cpu.c to static.
> 
> 7/11:	Remove 544k bytes from the kernel by removing the boot_cpu_pda
> 	array from the data section and allocating it during startup.
> 
> 8/11:	Increase performance for systems with large count NR_CPUS by
> 	limiting the range of the cpumask operators that loop over
> 	the bits in a cpumask_t variable.  This removes a large amount
> 	of wasted cpu cycles.
> 
> 9/11:	Change references from for_each_cpu_mask to for_each_cpu_mask_ptr
> 	in all cases for x86_64 and generic code.
> 
> 10/11:	Change references from next_cpu to next_cpu_nr (or for_each_cpu_mask_ptr
> 	if applicable), in all cases for x86_64 and generic code.
> 
> 11/11:  Pass reference to cpumask variable in net/sunrpc/svc.c

thanks Travis, applied.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-28 13:38     ` Ingo Molnar
@ 2008-04-28 13:54       ` Pavel Machek
  2008-04-28 15:52         ` Ingo Molnar
  2008-04-28 17:13         ` H. Peter Anvin
  0 siblings, 2 replies; 20+ messages in thread
From: Pavel Machek @ 2008-04-28 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Travis, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	linux-kernel

> 
> * Pavel Machek <pavel@ucw.cz> wrote:
> 
> > On Fri 2008-04-25 17:15:49, Mike Travis wrote:
> > >   * Increase the limit of NR_CPUS to 4096 and introduce a boolean
> > >     called "MAXSMP" which when set (e.g. "allyesconfig"), will set
> > >     NR_CPUS = 4096 and NODES_SHIFT = 9 (512).
> > 
> > Why is redundant option 'maxsmp' a good idea?
> 
> because this way randconfig can trigger it and can configure all the 
> otherwise randconfig-invisible [or just plain unlikely] numerics and 
> options up to their max.
> 
> I have already found 2-3 "large box" bugs that way.

Should we improve randconfig to select numeric values when a range is
given, too?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
  2008-04-28 13:42 ` [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Ingo Molnar
@ 2008-04-28 15:13   ` Ingo Molnar
  2008-04-28 16:27     ` Mike Travis
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-04-28 15:13 UTC (permalink / raw)
  To: Mike Travis; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> thanks Travis, applied.

got this early crash:

 Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
 PERCPU: Allocating 32864 bytes of per cpu data
 PANIC: early exception 0e rip 10:ffffffff80aa9683 error 2 cr2 8
 Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git  #93

 Call Trace:
 PANIC: early exception 0e rip 10:ffffffff8020cd69 error 0 cr2 30

  http://redhat.com/~mingo/misc/config-Mon_Apr_28_16_20_05_CEST_2008.bad
  http://redhat.com/~mingo/misc/log-Mon_Apr_28_16_20_05_CEST_2008.bad

... from the dreaded randconfig tester :)

the secondary crash is in dump_trace(), see below.  It seems the percpu
data structures are confused?

	Ingo

---------------------->
ffffffff8020cd20 <dump_trace>:
ffffffff8020cd20:	55                   	push   %rbp
ffffffff8020cd21:	48 89 e5             	mov    %rsp,%rbp
ffffffff8020cd24:	41 57                	push   %r15
ffffffff8020cd26:	41 56                	push   %r14
ffffffff8020cd28:	41 55                	push   %r13
ffffffff8020cd2a:	41 54                	push   %r12
ffffffff8020cd2c:	53                   	push   %rbx
ffffffff8020cd2d:	48 83 ec 48          	sub    $0x48,%rsp
ffffffff8020cd31:	e8 5a e8 ff ff       	callq  ffffffff8020b590 <mcount>
ffffffff8020cd36:	65 8b 04 25 24 00 00 	mov    %gs:0x24,%eax
ffffffff8020cd3d:	00 
ffffffff8020cd3e:	89 c0                	mov    %eax,%eax
ffffffff8020cd40:	49 89 d4             	mov    %rdx,%r12
ffffffff8020cd43:	48 89 4d a8          	mov    %rcx,-0x58(%rbp)
ffffffff8020cd47:	48 8d 14 c5 00 00 00 	lea    0x0(,%rax,8),%rdx
ffffffff8020cd4e:	00 
ffffffff8020cd4f:	4c 89 45 a0          	mov    %r8,-0x60(%rbp)
ffffffff8020cd53:	4c 89 4d 98          	mov    %r9,-0x68(%rbp)
ffffffff8020cd57:	48 85 ff             	test   %rdi,%rdi
ffffffff8020cd5a:	48 89 55 b0          	mov    %rdx,-0x50(%rbp)
ffffffff8020cd5e:	48 8b 15 ab 0b 88 00 	mov    0x880bab(%rip),%rdx        # ffffffff80a8d910 <_cpu_pda>
ffffffff8020cd65:	48 8b 04 c2          	mov    (%rdx,%rax,8),%rax
ffffffff8020cd69:	48 8b 48 30          	mov    0x30(%rax),%rcx
ffffffff8020cd6d:	0f 84 8a 03 00 00    	je     ffffffff8020d0fd <dump_trace+0x3dd>
ffffffff8020cd73:	48 8b 5f 08          	mov    0x8(%rdi),%rbx
ffffffff8020cd77:	4d 85 e4             	test   %r12,%r12
ffffffff8020cd7a:	48 89 5d c0          	mov    %rbx,-0x40(%rbp)
ffffffff8020cd7e:	0f 84 57 03 00 00    	je     ffffffff8020d0db <dump_trace+0x3bb>
ffffffff8020cd84:	48 83 7d a8 00       	cmpq   $0x0,-0x58(%rbp)
ffffffff8020cd89:	75 20                	jne    ffffffff8020cdab <dump_trace+0x8b>
ffffffff8020cd8b:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
ffffffff8020cd92:	00 00 
ffffffff8020cd94:	48 39 c7             	cmp    %rax,%rdi
ffffffff8020cd97:	0f 84 6e 03 00 00    	je     ffffffff8020d10b <dump_trace+0x3eb>
ffffffff8020cd9d:	48 8b 87 b0 03 00 00 	mov    0x3b0(%rdi),%rax
ffffffff8020cda4:	48 8b 00             	mov    (%rax),%rax
ffffffff8020cda7:	48 89 45 a8          	mov    %rax,-0x58(%rbp)
ffffffff8020cdab:	49 89 ce             	mov    %rcx,%r14
ffffffff8020cdae:	c7 45 bc 00 00 00 00 	movl   $0x0,-0x44(%rbp)
ffffffff8020cdb5:	48 c7 c1 20 24 af 80 	mov    $0xffffffff80af2420,%rcx
ffffffff8020cdbc:	48 89 4d 90          	mov    %rcx,-0x70(%rbp)
ffffffff8020cdc0:	48 8b 5d b0          	mov    -0x50(%rbp),%rbx
ffffffff8020cdc4:	31 ff                	xor    %edi,%edi
ffffffff8020cdc6:	48 8b 04 13          	mov    (%rbx,%rdx,1),%rax
ffffffff8020cdca:	4c 8b 48 08          	mov    0x8(%rax),%r9
ffffffff8020cdce:	66 90                	xchg   %ax,%ax
ffffffff8020cdd0:	89 f8                	mov    %edi,%eax
ffffffff8020cdd2:	48 8b 5d 90          	mov    -0x70(%rbp),%rbx
ffffffff8020cdd6:	89 f9                	mov    %edi,%ecx
ffffffff8020cdd8:	48 8d 34 c5 00 00 00 	lea    0x0(,%rax,8),%rsi
ffffffff8020cddf:	00 
ffffffff8020cde0:	4a 8d 04 0e          	lea    (%rsi,%r9,1),%rax
ffffffff8020cde4:	48 8b 14 18          	mov    (%rax,%rbx,1),%rdx
ffffffff8020cde8:	49 39 d4             	cmp    %rdx,%r12
ffffffff8020cdeb:	73 19                	jae    ffffffff8020ce06 <dump_trace+0xe6>
ffffffff8020cded:	4c 8d 82 00 f0 ff ff 	lea    -0x1000(%rdx),%r8
ffffffff8020cdf4:	4d 39 c4             	cmp    %r8,%r12
ffffffff8020cdf7:	0f 83 0f 02 00 00    	jae    ffffffff8020d00c <dump_trace+0x2ec>
ffffffff8020cdfd:	83 ff 03             	cmp    $0x3,%edi
ffffffff8020ce00:	0f 84 e9 00 00 00    	je     ffffffff8020ceef <dump_trace+0x1cf>
ffffffff8020ce06:	48 ff c7             	inc    %rdi
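
In rough C terms (a sketch, not the actual 2.6.25 source), the prologue
above reads what is evidently the cpu number from %gs-relative pda
memory, loads the _cpu_pda pointer array, and dereferences a field at
offset 0x30 of the selected pda.  The secondary fault (error 0, cr2 30)
is exactly what a NULL _cpu_pda[cpu] would produce at that instruction:

/* Hypothetical reconstruction of the faulting access in dump_trace();
 * the struct layout and field name here are illustrative, not the
 * real x8664_pda definition. */
struct pda_sketch {
	char		pad[0x30];	/* fields before offset 0x30 */
	unsigned long	*stack_ptr;	/* assumed pointer field at offset 0x30 */
};

extern struct pda_sketch **_cpu_pda;	/* array of per-cpu pda pointers */

static unsigned long *pda_stack_ptr(unsigned int cpu)
{
	/* mov 0x880bab(%rip),%rdx	rdx = _cpu_pda
	 * mov (%rdx,%rax,8),%rax	rax = _cpu_pda[cpu]
	 * mov 0x30(%rax),%rcx		faults with cr2 == 0x30 if rax is NULL */
	return _cpu_pda[cpu]->stack_ptr;
}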

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-28 13:54       ` Pavel Machek
@ 2008-04-28 15:52         ` Ingo Molnar
  2008-04-28 17:13         ` H. Peter Anvin
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-04-28 15:52 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mike Travis, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	linux-kernel


* Pavel Machek <pavel@ucw.cz> wrote:

> > > Why is redundant option 'maxsmp' a good idea?
> > 
> > because this way randconfig can trigger it and can configure all the 
> > otherwise randconfig-invisible [or just plain unlikely] numerics and 
> > options up to their max.
> > 
> > I have already found 2-3 "large box" bugs that way.
> 
> Should we improve randconfig to select numeric values when a range is 
> given, too?

definitely - but often there are constraints on the numeric value that 
are not spelled out in the Kconfig language (alignment, granularity, 
etc.).

MAXSMP is still useful, as it would be rather unlikely for all numeric 
values to go to the max at once.  The other benefit is that with it, 
'make allyesconfig' really pushes all build-time limits up to the max.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
  2008-04-28 15:13   ` Ingo Molnar
@ 2008-04-28 16:27     ` Mike Travis
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Travis @ 2008-04-28 16:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, H. Peter Anvin, linux-kernel

Thanks, I'll check it out asap.  It does look like a pda was accessed too
early.  Having "CONFIG_DEBUG_PER_CPU_MAPS" on might help with this.
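
As an illustration only (this is not the actual CONFIG_DEBUG_PER_CPU_MAPS
code, and the flag and helper below are made-up names), a debug-guarded
accessor along these lines would catch a pda lookup that happens before
the per-cpu/pda pointers are populated:

/* Hypothetical sketch of a "too early" pda access check. */
#include <linux/kernel.h>

extern struct x8664_pda **_cpu_pda;
extern int pda_pointers_ready;	/* assumed flag, set once setup is complete */

static inline struct x8664_pda *checked_cpu_pda(unsigned int cpu)
{
#ifdef CONFIG_DEBUG_PER_CPU_MAPS
	if (!pda_pointers_ready || !_cpu_pda[cpu]) {
		printk(KERN_WARNING "cpu_pda(%u) used too early\n", cpu);
		dump_stack();
	}
#endif
	return _cpu_pda[cpu];
}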

Mike

Ingo Molnar wrote:
...
> got this early crash:
> 
>  Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
>  PERCPU: Allocating 32864 bytes of per cpu data
>  PANIC: early exception 0e rip 10:ffffffff80aa9683 error 2 cr2 8
>  Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git  #93
> 
>  Call Trace:
>  PANIC: early exception 0e rip 10:ffffffff8020cd69 error 0 cr2 30
> 
>   http://redhat.com/~mingo/misc/config-Mon_Apr_28_16_20_05_CEST_2008.bad
>   http://redhat.com/~mingo/misc/log-Mon_Apr_28_16_20_05_CEST_2008.bad
> 
> ... from the dreaded randconfig tester :)
> 
> the crash secondary one in dump_trace(), see below. Seems percpu data 
> structures are confused?
> 
> 	Ingo
> 
> [ ~60 lines of dump_trace() disassembly snipped; quoted in full in the parent message above. ]


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus
  2008-04-28 13:54       ` Pavel Machek
  2008-04-28 15:52         ` Ingo Molnar
@ 2008-04-28 17:13         ` H. Peter Anvin
  1 sibling, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2008-04-28 17:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Mike Travis, Andrew Morton, Thomas Gleixner,
	linux-kernel

Pavel Machek wrote:
>> * Pavel Machek <pavel@ucw.cz> wrote:
>>
>>> On Fri 2008-04-25 17:15:49, Mike Travis wrote:
>>>>   * Increase the limit of NR_CPUS to 4096 and introduce a boolean
>>>>     called "MAXSMP" which when set (e.g. "allyesconfig"), will set
>>>>     NR_CPUS = 4096 and NODES_SHIFT = 9 (512).
>>> Why is redundant option 'maxsmp' a good idea?
>> because this way randconfig can trigger it and can configure all the 
>> otherwise randconfig-invisible [or just plain unlikely] numerics and 
>> options up to their max.
>>
>> I have already found 2-3 "large box" bugs that way.
> 
> Should we improve randconfig to select numeric values when a range is
> given, too?
> 								Pavel

Would not be a bad idea; preferably with a logarithmic distribution.
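
For illustration (this is not existing kconfig/randconfig code), a
log-uniform pick over a declared int range could look roughly like the
sketch below, giving small and very large values of e.g. NR_CPUS roughly
equal coverage:

/* Hypothetical sketch: random value in [lo, hi] (lo >= 1) with a roughly
 * logarithmic distribution, as a randconfig extension might use. */
#include <math.h>
#include <stdlib.h>

static long rand_range_log(long lo, long hi)
{
	double llo = log((double)lo);
	double lhi = log((double)hi);
	double u = (double)rand() / RAND_MAX;	/* uniform in [0, 1] */

	return (long)(exp(llo + u * (lhi - llo)) + 0.5);
}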

	-hpa

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-04-28 17:14 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-26  0:15 [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Mike Travis
2008-04-26  0:15 ` [PATCH 01/11] x86: Modify Kconfig to allow up to 4096 cpus Mike Travis
2008-04-27 10:39   ` Pavel Machek
2008-04-28 13:38     ` Ingo Molnar
2008-04-28 13:54       ` Pavel Machek
2008-04-28 15:52         ` Ingo Molnar
2008-04-28 17:13         ` H. Peter Anvin
2008-04-26  0:15 ` [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4 Mike Travis
2008-04-26  0:15 ` [PATCH 03/11] x86: restore pda nodenumber field Mike Travis
2008-04-26  0:15 ` [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map Mike Travis
2008-04-26  0:15 ` [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c Mike Travis
2008-04-26  0:15 ` [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2 Mike Travis
2008-04-26  0:15 ` [PATCH 07/11] x86: remove static boot_cpu_pda array Mike Travis
2008-04-26  0:15 ` [PATCH 08/11] x86: Add performance variants of cpumask operators Mike Travis
2008-04-26  0:15 ` [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr Mike Travis
2008-04-26  0:15 ` [PATCH 10/11] x86: Use performance variant next_cpu_nr Mike Travis
2008-04-26  0:15 ` [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c Mike Travis
2008-04-28 13:42 ` [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded Ingo Molnar
2008-04-28 15:13   ` Ingo Molnar
2008-04-28 16:27     ` Mike Travis
