linux-mm.kvack.org archive mirror
* [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
@ 2010-04-15 17:29 Lee Schermerhorn
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:29 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Use Generic Per-cpu Infrastructure for numa_*_id() V4

Series Against: 2.6.34-rc3-mmotm-100405-1609

Background:

V1 of this series resolved a fairly serious performance problem on our ia64
platforms with memoryless nodes, because SLAB cannot cache objects from a remote
node, even though that node is the effective "local memory node" for a given cpu.
V1 caused no regression on x86_64 [a slight improvement, even] for the admittedly
few tests that I ran.

Christoph Lameter suggested the approach implemented in V2 and later:  define
a new function--numa_mem_id()--that returns the "local memory node" for cpus
attached to memoryless nodes.  Christoph also suggested that, while at it, I
could modify the implementation of numa_node_id() [and the related cpu_to_node()]
to use the generic percpu variable implementation.

While implementing V2, I encountered a circular header dependency between:

	topology.h -> percpu.h -> slab.h -> gfp.h -> topology.h

I resolved this by moving the generic percpu functions to
include/asm-generic/percpu.h so that various arch asm/percpu.h could include
that, and topology.h could include asm/percpu.h to avoid including slab.h,
breaking the circular dependency.  Reviewers didn't like that.  Matthew Wilcox
suggested that I uninline percpu_alloc()/free() for the !SMP config and remove
slab.h from percpu.h.  I tried that.  I broke the build of a LOT of files.  Tejun
Heo mentioned that percpu-defs.h would be a better place for the generic function
definitions.  V3 implemented that suggestion.

Later, Tejun decided to jump in and remove slab.h from percpu.h and semi-
automagically fix up all of the affected modules.  V4 is implemented atop Tejun's
series now in mmotm.  Again, this solves the slab performance problem on our
servers configured with memoryless nodes, and shows no regression with
hackbench on x86_64.  Of course, more performance testing would be welcome.

The slab changes in patch 6 of the series need review with respect to node
hot-plug, which could change the effective "local memory node" for a memoryless
node by inserting a "nearer" node into the zonelists.  An additional patch may
be required to address this.

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 1/8] numa:  add generic percpu var numa_node_id() implementation
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
@ 2010-04-15 17:29 ` Lee Schermerhorn
  2010-04-16 16:43   ` Christoph Lameter
                     ` (2 more replies)
  2010-04-15 17:30 ` [PATCH 2/8] numa: x86_64: use " Lee Schermerhorn
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:29 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

Rework the generic version of the numa_node_id() function to use the
new generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implementation will default this option
to 'y' when NUMA is configured.  This config option could be removed
if/when all archs switch over to the generic percpu implementation
of numa_node_id().  Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
     when NUMA is configured.  This is required because I have
     retained the old implementation by default to allow archs to
     be modified incrementally, as desired.

Subsequent patches will convert x86_64 and ia64 to use this
implementation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

V0:
#  From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
#  Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
#  From: Christoph Lameter <cl@linux-foundation.org>
#  To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#  Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
#
#  I have a very early form of a draft of a patch here that genericizes
#  numa_node_id(). Uses the new generic this_cpu_xxx stuff.
#
#  Not complete.

V1:
  + split out x86 specific changes to subsequent patch
  + split out "numa_mem_id()" and related changes to separate patch
  + moved generic definitions of __this_cpu_xxx from linux/percpu.h
    to asm-generic/percpu.h where asm/percpu.h and other asm hdrs
    can use them.
  + export new percpu symbol 'numa_node' in mm/percpu.c
  + include <asm/percpu.h> in <linux/topology.h> for use by new
    numa_node_id().

V2:
  + add back the #ifndef/#endif guard around numa_node_id() so that archs
    can override generic definition
  + add generic stub for set_numa_node()
  + use generic percpu numa_node_id() only if enabled by
      CONFIG_USE_PERCPU_NUMA_NODE_ID
   to allow incremental per arch support.  This option could be removed when/if
   all archs that support NUMA support this option.

V3:
  + separated the rework of linux/percpu.h into another [preceding] patch.
  + moved definition of the numa_node percpu variable from mm/percpu.c to
    mm/page_alloc.c
  + moved premature definition of cpu_to_mem() to later patch.

V4:
  + topology.h:  include <linux/percpu.h> rather than <linux/percpu-defs.h>
    Requires Tejun Heo's percpu.h/slab.h cleanup series

 include/linux/topology.h |   33 ++++++++++++++++++++++++++++-----
 mm/page_alloc.c          |    5 +++++
 2 files changed, 33 insertions(+), 5 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/mm/page_alloc.c	2010-04-07 10:04:04.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c	2010-04-07 10:10:23.000000000 -0400
@@ -56,6 +56,11 @@
 #include <asm/div64.h>
 #include "internal.h"
 
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DEFINE_PER_CPU(int, numa_node);
+EXPORT_PER_CPU_SYMBOL(numa_node);
+#endif
+
 /*
  * Array of node states.
  */
Index: linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/include/linux/topology.h	2010-04-07 09:49:13.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h	2010-04-07 10:10:23.000000000 -0400
@@ -31,6 +31,7 @@
 #include <linux/bitops.h>
 #include <linux/mmzone.h>
 #include <linux/smp.h>
+#include <linux/percpu.h>
 #include <asm/topology.h>
 
 #ifndef node_has_online_mem
@@ -203,8 +204,35 @@ int arch_update_cpu_topology(void);
 #ifndef SD_NODE_INIT
 #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
 #endif
+
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DECLARE_PER_CPU(int, numa_node);
+
+#ifndef numa_node_id
+/* Returns the number of the current Node. */
+#define numa_node_id()		__this_cpu_read(numa_node)
+#endif
+
+#ifndef cpu_to_node
+#define cpu_to_node(__cpu)	per_cpu(numa_node, (__cpu))
+#endif
+
+#ifndef set_numa_node
+#define set_numa_node(__node) percpu_write(numa_node, __node)
+#endif
+
+#else	/* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
+/* Returns the number of the current Node. */
+#ifndef numa_node_id
+#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
+
+#endif
+
+#endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
@@ -218,9 +246,4 @@ int arch_update_cpu_topology(void);
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
-/* Returns the number of the current Node. */
-#ifndef numa_node_id
-#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
-#endif
-
 #endif /* _LINUX_TOPOLOGY_H */


* [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-16 16:46   ` Christoph Lameter
  2010-04-15 17:30 ` [PATCH 3/8] numa: ia64: " Lee Schermerhorn
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

x86 arch-specific changes to use the generic numa_node_id() based on
the generic percpu variable infrastructure.  Back out x86's custom
version of numa_node_id().

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
[Christoph's signoff here?]

---

V0: based on:
# From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
# Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
# From: Christoph Lameter <cl@linux-foundation.org>
# To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
# Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
#
# I have a very early form of a draft of a patch here that genericizes
# numa_node_id(). Uses the new generic this_cpu_xxx stuff.
#
# Not complete.

V1:
  + split out x86-specific changes from generic.
  + change 'node_number' => 'numa_node' in x86 arch code
  + define __this_cpu_read in x86 asm/percpu.h
  + change x86/kernel/setup_percpu.c to use early_cpu_to_node() to
    setup 'numa_node' as cpu_to_node() now depends on the per cpu var.
    [I think!  What about cpu_to_node() func in x86/mm/numa_64.c ???]

V2:
  + cpu_to_node() => early_cpu_to_node(); incomplete change in V1
  + x86 arch define USE_PERCPU_NUMA_NODE_ID.

V4:
  + remove '__this_cpu_{read|write}() from arch/x86/include/asm/percpu.h.
  + rename cpu_to_node() to __cpu_to_node() in arch/x86/mm/numa_64.c and
    override generic percpu implementation of cpu_to_node() in
    arch/x86/include/asm/topology.h under CONFIG_DEBUG_PER_CPU_MAPS to
    fix build breakage.   [Don't know why we couldn't use the percpu version
    for debugging cpu maps.]

 arch/x86/Kconfig                |    4 ++++
 arch/x86/include/asm/topology.h |   20 +++++++-------------
 arch/x86/kernel/cpu/common.c    |    6 +++---
 arch/x86/kernel/setup_percpu.c  |    4 ++--
 arch/x86/mm/numa_64.c           |    9 +++------
 5 files changed, 19 insertions(+), 24 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/include/asm/topology.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/x86/include/asm/topology.h	2010-04-07 09:49:13.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/include/asm/topology.h	2010-04-07 10:10:25.000000000 -0400
@@ -53,33 +53,27 @@
 extern int cpu_to_node_map[];
 
 /* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
+static inline int early_cpu_to_node(int cpu)
 {
 	return cpu_to_node_map[cpu];
 }
-#define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
 #else /* CONFIG_X86_64 */
 
 /* Mappings between logical cpu number and node number */
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
 
-/* Returns the number of the current Node. */
-DECLARE_PER_CPU(int, node_number);
-#define numa_node_id()		percpu_read(node_number)
-
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-extern int cpu_to_node(int cpu);
+/*
+ * override generic percpu implementation of cpu_to_node
+ */
+extern int __cpu_to_node(int cpu);
+#define cpu_to_node __cpu_to_node
+
 extern int early_cpu_to_node(int cpu);
 
 #else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
-/* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
-{
-	return per_cpu(x86_cpu_to_node_map, cpu);
-}
-
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/x86/mm/numa_64.c	2010-04-07 10:03:41.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/mm/numa_64.c	2010-04-07 10:10:25.000000000 -0400
@@ -33,9 +33,6 @@ int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
-DEFINE_PER_CPU(int, node_number) = 0;
-EXPORT_PER_CPU_SYMBOL(node_number);
-
 /*
  * Map cpu index to node index
  */
@@ -809,7 +806,7 @@ void __cpuinit numa_set_node(int cpu, in
 	per_cpu(x86_cpu_to_node_map, cpu) = node;
 
 	if (node != NUMA_NO_NODE)
-		per_cpu(node_number, cpu) = node;
+		per_cpu(numa_node, cpu) = node;
 }
 
 void __cpuinit numa_clear_node(int cpu)
@@ -867,7 +864,7 @@ void __cpuinit numa_remove_cpu(int cpu)
 	numa_set_cpumask(cpu, 0);
 }
 
-int cpu_to_node(int cpu)
+int __cpu_to_node(int cpu)
 {
 	if (early_per_cpu_ptr(x86_cpu_to_node_map)) {
 		printk(KERN_WARNING
@@ -877,7 +874,7 @@ int cpu_to_node(int cpu)
 	}
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
-EXPORT_SYMBOL(cpu_to_node);
+EXPORT_SYMBOL(__cpu_to_node);
 
 /*
  * Same function as cpu_to_node() but used if called before the
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/kernel/cpu/common.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/x86/kernel/cpu/common.c	2010-04-07 10:03:49.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/kernel/cpu/common.c	2010-04-07 10:10:25.000000000 -0400
@@ -1121,9 +1121,9 @@ void __cpuinit cpu_init(void)
 	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
-	if (cpu != 0 && percpu_read(node_number) == 0 &&
-	    cpu_to_node(cpu) != NUMA_NO_NODE)
-		percpu_write(node_number, cpu_to_node(cpu));
+	if (cpu != 0 && percpu_read(numa_node) == 0 &&
+	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
+		set_numa_node(early_cpu_to_node(cpu));
 #endif
 
 	me = current;
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/x86/kernel/setup_percpu.c	2010-04-07 10:03:49.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/kernel/setup_percpu.c	2010-04-07 10:10:25.000000000 -0400
@@ -265,10 +265,10 @@ void __init setup_per_cpu_areas(void)
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
 	/*
-	 * make sure boot cpu node_number is right, when boot cpu is on the
+	 * make sure boot cpu numa_node is right, when boot cpu is on the
 	 * node that doesn't have mem installed
 	 */
-	per_cpu(node_number, boot_cpu_id) = cpu_to_node(boot_cpu_id);
+	per_cpu(numa_node, boot_cpu_id) = early_cpu_to_node(boot_cpu_id);
 #endif
 
 	/* Setup node to cpumask map */
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/Kconfig
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/x86/Kconfig	2010-04-07 10:10:20.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/x86/Kconfig	2010-04-07 10:10:25.000000000 -0400
@@ -1715,6 +1715,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool X86_64
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER


* [PATCH 3/8] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
  2010-04-15 17:30 ` [PATCH 2/8] numa: x86_64: use " Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-19  2:51   ` KAMEZAWA Hiroyuki
  2010-04-15 17:30 ` [PATCH 4/8] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

ia64:  Use generic percpu implementation of numa_node_id()
   + initialize per cpu 'numa_node'
   + remove ia64 cpu_to_node() macro;  use generic
   + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>

---

New in V2

V3, V4: no change

 arch/ia64/Kconfig                |    4 ++++
 arch/ia64/include/asm/topology.h |    5 -----
 arch/ia64/kernel/smpboot.c       |    6 ++++++
 3 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/kernel/smpboot.c	2010-04-07 10:03:38.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c	2010-04-07 10:10:27.000000000 -0400
@@ -390,6 +390,11 @@ smp_callin (void)
 
 	fix_b0_for_bsp();
 
+	/*
+	 * numa_node_id() works after this.
+	 */
+	set_numa_node(cpu_to_node_map[cpuid]);
+
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);
 	/* Setup the per cpu irq handling data structures */
@@ -632,6 +637,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callin_map);
+	set_numa_node(cpu_to_node_map[smp_processor_id()]);
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	paravirt_post_smp_prepare_boot_cpu();
 }
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/include/asm/topology.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/include/asm/topology.h	2010-04-07 09:49:13.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/include/asm/topology.h	2010-04-07 10:10:27.000000000 -0400
@@ -26,11 +26,6 @@
 #define RECLAIM_DISTANCE 15
 
 /*
- * Returns the number of the node containing CPU 'cpu'
- */
-#define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
-
-/*
  * Returns a bitmask of CPUs on Node 'node'.
  */
 #define cpumask_of_node(node) ((node) == -1 ?				\
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/Kconfig	2010-04-07 10:04:03.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig	2010-04-07 10:10:27.000000000 -0400
@@ -497,6 +497,10 @@ config HAVE_ARCH_NODEDATA_EXTENSION
 	def_bool y
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE


* [PATCH 4/8] numa:  Introduce numa_mem_id()- effective local memory node id
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (2 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 3/8] numa: ia64: " Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-18  3:13   ` Tejun Heo
  2010-04-15 17:30 ` [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define the API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES
is defined; otherwise provide stubs.  Architectures will define
HAVE_MEMORYLESS_NODES if/when they support memoryless nodes.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild.
Archs that support memoryless nodes will need to initialize 'numa_mem' for
secondary cpus as they're brought on-line.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---

V2:  + split this out of Christoph's incomplete "starter patch"
     + flesh out the definition

V3,V4:  no change

 include/asm-generic/topology.h |    3 +++
 include/linux/mmzone.h         |    6 ++++++
 include/linux/topology.h       |   24 ++++++++++++++++++++++++
 mm/page_alloc.c                |   39 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 71 insertions(+), 1 deletion(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/include/linux/topology.h	2010-04-07 10:10:23.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h	2010-04-07 10:10:28.000000000 -0400
@@ -233,6 +233,30 @@ DECLARE_PER_CPU(int, numa_node);
 
 #endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+
+DECLARE_PER_CPU(int, numa_mem);
+
+#ifndef set_numa_mem
+#define set_numa_mem(__node) percpu_write(numa_mem, __node)
+#endif
+
+#else	/* !CONFIG_HAVE_MEMORYLESS_NODES */
+
+#define numa_mem numa_node
+static inline void set_numa_mem(int node) {}
+
+#endif	/* [!]CONFIG_HAVE_MEMORYLESS_NODES */
+
+#ifndef numa_mem_id
+/* Returns the number of the nearest Node with memory */
+#define numa_mem_id()		__this_cpu_read(numa_mem)
+#endif
+
+#ifndef cpu_to_mem
+#define cpu_to_mem(__cpu)	per_cpu(numa_mem, (__cpu))
+#endif
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
Index: linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/mm/page_alloc.c	2010-04-07 10:10:23.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c	2010-04-07 10:10:28.000000000 -0400
@@ -61,6 +61,11 @@ DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+DEFINE_PER_CPU(int, numa_mem);		/* Kernel "local memory" node */
+EXPORT_PER_CPU_SYMBOL(numa_mem);
+#endif
+
 /*
  * Array of node states.
  */
@@ -2752,6 +2757,24 @@ static void build_zonelist_cache(pg_data
 		zlc->z_to_n[z - zonelist->_zonerefs] = zonelist_node_idx(z);
 }
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+/*
+ * Return node id of node used for "local" allocations.
+ * I.e., first node id of first zone in arg node's generic zonelist.
+ * Used for initializing percpu 'numa_mem', which is used primarily
+ * for kernel allocations, so use GFP_KERNEL flags to locate zonelist.
+ */
+int local_memory_node(int node)
+{
+	struct zone *zone;
+
+	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+				   gfp_zone(GFP_KERNEL),
+				   NULL,
+				   &zone);
+	return zone->node;
+}
+#endif
 
 #else	/* CONFIG_NUMA */
 
@@ -2851,9 +2874,23 @@ static int __build_all_zonelists(void *d
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+		/*
+		 * We now know the "local memory node" for each node--
+		 * i.e., the node of the first zone in the generic zonelist.
+		 * Set up numa_mem percpu variable for on-line cpus.  During
+		 * boot, only the boot cpu should be on-line;  we'll init the
+		 * secondary cpus' numa_mem as they come on-line.  During
+		 * node/memory hotplug, we'll fixup all on-line cpus.
+		 */
+		if (cpu_online(cpu))
+			cpu_to_mem(cpu) = local_memory_node(cpu_to_node(cpu));
+#endif
+	}
+
 	return 0;
 }
 
Index: linux-2.6.34-rc3-mmotm-100405-1609/include/linux/mmzone.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/include/linux/mmzone.h	2010-04-07 10:03:46.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/include/linux/mmzone.h	2010-04-07 10:10:28.000000000 -0400
@@ -661,6 +661,12 @@ void memory_present(int nid, unsigned lo
 static inline void memory_present(int nid, unsigned long start, unsigned long end) {}
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+int local_memory_node(int node_id);
+#else
+static inline int local_memory_node(int node_id) { return node_id; };
+#endif
+
 #ifdef CONFIG_NEED_NODE_MEMMAP_SIZE
 unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 #endif
Index: linux-2.6.34-rc3-mmotm-100405-1609/include/asm-generic/topology.h
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/include/asm-generic/topology.h	2010-04-07 09:49:13.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/include/asm-generic/topology.h	2010-04-07 10:10:28.000000000 -0400
@@ -34,6 +34,9 @@
 #ifndef cpu_to_node
 #define cpu_to_node(cpu)	((void)(cpu),0)
 #endif
+#ifndef cpu_to_mem
+#define cpu_to_mem(cpu)		((void)(cpu),0)
+#endif
 #ifndef parent_node
 #define parent_node(node)	((void)(node),0)
 #endif


* [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (3 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 4/8] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-18  3:14   ` Tejun Heo
  2010-04-15 17:30 ` [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

IA64: Support memoryless nodes

Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured
on ia64.  Initialize percpu 'numa_mem' variable when starting
secondary cpus.  Generic initialization will handle the boot
cpu.

Nothing uses 'numa_mem_id()' yet.  A subsequent patch will modify
slab to use it.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

New in V2

V3, V4:  no change

 arch/ia64/Kconfig          |    4 ++++
 arch/ia64/kernel/smpboot.c |    1 +
 2 files changed, 5 insertions(+)

Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/Kconfig	2010-04-07 10:10:27.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig	2010-04-07 10:10:30.000000000 -0400
@@ -501,6 +501,10 @@ config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA
 
+config HAVE_MEMORYLESS_NODES
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE
Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/kernel/smpboot.c	2010-04-07 10:10:27.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c	2010-04-07 10:10:30.000000000 -0400
@@ -394,6 +394,7 @@ smp_callin (void)
 	 * numa_node_id() works after this.
 	 */
 	set_numa_node(cpu_to_node_map[cpuid]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpuid]));
 
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);


* [PATCH 6/8] numa: slab:  use numa_mem_id() for slab local memory node
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (4 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-05-12 18:49   ` Andrew Morton
  2010-04-15 17:30 ` [PATCH 7/8] numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations Lee Schermerhorn
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki


Against:  2.6.34-rc3-mmotm-100405-1609

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless
nodes well.  Specifically, the "fast path"--____cache_alloc()--will
never succeed, as slab doesn't cache off-node objects on the per cpu
queues, and for memoryless nodes all memory will be "off node"
relative to numa_node_id().  This adds significant overhead to all
kmem cache allocations, incurring a significant regression relative
to earlier kernels [from before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to
return the "effective local memory node" for the calling context.
This is the first node in the local node's generic fallback zonelist--
the same node that "local" mempolicy-based allocations would use.
This lets slab cache these "local" allocations and avoid
fallback/refill on every allocation.

N.B.:  Slab will need to handle node and memory hotplug events that
could change the value returned by numa_mem_id() for any given
node if recent changes to address memory hotplug don't already
address this.  E.g., flush all per-cpu slab queues before rebuilding
the zonelists while the "machine" is held in the stopped state.

Performance impact on "hackbench 400 process 200"

2.6.34-rc3-mmotm-100405-1609		no-patch	this-patch
ia64 no memoryless nodes [avg of 10]:     11.713       11.637  ~0.65 diff
ia64 cpus all on memless nodes  [10]:    228.259       26.484  ~8.6x speedup

The slowdown of the patched kernel from ~12 to ~28 seconds when
configured with memoryless nodes is the result of all cpus allocating
from a single node's mm pagepool.  The cache lines of the single node
are distributed/interleaved over the memory of the real physical nodes,
but the zone lock, list heads, ... of the single node with memory still
each live in a single cache line that is accessed from all processors.

x86_64 [8x6 AMD] [avg of 40]:		2.883	   2.845

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

V4:	no change to code.  rebased patch and updated test results
	in description.


 mm/slab.c |   43 ++++++++++++++++++++++---------------------
 1 files changed, 22 insertions(+), 21 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/mm/slab.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/mm/slab.c	2010-04-07 10:04:02.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/mm/slab.c	2010-04-07 10:11:34.000000000 -0400
@@ -844,7 +844,7 @@ static void init_reap_node(int cpu)
 {
 	int node;
 
-	node = next_node(cpu_to_node(cpu), node_online_map);
+	node = next_node(cpu_to_mem(cpu), node_online_map);
 	if (node == MAX_NUMNODES)
 		node = first_node(node_online_map);
 
@@ -1073,7 +1073,7 @@ static inline int cache_free_alien(struc
 	struct array_cache *alien = NULL;
 	int node;
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/*
 	 * Make sure we are not freeing a object from another node to the array
@@ -1106,7 +1106,7 @@ static void __cpuinit cpuup_canceled(lon
 {
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
-	int node = cpu_to_node(cpu);
+	int node = cpu_to_mem(cpu);
 	const struct cpumask *mask = cpumask_of_node(node);
 
 	list_for_each_entry(cachep, &cache_chain, next) {
@@ -1171,7 +1171,7 @@ static int __cpuinit cpuup_prepare(long
 {
 	struct kmem_cache *cachep;
 	struct kmem_list3 *l3 = NULL;
-	int node = cpu_to_node(cpu);
+	int node = cpu_to_mem(cpu);
 	const int memsize = sizeof(struct kmem_list3);
 
 	/*
@@ -1418,7 +1418,7 @@ void __init kmem_cache_init(void)
 	 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
 	 */
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/* 1) create the cache_cache */
 	INIT_LIST_HEAD(&cache_chain);
@@ -2052,7 +2052,7 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
-	cachep->nodelists[numa_node_id()]->next_reap =
+	cachep->nodelists[numa_mem_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
 
@@ -2383,7 +2383,7 @@ static void check_spinlock_acquired(stru
 {
 #ifdef CONFIG_SMP
 	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[numa_node_id()]->list_lock);
+	assert_spin_locked(&cachep->nodelists[numa_mem_id()]->list_lock);
 #endif
 }
 
@@ -2410,7 +2410,7 @@ static void do_drain(void *arg)
 {
 	struct kmem_cache *cachep = arg;
 	struct array_cache *ac;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
@@ -2943,7 +2943,7 @@ static void *cache_alloc_refill(struct k
 
 retry:
 	check_irq_off();
-	node = numa_node_id();
+	node = numa_mem_id();
 	ac = cpu_cache_get(cachep);
 	batchcount = ac->batchcount;
 	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -3147,7 +3147,7 @@ static void *alternate_node_alloc(struct
 
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
-	nid_alloc = nid_here = numa_node_id();
+	nid_alloc = nid_here = numa_mem_id();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
@@ -3209,7 +3209,7 @@ retry:
 		if (local_flags & __GFP_WAIT)
 			local_irq_enable();
 		kmem_flagcheck(cache, flags);
-		obj = kmem_getpages(cache, local_flags, numa_node_id());
+		obj = kmem_getpages(cache, local_flags, numa_mem_id());
 		if (local_flags & __GFP_WAIT)
 			local_irq_disable();
 		if (obj) {
@@ -3316,6 +3316,7 @@ __cache_alloc_node(struct kmem_cache *ca
 {
 	unsigned long save_flags;
 	void *ptr;
+	int slab_node = numa_mem_id();
 
 	flags &= gfp_allowed_mask;
 
@@ -3328,7 +3329,7 @@ __cache_alloc_node(struct kmem_cache *ca
 	local_irq_save(save_flags);
 
 	if (nodeid == -1)
-		nodeid = numa_node_id();
+		nodeid = slab_node;
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
 		/* Node not bootstrapped yet */
@@ -3336,7 +3337,7 @@ __cache_alloc_node(struct kmem_cache *ca
 		goto out;
 	}
 
-	if (nodeid == numa_node_id()) {
+	if (nodeid == slab_node) {
 		/*
 		 * Use the locally cached objects if possible.
 		 * However ____cache_alloc does not allow fallback
@@ -3380,8 +3381,8 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp)
+		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
 
   out:
 	return objp;
@@ -3478,7 +3479,7 @@ static void cache_flusharray(struct kmem
 {
 	int batchcount;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	batchcount = ac->batchcount;
 #if DEBUG
@@ -3923,7 +3924,7 @@ static int do_tune_cpucache(struct kmem_
 		return -ENOMEM;
 
 	for_each_online_cpu(i) {
-		new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
+		new->new[i] = alloc_arraycache(cpu_to_mem(i), limit,
 						batchcount, gfp);
 		if (!new->new[i]) {
 			for (i--; i >= 0; i--)
@@ -3945,9 +3946,9 @@ static int do_tune_cpucache(struct kmem_
 		struct array_cache *ccold = new->new[i];
 		if (!ccold)
 			continue;
-		spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
-		free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
-		spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
+		spin_lock_irq(&cachep->nodelists[cpu_to_mem(i)]->list_lock);
+		free_block(cachep, ccold->entry, ccold->avail, cpu_to_mem(i));
+		spin_unlock_irq(&cachep->nodelists[cpu_to_mem(i)]->list_lock);
 		kfree(ccold);
 	}
 	kfree(new);
@@ -4053,7 +4054,7 @@ static void cache_reap(struct work_struc
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 	struct delayed_work *work = to_delayed_work(w);
 
 	if (!mutex_trylock(&cache_chain_mutex))


* [PATCH 7/8] numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (5 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-15 17:30 ` [PATCH 8/8] numa: update Documentation/vm/numa, add memoryless node info Lee Schermerhorn
  2010-04-18  3:19 ` [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Tejun Heo
  8 siblings, 0 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

Patch:  in-kernel profiling -- support memoryless nodes.

In-kernel profiling requires that we be able to allocate "local"
memory for each cpu.  Use "cpu_to_mem()" instead of "cpu_to_node()"
to support memoryless nodes.

Depends on the "numa_mem_id()" patch.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

New in V3.

V4: No change

 kernel/profile.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/kernel/profile.c
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/kernel/profile.c	2010-04-07 10:04:02.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/kernel/profile.c	2010-04-07 10:11:38.000000000 -0400
@@ -363,7 +363,7 @@ static int __cpuinit profile_cpu_callbac
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		node = cpu_to_node(cpu);
+		node = cpu_to_mem(cpu);
 		per_cpu(cpu_profile_flip, cpu) = 0;
 		if (!per_cpu(cpu_profile_hits, cpu)[1]) {
 			page = alloc_pages_exact_node(node,
@@ -565,7 +565,7 @@ static int create_hash_tables(void)
 	int cpu;
 
 	for_each_online_cpu(cpu) {
-		int node = cpu_to_node(cpu);
+		int node = cpu_to_mem(cpu);
 		struct page *page;
 
 		page = alloc_pages_exact_node(node,


* [PATCH 8/8] numa:  update Documentation/vm/numa, add memoryless node info
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (6 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 7/8] numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations Lee Schermerhorn
@ 2010-04-15 17:30 ` Lee Schermerhorn
  2010-04-15 18:00   ` Randy Dunlap
  2010-04-16  0:50   ` KAMEZAWA Hiroyuki
  2010-04-18  3:19 ` [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Tejun Heo
  8 siblings, 2 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-15 17:30 UTC (permalink / raw)
  To: linux-mm, linux-numa
  Cc: Tejun Heo, Mel Gorman, Andi Kleen, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Against:  2.6.34-rc3-mmotm-100405-1609

Kamezawa Hiroyuki requested documentation for the numa_mem_id()
and slab related changes.  He suggested Documentation/vm/numa for
this documentation.  Looking at this file, it seems to me to be
hopelessly out of date relative to current Linux NUMA support.
At the risk of going down a rathole, I have made an attempt to
rewrite the doc at a slightly higher level [I think] and provide
pointers to other in-tree documents and out-of-tree man pages that
cover the details.

Let the games begin.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

New in V4.

 Documentation/vm/numa |  184 +++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 146 insertions(+), 38 deletions(-)

Index: linux-2.6.34-rc3-mmotm-100405-1609/Documentation/vm/numa
===================================================================
--- linux-2.6.34-rc3-mmotm-100405-1609.orig/Documentation/vm/numa	2010-04-07 09:49:13.000000000 -0400
+++ linux-2.6.34-rc3-mmotm-100405-1609/Documentation/vm/numa	2010-04-07 10:11:40.000000000 -0400
@@ -1,41 +1,149 @@
 Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>
 
-The intent of this file is to have an uptodate, running commentary 
-from different people about NUMA specific code in the Linux vm.
+What is NUMA?
 
-What is NUMA? It is an architecture where the memory access times
-for different regions of memory from a given processor varies
-according to the "distance" of the memory region from the processor.
-Each region of memory to which access times are the same from any 
-cpu, is called a node. On such architectures, it is beneficial if
-the kernel tries to minimize inter node communications. Schemes
-for this range from kernel text and read-only data replication
-across nodes, and trying to house all the data structures that
-key components of the kernel need on memory on that node.
-
-Currently, all the numa support is to provide efficient handling
-of widely discontiguous physical memory, so architectures which 
-are not NUMA but can have huge holes in the physical address space
-can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.
-
-The initial port includes NUMAizing the bootmem allocator code by
-encapsulating all the pieces of information into a bootmem_data_t
-structure. Node specific calls have been added to the allocator. 
-In theory, any platform which uses the bootmem allocator should 
-be able to put the bootmem and mem_map data structures anywhere
-it deems best.
-
-Each node's page allocation data structures have also been encapsulated
-into a pg_data_t. The bootmem_data_t is just one part of this. To 
-make the code look uniform between NUMA and regular UMA platforms, 
-UMA platforms have a statically allocated pg_data_t too (contig_page_data).
-For the sake of uniformity, the function num_online_nodes() is also defined
-for all platforms. As we run benchmarks, we might decide to NUMAize 
-more variables like low_on_memory, nr_free_pages etc into the pg_data_t.
-
-The NUMA aware page allocation code currently tries to allocate pages 
-from different nodes in a round robin manner.  This will be changed to 
-do concentratic circle search, starting from current node, once the 
-NUMA port achieves more maturity. The call alloc_pages_node has been 
-added, so that drivers can make the call and not worry about whether 
-it is running on a NUMA or UMA platform.
+This question can be answered from a couple of perspectives:  the
+hardware view and the Linux software view.
+
+From the hardware perspective, a NUMA system is a computer platform that
+comprises multiple components or assemblies each of which may contain 0
+or more cpus, local memory, and/or IO buses.  For brevity and to
+disambiguate the hardware view of these physical components/assemblies
+from the software abstraction thereof, we'll call the components/assemblies
+'cells' in this document.
+
+Each of the 'cells' may be viewed as an SMP [symmetric multi-processor] subset
+of the system--although some components necessary for a stand-alone SMP system
+may not be populated on any given cell.   The cells of the NUMA system are
+connected together with some sort of system interconnect--e.g., a crossbar or
+point-to-point link are common types of NUMA system interconnects.  Both of
+these types of interconnects can be aggregated to create NUMA platforms with
+cells at multiple distances from other cells.
+
+For Linux, the NUMA platforms of interest are primarily what is known as Cache
+Coherent NUMA or CCNuma systems.   With CCNUMA systems, all memory is visible
+to and accessible from any cpu attached to any cell and cache coherency
+is handled in hardware by the processor caches and/or the system interconnect.
+
+Memory access time and effective memory bandwidth varies depending on how far
+away the cell containing the cpu or io bus making the memory access is from the
+cell containing the target memory.  For example, access to memory by cpus
+attached to the same cell will experience faster access times and higher
+bandwidths than accesses to memory on other, remote cells.  NUMA platforms
+can have cells at multiple remote distances from any given cell.
+
+Platform vendors don't build NUMA systems just to make software developers'
+lives interesting.  Rather, this architecture is a means to provide scalable
+memory bandwidth.  However, to achieve scalable memory bandwidth, system and
+application software must arrange for a large majority of the memory references
+[cache misses] to be to "local" memory--memory on the same cell, if any--or
+to the closest cell with memory.
+
+This leads to the Linux software view of a NUMA system:
+
+Linux divides the system's hardware resources into multiple software
+abstractions called "nodes".  Linux maps the nodes onto the physical cells
+of the hardware platform, abstracting away some of the details for some
+architectures.  As with physical cells, software nodes may contain 0 or more
+cpus, memory and/or IO buses.  And, again, memory access times to memory on
+"closer" nodes [nodes that map to closer cells] will generally experience
+faster access times and higher effective bandwidth than accesses to more
+remote cells.
+
+For some architectures, such as x86, Linux will "hide" any node representing a
+physical cell that has no memory attached, and reassign any cpus attached to
+that cell to a node representing a cell that does have memory.  Thus, on
+these architectures, one cannot assume that all cpus that Linux associates with
+a given node will see the same local memory access times and bandwidth.
+
+In addition, for some architectures, again x86 is an example, Linux supports
+the emulation of additional nodes.  For NUMA emulation, linux will carve up
+the existing nodes--or the system memory for non-NUMA platforms--into multiple
+nodes.  Each emulated node will manage a fraction of the underlying cells'
+physical memory.  Numa emluation is useful for testing NUMA kernel and
+application features on non-NUMA platforms, and as a sort of memory resource
+management mechanism when used together with cpusets.
+[See Documentation/cgroups/cpusets.txt]
+
+For each node with memory, Linux constructs an independent memory management
+subsystem, complete with its own free page lists, in-use page lists, usage
+statistics and locks to mediate access.  In addition, Linux constructs for
+each memory zone [one or more of DMA, DMA32, NORMAL, HIGH_MEMORY, MOVABLE],
+an ordered "zonelist".  A zonelist specifies the zones/nodes to visit when a
+selected zone/node cannot satisfy the allocation request.  This situation,
+when a zone's has no available memory to satisfy a request, is called
+'overflow" or "fallback".
+
+Because some nodes contain multiple zones containing different types of
+memory, Linux must decide whether to order the zonelists such that allocations
+fall back to the same zone type on a different node, or to a different zone
+type on the same node.  This is an important consideration because some zones,
+such as DMA or DMA32, represent relatively scarce resources.  Linux chooses
+a default zonelist order based on the sizes of the various zone types relative
+to the total memory of the node and the total memory of the system.  The
+default zonelist order may be overridden using the numa_zonelist_order kernel
+boot parameter or sysctl.  [See Documentation/kernel-parameters.txt and
+Documentation/sysctl/vm.txt]
+
+By default, Linux will attempt to satisfy memory allocation requests from the
+node to which the cpu that executes the request is assigned.  Specifically,
+Linux will attempt to allocate from the first node in the appropriate zonelist
+for the node where the request originates.  This is called "local allocation."
+If the "local" node cannot satisfy the request, the kernel will examine other
+nodes' zones in the selected zonelist looking for the first zone in the list
+that can satisfy the request.
+
+Local allocation will tend to keep subsequent access to the allocated memory
+"local" to the underlying physical resources and off the system interconnect--
+as long as the task on whose behalf the kernel allocated some memory does not
+later migrate away from that memory.  The Linux scheduler is aware of the
+NUMA topology of the platform--embodied in the "scheduling domains" data
+structures [See Documentation/scheduler/sched-domains.txt]--and the scheduler
+attempts to minimize task migration to distant scheduling domains.  However,
+the scheduler does not take a task's NUMA footprint into account directly.
+Thus, under sufficient imbalance, tasks can migrate between nodes, remote
+from their initial node and kernel data structures.
+
+System administrators and application designers can restrict a tasks migration
+to improve NUMA locality using various cpu affinity command line interfaces,
+such as taskset(1) and numactl(1), and program interfaces such as
+sched_setaffinity(2).  Further, one can modify the kernel's default local
+allocation behavior using Linux NUMA memory policy.
+[See Documentation/vm/numa_memory_policy.]
+
+System administrators can restrict the cpus and nodes' memories that a non-
+privileged user can specify in the scheduling or NUMA commands and functions
+using control groups and cpusets.  [See Documentation/cgroups/cpusets.txt]
+
+On architectures that do not hide memoryless nodes, Linux will include only
+zones [nodes] with memory in the zonelists.  This means that for a memoryless
+node the "local memory node"--the node of the first zone in cpu's node's
+zonelist--will not be the node itself.  Rather, it will be the node that the
+kernel selected as the nearest node with memory when it built the zonelists.
+So, by default, local allocations will succeed with the kernel supplying the
+closest available memory.  This is a consequence of the same mechanism that
+allows such allocations to fallback to other nearby nodes when a node that
+does contain memory overflows.
+
+Some kernel allocations do not want or cannot tolerate this allocation fallback
+behavior.  Rather they want to be sure they get memory from the specified node
+or get notified that the node has no free memory.  This is usually the case when
+a subsystem allocates per cpu memory resources, for example.
+
+A typical model for making such an allocation is to obtain the node id of the
+node to which the "current cpu" is attached using one of the kernel's
+numa_node_id() or cpu_to_node() functions and then request memory from only
+the node id returned.  When such an allocation fails, the requesting subsystem
+may revert to its own fallback path.  The slab kernel memory allocator is an
+example of this.  Or, the subsystem may choose to disable or not to enable
+itself on allocation failure.  The kernel profiling subsystem is an example of
+this.
+
+If the architecture supports [does not hide] memoryless nodes, then cpus
+attached to memoryless nodes would always incur the fallback path overhead
+or some subsystems would fail to initialize if they attempted to allocate
+memory exclusively from a node without memory.  To support such
+architectures transparently, kernel subsystems can use the numa_mem_id()
+or cpu_to_mem() function to locate the "local memory node" for the calling or
+specified cpu.  Again, this is the same node from which default, local page
+allocations will be attempted.


* Re: [PATCH 8/8] numa:  update Documentation/vm/numa, add memoryless node info
  2010-04-15 17:30 ` [PATCH 8/8] numa: update Documentation/vm/numa, add memoryless node info Lee Schermerhorn
@ 2010-04-15 18:00   ` Randy Dunlap
  2010-04-16  0:50   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 29+ messages in thread
From: Randy Dunlap @ 2010-04-15 18:00 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, Andi Kleen, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton, KAMEZAWA Hiroyuki

On Thu, 15 Apr 2010 13:30:42 -0400 Lee Schermerhorn wrote:

> Against:  2.6.34-rc3-mmotm-100405-1609
> 
> Kamezawa Hiroyuki requested documentation for the numa_mem_id()
> and slab related changes.  He suggested Documentation/vm/numa for
> this documentation.  Looking at this file, it seems to me to be
> hopelessly out of date relative to current Linux NUMA support.
> At the risk of going down a rathole, I have made an attempt to
> rewrite the doc at a slightly higher level [I think] and provide
> pointers to other in-tree documents and out-of-tree man pages that
> cover the details.
> 
> Let the games begin.

OK.

> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
> ---
> 
> New in V4.
> 
>  Documentation/vm/numa |  184 +++++++++++++++++++++++++++++++++++++++-----------
>  1 files changed, 146 insertions(+), 38 deletions(-)
> 
> Index: linux-2.6.34-rc3-mmotm-100405-1609/Documentation/vm/numa
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/Documentation/vm/numa	2010-04-07 09:49:13.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/Documentation/vm/numa	2010-04-07 10:11:40.000000000 -0400
> @@ -1,41 +1,149 @@
>  Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>
>  
> -The intent of this file is to have an uptodate, running commentary 
> -from different people about NUMA specific code in the Linux vm.
> +What is NUMA?
>  
...
> +This question can be answered from a couple of perspectives:  the
> +hardware view and the Linux software view.
> +
> +From the hardware perspective, a NUMA system is a computer platform that
> +comprises multiple components or assemblies each of which may contain 0
> +or more cpus, local memory, and/or IO buses.  For brevity and to
> +disambiguate the hardware view of these physical components/assemblies
> +from the software abstraction thereof, we'll call the components/assemblies
> +'cells' in this document.
> +
> +Each of the 'cells' may be viewed as an SMP [symmetric multi-processor] subset
> +of the system--although some components necessary for a stand-alone SMP system
> +may not be populated on any given cell.   The cells of the NUMA system are
> +connected together with some sort of system interconnect--e.g., a crossbar or
> +point-to-point link are common types of NUMA system interconnects.  Both of
> +these types of interconnects can be aggregated to create NUMA platforms with
> +cells at multiple distances from other cells.
> +
> +For Linux, the NUMA platforms of interest are primarily what is known as Cache
> +Coherent NUMA or CCNuma systems.   With CCNUMA systems, all memory is visible
> +to and accessible from any cpu attached to any cell and cache coherency
> +is handled in hardware by the processor caches and/or the system interconnect.
> +

CCNuma or CCNUMA ?

Please spell "cpu" as "CPU" (or plural: CPUs).
and "io" as "IO".

> +Memory access time and effective memory bandwidth varies depending on how far
> +away the cell containing the cpu or io bus making the memory access is from the
> +cell containing the target memory.  For example, access to memory by cpus
> +attached to the same cell will experience faster access times and higher
> +bandwidths than accesses to memory on other, remote cells.  NUMA platforms
> +can have cells at multiple remote distances from any given cell.
> +
> +Platform vendors don't build NUMA systems just to make software developers'
> +lives interesting.  Rather, this architecture is a means to provide scalable
> +memory bandwidth.  However, to achieve scalable memory bandwidth, system and
> +application software must arrange for a large majority of the memory references
> +[cache misses] to be to "local" memory--memory on the same cell, if any--or
> +to the closest cell with memory.
> +
> +This leads to the Linux software view of a NUMA system:
> +
> +Linux divides the system's hardware resources into multiple software
> +abstractions called "nodes".  Linux maps the nodes onto the physical cells
> +of the hardware platform, abstracting away some of the details for some
> +architectures.  As with physical cells, software nodes may contain 0 or more
> +cpus, memory and/or IO buses.  And, again, memory access times to memory on
> +"closer" nodes [nodes that map to closer cells] will generally experience
> +faster access times and higher effective bandwidth than accesses to more
> +remote cells.
> +
> +For some architectures, such as x86, Linux will "hide" any node representing a
> +physical cell that has no memory attached, and reassign any cpus attached to
> +that cell to a node representing a cell that does have memory.  Thus, on
> +these architectures, one cannot assume that all cpus that Linux associates with
> +a given node will see the same local memory access times and bandwidth.
> +
> +In addition, for some architectures, again x86 is an example, Linux supports
> +the emulation of additional nodes.  For NUMA emulation, linux will carve up
> +the existing nodes--or the system memory for non-NUMA platforms--into multiple
> +nodes.  Each emulated node will manage a fraction of the underlying cells'
> +physical memory.  Numa emluation is useful for testing NUMA kernel and

                     NUMA

> +application features on non-NUMA platforms, and as a sort of memory resource
> +management mechanism when used together with cpusets.
> +[See Documentation/cgroups/cpusets.txt]
> +
> +For each node with memory, Linux constructs an independent memory management
> +subsystem, complete with its own free page lists, in-use page lists, usage
> +statistics and locks to mediate access.  In addition, Linux constructs for
> +each memory zone [one or more of DMA, DMA32, NORMAL, HIGH_MEMORY, MOVABLE],
> +an ordered "zonelist".  A zonelist specifies the zones/nodes to visit when a
> +selected zone/node cannot satisfy the allocation request.  This situation,
> +when a zone's has no available memory to satisfy a request, is called

          zone

> +'overflow" or "fallback".

   "overflow"

> +
> +Because some nodes contain multiple zones containing different types of
> +memory, Linux must decide whether to order the zonelists such that allocations
> +fall back to the same zone type on a different node, or to a different zone
> +type on the same node.  This is an important consideration because some zones,
> +such as DMA or DMA32, represent relatively scarce resources.  Linux chooses
> +a default zonelist order based on the sizes of the various zone types relative
> +to the total memory of the node and the total memory of the system.  The
> +default zonelist order may be overridden using the numa_zonelist_order kernel
> +boot parameter or sysctl.  [See Documentation/kernel-parameters.txt and
> +Documentation/sysctl/vm.txt]
> +
> +By default, Linux will attempt to satisfy memory allocation requests from the
> +node to which the cpu that executes the request is assigned.  Specifically,
> +Linux will attempt to allocate from the first node in the appropriate zonelist
> +for the node where the request originates.  This is called "local allocation."
> +If the "local" node cannot satisfy the request, the kernel will examine other
> +nodes' zones in the selected zonelist looking for the first zone in the list
> +that can satisfy the request.
> +
> +Local allocation will tend to keep subsequent access to the allocated memory
> +"local" to the underlying physical resources and off the system interconnect--
> +as long as the task on whose behalf the kernel allocated some memory does not
> +later migrate away from that memory.  The Linux scheduler is aware of the
> +NUMA topology of the platform--embodied in the "scheduling domains" data
> +structures [See Documentation/scheduler/sched-domains.txt]--and the scheduler

               see

> +attempts to minimize task migration to distant scheduling domains.  However,
> +the scheduler does not take a task's NUMA footprint into account directly.
> +Thus, under sufficient imbalance, tasks can migrate between nodes, remote
> +from their initial node and kernel data structures.
> +
> +System administrators and application designers can restrict a tasks migration

                                                                  task's

> +to improve NUMA locality using various cpu affinity command line interfaces,
> +such as taskset(1) and numactl(1), and program interfaces such as
> +sched_setaffinity(2).  Further, one can modify the kernel's default local
> +allocation behavior using Linux NUMA memory policy.
> +[See Documentation/vm/numa_memory_policy.]
> +
> +System administrators can restrict the cpus and nodes' memories that a non-
> +privileged user can specify in the scheduling or NUMA commands and functions
> +using control groups and cpusets.  [See Documentation/cgroups/cpusets.txt]
> +
> +On architectures that do not hide memoryless nodes, Linux will include only
> +zones [nodes] with memory in the zonelists.  This means that for a memoryless
> +node the "local memory node"--the node of the first zone in cpu's node's
> +zonelist--will not be the node itself.  Rather, it will be the node that the
> +kernel selected as the nearest node with memory when it built the zonelists.
> +So, default, local allocations will succeed with the kernel supplying the
> +closest available memory.  This is a consequence of the same mechanism that
> +allows such allocations to fallback to other nearby nodes when a node that
> +does contain memory overflows.
> +
> +Some kernel allocations do not want or cannot tolerate this allocation fallback
> +behavior.  Rather they want to be sure they get memory from the specified node
> +or get notified that the node has no free memory.  This is usually the case when
> +a subsystem allocates per cpu memory resources, for example.
> +
> +A typical model for making such an allocation is to obtain the node id of the
> +node to which the "current cpu" is attached using one of the kernel's
> +numa_node_id() or cpu_to_node() functions and then request memory from only
> +the node id returned.  When such an allocation fails, the requesting subsystem
> +may revert to its own fallback path.  The slab kernel memory allocator is an
> +example of this.  Or, the subsystem may chose to disable or not to enable

                                           choose

> +itself on allocation failure.  The kernel profiling subsystem is an example of
> +this.
> +
> +If the architecture supports [does not hide] memoryless nodes, then cpus
> +attached to memoryless nodes would always incur the fallback path overhead
> +or some subsystems would fail to initialize if they attempted to allocate
> +memory exclusively from a node without memory.  To support such
> +architectures transparently, kernel subsystems can use the numa_mem_id()
> +or cpu_to_mem() function to locate the "local memory node" for the calling or
> +specified cpu.  Again, this is the same node from which default, local page
> +allocations will be attempted.
> 
> --


Nice update, thanks.

---
~Randy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/8] numa:  update Documentation/vm/numa, add memoryless node info
  2010-04-15 17:30 ` [PATCH 8/8] numa: update Documentation/vm/numa, add memoryless node info Lee Schermerhorn
  2010-04-15 18:00   ` Randy Dunlap
@ 2010-04-16  0:50   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-16  0:50 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton

On Thu, 15 Apr 2010 13:30:42 -0400
Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:

> Against:  2.6.34-rc3-mmotm-100405-1609
> 
> Kamezawa Hiroyuki requested documentation for the numa_mem_id()
> and slab related changes.  He suggested Documentation/vm/numa for
> this documentation.  Looking at this file, it seems to me to be
> hopelessly out of date relative to current Linux NUMA support.
> At the risk of going down a rathole, I have made an attempt to
> rewrite the doc at a slightly higher level [I think] and provide
> pointers to other in-tree documents and out-of-tree man pages that
> cover the details.
> 
> Let the games begin.
> 
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 

Thank you, it seems very nice and covers almost the whole range we have to
explain to newcomers.
My eyes can't check all the details, but... ;)

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I think this patch itself is very good.

Being more greedy...

Hmm, from user's view, I feel quick guide of

/sys/devices/system/node/
and 
 /sys/devices/system/node/node0/numastat 
can be added somewhere. (Documentation/numastat.txt is not under /vm :( )

And one more important? thing.

[kamezawa@firextal Documentation]$ cat /sys/bus/pci/devices/0000\:00\:01.0/numa_node
-1

PCI devices (and others?) have a numa_node attribute, if locality information
is available.  I hear some people have had to be aware of a NIC's locality to
do high-throughput network transactions.  So "how to get a device's locality
via sysfs" is worth writing up.

And mentioning what "nid = -1" means may help newcomers.

Thanks,
-Kame


* Re: [PATCH 1/8] numa:  add generic percpu var numa_node_id() implementation
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
@ 2010-04-16 16:43   ` Christoph Lameter
  2010-04-16 20:33   ` Andrew Morton
  2010-04-19  2:32   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 29+ messages in thread
From: Christoph Lameter @ 2010-04-16 16:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, Andi, Kleen, andi,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



* Re: [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-15 17:30 ` [PATCH 2/8] numa: x86_64: use " Lee Schermerhorn
@ 2010-04-16 16:46   ` Christoph Lameter
  2010-04-18  2:56     ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Lameter @ 2010-04-16 16:46 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, Andi, Kleen, andi,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

On Thu, 15 Apr 2010, Lee Schermerhorn wrote:

> x86 arch specific changes to use generic numa_node_id() based on
> generic percpu variable infrastructure.  Back out x86's custom
> version of numa_node_id()
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> [Christoph's signoff here?]

Hmmm. It's mostly your work now. Maybe Reviewed-by will be ok?

> @@ -809,7 +806,7 @@ void __cpuinit numa_set_node(int cpu, in
>  	per_cpu(x86_cpu_to_node_map, cpu) = node;
>
>  	if (node != NUMA_NO_NODE)
> -		per_cpu(node_number, cpu) = node;
> +		per_cpu(numa_node, cpu) = node;
>  }

Maybe provide a generic function to set the node for cpu X?


* Re: [PATCH 1/8] numa:  add generic percpu var numa_node_id() implementation
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
  2010-04-16 16:43   ` Christoph Lameter
@ 2010-04-16 20:33   ` Andrew Morton
  2010-04-19 13:22     ` Lee Schermerhorn
  2010-04-19  2:32   ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2010-04-16 20:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	KAMEZAWA Hiroyuki, linux-arch

On Thu, 15 Apr 2010 13:29:56 -0400
Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:

> Rework the generic version of the numa_node_id() function to use the
> new generic percpu variable infrastructure.
> 
> Guard the new implementation with a new config option:
> 
>         CONFIG_USE_PERCPU_NUMA_NODE_ID.
> 
> Archs which support this new implemention will default this option
> to 'y' when NUMA is configured.  This config option could be removed
> if/when all archs switch over to the generic percpu implementation
> of numa_node_id().  Arch support involves:
> 
>   1) converting any existing per cpu variable implementations to use
>      this implementation.  x86_64 is an instance of such an arch.
>   2) archs that don't use a per cpu variable for numa_node_id() will
>      need to initialize the new per cpu variable "numa_node" as cpus
>      are brought on-line.  ia64 is an example.
>   3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
>      when NUMA is configured.  This is required because I have
>      retained the old implementation by default to allow archs to
>      be modified incrementally, as desired.
> 
> Subsequent patches will convert x86_64 and ia64 to use this
> implemenation.

So which arches _aren't_ converted?  powerpc, sparc and alpha?

Is there sufficient info here for the maintainers to be able to
perform the conversion with minimal head-scratching?


* Re: [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-16 16:46   ` Christoph Lameter
@ 2010-04-18  2:56     ` Tejun Heo
  2010-04-29 16:56       ` Lee Schermerhorn
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2010-04-18  2:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Lee Schermerhorn, linux-mm, linux-numa, Mel Gorman, Andi, Kleen,
	andi, Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

On 04/17/2010 01:46 AM, Christoph Lameter wrote:
> Maybe provide a generic function to set the node for cpu X?

Yeap, seconded.  Also, why not use numa_node_id() in
common.c::cpu_init()?

Thanks.

-- 
tejun


* Re: [PATCH 4/8] numa:  Introduce numa_mem_id()- effective local memory node id
  2010-04-15 17:30 ` [PATCH 4/8] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
@ 2010-04-18  3:13   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2010-04-18  3:13 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Mel Gorman, Andi, Kleen, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton, KAMEZAWA Hiroyuki

On 04/16/2010 02:30 AM, Lee Schermerhorn wrote:
> +#ifdef CONFIG_HAVE_MEMORYLESS_NODES
> +
> +DECLARE_PER_CPU(int, numa_mem);
> +
> +#ifndef set_numa_mem
> +#define set_numa_mem(__node) percpu_write(numa_mem, __node)
> +#endif
> +
> +#else	/* !CONFIG_HAVE_MEMORYLESS_NODES */
> +
> +#define numa_mem numa_node

Please make it a macro which takes arguments or an inline function.
Name substitutions like this can easily lead to pretty strange
problems when they end up substituting local variable names.

> +static inline void set_numa_mem(int node) {}

and maybe it's a good idea to make the above one emit a warning if the
given node id doesn't match the cpu's numa node id?  Also, in general,
setting the numa id (cpu or mem) isn't a hot path and it would be better
to take both the cpu and the node id as arguments, i.e.,

  set_numa_mem(unsigned int cpu, int node).

> +#endif	/* [!]CONFIG_HAVE_MEMORYLESS_NODES */
> +
> +#ifndef numa_mem_id
> +/* Returns the number of the nearest Node with memory */
> +#define numa_mem_id()		__this_cpu_read(numa_mem)
> +#endif
> +
> +#ifndef cpu_to_mem
> +#define cpu_to_mem(__cpu)	per_cpu(numa_mem, (__cpu))
> +#endif

Isn't cpu_to_mem() too generic?  Maybe it's a good idea to put 'numa'
or 'node' in the name?

> +#ifdef CONFIG_HAVE_MEMORYLESS_NODES
> +		/*
> +		 * We now know the "local memory node" for each node--
> +		 * i.e., the node of the first zone in the generic zonelist.
> +		 * Set up numa_mem percpu variable for on-line cpus.  During
> +		 * boot, only the boot cpu should be on-line;  we'll init the
> +		 * secondary cpus' numa_mem as they come on-line.  During
> +		 * node/memory hotplug, we'll fixup all on-line cpus.
> +		 */
> +		if (cpu_online(cpu))
> +			cpu_to_mem(cpu) = local_memory_node(cpu_to_node(cpu));

Please make cpu_to_node() evaluate to a rvalue and use set_numa_mem()
to set node.  The above is a bit too easy to get wrong when archs
override the macro.

> +#ifdef CONFIG_HAVE_MEMORYLESS_NODES
> +int local_memory_node(int node_id);
> +#else
> +static inline int local_memory_node(int node_id) { return node_id; };
> +#endif

Hmmm... can there be local_memory_node() users when MEMORYLESS_NODES
is not enabled?

Thanks.

-- 
tejun


* Re: [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes
  2010-04-15 17:30 ` [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
@ 2010-04-18  3:14   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2010-04-18  3:14 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Mel Gorman, Andi, Kleen, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton, KAMEZAWA Hiroyuki

Hello,

On 04/16/2010 02:30 AM, Lee Schermerhorn wrote:
> Against:  2.6.34-rc3-mmotm-100405-1609
> 
> IA64: Support memoryless nodes
> 
> Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured
                                                     ^is
> on ia64.  Initialize percpu 'numa_mem' variable when starting
> secondary cpus.  Generic initialization will handle the boot
> cpu.
> 
> Nothing uses 'numa_mem_id()' yet.  Subsequent patch with modify
                                                      will
> slab to use this.

Thanks.

-- 
tejun


* Re: [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (7 preceding siblings ...)
  2010-04-15 17:30 ` [PATCH 8/8] numa: update Documentation/vm/numa, add memoryless node info Lee Schermerhorn
@ 2010-04-18  3:19 ` Tejun Heo
  2010-04-19 13:29   ` Lee Schermerhorn
  8 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2010-04-18  3:19 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Mel Gorman, Andi, Kleen, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton, KAMEZAWA Hiroyuki

On 04/16/2010 02:29 AM, Lee Schermerhorn wrote:
> Use Generic Per cpu infrastructure for numa_*_id() V4
> 
> Series Against: 2.6.34-rc3-mmotm-100405-1609

Other than the minor nitpicks, the patchset looks great to me.
Through which tree should this be routed?  If no one else is gonna
take it, I can route it through percpu after patchset refresh.

Thanks.

-- 
tejun


* Re: [PATCH 1/8] numa:  add generic percpu var numa_node_id() implementation
  2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
  2010-04-16 16:43   ` Christoph Lameter
  2010-04-16 20:33   ` Andrew Morton
@ 2010-04-19  2:32   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-19  2:32 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton

On Thu, 15 Apr 2010 13:29:56 -0400
Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:

> Against:  2.6.34-rc3-mmotm-100405-1609
> 
> Rework the generic version of the numa_node_id() function to use the
> new generic percpu variable infrastructure.
> 
> Guard the new implementation with a new config option:
> 
>         CONFIG_USE_PERCPU_NUMA_NODE_ID.
> 
> Archs which support this new implemention will default this option
> to 'y' when NUMA is configured.  This config option could be removed
> if/when all archs switch over to the generic percpu implementation
> of numa_node_id().  Arch support involves:
> 
>   1) converting any existing per cpu variable implementations to use
>      this implementation.  x86_64 is an instance of such an arch.
>   2) archs that don't use a per cpu variable for numa_node_id() will
>      need to initialize the new per cpu variable "numa_node" as cpus
>      are brought on-line.  ia64 is an example.
>   3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
>      when NUMA is configured.  This is required because I have
>      retained the old implementation by default to allow archs to
>      be modified incrementally, as desired.
> 
> Subsequent patches will convert x86_64 and ia64 to use this
> implemenation.
> 
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> 
> ---
> 
> V0:
> #  From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
> #  Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
> #  From: Christoph Lameter <cl@linux-foundation.org>
> #  To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> #  Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
> #
> #  I have a very early form of a draft of a patch here that genericizes
> #  numa_node_id(). Uses the new generic this_cpu_xxx stuff.
> #
> #  Not complete.
> 
> V1:
>   + split out x86 specific changes to subsequent patch
>   + split out "numa_mem_id()" and related changes to separate patch
>   + moved generic definitions of __this_cpu_xxx from linux/percpu.h
>     to asm-generic/percpu.h where asm/percpu.h and other asm hdrs
>     can use them.
>   + export new percpu symbol 'numa_node' in mm/percpu.h
>   + include <asm/percpu.h> in <linux/topology.h> for use by new
>     numa_node_id().
> 
> V2:
>   + add back the #ifndef/#endif guard around numa_node_id() so that archs
>     can override generic definition
>   + add generic stub for set_numa_node()
>   + use generic percpu numa_node_id() only if enabled by
>       CONFIG_USE_PERCPU_NUMA_NODE_ID
>    to allow incremental per arch support.  This option could be removed when/if
>    all archs that support NUMA support this option.
> 
> V3:
>   + separated the rework of linux/percpu.h into another [preceding] patch.
>   + moved definition of the numa_node percpu variable from mm/percpu.c to
>     mm/page-alloc.c
>   + moved premature definition of cpu_to_mem() to later patch.
> 
> V4:
>   + topology.h:  include <linux/percpu.h> rather than <linux/percpu-defs.h>
>     Requires Tejun Heo's percpu.h/slab.h cleanup series
> 
>  include/linux/topology.h |   33 ++++++++++++++++++++++++++++-----
>  mm/page_alloc.c          |    5 +++++
>  2 files changed, 33 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/mm/page_alloc.c	2010-04-07 10:04:04.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/mm/page_alloc.c	2010-04-07 10:10:23.000000000 -0400
> @@ -56,6 +56,11 @@
>  #include <asm/div64.h>
>  #include "internal.h"
>  
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +DEFINE_PER_CPU(int, numa_node);
> +EXPORT_PER_CPU_SYMBOL(numa_node);
> +#endif
> +
>  /*
>   * Array of node states.
>   */
> Index: linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/include/linux/topology.h	2010-04-07 09:49:13.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/include/linux/topology.h	2010-04-07 10:10:23.000000000 -0400
> @@ -31,6 +31,7 @@
>  #include <linux/bitops.h>
>  #include <linux/mmzone.h>
>  #include <linux/smp.h>
> +#include <linux/percpu.h>
>  #include <asm/topology.h>
>  
>  #ifndef node_has_online_mem
> @@ -203,8 +204,35 @@ int arch_update_cpu_topology(void);
>  #ifndef SD_NODE_INIT
>  #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
>  #endif
> +
>  #endif /* CONFIG_NUMA */
>  
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +DECLARE_PER_CPU(int, numa_node);
> +
> +#ifndef numa_node_id
> +/* Returns the number of the current Node. */
> +#define numa_node_id()		__this_cpu_read(numa_node)
> +#endif
> +
> +#ifndef cpu_to_node
> +#define cpu_to_node(__cpu)	per_cpu(numa_node, (__cpu))
> +#endif
> +
> +#ifndef set_numa_node
> +#define set_numa_node(__node) percpu_write(numa_node, __node)
> +#endif
> +
> +#else	/* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
> +
> +/* Returns the number of the current Node. */
> +#ifndef numa_node_id
> +#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
> +
> +#endif
> +
> +#endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
> +
>  #ifndef topology_physical_package_id
>  #define topology_physical_package_id(cpu)	((void)(cpu), -1)
>  #endif
> @@ -218,9 +246,4 @@ int arch_update_cpu_topology(void);
>  #define topology_core_cpumask(cpu)		cpumask_of(cpu)
>  #endif
>  
> -/* Returns the number of the current Node. */
> -#ifndef numa_node_id
> -#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
> -#endif
> -
>  #endif /* _LINUX_TOPOLOGY_H */
> 
> 


* Re: [PATCH 3/8] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2010-04-15 17:30 ` [PATCH 3/8] numa: ia64: " Lee Schermerhorn
@ 2010-04-19  2:51   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 29+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-04-19  2:51 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Andrew Morton

On Thu, 15 Apr 2010 13:30:09 -0400
Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:

> Against:  2.6.34-rc3-mmotm-100405-1609
> 
> ia64:  Use generic percpu implementation of numa_node_id()
>    + intialize per cpu 'numa_node'
>    + remove ia64 cpu_to_node() macro;  use generic
>    + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured
> 
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
> 

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

BTW, could you add some explanation about when numa_node_id() becomes available?

IIUC,
 - BOOT cpu ...  after smp_prepare_boot_cpu()
 - Other cpu ..  after smp_init() (i.e. always.)

Right ? I'm sorry if it's well-known.

Thanks,
-Kame



> ---
> 
> New in V2
> 
> V3, V4: no change
> 
>  arch/ia64/Kconfig                |    4 ++++
>  arch/ia64/include/asm/topology.h |    5 -----
>  arch/ia64/kernel/smpboot.c       |    6 ++++++
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/kernel/smpboot.c	2010-04-07 10:03:38.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/kernel/smpboot.c	2010-04-07 10:10:27.000000000 -0400
> @@ -390,6 +390,11 @@ smp_callin (void)
>  
>  	fix_b0_for_bsp();
>  
> +	/*
> +	 * numa_node_id() works after this.
> +	 */
> +	set_numa_node(cpu_to_node_map[cpuid]);
> +
>  	ipi_call_lock_irq();
>  	spin_lock(&vector_lock);
>  	/* Setup the per cpu irq handling data structures */
> @@ -632,6 +637,7 @@ void __devinit smp_prepare_boot_cpu(void
>  {
>  	cpu_set(smp_processor_id(), cpu_online_map);
>  	cpu_set(smp_processor_id(), cpu_callin_map);
> +	set_numa_node(cpu_to_node_map[smp_processor_id()]);
>  	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
>  	paravirt_post_smp_prepare_boot_cpu();
>  }
> Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/include/asm/topology.h
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/include/asm/topology.h	2010-04-07 09:49:13.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/include/asm/topology.h	2010-04-07 10:10:27.000000000 -0400
> @@ -26,11 +26,6 @@
>  #define RECLAIM_DISTANCE 15
>  
>  /*
> - * Returns the number of the node containing CPU 'cpu'
> - */
> -#define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
> -
> -/*
>   * Returns a bitmask of CPUs on Node 'node'.
>   */
>  #define cpumask_of_node(node) ((node) == -1 ?				\
> Index: linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig
> ===================================================================
> --- linux-2.6.34-rc3-mmotm-100405-1609.orig/arch/ia64/Kconfig	2010-04-07 10:04:03.000000000 -0400
> +++ linux-2.6.34-rc3-mmotm-100405-1609/arch/ia64/Kconfig	2010-04-07 10:10:27.000000000 -0400
> @@ -497,6 +497,10 @@ config HAVE_ARCH_NODEDATA_EXTENSION
>  	def_bool y
>  	depends on NUMA
>  
> +config USE_PERCPU_NUMA_NODE_ID
> +	def_bool y
> +	depends on NUMA
> +
>  config ARCH_PROC_KCORE_TEXT
>  	def_bool y
>  	depends on PROC_KCORE
> 


* Re: [PATCH 1/8] numa:  add generic percpu var numa_node_id() implementation
  2010-04-16 20:33   ` Andrew Morton
@ 2010-04-19 13:22     ` Lee Schermerhorn
  0 siblings, 0 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-19 13:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	KAMEZAWA Hiroyuki, linux-arch

On Fri, 2010-04-16 at 13:33 -0700, Andrew Morton wrote:
> On Thu, 15 Apr 2010 13:29:56 -0400
> Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:
> 
> > Rework the generic version of the numa_node_id() function to use the
> > new generic percpu variable infrastructure.
> > 
> > Guard the new implementation with a new config option:
> > 
> >         CONFIG_USE_PERCPU_NUMA_NODE_ID.
> > 
> > Archs which support this new implemention will default this option
> > to 'y' when NUMA is configured.  This config option could be removed
> > if/when all archs switch over to the generic percpu implementation
> > of numa_node_id().  Arch support involves:
> > 
> >   1) converting any existing per cpu variable implementations to use
> >      this implementation.  x86_64 is an instance of such an arch.
> >   2) archs that don't use a per cpu variable for numa_node_id() will
> >      need to initialize the new per cpu variable "numa_node" as cpus
> >      are brought on-line.  ia64 is an example.
> >   3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
> >      when NUMA is configured.  This is required because I have
> >      retained the old implementation by default to allow archs to
> >      be modified incrementally, as desired.
> > 
> > Subsequent patches will convert x86_64 and ia64 to use this
> > implemenation.
> 
> So which arches _aren't_ converted?  powerpc, sparc and alpha?

Right.  Plus ARM, mips, ...

I could take a cut at other archs, but can't test them.  I'm hoping that
this patch doesn't break the existing implementation for them.  It
should be a no-op until the new support is enabled via Kconfig.  The
fact that both x86_64 and ia64 build with just this patch gives me some
hope but not a lot of confidence.

I see that you've merged the series into -mm.  We'll see what
happens.  Of course, no reports of errors could just mean no testing.

> 
> Is there sufficient info here for the maintainers to be able to
> perform the conversion with minimal head-scratching?

Arch maintainers will need to chime in on that.  I'd hoped that the list
above and the examples of x86_64 and ia64 in the subsequent patches
would suffice.

Lee


* Re: [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2010-04-18  3:19 ` [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Tejun Heo
@ 2010-04-19 13:29   ` Lee Schermerhorn
  0 siblings, 0 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-19 13:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-mm, linux-numa, Mel Gorman, andi, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

On Sun, 2010-04-18 at 12:19 +0900, Tejun Heo wrote:
> On 04/16/2010 02:29 AM, Lee Schermerhorn wrote:
> > Use Generic Per cpu infrastructure for numa_*_id() V4
> > 
> > Series Against: 2.6.34-rc3-mmotm-100405-1609
> 
> Other than the minor nitpicks, the patchset looks great to me.
> Through which tree should this be routed?  If no one else is gonna
> take it, I can route it through percpu after patchset refresh.

Andrew has merged this set into the -mm tree.  I think that's fine and
will proceed to address all of the comments there as incremental
patches.

I have comments/requests from yourself:

2/8:  seconding Christoph's suggestion re: a generic function to set
the per cpu node id; plus the suggestion to use numa_node_id() in
common.c::cpu_init().

4/8:  lose the "#define numa_mem numa_node".  I'll need to rework this.
Currently, one can access the per cpu variable 'numa_node' directly as
such.  I added 'numa_mem' [actually got it from Christoph's starter
patch] as an analog to numa_node.  Christoph and I wanted to eliminate
the redundant variable when it wasn't needed, but not break code that
directly accesses it.  Maybe it's better not to provide it at all?

5/8:  wording error in patch description.

Randy D and Kamezawa-san:  comments on documentation patch

Kame-san:  request for clarification in 3/8

Thanks,
Lee






* Re: [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-18  2:56     ` Tejun Heo
@ 2010-04-29 16:56       ` Lee Schermerhorn
  2010-04-30  4:58         ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-04-29 16:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, linux-mm, linux-numa, Mel Gorman, andi,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

On Sun, 2010-04-18 at 11:56 +0900, Tejun Heo wrote:
> On 04/17/2010 01:46 AM, Christoph Lameter wrote:
> > Maybe provide a generic function to set the node for cpu X?
> 
> Yeap, seconded.  Also, why not use numa_node_id() in
> common.c::cpu_init()?

Tejun:  do you mean:

#ifdef CONFIG_NUMA
        if (cpu != 0 && percpu_read(numa_node) == 0 &&
........................^ here?
            early_cpu_to_node(cpu) != NUMA_NO_NODE)
                set_numa_node(early_cpu_to_node(cpu));
#endif

Looks like 'numa_node_id()' would work there.

But, I wonder what the "cpu != 0 && percpu_read(numa_node) == 0" is
trying to do?

E.g., is "cpu != 0" testing "cpu != boot_cpu_id"?  Is there an implicit
assumption that the boot cpu is zero?  Or just a non-zero cpuid is
obviously initialized?

And the "percpu_read(numa_node) == 0" is testing that this cpu's
'numa_node' MAY not be initialized?  0 is a valid node id for !0 cpu
ids.  But it's OK to reinitialize numa_node in that case.

Just trying to grok the intent.  Maybe someone will chime in.

Anyway, if the intent is to test the percpu 'numa_node' for
initialization, using numa_node_id() might obscure this even more.

Lee


* Re: [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-29 16:56       ` Lee Schermerhorn
@ 2010-04-30  4:58         ` Tejun Heo
  2010-05-02  1:49           ` Christoph Lameter
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2010-04-30  4:58 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Christoph Lameter, linux-mm, linux-numa, Mel Gorman, andi,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

Hello,

On 04/29/2010 06:56 PM, Lee Schermerhorn wrote:
> Tejun:  do you mean:
> 
> #ifdef CONFIG_NUMA
>         if (cpu != 0 && percpu_read(numa_node) == 0 &&
> ........................^ here?
>             early_cpu_to_node(cpu) != NUMA_NO_NODE)
>                 set_numa_node(early_cpu_to_node(cpu));
> #endif
> 
> Looks like 'numa_node_id()' would work there.

Yeah, it just looked weird to use raw variable when an access wrapper
is there.

> But, I wonder what the "cpu != 0 && percpu_read(numa_node) == 0" is
> trying to do?

That I don't have any clue about.  :-)

> Just trying to grok the intent.  Maybe someone will chime in.

Christoph?  Mel?

Thanks.

-- 
tejun


* Re: [PATCH 2/8] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2010-04-30  4:58         ` Tejun Heo
@ 2010-05-02  1:49           ` Christoph Lameter
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Lameter @ 2010-05-02  1:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lee Schermerhorn, linux-mm, linux-numa, Mel Gorman, andi,
	Nick Piggin, David Rientjes, eric.whitney, Andrew Morton,
	KAMEZAWA Hiroyuki

On Fri, 30 Apr 2010, Tejun Heo wrote:

> Hello,
>
> On 04/29/2010 06:56 PM, Lee Schermerhorn wrote:
> > Tejun:  do you mean:
> >
> > #ifdef CONFIG_NUMA
> >         if (cpu != 0 && percpu_read(numa_node) == 0 &&
> > ........................^ here?
> >             early_cpu_to_node(cpu) != NUMA_NO_NODE)
> >                 set_numa_node(early_cpu_to_node(cpu));
> > #endif
> >
> > Looks like 'numa_node_id()' would work there.
>
> Yeah, it just looked weird to use raw variable when an access wrapper
> is there.
>
> > But, I wonder what the "cpu != 0 && percpu_read(numa_node) == 0" is
> > trying to do?
>
> That I don't have any clue about.  :-)

I guess that cpu 0 is used for booting and is initialized early, when
certain functionality is not yet available.


* Re: [PATCH 6/8] numa: slab:  use numa_mem_id() for slab local memory node
  2010-04-15 17:30 ` [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
@ 2010-05-12 18:49   ` Andrew Morton
  2010-05-12 19:11     ` Lee Schermerhorn
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2010-05-12 18:49 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, Andi, Kleen, andi,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	KAMEZAWA Hiroyuki

I have a note here that this patch "breaks slab.c".  But I don't recall what
the problem was and I don't see a fix against this patch in your recently-sent
fixup series?


* Re: [PATCH 6/8] numa: slab:  use numa_mem_id() for slab local memory node
  2010-05-12 18:49   ` Andrew Morton
@ 2010-05-12 19:11     ` Lee Schermerhorn
  2010-05-12 19:25       ` Valdis.Kletnieks
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Schermerhorn @ 2010-05-12 19:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-numa, Tejun Heo, Mel Gorman, Andi Kleen,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	KAMEZAWA Hiroyuki, Valdis.Kletnieks

On Wed, 2010-05-12 at 11:49 -0700, Andrew Morton wrote:
> I have a note here that this patch "breaks slab.c".  But I don't recall what
> the problem was and I don't see a fix against this patch in your recently-sent
> fixup series?

Is that Valdis Kletnieks' issue?  That was an i386 build.  It happened
because the earlier patches didn't properly default numa_mem_id() to
numa_node_id() for the i386 build.  The rework of those patches has
fixed that.  I have successfully built mmotm with the rework patches
for i386+!NUMA.  Valdis tested the series and confirmed that it fixed
the problem.

Lee


* Re: [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node
  2010-05-12 19:11     ` Lee Schermerhorn
@ 2010-05-12 19:25       ` Valdis.Kletnieks
  2010-05-12 20:03         ` Lee Schermerhorn
  0 siblings, 1 reply; 29+ messages in thread
From: Valdis.Kletnieks @ 2010-05-12 19:25 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andrew Morton, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney, KAMEZAWA Hiroyuki

On Wed, 12 May 2010 15:11:43 EDT, Lee Schermerhorn said:
> On Wed, 2010-05-12 at 11:49 -0700, Andrew Morton wrote:
> > I have a note here that this patch "breaks slab.c".  But I don't recall what
> > the problem was and I don't see a fix against this patch in your recently-sent
> > fixup series?
> 
> Is that Valdis Kletnieks' issue?  That was an i386 build.  Happened
> because the earlier patches didn't properly default numa_mem_id() to
> numa_node_id() for the i386 build.  The rework to those patches has
> fixed that.   I have successfully built mmotm with the rework patches
> for i386+!NUMA.  Valdis tested the series and confirmed that it fixed
> the problem.

I thought the problem was common to both i386 and x86_64 non-NUMA (which
is where I hit it).  In any case, it builds OK for me now.


* Re: [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node
  2010-05-12 19:25       ` Valdis.Kletnieks
@ 2010-05-12 20:03         ` Lee Schermerhorn
  0 siblings, 0 replies; 29+ messages in thread
From: Lee Schermerhorn @ 2010-05-12 20:03 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andrew Morton, linux-mm, linux-numa, Tejun Heo, Mel Gorman,
	Andi Kleen, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney, KAMEZAWA Hiroyuki

On Wed, 2010-05-12 at 15:25 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 12 May 2010 15:11:43 EDT, Lee Schermerhorn said:
> > On Wed, 2010-05-12 at 11:49 -0700, Andrew Morton wrote:
> > > I have a note here that this patch "breaks slab.c".  But I don't recall what
> > > the problem was and I don't see a fix against this patch in your recently-sent
> > > fixup series?
> > 
> > Is that Valdis Kletnieks' issue?  That was an i386 build.  Happened
> > because the earlier patches didn't properly default numa_mem_id() to
> > numa_node_id() for the i386 build.  The rework to those patches has
> > fixed that.   I have successfully built mmotm with the rework patches
> > for i386+!NUMA.  Valdis tested the series and confirmed that it fixed
> > the problem.
> 
> I thought the problem was common to both i386 and X86_64 non-NUMA (which is
> where I hit the problem). In any case, builds OK for me now.

The x86_64 !NUMA issue was another one I introduced in the rework, in
the first version of patch 1/7 that you tested.  It's fixed in the
current version.

It happened because x86_64 defines its own fallback for numa_node_id().
See the description of patch 1/7.  It turns out x86_64 builds fine with
NUMA or !NUMA if I just remove the !NUMA numa_node_id() definition.
I'll submit that patch shortly.

Lee



Thread overview: 29+ messages
2010-04-15 17:29 [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
2010-04-15 17:29 ` [PATCH 1/8] numa: add generic percpu var numa_node_id() implementation Lee Schermerhorn
2010-04-16 16:43   ` Christoph Lameter
2010-04-16 20:33   ` Andrew Morton
2010-04-19 13:22     ` Lee Schermerhorn
2010-04-19  2:32   ` KAMEZAWA Hiroyuki
2010-04-15 17:30 ` [PATCH 2/8] numa: x86_64: use " Lee Schermerhorn
2010-04-16 16:46   ` Christoph Lameter
2010-04-18  2:56     ` Tejun Heo
2010-04-29 16:56       ` Lee Schermerhorn
2010-04-30  4:58         ` Tejun Heo
2010-05-02  1:49           ` Christoph Lameter
2010-04-15 17:30 ` [PATCH 3/8] numa: ia64: " Lee Schermerhorn
2010-04-19  2:51   ` KAMEZAWA Hiroyuki
2010-04-15 17:30 ` [PATCH 4/8] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
2010-04-18  3:13   ` Tejun Heo
2010-04-15 17:30 ` [PATCH 5/8] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
2010-04-18  3:14   ` Tejun Heo
2010-04-15 17:30 ` [PATCH 6/8] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
2010-05-12 18:49   ` Andrew Morton
2010-05-12 19:11     ` Lee Schermerhorn
2010-05-12 19:25       ` Valdis.Kletnieks
2010-05-12 20:03         ` Lee Schermerhorn
2010-04-15 17:30 ` [PATCH 7/8] numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations Lee Schermerhorn
2010-04-15 17:30 ` [PATCH 8/8] numa: update Documentation/vm/numa, add memoryless node info Lee Schermerhorn
2010-04-15 18:00   ` Randy Dunlap
2010-04-16  0:50   ` KAMEZAWA Hiroyuki
2010-04-18  3:19 ` [PATCH 0/8] Numa: Use Generic Per-cpu Variables for numa_*_id() Tejun Heo
2010-04-19 13:29   ` Lee Schermerhorn
