* [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id()
@ 2009-11-13 21:17 Lee Schermerhorn
  2009-11-13 21:17 ` [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id() Lee Schermerhorn
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:17 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

PATCH/RFC - 0/6 - numa:  Use generic per-cpu variables for numa_*_id()

In http://marc.info/?l=linux-mm&m=125683610312546&w=4 , I described a
performance problem with slab and memoryless nodes that we see on some
of our platforms.  I proposed modifying slab to use the "effective local
memory node"--the node that local mempolicy would select--as the "local"
node id for slab allocation purposes.  This will allow slab to cache objects
from its "local memory node" on the percpu queues, effectively eliminating
the problem.

Christoph Lameter suggested a more general approach using the generic
percpu support:  define a new interface--e.g., numa_mem_id()--that returns
the "effective local memory node" for the calling context [cpu].  For
nodes with memory, this is the id of the node itself.  For memoryless
nodes, it is the first node in the node's generic [!this_node] zonelist.
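
As a rough illustration (not code from this series), the "effective local
memory node" of a possibly-memoryless node can be derived from its generic
zonelist along these lines, assuming the zonelist helpers in this kernel
tree; the node owning the first zone in that list is the nearest node with
memory:

/*
 * Illustrative sketch only:  the "effective local memory node" of
 * @node is the node owning the first zone in @node's generic
 * (!__GFP_THISNODE) zonelist.  For a node with memory this is @node
 * itself; for a memoryless node it is the nearest node with memory.
 */
static int effective_memory_node(int node)
{
	struct zone *zone;

	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
				   gfp_zone(GFP_KERNEL), NULL, &zone);
	return zone->node;
}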

Christoph also suggested converting the current "numa_node_id()" interface
to use the generic percpu infrastructure.  x86[_64] supports a custom [arch-
specific] per cpu variable implementation of numa_node_id().  Most other
archs do a table lookup.
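
The shape of that change, in terms of the generic definitions this series
touches (patch 1 below carries the full diff):

/* before: generic fallback, a table lookup keyed by the current cpu */
#define numa_node_id()	(cpu_to_node(raw_smp_processor_id()))

/* after, under CONFIG_USE_PERCPU_NUMA_NODE_ID: a plain per-cpu read */
DECLARE_PER_CPU(int, numa_node);
#define numa_node_id()	__this_cpu_read(numa_node)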

This series introduces a generic percpu implementation of numa_node_id()
and numa_mem_id() in separate patches based on an incomplete "starter patch"
from Christoph.  Each implementation is conditional on its own new config
option.  I know that new config options aren't popular, but this
allows other archs to adapt to the new implementations incrementally.

Additional patches provide x86_64 and ia64 arch specific changes to use the
new numa_node_id() implementation, and ia64 support for the numa_mem_id()
interface.  Finally, I've reimplemented the "slab memoryless node 'regression'
fix" patch linked above atop the new numa_mem_id() interface.

Ad hoc measurements on x86_64 using:  hackbench 400 process 200

2.6.32-rc5+mmotm-091101		no patch	this series
x86_64, avg of 40 runs:		  4.605s	  4.628s  (~0.5% slower)

ia64 showed ~1.2% longer run time with the series applied.


* [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
@ 2009-11-13 21:17 ` Lee Schermerhorn
  2009-11-20 15:46   ` Christoph Lameter
  2009-11-13 21:17 ` [PATCH/RFC 2/6] numa: x86_64: use generic percpu var numa_node_id() implementation Lee Schermerhorn
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:17 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

Rework the generic version of the numa_node_id() function to use the
new generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implementation will default this option
to 'y' when NUMA is configured.  This config option could be removed
if/when all archs switch over to the generic percpu implementation
of numa_node_id().  Arch support involves:

1) converting any existing per cpu variable implementations to use
   this implementation.  x86_64 is an instance of such an arch.
2) archs that don't use a per cpu variable for numa_node_id() will
   need to initialize the new per cpu variable "numa_node" as cpus
   are brought on-line.  ia64 is an example (sketched below).

Subsequent patches will convert x86_64 and ia64 to use this
implementation.
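
A minimal sketch of the case-2 arch hook (modeled on the ia64 conversion
later in this series; the helper name and the cpu->node table are
placeholders, and the exact call site is arch-specific):

/*
 * Sketch only:  runs on the cpu being brought up, early in its startup
 * path, and seeds the generic 'numa_node' per-cpu variable from the
 * arch's own cpu->node table -- not from the generic cpu_to_node(),
 * which may itself read 'numa_node'.
 */
static void __cpuinit init_numa_node_for_this_cpu(void)
{
	set_numa_node(arch_cpu_to_node_map[smp_processor_id()]);
	/* numa_node_id() is valid on this cpu from here on */
}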


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
[Christoph's signoff here?]

V0:
#  From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
#  Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
#  From: Christoph Lameter <cl@linux-foundation.org>
#  To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#  Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
#
#  I have a very early form of a draft of a patch here that genericizes
#  numa_node_id(). Uses the new generic this_cpu_xxx stuff.
#
#  Not complete.

V1:
  + split out x86 specific changes to subsequent patch
  + split out "numa_mem_id()" and related changes to separate patch
  + moved generic definitions of __this_cpu_xxx from linux/percpu.h
    to asm-generic/percpu.h where asm/percpu.h and other asm hdrs
    can use them.
  + export new percpu symbol 'numa_node' in mm/percpu.c
  + include <asm/percpu.h> in <linux/topology.h> for use by new
    numa_node_id().

V2:
  + add back the #ifndef/#endif guard around numa_node_id() so that archs
    can override generic definition
  + add generic stub for set_numa_node()
  + use generic percpu numa_node_id() only if enabled by
    CONFIG_USE_PERCPU_NUMA_NODE_ID, to allow incremental per-arch support.
    This option could be removed when/if all archs that support NUMA
    support this option.

 include/asm-generic/percpu.h   |  456 ++++++++++++++++++++++++++++++++++++++++-
 include/asm-generic/topology.h |    3 
 include/linux/percpu.h         |  454 ----------------------------------------
 include/linux/topology.h       |   32 ++
 mm/percpu.c                    |    8 
 5 files changed, 493 insertions(+), 460 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/topology.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
@@ -203,8 +203,35 @@ int arch_update_cpu_topology(void);
 #ifndef SD_NODE_INIT
 #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
 #endif
+
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DECLARE_PER_CPU(int, numa_node);
+
+#ifndef numa_node_id
+/* Returns the number of the current Node. */
+#define numa_node_id()		__this_cpu_read(numa_node)
+#endif
+
+#ifndef cpu_to_node
+#define cpu_to_node(__cpu)	per_cpu(numa_node, (__cpu))
+#endif
+
+#ifndef set_numa_node
+#define set_numa_node(__node) percpu_write(numa_node, __node)
+#endif
+
+#else	/* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
+/* Returns the number of the current Node. */
+#ifndef numa_node_id
+#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
+
+#endif
+
+#endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
@@ -218,9 +245,4 @@ int arch_update_cpu_topology(void);
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
-/* Returns the number of the current Node. */
-#ifndef numa_node_id
-#define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
-#endif
-
 #endif /* _LINUX_TOPOLOGY_H */
Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/percpu.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
@@ -2070,3 +2070,11 @@ void __init setup_per_cpu_areas(void)
 		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
 }
 #endif /* CONFIG_HAVE_SETUP_PER_CPU_AREA */
+
+/* NUMA Setup */
+
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+DEFINE_PER_CPU(int, numa_node);
+EXPORT_PER_CPU_SYMBOL(numa_node);
+#endif
+
Index: linux-2.6.32-rc5-mmotm-091101-1001/include/asm-generic/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/asm-generic/topology.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/asm-generic/topology.h
@@ -34,6 +34,9 @@
 #ifndef cpu_to_node
 #define cpu_to_node(cpu)	((void)(cpu),0)
 #endif
+#ifndef cpu_to_mem
+#define cpu_to_mem(cpu)		((void)(cpu),0)
+#endif
 #ifndef parent_node
 #define parent_node(node)	((void)(node),0)
 #endif
Index: linux-2.6.32-rc5-mmotm-091101-1001/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/asm-generic/percpu.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/asm-generic/percpu.h
@@ -63,11 +63,465 @@ extern unsigned long __per_cpu_offset[NR
 #define this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, my_cpu_offset)
 #define __this_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
 
-
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 extern void setup_per_cpu_areas(void);
 #endif
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it and its size should
+ * equal char, int or long.  percpu_read() evaluates to a lvalue and
+ * all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(var) *pr_ptr__ = &(var);					\
+	typeof(var) pr_ret__;						\
+	pr_ret__ = get_cpu_var(*pr_ptr__);				\
+	put_cpu_var(*pr_ptr__);						\
+	pr_ret__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	typeof(var) *pgto_ptr__ = &(var);				\
+	get_cpu_var(*pgto_ptr__) op val;				\
+	put_cpu_var(*pgto_ptr__);					\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
+/*
+ * Branching function to split up a function into a set of functions that
+ * are called for different scalar sizes of the objects handled.
+ */
+
+extern void __bad_size_call_parameter(void);
+
+#define __pcpu_size_call_return(stem, variable)				\
+({	typeof(variable) pscr_ret__;					\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+	case 1: pscr_ret__ = stem##1(variable);break;			\
+	case 2: pscr_ret__ = stem##2(variable);break;			\
+	case 4: pscr_ret__ = stem##4(variable);break;			\
+	case 8: pscr_ret__ = stem##8(variable);break;			\
+	default:							\
+		__bad_size_call_parameter();break;			\
+	}								\
+	pscr_ret__;							\
+})
+
+#define __pcpu_size_call(stem, variable, ...)				\
+do {									\
+	__verify_pcpu_ptr(&(variable));					\
+	switch(sizeof(variable)) {					\
+		case 1: stem##1(variable, __VA_ARGS__);break;		\
+		case 2: stem##2(variable, __VA_ARGS__);break;		\
+		case 4: stem##4(variable, __VA_ARGS__);break;		\
+		case 8: stem##8(variable, __VA_ARGS__);break;		\
+		default: 						\
+			__bad_size_call_parameter();break;		\
+	}								\
+} while (0)
+
+/*
+ * Optimized manipulation for memory allocated through the per cpu
+ * allocator or for addresses of per cpu variables.
+ *
+ * These operation guarantee exclusivity of access for other operations
+ * on the *same* processor. The assumption is that per cpu data is only
+ * accessed by a single processor instance (the current one).
+ *
+ * The first group is used for accesses that must be done in a
+ * preemption safe way since we know that the context is not preempt
+ * safe. Interrupts may occur. If the interrupt modifies the variable
+ * too then RMW actions will not be reliable.
+ *
+ * The arch code can provide optimized functions in two ways:
+ *
+ * 1. Override the function completely. F.e. define this_cpu_add().
+ *    The arch must then ensure that the various scalar format passed
+ *    are handled correctly.
+ *
+ * 2. Provide functions for certain scalar sizes. F.e. provide
+ *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
+ *    sized RMW actions. If arch code does not provide operations for
+ *    a scalar size then the fallback in the generic code will be
+ *    used.
+ */
+
+#define _this_cpu_generic_read(pcp)					\
+({	typeof(pcp) ret__;						\
+	preempt_disable();						\
+	ret__ = *this_cpu_ptr(&(pcp));					\
+	preempt_enable();						\
+	ret__;								\
+})
+
+#ifndef this_cpu_read
+# ifndef this_cpu_read_1
+#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_2
+#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_4
+#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# ifndef this_cpu_read_8
+#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
+# endif
+# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
+#endif
+
+#define _this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	preempt_disable();						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	preempt_enable();						\
+} while (0)
+
+#ifndef this_cpu_write
+# ifndef this_cpu_write_1
+#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_2
+#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_4
+#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef this_cpu_write_8
+#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_add
+# ifndef this_cpu_add_1
+#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_2
+#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_4
+#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef this_cpu_add_8
+#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_sub
+# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef this_cpu_inc
+# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
+#endif
+
+#ifndef this_cpu_dec
+# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef this_cpu_and
+# ifndef this_cpu_and_1
+#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_2
+#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_4
+#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef this_cpu_and_8
+#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_or
+# ifndef this_cpu_or_1
+#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_2
+#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_4
+#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef this_cpu_or_8
+#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef this_cpu_xor
+# ifndef this_cpu_xor_1
+#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_2
+#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_4
+#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef this_cpu_xor_8
+#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
+#endif
+
+/*
+ * Generic percpu operations that do not require preemption handling.
+ * Either we do not care about races or the caller has the
+ * responsibility of handling preemptions issues. Arch code can still
+ * override these instructions since the arch per cpu code may be more
+ * efficient and may actually get race freeness for free (that is the
+ * case for x86 for example).
+ *
+ * If there is no other protection through preempt disable and/or
+ * disabling interupts then one of these RMW operations can show unexpected
+ * behavior because the execution thread was rescheduled on another processor
+ * or an interrupt occurred and the same percpu variable was modified from
+ * the interrupt context.
+ */
+#ifndef __this_cpu_read
+# ifndef __this_cpu_read_1
+#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_2
+#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_4
+#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# ifndef __this_cpu_read_8
+#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
+# endif
+# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
+#endif
+
+#define __this_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+} while (0)
+
+#ifndef __this_cpu_write
+# ifndef __this_cpu_write_1
+#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_2
+#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_4
+#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# ifndef __this_cpu_write_8
+#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
+# endif
+# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_add
+# ifndef __this_cpu_add_1
+#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_2
+#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_4
+#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef __this_cpu_add_8
+#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_sub
+# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
+#endif
+
+#ifndef __this_cpu_inc
+# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
+#endif
+
+#ifndef __this_cpu_dec
+# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
+#endif
+
+#ifndef __this_cpu_and
+# ifndef __this_cpu_and_1
+#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_2
+#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_4
+#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef __this_cpu_and_8
+#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_or
+# ifndef __this_cpu_or_1
+#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_2
+#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_4
+#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef __this_cpu_or_8
+#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
+#endif
+
+#ifndef __this_cpu_xor
+# ifndef __this_cpu_xor_1
+#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_2
+#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_4
+#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef __this_cpu_xor_8
+#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
+#endif
+
+/*
+ * IRQ safe versions of the per cpu RMW operations. Note that these operations
+ * are *not* safe against modification of the same variable from another
+ * processors (which one gets when using regular atomic operations)
+ . They are guaranteed to be atomic vs. local interrupts and
+ * preemption only.
+ */
+#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
+do {									\
+	unsigned long flags;						\
+	local_irq_save(flags);						\
+	*__this_cpu_ptr(&(pcp)) op val;					\
+	local_irq_restore(flags);					\
+} while (0)
+
+#ifndef irqsafe_cpu_add
+# ifndef irqsafe_cpu_add_1
+#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_2
+#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_4
+#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# ifndef irqsafe_cpu_add_8
+#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
+# endif
+# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_sub
+# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
+#endif
+
+#ifndef irqsafe_cpu_inc
+# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_dec
+# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
+#endif
+
+#ifndef irqsafe_cpu_and
+# ifndef irqsafe_cpu_and_1
+#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_2
+#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_4
+#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# ifndef irqsafe_cpu_and_8
+#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
+# endif
+# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (val))
+#endif
+
+#ifndef irqsafe_cpu_or
+# ifndef irqsafe_cpu_or_1
+#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_2
+#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_4
+#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# ifndef irqsafe_cpu_or_8
+#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
+# endif
+# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (val))
+#endif
+
+#ifndef irqsafe_cpu_xor
+# ifndef irqsafe_cpu_xor_1
+#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_2
+#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_4
+#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# ifndef irqsafe_cpu_xor_8
+#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
+# endif
+# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
+#endif
+
 #else /* ! SMP */
 
 #define per_cpu(var, cpu)			(*((void)(cpu), &(var)))
Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/percpu.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/percpu.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/percpu.h
@@ -174,459 +174,5 @@ static inline void *pcpu_lpage_remapped(
 #define alloc_percpu(type)	\
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type), __alignof__(type))
 
-/*
- * Optional methods for optimized non-lvalue per-cpu variable access.
- *
- * @var can be a percpu variable or a field of it and its size should
- * equal char, int or long.  percpu_read() evaluates to a lvalue and
- * all others to void.
- *
- * These operations are guaranteed to be atomic w.r.t. preemption.
- * The generic versions use plain get/put_cpu_var().  Archs are
- * encouraged to implement single-instruction alternatives which don't
- * require preemption protection.
- */
-#ifndef percpu_read
-# define percpu_read(var)						\
-  ({									\
-	typeof(var) *pr_ptr__ = &(var);					\
-	typeof(var) pr_ret__;						\
-	pr_ret__ = get_cpu_var(*pr_ptr__);				\
-	put_cpu_var(*pr_ptr__);						\
-	pr_ret__;							\
-  })
-#endif
-
-#define __percpu_generic_to_op(var, val, op)				\
-do {									\
-	typeof(var) *pgto_ptr__ = &(var);				\
-	get_cpu_var(*pgto_ptr__) op val;				\
-	put_cpu_var(*pgto_ptr__);					\
-} while (0)
-
-#ifndef percpu_write
-# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
-#endif
-
-#ifndef percpu_add
-# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
-#endif
-
-#ifndef percpu_sub
-# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
-#endif
-
-#ifndef percpu_and
-# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
-#endif
-
-#ifndef percpu_or
-# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
-#endif
-
-#ifndef percpu_xor
-# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
-#endif
-
-/*
- * Branching function to split up a function into a set of functions that
- * are called for different scalar sizes of the objects handled.
- */
-
-extern void __bad_size_call_parameter(void);
-
-#define __pcpu_size_call_return(stem, variable)				\
-({	typeof(variable) pscr_ret__;					\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-	case 1: pscr_ret__ = stem##1(variable);break;			\
-	case 2: pscr_ret__ = stem##2(variable);break;			\
-	case 4: pscr_ret__ = stem##4(variable);break;			\
-	case 8: pscr_ret__ = stem##8(variable);break;			\
-	default:							\
-		__bad_size_call_parameter();break;			\
-	}								\
-	pscr_ret__;							\
-})
-
-#define __pcpu_size_call(stem, variable, ...)				\
-do {									\
-	__verify_pcpu_ptr(&(variable));					\
-	switch(sizeof(variable)) {					\
-		case 1: stem##1(variable, __VA_ARGS__);break;		\
-		case 2: stem##2(variable, __VA_ARGS__);break;		\
-		case 4: stem##4(variable, __VA_ARGS__);break;		\
-		case 8: stem##8(variable, __VA_ARGS__);break;		\
-		default: 						\
-			__bad_size_call_parameter();break;		\
-	}								\
-} while (0)
-
-/*
- * Optimized manipulation for memory allocated through the per cpu
- * allocator or for addresses of per cpu variables.
- *
- * These operation guarantee exclusivity of access for other operations
- * on the *same* processor. The assumption is that per cpu data is only
- * accessed by a single processor instance (the current one).
- *
- * The first group is used for accesses that must be done in a
- * preemption safe way since we know that the context is not preempt
- * safe. Interrupts may occur. If the interrupt modifies the variable
- * too then RMW actions will not be reliable.
- *
- * The arch code can provide optimized functions in two ways:
- *
- * 1. Override the function completely. F.e. define this_cpu_add().
- *    The arch must then ensure that the various scalar format passed
- *    are handled correctly.
- *
- * 2. Provide functions for certain scalar sizes. F.e. provide
- *    this_cpu_add_2() to provide per cpu atomic operations for 2 byte
- *    sized RMW actions. If arch code does not provide operations for
- *    a scalar size then the fallback in the generic code will be
- *    used.
- */
-
-#define _this_cpu_generic_read(pcp)					\
-({	typeof(pcp) ret__;						\
-	preempt_disable();						\
-	ret__ = *this_cpu_ptr(&(pcp));					\
-	preempt_enable();						\
-	ret__;								\
-})
-
-#ifndef this_cpu_read
-# ifndef this_cpu_read_1
-#  define this_cpu_read_1(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_2
-#  define this_cpu_read_2(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_4
-#  define this_cpu_read_4(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# ifndef this_cpu_read_8
-#  define this_cpu_read_8(pcp)	_this_cpu_generic_read(pcp)
-# endif
-# define this_cpu_read(pcp)	__pcpu_size_call_return(this_cpu_read_, (pcp))
-#endif
-
-#define _this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	preempt_disable();						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	preempt_enable();						\
-} while (0)
-
-#ifndef this_cpu_write
-# ifndef this_cpu_write_1
-#  define this_cpu_write_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_2
-#  define this_cpu_write_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_4
-#  define this_cpu_write_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef this_cpu_write_8
-#  define this_cpu_write_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define this_cpu_write(pcp, val)	__pcpu_size_call(this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_add
-# ifndef this_cpu_add_1
-#  define this_cpu_add_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_2
-#  define this_cpu_add_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_4
-#  define this_cpu_add_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef this_cpu_add_8
-#  define this_cpu_add_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define this_cpu_add(pcp, val)		__pcpu_size_call(this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_sub
-# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef this_cpu_inc
-# define this_cpu_inc(pcp)		this_cpu_add((pcp), 1)
-#endif
-
-#ifndef this_cpu_dec
-# define this_cpu_dec(pcp)		this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef this_cpu_and
-# ifndef this_cpu_and_1
-#  define this_cpu_and_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_2
-#  define this_cpu_and_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_4
-#  define this_cpu_and_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef this_cpu_and_8
-#  define this_cpu_and_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define this_cpu_and(pcp, val)		__pcpu_size_call(this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_or
-# ifndef this_cpu_or_1
-#  define this_cpu_or_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_2
-#  define this_cpu_or_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_4
-#  define this_cpu_or_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef this_cpu_or_8
-#  define this_cpu_or_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define this_cpu_or(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef this_cpu_xor
-# ifndef this_cpu_xor_1
-#  define this_cpu_xor_1(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_2
-#  define this_cpu_xor_2(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_4
-#  define this_cpu_xor_4(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef this_cpu_xor_8
-#  define this_cpu_xor_8(pcp, val)	_this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define this_cpu_xor(pcp, val)		__pcpu_size_call(this_cpu_or_, (pcp), (val))
-#endif
-
-/*
- * Generic percpu operations that do not require preemption handling.
- * Either we do not care about races or the caller has the
- * responsibility of handling preemptions issues. Arch code can still
- * override these instructions since the arch per cpu code may be more
- * efficient and may actually get race freeness for free (that is the
- * case for x86 for example).
- *
- * If there is no other protection through preempt disable and/or
- * disabling interupts then one of these RMW operations can show unexpected
- * behavior because the execution thread was rescheduled on another processor
- * or an interrupt occurred and the same percpu variable was modified from
- * the interrupt context.
- */
-#ifndef __this_cpu_read
-# ifndef __this_cpu_read_1
-#  define __this_cpu_read_1(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_2
-#  define __this_cpu_read_2(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_4
-#  define __this_cpu_read_4(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# ifndef __this_cpu_read_8
-#  define __this_cpu_read_8(pcp)	(*__this_cpu_ptr(&(pcp)))
-# endif
-# define __this_cpu_read(pcp)	__pcpu_size_call_return(__this_cpu_read_, (pcp))
-#endif
-
-#define __this_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-} while (0)
-
-#ifndef __this_cpu_write
-# ifndef __this_cpu_write_1
-#  define __this_cpu_write_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_2
-#  define __this_cpu_write_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_4
-#  define __this_cpu_write_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# ifndef __this_cpu_write_8
-#  define __this_cpu_write_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), =)
-# endif
-# define __this_cpu_write(pcp, val)	__pcpu_size_call(__this_cpu_write_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_add
-# ifndef __this_cpu_add_1
-#  define __this_cpu_add_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_2
-#  define __this_cpu_add_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_4
-#  define __this_cpu_add_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef __this_cpu_add_8
-#  define __this_cpu_add_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define __this_cpu_add(pcp, val)	__pcpu_size_call(__this_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_sub
-# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
-#endif
-
-#ifndef __this_cpu_inc
-# define __this_cpu_inc(pcp)		__this_cpu_add((pcp), 1)
-#endif
-
-#ifndef __this_cpu_dec
-# define __this_cpu_dec(pcp)		__this_cpu_sub((pcp), 1)
-#endif
-
-#ifndef __this_cpu_and
-# ifndef __this_cpu_and_1
-#  define __this_cpu_and_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_2
-#  define __this_cpu_and_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_4
-#  define __this_cpu_and_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef __this_cpu_and_8
-#  define __this_cpu_and_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define __this_cpu_and(pcp, val)	__pcpu_size_call(__this_cpu_and_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_or
-# ifndef __this_cpu_or_1
-#  define __this_cpu_or_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_2
-#  define __this_cpu_or_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_4
-#  define __this_cpu_or_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef __this_cpu_or_8
-#  define __this_cpu_or_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define __this_cpu_or(pcp, val)	__pcpu_size_call(__this_cpu_or_, (pcp), (val))
-#endif
-
-#ifndef __this_cpu_xor
-# ifndef __this_cpu_xor_1
-#  define __this_cpu_xor_1(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_2
-#  define __this_cpu_xor_2(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_4
-#  define __this_cpu_xor_4(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef __this_cpu_xor_8
-#  define __this_cpu_xor_8(pcp, val)	__this_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define __this_cpu_xor(pcp, val)	__pcpu_size_call(__this_cpu_xor_, (pcp), (val))
-#endif
-
-/*
- * IRQ safe versions of the per cpu RMW operations. Note that these operations
- * are *not* safe against modification of the same variable from another
- * processors (which one gets when using regular atomic operations)
- . They are guaranteed to be atomic vs. local interrupts and
- * preemption only.
- */
-#define irqsafe_cpu_generic_to_op(pcp, val, op)				\
-do {									\
-	unsigned long flags;						\
-	local_irq_save(flags);						\
-	*__this_cpu_ptr(&(pcp)) op val;					\
-	local_irq_restore(flags);					\
-} while (0)
-
-#ifndef irqsafe_cpu_add
-# ifndef irqsafe_cpu_add_1
-#  define irqsafe_cpu_add_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_2
-#  define irqsafe_cpu_add_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_4
-#  define irqsafe_cpu_add_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# ifndef irqsafe_cpu_add_8
-#  define irqsafe_cpu_add_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), +=)
-# endif
-# define irqsafe_cpu_add(pcp, val) __pcpu_size_call(irqsafe_cpu_add_, (pcp), (val))
-#endif
-
-#ifndef irqsafe_cpu_sub
-# define irqsafe_cpu_sub(pcp, val)	irqsafe_cpu_add((pcp), -(val))
-#endif
-
-#ifndef irqsafe_cpu_inc
-# define irqsafe_cpu_inc(pcp)	irqsafe_cpu_add((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_dec
-# define irqsafe_cpu_dec(pcp)	irqsafe_cpu_sub((pcp), 1)
-#endif
-
-#ifndef irqsafe_cpu_and
-# ifndef irqsafe_cpu_and_1
-#  define irqsafe_cpu_and_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_2
-#  define irqsafe_cpu_and_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_4
-#  define irqsafe_cpu_and_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# ifndef irqsafe_cpu_and_8
-#  define irqsafe_cpu_and_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), &=)
-# endif
-# define irqsafe_cpu_and(pcp, val) __pcpu_size_call(irqsafe_cpu_and_, (val))
-#endif
-
-#ifndef irqsafe_cpu_or
-# ifndef irqsafe_cpu_or_1
-#  define irqsafe_cpu_or_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_2
-#  define irqsafe_cpu_or_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_4
-#  define irqsafe_cpu_or_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# ifndef irqsafe_cpu_or_8
-#  define irqsafe_cpu_or_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), |=)
-# endif
-# define irqsafe_cpu_or(pcp, val) __pcpu_size_call(irqsafe_cpu_or_, (val))
-#endif
-
-#ifndef irqsafe_cpu_xor
-# ifndef irqsafe_cpu_xor_1
-#  define irqsafe_cpu_xor_1(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_2
-#  define irqsafe_cpu_xor_2(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_4
-#  define irqsafe_cpu_xor_4(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# ifndef irqsafe_cpu_xor_8
-#  define irqsafe_cpu_xor_8(pcp, val) irqsafe_cpu_generic_to_op((pcp), (val), ^=)
-# endif
-# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
-#endif
 
 #endif /* __LINUX_PERCPU_H */


* [PATCH/RFC 2/6] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
  2009-11-13 21:17 ` [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id() Lee Schermerhorn
@ 2009-11-13 21:17 ` Lee Schermerhorn
  2009-11-13 21:17   ` Lee Schermerhorn
  2009-11-20 15:48   ` Christoph Lameter
  2009-11-13 21:18 ` [PATCH/RFC 3/6] numa: ia64: " Lee Schermerhorn
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:17 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

x86 arch-specific changes to use the generic numa_node_id() based on the
generic percpu variable infrastructure.  Back out x86's custom version of
numa_node_id().
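
With the generic definitions in place, cpu_to_node() now depends on the
per-cpu 'numa_node' variable, so x86 code that runs while that variable
may not yet hold the right value has to consult the early cpu->node map
instead.  A condensed view of the generic definitions from patch 1 that
this relies on:

#define cpu_to_node(__cpu)	per_cpu(numa_node, (__cpu))
#define numa_node_id()		__this_cpu_read(numa_node)

Hence the switch from cpu_to_node() to early_cpu_to_node() in
setup_per_cpu_areas() and cpu_init() in the diffs below.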

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
[Christoph's signoff here?]

V0: based on:
# From cl@linux-foundation.org Wed Nov  4 10:36:12 2009
# Date: Wed, 4 Nov 2009 12:35:14 -0500 (EST)
# From: Christoph Lameter <cl@linux-foundation.org>
# To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
# Subject: Re: [PATCH/RFC] slab:  handle memoryless nodes efficiently
# 
# I have a very early form of a draft of a patch here that genericizes
# numa_node_id(). Uses the new generic this_cpu_xxx stuff.
# 
# Not complete.

V1:
  + split out x86-specific changes from generic.
  + change 'node_number' => 'numa_node' in x86 arch code
  + define __this_cpu_read in x86 asm/percpu.h
  + change x86/kernel/setup_percpu.c to use early_cpu_to_node() to
    setup 'numa_node' as cpu_to_node() now depends on the per cpu var.
    [I think!  What about cpu_to_node() func in x86/mm/numa_64.c ???]

V2:
  + cpu_to_node() => early_cpu_to_node(); incomplete change in V01
  + x86 arch define USE_PERCPU_NUMA_NODE_ID.

 arch/x86/Kconfig                |    4 ++++
 arch/x86/include/asm/percpu.h   |    2 ++
 arch/x86/include/asm/topology.h |   13 +------------
 arch/x86/kernel/cpu/common.c    |    6 +++---
 arch/x86/kernel/setup_percpu.c  |    4 ++--
 arch/x86/mm/numa_64.c           |    5 +----
 6 files changed, 13 insertions(+), 21 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/include/asm/topology.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/topology.h
@@ -53,33 +53,22 @@
 extern int cpu_to_node_map[];
 
 /* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
+static inline int early_cpu_to_node(int cpu)
 {
 	return cpu_to_node_map[cpu];
 }
-#define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
 #else /* CONFIG_X86_64 */
 
 /* Mappings between logical cpu number and node number */
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
 
-/* Returns the number of the current Node. */
-DECLARE_PER_CPU(int, node_number);
-#define numa_node_id()		percpu_read(node_number)
-
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
 extern int cpu_to_node(int cpu);
 extern int early_cpu_to_node(int cpu);
 
 #else	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
-/* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
-{
-	return per_cpu(x86_cpu_to_node_map, cpu);
-}
-
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/mm/numa_64.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/mm/numa_64.c
@@ -33,9 +33,6 @@ int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
-DEFINE_PER_CPU(int, node_number) = 0;
-EXPORT_PER_CPU_SYMBOL(node_number);
-
 /*
  * Map cpu index to node index
  */
@@ -816,7 +813,7 @@ void __cpuinit numa_set_node(int cpu, in
 	per_cpu(x86_cpu_to_node_map, cpu) = node;
 
 	if (node != NUMA_NO_NODE)
-		per_cpu(node_number, cpu) = node;
+		per_cpu(numa_node, cpu) = node;
 }
 
 void __cpuinit numa_clear_node(int cpu)
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/percpu.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/include/asm/percpu.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/percpu.h
@@ -150,10 +150,12 @@ do {							\
 #define percpu_or(var, val)		percpu_to_op("or", var, val)
 #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
 
+#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
 
+#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
 #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/kernel/cpu/common.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/kernel/cpu/common.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/kernel/cpu/common.c
@@ -1105,9 +1105,9 @@ void __cpuinit cpu_init(void)
 	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
-	if (cpu != 0 && percpu_read(node_number) == 0 &&
-	    cpu_to_node(cpu) != NUMA_NO_NODE)
-		percpu_write(node_number, cpu_to_node(cpu));
+	if (cpu != 0 && percpu_read(numa_node) == 0 &&
+	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
+		set_numa_node(early_cpu_to_node(cpu));
 #endif
 
 	me = current;
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/kernel/setup_percpu.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/kernel/setup_percpu.c
@@ -258,10 +258,10 @@ void __init setup_per_cpu_areas(void)
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
 	/*
-	 * make sure boot cpu node_number is right, when boot cpu is on the
+	 * make sure boot cpu numa_node is right, when boot cpu is on the
 	 * node that doesn't have mem installed
 	 */
-	per_cpu(node_number, boot_cpu_id) = cpu_to_node(boot_cpu_id);
+	per_cpu(numa_node, boot_cpu_id) = early_cpu_to_node(boot_cpu_id);
 #endif
 
 	/* Setup node to cpumask map */
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/Kconfig
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/Kconfig
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/Kconfig
@@ -1677,6 +1677,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool X86_64
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER



* [PATCH/RFC 3/6] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
  2009-11-13 21:17 ` [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id() Lee Schermerhorn
  2009-11-13 21:17 ` [PATCH/RFC 2/6] numa: x86_64: use generic percpu var numa_node_id() implementation Lee Schermerhorn
@ 2009-11-13 21:18 ` Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:50   ` Christoph Lameter
  2009-11-13 21:18 ` [PATCH/RFC 4/6] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

ia64:  Use generic percpu implementation of numa_node_id()
   + initialize per cpu 'numa_node'
   + remove ia64 cpu_to_node() macro;  use generic
   + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

---

 arch/ia64/Kconfig                |    4 ++++
 arch/ia64/include/asm/topology.h |    5 -----
 arch/ia64/kernel/smpboot.c       |    6 ++++++
 3 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/kernel/smpboot.c	2009-11-11 11:43:47.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c	2009-11-11 12:05:42.000000000 -0500
@@ -391,6 +391,11 @@ smp_callin (void)
 
 	fix_b0_for_bsp();
 
+	/*
+	 * numa_node_id() works after this.
+	 */
+	set_numa_node(cpu_to_node_map[cpuid]);
+
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);
 	/* Setup the per cpu irq handling data structures */
@@ -637,6 +642,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callin_map);
+	set_numa_node(cpu_to_node_map[smp_processor_id()]);
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	paravirt_post_smp_prepare_boot_cpu();
 }
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/include/asm/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/include/asm/topology.h	2009-11-11 11:43:47.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/include/asm/topology.h	2009-11-11 12:05:42.000000000 -0500
@@ -26,11 +26,6 @@
 #define RECLAIM_DISTANCE 15
 
 /*
- * Returns the number of the node containing CPU 'cpu'
- */
-#define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
-
-/*
  * Returns a bitmask of CPUs on Node 'node'.
  */
 #define cpumask_of_node(node) (&node_to_cpu_mask[node])
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/Kconfig	2009-11-02 15:51:36.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig	2009-11-11 12:09:13.000000000 -0500
@@ -495,6 +495,10 @@ config HAVE_ARCH_NODEDATA_EXTENSION
 	def_bool y
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 3/6] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2009-11-13 21:18 ` [PATCH/RFC 3/6] numa: ia64: " Lee Schermerhorn
@ 2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:50   ` Christoph Lameter
  1 sibling, 0 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

ia64:  Use generic percpu implementation of numa_node_id()
   + initialize per cpu 'numa_node'
   + remove ia64 cpu_to_node() macro;  use generic
   + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

---

 arch/ia64/Kconfig                |    4 ++++
 arch/ia64/include/asm/topology.h |    5 -----
 arch/ia64/kernel/smpboot.c       |    6 ++++++
 3 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/kernel/smpboot.c	2009-11-11 11:43:47.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c	2009-11-11 12:05:42.000000000 -0500
@@ -391,6 +391,11 @@ smp_callin (void)
 
 	fix_b0_for_bsp();
 
+	/*
+	 * numa_node_id() works after this.
+	 */
+	set_numa_node(cpu_to_node_map[cpuid]);
+
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);
 	/* Setup the per cpu irq handling data structures */
@@ -637,6 +642,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callin_map);
+	set_numa_node(cpu_to_node_map[smp_processor_id()]);
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	paravirt_post_smp_prepare_boot_cpu();
 }
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/include/asm/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/include/asm/topology.h	2009-11-11 11:43:47.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/include/asm/topology.h	2009-11-11 12:05:42.000000000 -0500
@@ -26,11 +26,6 @@
 #define RECLAIM_DISTANCE 15
 
 /*
- * Returns the number of the node containing CPU 'cpu'
- */
-#define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
-
-/*
  * Returns a bitmask of CPUs on Node 'node'.
  */
 #define cpumask_of_node(node) (&node_to_cpu_mask[node])
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/Kconfig	2009-11-02 15:51:36.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig	2009-11-11 12:09:13.000000000 -0500
@@ -495,6 +495,10 @@ config HAVE_ARCH_NODEDATA_EXTENSION
 	def_bool y
 	depends on NUMA
 
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 4/6] numa:  Introduce numa_mem_id()- effective local memory node id
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (2 preceding siblings ...)
  2009-11-13 21:18 ` [PATCH/RFC 3/6] numa: ia64: " Lee Schermerhorn
@ 2009-11-13 21:18 ` Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:53   ` Christoph Lameter
  2009-11-13 21:18 ` [PATCH/RFC 5/6] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES
defined, else stubs. Architectures will define HAVE_MEMORYLESS_NODES
if/when they support them.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild.
Archs that support memoryless nodes will need to initialize 'numa_mem' for
secondary cpus as they're brought on-line.
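
For illustration only--this is not part of the patch, and
alloc_local_buffer() below is an invented example--a caller that wants
node-local kernel memory would use the new interface roughly like:

/*
 * Sketch only, not part of this patch.  alloc_local_buffer() is an
 * invented name; kmalloc_node() is the existing node-aware allocator.
 */
static void *alloc_local_buffer(size_t size, gfp_t flags)
{
	int nid = numa_mem_id();	/* nearest node with memory */

	/*
	 * On a node with memory, nid == numa_node_id().  On a
	 * memoryless node it is the first node of the local zonelist,
	 * so the request stays "local" in the effective sense.
	 */
	return kmalloc_node(size, flags, nid);
}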

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

V2:  + split this out of Christoph's incomplete "starter patch"
     + flesh out the definition

---

 include/linux/mmzone.h   |    6 ++++++
 include/linux/topology.h |   24 ++++++++++++++++++++++++
 mm/page_alloc.c          |   35 +++++++++++++++++++++++++++++++++++
 mm/percpu.c              |    5 +++++
 4 files changed, 70 insertions(+)

Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/topology.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
@@ -232,6 +232,30 @@ DECLARE_PER_CPU(int, numa_node);
 
 #endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+
+DECLARE_PER_CPU(int, numa_mem);
+
+#ifndef set_numa_mem
+#define set_numa_mem(__node) percpu_write(numa_mem, __node)
+#endif
+
+#else	/* !CONFIG_HAVE_MEMORYLESS_NODES */
+
+#define numa_mem numa_node
+static inline void set_numa_mem(int node) {}
+
+#endif	/* [!]CONFIG_HAVE_MEMORYLESS_NODES */
+
+#ifndef numa_mem_id
+/* Returns the number of the nearest Node with memory */
+#define numa_mem_id()		__this_cpu_read(numa_mem)
+#endif
+
+#ifndef cpu_to_mem
+#define cpu_to_mem(__cpu)	per_cpu(numa_mem, (__cpu))
+#endif
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/percpu.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
@@ -2078,3 +2078,8 @@ DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+DEFINE_PER_CPU(int, numa_mem);		/* Kernel "local memory" node */
+EXPORT_PER_CPU_SYMBOL(numa_mem);
+#endif
+
Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/page_alloc.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/page_alloc.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/page_alloc.c
@@ -2688,6 +2688,24 @@ static void build_zonelist_cache(pg_data
 		zlc->z_to_n[z - zonelist->_zonerefs] = zonelist_node_idx(z);
 }
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+/*
+ * Return node id of node used for "local" allocations.
+ * I.e., first node id of first zone in arg node's generic zonelist.
+ * Used for initializing percpu 'numa_mem', which is used primarily
+ * for kernel allocations, so use GFP_KERNEL flags to locate zonelist.
+ */
+int local_memory_node(int node)
+{
+	struct zone *zone;
+
+	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+				   gfp_zone(GFP_KERNEL),
+				   NULL,
+				   &zone);
+	return zone->node;
+}
+#endif
 
 #else	/* CONFIG_NUMA */
 
@@ -2754,6 +2772,23 @@ static int __build_all_zonelists(void *d
 		build_zonelists(pgdat);
 		build_zonelist_cache(pgdat);
 	}
+
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+	{
+		/*
+		 * We now know the "local memory node" for each node--
+		 * i.e., the node of the first zone in the generic zonelist.
+		 * Set up numa_mem percpu variable for on-line cpus.  During
+		 * boot, only the boot cpu should be on-line;  we'll init the
+		 * secondary cpus' numa_mem as they come on-line.  During
+		 * node/memory hotplug, we'll fixup all cpus.
+		 */
+		int cpu;
+		for_each_online_cpu(cpu) {
+			cpu_to_mem(cpu) = local_memory_node(cpu_to_node(cpu));
+		}
+	}
+#endif
 	return 0;
 }
 
Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/mmzone.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/mmzone.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/mmzone.h
@@ -672,6 +672,12 @@ void memory_present(int nid, unsigned lo
 static inline void memory_present(int nid, unsigned long start, unsigned long end) {}
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+int local_memory_node(int node_id);
+#else
+static inline int local_memory_node(int node_id) { return node_id; };
+#endif
+
 #ifdef CONFIG_NEED_NODE_MEMMAP_SIZE
 unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 #endif


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 4/6] numa:  Introduce numa_mem_id()- effective local memory node id
  2009-11-13 21:18 ` [PATCH/RFC 4/6] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
@ 2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:53   ` Christoph Lameter
  1 sibling, 0 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

Against:  2.6.32-rc5-mmotm-091101-1001

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES
defined, else stubs. Architectures will define HAVE_MEMORYLESS_NODES
if/when they support them.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild.
Archs that support memoryless nodes will need to initialize 'numa_mem' for
secondary cpus as they're brought on-line.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

V2:  + split this out of Christoph's incomplete "starter patch"
     + flesh out the definition

---

 include/linux/mmzone.h   |    6 ++++++
 include/linux/topology.h |   24 ++++++++++++++++++++++++
 mm/page_alloc.c          |   35 +++++++++++++++++++++++++++++++++++
 mm/percpu.c              |    5 +++++
 4 files changed, 70 insertions(+)

Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/topology.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/topology.h
@@ -232,6 +232,30 @@ DECLARE_PER_CPU(int, numa_node);
 
 #endif	/* [!]CONFIG_USE_PERCPU_NUMA_NODE_ID */
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+
+DECLARE_PER_CPU(int, numa_mem);
+
+#ifndef set_numa_mem
+#define set_numa_mem(__node) percpu_write(numa_mem, __node)
+#endif
+
+#else	/* !CONFIG_HAVE_MEMORYLESS_NODES */
+
+#define numa_mem numa_node
+static inline void set_numa_mem(int node) {}
+
+#endif	/* [!]CONFIG_HAVE_MEMORYLESS_NODES */
+
+#ifndef numa_mem_id
+/* Returns the number of the nearest Node with memory */
+#define numa_mem_id()		__this_cpu_read(numa_mem)
+#endif
+
+#ifndef cpu_to_mem
+#define cpu_to_mem(__cpu)	per_cpu(numa_mem, (__cpu))
+#endif
+
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/percpu.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/percpu.c
@@ -2078,3 +2078,8 @@ DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+DEFINE_PER_CPU(int, numa_mem);		/* Kernel "local memory" node */
+EXPORT_PER_CPU_SYMBOL(numa_mem);
+#endif
+
Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/page_alloc.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/page_alloc.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/page_alloc.c
@@ -2688,6 +2688,24 @@ static void build_zonelist_cache(pg_data
 		zlc->z_to_n[z - zonelist->_zonerefs] = zonelist_node_idx(z);
 }
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+/*
+ * Return node id of node used for "local" allocations.
+ * I.e., first node id of first zone in arg node's generic zonelist.
+ * Used for initializing percpu 'numa_mem', which is used primarily
+ * for kernel allocations, so use GFP_KERNEL flags to locate zonelist.
+ */
+int local_memory_node(int node)
+{
+	struct zone *zone;
+
+	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+				   gfp_zone(GFP_KERNEL),
+				   NULL,
+				   &zone);
+	return zone->node;
+}
+#endif
 
 #else	/* CONFIG_NUMA */
 
@@ -2754,6 +2772,23 @@ static int __build_all_zonelists(void *d
 		build_zonelists(pgdat);
 		build_zonelist_cache(pgdat);
 	}
+
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+	{
+		/*
+		 * We now know the "local memory node" for each node--
+		 * i.e., the node of the first zone in the generic zonelist.
+		 * Set up numa_mem percpu variable for on-line cpus.  During
+		 * boot, only the boot cpu should be on-line;  we'll init the
+		 * secondary cpus' numa_mem as they come on-line.  During
+		 * node/memory hotplug, we'll fixup all cpus.
+		 */
+		int cpu;
+		for_each_online_cpu(cpu) {
+			cpu_to_mem(cpu) = local_memory_node(cpu_to_node(cpu));
+		}
+	}
+#endif
 	return 0;
 }
 
Index: linux-2.6.32-rc5-mmotm-091101-1001/include/linux/mmzone.h
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/include/linux/mmzone.h
+++ linux-2.6.32-rc5-mmotm-091101-1001/include/linux/mmzone.h
@@ -672,6 +672,12 @@ void memory_present(int nid, unsigned lo
 static inline void memory_present(int nid, unsigned long start, unsigned long end) {}
 #endif
 
+#ifdef CONFIG_HAVE_MEMORYLESS_NODES
+int local_memory_node(int node_id);
+#else
+static inline int local_memory_node(int node_id) { return node_id; };
+#endif
+
 #ifdef CONFIG_NEED_NODE_MEMMAP_SIZE
 unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 #endif

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 5/6] numa: ia64: support numa_mem_id() for memoryless nodes
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (3 preceding siblings ...)
  2009-11-13 21:18 ` [PATCH/RFC 4/6] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
@ 2009-11-13 21:18 ` Lee Schermerhorn
  2009-11-13 21:18 ` [PATCH/RFC 6/6] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
  2009-11-20 15:43 ` [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Christoph Lameter
  6 siblings, 0 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

PATCH/RFC numa: ia64:  support memoryless nodes

Against: 2.6.32-rc5-mmotm-091101-1001

Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured
on ia64.  Initialize percpu 'numa_mem' variable when starting
secondary cpus.  Generic initialization will handle the boot
cpu.

Nothing uses 'numa_mem_id()' yet.  A subsequent patch will modify
slab to use this.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

New in V2

---

 arch/ia64/Kconfig          |    4 ++++
 arch/ia64/kernel/smpboot.c |    1 +
 2 files changed, 5 insertions(+)

Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/Kconfig	2009-11-11 12:09:13.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/Kconfig	2009-11-11 12:16:56.000000000 -0500
@@ -499,6 +499,10 @@ config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA
 
+config HAVE_MEMORYLESS_NODES
+	def_bool y
+	depends on NUMA
+
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 	depends on PROC_KCORE
Index: linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/ia64/kernel/smpboot.c	2009-11-11 12:05:42.000000000 -0500
+++ linux-2.6.32-rc5-mmotm-091101-1001/arch/ia64/kernel/smpboot.c	2009-11-11 12:16:56.000000000 -0500
@@ -395,6 +395,7 @@ smp_callin (void)
 	 * numa_node_id() works after this.
 	 */
 	set_numa_node(cpu_to_node_map[cpuid]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpuid]));
 
 	ipi_call_lock_irq();
 	spin_lock(&vector_lock);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 6/6] numa: slab:  use numa_mem_id() for slab local memory node
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (4 preceding siblings ...)
  2009-11-13 21:18 ` [PATCH/RFC 5/6] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
@ 2009-11-13 21:18 ` Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:56   ` Christoph Lameter
  2009-11-20 15:43 ` [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Christoph Lameter
  6 siblings, 2 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

[PATCH] numa:  Slab handle memoryless nodes

Against:  2.6.32-rc5-mmotm-091101-1001

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless
nodes well.  Specifically, the "fast path"--____cache_alloc()--will
never succeed, as slab doesn't cache off-node objects on the per cpu
queues, and for memoryless nodes all memory is "off node" relative
to numa_node_id().  This adds significant overhead to every kmem
cache allocation, a regression relative to earlier kernels [from
before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to
return the "effective local memory node" for the calling context.
This is the first node in the local node's generic fallback zonelist--
i.e., the same node that "local" mempolicy-based allocations would
use.  This lets slab cache these "local" allocations and avoid 
fallback/refill on every allocation.
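
To make the idea concrete, a simplified sketch--not the actual slab
structures; 'percpu_cache' and 'fast_alloc()' are invented for the
example:

/*
 * Simplified sketch, not the real slab code.  The point is only that
 * keying the per cpu cache on numa_mem_id() lets a cpu on a memoryless
 * node treat its nearest node-with-memory as "local".
 */
struct percpu_cache {
	int	node;			/* node whose objects we cache */
	int	avail;
	void	*objects[16];
};

static void *fast_alloc(struct percpu_cache *pc)
{
	if (pc->node == numa_mem_id() && pc->avail)
		return pc->objects[--pc->avail];   /* hit: no fallback */
	return NULL;	/* caller refills from pc->node's lists */
}

With numa_node_id() in place of numa_mem_id(), the comparison could
never succeed on a memoryless node--which is the regression described
above.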

N.B.:  incomplete.  slab will need to handle node and memory hotplug
that could change the value returned by numa_mem_id() for any given
node.  This will be addressed by a subsequent patch, if we decide to
go this route.

Performance impact on "hackbench 400 process 200"

2.6.32-rc5+mmotm-091101		no-patch	this-patch
no memoryless nodes [avg of 10]:  12.700	  12.856  ~1.2%
cpus all on memless nodes  [20]: 261.530	  27.700 ~10x speedup

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

 mm/slab.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/slab.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/slab.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/slab.c
@@ -1064,7 +1064,7 @@ static inline int cache_free_alien(struc
 	struct array_cache *alien = NULL;
 	int node;
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/*
 	 * Make sure we are not freeing a object from another node to the array
@@ -1407,7 +1407,7 @@ void __init kmem_cache_init(void)
 	 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
 	 */
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/* 1) create the cache_cache */
 	INIT_LIST_HEAD(&cache_chain);
@@ -2041,7 +2041,7 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
-	cachep->nodelists[numa_node_id()]->next_reap =
+	cachep->nodelists[numa_mem_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
 
@@ -2372,7 +2372,7 @@ static void check_spinlock_acquired(stru
 {
 #ifdef CONFIG_SMP
 	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[numa_node_id()]->list_lock);
+	assert_spin_locked(&cachep->nodelists[numa_mem_id()]->list_lock);
 #endif
 }
 
@@ -2399,7 +2399,7 @@ static void do_drain(void *arg)
 {
 	struct kmem_cache *cachep = arg;
 	struct array_cache *ac;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
@@ -2932,7 +2932,7 @@ static void *cache_alloc_refill(struct k
 
 retry:
 	check_irq_off();
-	node = numa_node_id();
+	node = numa_mem_id();
 	ac = cpu_cache_get(cachep);
 	batchcount = ac->batchcount;
 	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -3128,7 +3128,7 @@ static void *alternate_node_alloc(struct
 
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
-	nid_alloc = nid_here = numa_node_id();
+	nid_alloc = nid_here = numa_mem_id();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
@@ -3297,6 +3297,7 @@ __cache_alloc_node(struct kmem_cache *ca
 {
 	unsigned long save_flags;
 	void *ptr;
+	int slab_node = numa_mem_id();
 
 	flags &= gfp_allowed_mask;
 
@@ -3309,7 +3310,7 @@ __cache_alloc_node(struct kmem_cache *ca
 	local_irq_save(save_flags);
 
 	if (unlikely(nodeid == -1))
-		nodeid = numa_node_id();
+		nodeid = slab_node;
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
 		/* Node not bootstrapped yet */
@@ -3317,7 +3318,7 @@ __cache_alloc_node(struct kmem_cache *ca
 		goto out;
 	}
 
-	if (nodeid == numa_node_id()) {
+	if (nodeid == slab_node) {
 		/*
 		 * Use the locally cached objects if possible.
 		 * However ____cache_alloc does not allow fallback
@@ -3361,8 +3362,8 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp)
+		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
 
   out:
 	return objp;
@@ -3459,7 +3460,7 @@ static void cache_flusharray(struct kmem
 {
 	int batchcount;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	batchcount = ac->batchcount;
 #if DEBUG
@@ -4034,7 +4035,7 @@ static void cache_reap(struct work_struc
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 	struct delayed_work *work = to_delayed_work(w);
 
 	if (!mutex_trylock(&cache_chain_mutex))


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC 6/6] numa: slab:  use numa_mem_id() for slab local memory node
  2009-11-13 21:18 ` [PATCH/RFC 6/6] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
@ 2009-11-13 21:18   ` Lee Schermerhorn
  2009-11-20 15:56   ` Christoph Lameter
  1 sibling, 0 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-13 21:18 UTC (permalink / raw)
  To: linux-arch, linux-mm
  Cc: akpm, Mel Gorman, Christoph Lameter, Nick Piggin, David Rientjes,
	eric.whitney

[PATCH] numa:  Slab handle memoryless nodes

Against:  2.6.32-rc5-mmotm-091101-1001

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless
nodes well.  Specifically, the "fast path"--____cache_alloc()--will
never succeed, as slab doesn't cache off-node objects on the per cpu
queues, and for memoryless nodes all memory is "off node" relative
to numa_node_id().  This adds significant overhead to every kmem
cache allocation, a regression relative to earlier kernels [from
before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to
return the "effective local memory node" for the calling context.
This is the first node in the local node's generic fallback zonelist--
i.e., the same node that "local" mempolicy-based allocations would
use.  This lets slab cache these "local" allocations and avoid 
fallback/refill on every allocation.

N.B.:  incomplete.  slab will need to handle node and memory hotplug
that could change the value returned by numa_mem_id() for any given
node.  This will be addressed by a subsequent patch, if we decide to
go this route.

Performance impact on "hackbench 400 process 200"

2.6.32-rc5+mmotm-091101		no-patch	this-patch
no memoryless nodes [avg of 10]:  12.700	  12.856  ~1.2%
cpus all on memless nodes  [20]: 261.530	  27.700 ~10x speedup

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

---

 mm/slab.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

Index: linux-2.6.32-rc5-mmotm-091101-1001/mm/slab.c
===================================================================
--- linux-2.6.32-rc5-mmotm-091101-1001.orig/mm/slab.c
+++ linux-2.6.32-rc5-mmotm-091101-1001/mm/slab.c
@@ -1064,7 +1064,7 @@ static inline int cache_free_alien(struc
 	struct array_cache *alien = NULL;
 	int node;
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/*
 	 * Make sure we are not freeing a object from another node to the array
@@ -1407,7 +1407,7 @@ void __init kmem_cache_init(void)
 	 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
 	 */
 
-	node = numa_node_id();
+	node = numa_mem_id();
 
 	/* 1) create the cache_cache */
 	INIT_LIST_HEAD(&cache_chain);
@@ -2041,7 +2041,7 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
-	cachep->nodelists[numa_node_id()]->next_reap =
+	cachep->nodelists[numa_mem_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
 
@@ -2372,7 +2372,7 @@ static void check_spinlock_acquired(stru
 {
 #ifdef CONFIG_SMP
 	check_irq_off();
-	assert_spin_locked(&cachep->nodelists[numa_node_id()]->list_lock);
+	assert_spin_locked(&cachep->nodelists[numa_mem_id()]->list_lock);
 #endif
 }
 
@@ -2399,7 +2399,7 @@ static void do_drain(void *arg)
 {
 	struct kmem_cache *cachep = arg;
 	struct array_cache *ac;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
@@ -2932,7 +2932,7 @@ static void *cache_alloc_refill(struct k
 
 retry:
 	check_irq_off();
-	node = numa_node_id();
+	node = numa_mem_id();
 	ac = cpu_cache_get(cachep);
 	batchcount = ac->batchcount;
 	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -3128,7 +3128,7 @@ static void *alternate_node_alloc(struct
 
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
-	nid_alloc = nid_here = numa_node_id();
+	nid_alloc = nid_here = numa_mem_id();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
@@ -3297,6 +3297,7 @@ __cache_alloc_node(struct kmem_cache *ca
 {
 	unsigned long save_flags;
 	void *ptr;
+	int slab_node = numa_mem_id();
 
 	flags &= gfp_allowed_mask;
 
@@ -3309,7 +3310,7 @@ __cache_alloc_node(struct kmem_cache *ca
 	local_irq_save(save_flags);
 
 	if (unlikely(nodeid == -1))
-		nodeid = numa_node_id();
+		nodeid = slab_node;
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
 		/* Node not bootstrapped yet */
@@ -3317,7 +3318,7 @@ __cache_alloc_node(struct kmem_cache *ca
 		goto out;
 	}
 
-	if (nodeid == numa_node_id()) {
+	if (nodeid == slab_node) {
 		/*
 		 * Use the locally cached objects if possible.
 		 * However ____cache_alloc does not allow fallback
@@ -3361,8 +3362,8 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp)
+		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
 
   out:
 	return objp;
@@ -3459,7 +3460,7 @@ static void cache_flusharray(struct kmem
 {
 	int batchcount;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 
 	batchcount = ac->batchcount;
 #if DEBUG
@@ -4034,7 +4035,7 @@ static void cache_reap(struct work_struc
 {
 	struct kmem_cache *searchp;
 	struct kmem_list3 *l3;
-	int node = numa_node_id();
+	int node = numa_mem_id();
 	struct delayed_work *work = to_delayed_work(w);
 
 	if (!mutex_trylock(&cache_chain_mutex))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
                   ` (5 preceding siblings ...)
  2009-11-13 21:18 ` [PATCH/RFC 6/6] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
@ 2009-11-20 15:43 ` Christoph Lameter
  2009-11-20 15:43   ` Christoph Lameter
  6 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

> Ad hoc measurements on x86_64 using:  hackbench 400 process 200
>
> 2.6.32-rc5+mmotm-091101		no patch	this series
> x86_64 avg of 40:		  4.605		  4.628  ~0.5%

Instructions become more efficient here.

> Ia64 showed ~1.2% longer time with the series applied.

IA64 can use the per cpu TLB entry to get to the numa node id with the
platform specific per cpu handling. The per cpu implementation
currently requires a fallback. IA64 percpu ops could be reworked to avoid
consulting the per cpu offset array, which would make it equivalent to the
current implementation.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id()
  2009-11-20 15:43 ` [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Christoph Lameter
@ 2009-11-20 15:43   ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

> Ad hoc measurements on x86_64 using:  hackbench 400 process 200
>
> 2.6.32-rc5+mmotm-091101		no patch	this series
> x86_64 avg of 40:		  4.605		  4.628  ~0.5%

Instructions become more efficient here.

> Ia64 showed ~1.2% longer time with the series applied.

IA64 can use the per cpu TLB entry to get to the numa node id with the
platform specific per cpu handling. The per cpu implementation
currently requires a fallback. IA64 percpu ops could be reworked to avoid
consulting the per cpu offset array, which would make it equivalent to the
current implementation.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-13 21:17 ` [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id() Lee Schermerhorn
@ 2009-11-20 15:46   ` Christoph Lameter
  2009-11-20 15:46     ` Christoph Lameter
  2009-11-30 20:28     ` Lee Schermerhorn
  0 siblings, 2 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:46 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Tejun Heo

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> [Christoph's signoff here?]

Basically yes. The moving of the this_cpu ops to asm-generic is something
that is bothering me. Tejun?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-20 15:46   ` Christoph Lameter
@ 2009-11-20 15:46     ` Christoph Lameter
  2009-11-30 20:28     ` Lee Schermerhorn
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:46 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Tejun Heo

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> [Christoph's signoff here?]

Basically yes. The moving of the this_cpu ops to asm-generic is something
that is bothering me. Tejun?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 2/6] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2009-11-13 21:17 ` [PATCH/RFC 2/6] numa: x86_64: use generic percpu var numa_node_id() implementation Lee Schermerhorn
  2009-11-13 21:17   ` Lee Schermerhorn
@ 2009-11-20 15:48   ` Christoph Lameter
  2009-11-20 15:48     ` Christoph Lameter
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:48 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

>     [I think!  What about cpu_to_node() func in x86/mm/numa_64.c ???]

If thats too early for per cpu operations then it cannot be used there.

> ===================================================================
> --- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/include/asm/percpu.h
> +++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/percpu.h
> @@ -150,10 +150,12 @@ do {							\
>  #define percpu_or(var, val)		percpu_to_op("or", var, val)
>  #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
>
> +#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>
> +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)


What does percpu generic stuff do here?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 2/6] numa:  x86_64:  use generic percpu var numa_node_id() implementation
  2009-11-20 15:48   ` Christoph Lameter
@ 2009-11-20 15:48     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:48 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

>     [I think!  What about cpu_to_node() func in x86/mm/numa_64.c ???]

If thats too early for per cpu operations then it cannot be used there.

> ===================================================================
> --- linux-2.6.32-rc5-mmotm-091101-1001.orig/arch/x86/include/asm/percpu.h
> +++ linux-2.6.32-rc5-mmotm-091101-1001/arch/x86/include/asm/percpu.h
> @@ -150,10 +150,12 @@ do {							\
>  #define percpu_or(var, val)		percpu_to_op("or", var, val)
>  #define percpu_xor(var, val)		percpu_to_op("xor", var, val)
>
> +#define __this_cpu_read(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_1(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_2(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>  #define __this_cpu_read_4(pcp)		percpu_from_op("mov", (pcp), "m"(pcp))
>
> +#define __this_cpu_write(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_1(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_2(pcp, val)	percpu_to_op("mov", (pcp), val)
>  #define __this_cpu_write_4(pcp, val)	percpu_to_op("mov", (pcp), val)


What does percpu generic stuff do here?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 3/6] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2009-11-13 21:18 ` [PATCH/RFC 3/6] numa: ia64: " Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
@ 2009-11-20 15:50   ` Christoph Lameter
  2009-11-20 15:50     ` Christoph Lameter
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:50 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 3/6] numa:  ia64:  use generic percpu var numa_node_id() implementation
  2009-11-20 15:50   ` Christoph Lameter
@ 2009-11-20 15:50     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:50 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 4/6] numa:  Introduce numa_mem_id()- effective local memory node id
  2009-11-13 21:18 ` [PATCH/RFC 4/6] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
@ 2009-11-20 15:53   ` Christoph Lameter
  2009-11-20 15:53     ` Christoph Lameter
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:53 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 4/6] numa:  Introduce numa_mem_id()- effective local memory node id
  2009-11-20 15:53   ` Christoph Lameter
@ 2009-11-20 15:53     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:53 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 6/6] numa: slab:  use numa_mem_id() for slab local memory node
  2009-11-13 21:18 ` [PATCH/RFC 6/6] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
  2009-11-13 21:18   ` Lee Schermerhorn
@ 2009-11-20 15:56   ` Christoph Lameter
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-11-20 15:56 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Pekka Enberg

On Fri, 13 Nov 2009, Lee Schermerhorn wrote:

> N.B.:  incomplete.  slab will need to handle node and memory hotplug
> that could change the value returned by numa_mem_id() for any given
> node.  This will be addressed by a subsequent patch, if we decide to
> go this route.

It needs to be verified that this actually works. Locking in slab
depends heavily on numa locality. Can you run this under load with
lockdep? See also the lockdep issue that Pekka is dealing with right now.

>
> 2.6.32-rc5+mmotm-091101		no-patch	this-patch
> no memoryless nodes [avg of 10]:  12.700	  12.856  ~1.2%
> cpus all on memless nodes  [20]: 261.530	  27.700 ~10x speedup

This is due to memoryless nodes now being able to use per cpu queues in
slab. Until now, memoryless nodes have always used fallback_alloc().

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-20 15:46   ` Christoph Lameter
  2009-11-20 15:46     ` Christoph Lameter
@ 2009-11-30 20:28     ` Lee Schermerhorn
  2009-11-30 20:40       ` Matthew Wilcox
  2009-11-30 23:43       ` Arnd Bergmann
  1 sibling, 2 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-11-30 20:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-arch, linux-mm, akpm, Mel Gorman, Christoph Lameter,
	Nick Piggin, David Rientjes, eric.whitney, Tejun Heo

On Fri, 2009-11-20 at 10:46 -0500, Christoph Lameter wrote:
> On Fri, 13 Nov 2009, Lee Schermerhorn wrote:
> 
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> > [Christoph's signoff here?]
> 
> Basically yes. The moving of the this_cpu ops to asm-generic is something
> that is bothering me. Tejun?

So here's what happened:

linux/topology.h now depends on */percpu.h to implement numa_node_id()
and numa_mem_id().  Not so much an issue for x86 because its
asm/topology.h already depended on its asm/percpu.h.  But ia64, for
instance--maybe any arch that doesn't already implement numa_node_id()
as a percpu variable--didn't define this_cpu_read() for
linux/topology.h.

So, I included <linux/percpu.h>.

linux/percpu.h, for reasons of its own, includes linux/swap.h which
includes linux/gfp.h which includes linux/topology.h for the definition
of numa_node_id().  topology.h hasn't gotten around to defining
numa_node_id() yet--it's still including percpu.h.  ...

Looking at other asm/foo.h and asm-generic/foo.h relationships, I see
that some define the generic version of the api in the asm-generic
header if the arch asm header hasn't already defined it.  asm/topology.h
is an instance of this.  It includes asm-generic/topology.h after
defining arch specific versions of some of the api.
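
Roughly, that pattern looks like the following sketch (the names are
placeholders, not the real topology definitions):

/* asm/foo.h (arch header) -- sketch with placeholder names */
#define foo_op(x)	arch_foo_op(x)	/* arch-specific version */
#include <asm-generic/foo.h>		/* pull in the rest */

/* asm-generic/foo.h -- fills in only what the arch didn't define */
#ifndef foo_op
#define foo_op(x)	generic_foo_op(x)
#endif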

Following this model, I moved the generic definitions of the percpu api
back to the asm-generic version where it would be available without the
inclusion of swap.h, et al. 

I tried including <asm/percpu.h> in linux/topology.h but was advised
to use the generic header.  So I followed the model of the x86
asm/topology.h and included asm/percpu.h in the ia64 asm/topology.h,
making the definitions visible to linux/topology.h.

This reminds me that I should add to the patch description a 3rd item
required for an arch to use the generic percpu numa_node_id()
implementation:  make the percpu variable access interface visible via
asm/topology.h.

Does that sound reasonable?

Lee

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-30 20:28     ` Lee Schermerhorn
@ 2009-11-30 20:40       ` Matthew Wilcox
  2009-11-30 23:43       ` Arnd Bergmann
  1 sibling, 0 replies; 27+ messages in thread
From: Matthew Wilcox @ 2009-11-30 20:40 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Christoph Lameter, linux-arch, linux-mm, akpm, Mel Gorman,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Tejun Heo

On Mon, Nov 30, 2009 at 03:28:40PM -0500, Lee Schermerhorn wrote:
> linux/topology.h now depends on */percpu.h to implement numa_node_id()
> and numa_mem_id().  Not so much an issue for x86 because its
> asm/topology.h already depended on its asm/percpu.h.  But ia64, for
> instance--maybe any arch that doesn't already implement numa_node_id()
> as a percpu variable--didn't define this_cpu_read() for
> linux/topology.h.
> 
> So, I included <linux/percpu.h>.
> 
> linux/percpu.h, for reasons of its own, includes linux/swap.h which

typo there ... slab.h, not swap.h.  I thought we might be able to break
the cycle here, but slab.h is more reasonable than swap.h.

We could move __alloc_percpu out of line ... it's only inline for the
!SMP case.
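
If I remember right--and this is an assumption about the current !SMP
code, not something I've re-checked--the UP version is basically a
kzalloc() wrapper, so the move would look roughly like:

/* linux/percpu.h today (sketch; assumes the kzalloc() implementation) */
static inline void *__alloc_percpu(size_t size, size_t align)
{
	return kzalloc(size, GFP_KERNEL);	/* needs linux/slab.h */
}

/* after un-inlining: declaration only in the header, body moved to a
 * .c file, so percpu.h no longer needs slab.h and the cycle is broken */
void *__alloc_percpu(size_t size, size_t align);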

> includes linux/gfp.h which includes linux/topology.h for the definition
> of numa_node_id().  topology.h hasn't gotten around to defining
> numa_node_id() yet--it's still including percpu.h.  ...
> 
> Looking at other asm/foo.h and asm-generic/foo.h relationships, I see
> that some define the generic version of the api in the asm-generic
> header if the arch asm header hasn't already defined it.  asm/topology.h
> is an instance of this.  It includes asm-generic/topology.h after
> defining arch specific versions of some of the api.
> 
> Following this model, I moved the generic definitions of the percpu api
> back to the asm-generic version where it would be available without the
> inclusion of swap.h, et al. 
> 
> I tried including <asm/percpu.h> in linux/topology.h but the was advised
> to use the generic header.  So I followed the model of the x86
> asm/topology.h and included asm/percpu.h in the ia64 asm/topology.h,
> making the definitions visible to linux/topology.h.
> 
> This reminds me that I should add to the patch description a 3rd item
> required for an arch to use the generic percpu numa_node_id()
> implementation:  make the percpu variable access interface visible via
> asm/topology.h.
> 
> Does that sound reasonable?
> 
> Lee

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-30 20:28     ` Lee Schermerhorn
  2009-11-30 20:40       ` Matthew Wilcox
@ 2009-11-30 23:43       ` Arnd Bergmann
  2009-11-30 23:43         ` Arnd Bergmann
  2009-12-02 16:29         ` Lee Schermerhorn
  1 sibling, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2009-11-30 23:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Christoph Lameter, linux-arch, linux-mm, akpm, Mel Gorman,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Tejun Heo

On Monday 30 November 2009, Lee Schermerhorn wrote:
> Looking at other asm/foo.h and asm-generic/foo.h relationships, I see
> that some define the generic version of the api in the asm-generic
> header if the arch asm header hasn't already defined it.  asm/topology.h
> is an instance of this.  It includes asm-generic/topology.h after
> defining arch specific versions of some of the api.

This works alright, but if you expect every architecture to include the
asm-generic version, you might just as well take that choice away from
the architecture and put the common code into the linux/foo.h file,
which you can still override with definitions in asm/foo.h.
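
I.e., something along these lines (just a sketch with placeholder
names):

/* include/linux/foo.h -- common code lives here */
#include <asm/foo.h>		/* arch may or may not define foo_op */

#ifndef foo_op
#define foo_op(x)	generic_foo_op(x)	/* generic fallback */
#endif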

Most of the asm-generic headers are just mostly generic, and get included
by some but not all architectures, the others defining the whole contents
of the asm-generic file themselves in a different way.

So if you e.g. want ia64 to do everything itself and all other architectures to
share some or all parts of asm-generic/topology, your approach is right,
otherwise just leave the code in some file in include/linux/.

	Arnd <><


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-30 23:43       ` Arnd Bergmann
@ 2009-11-30 23:43         ` Arnd Bergmann
  2009-12-02 16:29         ` Lee Schermerhorn
  1 sibling, 0 replies; 27+ messages in thread
From: Arnd Bergmann @ 2009-11-30 23:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Christoph Lameter, linux-arch, linux-mm, akpm, Mel Gorman,
	Christoph Lameter, Nick Piggin, David Rientjes, eric.whitney,
	Tejun Heo

On Monday 30 November 2009, Lee Schermerhorn wrote:
> Looking at other asm/foo.h and asm-generic/foo.h relationships, I see
> that some define the generic version of the api in the asm-generic
> header if the arch asm header hasn't already defined it.  asm/topology.h
> is an instance of this.  It includes asm-generic/topology.h after
> defining arch specific versions of some of the api.

This works alright, but if you expect every architecture to include the
asm-generic version, you might just as well take that choice away from
the architecture and put the common code into the linux/foo.h file,
which you can still override with definitions in asm/foo.h.

Most of the asm-generic headers are just mostly generic, and get included
by some but not all architectures, the others defining the whole contents
of the asm-generic file themselves in a different way.

So if you e.g. want ia64 to do everything itself and all other architectures to
share some or all parts of asm-generic/topology, your approach is right,
otherwise just leave the code in some file in include/linux/.

	Arnd <><

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id()
  2009-11-30 23:43       ` Arnd Bergmann
  2009-11-30 23:43         ` Arnd Bergmann
@ 2009-12-02 16:29         ` Lee Schermerhorn
  1 sibling, 0 replies; 27+ messages in thread
From: Lee Schermerhorn @ 2009-12-02 16:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Lameter, linux-arch, linux-mm, akpm, Mel Gorman,
	Nick Piggin, David Rientjes, eric.whitney, Tejun Heo

On Tue, 2009-12-01 at 00:43 +0100, Arnd Bergmann wrote:
> On Monday 30 November 2009, Lee Schermerhorn wrote:
> > Looking at other asm/foo.h and asm-generic/foo.h relationships, I see
> > that some define the generic version of the api in the asm-generic
> > header if the arch asm header hasn't already defined it.  asm/topology.h
> > is an instance of this.  It includes asm-generic/topology.h after
> > defining arch specific versions of some of the api.
> 
> This works alright, but if you expect every architecture to include the
> asm-generic version, you might just as well take that choice away from
> the architecture and put the common code into the linux/foo.h file,
> which you can still override with definitions in asm/foo.h.
> 
> Most of the asm-generic headers are just mostly generic, and get included
> by some but not all architectures, the others defining the whole contents
> of the asm-generic file themselves in a different way.
> 
> So if you e.g. want ia64 to do everything itself and all other architectures to
> share some or all parts of asm-generic/topology, your approach is right,
> otherwise just leave the code in some file in include/linux/.

Actually, I just wanted to make the generic definitions of
this_cpu_{read|write}() visible to topology.h when building on ia64 w/o
the circular header dependencies.  Willy pointed out a way to do this by
un-inlining __alloc_percpu().  Perhaps this is the way to go.  Tejun is
looking at the patches.

Lee

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-12-02 16:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-13 21:17 [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Lee Schermerhorn
2009-11-13 21:17 ` [PATCH/RFC 1/6] numa: Use Generic Per-cpu Variables for numa_node_id() Lee Schermerhorn
2009-11-20 15:46   ` Christoph Lameter
2009-11-20 15:46     ` Christoph Lameter
2009-11-30 20:28     ` Lee Schermerhorn
2009-11-30 20:40       ` Matthew Wilcox
2009-11-30 23:43       ` Arnd Bergmann
2009-11-30 23:43         ` Arnd Bergmann
2009-12-02 16:29         ` Lee Schermerhorn
2009-11-13 21:17 ` [PATCH/RFC 2/6] numa: x86_64: use generic percpu var numa_node_id() implementation Lee Schermerhorn
2009-11-13 21:17   ` Lee Schermerhorn
2009-11-20 15:48   ` Christoph Lameter
2009-11-20 15:48     ` Christoph Lameter
2009-11-13 21:18 ` [PATCH/RFC 3/6] numa: ia64: " Lee Schermerhorn
2009-11-13 21:18   ` Lee Schermerhorn
2009-11-20 15:50   ` Christoph Lameter
2009-11-20 15:50     ` Christoph Lameter
2009-11-13 21:18 ` [PATCH/RFC 4/6] numa: Introduce numa_mem_id()- effective local memory node id Lee Schermerhorn
2009-11-13 21:18   ` Lee Schermerhorn
2009-11-20 15:53   ` Christoph Lameter
2009-11-20 15:53     ` Christoph Lameter
2009-11-13 21:18 ` [PATCH/RFC 5/6] numa: ia64: support numa_mem_id() for memoryless nodes Lee Schermerhorn
2009-11-13 21:18 ` [PATCH/RFC 6/6] numa: slab: use numa_mem_id() for slab local memory node Lee Schermerhorn
2009-11-13 21:18   ` Lee Schermerhorn
2009-11-20 15:56   ` Christoph Lameter
2009-11-20 15:43 ` [PATCH/RFC 0/6] Numa: Use Generic Per-cpu Variables for numa_*_id() Christoph Lameter
2009-11-20 15:43   ` Christoph Lameter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).