linux-kernel.vger.kernel.org archive mirror
* [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
@ 2009-01-13 10:38 Tejun Heo
  2009-01-13 10:38 ` [PATCH 01/13] x86_64: fix pda_to_op() Tejun Heo
                   ` (13 more replies)
  0 siblings, 14 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

Hello,

This patchset is a re-hash of Mike Travis' original work on
zero-based percpu offsets.  I couldn't get the original patches
working (where are the linker script changes other than the PHDR
addition?) and ended up doing a quite similar series of patches.  A
lot of this series was copied verbatim from, or took heavy hints
from, Mike Travis' original patches.  Some differences are...

* better split - one thing at a time.

* how the pda is allocated at the head.  This series uses
  preallocation directly from the linker script as it's something
  arch specific anyway.

* pda ops are wrappers around x86 percpu ops.

* x86-32 SMP is also converted to zero-based offsets, so we should be
  able to do cpualloc (or whatever) for both subarchs.

This patchset contains the following thirteen patches.

  0001-x86_64-fix-pda_to_op.patch
  0002-x86-make-early_per_cpu-a-lvalue-and-use-it.patch
  0003-x86_64-Cleanup-early-setup_percpu-references.patch
  0004-x86_32-make-vmlinux_32.lds.S-use-PERCPU-macro.patch
  0005-x86_64-make-percpu-symbols-zerobased-on-SMP.patch
  0006-x86_64-load-pointer-to-pda-into-gs-while-brining-u.patch
  0007-x86_64-use-static-_cpu_pda-array.patch
  0008-x86_64-fold-pda-into-percpu-area-on-SMP.patch
  0009-x86_64-merge-64-and-32-SMP-percpu-handling.patch
  0010-x86_64-make-pda-a-percpu-variable.patch
  0011-x86_64-convert-pda-ops-to-wrappers-around-x86-percp.patch
  0012-x86_64-misc-clean-up-after-the-percpu-update.patch
  0013-x86_32-make-percpu-symbols-zerobased-on-SMP.patch

0001-0004 prepare for the coming changes.  0005 converts x86_64 SMP
percpu to zero-based offsets.  0006-0007 prepare for percpu/pda
simplification.  0008-0012 unify percpu and pda handling
step-by-step.  0013 converts x86_32 SMP percpu to zero-based offsets.

Each step was tested with gcc-4.1.3, 4.2.3 and 4.3.1 on x86_64, and
with 4.3.1 on i386, for both SMP and UP configurations.  The final
state is currently being tested using 4.2.3 on both x86_64 and x86_32.

This patchset is on top of the current linux-2.6-x86:tip[1] and also
available in the following git tree.

  http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=x86-percpu-zerobased
  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-zerobased

 arch/x86/include/asm/pda.h          |   97 ++--------------------
 arch/x86/include/asm/percpu.h       |  137 ++++++++++++-------------------
 arch/x86/include/asm/setup.h        |    1 
 arch/x86/include/asm/smp.h          |    2 
 arch/x86/include/asm/topology.h     |    5 -
 arch/x86/include/asm/trampoline.h   |    1 
 arch/x86/kernel/acpi/sleep.c        |    1 
 arch/x86/kernel/apic.c              |   13 ---
 arch/x86/kernel/asm-offsets_64.c    |    2 
 arch/x86/kernel/cpu/common.c        |   17 ++-
 arch/x86/kernel/entry_64.S          |    7 -
 arch/x86/kernel/head64.c            |   23 -----
 arch/x86/kernel/head_32.S           |   37 +++++++-
 arch/x86/kernel/head_64.S           |   47 +++++++++-
 arch/x86/kernel/setup_percpu.c      |  156 +++++++++++++++++++-----------------
 arch/x86/kernel/smpboot.c           |   61 --------------
 arch/x86/kernel/smpcommon.c         |    3 
 arch/x86/kernel/vmlinux_32.lds.S    |   23 +++--
 arch/x86/kernel/vmlinux_64.lds.S    |   22 ++++-
 arch/x86/kernel/x8664_ksyms_64.c    |    2 
 arch/x86/mach-voyager/voyager_smp.c |    2 
 arch/x86/xen/enlighten.c            |    2 
 arch/x86/xen/smp.c                  |   10 --
 include/asm-generic/sections.h      |    2 
 include/asm-generic/vmlinux.lds.h   |   74 +++++++++++++++--
 25 files changed, 353 insertions(+), 394 deletions(-)

Thanks.

--
tejun

[1] c69feb89654622a77c5a8d8e5b20e3dbe0a7a92b

* [PATCH 01/13] x86_64: fix pda_to_op()
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 02/13] x86: make early_per_cpu() an lvalue and use it Tejun Heo
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

There's no instruction to move a 64-bit immediate into a memory
location, so the "i" (immediate) alternative is invalid for the
8-byte case.  Drop "i".
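For reference, a GAS-syntax sketch of the restriction (`foo` is a
hypothetical symbol, not anything from the patch):

```asm
# movq accepts a sign-extended 32-bit immediate with a memory
# destination, but a full 64-bit immediate only with a register
# destination (the movabs form).
movq	$0x12345678, foo(%rip)		# ok: imm32, sign-extended
movq	$0x123456789abcdef0, %rax	# ok: register destination
movq	%rax, foo(%rip)			# store via a register instead
# movq	$0x123456789abcdef0, foo(%rip)	# rejected by the assembler
```

Hence the "r" (register) constraint is the only one that works for
all 64-bit values in the 8-byte case.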

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/include/asm/pda.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index 2fbfff8..cbd3f48 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -78,7 +78,7 @@ do {									\
 	case 8:								\
 		asm(op "q %1,%%gs:%c2":					\
 		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
+		    "r" ((T__)val),					\
 		    "i"(pda_offset(field)));				\
 		break;							\
 	default:							\
-- 
1.5.6


* [PATCH 02/13] x86: make early_per_cpu() an lvalue and use it
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
  2009-01-13 10:38 ` [PATCH 01/13] x86_64: fix pda_to_op() Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 11:43   ` [PATCH 02/13 UPDATED] " Tejun Heo
  2009-01-13 10:38 ` [PATCH 03/13] x86_64: Cleanup early setup_percpu references Tejun Heo
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

Make early_per_cpu() an lvalue, as per_cpu() is, and use it where
applicable.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/include/asm/percpu.h   |    6 +++---
 arch/x86/include/asm/topology.h |    5 +----
 arch/x86/kernel/apic.c          |   13 ++-----------
 3 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ece7205..df644f3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -195,9 +195,9 @@ do {							\
 #define	early_per_cpu_ptr(_name) (_name##_early_ptr)
 #define	early_per_cpu_map(_name, _idx) (_name##_early_map[_idx])
 #define	early_per_cpu(_name, _cpu) 				\
-	(early_per_cpu_ptr(_name) ?				\
-		early_per_cpu_ptr(_name)[_cpu] :		\
-		per_cpu(_name, _cpu))
+	*(early_per_cpu_ptr(_name) ?				\
+		&early_per_cpu_ptr(_name)[_cpu] :		\
+		&per_cpu(_name, _cpu))
 
 #else	/* !CONFIG_SMP */
 #define	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)		\
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index ff386ff..3ac51b5 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -96,10 +96,7 @@ static inline int cpu_to_node(int cpu)
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
-	if (early_per_cpu_ptr(x86_cpu_to_node_map))
-		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
-
-	return per_cpu(x86_cpu_to_node_map, cpu);
+	return early_per_cpu(x86_cpu_to_node_map, cpu);
 }
 
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
diff --git a/arch/x86/kernel/apic.c b/arch/x86/kernel/apic.c
index b5229af..c2f4cbd 100644
--- a/arch/x86/kernel/apic.c
+++ b/arch/x86/kernel/apic.c
@@ -1865,17 +1865,8 @@ void __cpuinit generic_processor_info(int apicid, int version)
 #endif
 
 #if defined(CONFIG_X86_SMP) || defined(CONFIG_X86_64)
-	/* are we being called early in kernel startup? */
-	if (early_per_cpu_ptr(x86_cpu_to_apicid)) {
-		u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
-		u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
-
-		cpu_to_apicid[cpu] = apicid;
-		bios_cpu_apicid[cpu] = apicid;
-	} else {
-		per_cpu(x86_cpu_to_apicid, cpu) = apicid;
-		per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
-	}
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
+	early_per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
 #endif
 
 	cpu_set(cpu, cpu_possible_map);
-- 
1.5.6


* [PATCH 03/13] x86_64: Cleanup early setup_percpu references
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
  2009-01-13 10:38 ` [PATCH 01/13] x86_64: fix pda_to_op() Tejun Heo
  2009-01-13 10:38 ` [PATCH 02/13] x86: make early_per_cpu() an lvalue and use it Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 04/13] x86_32: make vmlinux_32.lds.S use PERCPU() macro Tejun Heo
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

From: Mike Travis <travis@sgi.com>

  * Ruggedize some calls in setup_percpu.c to prevent mishaps
    in early calls, particularly for non-critical functions.

  * Cleanup DEBUG_PER_CPU_MAPS usages and some comments.

Based on linux-2.6.tip/master with following patches applied:

	cpumask: Make cpumask_of_cpu_map generic
	cpumask: Put cpumask_of_cpu_map in the initdata section
	cpumask: Change cpumask_of_cpu_ptr to use new cpumask_of_cpu

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/kernel/setup_percpu.c |   60 ++++++++++++++++++++++++++++------------
 1 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index ae0c0d3..ee55e3b 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -15,6 +15,12 @@
 #include <asm/apicdef.h>
 #include <asm/highmem.h>
 
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+# define DBG(x...) printk(KERN_DEBUG x)
+#else
+# define DBG(x...)
+#endif
+
 #ifdef CONFIG_X86_LOCAL_APIC
 unsigned int num_processors;
 unsigned disabled_cpus __cpuinitdata;
@@ -27,31 +33,39 @@ EXPORT_SYMBOL(boot_cpu_physical_apicid);
 physid_mask_t phys_cpu_present_map;
 #endif
 
-/* map cpu index to physical APIC ID */
+/*
+ * Map cpu index to physical APIC ID
+ */
 DEFINE_EARLY_PER_CPU(u16, x86_cpu_to_apicid, BAD_APICID);
 DEFINE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid, BAD_APICID);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
-#define	X86_64_NUMA	1
+#define	X86_64_NUMA	1	/* (used later) */
 
-/* map cpu index to node index */
+/*
+ * Map cpu index to node index
+ */
 DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
 
-/* which logical CPUs are on which nodes */
+/*
+ * Which logical CPUs are on which nodes
+ */
 cpumask_t *node_to_cpumask_map;
 EXPORT_SYMBOL(node_to_cpumask_map);
 
-/* setup node_to_cpumask_map */
+/*
+ * Setup node_to_cpumask_map
+ */
 static void __init setup_node_to_cpumask_map(void);
 
 #else
 static inline void setup_node_to_cpumask_map(void) { }
 #endif
 
-#if defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) && defined(CONFIG_X86_SMP)
+#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 /*
  * Copy data used in early init routines from the initial arrays to the
  * per cpu data areas.  These arrays then become expendable and the
@@ -181,10 +195,12 @@ void __init setup_per_cpu_areas(void)
 #endif
 		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+
+		DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
 	}
 
-	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
-		NR_CPUS, nr_cpu_ids, nr_node_ids);
+	printk(KERN_INFO "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids: %d\n",
+	       NR_CPUS, nr_cpu_ids, nr_node_ids);
 
 	/* Setup percpu data maps */
 	setup_per_cpu_maps();
@@ -202,6 +218,7 @@ void __init setup_per_cpu_areas(void)
  * Requires node_possible_map to be valid.
  *
  * Note: node_to_cpumask() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
  */
 static void __init setup_node_to_cpumask_map(void)
 {
@@ -217,6 +234,7 @@ static void __init setup_node_to_cpumask_map(void)
 
 	/* allocate the map */
 	map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
+	DBG("node_to_cpumask_map at %p for %d nodes\n", map, nr_node_ids);
 
 	pr_debug("Node to cpumask map at %p for %d nodes\n",
 		 map, nr_node_ids);
@@ -229,17 +247,23 @@ void __cpuinit numa_set_node(int cpu, int node)
 {
 	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
 
-	if (cpu_pda(cpu) && node != NUMA_NO_NODE)
-		cpu_pda(cpu)->nodenumber = node;
-
-	if (cpu_to_node_map)
+	/* early setting, no percpu area yet */
+	if (cpu_to_node_map) {
 		cpu_to_node_map[cpu] = node;
+		return;
+	}
 
-	else if (per_cpu_offset(cpu))
-		per_cpu(x86_cpu_to_node_map, cpu) = node;
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+	if (cpu >= nr_cpu_ids || !per_cpu_offset(cpu)) {
+		printk(KERN_ERR "numa_set_node: invalid cpu# (%d)\n", cpu);
+		dump_stack();
+		return;
+	}
+#endif
+	per_cpu(x86_cpu_to_node_map, cpu) = node;
 
-	else
-		pr_debug("Setting node for non-present cpu %d\n", cpu);
+	if (node != NUMA_NO_NODE)
+		cpu_pda(cpu)->nodenumber = node;
 }
 
 void __cpuinit numa_clear_node(int cpu)
@@ -256,7 +280,7 @@ void __cpuinit numa_add_cpu(int cpu)
 
 void __cpuinit numa_remove_cpu(int cpu)
 {
-	cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
+	cpu_clear(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
 }
 
 #else /* CONFIG_DEBUG_PER_CPU_MAPS */
@@ -266,7 +290,7 @@ void __cpuinit numa_remove_cpu(int cpu)
  */
 static void __cpuinit numa_set_cpumask(int cpu, int enable)
 {
-	int node = cpu_to_node(cpu);
+	int node = early_cpu_to_node(cpu);
 	cpumask_t *mask;
 	char buf[64];
 
-- 
1.5.6


* [PATCH 04/13] x86_32: make vmlinux_32.lds.S use PERCPU() macro
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (2 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 03/13] x86_64: Cleanup early setup_percpu references Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 05/13] x86_64: make percpu symbols zerobased on SMP Tejun Heo
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

Make vmlinux_32.lds.S use the generic PERCPU() macro instead of open
coding it.  This will ease future changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/kernel/vmlinux_32.lds.S |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index 82c6755..3eba7f7 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -178,14 +178,7 @@ SECTIONS
 	__initramfs_end = .;
   }
 #endif
-  . = ALIGN(PAGE_SIZE);
-  .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
-	__per_cpu_start = .;
-	*(.data.percpu.page_aligned)
-	*(.data.percpu)
-	*(.data.percpu.shared_aligned)
-	__per_cpu_end = .;
-  }
+  PERCPU(PAGE_SIZE)
   . = ALIGN(PAGE_SIZE);
   /* freed after init ends here */
 
-- 
1.5.6


* [PATCH 05/13] x86_64: make percpu symbols zerobased on SMP
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (3 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 04/13] x86_32: make vmlinux_32.lds.S use PERCPU() macro Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 06/13] x86_64: load pointer to pda into %gs while bringing up a CPU Tejun Heo
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

This patch makes percpu symbols zero-based on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h, which allows setting an explicit
vaddr on the percpu output section, and by using it in
vmlinux_64.lds.S.  A new PHDR is added as the existing ones cannot
contain sections near address zero.  PERCPU_VADDR() also adds a new
symbol, __per_cpu_load, which always points to the vaddr of the
loaded percpu data.init region.

The following adjustments have been made to accommodate the address
change.

* code to locate percpu gdt_page in head_64.S is updated to add the
  load address to the gdt_page offset.

* __per_cpu_load is used in places where access to the init data area
  is necessary.

* pda->data_offset is initialized soon after C code is entered, as a
  zero value doesn't work anymore.

This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.
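Condensed, the section the PERCPU_PROLOG/EPILOG macros below expand to
looks roughly like this (simplified sketch, not the literal expansion):

```ld
/* Place .data.percpu at vaddr 0 while keeping its load address at the
 * current location counter; __per_cpu_load records where the init
 * image actually lives, and the location counter is restored after
 * the section so the rest of the image is laid out normally. */
__per_cpu_load = .;
.data.percpu 0 : AT(__per_cpu_load - LOAD_OFFSET) {
	__per_cpu_start = .;
	*(.data.percpu)
	__per_cpu_end = .;
} :percpu
. = __per_cpu_load + SIZEOF(.data.percpu);
```

With this layout every percpu symbol's address is its offset into the
area, which is what makes zero-based %gs-relative access possible.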

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/head64.c          |    2 +
 arch/x86/kernel/head_64.S         |   24 ++++++++++++++++-
 arch/x86/kernel/setup_percpu.c    |    2 +-
 arch/x86/kernel/vmlinux_64.lds.S  |   17 +++++++++++-
 include/asm-generic/sections.h    |    2 +-
 include/asm-generic/vmlinux.lds.h |   51 ++++++++++++++++++++++++++++++++----
 6 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 388e05a..a63261b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -44,6 +44,8 @@ void __init x86_64_init_pda(void)
 {
 	_cpu_pda = __cpu_pda;
 	cpu_pda(0) = &_boot_cpu_pda;
+	cpu_pda(0)->data_offset =
+		(unsigned long)(__per_cpu_load - __per_cpu_start);
 	pda_init(0);
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 26cfdc1..5ab77b3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -204,6 +204,23 @@ ENTRY(secondary_startup_64)
 	pushq $0
 	popfq
 
+#ifdef CONFIG_SMP
+	/*
+	 * early_gdt_base should point to the gdt_page in static percpu init
+	 * data area.  Computing this requires two symbols - __per_cpu_load
+	 * and per_cpu__gdt_page.  As the linker can't do such a relocation,
+	 * do it by hand.  As early_gdt_descr is manipulated by C code for
+	 * secondary CPUs, this should be done only once for the boot CPU
+	 * when early_gdt_descr_base contains zero.
+	 */
+	movq	early_gdt_descr_base(%rip), %rax
+	testq	%rax, %rax
+	jnz	1f
+	movq	$__per_cpu_load, %rax
+	addq	$per_cpu__gdt_page, %rax
+	movq	%rax, early_gdt_descr_base(%rip)
+1:
+#endif
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
 	 * because soon the kernel won't have access anymore to the userspace
@@ -401,7 +418,12 @@ NEXT_PAGE(level2_spare_pgt)
 	.globl early_gdt_descr
 early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
-	.quad   per_cpu__gdt_page
+#ifdef CONFIG_SMP
+early_gdt_descr_base:
+	.quad   0x0000000000000000
+#else
+	.quad	per_cpu__gdt_page
+#endif
 
 ENTRY(phys_base)
 	/* This must match the first entry in level2_kernel_pgt */
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index ee55e3b..8a22c94 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -194,7 +194,7 @@ void __init setup_per_cpu_areas(void)
 		}
 #endif
 		per_cpu_offset(cpu) = ptr - __per_cpu_start;
-		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		memcpy(ptr, __per_cpu_load, __per_cpu_end - __per_cpu_start);
 
 		DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
 	}
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index 1a614c0..f50280d 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -19,6 +19,9 @@ PHDRS {
 	data PT_LOAD FLAGS(7);	/* RWE */
 	user PT_LOAD FLAGS(7);	/* RWE */
 	data.init PT_LOAD FLAGS(7);	/* RWE */
+#ifdef CONFIG_SMP
+	percpu PT_LOAD FLAGS(7);	/* RWE */
+#endif
 	note PT_NOTE FLAGS(0);	/* ___ */
 }
 SECTIONS
@@ -208,14 +211,26 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
+#ifdef CONFIG_SMP
+  /*
+   * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
+   * output PHDR, so the next output section - __data_nosave - should
+   * switch it back to data.init.
+   */
+  . = ALIGN(PAGE_SIZE);
+  PERCPU_VADDR(0, :percpu)
+#else
   PERCPU(PAGE_SIZE)
+#endif
 
   . = ALIGN(PAGE_SIZE);
   __init_end = .;
 
   . = ALIGN(PAGE_SIZE);
   __nosave_begin = .;
-  .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) { *(.data.nosave) }
+  .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
+      *(.data.nosave)
+  } :data.init	/* switch back to data.init, see PERCPU_VADDR() above */
   . = ALIGN(PAGE_SIZE);
   __nosave_end = .;
 
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 79a7ff9..4ce48e8 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -9,7 +9,7 @@ extern char __bss_start[], __bss_stop[];
 extern char __init_begin[], __init_end[];
 extern char _sinittext[], _einittext[];
 extern char _end[];
-extern char __per_cpu_start[], __per_cpu_end[];
+extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
 extern char __kprobes_text_start[], __kprobes_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c61fab1..fc2f55f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -430,12 +430,51 @@
   	*(.initcall7.init)						\
   	*(.initcall7s.init)
 
-#define PERCPU(align)							\
-	. = ALIGN(align);						\
-	VMLINUX_SYMBOL(__per_cpu_start) = .;				\
-	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {		\
+#define PERCPU_PROLOG(vaddr)						\
+	VMLINUX_SYMBOL(__per_cpu_load) = .;				\
+	.data.percpu vaddr : AT(__per_cpu_load - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__per_cpu_start) = .;
+
+#define PERCPU_EPILOG(phdr)						\
+		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
+	} phdr								\
+	. = __per_cpu_load + SIZEOF(.data.percpu);
+
+/**
+ * PERCPU_VADDR - define output section for percpu area
+ * @vaddr: explicit base address (optional)
+ * @phdr: destination PHDR (optional)
+ *
+ * Macro which expands to output section for percpu area.  If @vaddr
+ * is not blank, it specifies explicit base address and all percpu
+ * symbols will be offset from the given address.  If blank, @vaddr
+ * always equals @laddr + LOAD_OFFSET.
+ *
+ * @phdr defines the output PHDR to use if not blank.  Be warned that
+ * output PHDR is sticky.  If @phdr is specified, the next output
+ * section in the linker script will go there too.  @phdr should have
+ * a leading colon.
+ *
+ * This macro defines three symbols, __per_cpu_load, __per_cpu_start
+ * and __per_cpu_end.  The first one is the vaddr of loaded percpu
+ * init data.  __per_cpu_start equals @vaddr and __per_cpu_end is the
+ * end offset.
+ */
+#define PERCPU_VADDR(vaddr, phdr)					\
+	PERCPU_PROLOG(vaddr)						\
 		*(.data.percpu.page_aligned)				\
 		*(.data.percpu)						\
 		*(.data.percpu.shared_aligned)				\
-	}								\
-	VMLINUX_SYMBOL(__per_cpu_end) = .;
+	PERCPU_EPILOG(phdr)
+
+/**
+ * PERCPU - define output section for percpu area, simple version
+ * @align: required alignment
+ *
+ * Aligns to @align and outputs the output section for the percpu
+ * area.  This macro doesn't manipulate @vaddr or @phdr, so
+ * __per_cpu_load and __per_cpu_start will be identical.
+ */
+#define PERCPU(align)							\
+	. = ALIGN(align);						\
+	PERCPU_VADDR( , )
-- 
1.5.6


* [PATCH 06/13] x86_64: load pointer to pda into %gs while bringing up a CPU
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (4 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 05/13] x86_64: make percpu symbols zerobased on SMP Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 07/13] x86_64: use static _cpu_pda array Tejun Heo
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

CPU startup code in head_64.S loads the address of a zero page into
%gs for temporary use until the pda is loaded, but the address of the
actual pda is already available at that point.  Load the real address
directly instead.

This will help unify percpu and pda handling later on.

This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/trampoline.h |    1 +
 arch/x86/kernel/acpi/sleep.c      |    1 +
 arch/x86/kernel/head64.c          |    4 ++--
 arch/x86/kernel/head_64.S         |   15 ++++++++++-----
 arch/x86/kernel/smpboot.c         |    1 +
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 780ba0a..90f06c2 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,6 +13,7 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 #define TRAMPOLINE_BASE 0x6000
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 806b4e9..fe23ff8 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -101,6 +101,7 @@ int acpi_save_state_mem(void)
 	stack_start.sp = temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
+	initial_gs = (unsigned long)cpu_pda(smp_processor_id());
 #endif
 	initial_code = (unsigned long)wakeup_long64;
 	saved_magic = 0x123456789abcdef0;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a63261b..54f52cf 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,8 +26,8 @@
 #include <asm/bios_ebda.h>
 #include <asm/trampoline.h>
 
-/* boot cpu pda */
-static struct x8664_pda _boot_cpu_pda __read_mostly;
+/* boot cpu pda, referenced by head_64.S to initialize %gs for boot CPU */
+struct x8664_pda _boot_cpu_pda __read_mostly;
 
 #ifdef CONFIG_SMP
 /*
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5ab77b3..3f56a8f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -243,12 +243,15 @@ ENTRY(secondary_startup_64)
 	movl %eax,%fs
 	movl %eax,%gs
 
-	/* 
-	 * Setup up a dummy PDA. this is just for some early bootup code
-	 * that does in_interrupt() 
-	 */ 
+	/* Set up %gs.
+	 *
+	 * %gs should point to the pda.  For initial boot, make %gs point
+	 * to the _boot_cpu_pda in data section.  For a secondary CPU,
+	 * initial_gs should be set to its pda address before the CPU runs
+	 * this code.
+	 */
 	movl	$MSR_GS_BASE,%ecx
-	movq	$empty_zero_page,%rax
+	movq	initial_gs(%rip),%rax
 	movq    %rax,%rdx
 	shrq	$32,%rdx
 	wrmsr	
@@ -274,6 +277,8 @@ ENTRY(secondary_startup_64)
 	.align	8
 	ENTRY(initial_code)
 	.quad	x86_64_start_kernel
+	ENTRY(initial_gs)
+	.quad	_boot_cpu_pda
 	__FINITDATA
 
 	ENTRY(stack_start)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f8500c9..7cd97d9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -860,6 +860,7 @@ do_rest:
 #else
 	cpu_pda(cpu)->pcurrent = c_idle.idle;
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
+	initial_gs = (unsigned long)cpu_pda(cpu);
 #endif
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
-- 
1.5.6


* [PATCH 07/13] x86_64: use static _cpu_pda array
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (5 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 06/13] x86_64: load pointer to pda into %gs while bringing up a CPU Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 08/13] x86_64: fold pda into percpu area on SMP Tejun Heo
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

The _cpu_pda array first uses statically allocated storage in
data.init and then switches to bootmem-allocated storage to conserve
space.  However, after the pda area is folded into the percpu area,
the _cpu_pda array will be removed completely.  Drop the reallocation
part to simplify the code for the soon-to-follow changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/include/asm/pda.h     |    3 ++-
 arch/x86/kernel/cpu/common.c   |    2 +-
 arch/x86/kernel/head64.c       |   12 ------------
 arch/x86/kernel/setup_percpu.c |   14 +++-----------
 4 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index cbd3f48..2d5b49c 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -5,6 +5,7 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 #include <linux/cache.h>
+#include <linux/threads.h>
 #include <asm/page.h>
 
 /* Per processor datastructure. %gs points to it while the kernel runs */
@@ -39,7 +40,7 @@ struct x8664_pda {
 	unsigned irq_spurious_count;
 } ____cacheline_aligned_in_smp;
 
-extern struct x8664_pda **_cpu_pda;
+extern struct x8664_pda *_cpu_pda[NR_CPUS];
 extern void pda_init(int);
 
 #define cpu_pda(i) (_cpu_pda[i])
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 42e0853..5c3d6b3 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -859,7 +859,7 @@ __setup("clearcpuid=", setup_disablecpuid);
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
 #ifdef CONFIG_X86_64
-struct x8664_pda **_cpu_pda __read_mostly;
+struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 54f52cf..84b96b9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -29,20 +29,8 @@
 /* boot cpu pda, referenced by head_64.S to initialize %gs for boot CPU */
 struct x8664_pda _boot_cpu_pda __read_mostly;
 
-#ifdef CONFIG_SMP
-/*
- * We install an empty cpu_pda pointer table to indicate to early users
- * (numa_set_node) that the cpu_pda pointer table for cpus other than
- * the boot cpu is not yet setup.
- */
-static struct x8664_pda *__cpu_pda[NR_CPUS] __initdata;
-#else
-static struct x8664_pda *__cpu_pda[NR_CPUS] __read_mostly;
-#endif
-
 void __init x86_64_init_pda(void)
 {
-	_cpu_pda = __cpu_pda;
 	cpu_pda(0) = &_boot_cpu_pda;
 	cpu_pda(0)->data_offset =
 		(unsigned long)(__per_cpu_load - __per_cpu_start);
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 8a22c94..bd47bbd 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -114,7 +114,6 @@ static inline void setup_cpu_pda_map(void) { }
 static void __init setup_cpu_pda_map(void)
 {
 	char *pda;
-	struct x8664_pda **new_cpu_pda;
 	unsigned long size;
 	int cpu;
 
@@ -122,28 +121,21 @@ static void __init setup_cpu_pda_map(void)
 
 	/* allocate cpu_pda array and pointer table */
 	{
-		unsigned long tsize = nr_cpu_ids * sizeof(void *);
 		unsigned long asize = size * (nr_cpu_ids - 1);
 
-		tsize = roundup(tsize, cache_line_size());
-		new_cpu_pda = alloc_bootmem(tsize + asize);
-		pda = (char *)new_cpu_pda + tsize;
+		pda = alloc_bootmem(asize);
 	}
 
 	/* initialize pointer table to static pda's */
 	for_each_possible_cpu(cpu) {
 		if (cpu == 0) {
 			/* leave boot cpu pda in place */
-			new_cpu_pda[0] = cpu_pda(0);
 			continue;
 		}
-		new_cpu_pda[cpu] = (struct x8664_pda *)pda;
-		new_cpu_pda[cpu]->in_bootmem = 1;
+		cpu_pda(cpu) = (struct x8664_pda *)pda;
+		cpu_pda(cpu)->in_bootmem = 1;
 		pda += size;
 	}
-
-	/* point to new pointer table */
-	_cpu_pda = new_cpu_pda;
 }
 #endif
 
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 08/13] x86_64: fold pda into percpu area on SMP
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (6 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 07/13] x86_64: use static _cpu_pda array Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 09/13] x86_64: merge 64 and 32 SMP percpu handling Tejun Heo
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

Currently, pdas and percpu areas are allocated separately.  %gs points
to the local pda, and the percpu area is reached through
pda->data_offset.  This patch folds the pda into the percpu area.

Due to a gcc requirement (gcc hardcodes the stack protector canary at
%gs:40), the pda needs to be at the beginning of the percpu area so
that pda->stack_canary ends up at %gs:40.  To achieve this, a new
percpu output section macro - PERCPU_VADDR_PREALLOC() - is added and
used to reserve a pda-sized chunk at the start of the percpu area.

After this change, for the boot cpu, %gs first points to the pda in
the data.init area and is later updated during setup_per_cpu_areas()
to point to the actual pda.  This means that setup_per_cpu_areas()
needs to reload %gs for CPU0 and clear the pda area for the other
cpus, as CPU0 has already modified its pda by the time control reaches
setup_per_cpu_areas().

This patch also removes the now-unnecessary get_local_pda() and its
call sites.

A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/percpu.h     |    8 ++++
 arch/x86/include/asm/smp.h        |    2 -
 arch/x86/kernel/asm-offsets_64.c  |    1 +
 arch/x86/kernel/cpu/common.c      |    6 +--
 arch/x86/kernel/head64.c          |    8 ++++-
 arch/x86/kernel/head_64.S         |   15 ++++++--
 arch/x86/kernel/setup_percpu.c    |   65 ++++++++++++++----------------------
 arch/x86/kernel/smpboot.c         |   60 +---------------------------------
 arch/x86/kernel/vmlinux_64.lds.S  |    6 ++-
 arch/x86/xen/smp.c                |   10 ------
 include/asm-generic/vmlinux.lds.h |   25 +++++++++++++-
 11 files changed, 83 insertions(+), 123 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index df644f3..0ed77cf 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -1,6 +1,14 @@
 #ifndef _ASM_X86_PERCPU_H
 #define _ASM_X86_PERCPU_H
 
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_X86_64
+extern void load_pda_offset(int cpu);
+#else
+static inline void load_pda_offset(int cpu) { }
+#endif
+#endif
+
 #ifdef CONFIG_X86_64
 #include <linux/compiler.h>
 
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index d12811c..288a89e 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -25,8 +25,6 @@ extern cpumask_t cpu_callin_map;
 extern void (*mtrr_hook)(void);
 extern void zap_low_mappings(void);
 
-extern int __cpuinit get_local_pda(int cpu);
-
 extern int smp_num_siblings;
 extern unsigned int num_processors;
 extern cpumask_t cpu_initialized;
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1d41d3f..f8d1b04 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -56,6 +56,7 @@ int main(void)
 	ENTRY(cpunumber);
 	ENTRY(irqstackptr);
 	ENTRY(data_offset);
+	DEFINE(pda_size, sizeof(struct x8664_pda));
 	BLANK();
 #undef ENTRY
 #ifdef CONFIG_PARAVIRT
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5c3d6b3..cdd419a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -873,10 +873,8 @@ void __cpuinit pda_init(int cpu)
 	/* Setup up data that may be needed in __get_free_pages early */
 	loadsegment(fs, 0);
 	loadsegment(gs, 0);
-	/* Memory clobbers used to order PDA accessed */
-	mb();
-	wrmsrl(MSR_GS_BASE, pda);
-	mb();
+
+	load_pda_offset(cpu);
 
 	pda->cpunumber = cpu;
 	pda->irqcount = -1;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 84b96b9..ff39bdd 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,12 +26,18 @@
 #include <asm/bios_ebda.h>
 #include <asm/trampoline.h>
 
-/* boot cpu pda, referenced by head_64.S to initialize %gs for boot CPU */
+#ifndef CONFIG_SMP
+/* boot cpu pda, referenced by head_64.S to initialize %gs on UP */
 struct x8664_pda _boot_cpu_pda __read_mostly;
+#endif
 
 void __init x86_64_init_pda(void)
 {
+#ifdef CONFIG_SMP
+	cpu_pda(0) = (void *)__per_cpu_load;
+#else
 	cpu_pda(0) = &_boot_cpu_pda;
+#endif
 	cpu_pda(0)->data_offset =
 		(unsigned long)(__per_cpu_load - __per_cpu_start);
 	pda_init(0);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3f56a8f..519185f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -245,10 +245,13 @@ ENTRY(secondary_startup_64)
 
 	/* Set up %gs.
 	 *
-	 * %gs should point to the pda.  For initial boot, make %gs point
-	 * to the _boot_cpu_pda in data section.  For a secondary CPU,
-	 * initial_gs should be set to its pda address before the CPU runs
-	 * this code.
+	 * On SMP, %gs should point to the per-cpu area.  For initial
+	 * boot, make %gs point to the init data section.  For a
+	 * secondary CPU, initial_gs should be set to its pda address
+	 * before the CPU runs this code.
+	 *
+	 * On UP, initial_gs points to _boot_cpu_pda and doesn't
+	 * change.
 	 */
 	movl	$MSR_GS_BASE,%ecx
 	movq	initial_gs(%rip),%rax
@@ -278,7 +281,11 @@ ENTRY(secondary_startup_64)
 	ENTRY(initial_code)
 	.quad	x86_64_start_kernel
 	ENTRY(initial_gs)
+#ifdef CONFIG_SMP
+	.quad	__per_cpu_load
+#else
 	.quad	_boot_cpu_pda
+#endif
 	__FINITDATA
 
 	ENTRY(stack_start)
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index bd47bbd..24d3e0d 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -14,6 +14,7 @@
 #include <asm/mpspec.h>
 #include <asm/apicdef.h>
 #include <asm/highmem.h>
+#include <asm/proto.h>
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
 # define DBG(x...) printk(KERN_DEBUG x)
@@ -65,6 +66,16 @@ static void __init setup_node_to_cpumask_map(void);
 static inline void setup_node_to_cpumask_map(void) { }
 #endif
 
+#ifdef CONFIG_X86_64
+void __cpuinit load_pda_offset(int cpu)
+{
+	/* Memory clobbers used to order pda/percpu accesses */
+	mb();
+	wrmsrl(MSR_GS_BASE, cpu_pda(cpu));
+	mb();
+}
+#endif
+
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 /*
  * Copy data used in early init routines from the initial arrays to the
@@ -101,42 +112,6 @@ static void __init setup_per_cpu_maps(void)
  */
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(__per_cpu_offset);
-static inline void setup_cpu_pda_map(void) { }
-
-#elif !defined(CONFIG_SMP)
-static inline void setup_cpu_pda_map(void) { }
-
-#else /* CONFIG_SMP && CONFIG_X86_64 */
-
-/*
- * Allocate cpu_pda pointer table and array via alloc_bootmem.
- */
-static void __init setup_cpu_pda_map(void)
-{
-	char *pda;
-	unsigned long size;
-	int cpu;
-
-	size = roundup(sizeof(struct x8664_pda), cache_line_size());
-
-	/* allocate cpu_pda array and pointer table */
-	{
-		unsigned long asize = size * (nr_cpu_ids - 1);
-
-		pda = alloc_bootmem(asize);
-	}
-
-	/* initialize pointer table to static pda's */
-	for_each_possible_cpu(cpu) {
-		if (cpu == 0) {
-			/* leave boot cpu pda in place */
-			continue;
-		}
-		cpu_pda(cpu) = (struct x8664_pda *)pda;
-		cpu_pda(cpu)->in_bootmem = 1;
-		pda += size;
-	}
-}
 #endif
 
 /*
@@ -151,9 +126,6 @@ void __init setup_per_cpu_areas(void)
 	int cpu;
 	unsigned long align = 1;
 
-	/* Setup cpu_pda map */
-	setup_cpu_pda_map();
-
 	/* Copy section for each CPU (we discard the original) */
 	old_size = PERCPU_ENOUGH_ROOM;
 	align = max_t(unsigned long, PAGE_SIZE, align);
@@ -185,8 +157,21 @@ void __init setup_per_cpu_areas(void)
 					 cpu, node, __pa(ptr));
 		}
 #endif
-		per_cpu_offset(cpu) = ptr - __per_cpu_start;
+
 		memcpy(ptr, __per_cpu_load, __per_cpu_end - __per_cpu_start);
+#ifdef CONFIG_X86_64
+		cpu_pda(cpu) = (void *)ptr;
+
+		/*
+		 * CPU0 modified pda in the init data area, reload pda
+		 * offset for CPU0 and clear the area for others.
+		 */
+		if (cpu == 0)
+			load_pda_offset(0);
+		else
+			memset(cpu_pda(cpu), 0, sizeof(*cpu_pda(cpu)));
+#endif
+		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 
 		DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
 	}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7cd97d9..3a0fec6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -750,52 +750,6 @@ static void __cpuinit do_fork_idle(struct work_struct *work)
 	complete(&c_idle->done);
 }
 
-#ifdef CONFIG_X86_64
-
-/* __ref because it's safe to call free_bootmem when after_bootmem == 0. */
-static void __ref free_bootmem_pda(struct x8664_pda *oldpda)
-{
-	if (!after_bootmem)
-		free_bootmem((unsigned long)oldpda, sizeof(*oldpda));
-}
-
-/*
- * Allocate node local memory for the AP pda.
- *
- * Must be called after the _cpu_pda pointer table is initialized.
- */
-int __cpuinit get_local_pda(int cpu)
-{
-	struct x8664_pda *oldpda, *newpda;
-	unsigned long size = sizeof(struct x8664_pda);
-	int node = cpu_to_node(cpu);
-
-	if (cpu_pda(cpu) && !cpu_pda(cpu)->in_bootmem)
-		return 0;
-
-	oldpda = cpu_pda(cpu);
-	newpda = kmalloc_node(size, GFP_ATOMIC, node);
-	if (!newpda) {
-		printk(KERN_ERR "Could not allocate node local PDA "
-			"for CPU %d on node %d\n", cpu, node);
-
-		if (oldpda)
-			return 0;	/* have a usable pda */
-		else
-			return -1;
-	}
-
-	if (oldpda) {
-		memcpy(newpda, oldpda, size);
-		free_bootmem_pda(oldpda);
-	}
-
-	newpda->in_bootmem = 0;
-	cpu_pda(cpu) = newpda;
-	return 0;
-}
-#endif /* CONFIG_X86_64 */
-
 static int __cpuinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -813,16 +767,6 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
 	};
 	INIT_WORK(&c_idle.work, do_fork_idle);
 
-#ifdef CONFIG_X86_64
-	/* Allocate node local memory for AP pdas */
-	if (cpu > 0) {
-		boot_error = get_local_pda(cpu);
-		if (boot_error)
-			goto restore_state;
-			/* if can't get pda memory, can't start cpu */
-	}
-#endif
-
 	alternatives_smp_switch(1);
 
 	c_idle.idle = get_idle_for_cpu(cpu);
@@ -937,9 +881,7 @@ do_rest:
 				inquire_remote_apic(apicid);
 		}
 	}
-#ifdef CONFIG_X86_64
-restore_state:
-#endif
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index f50280d..962f21f 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -5,6 +5,7 @@
 #define LOAD_OFFSET __START_KERNEL_map
 
 #include <asm-generic/vmlinux.lds.h>
+#include <asm/asm-offsets.h>
 #include <asm/page.h>
 
 #undef i386	/* in case the preprocessor is a 32bit one */
@@ -215,10 +216,11 @@ SECTIONS
   /*
    * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
    * output PHDR, so the next output section - __data_nosave - should
-   * switch it back to data.init.
+   * switch it back to data.init.  Also, pda should be at the head of
+   * percpu area.  Preallocate it.
    */
   . = ALIGN(PAGE_SIZE);
-  PERCPU_VADDR(0, :percpu)
+  PERCPU_VADDR_PREALLOC(0, :percpu, pda_size)
 #else
   PERCPU(PAGE_SIZE)
 #endif
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index acd9b67..ae22284 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -280,16 +280,6 @@ static int __cpuinit xen_cpu_up(unsigned int cpu)
 	struct task_struct *idle = idle_task(cpu);
 	int rc;
 
-#ifdef CONFIG_X86_64
-	/* Allocate node local memory for AP pdas */
-	WARN_ON(cpu == 0);
-	if (cpu > 0) {
-		rc = get_local_pda(cpu);
-		if (rc)
-			return rc;
-	}
-#endif
-
 #ifdef CONFIG_X86_32
 	init_gdt(cpu);
 	per_cpu(current_task, cpu) = idle;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fc2f55f..e53319c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -441,9 +441,10 @@
 	. = __per_cpu_load + SIZEOF(.data.percpu);
 
 /**
- * PERCPU_VADDR - define output section for percpu area
+ * PERCPU_VADDR_PREALLOC - define output section for percpu area with prealloc
  * @vaddr: explicit base address (optional)
  * @phdr: destination PHDR (optional)
+ * @prealloc: the size of prealloc area
  *
  * Macro which expands to output section for percpu area.  If @vaddr
  * is not blank, it specifies explicit base address and all percpu
@@ -455,11 +456,33 @@
  * section in the linker script will go there too.  @phdr should have
  * a leading colon.
  *
+ * If @prealloc is non-zero, the specified number of bytes will be
+ * reserved at the start of percpu area.  As the prealloc area is
+ * likely to break alignment, this macro puts areas in increasing
+ * alignment order.
+ *
  * This macro defines three symbols, __per_cpu_load, __per_cpu_start
  * and __per_cpu_end.  The first one is the vaddr of loaded percpu
  * init data.  __per_cpu_start equals @vaddr and __per_cpu_end is the
  * end offset.
  */
+#define PERCPU_VADDR_PREALLOC(vaddr, segment, prealloc)			\
+	PERCPU_PROLOG(vaddr)						\
+		. += prealloc;						\
+		*(.data.percpu)						\
+		*(.data.percpu.shared_aligned)				\
+		*(.data.percpu.page_aligned)				\
+	PERCPU_EPILOG(segment)
+
+/**
+ * PERCPU_VADDR - define output section for percpu area
+ * @vaddr: explicit base address (optional)
+ * @phdr: destination PHDR (optional)
+ *
+ * Macro which expands to output section for percpu area.  Mostly
+ * identical to PERCPU_VADDR_PREALLOC(@vaddr, @phdr, 0) other than
+ * using slightly different layout.
+ */
 #define PERCPU_VADDR(vaddr, phdr)					\
 	PERCPU_PROLOG(vaddr)						\
 		*(.data.percpu.page_aligned)				\
-- 
1.5.6



* [PATCH 09/13] x86_64: merge 64 and 32 SMP percpu handling
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (7 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 08/13] x86_64: fold pda into percpu area on SMP Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 10/13] x86_64: make pda a percpu variable Tejun Heo
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

Now that the pda is allocated as part of the percpu area, the percpu
area no longer needs to be accessed through the pda.  Unify x86_64 SMP
percpu access with the x86_32 SMP one.  Other than the segment
register, the operand size and the base of percpu symbols, they now
behave identically.

This patch replaces the now-unnecessary pda->data_offset with a dummy
field, which is needed to keep stack_canary in place.  This patch also
moves per_cpu_offset initialization out of init_gdt() into
setup_per_cpu_areas().  Note that this change also necessitates
explicit per_cpu_offset initializations in voyager_smp.c.

With this change, x86_OP_percpu() operations are as efficient on
x86_64 as on x86_32, and x86_64 can also use the assembly PER_CPU
macros.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/pda.h          |    3 +-
 arch/x86/include/asm/percpu.h       |  127 +++++++++++------------------------
 arch/x86/kernel/asm-offsets_64.c    |    1 -
 arch/x86/kernel/entry_64.S          |    7 +-
 arch/x86/kernel/head64.c            |    2 -
 arch/x86/kernel/setup_percpu.c      |   15 ++--
 arch/x86/kernel/smpcommon.c         |    3 +-
 arch/x86/mach-voyager/voyager_smp.c |    2 +
 8 files changed, 55 insertions(+), 105 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index 2d5b49c..e91558e 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -11,8 +11,7 @@
 /* Per processor datastructure. %gs points to it while the kernel runs */
 struct x8664_pda {
 	struct task_struct *pcurrent;	/* 0  Current process */
-	unsigned long data_offset;	/* 8 Per cpu data offset from linker
-					   address */
+	unsigned long dummy;
 	unsigned long kernelstack;	/* 16 top of kernel stack for current */
 	unsigned long oldrsp;		/* 24 user rsp for system call */
 	int irqcount;			/* 32 Irq nesting counter. Starts -1 */
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0ed77cf..556f84b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -1,62 +1,13 @@
 #ifndef _ASM_X86_PERCPU_H
 #define _ASM_X86_PERCPU_H
 
-#ifndef __ASSEMBLY__
 #ifdef CONFIG_X86_64
-extern void load_pda_offset(int cpu);
+#define __percpu_seg		gs
+#define __percpu_mov_op		movq
 #else
-static inline void load_pda_offset(int cpu) { }
-#endif
-#endif
-
-#ifdef CONFIG_X86_64
-#include <linux/compiler.h>
-
-/* Same as asm-generic/percpu.h, except that we store the per cpu offset
-   in the PDA. Longer term the PDA and every per cpu variable
-   should be just put into a single section and referenced directly
-   from %gs */
-
-#ifdef CONFIG_SMP
-#include <asm/pda.h>
-
-#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
-#define __my_cpu_offset read_pda(data_offset)
-
-#define per_cpu_offset(x) (__per_cpu_offset(x))
-
+#define __percpu_seg		fs
+#define __percpu_mov_op		movl
 #endif
-#include <asm-generic/percpu.h>
-
-DECLARE_PER_CPU(struct x8664_pda, pda);
-
-/*
- * These are supposed to be implemented as a single instruction which
- * operates on the per-cpu data base segment.  x86-64 doesn't have
- * that yet, so this is a fairly inefficient workaround for the
- * meantime.  The single instruction is atomic with respect to
- * preemption and interrupts, so we need to explicitly disable
- * interrupts here to achieve the same effect.  However, because it
- * can be used from within interrupt-disable/enable, we can't actually
- * disable interrupts; disabling preemption is enough.
- */
-#define x86_read_percpu(var)						\
-	({								\
-		typeof(per_cpu_var(var)) __tmp;				\
-		preempt_disable();					\
-		__tmp = __get_cpu_var(var);				\
-		preempt_enable();					\
-		__tmp;							\
-	})
-
-#define x86_write_percpu(var, val)					\
-	do {								\
-		preempt_disable();					\
-		__get_cpu_var(var) = (val);				\
-		preempt_enable();					\
-	} while(0)
-
-#else /* CONFIG_X86_64 */
 
 #ifdef __ASSEMBLY__
 
@@ -73,42 +24,26 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, reg)				\
-	movl %fs:per_cpu__##this_cpu_off, reg;		\
+#define PER_CPU(var, reg)						\
+	__percpu_mov_op %__percpu_seg:per_cpu__this_cpu_off, reg;	\
 	lea per_cpu__##var(reg), reg
-#define PER_CPU_VAR(var)	%fs:per_cpu__##var
+#define PER_CPU_VAR(var)	%__percpu_seg:per_cpu__##var
 #else /* ! SMP */
-#define PER_CPU(var, reg)			\
-	movl $per_cpu__##var, reg
+#define PER_CPU(var, reg)						\
+	__percpu_mov_op $per_cpu__##var, reg
 #define PER_CPU_VAR(var)	per_cpu__##var
 #endif	/* SMP */
 
 #else /* ...!ASSEMBLY */
 
-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- *    var - variable name
- *    cpu - 32bit register containing the current CPU number
- *
- * The resulting address is stored in the "cpu" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
- */
-#ifdef CONFIG_SMP
-
-#define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
-/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
-#define __percpu_seg "%%fs:"
+#include <linux/stringify.h>
 
-#else  /* !SMP */
-
-#define __percpu_seg ""
-
-#endif	/* SMP */
+#ifdef CONFIG_SMP
+#define __percpu_seg_str	"%%"__stringify(__percpu_seg)":"
+#define __my_cpu_offset		x86_read_percpu(this_cpu_off)
+#else
+#define __percpu_seg_str
+#endif
 
 #include <asm-generic/percpu.h>
 
@@ -128,20 +63,25 @@ do {							\
 	}						\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b %1,"__percpu_seg"%0"		\
+		asm(op "b %1,"__percpu_seg_str"%0"	\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
 	case 2:						\
-		asm(op "w %1,"__percpu_seg"%0"		\
+		asm(op "w %1,"__percpu_seg_str"%0"	\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
 	case 4:						\
-		asm(op "l %1,"__percpu_seg"%0"		\
+		asm(op "l %1,"__percpu_seg_str"%0"	\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
+	case 8:						\
+		asm(op "q %1,"__percpu_seg_str"%0"	\
+		    : "+m" (var)			\
+		    : "r" ((T__)val));			\
+		break;					\
 	default: __bad_percpu_size();			\
 	}						\
 } while (0)
@@ -151,17 +91,22 @@ do {							\
 	typeof(var) ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_seg"%1,%0"		\
+		asm(op "b "__percpu_seg_str"%1,%0"	\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_seg"%1,%0"		\
+		asm(op "w "__percpu_seg_str"%1,%0"	\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_seg"%1,%0"		\
+		asm(op "l "__percpu_seg_str"%1,%0"	\
+		    : "=r" (ret__)			\
+		    : "m" (var));			\
+		break;					\
+	case 8:						\
+		asm(op "q "__percpu_seg_str"%1,%0"	\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
@@ -175,8 +120,14 @@ do {							\
 #define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val)
 #define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
 #define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
+
+#ifdef CONFIG_X86_64
+extern void load_pda_offset(int cpu);
+#else
+static inline void load_pda_offset(int cpu) { }
+#endif
+
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 
 #ifdef CONFIG_SMP
 
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index f8d1b04..f4cc81b 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -55,7 +55,6 @@ int main(void)
 	ENTRY(irqcount);
 	ENTRY(cpunumber);
 	ENTRY(irqstackptr);
-	ENTRY(data_offset);
 	DEFINE(pda_size, sizeof(struct x8664_pda));
 	BLANK();
 #undef ENTRY
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e28c7a9..4833f3a 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -52,6 +52,7 @@
 #include <asm/irqflags.h>
 #include <asm/paravirt.h>
 #include <asm/ftrace.h>
+#include <asm/percpu.h>
 
 /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
 #include <linux/elf-em.h>
@@ -1072,10 +1073,10 @@ ENTRY(\sym)
 	TRACE_IRQS_OFF
 	movq %rsp,%rdi		/* pt_regs pointer */
 	xorl %esi,%esi		/* no error code */
-	movq %gs:pda_data_offset, %rbp
-	subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	PER_CPU(init_tss, %rbp)
+	subq $EXCEPTION_STKSZ, TSS_ist + (\ist - 1) * 8(%rbp)
 	call \do_sym
-	addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	addq $EXCEPTION_STKSZ, TSS_ist + (\ist - 1) * 8(%rbp)
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
 END(\sym)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index ff39bdd..9194074 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -38,8 +38,6 @@ void __init x86_64_init_pda(void)
 #else
 	cpu_pda(0) = &_boot_cpu_pda;
 #endif
-	cpu_pda(0)->data_offset =
-		(unsigned long)(__per_cpu_load - __per_cpu_start);
 	pda_init(0);
 }
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 24d3e0d..76a2498 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -105,14 +105,14 @@ static void __init setup_per_cpu_maps(void)
 #endif
 }
 
-#ifdef CONFIG_X86_32
-/*
- * Great future not-so-futuristic plan: make i386 and x86_64 do it
- * the same way
- */
+#ifdef CONFIG_X86_64
+unsigned long __per_cpu_offset[NR_CPUS] __read_mostly = {
+	[0] = (unsigned long)__per_cpu_load,
+};
+#else
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(__per_cpu_offset);
 #endif
+EXPORT_SYMBOL(__per_cpu_offset);
 
 /*
  * Great future plan:
@@ -159,6 +159,7 @@ void __init setup_per_cpu_areas(void)
 #endif
 
 		memcpy(ptr, __per_cpu_load, __per_cpu_end - __per_cpu_start);
+		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 #ifdef CONFIG_X86_64
 		cpu_pda(cpu) = (void *)ptr;
 
@@ -171,7 +172,7 @@ void __init setup_per_cpu_areas(void)
 		else
 			memset(cpu_pda(cpu), 0, sizeof(*cpu_pda(cpu)));
 #endif
-		per_cpu_offset(cpu) = ptr - __per_cpu_start;
+		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
 
 		DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
 	}
diff --git a/arch/x86/kernel/smpcommon.c b/arch/x86/kernel/smpcommon.c
index 397e309..84395fa 100644
--- a/arch/x86/kernel/smpcommon.c
+++ b/arch/x86/kernel/smpcommon.c
@@ -4,10 +4,10 @@
 #include <linux/module.h>
 #include <asm/smp.h>
 
-#ifdef CONFIG_X86_32
 DEFINE_PER_CPU(unsigned long, this_cpu_off);
 EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
+#ifdef CONFIG_X86_32
 /*
  * Initialize the CPU's GDT.  This is either the boot CPU doing itself
  * (still using the master per-cpu area), or a CPU doing it for a
@@ -24,7 +24,6 @@ __cpuinit void init_gdt(int cpu)
 	write_gdt_entry(get_cpu_gdt_table(cpu),
 			GDT_ENTRY_PERCPU, &gdt, DESCTYPE_S);
 
-	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
 	per_cpu(cpu_number, cpu) = cpu;
 }
 #endif
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
index 5214500..7d79a52 100644
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ b/arch/x86/mach-voyager/voyager_smp.c
@@ -539,6 +539,7 @@ static void __init do_boot_cpu(__u8 cpu)
 	stack_start.sp = (void *)idle->thread.sp;
 
 	init_gdt(cpu);
+	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
 	per_cpu(current_task, cpu) = idle;
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	irq_ctx_init(cpu);
@@ -1756,6 +1757,7 @@ static void __init voyager_smp_prepare_cpus(unsigned int max_cpus)
 static void __cpuinit voyager_smp_prepare_boot_cpu(void)
 {
 	init_gdt(smp_processor_id());
+	per_cpu(this_cpu_off, smp_processor_id()) = __per_cpu_offset[smp_processor_id()];
 	switch_to_new_gdt();
 
 	cpu_set(smp_processor_id(), cpu_online_map);
-- 
1.5.6



* [PATCH 10/13] x86_64: make pda a percpu variable
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (8 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 09/13] x86_64: merge 64 and 32 SMP percpu handling Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 11/13] x86_64: convert pda ops to wrappers around x86 percpu accessors Tejun Heo
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

As the pda is now allocated in the percpu area, it can easily be made
a proper percpu variable.  Make it so by defining the per-cpu symbol
from the linker script and declaring it in C code on SMP, and by
simply defining it on UP.  This change cleans up the code and brings
SMP and UP a bit closer.

A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" and "x86_64: Reference zero-based percpu variables
offset from gs" patches.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/pda.h       |    5 +++--
 arch/x86/kernel/cpu/common.c     |    3 ---
 arch/x86/kernel/head64.c         |   10 ----------
 arch/x86/kernel/head_64.S        |    5 +++--
 arch/x86/kernel/setup_percpu.c   |   16 ++++++++++++++--
 arch/x86/kernel/vmlinux_64.lds.S |    4 +++-
 6 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index e91558e..66ae104 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -7,6 +7,7 @@
 #include <linux/cache.h>
 #include <linux/threads.h>
 #include <asm/page.h>
+#include <asm/percpu.h>
 
 /* Per processor datastructure. %gs points to it while the kernel runs */
 struct x8664_pda {
@@ -39,10 +40,10 @@ struct x8664_pda {
 	unsigned irq_spurious_count;
 } ____cacheline_aligned_in_smp;
 
-extern struct x8664_pda *_cpu_pda[NR_CPUS];
+DECLARE_PER_CPU(struct x8664_pda, __pda);
 extern void pda_init(int);
 
-#define cpu_pda(i) (_cpu_pda[i])
+#define cpu_pda(cpu)		(&per_cpu(__pda, cpu))
 
 /*
  * There is no fast way to get the base address of the PDA, all the accesses
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cdd419a..bd38a0f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -859,9 +859,6 @@ __setup("clearcpuid=", setup_disablecpuid);
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
 #ifdef CONFIG_X86_64
-struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(_cpu_pda);
-
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
 static char boot_cpu_stack[IRQSTACKSIZE] __page_aligned_bss;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 9194074..71b6f6e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,18 +26,8 @@
 #include <asm/bios_ebda.h>
 #include <asm/trampoline.h>
 
-#ifndef CONFIG_SMP
-/* boot cpu pda, referenced by head_64.S to initialize %gs on UP */
-struct x8664_pda _boot_cpu_pda __read_mostly;
-#endif
-
 void __init x86_64_init_pda(void)
 {
-#ifdef CONFIG_SMP
-	cpu_pda(0) = (void *)__per_cpu_load;
-#else
-	cpu_pda(0) = &_boot_cpu_pda;
-#endif
 	pda_init(0);
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 519185f..ce5975c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -19,6 +19,7 @@
 #include <asm/msr.h>
 #include <asm/cache.h>
 #include <asm/processor-flags.h>
+#include <asm/percpu.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -250,7 +251,7 @@ ENTRY(secondary_startup_64)
 	 * secondary CPU, initial_gs should be set to its pda address
 	 * before the CPU runs this code.
 	 *
-	 * On UP, initial_gs points to _boot_cpu_pda and doesn't
+	 * On UP, initial_gs points to PER_CPU_VAR(__pda) and doesn't
 	 * change.
 	 */
 	movl	$MSR_GS_BASE,%ecx
@@ -284,7 +285,7 @@ ENTRY(secondary_startup_64)
 #ifdef CONFIG_SMP
 	.quad	__per_cpu_load
 #else
-	.quad	_boot_cpu_pda
+	.quad	PER_CPU_VAR(__pda)
 #endif
 	__FINITDATA
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 76a2498..9edc081 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -66,6 +66,16 @@ static void __init setup_node_to_cpumask_map(void);
 static inline void setup_node_to_cpumask_map(void) { }
 #endif
 
+/*
+ * Define load_pda_offset() and per-cpu __pda for x86_64.
+ * load_pda_offset() is responsible for loading the offset of pda into
+ * %gs.
+ *
+ * On SMP, pda offset also doubles as percpu base address and thus it
+ * should be at the start of per-cpu area.  To achieve this, it's
+ * preallocated in vmlinux_64.lds.S directly instead of using
+ * DEFINE_PER_CPU().
+ */
 #ifdef CONFIG_X86_64
 void __cpuinit load_pda_offset(int cpu)
 {
@@ -74,6 +84,10 @@ void __cpuinit load_pda_offset(int cpu)
 	wrmsrl(MSR_GS_BASE, cpu_pda(cpu));
 	mb();
 }
+#ifndef CONFIG_SMP
+DEFINE_PER_CPU(struct x8664_pda, __pda);
+#endif
+EXPORT_PER_CPU_SYMBOL(__pda);
 #endif
 
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
@@ -161,8 +175,6 @@ void __init setup_per_cpu_areas(void)
 		memcpy(ptr, __per_cpu_load, __per_cpu_end - __per_cpu_start);
 		per_cpu_offset(cpu) = ptr - __per_cpu_start;
 #ifdef CONFIG_X86_64
-		cpu_pda(cpu) = (void *)ptr;
-
 		/*
 		 * CPU0 modified pda in the init data area, reload pda
 		 * offset for CPU0 and clear the area for others.
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index 962f21f..d2a0baa 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -217,10 +217,12 @@ SECTIONS
    * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
    * output PHDR, so the next output section - __data_nosave - should
    * switch it back to data.init.  Also, pda should be at the head of
-   * percpu area.  Preallocate it.
+   * percpu area.  Preallocate it and define the percpu offset symbol
+   * so that it can be accessed as a percpu variable.
    */
   . = ALIGN(PAGE_SIZE);
   PERCPU_VADDR_PREALLOC(0, :percpu, pda_size)
+  per_cpu____pda = __per_cpu_start;
 #else
   PERCPU(PAGE_SIZE)
 #endif
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 11/13] x86_64: convert pda ops to wrappers around x86 percpu accessors
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (9 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 10/13] x86_64: make pda a percpu variable Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 12/13] x86_64: misc clean up after the percpu update Tejun Heo
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

pda is now a percpu variable and there's no reason it can't use plain
x86 percpu accessors.  Add x86_test_and_clear_bit_percpu() and replace
pda op implementations with wrappers around x86 percpu accessors.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/include/asm/pda.h       |   88 +++-----------------------------------
 arch/x86/include/asm/percpu.h    |   10 ++++
 arch/x86/kernel/vmlinux_64.lds.S |    1 -
 arch/x86/kernel/x8664_ksyms_64.c |    2 -
 4 files changed, 16 insertions(+), 85 deletions(-)

diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index 66ae104..e3d3a08 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -45,91 +45,15 @@ extern void pda_init(int);
 
 #define cpu_pda(cpu)		(&per_cpu(__pda, cpu))
 
-/*
- * There is no fast way to get the base address of the PDA, all the accesses
- * have to mention %fs/%gs.  So it needs to be done this Torvaldian way.
- */
-extern void __bad_pda_field(void) __attribute__((noreturn));
-
-/*
- * proxy_pda doesn't actually exist, but tell gcc it is accessed for
- * all PDA accesses so it gets read/write dependencies right.
- */
-extern struct x8664_pda _proxy_pda;
-
-#define pda_offset(field) offsetof(struct x8664_pda, field)
-
-#define pda_to_op(op, field, val)					\
-do {									\
-	typedef typeof(_proxy_pda.field) T__;				\
-	if (0) { T__ tmp__; tmp__ = (val); }	/* type checking */	\
-	switch (sizeof(_proxy_pda.field)) {				\
-	case 2:								\
-		asm(op "w %1,%%gs:%c2" :				\
-		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
-		    "i"(pda_offset(field)));				\
-		break;							\
-	case 4:								\
-		asm(op "l %1,%%gs:%c2" :				\
-		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
-		    "i" (pda_offset(field)));				\
-		break;							\
-	case 8:								\
-		asm(op "q %1,%%gs:%c2":					\
-		    "+m" (_proxy_pda.field) :				\
-		    "r" ((T__)val),					\
-		    "i"(pda_offset(field)));				\
-		break;							\
-	default:							\
-		__bad_pda_field();					\
-	}								\
-} while (0)
-
-#define pda_from_op(op, field)			\
-({						\
-	typeof(_proxy_pda.field) ret__;		\
-	switch (sizeof(_proxy_pda.field)) {	\
-	case 2:					\
-		asm(op "w %%gs:%c1,%0" :	\
-		    "=r" (ret__) :		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	case 4:					\
-		asm(op "l %%gs:%c1,%0":		\
-		    "=r" (ret__):		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	case 8:					\
-		asm(op "q %%gs:%c1,%0":		\
-		    "=r" (ret__) :		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	default:				\
-		__bad_pda_field();		\
-	}					\
-	ret__;					\
-})
-
-#define read_pda(field)		pda_from_op("mov", field)
-#define write_pda(field, val)	pda_to_op("mov", field, val)
-#define add_pda(field, val)	pda_to_op("add", field, val)
-#define sub_pda(field, val)	pda_to_op("sub", field, val)
-#define or_pda(field, val)	pda_to_op("or", field, val)
+#define read_pda(field)		x86_read_percpu(__pda.field)
+#define write_pda(field, val)	x86_write_percpu(__pda.field, val)
+#define add_pda(field, val)	x86_add_percpu(__pda.field, val)
+#define sub_pda(field, val)	x86_sub_percpu(__pda.field, val)
+#define or_pda(field, val)	x86_or_percpu(__pda.field, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define test_and_clear_bit_pda(bit, field)				\
-({									\
-	int old__;							\
-	asm volatile("btr %2,%%gs:%c3\n\tsbbl %0,%0"			\
-		     : "=r" (old__), "+m" (_proxy_pda.field)		\
-		     : "dIr" (bit), "i" (pda_offset(field)) : "memory");\
-	old__;								\
-})
+	x86_test_and_clear_bit_percpu(bit, __pda.field)
 
 #endif
 
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 556f84b..328b31a 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -121,6 +121,16 @@ do {							\
 #define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
 #define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
 
+/* This is not atomic against other CPUs -- CPU preemption needs to be off */
+#define x86_test_and_clear_bit_percpu(bit, var)				\
+({									\
+	int old__;							\
+	asm volatile("btr %1,"__percpu_seg_str"%c2\n\tsbbl %0,%0"	\
+		     : "=r" (old__)					\
+		     : "dIr" (bit), "i" (&per_cpu__##var) : "memory");	\
+	old__;								\
+})
+
 #ifdef CONFIG_X86_64
 extern void load_pda_offset(int cpu);
 #else
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index d2a0baa..a09abb8 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -14,7 +14,6 @@ OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
 OUTPUT_ARCH(i386:x86-64)
 ENTRY(phys_startup_64)
 jiffies_64 = jiffies;
-_proxy_pda = 1;
 PHDRS {
 	text PT_LOAD FLAGS(5);	/* R_E */
 	data PT_LOAD FLAGS(7);	/* RWE */
diff --git a/arch/x86/kernel/x8664_ksyms_64.c b/arch/x86/kernel/x8664_ksyms_64.c
index 695e426..3909e3b 100644
--- a/arch/x86/kernel/x8664_ksyms_64.c
+++ b/arch/x86/kernel/x8664_ksyms_64.c
@@ -58,5 +58,3 @@ EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(empty_zero_page);
 EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
-
-EXPORT_SYMBOL(_proxy_pda);
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 12/13] x86_64: misc clean up after the percpu update
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (10 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 11/13] x86_64: convert pda ops to wrappers around x86 percpu accessors Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-13 10:38 ` [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP Tejun Heo
  2009-01-13 10:48 ` [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based " Tejun Heo
  13 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

Do the following cleanups.

* kill x86_64_init_pda() which now is equivalent to pda_init()

* use per_cpu_offset() instead of cpu_pda() when initializing
  initial_gs

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/include/asm/setup.h |    1 -
 arch/x86/kernel/acpi/sleep.c |    2 +-
 arch/x86/kernel/head64.c     |    7 +------
 arch/x86/kernel/smpboot.c    |    2 +-
 arch/x86/xen/enlighten.c     |    2 +-
 5 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4fcd53f..2f3e50e 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -100,7 +100,6 @@ extern unsigned long init_pg_tables_start;
 extern unsigned long init_pg_tables_end;
 
 #else
-void __init x86_64_init_pda(void);
 void __init x86_64_start_kernel(char *real_mode);
 void __init x86_64_start_reservations(char *real_mode_data);
 
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index fe23ff8..94c9b75 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -101,7 +101,7 @@ int acpi_save_state_mem(void)
 	stack_start.sp = temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
-	initial_gs = (unsigned long)cpu_pda(smp_processor_id());
+	initial_gs = per_cpu_offset(smp_processor_id());
 #endif
 	initial_code = (unsigned long)wakeup_long64;
 	saved_magic = 0x123456789abcdef0;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 71b6f6e..af67d32 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,6 @@
 #include <asm/bios_ebda.h>
 #include <asm/trampoline.h>
 
-void __init x86_64_init_pda(void)
-{
-	pda_init(0);
-}
-
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
@@ -96,7 +91,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
-	x86_64_init_pda();
+	pda_init(0);
 
 	x86_64_start_reservations(real_mode_data);
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3a0fec6..3179bb9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -804,7 +804,7 @@ do_rest:
 #else
 	cpu_pda(cpu)->pcurrent = c_idle.idle;
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
-	initial_gs = (unsigned long)cpu_pda(cpu);
+	initial_gs = per_cpu_offset(cpu);
 #endif
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bea2152..76e092d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1652,7 +1652,7 @@ asmlinkage void __init xen_start_kernel(void)
 #ifdef CONFIG_X86_64
 	/* Disable until direct per-cpu data access. */
 	have_vcpu_info_placement = 0;
-	x86_64_init_pda();
+	pda_init(0);
 #endif
 
 	xen_smp_init();
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (11 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 12/13] x86_64: misc clean up after the percpu update Tejun Heo
@ 2009-01-13 10:38 ` Tejun Heo
  2009-01-14  0:18   ` Rusty Russell
  2009-01-13 10:48 ` [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based " Tejun Heo
  13 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:38 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh
  Cc: Tejun Heo

This patch makes percpu symbols zerobased on x86_32 SMP by using
PERCPU_VADDR() with 0 vaddr in vmlinux_32.lds.S.  A new PHDR is added
as existing ones cannot contain sections near address zero.

The following adjustments have been made to accommodate the address
change.

* code to locate the percpu gdt_page in head_32.S is updated to add the
  load address to the gdt_page offset.

* for the boot CPU, we can't use __KERNEL_DS for %fs.  Initialize
  gdt_page.gdt[GDT_ENTRY_PERCPU] with flags and limit, set the base
  address to __per_cpu_load from head_32.S, and load it.  The base
  address needs to be set manually because the linker can't shift bits
  as required for segment descriptors.

* __per_cpu_offset[0] is now initialized to __per_cpu_load like on
  x86_64.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/cpu/common.c     |    8 ++++++++
 arch/x86/kernel/head_32.S        |   37 ++++++++++++++++++++++++++++++++++---
 arch/x86/kernel/setup_percpu.c   |    4 ----
 arch/x86/kernel/vmlinux_32.lds.S |   16 +++++++++++++++-
 4 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bd38a0f..376e142 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -90,7 +90,15 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 	[GDT_ENTRY_APMBIOS_BASE+2] = { { { 0x0000ffff, 0x00409200 } } },
 
 	[GDT_ENTRY_ESPFIX_SS] = { { { 0x00000000, 0x00c09200 } } },
+#ifdef CONFIG_SMP
+	/*
+	 * Linker can't handle the address bit shifting.  Address will
+	 * be set in head_32.S for boot CPU and init_gdt for others.
+	 */
+	[GDT_ENTRY_PERCPU] = { { { 0x0000ffff, 0x00cf9200 } } },
+#else
 	[GDT_ENTRY_PERCPU] = { { { 0x00000000, 0x00000000 } } },
+#endif
 } };
 #endif
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index e835b4e..3284199 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -19,6 +19,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/setup.h>
 #include <asm/processor-flags.h>
+#include <asm/percpu.h>
 
 /* Physical address */
 #define pa(X) ((X) - __PAGE_OFFSET)
@@ -424,12 +425,39 @@ is386:	movl $2,%ecx		# set MP
 	movl %eax,%cr0
 
 	call check_x87
+#ifdef CONFIG_SMP
+	/*
+	 * early_gdt_descr_base should point to the gdt_page in static percpu init
+	 * data area.  Computing this requires two symbols - __per_cpu_load
+	 * and per_cpu__gdt_page.  As the linker can't do such relocation, do
+	 * it by hand.  As early_gdt_descr is manipulated by C code for
+	 * secondary CPUs, this should be done only once for the boot CPU
+	 * when early_gdt_descr_base contains zero.
+	 *
+	 * Also, manually load __per_cpu_load into address fields of PERCPU
+	 * segment descriptor for boot CPU.  Linker can't do it.
+	 */
+	movl	early_gdt_descr_base, %eax
+	testl	%eax, %eax
+	jnz	3f
+	movl	$__per_cpu_load, %eax
+	movl	%eax, %ecx
+	addl	$per_cpu__gdt_page, %eax
+	movl	%eax, early_gdt_descr_base
+
+	movw	%cx, 8 * GDT_ENTRY_PERCPU + 2(%eax)
+	shrl	$16, %ecx
+	movb	%cl, 8 * GDT_ENTRY_PERCPU + 4(%eax)
+	movb	%ch, 8 * GDT_ENTRY_PERCPU + 7(%eax)
+3:
+#endif /* CONFIG_SMP */
 	lgdt early_gdt_descr
 	lidt idt_descr
 	ljmp $(__KERNEL_CS),$1f
 1:	movl $(__KERNEL_DS),%eax	# reload all the segment registers
 	movl %eax,%ss			# after changing gdt.
-	movl %eax,%fs			# gets reset once there's real percpu
+	movl $(__KERNEL_PERCPU),%eax
+	movl %eax,%fs			# will be reloaded for boot cpu
 
 	movl $(__USER_DS),%eax		# DS/ES contains default USER segment
 	movl %eax,%ds
@@ -446,8 +474,6 @@ is386:	movl $2,%ecx		# set MP
 	movb $1, ready
 	cmpb $0,%cl		# the first CPU calls start_kernel
 	je   1f
-	movl $(__KERNEL_PERCPU), %eax
-	movl %eax,%fs		# set this cpu's percpu
 	movl (stack_start), %esp
 1:
 #endif /* CONFIG_SMP */
@@ -702,7 +728,12 @@ idt_descr:
 	.word 0				# 32 bit align gdt_desc.address
 ENTRY(early_gdt_descr)
 	.word GDT_ENTRIES*8-1
+#ifdef CONFIG_SMP
+early_gdt_descr_base:
+	.long 0x00000000
+#else
 	.long per_cpu__gdt_page		/* Overwritten for secondary CPUs */
+#endif
 
 /*
  * The boot_gdt must mirror the equivalent in setup.S and is
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 9edc081..6f47227 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -119,13 +119,9 @@ static void __init setup_per_cpu_maps(void)
 #endif
 }
 
-#ifdef CONFIG_X86_64
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly = {
 	[0] = (unsigned long)__per_cpu_load,
 };
-#else
-unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
-#endif
 EXPORT_SYMBOL(__per_cpu_offset);
 
 /*
diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index 3eba7f7..03abadf 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -24,6 +24,9 @@ jiffies = jiffies_64;
 PHDRS {
 	text PT_LOAD FLAGS(5);	/* R_E */
 	data PT_LOAD FLAGS(7);	/* RWE */
+#ifdef CONFIG_SMP
+	percpu PT_LOAD FLAGS(7);	/* RWE */
+#endif
 	note PT_NOTE FLAGS(0);	/* ___ */
 }
 SECTIONS
@@ -178,7 +181,18 @@ SECTIONS
 	__initramfs_end = .;
   }
 #endif
+
+#ifdef CONFIG_SMP
+  /*
+   * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
+   * output PHDR, so the next output section - bss - should switch it
+   * back to data.
+   */
+  . = ALIGN(PAGE_SIZE);
+  PERCPU_VADDR(0, :percpu)
+#else
   PERCPU(PAGE_SIZE)
+#endif
   . = ALIGN(PAGE_SIZE);
   /* freed after init ends here */
 
@@ -193,7 +207,7 @@ SECTIONS
 	/* This is where the kernel creates the early boot page tables */
 	. = ALIGN(PAGE_SIZE);
 	pg0 = . ;
-  }
+  } :data	/* switch back to data, see PERCPU_VADDR() above */
 
   /* Sections to be discarded */
   /DISCARD/ : {
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
                   ` (12 preceding siblings ...)
  2009-01-13 10:38 ` [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP Tejun Heo
@ 2009-01-13 10:48 ` Tejun Heo
  2009-01-13 13:27   ` Brian Gerst
  13 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 10:48 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

Tejun Heo wrote:
> Each step was tested with gcc-4.1.3, 4.2.3 and 4.3.1 on x86_64, 4.3.1
> on i386 for both SMP and UP configurations.  The final state is
> currently being tested using 4.2.3 on both x86_64 and 32.

Aieee.. two sentences mixed up in my head.  Correction:

Each step was tested with gcc-4.3.1 for both x86_64 and 32.  The final
step for x86_64 has been tested with gcc-4.1.3, 4.2.3 and 4.3.1.
Currently, I'm burn-testing both x86_64 and 32 using gcc-4.2.3 with
all patches applied.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH 02/13 UPDATED] x86: make early_per_cpu() a lvalue and use it
  2009-01-13 10:38 ` [PATCH 02/13] x86: make early_per_cpu() a lvalue and use it Tejun Heo
@ 2009-01-13 11:43   ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 11:43 UTC (permalink / raw)
  To: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

Make early_per_cpu() an lvalue, as per_cpu() is, and use it where
applicable.

Mariusz spotted that the early_per_cpu() conversion in topology.h was
missing the @cpu argument.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mariusz Ceier <mceier@gmail.com>
---
git tree has been regenerated accordingly.

Thanks.

 arch/x86/include/asm/percpu.h   |    6 +++---
 arch/x86/include/asm/topology.h |    5 +----
 arch/x86/kernel/apic.c          |   13 ++-----------
 3 files changed, 6 insertions(+), 18 deletions(-)

Index: work/arch/x86/include/asm/topology.h
===================================================================
--- work.orig/arch/x86/include/asm/topology.h
+++ work/arch/x86/include/asm/topology.h
@@ -96,10 +96,7 @@ static inline int cpu_to_node(int cpu)
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
-	if (early_per_cpu_ptr(x86_cpu_to_node_map))
-		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
-
-	return per_cpu(x86_cpu_to_node_map, cpu);
+	return early_per_cpu(x86_cpu_to_node_map, cpu);
 }
 
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
Index: work/arch/x86/kernel/apic.c
===================================================================
--- work.orig/arch/x86/kernel/apic.c
+++ work/arch/x86/kernel/apic.c
@@ -1865,17 +1865,8 @@ void __cpuinit generic_processor_info(in
 #endif
 
 #if defined(CONFIG_X86_SMP) || defined(CONFIG_X86_64)
-	/* are we being called early in kernel startup? */
-	if (early_per_cpu_ptr(x86_cpu_to_apicid)) {
-		u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
-		u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
-
-		cpu_to_apicid[cpu] = apicid;
-		bios_cpu_apicid[cpu] = apicid;
-	} else {
-		per_cpu(x86_cpu_to_apicid, cpu) = apicid;
-		per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
-	}
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
+	early_per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
 #endif
 
 	cpu_set(cpu, cpu_possible_map);
Index: work/arch/x86/include/asm/percpu.h
===================================================================
--- work.orig/arch/x86/include/asm/percpu.h
+++ work/arch/x86/include/asm/percpu.h
@@ -195,9 +195,9 @@ do {							\
 #define	early_per_cpu_ptr(_name) (_name##_early_ptr)
 #define	early_per_cpu_map(_name, _idx) (_name##_early_map[_idx])
 #define	early_per_cpu(_name, _cpu) 				\
-	(early_per_cpu_ptr(_name) ?				\
-		early_per_cpu_ptr(_name)[_cpu] :		\
-		per_cpu(_name, _cpu))
+	*(early_per_cpu_ptr(_name) ?				\
+		&early_per_cpu_ptr(_name)[_cpu] :		\
+		&per_cpu(_name, _cpu))
 
 #else	/* !CONFIG_SMP */
 #define	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)		\

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 10:48 ` [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based " Tejun Heo
@ 2009-01-13 13:27   ` Brian Gerst
  2009-01-13 14:05     ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Brian Gerst @ 2009-01-13 13:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

On Tue, Jan 13, 2009 at 5:48 AM, Tejun Heo <tj@kernel.org> wrote:
> Tejun Heo wrote:
>> Each step was tested with gcc-4.1.3, 4.2.3 and 4.3.1 on x86_64, 4.3.1
>> on i386 for both SMP and UP configurations.  The final state is
>> currently being tested using 4.2.3 on both x86_64 and 32.
>
> Aieee.. two sentences mixed up in my head.  Correction:
>
> Each step was tested with gcc-4.3.1 for both x86_64 and 32.  The final
> step for x86_64 has been tested with gcc-4.1.3, 4.2.3 and 4.3.1.
> Currently, I'm burn-testing both x86_64 and 32 using gcc-4.2.3 with
> all patches applied.
>
> Thanks.

I've been working on a patchset that does something similar, but
eliminating the PDA completely.  The start is already in tip/x86/pda.
The plan is to change all PDA variables to be normal per-cpu
variables, merging with 32-bit where possible.  Once the PDA is empty,
I'll base %gs at the start of the per-cpu area.  I've been working out
the bugs with the last patch (zero-basing the percpu area) before
submitting, but I probably won't have the time until this weekend to
polish it off.  I could submit all but the last patch if you'd like.
They are functionally correct, but because the per-cpu area isn't
zero-based yet the generated code is a bit bloated due to having to
calculate the delta for the %gs offset.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 13:27   ` Brian Gerst
@ 2009-01-13 14:05     ` Tejun Heo
  2009-01-13 14:26       ` Brian Gerst
  2009-01-14  6:57       ` H. Peter Anvin
  0 siblings, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 14:05 UTC (permalink / raw)
  To: Brian Gerst
  Cc: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

Hello, Brian.

Brian Gerst wrote:
> I've been working on a patchset that does something similar, but
> eliminating the PDA completely.  The start is already in tip/x86/pda.
> The plan is to change all PDA variables to be normal per-cpu
> variables, merging with 32-bit where possible.

I think the two changes aren't exclusive at all.  The order of things
could be different but in the end, yeah, zero-based percpu symbols w/
mostly empty pda is the goal.

> Once the PDA is empty, I'll base %gs at the start of the per-cpu
> area.  I've been working out the bugs with the last patch
> (zero-basing the percpu area) before submitting, but I probably
> won't have the time until this weekend to polish it off.  I could
> submit all but the last patch if you'd like.

Any chance you can rebase those patches on top of mine?  If you don't
have time, just send them to me, I'll try to integrate them this week.

> They are functionally correct, but because the per-cpu area isn't
> zero-based yet the generated code is a bit bloated due to having to
> calculate the delta for the %gs offset.

BTW, what did you do about the dreaded stack_canary?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 14:05     ` Tejun Heo
@ 2009-01-13 14:26       ` Brian Gerst
  2009-01-13 14:37         ` Tejun Heo
  2009-01-14  6:57       ` H. Peter Anvin
  1 sibling, 1 reply; 109+ messages in thread
From: Brian Gerst @ 2009-01-13 14:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

On Tue, Jan 13, 2009 at 9:05 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Brian.
>
> Brian Gerst wrote:
>> I've been working on a patchset that does something similar, but
>> eliminating the PDA completely.  The start is already in tip/x86/pda.
>> The plan is to change all PDA variables to be normal per-cpu
>> variables, merging with 32-bit where possible.
>
> I think the two changes aren't exclusive at all.  The order of things
> could be different but in the end, yeah, zero-based percpu symbols w/
> mostly empty pda is the goal.
>
>> Once the PDA is empty, I'll base %gs at the start of the per-cpu
>> area.  I've been working out the bugs with the last patch
>> (zero-basing the percpu area) before submitting, but I probably
>> won't have the time until this weekend to polish it off.  I could
>> submit all but the last patch if you'd like.
>
> Any chance you can rebase those patches on top of mine?  If you don't
> have time, just send them to me, I'll try to integrate them this week.

Are your patches available via git by chance?

>> They are functionally correct, but because the per-cpu area isn't
>> zero-based yet the generated code is a bit bloated due to having to
>> calculate the delta for the %gs offset.
>
> BTW, what did you do about the dreaded stack_canary?

I put the irqstack at the start of the per-cpu area and overlaid the
canary on the bottom 48 bytes of the 16k stack.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 14:26       ` Brian Gerst
@ 2009-01-13 14:37         ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-13 14:37 UTC (permalink / raw)
  To: Brian Gerst
  Cc: ebiederm, cl, rusty, mingo, travis, linux-kernel, hpa, akpm,
	steiner, hugh

Brian Gerst wrote:
>> Any chance you can rebase those patches on top of mine?  If you don't
>> have time, just send them to me, I'll try to integrate them this week.
> 
> Are your patches available via git by chance?

Sure,

  http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=x86-percpu-zerobased
  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-zerobased

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP
  2009-01-13 10:38 ` [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP Tejun Heo
@ 2009-01-14  0:18   ` Rusty Russell
  2009-01-14  2:03     ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Rusty Russell @ 2009-01-14  0:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ebiederm, cl, mingo, travis, linux-kernel, hpa, akpm, steiner,
	hugh

On Tuesday 13 January 2009 21:08:17 Tejun Heo wrote:
> This patch makes percpu symbols zerobased on x86_32 SMP by using
> PERCPU_VADDR() with 0 vaddr in vmlinux_32.lds.S.  A new PHDR is added
> as existing ones cannot contain sections near address zero.
...
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Mike Travis <travis@sgi.com>
> ---
>  arch/x86/kernel/cpu/common.c     |    8 ++++++++
>  arch/x86/kernel/head_32.S        |   37 ++++++++++++++++++++++++++++++++++---
>  arch/x86/kernel/setup_percpu.c   |    4 ----
>  arch/x86/kernel/vmlinux_32.lds.S |   16 +++++++++++++++-
>  4 files changed, 57 insertions(+), 8 deletions(-)

Hmm, the only reason for this change is to unify with 64-bit, yes?  Yet it
doesn't actually win us anything on that front, as this diffstat shows.

If gcc's -mcmodel=kernel had used a weak symbol for the offset of the stack
canary, we would have been happy.  Unfortunately generic per-cpu and x86-64
PDA were developed separately, so no one realized the problem until it was too late.

The basic series looks good: it will clash with my per-cpu work (mainly
because I remove the per_cpu__ prefix) in a purely-textual way though.

Thanks for this!
Rusty.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP
  2009-01-14  0:18   ` Rusty Russell
@ 2009-01-14  2:03     ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-14  2:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: ebiederm, cl, mingo, travis, linux-kernel, hpa, akpm, steiner,
	hugh

Hello, Rusty.

Rusty Russell wrote:
> On Tuesday 13 January 2009 21:08:17 Tejun Heo wrote:
>> This patch makes percpu symbols zerobased on x86_32 SMP by using
>> PERCPU_VADDR() with 0 vaddr in vmlinux_32.lds.S.  A new PHDR is added
>> as existing ones cannot contain sections near address zero.
> ...
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Cc: Mike Travis <travis@sgi.com>
>> ---
>>  arch/x86/kernel/cpu/common.c     |    8 ++++++++
>>  arch/x86/kernel/head_32.S        |   37 ++++++++++++++++++++++++++++++++++---
>>  arch/x86/kernel/setup_percpu.c   |    4 ----
>>  arch/x86/kernel/vmlinux_32.lds.S |   16 +++++++++++++++-
>>  4 files changed, 57 insertions(+), 8 deletions(-)
> 
> Hmm, the only reason for this change is to unify with 64-bit, yes?  Yet it
> doesn't actually win us anything on that front, as this diffstat shows.

Yeah and to simplify things for future changes.  In this patch, the
only actual unification is in __per_cpu_offset initialization but the
lack of unification is partly due to the usage of pda and differences
in initialization sequence, which again couldn't be unified because
percpu handling was different, so it's a tied knot and this patch
helps untangle it.  Also, with cpualloc (or whatever) scheduled, I
think it's better to have two percpu models, up and smp, even if it
costs 49 more lines.

> If gcc's -mcmodel=kernel had used a weak symbol for the offset of the stack
> canary, we would have been happy.  Unfortunately, generic per-cpu and the
> x86-64 PDA were developed separately, so no one realized the problem until
> too late.

And we seem to be stuck with it if we want to keep compatibility with
compilers.  :-(

> The basic series looks good: it will clash with my per-cpu work (mainly
> because I remove the per_cpu__ prefix) in a purely-textual way though.

I thought about replacing all the explicit per_cpu__ prefixes with
per_cpu_var() but the usage seemed too prevalent, so I left it alone.
Why remove it, BTW?

Thanks.

-- 
tejun


* Re: [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP
  2009-01-13 14:05     ` Tejun Heo
  2009-01-13 14:26       ` Brian Gerst
@ 2009-01-14  6:57       ` H. Peter Anvin
  2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
  1 sibling, 1 reply; 109+ messages in thread
From: H. Peter Anvin @ 2009-01-14  6:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Brian Gerst, ebiederm, cl, rusty, mingo, travis, linux-kernel,
	akpm, steiner, hugh

Tejun Heo wrote:
> 
> I think the two changes aren't exclusive at all.  The order of things
> could be different but in the end, yeah, zero-based percpu symbols w/
> mostly empty pda is the goal.
> 
>> Once the PDA is empty, I'll base %gs at the start of the per-cpu
>> area.  I've been working out the bugs with the last patch
>> (zero-basing the percpu area) before submitting, but I probably
>> won't have the time until this weekend to polish it off.  I could
>> submit all but the last patch if you'd like.
> 
> Any chance you can rebase those patches on top of mine?  If you don't
> have time, just send them to me, I'll try to integrate them this week.
> 

A merged tree here would be absolutely wonderful.  I've kept an eye on
the discussion so far, but it looks like you guys are handling it fine.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* [patch] add optimized generic percpu accessors
  2009-01-14  6:57       ` H. Peter Anvin
@ 2009-01-14  9:38         ` Ingo Molnar
  2009-01-14  9:45           ` Ingo Molnar
                             ` (4 more replies)
  0 siblings, 5 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-14  9:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Brian Gerst, ebiederm, cl, rusty, travis, linux-kernel,
	akpm, steiner, hugh


* H. Peter Anvin <hpa@zytor.com> wrote:

> Tejun Heo wrote:
> > 
> > I think the two changes aren't exclusive at all.  The order of things
> > could be different but in the end, yeah, zero-based percpu symbols w/
> > mostly empty pda is the goal.
> > 
> >> Once the PDA is empty, I'll base %gs at the start of the per-cpu
> >> area.  I've been working out the bugs with the last patch
> >> (zero-basing the percpu area) before submitting, but I probably
> >> won't have the time until this weekend to polish it off.  I could
> >> submit all but the last patch if you'd like.
> > 
> > Any chance you can rebase those patches on top of mine?  If you don't
> > have time, just send them to me, I'll try to integrate them this week.
> > 
> 
> A merged tree here would be absolutely wonderful.  I've kept an eye on 
> the discussion so far, but it looks like you guys are handling it fine.

agreed, it looks really good.

Tejun, could you please also add the patch below to your lineup too?

It is an optimization and a cleanup, and adds the following new generic 
percpu methods:

  percpu_read()
  percpu_write()
  percpu_add()
  percpu_sub()
  percpu_or() 
  percpu_xor()

and implements support for them on x86. (other architectures will fall 
back to a default implementation)

The advantage is that for example to read a local percpu variable, instead 
of this sequence:

 return __get_cpu_var(var);

 ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
 ffffffff8102ca32:	81 
 ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
 ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

 return percpu_read(var);

 ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use these 
new generic percpu primitives.

It looks quite hard to convince the compiler to generate the optimized 
single-instruction sequence for us out of __get_cpu_var(var) - or can you 
perhaps see a way to do it?

The patch is against your latest zero-based percpu / pda unification tree. 
Untested.

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/current.h        |    2 +-
 arch/x86/include/asm/irq_regs_32.h    |    4 ++--
 arch/x86/include/asm/mmu_context_32.h |   12 ++++++------
 arch/x86/include/asm/pda.h            |   10 +++++-----
 arch/x86/include/asm/percpu.h         |   23 ++++++++++++-----------
 arch/x86/include/asm/smp.h            |    2 +-
 arch/x86/kernel/process_32.c          |    2 +-
 arch/x86/kernel/tlb_32.c              |   10 +++++-----
 arch/x86/mach-voyager/voyager_smp.c   |    4 ++--
 arch/x86/xen/enlighten.c              |   14 +++++++-------
 arch/x86/xen/irq.c                    |    8 ++++----
 arch/x86/xen/mmu.c                    |    2 +-
 arch/x86/xen/multicalls.h             |    2 +-
 arch/x86/xen/smp.c                    |    2 +-
 include/asm-generic/percpu.h          |   28 ++++++++++++++++++++++++++++
 16 files changed, 90 insertions(+), 48 deletions(-)

Index: linux/arch/x86/include/asm/current.h
===================================================================
--- linux.orig/arch/x86/include/asm/current.h
+++ linux/arch/x86/include/asm/current.h
@@ -10,7 +10,7 @@ struct task_struct;
 DECLARE_PER_CPU(struct task_struct *, current_task);
 static __always_inline struct task_struct *get_current(void)
 {
-	return x86_read_percpu(current_task);
+	return percpu_read(current_task);
 }
 
 #else /* X86_32 */
Index: linux/arch/x86/include/asm/irq_regs_32.h
===================================================================
--- linux.orig/arch/x86/include/asm/irq_regs_32.h
+++ linux/arch/x86/include/asm/irq_regs_32.h
@@ -15,7 +15,7 @@ DECLARE_PER_CPU(struct pt_regs *, irq_re
 
 static inline struct pt_regs *get_irq_regs(void)
 {
-	return x86_read_percpu(irq_regs);
+	return percpu_read(irq_regs);
 }
 
 static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
@@ -23,7 +23,7 @@ static inline struct pt_regs *set_irq_re
 	struct pt_regs *old_regs;
 
 	old_regs = get_irq_regs();
-	x86_write_percpu(irq_regs, new_regs);
+	percpu_write(irq_regs, new_regs);
 
 	return old_regs;
 }
Index: linux/arch/x86/include/asm/mmu_context_32.h
===================================================================
--- linux.orig/arch/x86/include/asm/mmu_context_32.h
+++ linux/arch/x86/include/asm/mmu_context_32.h
@@ -4,8 +4,8 @@
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK)
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_LAZY);
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+		percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
 #endif
 }
 
@@ -19,8 +19,8 @@ static inline void switch_mm(struct mm_s
 		/* stop flush ipis for the previous mm */
 		cpu_clear(cpu, prev->cpu_vm_mask);
 #ifdef CONFIG_SMP
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		x86_write_percpu(cpu_tlbstate.active_mm, next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		percpu_write(cpu_tlbstate.active_mm, next);
 #endif
 		cpu_set(cpu, next->cpu_vm_mask);
 
@@ -35,8 +35,8 @@ static inline void switch_mm(struct mm_s
 	}
 #ifdef CONFIG_SMP
 	else {
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		BUG_ON(x86_read_percpu(cpu_tlbstate.active_mm) != next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
 
 		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
 			/* We were in lazy tlb mode and leave_mm disabled
Index: linux/arch/x86/include/asm/pda.h
===================================================================
--- linux.orig/arch/x86/include/asm/pda.h
+++ linux/arch/x86/include/asm/pda.h
@@ -45,11 +45,11 @@ extern void pda_init(int);
 
 #define cpu_pda(cpu)		(&per_cpu(__pda, cpu))
 
-#define read_pda(field)		x86_read_percpu(__pda.field)
-#define write_pda(field, val)	x86_write_percpu(__pda.field, val)
-#define add_pda(field, val)	x86_add_percpu(__pda.field, val)
-#define sub_pda(field, val)	x86_sub_percpu(__pda.field, val)
-#define or_pda(field, val)	x86_or_percpu(__pda.field, val)
+#define read_pda(field)		percpu_read(__pda.field)
+#define write_pda(field, val)	percpu_write(__pda.field, val)
+#define add_pda(field, val)	percpu_add(__pda.field, val)
+#define sub_pda(field, val)	percpu_sub(__pda.field, val)
+#define or_pda(field, val)	percpu_or(__pda.field, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define test_and_clear_bit_pda(bit, field)				\
Index: linux/arch/x86/include/asm/percpu.h
===================================================================
--- linux.orig/arch/x86/include/asm/percpu.h
+++ linux/arch/x86/include/asm/percpu.h
@@ -40,16 +40,11 @@
 
 #ifdef CONFIG_SMP
 #define __percpu_seg_str	"%%"__stringify(__percpu_seg)":"
-#define __my_cpu_offset		x86_read_percpu(this_cpu_off)
+#define __my_cpu_offset		percpu_read(this_cpu_off)
 #else
 #define __percpu_seg_str
 #endif
 
-#include <asm-generic/percpu.h>
-
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
-
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -115,11 +110,12 @@ do {							\
 	ret__;						\
 })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var, val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
+#define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
+#define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
+#define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
+#define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
+#define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define x86_test_and_clear_bit_percpu(bit, var)				\
@@ -131,6 +127,11 @@ do {							\
 	old__;								\
 })
 
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
 #ifdef CONFIG_X86_64
 extern void load_pda_offset(int cpu);
 #else
Index: linux/arch/x86/include/asm/smp.h
===================================================================
--- linux.orig/arch/x86/include/asm/smp.h
+++ linux/arch/x86/include/asm/smp.h
@@ -163,7 +163,7 @@ extern unsigned disabled_cpus __cpuinitd
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (x86_read_percpu(cpu_number))
+#define raw_smp_processor_id() (percpu_read(cpu_number))
 extern int safe_smp_processor_id(void);
 
 #elif defined(CONFIG_X86_64_SMP)
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -592,7 +592,7 @@ __switch_to(struct task_struct *prev_p, 
 	if (prev->gs | next->gs)
 		loadsegment(gs, next->gs);
 
-	x86_write_percpu(current_task, next_p);
+	percpu_write(current_task, next_p);
 
 	return prev_p;
 }
Index: linux/arch/x86/kernel/tlb_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tlb_32.c
+++ linux/arch/x86/kernel/tlb_32.c
@@ -34,8 +34,8 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  */
 void leave_mm(int cpu)
 {
-	BUG_ON(x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK);
-	cpu_clear(cpu, x86_read_percpu(cpu_tlbstate.active_mm)->cpu_vm_mask);
+	BUG_ON(percpu_read(cpu_tlbstate.state) == TLBSTATE_OK);
+	cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
 	load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
@@ -103,8 +103,8 @@ void smp_invalidate_interrupt(struct pt_
 		 * BUG();
 		 */
 
-	if (flush_mm == x86_read_percpu(cpu_tlbstate.active_mm)) {
-		if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK) {
+	if (flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
+		if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
 			if (flush_va == TLB_FLUSH_ALL)
 				local_flush_tlb();
 			else
@@ -237,7 +237,7 @@ static void do_flush_tlb_all(void *info)
 	unsigned long cpu = smp_processor_id();
 
 	__flush_tlb_all();
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_LAZY)
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
 		leave_mm(cpu);
 }
 
Index: linux/arch/x86/mach-voyager/voyager_smp.c
===================================================================
--- linux.orig/arch/x86/mach-voyager/voyager_smp.c
+++ linux/arch/x86/mach-voyager/voyager_smp.c
@@ -410,7 +410,7 @@ void __init find_smp_config(void)
 	     VOYAGER_SUS_IN_CONTROL_PORT);
 
 	current_thread_info()->cpu = boot_cpu_id;
-	x86_write_percpu(cpu_number, boot_cpu_id);
+	percpu_write(cpu_number, boot_cpu_id);
 }
 
 /*
@@ -1790,7 +1790,7 @@ static void __init voyager_smp_cpus_done
 void __init smp_setup_processor_id(void)
 {
 	current_thread_info()->cpu = hard_smp_processor_id();
-	x86_write_percpu(cpu_number, hard_smp_processor_id());
+	percpu_write(cpu_number, hard_smp_processor_id());
 }
 
 static void voyager_send_call_func(cpumask_t callmask)
Index: linux/arch/x86/xen/enlighten.c
===================================================================
--- linux.orig/arch/x86/xen/enlighten.c
+++ linux/arch/x86/xen/enlighten.c
@@ -702,17 +702,17 @@ static void xen_write_cr0(unsigned long 
 
 static void xen_write_cr2(unsigned long cr2)
 {
-	x86_read_percpu(xen_vcpu)->arch.cr2 = cr2;
+	percpu_read(xen_vcpu)->arch.cr2 = cr2;
 }
 
 static unsigned long xen_read_cr2(void)
 {
-	return x86_read_percpu(xen_vcpu)->arch.cr2;
+	return percpu_read(xen_vcpu)->arch.cr2;
 }
 
 static unsigned long xen_read_cr2_direct(void)
 {
-	return x86_read_percpu(xen_vcpu_info.arch.cr2);
+	return percpu_read(xen_vcpu_info.arch.cr2);
 }
 
 static void xen_write_cr4(unsigned long cr4)
@@ -725,12 +725,12 @@ static void xen_write_cr4(unsigned long 
 
 static unsigned long xen_read_cr3(void)
 {
-	return x86_read_percpu(xen_cr3);
+	return percpu_read(xen_cr3);
 }
 
 static void set_current_cr3(void *v)
 {
-	x86_write_percpu(xen_current_cr3, (unsigned long)v);
+	percpu_write(xen_current_cr3, (unsigned long)v);
 }
 
 static void __xen_write_cr3(bool kernel, unsigned long cr3)
@@ -755,7 +755,7 @@ static void __xen_write_cr3(bool kernel,
 	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
 
 	if (kernel) {
-		x86_write_percpu(xen_cr3, cr3);
+		percpu_write(xen_cr3, cr3);
 
 		/* Update xen_current_cr3 once the batch has actually
 		   been submitted. */
@@ -771,7 +771,7 @@ static void xen_write_cr3(unsigned long 
 
 	/* Update while interrupts are disabled, so its atomic with
 	   respect to ipis */
-	x86_write_percpu(xen_cr3, cr3);
+	percpu_write(xen_cr3, cr3);
 
 	__xen_write_cr3(true, cr3);
 
Index: linux/arch/x86/xen/irq.c
===================================================================
--- linux.orig/arch/x86/xen/irq.c
+++ linux/arch/x86/xen/irq.c
@@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
 	struct vcpu_info *vcpu;
 	unsigned long flags;
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 
 	/* flag has opposite sense of mask */
 	flags = !vcpu->evtchn_upcall_mask;
@@ -62,7 +62,7 @@ static void xen_restore_fl(unsigned long
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = flags;
 	preempt_enable_no_resched();
 
@@ -83,7 +83,7 @@ static void xen_irq_disable(void)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	x86_read_percpu(xen_vcpu)->evtchn_upcall_mask = 1;
+	percpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
 	preempt_enable_no_resched();
 }
 
@@ -96,7 +96,7 @@ static void xen_irq_enable(void)
 	   the caller is confused and is trying to re-enable interrupts
 	   on an indeterminate processor. */
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = 0;
 
 	/* Doesn't matter if we get preempted here, because any
Index: linux/arch/x86/xen/mmu.c
===================================================================
--- linux.orig/arch/x86/xen/mmu.c
+++ linux/arch/x86/xen/mmu.c
@@ -1074,7 +1074,7 @@ static void drop_other_mm_ref(void *info
 
 	/* If this cpu still has a stale cr3 reference, then make sure
 	   it has been flushed. */
-	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+	if (percpu_read(xen_current_cr3) == __pa(mm->pgd)) {
 		load_cr3(swapper_pg_dir);
 		arch_flush_lazy_cpu_mode();
 	}
Index: linux/arch/x86/xen/multicalls.h
===================================================================
--- linux.orig/arch/x86/xen/multicalls.h
+++ linux/arch/x86/xen/multicalls.h
@@ -39,7 +39,7 @@ static inline void xen_mc_issue(unsigned
 		xen_mc_flush();
 
 	/* restore flags saved in xen_mc_batch */
-	local_irq_restore(x86_read_percpu(xen_mc_irq_flags));
+	local_irq_restore(percpu_read(xen_mc_irq_flags));
 }
 
 /* Set up a callback to be called when the current batch is flushed */
Index: linux/arch/x86/xen/smp.c
===================================================================
--- linux.orig/arch/x86/xen/smp.c
+++ linux/arch/x86/xen/smp.c
@@ -78,7 +78,7 @@ static __cpuinit void cpu_bringup(void)
 	xen_setup_cpu_clockevents();
 
 	cpu_set(cpu, cpu_online_map);
-	x86_write_percpu(cpu_state, CPU_ONLINE);
+	percpu_write(cpu_state, CPU_ONLINE);
 	wmb();
 
 	/* We can take interrupts now: we're officially "up". */
Index: linux/include/asm-generic/percpu.h
===================================================================
--- linux.orig/include/asm-generic/percpu.h
+++ linux/include/asm-generic/percpu.h
@@ -80,4 +80,32 @@ extern void setup_per_cpu_areas(void);
 #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
 					__typeof__(type) per_cpu_var(name)
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access:
+ */
+
+#ifndef percpu_read
+# define percpu_read(var)		__get_cpu_var(var)
+#endif
+
+#ifndef percpu_write
+# define percpu_write(var, val)		({ __get_cpu_var(var) = (val); })
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		({ __get_cpu_var(var) += (val); })
+#endif
+
+#ifndef percpu_sub
+# define percpu_add(var, val)		({ __get_cpu_var(var) += (val); })
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		({ __get_cpu_var(var) |= (val); })
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		({ __get_cpu_var(var) ^= (val); })
+#endif
+
 #endif /* _ASM_GENERIC_PERCPU_H_ */


* Re: [patch] add optimized generic percpu accessors
  2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
@ 2009-01-14  9:45           ` Ingo Molnar
       [not found]           ` <200901151253.44016.rusty@rustcorp.com.au>
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-14  9:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Brian Gerst, ebiederm, cl, rusty, travis, linux-kernel,
	akpm, steiner, hugh


* Ingo Molnar <mingo@elte.hu> wrote:

> Tejun, could you please also add the patch below to your lineup too?
> 
> It is an optimization and a cleanup, and adds the following new generic 
> percpu methods:
> 
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_or() 
>   percpu_xor()
> 
> and implements support for them on x86. (other architectures will fall 
> back to a default implementation)

btw., this patch means that we can move Brian's original PDA cleanups 
over fully and without kernel size bloat: instead of read_pda() we can 
use percpu_read().

and thus the pda ops can go away completely.

	Ingo


* Re: [patch] add optimized generic percpu accessors
       [not found]           ` <200901151253.44016.rusty@rustcorp.com.au>
@ 2009-01-15  9:55             ` Ingo Molnar
  2009-01-15 10:27             ` Tejun Heo
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15  9:55 UTC (permalink / raw)
  To: Rusty Russell
  Cc: H. Peter Anvin, Tejun Heo, Brian Gerst, ebiederm, cl, travis,
	linux-kernel, akpm, steiner, hugh


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Wednesday 14 January 2009 20:08:34 Ingo Molnar wrote:
> > 
> > * H. Peter Anvin <hpa@zytor.com> wrote:
> > 
> > > Tejun Heo wrote:
> > > > 
> > > > I think the two changes aren't exclusive at all.  The order of things
> > > > could be different but in the end, yeah, zero-based percpu symbols w/
> > > > mostly empty pda is the goal.
> > > > 
> > > >> Once the PDA is empty, I'll base %gs at the start of the per-cpu
> > > >> area.  I've been working out the bugs with the last patch
> > > >> (zero-basing the percpu area) before submitting, but I probably
> > > >> won't have the time until this weekend to polish it off.  I could
> > > >> submit all but the last patch if you'd like.
> > > > 
> > > > Any chance you can rebase those patches on top of mine?  If you don't
> > > > have time, just send them to me, I'll try to integrate them this week.
> > > > 
> > > 
> > > A merged tree here would be absolutely wonderful.  I've kept an eye on 
> > > the discussion so far, but it looks like you guys are handling it fine.
> > 
> > agreed, it looks really good.
> > 
> > Tejun, could you please also add the patch below to your lineup too?
> > 
> > It is an optimization and a cleanup, and adds the following new generic 
> > percpu methods:
> > 
> >   percpu_read()
> 
> I already have this in my patch series.

hm, where is that exactly? Do you have a commit ID (or could you send a 
patch) that I could have a look at?

> Frankly, it starts to get marginal in generic code after percpu_read(), 
> so I didn't do them (and the per-cpu interfaces are already pretty 
> wide).
> 
> Tejun, is now a good time to rebase my alloc_percpu series on top of 
> yours?  I'm more than happy to hand them over to someone with more 
> cycles...

We can pick it up into a topic in -tip if you don't mind.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
  2009-01-14  9:45           ` Ingo Molnar
       [not found]           ` <200901151253.44016.rusty@rustcorp.com.au>
@ 2009-01-15 10:04           ` roel kluin
  2009-01-15 10:26             ` Tejun Heo
  2009-01-15 10:26           ` Tejun Heo
  2009-01-31 10:30           ` Jeremy Fitzhardinge
  4 siblings, 1 reply; 109+ messages in thread
From: roel kluin @ 2009-01-15 10:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Tejun Heo, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

> Index: linux/include/asm-generic/percpu.h
> ===================================================================
> --- linux.orig/include/asm-generic/percpu.h
> +++ linux/include/asm-generic/percpu.h
> @@ -80,4 +80,32 @@ extern void setup_per_cpu_areas(void);
>  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
>                                        __typeof__(type) per_cpu_var(name)
>
> +/*
> + * Optional methods for optimized non-lvalue per-cpu variable access:
> + */
> +
> +#ifndef percpu_read
> +# define percpu_read(var)              __get_cpu_var(var)
> +#endif
> +
> +#ifndef percpu_write
> +# define percpu_write(var, val)                ({ __get_cpu_var(var) = (val); })
> +#endif
> +
> +#ifndef percpu_add
> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })
> +#endif
> +
> +#ifndef percpu_sub
> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })

this should be:

# define percpu_sub(var, val)          ({ __get_cpu_var(var) -= (val); })

> +#endif
> +
> +#ifndef percpu_or
> +# define percpu_or(var, val)           ({ __get_cpu_var(var) |= (val); })
> +#endif
> +
> +#ifndef percpu_xor
> +# define percpu_xor(var, val)          ({ __get_cpu_var(var) ^= (val); })
> +#endif
> +
>  #endif /* _ASM_GENERIC_PERCPU_H_ */


* Re: [patch] add optimized generic percpu accessors
  2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
                             ` (2 preceding siblings ...)
  2009-01-15 10:04           ` roel kluin
@ 2009-01-15 10:26           ` Tejun Heo
  2009-01-15 11:30             ` Ingo Molnar
  2009-01-31 10:30           ` Jeremy Fitzhardinge
  4 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 10:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh

Hello, Ingo.

Ingo Molnar wrote:
> Tejun, could you please also add the patch below to your lineup too?

Sure thing.

> It is an optimization and a cleanup, and adds the following new generic 
> percpu methods:
> 
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_or() 
>   percpu_xor()
> 
> and implements support for them on x86. (other architectures will fall 
> back to a default implementation)
> 
> The advantage is that for example to read a local percpu variable, instead 
> of this sequence:
> 
>  return __get_cpu_var(var);
> 
>  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
>  ffffffff8102ca32:	81 
>  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
>  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
> 
> We can get a single instruction by using the optimized variants:
> 
>  return percpu_read(var);
> 
>  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
> 
> I also cleaned up the x86-specific APIs and made the x86 code use these 
> new generic percpu primitives.
> 
> It looks quite hard to convince the compiler to generate the optimized 
> single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> perhaps see a way to do it?

Yeah, I thought about that too but couldn't think of a way to persuade
the compiler, because the compiler doesn't know how to access the
address.  I'll play with it a bit more, but the clumsy percpu_*()
accessors might well be the only way.  :-(

Thanks.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 10:04           ` roel kluin
@ 2009-01-15 10:26             ` Tejun Heo
  2009-01-15 11:32               ` Ingo Molnar
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 10:26 UTC (permalink / raw)
  To: roel kluin
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

roel kluin wrote:
>> Index: linux/include/asm-generic/percpu.h
>> ===================================================================
>> --- linux.orig/include/asm-generic/percpu.h
>> +++ linux/include/asm-generic/percpu.h
>> @@ -80,4 +80,32 @@ extern void setup_per_cpu_areas(void);
>>  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
>>                                        __typeof__(type) per_cpu_var(name)
>>
>> +/*
>> + * Optional methods for optimized non-lvalue per-cpu variable access:
>> + */
>> +
>> +#ifndef percpu_read
>> +# define percpu_read(var)              __get_cpu_var(var)
>> +#endif
>> +
>> +#ifndef percpu_write
>> +# define percpu_write(var, val)                ({ __get_cpu_var(var) = (val); })
>> +#endif
>> +
>> +#ifndef percpu_add
>> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })
>> +#endif
>> +
>> +#ifndef percpu_sub
>> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })
> 
> this should be:
> 
> define percpu_sub(var, val)          ({ __get_cpu_var(var) -= (val); })

Thanks.  Will fold into the patch.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
       [not found]           ` <200901151253.44016.rusty@rustcorp.com.au>
  2009-01-15  9:55             ` Ingo Molnar
@ 2009-01-15 10:27             ` Tejun Heo
  1 sibling, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 10:27 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, travis,
	linux-kernel, akpm, steiner, hugh

Rusty Russell wrote:
>> percpu_read()
> 
> I already have this in my patch series.
> 
> Frankly, it starts to get marginal in generic code after percpu_read(),
> so I didn't do them (and the per-cpu interfaces are already pretty wide).
> 
> Tejun, is now a good time to rebase my alloc_percpu series on top of
> yours? I'm more than happy to hand them over to someone with more cycles...

Sure, send them my way.

Thanks.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 10:26           ` Tejun Heo
@ 2009-01-15 11:30             ` Ingo Molnar
  2009-01-15 11:38               ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 11:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> Hello, Ingo.
> 
> Ingo Molnar wrote:
> > Tejun, could you please also add the patch below to your lineup too?
> 
> Sure thing.
> 
> > It is an optimization and a cleanup, and adds the following new generic 
> > percpu methods:
> > 
> >   percpu_read()
> >   percpu_write()
> >   percpu_add()
> >   percpu_sub()
> >   percpu_or() 
> >   percpu_xor()
> > 
> > and implements support for them on x86. (other architectures will fall 
> > back to a default implementation)
> > 
> > The advantage is that for example to read a local percpu variable, instead 
> > of this sequence:
> > 
> >  return __get_cpu_var(var);
> > 
> >  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
> >  ffffffff8102ca32:	81 
> >  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
> >  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
> > 
> > We can get a single instruction by using the optimized variants:
> > 
> >  return percpu_read(var);
> > 
> >  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
> > 
> > I also cleaned up the x86-specific APIs and made the x86 code use these 
> > new generic percpu primitives.
> > 
> > It looks quite hard to convince the compiler to generate the optimized 
> > single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> > perhaps see a way to do it?
> 
> Yeah, I thought about that too but couldn't think of a way to persuade 
> the compiler because the compiler doesn't know how to access the 
> address.  I'll play with it a bit more but the clumsy percpu_*() 
> accessors probably might be the only way.  :-(

The new ops are a pretty nice and clean solution i think.

Firstly, accessing the current CPU is the only safe shortcut anyway (that 
is where we can use %fs/%gs and rip-relative addressing modes), and the 
generic per_cpu() APIs don't really provide that guarantee for us. We might 
be able to hook into __get_cpu_var(), but both of those are required to be 
lvalues and are also relatively rarely used.

So introducing the new, rather straightforward APIs and using them 
wherever they matter for performance is good. Your patchset already shaved 
off an instruction from ordinary per_cpu() accesses, so it's all moving 
rather close to the most-optimal situation already.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 10:26             ` Tejun Heo
@ 2009-01-15 11:32               ` Ingo Molnar
  2009-01-15 11:36                 ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 11:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> roel kluin wrote:
> >> Index: linux/include/asm-generic/percpu.h
> >> ===================================================================
> >> --- linux.orig/include/asm-generic/percpu.h
> >> +++ linux/include/asm-generic/percpu.h
> >> @@ -80,4 +80,32 @@ extern void setup_per_cpu_areas(void);
> >>  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
> >>                                        __typeof__(type) per_cpu_var(name)
> >>
> >> +/*
> >> + * Optional methods for optimized non-lvalue per-cpu variable access:
> >> + */
> >> +
> >> +#ifndef percpu_read
> >> +# define percpu_read(var)              __get_cpu_var(var)
> >> +#endif
> >> +
> >> +#ifndef percpu_write
> >> +# define percpu_write(var, val)                ({ __get_cpu_var(var) = (val); })
> >> +#endif
> >> +
> >> +#ifndef percpu_add
> >> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })
> >> +#endif
> >> +
> >> +#ifndef percpu_sub
> >> +# define percpu_add(var, val)          ({ __get_cpu_var(var) += (val); })
> > 
> > this should be:
> > 
> > define percpu_sub(var, val)          ({ __get_cpu_var(var) -= (val); })

well spotted!

> Thanks.  Will fold into the patch.

thanks!

Is there any interim tree for us to pull into -tip? I'd rather not let 
this grow too big, it will be harder and harder to debug any regressions. 
Gradual progress is a lot more debuggable. Your initial patchset is 
fantastic already (gives a ~0.2% kernel image size saving for defconfig), 
so it's a very good start.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 11:32               ` Ingo Molnar
@ 2009-01-15 11:36                 ` Tejun Heo
  2009-01-15 12:22                   ` Ingo Molnar
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 11:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Ingo Molnar wrote:
> Is there any interim tree for us to pull into -tip? I'd rather not let 
> this grow too big, it will be harder and harder to debug any regressions. 
> Gradual progress is a lot more debuggable. Your initial patchset is 
> fantastic already (gives a ~0.2% kernel image size saving for defconfig), 
> so it's a very good start.

Sans the last patch (I'm still working on it), the git tree is at...

  http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=x86-percpu-zerobased
  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-zerobased

And this happens to be the third time I wrote the above addresses in
this thread.  :-)

Thanks.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 11:30             ` Ingo Molnar
@ 2009-01-15 11:38               ` Tejun Heo
  2009-01-15 12:26                 ` Ingo Molnar
       [not found]                 ` <200901170748.53734.rusty@rustcorp.com.au>
  0 siblings, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 11:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh

Hello,

Ingo Molnar wrote:
> The new ops are a pretty nice and clean solution i think.
> 
> Firstly, accessing the current CPU is the only safe shortcut anyway (that 
> is where we can use %fs/%gs and rip-relative addressing modes), and the 
> generic per_cpu() APIs don't really provide that guarantee for us. We might 
> be able to hook into __get_cpu_var(), but both of those are required to be 
> lvalues and are also relatively rarely used.
> 
> So introducing the new, rather straightforward APIs and using them 
> wherever they matter for performance is good. Your patchset already shaved 
> off an instruction from ordinary per_cpu() accesses, so it's all moving 
> rather close to the most-optimal situation already.

Yeah, I don't think we can do much better than those ops.  I have two
issues tho.

1. percpu_and() is missing.  I added it for completeness's sake.

2. The generic percpu_op() should be local to the cpu, so it should
   expand to...

   do { get_cpu_var(var) OP (val); put_cpu_var(var); } while (0)

   as the original x86_OP_percpu() did.  Right?

Thanks.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 11:36                 ` Tejun Heo
@ 2009-01-15 12:22                   ` Ingo Molnar
  2009-01-15 13:09                     ` Tejun Heo
  2009-01-15 20:31                     ` Christoph Lameter
  0 siblings, 2 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 12:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> Ingo Molnar wrote:
> > Is there any interim tree for us to pull into -tip? I'd rather not let 
> > this grow too big, it will be harder and harder to debug any regressions. 
> > Gradual progress is a lot more debuggable. Your initial patchset is 
> > fantastic already (gives a ~0.2% kernel image size saving for defconfig), 
> > so it's a very good start.
> 
> Sans the last patch (I'm still working on it), the git tree is at...
> 
>   http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=x86-percpu-zerobased
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-zerobased
> 
> And this happens to be the third time I wrote the above addresses in 
> this thread.  :-)

You know the drill: you have to explicitly ask for stuff to be pulled :-)

( Otherwise if i just pull it and you rebased it or were unhappy about it 
  in some fashion we all add stress that could have been avoided. )

So i've picked up these commits from you:

 d060676: x86_64: misc clean up after the percpu update
 6d459d9: x86_64: convert pda ops to wrappers around x86 percpu accessors
 d7051ef: x86_64: make pda a percpu variable
 4fe7fdf: x86_64: merge 64 and 32 SMP percpu handling
 44f5fbd: x86_64: fold pda into percpu area on SMP
 cc1d354: x86_64: use static _cpu_pda array
 c701268: x86_64: load pointer to pda into %gs while brining up a CPU
 7e36da9: x86_64: make percpu symbols zerobased on SMP
 3fc860d: x86_32: make vmlinux_32.lds.S use PERCPU() macro
 bc497c7: x86_64: Cleanup early setup_percpu references
 769d780: x86: make early_per_cpu() a lvalue and use it
 66cbc8e: x86_64: fix pda_to_op()

into tip/x86/percpu.

I did the following small reorganizations:

 - i rebased them on top of tip/cpus4096, tip/x86/cleanups and 
   tip/x86/urgent - that is a stable base.

 - i resolved the conflicts that arose due to recent cpumask and upstream 
   changes.

 - i standardized the commit logs to the usual x86 style and added 
   Original-From: Mike Travis tags to those patches that were derived from 
   Mike's patches.

If they pass testing they'll be stable commits from that point on, and you 
can base your git trees on that topic branch.

Could you please have a look at the end result? Here are the Git 
coordinates for it:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86/percpu

If we are happy with it then these commits can be stable commits from now 
on, and you can base your future changes on this Git tree, and i can pull 
updates from you without having to rebase them.

Does that workflow sound OK to you?

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 11:38               ` Tejun Heo
@ 2009-01-15 12:26                 ` Ingo Molnar
  2009-01-15 13:04                   ` Tejun Heo
       [not found]                 ` <200901170748.53734.rusty@rustcorp.com.au>
  1 sibling, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 12:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> Ingo Molnar wrote:
> > The new ops are a pretty nice and clean solution i think.
> > 
> > Firstly, accessing the current CPU is the only safe shortcut anyway (that 
> > is where we can use %fs/%gs and rip-relative addressing modes), and the 
> > generic per_cpu() APIs don't really provide that guarantee for us. We might 
> > be able to hook into __get_cpu_var(), but both of those are required to be 
> > lvalues and are also relatively rarely used.
> > 
> > So introducing the new, rather straightforward APIs and using them 
> > wherever they matter for performance is good. Your patchset already shaved 
> > off an instruction from ordinary per_cpu() accesses, so it's all moving 
> > rather close to the most-optimal situation already.
> 
> Yeah, I don't think we can do much better than those ops.  I have two
> issues tho.
> 
> 1. percpu_and() is missing.  I added it for completeness's sake.

Sure - it would be commonly used as well. Perhaps we don't need 
percpu_xor() at all? (the 'or' and 'and' ops already give a complete algebra)

> 2. The generic percpu_op() should be local to the cpu, so it should
>    expand to...
> 
>    do { get_cpu_var(var) OP (val); put_cpu_var(var); } while (0)
> 
>    as the original x86_OP_percpu() did.  Right?
> 
> Thanks.

hm, that removes much of its appeal - a preempt off+on sequence is quite 
bloaty. Most percpu usage sites are already within critical sections.

I think they are more analogous to per_cpu(var, cpu), which does not 
disable preemption either. There's no 'get/put' in them, which signals 
that they don't auto-disable preemption.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 12:26                 ` Ingo Molnar
@ 2009-01-15 13:04                   ` Tejun Heo
  2009-01-15 13:07                     ` Ingo Molnar
  2009-01-15 13:59                     ` [patch] " roel kluin
  0 siblings, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 13:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh

Ingo Molnar wrote:
>> Yeah, I don't think we can do much better than those ops.  I have two
>> issues tho.
>>
>> 1. percpu_and() is missing.  I added it for completeness's sake.
> 
> Sure - it would be commonly used as well. Perhaps we don't need 
> percpu_xor() at all? (the 'or' and 'and' ops already give a complete algebra)

Maybe... I don't know.

>> 2. The generic percpu_op() should be local to the cpu, so it should
>>    expand to...
>>
>>    do { get_cpu_var(var) OP (val); put_cpu_var(var); } while (0)
>>
>>    as the original x86_OP_percpu() did.  Right?
>>
>> Thanks.
> 
> hm, that removes much of its appeal - a preempt off+on sequence is quite 
> bloaty. Most percpu usage sites are already within critical sections.
> 
> I think they are more analogous to per_cpu(var, cpu), which does not 
> disable preemption either. There's no 'get/put' in them, which signals 
> that they don't auto-disable preemption.

Yeah, that's exactly why x86 has those specific ops in the first place:
when they can be made a single instruction, there's no need to mess with
preemption, which greatly reduces the cost of the operations.  The
generic implementation, however, should be guarded by preemption
disable/enable to maintain the same, correct semantics: being preempted
in between the load and store instructions implementing += can be
disastrous, for example.

From the comment I added on top of the generic ops....

/*
 * Optional methods for optimized non-lvalue per-cpu variable access.
 *
 * @var can be a percpu variable or a field of it and its size should
 * equal char, int or long.  percpu_read() evaluates to an lvalue and
 * all others to void.
 *
 * These operations are guaranteed to be atomic w.r.t. preemption.
 * The generic versions use plain get/put_cpu_var().  Archs are
 * encouraged to implement single-instruction alternatives which don't
 * require preemption protection.
 */

And the updated patch looks like...

---
From: Ingo Molnar <mingo@elte.hu>
Subject: percpu: add optimized generic percpu accessors

It is an optimization and a cleanup, and adds the following new
generic percpu methods:

  percpu_read()
  percpu_write()
  percpu_add()
  percpu_sub()
  percpu_or() 
  percpu_xor()

and implements support for them on x86. (other architectures will fall
back to a default implementation)

The advantage is that for example to read a local percpu variable,
instead of this sequence:

 return __get_cpu_var(var);

 ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
 ffffffff8102ca32:	81 
 ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
 ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

 return percpu_read(var);

 ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.

tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
    * added percpu_and() for completeness's sake
    * made generic percpu ops atomic against preemption

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Roel Kluin <roel.kluin@gmail.com>
---
 arch/x86/include/asm/current.h        |    2 -
 arch/x86/include/asm/irq_regs_32.h    |    4 +-
 arch/x86/include/asm/mmu_context_32.h |   12 +++----
 arch/x86/include/asm/pda.h            |   10 +++---
 arch/x86/include/asm/percpu.h         |   24 ++++++++-------
 arch/x86/include/asm/smp.h            |    2 -
 arch/x86/kernel/process_32.c          |    2 -
 arch/x86/kernel/tlb_32.c              |   10 +++---
 arch/x86/mach-voyager/voyager_smp.c   |    4 +-
 arch/x86/xen/enlighten.c              |   14 ++++-----
 arch/x86/xen/irq.c                    |    8 ++---
 arch/x86/xen/mmu.c                    |    2 -
 arch/x86/xen/multicalls.h             |    2 -
 arch/x86/xen/smp.c                    |    2 -
 include/asm-generic/percpu.h          |   52 ++++++++++++++++++++++++++++++++++
 15 files changed, 102 insertions(+), 48 deletions(-)

Index: work/arch/x86/include/asm/current.h
===================================================================
--- work.orig/arch/x86/include/asm/current.h
+++ work/arch/x86/include/asm/current.h
@@ -10,7 +10,7 @@ struct task_struct;
 DECLARE_PER_CPU(struct task_struct *, current_task);
 static __always_inline struct task_struct *get_current(void)
 {
-	return x86_read_percpu(current_task);
+	return percpu_read(current_task);
 }
 
 #else /* X86_32 */
Index: work/arch/x86/include/asm/irq_regs_32.h
===================================================================
--- work.orig/arch/x86/include/asm/irq_regs_32.h
+++ work/arch/x86/include/asm/irq_regs_32.h
@@ -15,7 +15,7 @@ DECLARE_PER_CPU(struct pt_regs *, irq_re
 
 static inline struct pt_regs *get_irq_regs(void)
 {
-	return x86_read_percpu(irq_regs);
+	return percpu_read(irq_regs);
 }
 
 static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
@@ -23,7 +23,7 @@ static inline struct pt_regs *set_irq_re
 	struct pt_regs *old_regs;
 
 	old_regs = get_irq_regs();
-	x86_write_percpu(irq_regs, new_regs);
+	percpu_write(irq_regs, new_regs);
 
 	return old_regs;
 }
Index: work/arch/x86/include/asm/mmu_context_32.h
===================================================================
--- work.orig/arch/x86/include/asm/mmu_context_32.h
+++ work/arch/x86/include/asm/mmu_context_32.h
@@ -4,8 +4,8 @@
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK)
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_LAZY);
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+		percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
 #endif
 }
 
@@ -19,8 +19,8 @@ static inline void switch_mm(struct mm_s
 		/* stop flush ipis for the previous mm */
 		cpu_clear(cpu, prev->cpu_vm_mask);
 #ifdef CONFIG_SMP
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		x86_write_percpu(cpu_tlbstate.active_mm, next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		percpu_write(cpu_tlbstate.active_mm, next);
 #endif
 		cpu_set(cpu, next->cpu_vm_mask);
 
@@ -35,8 +35,8 @@ static inline void switch_mm(struct mm_s
 	}
 #ifdef CONFIG_SMP
 	else {
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		BUG_ON(x86_read_percpu(cpu_tlbstate.active_mm) != next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
 
 		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
 			/* We were in lazy tlb mode and leave_mm disabled
Index: work/arch/x86/include/asm/pda.h
===================================================================
--- work.orig/arch/x86/include/asm/pda.h
+++ work/arch/x86/include/asm/pda.h
@@ -45,11 +45,11 @@ extern void pda_init(int);
 
 #define cpu_pda(cpu)		(&per_cpu(__pda, cpu))
 
-#define read_pda(field)		x86_read_percpu(__pda.field)
-#define write_pda(field, val)	x86_write_percpu(__pda.field, val)
-#define add_pda(field, val)	x86_add_percpu(__pda.field, val)
-#define sub_pda(field, val)	x86_sub_percpu(__pda.field, val)
-#define or_pda(field, val)	x86_or_percpu(__pda.field, val)
+#define read_pda(field)		percpu_read(__pda.field)
+#define write_pda(field, val)	percpu_write(__pda.field, val)
+#define add_pda(field, val)	percpu_add(__pda.field, val)
+#define sub_pda(field, val)	percpu_sub(__pda.field, val)
+#define or_pda(field, val)	percpu_or(__pda.field, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define test_and_clear_bit_pda(bit, field)				\
Index: work/arch/x86/include/asm/percpu.h
===================================================================
--- work.orig/arch/x86/include/asm/percpu.h
+++ work/arch/x86/include/asm/percpu.h
@@ -40,16 +40,11 @@
 
 #ifdef CONFIG_SMP
 #define __percpu_seg_str	"%%"__stringify(__percpu_seg)":"
-#define __my_cpu_offset		x86_read_percpu(this_cpu_off)
+#define __my_cpu_offset		percpu_read(this_cpu_off)
 #else
 #define __percpu_seg_str
 #endif
 
-#include <asm-generic/percpu.h>
-
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
-
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -115,11 +110,13 @@ do {							\
 	ret__;						\
 })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var, val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
+#define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
+#define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
+#define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
+#define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
+#define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
+#define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define x86_test_and_clear_bit_percpu(bit, var)				\
@@ -131,6 +128,11 @@ do {							\
 	old__;								\
 })
 
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
 #ifdef CONFIG_X86_64
 extern void load_pda_offset(int cpu);
 #else
Index: work/arch/x86/include/asm/smp.h
===================================================================
--- work.orig/arch/x86/include/asm/smp.h
+++ work/arch/x86/include/asm/smp.h
@@ -163,7 +163,7 @@ extern unsigned disabled_cpus __cpuinitd
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (x86_read_percpu(cpu_number))
+#define raw_smp_processor_id() (percpu_read(cpu_number))
 extern int safe_smp_processor_id(void);
 
 #elif defined(CONFIG_X86_64_SMP)
Index: work/arch/x86/kernel/process_32.c
===================================================================
--- work.orig/arch/x86/kernel/process_32.c
+++ work/arch/x86/kernel/process_32.c
@@ -592,7 +592,7 @@ __switch_to(struct task_struct *prev_p,
 	if (prev->gs | next->gs)
 		loadsegment(gs, next->gs);
 
-	x86_write_percpu(current_task, next_p);
+	percpu_write(current_task, next_p);
 
 	return prev_p;
 }
Index: work/arch/x86/kernel/tlb_32.c
===================================================================
--- work.orig/arch/x86/kernel/tlb_32.c
+++ work/arch/x86/kernel/tlb_32.c
@@ -34,8 +34,8 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  */
 void leave_mm(int cpu)
 {
-	BUG_ON(x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK);
-	cpu_clear(cpu, x86_read_percpu(cpu_tlbstate.active_mm)->cpu_vm_mask);
+	BUG_ON(percpu_read(cpu_tlbstate.state) == TLBSTATE_OK);
+	cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
 	load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
@@ -103,8 +103,8 @@ void smp_invalidate_interrupt(struct pt_
 		 * BUG();
 		 */
 
-	if (flush_mm == x86_read_percpu(cpu_tlbstate.active_mm)) {
-		if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK) {
+	if (flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
+		if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
 			if (flush_va == TLB_FLUSH_ALL)
 				local_flush_tlb();
 			else
@@ -237,7 +237,7 @@ static void do_flush_tlb_all(void *info)
 	unsigned long cpu = smp_processor_id();
 
 	__flush_tlb_all();
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_LAZY)
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
 		leave_mm(cpu);
 }
 
Index: work/arch/x86/mach-voyager/voyager_smp.c
===================================================================
--- work.orig/arch/x86/mach-voyager/voyager_smp.c
+++ work/arch/x86/mach-voyager/voyager_smp.c
@@ -410,7 +410,7 @@ void __init find_smp_config(void)
 	     VOYAGER_SUS_IN_CONTROL_PORT);
 
 	current_thread_info()->cpu = boot_cpu_id;
-	x86_write_percpu(cpu_number, boot_cpu_id);
+	percpu_write(cpu_number, boot_cpu_id);
 }
 
 /*
@@ -1790,7 +1790,7 @@ static void __init voyager_smp_cpus_done
 void __init smp_setup_processor_id(void)
 {
 	current_thread_info()->cpu = hard_smp_processor_id();
-	x86_write_percpu(cpu_number, hard_smp_processor_id());
+	percpu_write(cpu_number, hard_smp_processor_id());
 }
 
 static void voyager_send_call_func(cpumask_t callmask)
Index: work/arch/x86/xen/enlighten.c
===================================================================
--- work.orig/arch/x86/xen/enlighten.c
+++ work/arch/x86/xen/enlighten.c
@@ -702,17 +702,17 @@ static void xen_write_cr0(unsigned long
 
 static void xen_write_cr2(unsigned long cr2)
 {
-	x86_read_percpu(xen_vcpu)->arch.cr2 = cr2;
+	percpu_read(xen_vcpu)->arch.cr2 = cr2;
 }
 
 static unsigned long xen_read_cr2(void)
 {
-	return x86_read_percpu(xen_vcpu)->arch.cr2;
+	return percpu_read(xen_vcpu)->arch.cr2;
 }
 
 static unsigned long xen_read_cr2_direct(void)
 {
-	return x86_read_percpu(xen_vcpu_info.arch.cr2);
+	return percpu_read(xen_vcpu_info.arch.cr2);
 }
 
 static void xen_write_cr4(unsigned long cr4)
@@ -725,12 +725,12 @@ static void xen_write_cr4(unsigned long
 
 static unsigned long xen_read_cr3(void)
 {
-	return x86_read_percpu(xen_cr3);
+	return percpu_read(xen_cr3);
 }
 
 static void set_current_cr3(void *v)
 {
-	x86_write_percpu(xen_current_cr3, (unsigned long)v);
+	percpu_write(xen_current_cr3, (unsigned long)v);
 }
 
 static void __xen_write_cr3(bool kernel, unsigned long cr3)
@@ -755,7 +755,7 @@ static void __xen_write_cr3(bool kernel,
 	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
 
 	if (kernel) {
-		x86_write_percpu(xen_cr3, cr3);
+		percpu_write(xen_cr3, cr3);
 
 		/* Update xen_current_cr3 once the batch has actually
 		   been submitted. */
@@ -771,7 +771,7 @@ static void xen_write_cr3(unsigned long
 
 	/* Update while interrupts are disabled, so its atomic with
 	   respect to ipis */
-	x86_write_percpu(xen_cr3, cr3);
+	percpu_write(xen_cr3, cr3);
 
 	__xen_write_cr3(true, cr3);
 
Index: work/arch/x86/xen/irq.c
===================================================================
--- work.orig/arch/x86/xen/irq.c
+++ work/arch/x86/xen/irq.c
@@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
 	struct vcpu_info *vcpu;
 	unsigned long flags;
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 
 	/* flag has opposite sense of mask */
 	flags = !vcpu->evtchn_upcall_mask;
@@ -62,7 +62,7 @@ static void xen_restore_fl(unsigned long
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = flags;
 	preempt_enable_no_resched();
 
@@ -83,7 +83,7 @@ static void xen_irq_disable(void)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	x86_read_percpu(xen_vcpu)->evtchn_upcall_mask = 1;
+	percpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
 	preempt_enable_no_resched();
 }
 
@@ -96,7 +96,7 @@ static void xen_irq_enable(void)
 	   the caller is confused and is trying to re-enable interrupts
 	   on an indeterminate processor. */
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = 0;
 
 	/* Doesn't matter if we get preempted here, because any
Index: work/arch/x86/xen/mmu.c
===================================================================
--- work.orig/arch/x86/xen/mmu.c
+++ work/arch/x86/xen/mmu.c
@@ -1074,7 +1074,7 @@ static void drop_other_mm_ref(void *info
 
 	/* If this cpu still has a stale cr3 reference, then make sure
 	   it has been flushed. */
-	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+	if (percpu_read(xen_current_cr3) == __pa(mm->pgd)) {
 		load_cr3(swapper_pg_dir);
 		arch_flush_lazy_cpu_mode();
 	}
Index: work/arch/x86/xen/multicalls.h
===================================================================
--- work.orig/arch/x86/xen/multicalls.h
+++ work/arch/x86/xen/multicalls.h
@@ -39,7 +39,7 @@ static inline void xen_mc_issue(unsigned
 		xen_mc_flush();
 
 	/* restore flags saved in xen_mc_batch */
-	local_irq_restore(x86_read_percpu(xen_mc_irq_flags));
+	local_irq_restore(percpu_read(xen_mc_irq_flags));
 }
 
 /* Set up a callback to be called when the current batch is flushed */
Index: work/arch/x86/xen/smp.c
===================================================================
--- work.orig/arch/x86/xen/smp.c
+++ work/arch/x86/xen/smp.c
@@ -78,7 +78,7 @@ static __cpuinit void cpu_bringup(void)
 	xen_setup_cpu_clockevents();
 
 	cpu_set(cpu, cpu_online_map);
-	x86_write_percpu(cpu_state, CPU_ONLINE);
+	percpu_write(cpu_state, CPU_ONLINE);
 	wmb();
 
 	/* We can take interrupts now: we're officially "up". */
Index: work/include/asm-generic/percpu.h
===================================================================
--- work.orig/include/asm-generic/percpu.h
+++ work/include/asm-generic/percpu.h
@@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
 #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
 					__typeof__(type) per_cpu_var(name)
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it and its size should
+ * equal char, int or long.  percpu_read() evaluates to an lvalue and
+ * all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(per_cpu_var(var)) __tmp_var__;				\
+	__tmp_var__ = get_cpu_var(var);					\
+	put_cpu_var(var);						\
+	__tmp_var__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	get_cpu_var(var) op val;					\
+	put_cpu_var(var);						\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
 #endif /* _ASM_GENERIC_PERCPU_H_ */


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:04                   ` Tejun Heo
@ 2009-01-15 13:07                     ` Ingo Molnar
  2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
  2009-01-15 13:59                     ` [patch] " roel kluin
  1 sibling, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 13:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> It is an optimization and a cleanup, and adds the following new
> generic percpu methods:
> 
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_or() 
>   percpu_xor()

nit: add percpu_and() here ;-)

> tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
>     * added percpu_and() for completeness's sake
>     * made generic percpu ops atomic against preemption

okay - as long as the optimized versions stay single-instruction i'm a 
happy camper.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 12:22                   ` Ingo Molnar
@ 2009-01-15 13:09                     ` Tejun Heo
  2009-01-15 13:32                       ` Ingo Molnar
  2009-01-15 20:31                     ` Christoph Lameter
  1 sibling, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 13:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Ingo Molnar wrote:
>> And this happens to be the third time I wrote the above addresses in 
>> this thread.  :-)
> 
> You know the drill: you have to explicitly ask for stuff to be pulled :-)

Heh... Jeff usually just regenerates patches from the git tree or just
takes the mails, so I'm not too familiar with pull requests.  :-)

> ( Otherwise if i just pull it and you rebased it or were unhappy about it 
>   in some fashion we all add stress that could have been avoided. )
> 
> So i've picked up these commits from you:
> 
>  d060676: x86_64: misc clean up after the percpu update
>  6d459d9: x86_64: convert pda ops to wrappers around x86 percpu accessors
>  d7051ef: x86_64: make pda a percpu variable
>  4fe7fdf: x86_64: merge 64 and 32 SMP percpu handling
>  44f5fbd: x86_64: fold pda into percpu area on SMP
>  cc1d354: x86_64: use static _cpu_pda array
>  c701268: x86_64: load pointer to pda into %gs while brining up a CPU
>  7e36da9: x86_64: make percpu symbols zerobased on SMP
>  3fc860d: x86_32: make vmlinux_32.lds.S use PERCPU() macro
>  bc497c7: x86_64: Cleanup early setup_percpu references
>  769d780: x86: make early_per_cpu() a lvalue and use it
>  66cbc8e: x86_64: fix pda_to_op()
> 
> into tip/x86/percpu.
> 
> I did the following small reorganizations:
> 
>  - i rebased them on top of tip/cpus4096, tip/x86/cleanups and 
>    tip/x86/urgent - that is a stable base.
> 
>  - i resolved the conflicts that arose due to recent cpumask and upstream 
>    changes.
> 
>  - i standardized the commit logs to the usual x86 style and added 
>    Original-From: Mike Travis tags to those patches that were derived from 
>    Mike's patches.
> 
> If they pass testing they'll be stable commits from that point on, and you 
> can base your git trees on that topic branch.
> 
> Could you please have a look at the end result? Here are the Git 
> coordinates for it:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86/percpu

Yeap, looks fine to me.

> If we are happy with it then these commits can be stable commits from now 
> on, and you can base your future changes on this Git tree, and i can pull 
> updates from you without having to rebase them.
> 
> Does that workflow sound OK to you?

Yes, perfectly okay.

Thanks.  :-)

-- 
tejun


* [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 13:07                     ` Ingo Molnar
@ 2009-01-15 13:23                       ` Tejun Heo
  2009-01-15 13:36                         ` Ingo Molnar
                                           ` (3 more replies)
  0 siblings, 4 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 13:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh

From: Ingo Molnar <mingo@elte.hu>

It is an optimization and a cleanup, and adds the following new
generic percpu methods:

  percpu_read()
  percpu_write()
  percpu_add()
  percpu_sub()
  percpu_and()
  percpu_or()
  percpu_xor()

and implements support for them on x86. (other architectures will fall
back to a default implementation)

The advantage is that for example to read a local percpu variable,
instead of this sequence:

 return __get_cpu_var(var);

 ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
 ffffffff8102ca32:	81
 ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
 ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

 return percpu_read(var);

 ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.

tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
    * added percpu_and() for completeness's sake
    * made generic percpu ops atomic against preemption

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Roel Kluin <roel.kluin@gmail.com>
---
Okay, here's the updated patch.  It's on top of x86/percpu[1].  You
can pull from... (or just take the mail, I don't have much problem
with rebasing, so whatever suits you better works for me).

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

[1] d5fd35177b763186a3f6288624094ee402f0a7e3

Thanks.

 arch/x86/include/asm/current.h        |    2 +-
 arch/x86/include/asm/irq_regs_32.h    |    4 +-
 arch/x86/include/asm/mmu_context_32.h |   12 ++++----
 arch/x86/include/asm/pda.h            |   10 +++---
 arch/x86/include/asm/percpu.h         |   24 ++++++++-------
 arch/x86/include/asm/smp.h            |    2 +-
 arch/x86/kernel/process_32.c          |    2 +-
 arch/x86/kernel/tlb_32.c              |   10 +++---
 arch/x86/mach-voyager/voyager_smp.c   |    4 +-
 arch/x86/xen/enlighten.c              |   14 ++++----
 arch/x86/xen/irq.c                    |    8 ++--
 arch/x86/xen/mmu.c                    |    2 +-
 arch/x86/xen/multicalls.h             |    2 +-
 arch/x86/xen/smp.c                    |    2 +-
 include/asm-generic/percpu.h          |   52 +++++++++++++++++++++++++++++++++
 15 files changed, 102 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 0930b4f..0728480 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -10,7 +10,7 @@ struct task_struct;
 DECLARE_PER_CPU(struct task_struct *, current_task);
 static __always_inline struct task_struct *get_current(void)
 {
-	return x86_read_percpu(current_task);
+	return percpu_read(current_task);
 }
 
 #else /* X86_32 */
diff --git a/arch/x86/include/asm/irq_regs_32.h b/arch/x86/include/asm/irq_regs_32.h
index 86afd74..d7ed33e 100644
--- a/arch/x86/include/asm/irq_regs_32.h
+++ b/arch/x86/include/asm/irq_regs_32.h
@@ -15,7 +15,7 @@ DECLARE_PER_CPU(struct pt_regs *, irq_regs);
 
 static inline struct pt_regs *get_irq_regs(void)
 {
-	return x86_read_percpu(irq_regs);
+	return percpu_read(irq_regs);
 }
 
 static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
@@ -23,7 +23,7 @@ static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
 	struct pt_regs *old_regs;
 
 	old_regs = get_irq_regs();
-	x86_write_percpu(irq_regs, new_regs);
+	percpu_write(irq_regs, new_regs);
 
 	return old_regs;
 }
diff --git a/arch/x86/include/asm/mmu_context_32.h b/arch/x86/include/asm/mmu_context_32.h
index 7e98ce1..08b5345 100644
--- a/arch/x86/include/asm/mmu_context_32.h
+++ b/arch/x86/include/asm/mmu_context_32.h
@@ -4,8 +4,8 @@
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
 #ifdef CONFIG_SMP
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK)
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_LAZY);
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+		percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
 #endif
 }
 
@@ -19,8 +19,8 @@ static inline void switch_mm(struct mm_struct *prev,
 		/* stop flush ipis for the previous mm */
 		cpu_clear(cpu, prev->cpu_vm_mask);
 #ifdef CONFIG_SMP
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		x86_write_percpu(cpu_tlbstate.active_mm, next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		percpu_write(cpu_tlbstate.active_mm, next);
 #endif
 		cpu_set(cpu, next->cpu_vm_mask);
 
@@ -35,8 +35,8 @@ static inline void switch_mm(struct mm_struct *prev,
 	}
 #ifdef CONFIG_SMP
 	else {
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		BUG_ON(x86_read_percpu(cpu_tlbstate.active_mm) != next);
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
 
 		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
 			/* We were in lazy tlb mode and leave_mm disabled
diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
index e3d3a08..47f274f 100644
--- a/arch/x86/include/asm/pda.h
+++ b/arch/x86/include/asm/pda.h
@@ -45,11 +45,11 @@ extern void pda_init(int);
 
 #define cpu_pda(cpu)		(&per_cpu(__pda, cpu))
 
-#define read_pda(field)		x86_read_percpu(__pda.field)
-#define write_pda(field, val)	x86_write_percpu(__pda.field, val)
-#define add_pda(field, val)	x86_add_percpu(__pda.field, val)
-#define sub_pda(field, val)	x86_sub_percpu(__pda.field, val)
-#define or_pda(field, val)	x86_or_percpu(__pda.field, val)
+#define read_pda(field)		percpu_read(__pda.field)
+#define write_pda(field, val)	percpu_write(__pda.field, val)
+#define add_pda(field, val)	percpu_add(__pda.field, val)
+#define sub_pda(field, val)	percpu_sub(__pda.field, val)
+#define or_pda(field, val)	percpu_or(__pda.field, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define test_and_clear_bit_pda(bit, field)				\
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 328b31a..03aa4b0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -40,16 +40,11 @@
 
 #ifdef CONFIG_SMP
 #define __percpu_seg_str	"%%"__stringify(__percpu_seg)":"
-#define __my_cpu_offset		x86_read_percpu(this_cpu_off)
+#define __my_cpu_offset		percpu_read(this_cpu_off)
 #else
 #define __percpu_seg_str
 #endif
 
-#include <asm-generic/percpu.h>
-
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
-
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -115,11 +110,13 @@ do {							\
 	ret__;						\
 })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var, val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
+#define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
+#define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
+#define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
+#define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
+#define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
+#define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
 
 /* This is not atomic against other CPUs -- CPU preemption needs to be off */
 #define x86_test_and_clear_bit_percpu(bit, var)				\
@@ -131,6 +128,11 @@ do {							\
 	old__;								\
 })
 
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
 #ifdef CONFIG_X86_64
 extern void load_pda_offset(int cpu);
 #else
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 1274154..c7bbbbe 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -160,7 +160,7 @@ extern unsigned disabled_cpus __cpuinitdata;
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (x86_read_percpu(cpu_number))
+#define raw_smp_processor_id() (percpu_read(cpu_number))
 extern int safe_smp_processor_id(void);
 
 #elif defined(CONFIG_X86_64_SMP)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a546f55..77d5468 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -591,7 +591,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		loadsegment(gs, next->gs);
 
-	x86_write_percpu(current_task, next_p);
+	percpu_write(current_task, next_p);
 
 	return prev_p;
 }
diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
index ec53818..e65449d 100644
--- a/arch/x86/kernel/tlb_32.c
+++ b/arch/x86/kernel/tlb_32.c
@@ -34,8 +34,8 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  */
 void leave_mm(int cpu)
 {
-	BUG_ON(x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK);
-	cpu_clear(cpu, x86_read_percpu(cpu_tlbstate.active_mm)->cpu_vm_mask);
+	BUG_ON(percpu_read(cpu_tlbstate.state) == TLBSTATE_OK);
+	cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
 	load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
@@ -103,8 +103,8 @@ void smp_invalidate_interrupt(struct pt_regs *regs)
 		 * BUG();
 		 */
 
-	if (flush_mm == x86_read_percpu(cpu_tlbstate.active_mm)) {
-		if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK) {
+	if (flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
+		if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
 			if (flush_va == TLB_FLUSH_ALL)
 				local_flush_tlb();
 			else
@@ -222,7 +222,7 @@ static void do_flush_tlb_all(void *info)
 	unsigned long cpu = smp_processor_id();
 
 	__flush_tlb_all();
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_LAZY)
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
 		leave_mm(cpu);
 }
 
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
index 1a48368..96f15b0 100644
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ b/arch/x86/mach-voyager/voyager_smp.c
@@ -402,7 +402,7 @@ void __init find_smp_config(void)
 	     VOYAGER_SUS_IN_CONTROL_PORT);
 
 	current_thread_info()->cpu = boot_cpu_id;
-	x86_write_percpu(cpu_number, boot_cpu_id);
+	percpu_write(cpu_number, boot_cpu_id);
 }
 
 /*
@@ -1782,7 +1782,7 @@ static void __init voyager_smp_cpus_done(unsigned int max_cpus)
 void __init smp_setup_processor_id(void)
 {
 	current_thread_info()->cpu = hard_smp_processor_id();
-	x86_write_percpu(cpu_number, hard_smp_processor_id());
+	percpu_write(cpu_number, hard_smp_processor_id());
 }
 
 static void voyager_send_call_func(cpumask_t callmask)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 312414e..75b9413 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -695,17 +695,17 @@ static void xen_write_cr0(unsigned long cr0)
 
 static void xen_write_cr2(unsigned long cr2)
 {
-	x86_read_percpu(xen_vcpu)->arch.cr2 = cr2;
+	percpu_read(xen_vcpu)->arch.cr2 = cr2;
 }
 
 static unsigned long xen_read_cr2(void)
 {
-	return x86_read_percpu(xen_vcpu)->arch.cr2;
+	return percpu_read(xen_vcpu)->arch.cr2;
 }
 
 static unsigned long xen_read_cr2_direct(void)
 {
-	return x86_read_percpu(xen_vcpu_info.arch.cr2);
+	return percpu_read(xen_vcpu_info.arch.cr2);
 }
 
 static void xen_write_cr4(unsigned long cr4)
@@ -718,12 +718,12 @@ static void xen_write_cr4(unsigned long cr4)
 
 static unsigned long xen_read_cr3(void)
 {
-	return x86_read_percpu(xen_cr3);
+	return percpu_read(xen_cr3);
 }
 
 static void set_current_cr3(void *v)
 {
-	x86_write_percpu(xen_current_cr3, (unsigned long)v);
+	percpu_write(xen_current_cr3, (unsigned long)v);
 }
 
 static void __xen_write_cr3(bool kernel, unsigned long cr3)
@@ -748,7 +748,7 @@ static void __xen_write_cr3(bool kernel, unsigned long cr3)
 	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
 
 	if (kernel) {
-		x86_write_percpu(xen_cr3, cr3);
+		percpu_write(xen_cr3, cr3);
 
 		/* Update xen_current_cr3 once the batch has actually
 		   been submitted. */
@@ -764,7 +764,7 @@ static void xen_write_cr3(unsigned long cr3)
 
 	/* Update while interrupts are disabled, so its atomic with
 	   respect to ipis */
-	x86_write_percpu(xen_cr3, cr3);
+	percpu_write(xen_cr3, cr3);
 
 	__xen_write_cr3(true, cr3);
 
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index bb04260..2e82714 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
 	struct vcpu_info *vcpu;
 	unsigned long flags;
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 
 	/* flag has opposite sense of mask */
 	flags = !vcpu->evtchn_upcall_mask;
@@ -62,7 +62,7 @@ static void xen_restore_fl(unsigned long flags)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = flags;
 	preempt_enable_no_resched();
 
@@ -83,7 +83,7 @@ static void xen_irq_disable(void)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	x86_read_percpu(xen_vcpu)->evtchn_upcall_mask = 1;
+	percpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
 	preempt_enable_no_resched();
 }
 
@@ -96,7 +96,7 @@ static void xen_irq_enable(void)
 	   the caller is confused and is trying to re-enable interrupts
 	   on an indeterminate processor. */
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = 0;
 
 	/* Doesn't matter if we get preempted here, because any
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 503c240..7bc7852 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1074,7 +1074,7 @@ static void drop_other_mm_ref(void *info)
 
 	/* If this cpu still has a stale cr3 reference, then make sure
 	   it has been flushed. */
-	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+	if (percpu_read(xen_current_cr3) == __pa(mm->pgd)) {
 		load_cr3(swapper_pg_dir);
 		arch_flush_lazy_cpu_mode();
 	}
diff --git a/arch/x86/xen/multicalls.h b/arch/x86/xen/multicalls.h
index 8589382..e786fa7 100644
--- a/arch/x86/xen/multicalls.h
+++ b/arch/x86/xen/multicalls.h
@@ -39,7 +39,7 @@ static inline void xen_mc_issue(unsigned mode)
 		xen_mc_flush();
 
 	/* restore flags saved in xen_mc_batch */
-	local_irq_restore(x86_read_percpu(xen_mc_irq_flags));
+	local_irq_restore(percpu_read(xen_mc_irq_flags));
 }
 
 /* Set up a callback to be called when the current batch is flushed */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 83fa423..3bfd6dd 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -78,7 +78,7 @@ static __cpuinit void cpu_bringup(void)
 	xen_setup_cpu_clockevents();
 
 	cpu_set(cpu, cpu_online_map);
-	x86_write_percpu(cpu_state, CPU_ONLINE);
+	percpu_write(cpu_state, CPU_ONLINE);
 	wmb();
 
 	/* We can take interrupts now: we're officially "up". */
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index b0e63c6..00f45ff 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
 #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
 					__typeof__(type) per_cpu_var(name)
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it, and its size
+ * should be that of char, int or long.  percpu_read() evaluates to
+ * an lvalue and all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(per_cpu_var(var)) __tmp_var__;				\
+	__tmp_var__ = get_cpu_var(var);					\
+	put_cpu_var(var);						\
+	__tmp_var__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	get_cpu_var(var) op val;					\
+	put_cpu_var(var);						\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
 #endif /* _ASM_GENERIC_PERCPU_H_ */
-- 
1.6.0.2



* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:09                     ` Tejun Heo
@ 2009-01-15 13:32                       ` Ingo Molnar
  2009-01-15 13:33                         ` Ingo Molnar
  2009-01-15 13:39                         ` Ingo Molnar
  0 siblings, 2 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 13:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh



FYI, -tip testing found the following bug with your percpu stuff:

There's an early exception during bootup, on 64-bit x86:

  PANIC: early exception 0e rip 10:ffffffff80276855: error ? cr2 6688

 - gcc version 4.3.2 20081007 (Red Hat 4.3.2-6) (GCC) 
 - binutils-2.18.50.0.6-2.x86_64

config attached. You can find the disassembly of lock_release_holdtime() 
below - that's where it crashed:

ffffffff80276851:	48 8d 04 06          	lea    (%rsi,%rax,1),%rax
ffffffff80276855:	4c 3b a0 a8 00 00 00 	cmp    0xa8(%rax),%r12
ffffffff8027685c:	7e 07                	jle    ffffffff80276865 <lock_release_holdtime+0x155>

it probably went wrong here (due to the PDA changes):

ffffffff80276784:	65 48 8b 15 f4 d9 d8 	mov    %gs:0x7fd8d9f4(%rip),%rdx        # 4180 <per_cpu__this_cpu_off>
ffffffff8027678b:	7f 

and we jumped to ffffffff80276840 after that and crashed.

Since the crash is so early, you can build the attached config on any 
64-bit test-system and try to boot into it - it should crash all the time. 
Let me know if you have trouble reproducing it.

	Ingo


ffffffff80276710 <lock_release_holdtime>:
ffffffff80276710:	55                   	push   %rbp
ffffffff80276711:	48 89 e5             	mov    %rsp,%rbp
ffffffff80276714:	48 83 ec 10          	sub    $0x10,%rsp
ffffffff80276718:	8b 05 42 6f a2 00    	mov    0xa26f42(%rip),%eax        # ffffffff80c9d660 <lock_stat>
ffffffff8027671e:	48 89 1c 24          	mov    %rbx,(%rsp)
ffffffff80276722:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
ffffffff80276727:	48 89 fb             	mov    %rdi,%rbx
ffffffff8027672a:	85 c0                	test   %eax,%eax
ffffffff8027672c:	75 12                	jne    ffffffff80276740 <lock_release_holdtime+0x30>
ffffffff8027672e:	48 8b 1c 24          	mov    (%rsp),%rbx
ffffffff80276732:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
ffffffff80276737:	c9                   	leaveq 
ffffffff80276738:	c3                   	retq   
ffffffff80276739:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
ffffffff80276740:	e8 0b d7 f9 ff       	callq  ffffffff80213e50 <sched_clock>
ffffffff80276745:	49 89 c4             	mov    %rax,%r12
ffffffff80276748:	0f b7 43 30          	movzwl 0x30(%rbx),%eax
ffffffff8027674c:	4c 2b 63 28          	sub    0x28(%rbx),%r12
ffffffff80276750:	66 25 ff 1f          	and    $0x1fff,%ax
ffffffff80276754:	0f 84 76 01 00 00    	je     ffffffff802768d0 <lock_release_holdtime+0x1c0>
ffffffff8027675a:	0f b7 c0             	movzwl %ax,%eax
ffffffff8027675d:	48 8d 04 80          	lea    (%rax,%rax,4),%rax
ffffffff80276761:	48 8d 04 80          	lea    (%rax,%rax,4),%rax
ffffffff80276765:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff80276769:	48 2d 90 01 00 00    	sub    $0x190,%rax
ffffffff8027676f:	48 8d 88 e0 b5 21 81 	lea    -0x7ede4a20(%rax),%rcx
ffffffff80276776:	48 81 e9 e0 b5 21 81 	sub    $0xffffffff8121b5e0,%rcx
ffffffff8027677d:	48 c7 c0 e0 65 00 00 	mov    $0x65e0,%rax
ffffffff80276784:	65 48 8b 15 f4 d9 d8 	mov    %gs:0x7fd8d9f4(%rip),%rdx        # 4180 <per_cpu__this_cpu_off>
ffffffff8027678b:	7f 
ffffffff8027678c:	48 c1 f9 04          	sar    $0x4,%rcx
ffffffff80276790:	48 8d 34 10          	lea    (%rax,%rdx,1),%rsi
ffffffff80276794:	48 b8 29 5c 8f c2 f5 	mov    $0x8f5c28f5c28f5c29,%rax
ffffffff8027679b:	28 5c 8f 
ffffffff8027679e:	48 0f af c8          	imul   %rax,%rcx
ffffffff802767a2:	f6 43 32 03          	testb  $0x3,0x32(%rbx)
ffffffff802767a6:	0f 84 94 00 00 00    	je     ffffffff80276840 <lock_release_holdtime+0x130>
ffffffff802767ac:	48 89 ca             	mov    %rcx,%rdx
ffffffff802767af:	48 89 c8             	mov    %rcx,%rax
ffffffff802767b2:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff802767b6:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff802767ba:	48 29 d0             	sub    %rdx,%rax
ffffffff802767bd:	48 8d 04 06          	lea    (%rsi,%rax,1),%rax
ffffffff802767c1:	4c 3b a0 88 00 00 00 	cmp    0x88(%rax),%r12
ffffffff802767c8:	7e 07                	jle    ffffffff802767d1 <lock_release_holdtime+0xc1>
ffffffff802767ca:	4c 89 a0 88 00 00 00 	mov    %r12,0x88(%rax)
ffffffff802767d1:	48 89 ca             	mov    %rcx,%rdx
ffffffff802767d4:	48 89 c8             	mov    %rcx,%rax
ffffffff802767d7:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff802767db:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff802767df:	48 29 d0             	sub    %rdx,%rax
ffffffff802767e2:	48 8b 84 06 80 00 00 	mov    0x80(%rsi,%rax,1),%rax
ffffffff802767e9:	00 
ffffffff802767ea:	49 39 c4             	cmp    %rax,%r12
ffffffff802767ed:	7c 05                	jl     ffffffff802767f4 <lock_release_holdtime+0xe4>
ffffffff802767ef:	48 85 c0             	test   %rax,%rax
ffffffff802767f2:	75 19                	jne    ffffffff8027680d <lock_release_holdtime+0xfd>
ffffffff802767f4:	48 89 ca             	mov    %rcx,%rdx
ffffffff802767f7:	48 89 c8             	mov    %rcx,%rax
ffffffff802767fa:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff802767fe:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff80276802:	48 29 d0             	sub    %rdx,%rax
ffffffff80276805:	4c 89 a4 06 80 00 00 	mov    %r12,0x80(%rsi,%rax,1)
ffffffff8027680c:	00 
ffffffff8027680d:	48 89 ca             	mov    %rcx,%rdx
ffffffff80276810:	48 89 c8             	mov    %rcx,%rax
ffffffff80276813:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff80276817:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff8027681b:	48 29 d0             	sub    %rdx,%rax
ffffffff8027681e:	48 8d 04 06          	lea    (%rsi,%rax,1),%rax
ffffffff80276822:	4c 01 a0 90 00 00 00 	add    %r12,0x90(%rax)
ffffffff80276829:	48 83 80 98 00 00 00 	addq   $0x1,0x98(%rax)
ffffffff80276830:	01 
ffffffff80276831:	e9 f8 fe ff ff       	jmpq   ffffffff8027672e <lock_release_holdtime+0x1e>
ffffffff80276836:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
ffffffff8027683d:	00 00 00 
ffffffff80276840:	48 89 ca             	mov    %rcx,%rdx
ffffffff80276843:	48 89 c8             	mov    %rcx,%rax
ffffffff80276846:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff8027684a:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff8027684e:	48 29 d0             	sub    %rdx,%rax
ffffffff80276851:	48 8d 04 06          	lea    (%rsi,%rax,1),%rax
ffffffff80276855:	4c 3b a0 a8 00 00 00 	cmp    0xa8(%rax),%r12
ffffffff8027685c:	7e 07                	jle    ffffffff80276865 <lock_release_holdtime+0x155>
ffffffff8027685e:	4c 89 a0 a8 00 00 00 	mov    %r12,0xa8(%rax)
ffffffff80276865:	48 89 ca             	mov    %rcx,%rdx
ffffffff80276868:	48 89 c8             	mov    %rcx,%rax
ffffffff8027686b:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff8027686f:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff80276873:	48 29 d0             	sub    %rdx,%rax
ffffffff80276876:	48 8b 84 06 a0 00 00 	mov    0xa0(%rsi,%rax,1),%rax
ffffffff8027687d:	00 
ffffffff8027687e:	49 39 c4             	cmp    %rax,%r12
ffffffff80276881:	7c 05                	jl     ffffffff80276888 <lock_release_holdtime+0x178>
ffffffff80276883:	48 85 c0             	test   %rax,%rax
ffffffff80276886:	75 19                	jne    ffffffff802768a1 <lock_release_holdtime+0x191>
ffffffff80276888:	48 89 ca             	mov    %rcx,%rdx
ffffffff8027688b:	48 89 c8             	mov    %rcx,%rax
ffffffff8027688e:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff80276892:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff80276896:	48 29 d0             	sub    %rdx,%rax
ffffffff80276899:	4c 89 a4 06 a0 00 00 	mov    %r12,0xa0(%rsi,%rax,1)
ffffffff802768a0:	00 
ffffffff802768a1:	48 89 ca             	mov    %rcx,%rdx
ffffffff802768a4:	48 89 c8             	mov    %rcx,%rax
ffffffff802768a7:	48 c1 e2 05          	shl    $0x5,%rdx
ffffffff802768ab:	48 c1 e0 08          	shl    $0x8,%rax
ffffffff802768af:	48 29 d0             	sub    %rdx,%rax
ffffffff802768b2:	48 8d 04 06          	lea    (%rsi,%rax,1),%rax
ffffffff802768b6:	4c 01 a0 b0 00 00 00 	add    %r12,0xb0(%rax)
ffffffff802768bd:	48 83 80 b8 00 00 00 	addq   $0x1,0xb8(%rax)
ffffffff802768c4:	01 
ffffffff802768c5:	e9 64 fe ff ff       	jmpq   ffffffff8027672e <lock_release_holdtime+0x1e>
ffffffff802768ca:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
ffffffff802768d0:	8b 05 6a 30 d3 00    	mov    0xd3306a(%rip),%eax        # ffffffff80fa9940 <oops_in_progress>
ffffffff802768d6:	85 c0                	test   %eax,%eax
ffffffff802768d8:	74 07                	je     ffffffff802768e1 <lock_release_holdtime+0x1d1>
ffffffff802768da:	31 c9                	xor    %ecx,%ecx
ffffffff802768dc:	e9 95 fe ff ff       	jmpq   ffffffff80276776 <lock_release_holdtime+0x66>
ffffffff802768e1:	e8 aa 2c 20 00       	callq  ffffffff80479590 <debug_locks_off>
ffffffff802768e6:	85 c0                	test   %eax,%eax
ffffffff802768e8:	74 f0                	je     ffffffff802768da <lock_release_holdtime+0x1ca>
ffffffff802768ea:	8b 05 d0 ed 51 01    	mov    0x151edd0(%rip),%eax        # ffffffff817956c0 <debug_locks_silent>
ffffffff802768f0:	85 c0                	test   %eax,%eax
ffffffff802768f2:	75 e6                	jne    ffffffff802768da <lock_release_holdtime+0x1ca>
ffffffff802768f4:	31 d2                	xor    %edx,%edx
ffffffff802768f6:	be 83 00 00 00       	mov    $0x83,%esi
ffffffff802768fb:	48 c7 c7 2c 73 b9 80 	mov    $0xffffffff80b9732c,%rdi
ffffffff80276902:	31 c0                	xor    %eax,%eax
ffffffff80276904:	e8 d7 5b fd ff       	callq  ffffffff8024c4e0 <warn_slowpath>
ffffffff80276909:	31 c9                	xor    %ecx,%ecx
ffffffff8027690b:	e9 66 fe ff ff       	jmpq   ffffffff80276776 <lock_release_holdtime+0x66>


[-- Attachment #2: config --]
[-- Type: text/plain, Size: 60343 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.29-rc1
# Thu Jan 15 14:31:16 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=21
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set

#
# Control Group support
#
# CONFIG_CGROUPS is not set
# CONFIG_SYSFS_DEPRECATED_V2 is not set
CONFIG_RELAY=y
# CONFIG_NAMESPACES is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_STRIP_GENERATED is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=1
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_FREEZER is not set

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_VSMP is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PARAVIRT_GUEST=y
# CONFIG_XEN is not set
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y
CONFIG_PARAVIRT_DEBUG=y
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
# CONFIG_PROCESSOR_SELECT is not set
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR_64=y
CONFIG_X86_DS=y
CONFIG_X86_PTRACE_BTS=y
CONFIG_HPET_TIMER=y
# CONFIG_DMI is not set
# CONFIG_GART_IOMMU is not set
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_IOMMU_API is not set
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=8
CONFIG_SCHED_SMT=y
# CONFIG_SCHED_MC is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
# CONFIG_X86_MCE is not set
CONFIG_I8K=m
CONFIG_MICROCODE=m
# CONFIG_MICROCODE_INTEL is not set
# CONFIG_MICROCODE_AMD is not set
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
# CONFIG_SPARSEMEM_VMEMMAP is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTREMOVE is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_MMU_NOTIFIER=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
# CONFIG_X86_RESERVE_LOW_64K is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_SCHED_HRTICK=y
# CONFIG_KEXEC is not set
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y

#
# Power management and ACPI options
#
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_PM_VERBOSE=y
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
# CONFIG_ACPI is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
# CONFIG_X86_POWERNOW_K8 is not set
CONFIG_X86_P4_CLOCKMOD=y

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_LEGACY is not set
CONFIG_PCI_DEBUG=y
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=m
CONFIG_PCMCIA_DEBUG=y
CONFIG_PCMCIA=m
CONFIG_PCMCIA_LOAD_CIS=y
# CONFIG_PCMCIA_IOCTL is not set
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
CONFIG_PD6729=m
CONFIG_I82092=m
CONFIG_PCCARD_NONSTATIC=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=m
CONFIG_IA32_EMULATION=y
CONFIG_IA32_AOUT=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
CONFIG_XFRM_MIGRATE=y
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=y
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_MULTIPLE_TABLES is not set
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=y
# CONFIG_NET_IPGRE is not set
CONFIG_ARPD=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=y
# CONFIG_INET6_ESP is not set
CONFIG_INET6_IPCOMP=m
# CONFIG_IPV6_MIP6 is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=y
# CONFIG_INET6_XFRM_MODE_BEET is not set
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=y
CONFIG_IPV6_SIT=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
# CONFIG_NETFILTER_ADVANCED is not set

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_LOG=m
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=y
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
CONFIG_NETFILTER_XT_TARGET_SECMARK=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
CONFIG_IP_VS_DEBUG=y
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
# CONFIG_IP_VS_PROTO_ESP is not set
CONFIG_IP_VS_PROTO_AH=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
# CONFIG_IP_VS_WRR is not set
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
# CONFIG_IP_VS_LBLC is not set
# CONFIG_IP_VS_LBLCR is not set
CONFIG_IP_VS_DH=m
# CONFIG_IP_VS_SH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m

#
# IP: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV4 is not set
CONFIG_IP_NF_IPTABLES=y
# CONFIG_IP_NF_FILTER is not set
# CONFIG_IP_NF_TARGET_LOG is not set
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_IP_NF_MANGLE=y

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
# CONFIG_IP6_NF_TARGET_LOG is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
# CONFIG_IP6_NF_MANGLE is not set
CONFIG_IP_DCCP=y
CONFIG_INET_DCCP_DIAG=y

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
CONFIG_IP_DCCP_CCID2_DEBUG=y
CONFIG_IP_DCCP_CCID3=y
CONFIG_IP_DCCP_CCID3_DEBUG=y
CONFIG_IP_DCCP_CCID3_RTO=100
CONFIG_IP_DCCP_TFRC_LIB=y
CONFIG_IP_DCCP_TFRC_DEBUG=y

#
# DCCP Kernel Hacking
#
CONFIG_IP_DCCP_DEBUG=y
CONFIG_IP_SCTP=y
CONFIG_SCTP_DBG_MSG=y
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
CONFIG_TIPC=m
CONFIG_TIPC_ADVANCED=y
CONFIG_TIPC_ZONES=3
CONFIG_TIPC_CLUSTERS=1
CONFIG_TIPC_NODES=255
CONFIG_TIPC_SLAVE_NODES=0
CONFIG_TIPC_PORTS=8191
CONFIG_TIPC_LOG=0
CONFIG_TIPC_DEBUG=y
# CONFIG_ATM is not set
CONFIG_STP=y
CONFIG_GARP=m
CONFIG_BRIDGE=y
CONFIG_NET_DSA=y
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
CONFIG_NET_DSA_MV88E6XXX=y
CONFIG_NET_DSA_MV88E6060=y
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=y
CONFIG_NET_DSA_MV88E6123_61_65=y
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_DECNET=y
CONFIG_DECNET_ROUTER=y
CONFIG_LLC=y
CONFIG_LLC2=y
CONFIG_IPX=y
CONFIG_IPX_INTERN=y
CONFIG_ATALK=y
# CONFIG_DEV_APPLETALK is not set
CONFIG_X25=y
CONFIG_LAPB=m
CONFIG_ECONET=m
# CONFIG_ECONET_AUNUDP is not set
# CONFIG_ECONET_NATIVE is not set
# CONFIG_WAN_ROUTER is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_HFSC=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
CONFIG_NET_SCH_TEQL=m
# CONFIG_NET_SCH_TBF is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=y
# CONFIG_NET_SCH_INGRESS is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_ROUTE4=y
CONFIG_NET_CLS_ROUTE=y
# CONFIG_NET_CLS_FW is not set
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
# CONFIG_NET_CLS_RSVP is not set
CONFIG_NET_CLS_RSVP6=m
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
# CONFIG_NET_ACT_MIRRED is not set
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=y
CONFIG_NET_ACT_PEDIT=y
CONFIG_NET_ACT_SIMP=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
CONFIG_HAMRADIO=y

#
# Packet Radio protocols
#
CONFIG_AX25=m
# CONFIG_AX25_DAMA_SLAVE is not set
CONFIG_NETROM=m
CONFIG_ROSE=m

#
# AX.25 network device drivers
#
CONFIG_MKISS=m
# CONFIG_6PACK is not set
CONFIG_BPQETHER=m
# CONFIG_BAYCOM_SER_FDX is not set
CONFIG_BAYCOM_SER_HDX=m
CONFIG_BAYCOM_PAR=m
CONFIG_YAM=m
CONFIG_CAN=y
# CONFIG_CAN_RAW is not set
# CONFIG_CAN_BCM is not set

#
# CAN Device Drivers
#
# CONFIG_CAN_VCAN is not set
CONFIG_CAN_DEBUG_DEVICES=y
CONFIG_IRDA=y

#
# IrDA protocols
#
# CONFIG_IRLAN is not set
CONFIG_IRNET=m
CONFIG_IRCOMM=y
CONFIG_IRDA_ULTRA=y

#
# IrDA options
#
CONFIG_IRDA_CACHE_LAST_LSAP=y
CONFIG_IRDA_FAST_RR=y
CONFIG_IRDA_DEBUG=y

#
# Infrared-port device drivers
#

#
# SIR device drivers
#
# CONFIG_IRTTY_SIR is not set

#
# Dongle support
#
CONFIG_KINGSUN_DONGLE=y
# CONFIG_KSDAZZLE_DONGLE is not set
CONFIG_KS959_DONGLE=y

#
# FIR device drivers
#
CONFIG_USB_IRDA=m
CONFIG_SIGMATEL_FIR=m
# CONFIG_NSC_FIR is not set
CONFIG_WINBOND_FIR=m
# CONFIG_SMC_IRCC_FIR is not set
CONFIG_ALI_FIR=m
# CONFIG_VLSI_FIR is not set
CONFIG_VIA_FIR=y
# CONFIG_MCS_FIR is not set
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_SCO=m
# CONFIG_BT_RFCOMM is not set
# CONFIG_BT_BNEP is not set
CONFIG_BT_HIDP=m

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
CONFIG_BT_HCIBTSDIO=m
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
# CONFIG_BT_HCIUART_BCSP is not set
CONFIG_BT_HCIUART_LL=y
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
# CONFIG_BT_HCIBFUSB is not set
# CONFIG_BT_HCIDTL1 is not set
# CONFIG_BT_HCIBT3C is not set
# CONFIG_BT_HCIBLUECARD is not set
CONFIG_BT_HCIBTUART=m
CONFIG_BT_HCIVHCI=m
CONFIG_AF_RXRPC=y
CONFIG_AF_RXRPC_DEBUG=y
CONFIG_RXKAD=y
# CONFIG_PHONET is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
CONFIG_WIRELESS_EXT=y
CONFIG_LIB80211=y
CONFIG_LIB80211_CRYPT_WEP=y
CONFIG_LIB80211_CRYPT_CCMP=y
CONFIG_LIB80211_CRYPT_TKIP=y
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_DEBUG_DRIVER=y
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=m
# CONFIG_MTD is not set
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
# CONFIG_PARPORT_SERIAL is not set
CONFIG_PARPORT_PC_FIFO=y
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_PC_PCMCIA is not set
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=m
# CONFIG_PARPORT_1284 is not set
CONFIG_PARPORT_NOT_PC=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
# CONFIG_PARIDE is not set
CONFIG_BLK_CPQ_DA=y
CONFIG_BLK_CPQ_CISS_DA=y
CONFIG_CISS_SCSI_TAPE=y
CONFIG_BLK_DEV_DAC960=m
CONFIG_BLK_DEV_UMEM=y
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_SX8=y
CONFIG_BLK_DEV_UB=y
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_XIP=y
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m
# CONFIG_VIRTIO_BLK is not set
CONFIG_BLK_DEV_HD=y
CONFIG_MISC_DEVICES=y
CONFIG_IBM_ASM=y
# CONFIG_PHANTOM is not set
CONFIG_EEPROM_93CX6=m
CONFIG_SGI_IOC4=y
CONFIG_TIFM_CORE=y
# CONFIG_TIFM_7XX1 is not set
CONFIG_ICS932S401=m
# CONFIG_ENCLOSURE_SERVICES is not set
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=y
CONFIG_SGI_GRU_DEBUG=y
CONFIG_C2PORT=m
# CONFIG_C2PORT_DURAMAR_2150 is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
# CONFIG_SCSI_NETLINK is not set
# CONFIG_SCSI_PROC_FS is not set

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=y

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
# CONFIG_SCSI_FC_ATTRS is not set
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
# CONFIG_SCSI_SAS_LIBSAS is not set
CONFIG_SCSI_SRP_ATTRS=y
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_SCSI_LOWLEVEL_PCMCIA=y
# CONFIG_PCMCIA_FDOMAIN is not set
CONFIG_PCMCIA_QLOGIC=m
CONFIG_PCMCIA_SYM53C500=m
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=m
CONFIG_SCSI_DH_HP_SW=m
# CONFIG_SCSI_DH_EMC is not set
CONFIG_SCSI_DH_ALUA=m
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_SIL24=y
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=y
CONFIG_ATA_PIIX=y
# CONFIG_SATA_MV is not set
CONFIG_SATA_NV=y
CONFIG_PDC_ADMA=y
CONFIG_SATA_QSTOR=m
CONFIG_SATA_PROMISE=m
# CONFIG_SATA_SX4 is not set
CONFIG_SATA_SIL=y
CONFIG_SATA_SIS=y
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
CONFIG_SATA_VITESSE=y
CONFIG_SATA_INIC162X=m
# CONFIG_PATA_ALI is not set
CONFIG_PATA_AMD=y
CONFIG_PATA_ARTOP=y
CONFIG_PATA_ATIIXP=y
# CONFIG_PATA_CMD640_PCI is not set
CONFIG_PATA_CMD64X=m
CONFIG_PATA_CS5520=m
CONFIG_PATA_CS5530=m
# CONFIG_PATA_CYPRESS is not set
CONFIG_PATA_EFAR=y
CONFIG_ATA_GENERIC=y
CONFIG_PATA_HPT366=y
CONFIG_PATA_HPT37X=y
# CONFIG_PATA_HPT3X2N is not set
CONFIG_PATA_HPT3X3=m
# CONFIG_PATA_HPT3X3_DMA is not set
CONFIG_PATA_IT821X=y
CONFIG_PATA_IT8213=y
# CONFIG_PATA_JMICRON is not set
CONFIG_PATA_TRIFLEX=m
# CONFIG_PATA_MARVELL is not set
CONFIG_PATA_MPIIX=m
CONFIG_PATA_OLDPIIX=y
# CONFIG_PATA_NETCELL is not set
CONFIG_PATA_NINJA32=m
CONFIG_PATA_NS87410=y
CONFIG_PATA_NS87415=m
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PCMCIA is not set
CONFIG_PATA_PDC_OLD=y
CONFIG_PATA_RADISYS=m
CONFIG_PATA_RZ1000=m
# CONFIG_PATA_SC1200 is not set
CONFIG_PATA_SERVERWORKS=y
CONFIG_PATA_PDC2027X=m
CONFIG_PATA_SIL680=m
CONFIG_PATA_SIS=y
CONFIG_PATA_VIA=m
CONFIG_PATA_WINBOND=m
CONFIG_PATA_PLATFORM=m
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID456=m
CONFIG_MD_RAID5_RESHAPE=y
# CONFIG_MD_MULTIPATH is not set
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_DELAY=m
CONFIG_DM_UEVENT=y
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
CONFIG_IEEE1394=m
CONFIG_IEEE1394_OHCI1394=m
CONFIG_IEEE1394_PCILYNX=m
CONFIG_IEEE1394_SBP2=m
CONFIG_IEEE1394_SBP2_PHYS_DMA=y
CONFIG_IEEE1394_ETH1394_ROM_ENTRY=y
CONFIG_IEEE1394_ETH1394=m
# CONFIG_IEEE1394_RAWIO is not set
CONFIG_IEEE1394_VIDEO1394=m
# CONFIG_IEEE1394_DV1394 is not set
CONFIG_IEEE1394_VERBOSEDEBUG=y
# CONFIG_I2O is not set
CONFIG_MACINTOSH_DRIVERS=y
# CONFIG_MAC_EMUMOUSEBTN is not set
CONFIG_NETDEVICES=y
CONFIG_IFB=m
CONFIG_DUMMY=y
CONFIG_BONDING=m
CONFIG_MACVLAN=m
CONFIG_EQUALIZER=m
CONFIG_TUN=m
CONFIG_VETH=y
CONFIG_ARCNET=m
# CONFIG_ARCNET_1201 is not set
CONFIG_ARCNET_1051=m
CONFIG_ARCNET_RAW=m
CONFIG_ARCNET_CAP=m
# CONFIG_ARCNET_COM90xx is not set
CONFIG_ARCNET_COM90xxIO=m
# CONFIG_ARCNET_RIM_I is not set
# CONFIG_ARCNET_COM20020 is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=y
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
CONFIG_REALTEK_PHY=m
CONFIG_NATIONAL_PHY=m
# CONFIG_STE10XP is not set
CONFIG_LSI_ET1011C_PHY=y
CONFIG_FIXED_PHY=y
CONFIG_MDIO_BITBANG=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
CONFIG_SUNGEM=m
CONFIG_CASSINI=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_TYPHOON=m
CONFIG_ENC28J60=y
CONFIG_ENC28J60_WRITEVERIFY=y
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
CONFIG_TULIP=y
CONFIG_TULIP_MWI=y
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=y
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
CONFIG_HP100=y
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=y
CONFIG_AMD8111_ETH=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=y
CONFIG_FORCEDETH_NAPI=y
CONFIG_E100=y
CONFIG_FEALNX=y
CONFIG_NATSEMI=y
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
CONFIG_8139TOO_TUNE_TWISTER=y
# CONFIG_8139TOO_8129 is not set
CONFIG_8139_OLD_RX_RESET=y
# CONFIG_R6040 is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
CONFIG_SUNDANCE=m
CONFIG_SUNDANCE_MMIO=y
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
CONFIG_SC92031=y
CONFIG_NET_POCKET=y
CONFIG_ATP=m
CONFIG_DE600=y
# CONFIG_DE620 is not set
CONFIG_ATL2=m
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=y
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=m
CONFIG_E1000=m
CONFIG_E1000E=y
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
CONFIG_NS83820=m
CONFIG_HAMACHI=m
CONFIG_YELLOWFIN=m
CONFIG_R8169=y
CONFIG_R8169_VLAN=y
CONFIG_SIS190=m
CONFIG_SKGE=y
CONFIG_SKGE_DEBUG=y
CONFIG_SKY2=y
CONFIG_SKY2_DEBUG=y
CONFIG_VIA_VELOCITY=y
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set
CONFIG_QLA3XXX=y
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
CONFIG_JME=m
CONFIG_NETDEV_10000=y
CONFIG_CHELSIO_T1=y
# CONFIG_CHELSIO_T1_1G is not set
CONFIG_CHELSIO_T3_DEPENDS=y
CONFIG_CHELSIO_T3=m
# CONFIG_ENIC is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set
# CONFIG_MYRI10GE is not set
CONFIG_NETXEN_NIC=m
CONFIG_NIU=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
CONFIG_QLGE=y
CONFIG_SFC=y
CONFIG_TR=y
CONFIG_IBMOL=m
CONFIG_3C359=y
# CONFIG_TMS380TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
CONFIG_WLAN_80211=y
CONFIG_PCMCIA_RAYCS=m
# CONFIG_LIBERTAS is not set
CONFIG_AIRO=y
CONFIG_HERMES=y
CONFIG_HERMES_CACHE_FW_ON_INIT=y
CONFIG_PLX_HERMES=y
CONFIG_TMD_HERMES=m
CONFIG_NORTEL_HERMES=y
# CONFIG_PCI_HERMES is not set
CONFIG_PCMCIA_HERMES=m
CONFIG_PCMCIA_SPECTRUM=m
# CONFIG_ATMEL is not set
CONFIG_AIRO_CS=m
CONFIG_PCMCIA_WL3501=m
# CONFIG_PRISM54 is not set
CONFIG_USB_ZD1201=m
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_IPW2100 is not set
CONFIG_IPW2200=y
# CONFIG_IPW2200_MONITOR is not set
CONFIG_IPW2200_QOS=y
CONFIG_IPW2200_DEBUG=y
CONFIG_LIBIPW=y
# CONFIG_LIBIPW_DEBUG is not set
# CONFIG_IWLWIFI_LEDS is not set
CONFIG_HOSTAP=y
CONFIG_HOSTAP_FIRMWARE=y
# CONFIG_HOSTAP_FIRMWARE_NVRAM is not set
CONFIG_HOSTAP_PLX=y
CONFIG_HOSTAP_PCI=y
# CONFIG_HOSTAP_CS is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=y
# CONFIG_USB_USBNET is not set
CONFIG_NET_PCMCIA=y
# CONFIG_PCMCIA_3C589 is not set
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_PCMCIA_IBMTR is not set
# CONFIG_WAN is not set
CONFIG_FDDI=y
CONFIG_DEFXX=m
CONFIG_DEFXX_MMIO=y
CONFIG_SKFP=m
# CONFIG_HIPPI is not set
CONFIG_PLIP=y
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
# CONFIG_SLIP_COMPRESSED is not set
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
CONFIG_SLIP_MODE_SLIP6=y
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_VIRTIO_NET=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
CONFIG_INPUT_EVBUG=m

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
CONFIG_KEYBOARD_LKKBD=y
CONFIG_KEYBOARD_XTKBD=y
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_KEYBOARD_STOWAWAY=y
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
CONFIG_TABLET_USB_ACECAD=m
CONFIG_TABLET_USB_AIPTEK=y
CONFIG_TABLET_USB_GTCO=m
CONFIG_TABLET_USB_KBTAB=y
# CONFIG_TABLET_USB_WACOM is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
CONFIG_INPUT_APANEL=m
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
CONFIG_INPUT_KEYSPAN_REMOTE=m
CONFIG_INPUT_POWERMATE=y
# CONFIG_INPUT_YEALINK is not set
CONFIG_INPUT_CM109=y
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIO_CT82C710=y
CONFIG_SERIO_PARKBD=y
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=y
CONFIG_GAMEPORT_EMU10K1=y
CONFIG_GAMEPORT_FM801=y

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
# CONFIG_DEVKMEM is not set
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_NOZOMI=y

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_CONSOLE_POLL=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_PRINTER is not set
# CONFIG_PPDEV is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=y
# CONFIG_HW_RANDOM is not set
# CONFIG_NVRAM is not set
CONFIG_R3964=y
CONFIG_APPLICOM=m

#
# PCMCIA character devices
#
CONFIG_SYNCLINK_CS=m
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
CONFIG_IPWIRELESS=m
CONFIG_MWAVE=y
CONFIG_PC8736x_GPIO=y
CONFIG_NSC_GPIO=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
CONFIG_HANGCHECK_TIMER=y
CONFIG_TCG_TPM=y
CONFIG_TCG_NSC=m
# CONFIG_TCG_ATMEL is not set
CONFIG_TELCLOCK=m
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_ALI1535=y
# CONFIG_I2C_ALI1563 is not set
CONFIG_I2C_ALI15X3=y
CONFIG_I2C_AMD756=y
# CONFIG_I2C_AMD756_S4882 is not set
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=y
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=y
# CONFIG_I2C_NFORCE2_S4985 is not set
CONFIG_I2C_SIS5595=y
CONFIG_I2C_SIS630=y
CONFIG_I2C_SIS96X=y
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_OCORES=y
CONFIG_I2C_SIMTEC=y

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=y
CONFIG_I2C_PARPORT_LIGHT=m
CONFIG_I2C_TAOS_EVM=y
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
# CONFIG_I2C_VOODOO3 is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_STUB=m

#
# Miscellaneous I2C Chip support
#
CONFIG_DS1682=y
CONFIG_AT24=y
# CONFIG_SENSORS_EEPROM is not set
CONFIG_SENSORS_PCF8574=y
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
CONFIG_SENSORS_PCF8591=y
# CONFIG_SENSORS_MAX6875 is not set
# CONFIG_SENSORS_TSL2550 is not set
CONFIG_I2C_DEBUG_CORE=y
CONFIG_I2C_DEBUG_ALGO=y
CONFIG_I2C_DEBUG_BUS=y
CONFIG_I2C_DEBUG_CHIP=y
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
CONFIG_SPI_BITBANG=y
CONFIG_SPI_BUTTERFLY=m
CONFIG_SPI_LM70_LLP=y

#
# SPI Protocol Masters
#
CONFIG_SPI_AT25=y
# CONFIG_SPI_SPIDEV is not set
CONFIG_SPI_TLE62X0=y
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
CONFIG_W1=m
CONFIG_W1_CON=y

#
# 1-wire Bus Masters
#
CONFIG_W1_MASTER_MATROX=m
CONFIG_W1_MASTER_DS2490=m
CONFIG_W1_MASTER_DS2482=m

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=m
CONFIG_W1_SLAVE_SMEM=m
# CONFIG_W1_SLAVE_DS2433 is not set
CONFIG_W1_SLAVE_DS2760=m
# CONFIG_W1_SLAVE_BQ27000 is not set
CONFIG_POWER_SUPPLY=y
CONFIG_POWER_SUPPLY_DEBUG=y
CONFIG_PDA_POWER=m
CONFIG_BATTERY_DS2760=m
CONFIG_BATTERY_BQ27x00=y
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADCXX=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7462=m
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=m
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
# CONFIG_SENSORS_FSCPOS is not set
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=m
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM70 is not set
# CONFIG_SENSORS_LM75 is not set
CONFIG_SENSORS_LM77=m
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_MAX1111 is not set
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
# CONFIG_SENSORS_SIS5595 is not set
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_VIA686A=m
# CONFIG_SENSORS_VT1211 is not set
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83781D is not set
CONFIG_SENSORS_W83791D=m
# CONFIG_SENSORS_W83792D is not set
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_HDAPS is not set
CONFIG_SENSORS_APPLESMC=m
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=m
CONFIG_THERMAL_HWMON=y
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=y
CONFIG_ACQUIRE_WDT=m
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
CONFIG_SC520_WDT=y
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
CONFIG_IBMASR=y
CONFIG_WAFER_WDT=y
# CONFIG_I6300ESB_WDT is not set
# CONFIG_ITCO_WDT is not set
CONFIG_IT8712F_WDT=y
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_SC1200_WDT=y
CONFIG_PC87413_WDT=y
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
CONFIG_SMSC37B787_WDT=m
CONFIG_W83627HF_WDT=m
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
# CONFIG_MACHZ_WDT is not set
CONFIG_SBC_EPX_C3_WATCHDOG=y

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=y
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
CONFIG_USBPCWATCHDOG=m
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_SILENT=y
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_SM501 is not set
CONFIG_HTC_PASIC3=m
CONFIG_TWL4030_CORE=y
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
CONFIG_MFD_WM8400=y
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
CONFIG_DVB_CORE=y
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_MEDIA_TUNER=m
CONFIG_MEDIA_TUNER_CUSTOMIZE=y
CONFIG_MEDIA_TUNER_SIMPLE=m
# CONFIG_MEDIA_TUNER_TDA8290 is not set
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
# CONFIG_MEDIA_TUNER_TEA5761 is not set
# CONFIG_MEDIA_TUNER_TEA5767 is not set
# CONFIG_MEDIA_TUNER_MT20XX is not set
CONFIG_MEDIA_TUNER_MT2060=m
# CONFIG_MEDIA_TUNER_MT2266 is not set
# CONFIG_MEDIA_TUNER_MT2131 is not set
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
# CONFIG_MEDIA_TUNER_MXL5005S is not set
# CONFIG_MEDIA_TUNER_MXL5007T is not set
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
# CONFIG_VIDEO_CAPTURE_DRIVERS is not set
# CONFIG_RADIO_ADAPTERS is not set
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_CAPTURE_DRIVERS is not set
CONFIG_DAB=y
# CONFIG_USB_DABUSB is not set

#
# Graphics support
#
CONFIG_AGP=m
CONFIG_AGP_AMD64=m
CONFIG_AGP_INTEL=m
CONFIG_AGP_SIS=m
CONFIG_AGP_VIA=m
CONFIG_DRM=m
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
# CONFIG_DRM_I830 is not set
# CONFIG_DRM_I915 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_VGASTATE=y
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=y
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
CONFIG_FB_FOREIGN_ENDIAN=y
CONFIG_FB_BOTH_ENDIAN=y
# CONFIG_FB_BIG_ENDIAN is not set
# CONFIG_FB_LITTLE_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_HECUBA=y
CONFIG_FB_SVGALIB=y
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
CONFIG_FB_PM2=y
CONFIG_FB_PM2_FIFO_DISCONNECT=y
CONFIG_FB_CYBER2000=y
CONFIG_FB_ARC=y
# CONFIG_FB_ASILIANT is not set
CONFIG_FB_IMSTT=y
# CONFIG_FB_VGA16 is not set
CONFIG_FB_UVESA=m
# CONFIG_FB_VESA is not set
CONFIG_FB_N411=y
CONFIG_FB_HGA=m
CONFIG_FB_HGA_ACCEL=y
CONFIG_FB_S1D13XXX=y
CONFIG_FB_NVIDIA=y
CONFIG_FB_NVIDIA_I2C=y
CONFIG_FB_NVIDIA_DEBUG=y
CONFIG_FB_NVIDIA_BACKLIGHT=y
# CONFIG_FB_RIVA is not set
CONFIG_FB_LE80578=m
# CONFIG_FB_CARILLO_RANCH is not set
# CONFIG_FB_INTEL is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
CONFIG_FB_ATY=y
CONFIG_FB_ATY_CT=y
CONFIG_FB_ATY_GENERIC_LCD=y
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_BACKLIGHT=y
CONFIG_FB_S3=y
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
CONFIG_FB_SIS=y
# CONFIG_FB_SIS_300 is not set
# CONFIG_FB_SIS_315 is not set
# CONFIG_FB_VIA is not set
CONFIG_FB_NEOMAGIC=y
# CONFIG_FB_KYRO is not set
CONFIG_FB_3DFX=y
# CONFIG_FB_3DFX_ACCEL is not set
CONFIG_FB_VOODOO1=y
CONFIG_FB_VT8623=m
CONFIG_FB_TRIDENT=m
# CONFIG_FB_TRIDENT_ACCEL is not set
CONFIG_FB_ARK=y
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
CONFIG_FB_GEODE=y
CONFIG_FB_GEODE_LX=m
CONFIG_FB_GEODE_GX=y
CONFIG_FB_GEODE_GX1=y
# CONFIG_FB_TMIO is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_FB_METRONOME=m
# CONFIG_FB_MB862XX is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=y
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI9320 is not set
CONFIG_LCD_TDO24M=y
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_PROGEAR is not set
CONFIG_BACKLIGHT_CARILLO_RANCH=m
CONFIG_BACKLIGHT_MBP_NVIDIA=m
CONFIG_BACKLIGHT_SAHARA=y

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set
CONFIG_FONT_8x16=y
# CONFIG_LOGO is not set
CONFIG_SOUND=m
CONFIG_SOUND_OSS_CORE=y
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
# CONFIG_SND_SEQ_DUMMY is not set
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
# CONFIG_SND_PCM_OSS is not set
# CONFIG_SND_SEQUENCER_OSS is not set
# CONFIG_SND_HRTIMER is not set
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_VERBOSE=y
# CONFIG_SND_PCM_XRUN_DEBUG is not set
CONFIG_SND_MPU401_UART=m
CONFIG_SND_VX_LIB=m
CONFIG_SND_DRIVERS=y
# CONFIG_SND_PCSP is not set
# CONFIG_SND_DUMMY is not set
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
CONFIG_SND_MTS64=m
CONFIG_SND_SERIAL_U16550=m
CONFIG_SND_MPU401=m
CONFIG_SND_PORTMAN2X4=m
# CONFIG_SND_PCI is not set
# CONFIG_SND_SPI is not set
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_USX2Y=m
# CONFIG_SND_USB_CAIAQ is not set
CONFIG_SND_USB_US122L=m
CONFIG_SND_PCMCIA=y
CONFIG_SND_VXPOCKET=m
CONFIG_SND_PDAUDIOCF=m
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
# CONFIG_USB_HIDDEV is not set

#
# Special HID drivers
#
CONFIG_HID_COMPAT=y
# CONFIG_HID_A4TECH is not set
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=m
# CONFIG_HID_CHICONY is not set
# CONFIG_HID_CYPRESS is not set
CONFIG_HID_EZKEY=y
CONFIG_HID_GYRATION=y
CONFIG_HID_LOGITECH=m
CONFIG_LOGITECH_FF=y
CONFIG_LOGIRUMBLEPAD2_FF=y
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=y
# CONFIG_HID_NTRIG is not set
CONFIG_HID_PANTHERLORD=m
CONFIG_PANTHERLORD_FF=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=m
# CONFIG_HID_SUNPLUS is not set
CONFIG_GREENASIA_FF=y
CONFIG_HID_TOPSEED=m
CONFIG_THRUSTMASTER_FF=m
CONFIG_ZEROPLUS_FF=m
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
CONFIG_USB_DYNAMIC_MINORS=y
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
CONFIG_USB_OTG_WHITELIST=y
# CONFIG_USB_OTG_BLACKLIST_HUB is not set
CONFIG_USB_MON=m
CONFIG_USB_WUSB=y
CONFIG_USB_WUSB_CBAF=m
CONFIG_USB_WUSB_CBAF_DEBUG=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_OXU210HP_HCD=m
CONFIG_USB_ISP116X_HCD=y
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_SSB=y
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_U132_HCD is not set
CONFIG_USB_SL811_HCD=y
CONFIG_USB_SL811_CS=m
CONFIG_USB_R8A66597_HCD=y
# CONFIG_USB_WHCI_HCD is not set
CONFIG_USB_HWA_HCD=y

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
CONFIG_USB_PRINTER=y
# CONFIG_USB_WDM is not set
CONFIG_USB_TMC=m

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed;
#

#
# see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=m
CONFIG_USB_STORAGE_DEBUG=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
CONFIG_USB_STORAGE_CYPRESS_ATACB=y
CONFIG_USB_LIBUSUAL=y

#
# USB Imaging devices
#
CONFIG_USB_MDC800=y
CONFIG_USB_MICROTEK=m

#
# USB port drivers
#
CONFIG_USB_USS720=y
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
CONFIG_USB_SERIAL_BELKIN=m
CONFIG_USB_SERIAL_CH341=m
CONFIG_USB_SERIAL_WHITEHEAT=m
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
CONFIG_USB_SERIAL_CP2101=m
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
CONFIG_USB_SERIAL_FUNSOFT=m
CONFIG_USB_SERIAL_VISOR=m
# CONFIG_USB_SERIAL_IPAQ is not set
CONFIG_USB_SERIAL_IR=m
# CONFIG_USB_SERIAL_EDGEPORT is not set
CONFIG_USB_SERIAL_EDGEPORT_TI=m
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
CONFIG_USB_SERIAL_IUU=m
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
# CONFIG_USB_SERIAL_KEYSPAN is not set
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
# CONFIG_USB_SERIAL_MOS7720 is not set
CONFIG_USB_SERIAL_MOS7840=m
CONFIG_USB_SERIAL_MOTOROLA=m
CONFIG_USB_SERIAL_NAVMAN=m
# CONFIG_USB_SERIAL_PL2303 is not set
CONFIG_USB_SERIAL_OTI6858=m
CONFIG_USB_SERIAL_SPCP8X5=m
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
# CONFIG_USB_SERIAL_SAFE_PADDED is not set
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
CONFIG_USB_SERIAL_TI=m
# CONFIG_USB_SERIAL_CYBERJACK is not set
CONFIG_USB_SERIAL_XIRCOM=m
CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_SERIAL_OMNINET=m
CONFIG_USB_SERIAL_OPTICON=m
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
CONFIG_USB_SEVSEG=m
CONFIG_USB_RIO500=y
CONFIG_USB_LEGOTOWER=y
CONFIG_USB_LCD=m
CONFIG_USB_BERRY_CHARGE=m
CONFIG_USB_LED=y
CONFIG_USB_CYPRESS_CY7C63=m
# CONFIG_USB_CYTHERM is not set
CONFIG_USB_PHIDGET=y
CONFIG_USB_PHIDGETKIT=m
CONFIG_USB_PHIDGETMOTORCONTROL=m
CONFIG_USB_PHIDGETSERVO=m
CONFIG_USB_IDMOUSE=y
CONFIG_USB_FTDI_ELAN=y
# CONFIG_USB_APPLEDISPLAY is not set
CONFIG_USB_SISUSBVGA=m
CONFIG_USB_SISUSBVGA_CON=y
CONFIG_USB_LD=m
CONFIG_USB_TRANCEVIBRATOR=y
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
CONFIG_USB_VST=y
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
CONFIG_USB_OTG_UTILS=y
CONFIG_TWL4030_USB=y
CONFIG_UWB=y
CONFIG_UWB_HWA=y
CONFIG_UWB_WHCI=m
CONFIG_UWB_WLP=m
CONFIG_UWB_I1480U=m
# CONFIG_UWB_I1480U_WLP is not set
CONFIG_MMC=m
CONFIG_MMC_DEBUG=y
CONFIG_MMC_UNSAFE_RESUME=y

#
# MMC/SD/SDIO Card Drivers
#
# CONFIG_MMC_BLOCK is not set
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=m
# CONFIG_MMC_WBSD is not set
CONFIG_MMC_TIFM_SD=m
CONFIG_MMC_SPI=m
CONFIG_MMC_SDRICOH_CS=m
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
CONFIG_LEDS_PCA955X=y

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
# CONFIG_EDAC_I5000 is not set
CONFIG_EDAC_I5100=m
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_DEBUG=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
# CONFIG_RTC_INTF_DEV is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_DS1307=m
CONFIG_RTC_DRV_DS1374=m
CONFIG_RTC_DRV_DS1672=y
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_RS5C372=m
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
CONFIG_RTC_DRV_PCF8583=y
CONFIG_RTC_DRV_M41T80=y
CONFIG_RTC_DRV_M41T80_WDT=y
# CONFIG_RTC_DRV_TWL4030 is not set
CONFIG_RTC_DRV_S35390A=y
CONFIG_RTC_DRV_FM3130=y
CONFIG_RTC_DRV_RX8581=m

#
# SPI RTC drivers
#
CONFIG_RTC_DRV_M41T94=m
CONFIG_RTC_DRV_DS1305=y
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
CONFIG_RTC_DRV_R9701=y
CONFIG_RTC_DRV_RS5C348=m
CONFIG_RTC_DRV_DS3234=y

#
# Platform RTC drivers
#
# CONFIG_RTC_DRV_CMOS is not set
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=y
CONFIG_RTC_DRV_DS1742=y
CONFIG_RTC_DRV_STK17TA8=y
CONFIG_RTC_DRV_M48T86=y
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
CONFIG_RTC_DRV_BQ4802=y
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y

#
# Firmware Drivers
#
CONFIG_EDD=y
CONFIG_EDD_OFF=y
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=m
# CONFIG_EXT4DEV_COMPAT is not set
CONFIG_EXT4_FS_XATTR=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
CONFIG_FS_XIP=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=m
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=m
CONFIG_REISERFS_CHECK=y
CONFIG_REISERFS_PROC_INFO=y
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
CONFIG_JFS_POSIX_ACL=y
# CONFIG_JFS_SECURITY is not set
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
# CONFIG_OCFS2_FS_STATS is not set
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_OCFS2_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
# CONFIG_VFAT_FS is not set
CONFIG_FAT_DEFAULT_CODEPAGE=437
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
# CONFIG_PROC_SYSCTL is not set
# CONFIG_PROC_PAGE_MONITOR is not set
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
# CONFIG_MISC_FILESYSTEMS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
# CONFIG_ROOT_NFS is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_SUNRPC_REGISTER_V4=y
CONFIG_RPCSEC_GSS_KRB5=y
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
CONFIG_CIFS=y
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
# CONFIG_CIFS_UPCALL is not set
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG2=y
CONFIG_CIFS_EXPERIMENTAL=y
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_NCP_FS is not set
CONFIG_CODA_FS=y
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
CONFIG_ACORN_PARTITION=y
CONFIG_ACORN_PARTITION_CUMANA=y
CONFIG_ACORN_PARTITION_EESOX=y
CONFIG_ACORN_PARTITION_ICS=y
# CONFIG_ACORN_PARTITION_ADFS is not set
CONFIG_ACORN_PARTITION_POWERTEC=y
CONFIG_ACORN_PARTITION_RISCIX=y
CONFIG_OSF_PARTITION=y
# CONFIG_AMIGA_PARTITION is not set
CONFIG_ATARI_PARTITION=y
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
CONFIG_LDM_PARTITION=y
# CONFIG_LDM_DEBUG is not set
CONFIG_SGI_PARTITION=y
CONFIG_ULTRIX_PARTITION=y
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
CONFIG_SYSV68_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=m
# CONFIG_NLS_CODEPAGE_855 is not set
CONFIG_NLS_CODEPAGE_857=m
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=y
CONFIG_NLS_CODEPAGE_862=y
# CONFIG_NLS_CODEPAGE_863 is not set
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=y
CONFIG_NLS_CODEPAGE_866=y
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=y
# CONFIG_NLS_CODEPAGE_950 is not set
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=y
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=m
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=y
# CONFIG_NLS_ISO8859_3 is not set
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=m
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
CONFIG_NLS_ISO8859_9=y
CONFIG_NLS_ISO8859_13=m
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
# CONFIG_DETECT_SOFTLOCKUP is not set
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
# CONFIG_TIMER_STATS is not set
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_RT_MUTEX_TESTER=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_WRITECOUNT=y
# CONFIG_DEBUG_MEMORY_INIT is not set
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_FRAME_POINTER=y
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_DETECTOR=y
CONFIG_BACKTRACE_SELF_TEST=y
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
# CONFIG_FAIL_MAKE_REQUEST is not set
# CONFIG_FAIL_IO_TIMEOUT is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_LATENCYTOP=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y

#
# Tracers
#
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_SYSPROF_TRACER=y
# CONFIG_SCHED_TRACER is not set
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_BOOT_TRACER=y
# CONFIG_TRACE_BRANCH_PROFILING is not set
CONFIG_POWER_TRACER=y
CONFIG_STACK_TRACER=y
CONFIG_HW_BRANCH_TRACER=y
# CONFIG_DYNAMIC_FTRACE is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_DYNAMIC_PRINTK_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_TESTS=y
# CONFIG_KGDB_TESTS_ON_BOOT is not set
CONFIG_STRICT_DEVMEM=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_DEBUG_PER_CPU_MAPS=y
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
CONFIG_DEBUG_NX_TEST=m
# CONFIG_MMIOTRACE is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_DEBUG_BOOT_PARAMS=y
CONFIG_CPA_DEBUG=y
CONFIG_OPTIMIZE_INLINING=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_DEBUG_PROC_KEYS is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
CONFIG_SECURITY_FILE_CAPABILITIES=y
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
# CONFIG_SECURITY_SELINUX is not set
# CONFIG_SECURITY_SMACK is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_SEQIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=y
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=y
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=y
# CONFIG_CRYPTO_RMD256 is not set
CONFIG_CRYPTO_RMD320=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_X86_64 is not set
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_BLOWFISH=y
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=y
# CONFIG_CRYPTO_KHAZAD is not set
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SALSA20_X86_64=y
CONFIG_CRYPTO_SEED=y
CONFIG_CRYPTO_SERPENT=y
# CONFIG_CRYPTO_TEA is not set
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=y

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_HIFN_795X=y
# CONFIG_CRYPTO_DEV_HIFN_795X_RNG is not set
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
# CONFIG_KVM is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
# CONFIG_VIRTIO_BALLOON is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:32                       ` Ingo Molnar
@ 2009-01-15 13:33                         ` Ingo Molnar
  2009-01-15 13:39                         ` Ingo Molnar
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 13:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Ingo Molnar <mingo@elte.hu> wrote:

> FYI, -tip testing found the following bug with your percpu stuff:
> 
> There's an early exception during bootup, on 64-bit x86:
> 
>   PANIC: early exception 0e rip 10:ffffffff80276855: error ? cr2 6688
> 
>  - gcc version 4.3.2 20081007 (Red Hat 4.3.2-6) (GCC) 
>  - binutils-2.18.50.0.6-2.x86_64
> 
> config attached. You can find the disassembly of lock_release_holdtime() 
> below - that's where it crashed:

the tree i tested it with was:

  d5fd351: x86: misc clean up after the percpu update

	Ingo


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
@ 2009-01-15 13:36                         ` Ingo Molnar
  2009-01-15 17:30                         ` Andrew Morton
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 13:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> Okay, here's the updated patch.  It's on top of x86/percpu[1].  You
> can pull from... (or just take the mail, I don't have much problem
> with rebasing, so whatever suits you better works for me).
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
> [1] d5fd35177b763186a3f6288624094ee402f0a7e3

Pulled into tip/x86/percpu, thanks Tejun!

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:32                       ` Ingo Molnar
  2009-01-15 13:33                         ` Ingo Molnar
@ 2009-01-15 13:39                         ` Ingo Molnar
  2009-01-15 21:54                           ` Tejun Heo
  2009-01-15 22:34                           ` [patch] add optimized generic percpu accessors Ingo Molnar
  1 sibling, 2 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 13:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Ingo Molnar <mingo@elte.hu> wrote:

> FYI, -tip testing found the following bug with your percpu stuff:
> 
> There's an early exception during bootup, on 64-bit x86:
> 
>   PANIC: early exception 0e rip 10:ffffffff80276855: error ? cr2 6688
> 
>  - gcc version 4.3.2 20081007 (Red Hat 4.3.2-6) (GCC) 
>  - binutils-2.18.50.0.6-2.x86_64
> 
> config attached. You can find the disassembly of lock_release_holdtime() 
> below - that's where it crashed:

Gut feeling: we must have gotten a window where the PDA is not right. Note 
how it crashes in lock_release_holdtime() - that is CONFIG_LOCK_STAT 
instrumentation - very lowlevel and pervasive. Function tracing is also 
enabled although it should be inactive at the early stages.

So i'd take a good look at the PDA setup portion of the boot code, and see 
whether some spinlock acquire/release can slip in while the 
PDA/percpu-area is not reliable.

	Ingo


* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:04                   ` Tejun Heo
  2009-01-15 13:07                     ` Ingo Molnar
@ 2009-01-15 13:59                     ` roel kluin
  2009-01-15 21:51                       ` Tejun Heo
  1 sibling, 1 reply; 109+ messages in thread
From: roel kluin @ 2009-01-15 13:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

> +#ifndef percpu_read
> +# define percpu_read(var)                                              \
> +  ({                                                                   \
> +       typeof(per_cpu_var(var)) __tmp_var__;                           \
> +       __tmp_var__ = get_cpu_var(var);                                 \
> +       put_cpu_var(var);                                               \
> +       __tmp_var__;                                                    \
> +  })
> +#endif

I'm sorry for your eyes, but since var occurs twice, isn't it better to do:

# define percpu_read(var)                                              \
  ({                                                                   \
       typeof(var) __pcpu_read_var__ = var;                            \
       typeof(per_cpu_var(__pcpu_read_var__)) __tmp_var__;             \
       __tmp_var__ = get_cpu_var(__pcpu_read_var__);                   \
       put_cpu_var(__pcpu_read_var__);                                 \
       __tmp_var__;                                                    \
  })

> +
> +#define __percpu_generic_to_op(var, val, op)                           \
> +do {                                                                   \
> +       get_cpu_var(var) op val;                                        \
> +       put_cpu_var(var);                                               \
> +} while (0)

and:

#define __percpu_generic_to_op(var, val, op)                           \
do {                                                                   \
       typeof(var) __pcpu_gto_var__ = var;                             \
       get_cpu_var(__pcpu_gto_var__) op val;                           \
       put_cpu_var(__pcpu_gto_var__);                                  \
} while (0)

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
  2009-01-15 13:36                         ` Ingo Molnar
@ 2009-01-15 17:30                         ` Andrew Morton
  2009-01-15 18:02                           ` Ingo Molnar
  2009-01-15 18:46                         ` Ingo Molnar
  2009-01-31 10:36                         ` Jeremy Fitzhardinge
  3 siblings, 1 reply; 109+ messages in thread
From: Andrew Morton @ 2009-01-15 17:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, steiner, hugh

On Thu, 15 Jan 2009 22:23:19 +0900 Tejun Heo <tj@kernel.org> wrote:

> --- a/include/asm-generic/percpu.h
> +++ b/include/asm-generic/percpu.h
> @@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
>  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
>  					__typeof__(type) per_cpu_var(name)
>  
> +/*
> + * Optional methods for optimized non-lvalue per-cpu variable access.
> + *
> + * @var can be a percpu variable or a field of it and its size should
> + * equal char, int or long.  percpu_read() evaluates to a lvalue and
> + * all others to void.
> + *
> + * These operations are guaranteed to be atomic w.r.t. preemption.
> + * The generic versions use plain get/put_cpu_var().  Archs are
> + * encouraged to implement single-instruction alternatives which don't
> + * require preemption protection.
> + */
> +#ifndef percpu_read
> +# define percpu_read(var)						\
> +  ({									\
> +	typeof(per_cpu_var(var)) __tmp_var__;				\
> +	__tmp_var__ = get_cpu_var(var);					\
> +	put_cpu_var(var);						\
> +	__tmp_var__;							\
> +  })
> +#endif

I wonder if the preempt_disable()/preempt_enable() in here actually
does anything useful on any architecture.

> +#define __percpu_generic_to_op(var, val, op)				\
> +do {									\
> +	get_cpu_var(var) op val;					\
> +	put_cpu_var(var);						\
> +} while (0)

We'll need it here.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 17:30                         ` Andrew Morton
@ 2009-01-15 18:02                           ` Ingo Molnar
  2009-01-15 18:34                             ` Andrew Morton
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 18:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, steiner, hugh


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 15 Jan 2009 22:23:19 +0900 Tejun Heo <tj@kernel.org> wrote:
> 
> > --- a/include/asm-generic/percpu.h
> > +++ b/include/asm-generic/percpu.h
> > @@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
> >  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
> >  					__typeof__(type) per_cpu_var(name)
> >  
> > +/*
> > + * Optional methods for optimized non-lvalue per-cpu variable access.
> > + *
> > + * @var can be a percpu variable or a field of it and its size should
> > + * equal char, int or long.  percpu_read() evaluates to a lvalue and
> > + * all others to void.
> > + *
> > + * These operations are guaranteed to be atomic w.r.t. preemption.
> > + * The generic versions use plain get/put_cpu_var().  Archs are
> > + * encouraged to implement single-instruction alternatives which don't
> > + * require preemption protection.
> > + */
> > +#ifndef percpu_read
> > +# define percpu_read(var)						\
> > +  ({									\
> > +	typeof(per_cpu_var(var)) __tmp_var__;				\
> > +	__tmp_var__ = get_cpu_var(var);					\
> > +	put_cpu_var(var);						\
> > +	__tmp_var__;							\
> > +  })
> > +#endif
> 
> I wonder if the preempt_disable()/preempt_enable() in here actually
> does anything useful on any architecture.

Provides "this is IRQ safe" and "this is preempt safe" semantics.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 18:02                           ` Ingo Molnar
@ 2009-01-15 18:34                             ` Andrew Morton
  2009-01-15 18:39                               ` Ingo Molnar
  2009-01-16 14:10                               ` Mark Lord
  0 siblings, 2 replies; 109+ messages in thread
From: Andrew Morton @ 2009-01-15 18:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, steiner, hugh

On Thu, 15 Jan 2009 19:02:59 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 15 Jan 2009 22:23:19 +0900 Tejun Heo <tj@kernel.org> wrote:
> > 
> > > --- a/include/asm-generic/percpu.h
> > > +++ b/include/asm-generic/percpu.h
> > > @@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
> > >  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
> > >  					__typeof__(type) per_cpu_var(name)
> > >  
> > > +/*
> > > + * Optional methods for optimized non-lvalue per-cpu variable access.
> > > + *
> > > + * @var can be a percpu variable or a field of it and its size should
> > > + * equal char, int or long.  percpu_read() evaluates to a lvalue and
> > > + * all others to void.
> > > + *
> > > + * These operations are guaranteed to be atomic w.r.t. preemption.
> > > + * The generic versions use plain get/put_cpu_var().  Archs are
> > > + * encouraged to implement single-instruction alternatives which don't
> > > + * require preemption protection.
> > > + */
> > > +#ifndef percpu_read
> > > +# define percpu_read(var)						\
> > > +  ({									\
> > > +	typeof(per_cpu_var(var)) __tmp_var__;				\
> > > +	__tmp_var__ = get_cpu_var(var);					\
> > > +	put_cpu_var(var);						\
> > > +	__tmp_var__;							\
> > > +  })
> > > +#endif
> > 
> > I wonder if the preempt_disable()/preempt_enable() in here actually
> > does anything useful on any architecture.
> 
> Provides "this is IRQ safe"

?

> and "this is preempt safe" semantics.

Of course.  But do any architectures actually _need_ that for a single
read?

Maybe.  And if so, they can interpose their arch-specific
implementation.  But if the generic version is optimal for them, they
wouldn't need to..



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 18:34                             ` Andrew Morton
@ 2009-01-15 18:39                               ` Ingo Molnar
  2009-01-15 21:53                                 ` Tejun Heo
  2009-01-16  0:12                                 ` Herbert Xu
  2009-01-16 14:10                               ` Mark Lord
  1 sibling, 2 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, steiner, hugh


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 15 Jan 2009 19:02:59 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Thu, 15 Jan 2009 22:23:19 +0900 Tejun Heo <tj@kernel.org> wrote:
> > > 
> > > > --- a/include/asm-generic/percpu.h
> > > > +++ b/include/asm-generic/percpu.h
> > > > @@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
> > > >  #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
> > > >  					__typeof__(type) per_cpu_var(name)
> > > >  
> > > > +/*
> > > > + * Optional methods for optimized non-lvalue per-cpu variable access.
> > > > + *
> > > > + * @var can be a percpu variable or a field of it and its size should
> > > > + * equal char, int or long.  percpu_read() evaluates to a lvalue and
> > > > + * all others to void.
> > > > + *
> > > > + * These operations are guaranteed to be atomic w.r.t. preemption.
> > > > + * The generic versions use plain get/put_cpu_var().  Archs are
> > > > + * encouraged to implement single-instruction alternatives which don't
> > > > + * require preemption protection.
> > > > + */
> > > > +#ifndef percpu_read
> > > > +# define percpu_read(var)						\
> > > > +  ({									\
> > > > +	typeof(per_cpu_var(var)) __tmp_var__;				\
> > > > +	__tmp_var__ = get_cpu_var(var);					\
> > > > +	put_cpu_var(var);						\
> > > > +	__tmp_var__;							\
> > > > +  })
> > > > +#endif
> > > 
> > > I wonder if the preempt_disable()/preempt_enable() in here actually
> > > does anything useful on any architecture.
> > 
> > Provides "this is IRQ safe"
> 
> ?
> 
> > and "this is preempt safe" semantics.
> 
> Of course.  But do any architectures actually _need_ that for a single 
> read?

not for a read i guess - but for the other ops like add/and/or/xor.

> Maybe.  And if so, they can interpose their arch-specific 
> implementation.  But if the generic version is optimal for them, they 
> wouldn't need to..

we cannot turn the generic ops into a single instruction so arch methods 
are preferable no matter how thick or thin the generic version is. But i 
agree that the optimization you suggest could be done.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
  2009-01-15 13:36                         ` Ingo Molnar
  2009-01-15 17:30                         ` Andrew Morton
@ 2009-01-15 18:46                         ` Ingo Molnar
  2009-01-31 10:36                         ` Jeremy Fitzhardinge
  3 siblings, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 18:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty, travis,
	linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> It is an optimization and a cleanup, and adds the following new
> generic percpu methods:
> 
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_and()
>   percpu_or()
>   percpu_xor()

on a second thought ... i just started using this construct in code, and 
promptly typoed it: i typed "per_cpu_read()".

Which is really the more logical name for this. Mind if i do a 
s/percpu/per_cpu/ rename of all of these new APIs?

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 12:22                   ` Ingo Molnar
  2009-01-15 13:09                     ` Tejun Heo
@ 2009-01-15 20:31                     ` Christoph Lameter
  2009-01-16  9:41                       ` Tejun Heo
  1 sibling, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-15 20:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, roel kluin, H. Peter Anvin, Brian Gerst, ebiederm,
	rusty, travis, linux-kernel, akpm, steiner, hugh

On Thu, 15 Jan 2009, Ingo Molnar wrote:

>  - i standardized the commit logs to the usual x86 style and added
>    Original-From: Mike Travis tags to those patches that were derived from
>    Mike's patches.

Most of that is based on earlier patches by me.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:59                     ` [patch] " roel kluin
@ 2009-01-15 21:51                       ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 21:51 UTC (permalink / raw)
  To: roel kluin
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Hello, Roel.

roel kluin wrote:
>> +#ifndef percpu_read
>> +# define percpu_read(var)                                              \
>> +  ({                                                                   \
>> +       typeof(per_cpu_var(var)) __tmp_var__;                           \
>> +       __tmp_var__ = get_cpu_var(var);                                 \
>> +       put_cpu_var(var);                                               \
>> +       __tmp_var__;                                                    \
>> +  })
>> +#endif
> 
> I'm sorry for your eyes, but since var occurs twice, isn't it better to do:
> 
> # define percpu_read(var)                                              \
>   ({                                                                   \
>        typeof(var) __pcpu_read_var__ = var;                            \
>        typeof(per_cpu_var(__pcpu_read_var__)) __tmp_var__;             \
>        __tmp_var__ = get_cpu_var(__pcpu_read_var__);                   \
>        put_cpu_var(__pcpu_read_var__);                                 \
>        __tmp_var__;                                                    \
>   })
> 
>> +
>> +#define __percpu_generic_to_op(var, val, op)                           \
>> +do {                                                                   \
>> +       get_cpu_var(var) op val;                                        \
>> +       put_cpu_var(var);                                               \
>> +} while (0)
> 
> and:
> 
> #define __percpu_generic_to_op(var, val, op)                           \
> do {                                                                   \
>        typeof(var) __pcpu_gto_var__ = var;                             \
>        get_cpu_var(__pcpu_gto_var__) op val;                           \
>        put_cpu_var(__pcpu_gto_var__);                                  \
> } while (0)

@var has to be a simple identifier as it ends up getting concatenated
into a symbol name.  There's even a check for it in the get_cpu_var()
macro.  Please also note the lack of any protecting ()'s around @var,
for the same reason.  So, basically, a typeof(var) copy of the percpu
variable name just can't exist.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 18:39                               ` Ingo Molnar
@ 2009-01-15 21:53                                 ` Tejun Heo
  2009-01-16  0:12                                 ` Herbert Xu
  1 sibling, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 21:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, steiner, hugh

Ingo Molnar wrote:
>>> and "this is preempt safe" semantics.
>> Of course.  But do any architectures actually _need_ that for a single 
>> read?
> 
> not for a read i guess - but for the other ops like add/and/or/xor.

FWIW, it prevents cross-cpu cacheline contamination.  :-)

>> Maybe.  And if so, they can interpose their arch-specific 
>> implementation.  But if the generic version is optimal for them, they 
>> wouldn't need to..
> 
> we cannot turn the generic ops into a single instruction so arch methods 
> are preferable no matter how thick or thin the generic version is. But i 
> agree that the optimization you suggest could be done.

I think the preemption protection is good to have there for, if
nothing else, documentation.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:39                         ` Ingo Molnar
@ 2009-01-15 21:54                           ` Tejun Heo
  2009-01-16  1:28                             ` [PATCH x86/percpu] x86: fix build bug introduced during merge Tejun Heo
  2009-01-15 22:34                           ` [patch] add optimized generic percpu accessors Ingo Molnar
  1 sibling, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-15 21:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Hello, Ingo.

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> FYI, -tip testing found the following bug with your percpu stuff:
>>
>> There's an early exception during bootup, on 64-bit x86:
>>
>>   PANIC: early exception 0e rip 10:ffffffff80276855: error ? cr2 6688
>>
>>  - gcc version 4.3.2 20081007 (Red Hat 4.3.2-6) (GCC) 
>>  - binutils-2.18.50.0.6-2.x86_64
>>
>> config attached. You can find the disassembly of lock_release_holdtime() 
>> below - that's where it crashed:
> 
> Gut feeling: we must have gotten a window where the PDA is not right. Note 
> how it crashes in lock_release_holdtime() - that is CONFIG_LOCK_STAT 
> instrumentation - very lowlevel and pervasive. Function tracing is also 
> enabled although it should be inactive at the early stages.
> 
> So i'd take a good look at the PDA setup portion of the boot code, and see 
> whether some spinlock acquire/release can slip in while the 
> PDA/percpu-area is not reliable.

Heh... I thought I had this covered pretty well.  I'll get onto
debugging in a few hours.  Thanks for testing.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 13:39                         ` Ingo Molnar
  2009-01-15 21:54                           ` Tejun Heo
@ 2009-01-15 22:34                           ` Ingo Molnar
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-15 22:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Ingo Molnar <mingo@elte.hu> wrote:

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > FYI, -tip testing found the following bug with your percpu stuff:
> > 
> > There's an early exception during bootup, on 64-bit x86:
> > 
> >   PANIC: early exception 0e rip 10:ffffffff80276855: error ? cr2 6688
> > 
> >  - gcc version 4.3.2 20081007 (Red Hat 4.3.2-6) (GCC) 
> >  - binutils-2.18.50.0.6-2.x86_64
> > 
> > config attached. You can find the disassembly of lock_release_holdtime() 
> > below - that's where it crashed:
> 
> Gut feeling: we must have gotten a window where the PDA is not right. 
> Note how it crashes in lock_release_holdtime() - that is 
> CONFIG_LOCK_STAT instrumentation - very lowlevel and pervasive. Function 
> tracing is also enabled although it should be inactive at the early 
> stages.
> 
> So i'd take a good look at the PDA setup portion of the boot code, and 
> see whether some spinlock acquire/release can slip in while the 
> PDA/percpu-area is not reliable.

Btw., if this turns out to be a genuine linker bug due to us crossing 
RIP-relative references from minus-1.9G-ish negative addresses up to 
slightly-above-zero positive addresses (or something similar - we are 
pushing the linker here quite a bit), we still have a contingency plan:

We can relocate the percpu symbols to small-negative addresses, and place 
it so that the end of the dynamic percpu area ends at zero.

Then we could bootmem alloc the percpu size plus 4096 bytes. Voila: we've 
got a page free for the canary field, at the end of the percpu area, and 
placed just correctly for gcc to see it at %gs:40.

But this isn't as totally clean and simple as your current patchset, so 
let's only do it if needed. It will also put some constraints on how we can 
shape dynamic percpu areas, and an overflow near the end of the dynamic 
percpu area could affect the canary - but still, it's a workable path as 
well, should the zero-based approach fail.

OTOH, this crash does not have the feeling of a linker bug to me - it has 
the feeling of a setup ordering anomaly.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 18:39                               ` Ingo Molnar
  2009-01-15 21:53                                 ` Tejun Heo
@ 2009-01-16  0:12                                 ` Herbert Xu
  2009-01-16  0:15                                   ` Ingo Molnar
       [not found]                                   ` <200901170804.18622.rusty@rustcorp.com.au>
  1 sibling, 2 replies; 109+ messages in thread
From: Herbert Xu @ 2009-01-16  0:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, tj, hpa, brgerst, ebiederm, cl, rusty, travis, linux-kernel,
	steiner, hugh

Ingo Molnar <mingo@elte.hu> wrote:
>
>> Of course.  But do any architectures actually _need_ that for a single 
>> read?
> 
> not for a read i guess - but for the other ops like add/and/or/xor.

One of the things I'd like to see happen with this work is for
us to have a cheap per-cpu atomic counter that we can use for
SNMP stats.

If we can make the inc/add variants into a single instruction,
then it won't need to disable preemption or interrupts.

So if you could design the API such that we have a variant of
add/inc that automatically disables/enables preemption then we
can optimise that away on x86.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-16  0:12                                 ` Herbert Xu
@ 2009-01-16  0:15                                   ` Ingo Molnar
  2009-01-16  0:18                                     ` Herbert Xu
  2009-01-16  1:09                                     ` H. Peter Anvin
       [not found]                                   ` <200901170804.18622.rusty@rustcorp.com.au>
  1 sibling, 2 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16  0:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, tj, hpa, brgerst, ebiederm, cl, rusty, travis, linux-kernel,
	steiner, hugh


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> >> Of course.  But do any architectures actually _need_ that for a single 
> >> read?
> > 
> > not for a read i guess - but for the other ops like add/and/or/xor.
> 
> One of the things I'd like to see happen with this work is for us to 
> have a cheap per-cpu atomic counter that we can use for SNMP stats.
> 
> If we can make the inc/add variants into a single instruction, then it 
> won't need to disable preemption or interrupts.
> 
> So if you could design the API such that we have a variant of add/inc 
> that automatically disables/enables preemption then we can optimise that 
> away on x86.

Yeah. percpu_add(var, 1) does exactly that on x86.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-16  0:15                                   ` Ingo Molnar
@ 2009-01-16  0:18                                     ` Herbert Xu
       [not found]                                       ` <200901170827.33729.rusty@rustcorp.com.au>
  2009-01-16  1:09                                     ` H. Peter Anvin
  1 sibling, 1 reply; 109+ messages in thread
From: Herbert Xu @ 2009-01-16  0:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, tj, hpa, brgerst, ebiederm, cl, rusty, travis, linux-kernel,
	steiner, hugh, David S. Miller, netdev

On Fri, Jan 16, 2009 at 01:15:44AM +0100, Ingo Molnar wrote:
>
> > So if you could design the API such that we have a variant of add/inc 
> > that automatically disables/enables preemption then we can optimise that 
> > away on x86.
> 
> Yeah. percpu_add(var, 1) does exactly that on x86.

Awesome! Sounds like we can finally do away with those nasty
hacks in the SNMP macros.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-16  0:15                                   ` Ingo Molnar
  2009-01-16  0:18                                     ` Herbert Xu
@ 2009-01-16  1:09                                     ` H. Peter Anvin
  1 sibling, 0 replies; 109+ messages in thread
From: H. Peter Anvin @ 2009-01-16  1:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, akpm, tj, brgerst, ebiederm, cl, rusty, travis,
	linux-kernel, steiner, hugh

Ingo Molnar wrote:
>>
>> So if you could design the API such that we have a variant of add/inc 
>> that automatically disables/enables preemption then we can optimise that 
>> away on x86.
> 
> Yeah. percpu_add(var, 1) does exactly that on x86.
> 

And even on architectures which can only do percpu inc/dec we can make 
the right thing happen for a constant here, so we really don't need a 
new interface.  This is a good thing.

	-hpa

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH x86/percpu] x86: fix build bug introduced during merge
  2009-01-15 21:54                           ` Tejun Heo
@ 2009-01-16  1:28                             ` Tejun Heo
  2009-01-16  3:25                               ` [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-16  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

EXPORT_PER_CPU_SYMBOL() got misplaced during merge leading to build
failure.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 arch/x86/kernel/setup_percpu.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mach-integrator/clock.h b/arch/arm/mach-integrator/clock.h
deleted file mode 100644
index e69de29..0000000
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index daeedf8..b5c35af 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -86,9 +86,8 @@ void __cpuinit load_pda_offset(int cpu)
 }
 #ifndef CONFIG_SMP
 DEFINE_PER_CPU(struct x8664_pda, __pda);
-EXPORT_PER_CPU_SYMBOL(__pda);
 #endif
-
+EXPORT_PER_CPU_SYMBOL(__pda);
 #endif /* CONFIG_SMP && CONFIG_X86_64 */
 
 #ifdef CONFIG_X86_64

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load
  2009-01-16  1:28                             ` [PATCH x86/percpu] x86: fix build bug introduced during merge Tejun Heo
@ 2009-01-16  3:25                               ` Tejun Heo
  2009-01-16 13:16                                 ` Ingo Molnar
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-16  3:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

On x86_64, if get_per_cpu_var() is used before per cpu area is setup
(if lockdep is turned on, it happens), it needs this_cpu_off to point
to __per_cpu_load.  Initialize accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Okay, this fixes the problem.  Tested with gcc-4.3.2 and binutils-2.19
(openSUSE 11.1).  For all four variants I can test (64-smp, 64-up,
32-smp, 32-up).  This and the previous merge fix patch are available
in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

 arch/x86/kernel/smpcommon.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/smpcommon.c b/arch/x86/kernel/smpcommon.c
index 84395fa..7e15781 100644
--- a/arch/x86/kernel/smpcommon.c
+++ b/arch/x86/kernel/smpcommon.c
@@ -3,8 +3,13 @@
  */
 #include <linux/module.h>
 #include <asm/smp.h>
+#include <asm/sections.h>
 
+#ifdef CONFIG_X86_64
+DEFINE_PER_CPU(unsigned long, this_cpu_off) = (unsigned long)__per_cpu_load;
+#else
 DEFINE_PER_CPU(unsigned long, this_cpu_off);
+#endif
 EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
 #ifdef CONFIG_X86_32
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-15 20:31                     ` Christoph Lameter
@ 2009-01-16  9:41                       ` Tejun Heo
  2009-01-16 13:23                         ` Ingo Molnar
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-16  9:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, roel kluin, H. Peter Anvin, Brian Gerst, ebiederm,
	rusty, travis, linux-kernel, akpm, steiner, hugh

Christoph Lameter wrote:
> On Thu, 15 Jan 2009, Ingo Molnar wrote:
> 
>>  - i standardized the commit logs to the usual x86 style and added
>>    Original-From: Mike Travis tags to those patches that were derived from
>>    Mike's patches.
> 
> Most of that is based on earlier patches by me.

Sorry, I didn't know that.  Ingo, please feel free to re-base the tree
with updated credits.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load
  2009-01-16  3:25                               ` [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load Tejun Heo
@ 2009-01-16 13:16                                 ` Ingo Molnar
  2009-01-16 13:47                                   ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 13:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Tejun Heo <htejun@gmail.com> wrote:

> On x86_64, if get_per_cpu_var() is used before per cpu area is setup
> (if lockdep is turned on, it happens), it needs this_cpu_off to point
> to __per_cpu_load.  Initialize accordingly.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> Okay, this fixes the problem.  Tested with gcc-4.3.2 and binutils-2.19
> (openSUSE 11.1).  For all four variants I can test (64-smp, 64-up,
> 32-smp, 32-up).  This and the previous merge fix patch are available
> in the following git branch.

cool!

>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
>  arch/x86/kernel/smpcommon.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpcommon.c b/arch/x86/kernel/smpcommon.c
> index 84395fa..7e15781 100644
> --- a/arch/x86/kernel/smpcommon.c
> +++ b/arch/x86/kernel/smpcommon.c
> @@ -3,8 +3,13 @@
>   */
>  #include <linux/module.h>
>  #include <asm/smp.h>
> +#include <asm/sections.h>
>  
> +#ifdef CONFIG_X86_64
> +DEFINE_PER_CPU(unsigned long, this_cpu_off) = (unsigned long)__per_cpu_load;
> +#else
>  DEFINE_PER_CPU(unsigned long, this_cpu_off);
> +#endif

I've pulled your tree, but couldn't we do this symmetrically in the 32-bit 
case too and avoid the ugly #ifdef somehow?

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-16  9:41                       ` Tejun Heo
@ 2009-01-16 13:23                         ` Ingo Molnar
  2009-01-16 13:49                           ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 13:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, roel kluin, H. Peter Anvin, Brian Gerst,
	ebiederm, rusty, travis, linux-kernel, akpm, steiner, hugh


* Tejun Heo <tj@kernel.org> wrote:

> Christoph Lameter wrote:
> > On Thu, 15 Jan 2009, Ingo Molnar wrote:
> > 
> >>  - i standardized the commit logs to the usual x86 style and added
> >>    Original-From: Mike Travis tags to those patches that were derived from
> >>    Mike's patches.
> > 
> > Most of that is based on earlier patches by me.
> 
> Sorry, I didn't know that.  Ingo, please feel free to re-base the tree 
> with updated credits.

Sure, i've rebased it and added this to the front of every affected patch:

[ Based on original patch from Christoph Lameter and Mike Travis. ]

I've also first pulled your latest tree, please base new changes and new 
pull requests on the tip/x86/percpu tree i just pushed out.

I also merged tip/x86/percpu into tip/master and resumed testing it.

I suspect the next step will be to integrate Brian's "get rid of the rest 
of the PDA" patches?

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load
  2009-01-16 13:16                                 ` Ingo Molnar
@ 2009-01-16 13:47                                   ` Tejun Heo
       [not found]                                     ` <20090116221651.GA32736@elte.hu>
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-16 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Ingo Molnar wrote:
>> +#ifdef CONFIG_X86_64
>> +DEFINE_PER_CPU(unsigned long, this_cpu_off) = (unsigned long)__per_cpu_load;
>> +#else
>>  DEFINE_PER_CPU(unsigned long, this_cpu_off);
>> +#endif
> 
> I've pulled your tree, but couldn't we do this symmetrically in the 32-bit 
> case too and avoid the ugly #ifdef somehow?

You can take the "x86_32: make percpu symbols zerobased on SMP" patch
and the above ifdef won't be necessary, along with a similar one in
setup_percpu.c.  The above ifdef exists because 64 is zero-based while
32 is not.  If you want something, say, __per_cpu_base_off or the like,
to be defined conditionally and used in these two cases, I'm not sure
whether that would clean up the code or obfuscate it.  :-)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
  2009-01-16 13:23                         ` Ingo Molnar
@ 2009-01-16 13:49                           ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-16 13:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, roel kluin, H. Peter Anvin, Brian Gerst,
	ebiederm, rusty, travis, linux-kernel, akpm, steiner, hugh

Hello,

Ingo Molnar wrote:
>>> Most of that is based on earlier patches by me.
>> Sorry, I didn't know that.  Ingo, please feel free to re-base the tree 
>> with updated credits.
> 
> Sure, i've rebased it and added this to the front of every affected patch:
> 
> [ Based on original patch from Christoph Lameter and Mike Travis. ]

Good.

> I've also first pulled your latest tree, please base new changes and new 
> pull requests on the tip/x86/percpu tree i just pushed out.
> 
> I also merged tip/x86/percpu into tip/master and resumed testing it.

Great.

> I suspect the next step will be to integrate Brian's "get rid of the rest 
> of the PDA" patches?

Yeap, Brian, any progress?  If you don't have the time, feel free to
just shoot the patches my way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 18:34                             ` Andrew Morton
  2009-01-15 18:39                               ` Ingo Molnar
@ 2009-01-16 14:10                               ` Mark Lord
  1 sibling, 0 replies; 109+ messages in thread
From: Mark Lord @ 2009-01-16 14:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Tejun Heo, H. Peter Anvin, Brian Gerst, ebiederm, cl,
	rusty, travis, linux-kernel, steiner, hugh

Andrew Morton wrote:
> On Thu, 15 Jan 2009 19:02:59 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
>> * Andrew Morton <akpm@linux-foundation.org> wrote:
..
>>> I wonder if the preempt_disable()/preempt_enable() in here actually
>>> does anything useful on any architecture.
>> Provides "this is IRQ safe"
> 
> ?
> 
>> and "this is preempt safe" semantics.
> 
> Of course.  But do any architectures actually _need_ that for a single read?
..

If the target is unaligned, then RISC architectures will need protection there.
If we can guarantee correct memory alignment of the target, then no / none.

Cheers

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [patch] add optimized generic percpu accessors
       [not found]                 ` <200901170748.53734.rusty@rustcorp.com.au>
@ 2009-01-16 21:24                   ` Ingo Molnar
  0 siblings, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 21:24 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, H. Peter Anvin, Brian Gerst, ebiederm, cl, travis,
	linux-kernel, akpm, steiner, hugh


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Thursday 15 January 2009 22:08:26 Tejun Heo wrote:
> > Hello,
> > 
> > Ingo Molnar wrote:
> > > The new ops are a pretty nice and clean solution i think.
> > > 
> > > Firstly, accessing the current CPU is the only safe shortcut anyway (that 
> > > is where we can do %fs/%gs / rip-relative addressing modes), and the 
> > > generic per_cpu() APIs don't really provide that guarantee for us. We might 
> > > be able to hook into __get_cpu_var() but those both need to be 
> > > lvalues and are also relatively rarely used.
> > > 
> > > So introducing the new, rather straightforward APIs and using them 
> > > wherever they matter for performance is good. Your patchset already shaved 
> > > off an instruction from ordinary per_cpu() accesses, so it's all moving 
> > > rather close to the most-optimal situation already.
> > 
> > Yeah, I don't think we can do much better than those ops.  I have two
> > issues tho.
> > 
> > 1. percpu_and() is missing.  I added it for completeness's sake.
> > 
> > 2. The generic percpu_op() should be local to the cpu, so it should
> >    expand to...
> > 
> >    do { get_cpu_var(var) OP (val); put_cpu_var(var) } while (0)
> > 
> >    as the original x86_OP_percpu() did.  Right?
> > 
> > Thanks.
> 
> No no no no.  This is a crapload of infrastructure no one will use.
>
> Please just start by adding read_percpu like so (this won't apply since 
> there's lots of other per-cpu things going on, but you get the idea).
> 
> We don't need a whole set of operators for a handful of 
> non-arch-specific cases.  Reading a var is fairly common; the other ops 
> are diminishing returns, and we already have local_t for some of these 
> cases (and we're reviewing that, too).

Actually, the percpu_add()/sub() ops are useful for statistics (they can 
be done without preempt disable/enable). percpu_write() is also obviously 
useful. The others are common arithmetic operators, for completeness of 
the API.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                   ` <200901170804.18622.rusty@rustcorp.com.au>
@ 2009-01-16 21:59                                     ` Ingo Molnar
  2009-01-16 22:09                                     ` Ingo Molnar
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 21:59 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Herbert Xu, akpm, tj, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 16 January 2009 10:42:00 Herbert Xu wrote:
> > Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > >> Of course.  But do any architectures actually _need_ that for a single 
> > >> read?
> > > 
> > > not for a read i guess - but for the other ops like add/and/or/xor.
> > 
> > One of the things I'd like to see happen with this work is for
> > us to have a cheap per-cpu atomic counter that we can use for
> > SNMP stats.
> > 
> > If we can make the inc/add variants into a single instruction,
> > then it won't need to disable preemption or interrupts.
> > 
> > So if you could design the API such that we have a variant of
> > add/inc that automatically disables/enables preemption then we
> > can optimise that away on x86.
> 
> Yep, already on it.  It's called local_t; that's what it was originally 
> designed for.
> 
> Unfortunately, to use it efficiently, we need large per-cpu areas.

Do you mean constructs like:

        local_inc(&__get_cpu_var(var));

?

If yes then i think you are missing the point here.

Yes, local_t can be useful when something is in an object and we know only 
a local IRQ context can update it and we don't want to disable irqs or use 
heavy atomics.

But percpu_read()/write()/add()/sub() ops are about optimizing _percpu_ 
variables. local_t alone does not solve that problem - because to use 
local_t as a percpu variable you have to get to the address of that 
variable - and that alone is not well optimized.

Or do you propose some new API that would allow that?

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                       ` <200901170827.33729.rusty@rustcorp.com.au>
@ 2009-01-16 22:08                                         ` Ingo Molnar
       [not found]                                           ` <200901201328.24605.rusty@rustcorp.com.au>
  0 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 22:08 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Herbert Xu, akpm, tj, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 16 January 2009 10:48:24 Herbert Xu wrote:
> > On Fri, Jan 16, 2009 at 01:15:44AM +0100, Ingo Molnar wrote:
> > >
> > > > So if you could design the API such that we have a variant of add/inc 
> > > > that automatically disables/enables preemption then we can optimise that 
> > > > away on x86.
> > > 
> > > Yeah. percpu_add(var, 1) does exactly that on x86.
> 
> <sigh>.  No it doesn't.

What do you mean by "No it doesn't"? It does exactly what i claimed it 
does.

> It's really nice that everyone's excited about this, but it's more 
> complex than this.  Unf. I'm too busy preparing for linux.conf.au to 
> explain it all properly right now, but here's the highlights:
> 
> 1) This only works on static per-cpu vars.
>    - We are working on fixing this, but it's non-trivial for large allocs like
>      those in networking.  Small allocs, we have patches for.

How do difficulties of dynamic percpu-alloc make my above suggestion 
unsuitable for SNMP stats in networking? Most of those stats are not 
dynamically allocated - they are plain straightforward percpu variables.

Plus the majority of percpu usage is static - just like the majority of 
local variables is static, not dynamic. So any percpu-alloc complication 
is a non-issue.

> 2) The generic versions of these as posted by Tejun are unsuitable for
>    networking; they need to bh_disable.  That would make networking less
>    efficient than it is now for non-x86, and to be generic it would have
>    to be local_irq_save/restore anyway.

The generic versions will not be used on 95%+ of the active Linux systems 
out there, as they run on x86. If you worry about the remaining 5%, those 
can be optimized too.

> 3) local_t was designed to do exactly this: a fast cpu-local counter
>    implemented optimally for each arch.  For sparc64, doing a trivalue version
>    seems optimal, for s390 atomics, for x86 single-insn, for powerpc
>    irq_save/restore, etc.

But local_t does not actually solve this problem at all - because one 
still has to have per-cpu-ness.

> 4) Unfortunately, local_t has been extended beyond a simple counter, meaning
>    it now has more complex requirements (eg. Mathieu wants nmi-safe, even
>    though that's impossible on sparc and parisc, and percpu_counter wants
>    local_add_return, which makes trivalue less desirable).  These discussions
>    are on the back burner at the moment, but ongoing.

In reality local_t has almost zero users in the kernel - despite being 
with us at least since v2.6.12. That pretty much tells us all about its 
utility.

The thing is, local_t without proper percpu integration is a toothless 
tiger in the jungle. And our APIs do exactly that kind of integration and 
i expect them to be more popular than local_t. There's already a dozen 
usage sites of it in arch/x86.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                   ` <200901170804.18622.rusty@rustcorp.com.au>
  2009-01-16 21:59                                     ` Ingo Molnar
@ 2009-01-16 22:09                                     ` Ingo Molnar
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-16 22:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Herbert Xu, akpm, tj, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 16 January 2009 10:42:00 Herbert Xu wrote:
> > Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > >> Of course.  But do any architectures actually _need_ that for a single 
> > >> read?
> > > 
> > > not for a read i guess - but for the other ops like add/and/or/xor.
> > 
> > One of the things I'd like to see happen with this work is for
> > us to have a cheap per-cpu atomic counter that we can use for
> > SNMP stats.
> > 
> > If we can make the inc/add variants into a single instruction, then it 
> > won't need to disable preemption or interrupts.
> > 
> > So if you could design the API such that we have a variant of add/inc 
> > that automatically disables/enables preemption then we can optimise 
> > that away on x86.
> 
> Yep, already on it.  It's called local_t; that's what it was originally 
> designed for.
> 
> Unfortunately, to use it efficiently, we need large per-cpu areas.

This makes no sense to me at all. Care to explain?

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL
       [not found]                                           ` <49716FBF.7080203@kernel.org>
@ 2009-01-17  6:29                                             ` Tejun Heo
  2009-01-17  6:32                                               ` [PATCH] linker script: add missing .data.percpu.page_aligned Tejun Heo
  2009-01-21 23:45                                               ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Christoph Lameter
  0 siblings, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-17  6:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, Eric Biederman,
	Christoph Lameter, Rusty Russell, travis,
	Linux Kernel Mailing List, Andrew Morton, steiner, Hugh Dickins

The newly added PERCPU_*() macros define and use __per_cpu_load but
VMLINUX_SYMBOL() was missing from usages causing build failures on
archs where linker visible symbol is different from C symbols
(e.g. blackfin).

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Just realized this was a reply to a private mail.  Restoring cc list
and resending.

Thanks.

 include/asm-generic/vmlinux.lds.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e53319c..aa6b9b1 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -432,13 +432,14 @@
 
 #define PERCPU_PROLOG(vaddr)						\
 	VMLINUX_SYMBOL(__per_cpu_load) = .;				\
-	.data.percpu vaddr : AT(__per_cpu_load - LOAD_OFFSET) {		\
+	.data.percpu vaddr : AT(VMLINUX_SYMBOL(__per_cpu_load)		\
+				- LOAD_OFFSET) {			\
 		VMLINUX_SYMBOL(__per_cpu_start) = .;
 
 #define PERCPU_EPILOG(phdr)						\
 		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
 	} phdr								\
-	. = __per_cpu_load + SIZEOF(.data.percpu);
+	. = VMLINUX_SYMBOL(__per_cpu_load) + SIZEOF(.data.percpu);
 
 /**
  * PERCPU_VADDR_PREALLOC - define output section for percpu area with prealloc
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH] linker script: add missing .data.percpu.page_aligned
  2009-01-17  6:29                                             ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Tejun Heo
@ 2009-01-17  6:32                                               ` Tejun Heo
  2009-01-17  8:21                                                 ` Ingo Molnar
  2009-01-21 23:46                                                 ` Christoph Lameter
  2009-01-21 23:45                                               ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Christoph Lameter
  1 sibling, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-17  6:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, Eric Biederman,
	Christoph Lameter, Rusty Russell, travis,
	Linux Kernel Mailing List, Andrew Morton, steiner, Hugh Dickins

arm, arm/mach-integrator and powerpc were missing
.data.percpu.page_aligned in their percpu output section definitions.
Add it.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
This and the previous patch is available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

The tree is on top of the rebased tip/core/percpu (cd3adf52).

Thanks.

 arch/arm/kernel/vmlinux.lds.S     |    1 +
 arch/ia64/kernel/vmlinux.lds.S    |    1 +
 arch/powerpc/kernel/vmlinux.lds.S |    1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 0021607..85598f7 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -65,6 +65,7 @@ SECTIONS
 #endif
 		. = ALIGN(4096);
 		__per_cpu_start = .;
+			*(.data.percpu.page_aligned)
 			*(.data.percpu)
 			*(.data.percpu.shared_aligned)
 		__per_cpu_end = .;
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 10a7d47..f45e4e5 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -219,6 +219,7 @@ SECTIONS
   .data.percpu PERCPU_ADDR : AT(__phys_per_cpu_start - LOAD_OFFSET)
 	{
 		__per_cpu_start = .;
+		*(.data.percpu.page_aligned)
 		*(.data.percpu)
 		*(.data.percpu.shared_aligned)
 		__per_cpu_end = .;
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 47bf15c..04e8ece 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -182,6 +182,7 @@ SECTIONS
 	. = ALIGN(PAGE_SIZE);
 	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
 		__per_cpu_start = .;
+		*(.data.percpu.page_aligned)
 		*(.data.percpu)
 		*(.data.percpu.shared_aligned)
 		__per_cpu_end = .;
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [PATCH] linker script: add missing .data.percpu.page_aligned
  2009-01-17  6:32                                               ` [PATCH] linker script: add missing .data.percpu.page_aligned Tejun Heo
@ 2009-01-17  8:21                                                 ` Ingo Molnar
  2009-01-21 23:46                                                 ` Christoph Lameter
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-17  8:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: roel kluin, H. Peter Anvin, Brian Gerst, Eric Biederman,
	Christoph Lameter, Rusty Russell, travis,
	Linux Kernel Mailing List, Andrew Morton, steiner, Hugh Dickins


* Tejun Heo <htejun@gmail.com> wrote:

> arm, arm/mach-integrator and powerpc were missing
> .data.percpu.page_aligned in their percpu output section definitions.
> Add it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> This and the previous patch is available in the following git branch.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
> The tree is on top of the rebased tip/core/percpu (cd3adf52).
> 
> Thanks.

Pulled into tip/core/percpu, thanks Tejun!

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                           ` <200901201328.24605.rusty@rustcorp.com.au>
@ 2009-01-20  6:25                                             ` Tejun Heo
  2009-01-20 10:36                                               ` Ingo Molnar
       [not found]                                               ` <200901271213.18605.rusty@rustcorp.com.au>
  2009-01-20 10:40                                             ` Ingo Molnar
  1 sibling, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-20  6:25 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Rusty Russell wrote:
> The generic versions Tejun posted are not softirq safe, so not
> suitable for network counters. To figure out what semantics we
> really want, we must audit the users; I'm sorry I haven't
> finished that task (and won't until after the conference).

No, they're not.  They're preempt safe as mentioned in the comment and
are basically just a generalization of the original x86 versions used by
x86_64 on SMP before pda and percpu areas were merged.  I agree that
it's something very close to local_t and it would be nice to see those
somehow unified (and I have patches which make use of local_t in my
queue waiting for dynamic percpu allocation).

Another question to ask is whether to keep using separate interfaces
for static and dynamic percpu variables or migrate to something which
can take both.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-20  6:25                                             ` Tejun Heo
@ 2009-01-20 10:36                                               ` Ingo Molnar
       [not found]                                               ` <200901271213.18605.rusty@rustcorp.com.au>
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-20 10:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers


* Tejun Heo <tj@kernel.org> wrote:

> Rusty Russell wrote:
> > The generic versions Tejun posted are not softirq safe, so not
> > suitable for network counters. To figure out what semantics we
> > really want, we must audit the users; I'm sorry I haven't
> > finished that task (and won't until after the conference).
> 
> No, they're not.  They're preempt safe as mentioned in the comment and 
> are basically just a generalization of the original x86 versions used by 
> x86_64 on SMP before pda and percpu areas were merged.  I agree that 
> it's something very close to local_t and it would be nice to see those 
> somehow unified (and I have patches which make use of local_t in my 
> queue waiting for dynamic percpu allocation).
> 
> Another question to ask is whether to keep using separate interfaces for 
> static and dynamic percpu variables or migrate to something which can 
> take both.

Also, there's over 400 PER_CPU variable definitions in the kernel, while 
only about 40 dynamic percpu allocation usage sites. (in that i included 
the percpu_counter bits used by networking)

So our percpu code usage is on the static side, by a large margin.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                           ` <200901201328.24605.rusty@rustcorp.com.au>
  2009-01-20  6:25                                             ` Tejun Heo
@ 2009-01-20 10:40                                             ` Ingo Molnar
  2009-01-21  5:52                                               ` Tejun Heo
  1 sibling, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-20 10:40 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Herbert Xu, akpm, tj, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Saturday 17 January 2009 08:38:32 Ingo Molnar wrote:
> > How do difficulties of dynamic percpu-alloc make my above suggestion 
> > unsuitable for SNMP stats in networking? Most of those stats are not 
> > dynamically allocated - they are plain straightforward percpu variables.
> 
> No they're not. [...]

hm, i see this got changed recently - part of the networking stats went 
over to the lib/percpu_counter API recently.

The larger point still remains: the kernel dominantly uses static percpu 
variables by a margin of 10 to 1, so we cannot just brush away the static 
percpu variables and must concentrate on optimizing that side with 
priority. It's nice if the dynamic percpu-alloc side improves as well, of 
course.

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-20 10:40                                             ` Ingo Molnar
@ 2009-01-21  5:52                                               ` Tejun Heo
  2009-01-21 10:05                                                 ` Ingo Molnar
  2009-01-21 11:21                                                 ` Eric W. Biederman
  0 siblings, 2 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-21  5:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Ingo Molnar wrote:
> The larger point still remains: the kernel dominantly uses static percpu 
> variables by a margin of 10 to 1, so we cannot just brush away the static 
> percpu variables and must concentrate on optimizing that side with 
> priority. It's nice if the dynamic percpu-alloc side improves as well, of 
> course.

Well, the infrequent usage of dynamic percpu allocation is in some
part due to the poor implementation, so it's sort of chicken and egg
problem.  I got into this percpu thing because I wanted a percpu
reference count which can be dynamically allocated and it sucked.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-21  5:52                                               ` Tejun Heo
@ 2009-01-21 10:05                                                 ` Ingo Molnar
  2009-01-21 11:21                                                 ` Eric W. Biederman
  1 sibling, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-21 10:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers


* Tejun Heo <tj@kernel.org> wrote:

> Ingo Molnar wrote:
> > The larger point still remains: the kernel dominantly uses static percpu 
> > variables by a margin of 10 to 1, so we cannot just brush away the static 
> > percpu variables and must concentrate on optimizing that side with 
> > priority. It's nice if the dynamic percpu-alloc side improves as well, of 
> > course.
> 
> Well, the infrequent usage of dynamic percpu allocation is in some part 
> due to the poor implementation, so it's sort of chicken and egg problem.  
> I got into this percpu thing because I wanted a percpu reference count 
> which can be dynamically allocated and it sucked.

Sure, but even static percpu sucked very much (it expanded to half a dozen 
or more instructions), and dynamic is _more_ complex. Anyway, it's getting 
fixed now :-)

	Ingo

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-21  5:52                                               ` Tejun Heo
  2009-01-21 10:05                                                 ` Ingo Molnar
@ 2009-01-21 11:21                                                 ` Eric W. Biederman
  2009-01-21 12:45                                                   ` Stephen Hemminger
  1 sibling, 1 reply; 109+ messages in thread
From: Eric W. Biederman @ 2009-01-21 11:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Rusty Russell, Herbert Xu, akpm, hpa, brgerst, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Tejun Heo <tj@kernel.org> writes:

> Ingo Molnar wrote:
>> The larger point still remains: the kernel dominantly uses static percpu 
>> variables by a margin of 10 to 1, so we cannot just brush away the static 
>> percpu variables and must concentrate on optimizing that side with 
>> priority. It's nice if the dynamic percpu-alloc side improves as well, of 
>> course.
>
> Well, the infrequent usage of dynamic percpu allocation is in some
> part due to the poor implementation, so it's sort of chicken and egg
> problem.  I got into this percpu thing because I wanted a percpu
> reference count which can be dynamically allocated and it sucked.

Counters are our other special case, and counters are interesting
because they are individually very small.  I just looked and the vast
majority of the alloc_percpu users are counters.

I just did a rough count in include/linux/snmp.h  and I came
up with 171*2 counters.  At 8 bytes per counter that is roughly
2.7K.  Or about two thirds of a 4K page.

What makes this is a challenge is that those counters are per network
namespace, and there are no static limits on the number of network
namespaces.

If we push the system and allocate 1024 network namespaces we wind up
needing 2.7MB per cpu, just for the SNMP counters.  

Which nicely illustrates the principle that each individual per cpu
allocation is typically small, but with dynamic allocation we have the
challenge that the number of allocations becomes unbounded and in some
cases could be large, while the typical per cpu size is likely to be very
small.

I wonder if for the specific case of counters it might make sense to
simply optimize the per cpu allocator for machine word size allocations
and allocate each counter individually freeing us from the burden of
worrying about fragmentation.


The pain with the current alloc_percpu implementation when working
with counters is that it has to allocate an array with one entry for
each cpu to point at the per cpu data, which isn't especially efficient.


Eric




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-21 11:21                                                 ` Eric W. Biederman
@ 2009-01-21 12:45                                                   ` Stephen Hemminger
  2009-01-21 14:13                                                     ` Eric W. Biederman
  2009-01-21 20:34                                                     ` David Miller
  0 siblings, 2 replies; 109+ messages in thread
From: Stephen Hemminger @ 2009-01-21 12:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tejun Heo, Ingo Molnar, Rusty Russell, Herbert Xu, akpm, hpa,
	brgerst, cl, travis, linux-kernel, steiner, hugh, David S. Miller,
	netdev, Mathieu Desnoyers

On Wed, 21 Jan 2009 03:21:23 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Tejun Heo <tj@kernel.org> writes:
> 
> > Ingo Molnar wrote:
> >> The larger point still remains: the kernel dominantly uses static percpu 
> >> variables by a margin of 10 to 1, so we cannot just brush away the static 
> >> percpu variables and must concentrate on optimizing that side with 
> >> priority. It's nice if the dynamic percpu-alloc side improves as well, of 
> >> course.
> >
> > Well, the infrequent usage of dynamic percpu allocation is in some
> > part due to the poor implementation, so it's sort of chicken and egg
> > problem.  I got into this percpu thing because I wanted a percpu
> > reference count which can be dynamically allocated and it sucked.
> 
> Counters are our other special case, and counters are interesting
> because they are individually very small.  I just looked and the vast
> majority of the alloc_percpu users are counters.
> 
> I just did a rough count in include/linux/snmp.h  and I came
> up with 171*2 counters.  At 8 bytes per counter that is roughly
> 2.7K.  Or about two thirds of a 4K page.

This is crap.  Only a small fraction of these SNMP counters are
close enough to the hot path to deserve per-cpu treatment.


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-21 12:45                                                   ` Stephen Hemminger
@ 2009-01-21 14:13                                                     ` Eric W. Biederman
  2009-01-21 20:34                                                     ` David Miller
  1 sibling, 0 replies; 109+ messages in thread
From: Eric W. Biederman @ 2009-01-21 14:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Tejun Heo, Ingo Molnar, Rusty Russell, Herbert Xu, akpm, hpa,
	brgerst, cl, travis, linux-kernel, steiner, hugh, David S. Miller,
	netdev, Mathieu Desnoyers

Stephen Hemminger <shemminger@vyatta.com> writes:

> This is crap. only a small fraction of these SNMP counters are
> close enough to the hot path to deserve per-cpu treatment.

Interesting.  I was just reading af_inet.c:ipv4_mib_init_net()
to get a feel for what a large consumer of percpu counters looks
like.

I expect the patterns I saw will hold even if the base size is much
smaller.


Your statement reinforces my point that our allocations in a per cpu
area are small and our dynamic per cpu allocator is not really
optimized for small allocations.

In most places what we want are 1-5 counters, which is max 40 bytes.
And our minimum allocation is a full cache line (64-128 bytes) per
allocation.



I'm wondering if dynamic per cpu allocations should work more like
slab.  Have a set of percpu base pointers for an area, and return an
area + offset in some convenient form.

Ideally we would just have one area (allowing us to keep the base
pointer in a register), but I'm not convinced that doesn't fall down
in the face of dynamic allocation.

Eric
 


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-21 12:45                                                   ` Stephen Hemminger
  2009-01-21 14:13                                                     ` Eric W. Biederman
@ 2009-01-21 20:34                                                     ` David Miller
  1 sibling, 0 replies; 109+ messages in thread
From: David Miller @ 2009-01-21 20:34 UTC (permalink / raw)
  To: shemminger
  Cc: ebiederm, tj, mingo, rusty, herbert, akpm, hpa, brgerst, cl,
	travis, linux-kernel, steiner, hugh, netdev, mathieu.desnoyers

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 21 Jan 2009 23:45:01 +1100

> This is crap. only a small fraction of these SNMP counters are
> close enough to the hot path to deserve per-cpu treatment.

Only if your router/firewall/webserver isn't hitting that code path
which bumps the counters you think aren't hot path.

It's a micro-DoS waiting to happen if we start trying to split
counters up into groups which matter for hot path processing
(and thus use per-cpu stuff) and those which don't (and thus
use atomics or whatever your idea is).


* Re: [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL
  2009-01-17  6:29                                             ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Tejun Heo
  2009-01-17  6:32                                               ` [PATCH] linker script: add missing .data.percpu.page_aligned Tejun Heo
@ 2009-01-21 23:45                                               ` Christoph Lameter
  1 sibling, 0 replies; 109+ messages in thread
From: Christoph Lameter @ 2009-01-21 23:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, roel kluin, H. Peter Anvin, Brian Gerst,
	Eric Biederman, Rusty Russell, travis, Linux Kernel Mailing List,
	Andrew Morton, steiner, Hugh Dickins


Yup we forgot that one.

Reviewed-by: Christoph Lameter <cl@linux-foundation.org>

(not sure about the exact tag to use. AFAICT I wrote some early version
of this code)




* Re: [PATCH] linker script: add missing .data.percpu.page_aligned
  2009-01-17  6:32                                               ` [PATCH] linker script: add missing .data.percpu.page_aligned Tejun Heo
  2009-01-17  8:21                                                 ` Ingo Molnar
@ 2009-01-21 23:46                                                 ` Christoph Lameter
  1 sibling, 0 replies; 109+ messages in thread
From: Christoph Lameter @ 2009-01-21 23:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, roel kluin, H. Peter Anvin, Brian Gerst,
	Eric Biederman, Rusty Russell, travis, Linux Kernel Mailing List,
	Andrew Morton, steiner, Hugh Dickins


Reviewed-by: Christoph Lameter <cl@linux-foundation.org>



* Re: [PATCH] percpu: add optimized generic percpu accessors
       [not found]                                               ` <200901271213.18605.rusty@rustcorp.com.au>
@ 2009-01-27  2:24                                                 ` Tejun Heo
  2009-01-27 13:13                                                   ` Ingo Molnar
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-27  2:24 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Hello, Rusty.

Rusty Russell wrote:

>> No, they're not. They're preempt safe as mentioned in the comment
>> and is basically just generalization of the original x86 versions
>> used by x86_64 on SMP before pda and percpu areas were merged. I
>> agree that it's something very close to local_t and it would be
>> nice to see those somehow unified (and I have patches which make
>> use of local_t in my queue waiting for dynamic percpu allocation).
> 
> Yes, which is one reason I dislike Ingo's patch:
> 1) Mine did just read because that covers the most common fast-path use
> and is easily atomic for word-sizes on all archs,
> 2) Didn't replace x86, just #defined generic one, so much less churn,
> 3) read_percpu_var and read_percpu_ptr variants following the convention
> reinforced by my other patches.
>
> Linus' tree had read/write/add/or counts at 22/13/0/0. Yours has
> more write usage, so I'm happy there, but still only one add and one
> or. If we assume that generic code will look a bit like that when
> converted, I'm not convinced that generic and/or/etc ops are worth
> it.

There actually were quite a few places where atomic add ops would be
useful, especially the places where statistics are collected.  For
logical bitops, I don't think we'll have too many of them.

> If they are worth doing generically, should the ops be atomic? To
> extrapolate from x86 usages again, it seems to be happy with
> non-atomic (tho of course it is atomic on x86).

If atomic rw/add/sub are implementable on most archs (and judging from
local_t, I suppose they are), I think they should be, so that they can
replace local_t and we won't need something else again in the future.

>> Another question to ask is whether to keep using separate
>> interfaces for static and dynamic percpu variables or migrate to
>> something which can take both.
>
> Well, IA64 can do stuff with static percpus that it can't do with
> dynamic (assuming we get expanding dynamic percpu areas
> later). That's because they use TLB tricks for a static 64k per-cpu
> area, but this doesn't scale.  That might not be vital: abandoning
> that trick will mean they can't optimise read_percpu/read_percpu_var
> etc as much.

Isn't something like the following possible?

#define pcpu_read(ptr)						\
({								\
	if (__builtin_constant_p(ptr) &&			\
	    ptr >= PCPU_STATIC_START && ptr < PCPU_STATIC_END)	\
		do 64k TLB trick for static pcpu;		\
	else							\
		do generic stuff;				\
})

> Tejun, any chance of you updating the tj-percpu tree? My current
> patches are against Linus's tree, and rebasing them on yours
> involves some icky merging.

If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects,
I'll do it tomorrow-ish (still big holiday here).

Thanks.

-- 
tejun


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27  2:24                                                 ` Tejun Heo
@ 2009-01-27 13:13                                                   ` Ingo Molnar
  2009-01-27 23:07                                                     ` Tejun Heo
  2009-01-27 20:08                                                   ` Christoph Lameter
  2009-01-28 10:38                                                   ` Rusty Russell
  2 siblings, 1 reply; 109+ messages in thread
From: Ingo Molnar @ 2009-01-27 13:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers


* Tejun Heo <tj@kernel.org> wrote:

> > Tejun, any chance of you updating the tj-percpu tree? My current 
> > patches are against Linus's tree, and rebasing them on yours involves 
> > some icky merging.
> 
> If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects, 
> I'll do it tomorrow-ish (still big holiday here).

Sure - but please do it as a clean rebase on top of latest, not as a 
conflict-merge, as I'd expect there to be quite a few clashes.

	Ingo


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27  2:24                                                 ` Tejun Heo
  2009-01-27 13:13                                                   ` Ingo Molnar
@ 2009-01-27 20:08                                                   ` Christoph Lameter
  2009-01-27 21:47                                                     ` David Miller
  2009-01-28 10:38                                                   ` Rusty Russell
  2 siblings, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-27 20:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rusty Russell, Ingo Molnar, Herbert Xu, akpm, hpa, brgerst,
	ebiederm, travis, linux-kernel, steiner, hugh, David S. Miller,
	netdev, Mathieu Desnoyers

On Tue, 27 Jan 2009, Tejun Heo wrote:

> > later). That's because they use TLB tricks for a static 64k per-cpu
> > area, but this doesn't scale.  That might not be vital: abandoning
> > that trick will mean they can't optimise read_percpu/read_percpu_var
> > etc as much.

Why won't it scale?  This is a separate TLB entry for each processor.

>
> Isn't something like the following possible?
>
> #define pcpu_read(ptr)						\
> ({								\
> 	if (__builtin_constant_p(ptr) &&			\
> 	    ptr >= PCPU_STATIC_START && ptr < PCPU_STATIC_END)	\
> 		do 64k TLB trick for static pcpu;		\
> 	else							\
> 		do generic stuff;				\
> })

The TLB trick is just to access the percpu data at a fixed base. I.e.
	value = SHIFT_PERCPU_PTR(percpu_var, FIXED_ADDRESS);



* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 20:08                                                   ` Christoph Lameter
@ 2009-01-27 21:47                                                     ` David Miller
  2009-01-27 22:47                                                       ` Rick Jones
  2009-01-28 16:45                                                       ` Christoph Lameter
  0 siblings, 2 replies; 109+ messages in thread
From: David Miller @ 2009-01-27 21:47 UTC (permalink / raw)
  To: cl
  Cc: tj, rusty, mingo, herbert, akpm, hpa, brgerst, ebiederm, travis,
	linux-kernel, steiner, hugh, netdev, mathieu.desnoyers

From: Christoph Lameter <cl@linux-foundation.org>
Date: Tue, 27 Jan 2009 15:08:57 -0500 (EST)

> On Tue, 27 Jan 2009, Tejun Heo wrote:
> 
> > > later). That's because they use TLB tricks for a static 64k per-cpu
> > > area, but this doesn't scale.  That might not be vital: abandoning
> > > that trick will mean they can't optimise read_percpu/read_percpu_var
> > > etc as much.
> 
> Why wont it scale? this is a separate TLB entry for each processor.

The IA64 per-cpu TLB entry only covers 64k, which makes using it for
dynamic per-cpu stuff out of the question.  That's why it "doesn't
scale".


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 21:47                                                     ` David Miller
@ 2009-01-27 22:47                                                       ` Rick Jones
  2009-01-28  0:17                                                         ` Luck, Tony
  2009-01-28 16:45                                                       ` Christoph Lameter
  1 sibling, 1 reply; 109+ messages in thread
From: Rick Jones @ 2009-01-27 22:47 UTC (permalink / raw)
  To: David Miller
  Cc: cl, tj, rusty, mingo, herbert, akpm, hpa, brgerst, ebiederm,
	travis, linux-kernel, steiner, hugh, netdev, mathieu.desnoyers,
	linux-ia64@vger.kernel.org

David Miller wrote:
> From: Christoph Lameter <cl@linux-foundation.org>
> Date: Tue, 27 Jan 2009 15:08:57 -0500 (EST)
> 
> 
>>On Tue, 27 Jan 2009, Tejun Heo wrote:
>>
>>
>>>>later). That's because they use TLB tricks for a static 64k per-cpu
>>>>area, but this doesn't scale.  That might not be vital: abandoning
>>>>that trick will mean they can't optimise read_percpu/read_percpu_var
>>>>etc as much.
>>
>>Why wont it scale? this is a separate TLB entry for each processor.
> 
> 
> The IA64 per-cpu TLB entry only covers 64k which makes use of it for
> dynamic per-cpu stuff out of the question.  That's why it "doesn't
> scale"

I was asking around, and was told that on IA64 *hardware* at least, in
addition to supporting multiple page sizes (up to a GB IIRC), one can pin
up to 8, or perhaps as many as half, of the TLB entries.

So, in theory if one were so inclined the special pinned per-CPU entry could 
either be more than one 64K entry, or a single, rather larger entry.

As a sanity check, I've cc'd linux-ia64.

rick jones


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 13:13                                                   ` Ingo Molnar
@ 2009-01-27 23:07                                                     ` Tejun Heo
  2009-01-28  3:36                                                       ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-27 23:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Ingo Molnar wrote:
> * Tejun Heo <tj@kernel.org> wrote:
> 
>>> Tejun, any chance of you updating the tj-percpu tree? My current 
>>> patches are against Linus's tree, and rebasing them on yours involves 
>>> some icky merging.
>> If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects, 
>> I'll do it tomorrow-ish (still big holiday here).
> 
> Sure - but please do it as a clean rebase ontop of latest, not as a 
> conflict-merge, as i'd expect there to be quite many clashes.

Alrighty, re-basing.

Thanks.

-- 
tejun


* RE: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 22:47                                                       ` Rick Jones
@ 2009-01-28  0:17                                                         ` Luck, Tony
  2009-01-28 16:48                                                           ` Christoph Lameter
  0 siblings, 1 reply; 109+ messages in thread
From: Luck, Tony @ 2009-01-28  0:17 UTC (permalink / raw)
  To: Rick Jones, David Miller
  Cc: cl@linux-foundation.org, tj@kernel.org, rusty@rustcorp.com.au,
	mingo@elte.hu, herbert@gondor.apana.org.au,
	akpm@linux-foundation.org, hpa@zytor.com, brgerst@gmail.com,
	ebiederm@xmission.com, travis@sgi.com,
	linux-kernel@vger.kernel.org, steiner@sgi.com, hugh@veritas.com,
	netdev@vger.kernel.org, mathieu.desnoyers@polymtl.ca,
	linux-ia64@vger.kernel.org

> I was asking around, and was told that on IA64 *harware* at least, in addition to
> supporting multiple page sizes (up to a GB IIRC), one can pin up to 8 or perhaps
> 1/2 the TLB entries.

The number of TLB entries that can be pinned is model specific (but the
minimum allowed is 8 each for code & data).  Page sizes supported also
vary by model, recent models go up to 4GB.

BUT ... we stopped pinning this entry in the TLB when benchmarks showed
that it was better to just insert this as a regular TLB entry which might
be dropped to map something else.  So now there is code in the Alt-DTLB
miss handler (ivt.S) to insert the per-cpu mapping on an as-needed basis.

> So, in theory if one were so inclined the special pinned per-CPU entry could
> either be more than one 64K entry, or a single, rather larger entry.

Managing a larger space could be done ... but at the expense of making
the Alt-DTLB miss handler do a memory lookup to find the physical address
of the per-cpu page needed (assuming that we allocate a bunch of random
physical pages for use as per-cpu space rather than a single contiguous
block of physical memory).

When do we know the total amount of per-cpu memory needed?
1) CONFIG time?
2) Boot time?
3) Arbitrary run time?

-Tony


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 23:07                                                     ` Tejun Heo
@ 2009-01-28  3:36                                                       ` Tejun Heo
  2009-01-28  8:12                                                         ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-28  3:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Tejun Heo wrote:
> Ingo Molnar wrote:
>> * Tejun Heo <tj@kernel.org> wrote:
>>
>>>> Tejun, any chance of you updating the tj-percpu tree? My current 
>>>> patches are against Linus's tree, and rebasing them on yours involves 
>>>> some icky merging.
>>> If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects, 
>>> I'll do it tomorrow-ish (still big holiday here).
>> Sure - but please do it as a clean rebase ontop of latest, not as a 
>> conflict-merge, as i'd expect there to be quite many clashes.
> 
> Alrighty, re-basing.

Okay, rebased on top of the current master[1].

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

The rebased commit ID is e7f538f3d2a6b3a0a632190a7d736730ee10909c.

There were several changes which didn't fit the rebased tree as they
really belong to other branches in the x86 tree.  I'll re-submit them
as patches against the appropriate branch.

Thanks.

-- 
tejun

[1] 5ee810072175042775e39bdd3eaaa68884c27805


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28  3:36                                                       ` Tejun Heo
@ 2009-01-28  8:12                                                         ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-28  8:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Tejun Heo wrote:
> Okay, rebased on top of the current master[1].
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
> The rebased commit ID is e7f538f3d2a6b3a0a632190a7d736730ee10909c.
> 
> There were several changes which didn't fit the rebased tree as they
> really belong to other branches in the x86 tree.  I'll re-submit them
> as patches against the appropriate branch.

Two patches against #stackprotector and one against #cpus4096 sent
out.  With these three patches, most of the stuff has been preserved,
but there are still a few missing pieces which only make sense after
the trees are merged.  I'll be happy to drive it if the current branch
looks fine.

Thanks.

-- 
tejun


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27  2:24                                                 ` Tejun Heo
  2009-01-27 13:13                                                   ` Ingo Molnar
  2009-01-27 20:08                                                   ` Christoph Lameter
@ 2009-01-28 10:38                                                   ` Rusty Russell
  2009-01-28 10:56                                                     ` Tejun Heo
  2009-01-28 16:50                                                     ` Christoph Lameter
  2 siblings, 2 replies; 109+ messages in thread
From: Rusty Russell @ 2009-01-28 10:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

On Tuesday 27 January 2009 12:54:27 Tejun Heo wrote:
> Hello, Rusty.

Hi Tejun!

> There actually were quite some places where atomic add ops would be
> useful, especially the places where statistics are collected.  For
> logical bitops, I don't think we'll have too many of them.

If the stats are only manipulated in one context, then an atomic requirement is overkill (and expensive on non-x86).

> > If they are worth doing generically, should the ops be atomic? To
> > extrapolate from x86 usages again, it seems to be happy with
> > non-atomic (tho of course it is atomic on x86).
> 
> If atomic rw/add/sub are implementible on most archs (and judging from
> local_t, I suppose it is), I think it should.  So that it can replace
> local_t and we won't need something else again in the future.

This is more like Christoph's CPU_OPS: they were special operators on normal per-cpu vars/ptrs.  Generic version was irqsave+op+irqrestore.

I actually like this idea, but Mathieu insists that the ops be NMI-safe, for ftrace.  Hence local_t needing to be atomic_t for generic code.

AFAICT we'll need a hybrid: HAVE_NMISAFE_CPUOPS, and if not, use atomic_t
in ftrace (which isn't NMI safe on parisc or sparc/32 anyway, but I don't think we care).

Other than the shouting, I liked Christoph's system:
- CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
- _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
- __CPU_INC = not safe against anything (eg. per_cpu(i)++)

I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.

> >> Another question to ask is whether to keep using separate
> >> interfaces for static and dynamic percpu variables or migrate to
> >> something which can take both.
> >
> > Well, IA64 can do stuff with static percpus that it can't do with
> > dynamic (assuming we get expanding dynamic percpu areas
> > later). That's because they use TLB tricks for a static 64k per-cpu
> > area, but this doesn't scale.  That might not be vital: abandoning
> > that trick will mean they can't optimise read_percpu/read_percpu_var
> > etc as much.
> 
> Isn't something like the following possible?
> 
> #define pcpu_read(ptr)						\
> ({								\
> 	if (__builtin_constant_p(ptr) &&			\
> 	    ptr >= PCPU_STATIC_START && ptr < PCPU_STATIC_END)	\
> 		do 64k TLB trick for static pcpu;		\
> 	else							\
> 		do generic stuff;				\
> })

No, that will be "do generic stuff", since it's a link-time constant.  I don't know that this is a huge worry, to be honest.  We can leave the __ia64_per_cpu_var for their arch-specific code (I feel the same way about x86 to be honest).

> > Tejun, any chance of you updating the tj-percpu tree? My current
> > patches are against Linus's tree, and rebasing them on yours
> > involves some icky merging.
> 
> If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects,
> I'll do it tomorrow-ish (still big holiday here).

Ah, I did not realize that you celebrated Australia day :)

Cheers!
Rusty.


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 10:38                                                   ` Rusty Russell
@ 2009-01-28 10:56                                                     ` Tejun Heo
  2009-01-29  2:06                                                       ` Rusty Russell
  2009-01-28 16:50                                                     ` Christoph Lameter
  1 sibling, 1 reply; 109+ messages in thread
From: Tejun Heo @ 2009-01-28 10:56 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Hello,

Rusty Russell wrote:
> On Tuesday 27 January 2009 12:54:27 Tejun Heo wrote:
>> Hello, Rusty.
> 
> Hi Tejun!
> 
>> There actually were quite some places where atomic add ops would be
>> useful, especially the places where statistics are collected.  For
>> logical bitops, I don't think we'll have too many of them.
> 
> If the stats are only manipulated in one context, than an atomic
> requirement is overkill (and expensive on non-x86).

Yes, it is.  I was hoping it wouldn't be more expensive on most archs.
It isn't on x86, at the very least, but I don't know much about other
archs.

>>> If they are worth doing generically, should the ops be atomic? To
>>> extrapolate from x86 usages again, it seems to be happy with
>>> non-atomic (tho of course it is atomic on x86).
>> If atomic rw/add/sub are implementible on most archs (and judging from
>> local_t, I suppose it is), I think it should.  So that it can replace
>> local_t and we won't need something else again in the future.
> 
> This is more like Christoph's CPU_OPS: they were special operators
> on normal per-cpu vars/ptrs.  Generic version was
> irqsave+op+irqrestore.
>
> I actually like this idea, but Mathieu insists that the ops be
> NMI-safe, for ftrace.  Hence local_t needing to be atomic_t for
> generic code.
> 
> AFAICT we'll need a hybrid: HAVE_NMISAFE_CPUOPS, and if not, use
> atomic_t in ftrace (which isn't NMI safe on parisc or sparc/32
> anyway, but I don't think we care).

Requiring NMI-safeness is quite an exception, I suppose.  I don't
think we should design around it.  If it can be worked around one way
or the other, it should be fine.

> Other than the shouting, I liked Christoph's system:
> - CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
> - _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
> - __CPU_INC = not safe against anything (eg. per_cpu(i)++)
> 
> I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.

I like local better too but no biggies one way or the other.

>>>> Another question to ask is whether to keep using separate
>>>> interfaces for static and dynamic percpu variables or migrate to
>>>> something which can take both.
>>> Well, IA64 can do stuff with static percpus that it can't do with
>>> dynamic (assuming we get expanding dynamic percpu areas
>>> later). That's because they use TLB tricks for a static 64k per-cpu
>>> area, but this doesn't scale.  That might not be vital: abandoning
>>> that trick will mean they can't optimise read_percpu/read_percpu_var
>>> etc as much.
>> Isn't something like the following possible?
>>
>> #define pcpu_read(ptr)						\
>> ({								\
>> 	if (__builtin_constant_p(ptr) &&			\
>> 	    ptr >= PCPU_STATIC_START && ptr < PCPU_STATIC_END)	\
>> 		do 64k TLB trick for static pcpu;		\
>> 	else							\
>> 		do generic stuff;				\
>> })
> 
> No, that will be "do generic stuff", since it's a link-time
> constant.  I don't know that this is a huge worry, to be honest.  We
> can leave the __ia64_per_cpu_var for their arch-specific code (I
> feel the same way about x86 to be honest).

Yes, right.  Got confused there.  Hmmm... looks like what would work
there is an "is it an lvalue?" test.  Well, anyway, no matter if it
isn't necessary.

>>> Tejun, any chance of you updating the tj-percpu tree? My current
>>> patches are against Linus's tree, and rebasing them on yours
>>> involves some icky merging.
>> If Ingo is okay with it, I'm fine with it too.  Unless Ingo objects,
>> I'll do it tomorrow-ish (still big holiday here).
> 
> Ah, I did not realize that you celebrated Australia day :)

Hey, didn't know Australia was founded on lunar New Year's day.
Nice. :-)

Thanks.

-- 
tejun


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-27 21:47                                                     ` David Miller
  2009-01-27 22:47                                                       ` Rick Jones
@ 2009-01-28 16:45                                                       ` Christoph Lameter
  2009-01-28 20:47                                                         ` David Miller
  1 sibling, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-28 16:45 UTC (permalink / raw)
  To: David Miller
  Cc: tj, rusty, mingo, herbert, akpm, hpa, brgerst, ebiederm, travis,
	linux-kernel, steiner, hugh, netdev, mathieu.desnoyers

On Tue, 27 Jan 2009, David Miller wrote:

> > Why wont it scale? this is a separate TLB entry for each processor.
>
> The IA64 per-cpu TLB entry only covers 64k which makes use of it for
> dynamic per-cpu stuff out of the question.  That's why it "doesn't
> scale"

IA64 supports varying page sizes. You can use a 128k TLB entry etc.



* RE: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28  0:17                                                         ` Luck, Tony
@ 2009-01-28 16:48                                                           ` Christoph Lameter
  2009-01-28 17:15                                                             ` Luck, Tony
  0 siblings, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-28 16:48 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Rick Jones, David Miller, tj@kernel.org, rusty@rustcorp.com.au,
	mingo@elte.hu, herbert@gondor.apana.org.au,
	akpm@linux-foundation.org, hpa@zytor.com, brgerst@gmail.com,
	ebiederm@xmission.com, travis@sgi.com,
	linux-kernel@vger.kernel.org, steiner@sgi.com, hugh@veritas.com,
	netdev@vger.kernel.org, mathieu.desnoyers@polymtl.ca,
	linux-ia64@vger.kernel.org

On Tue, 27 Jan 2009, Luck, Tony wrote:

> Managing a larger space could be done ... but at the expense of making
> the Alt-DTLB miss handler do a memory lookup to find the physical address
> of the per-cpu page needed (assuming that we allocate a bunch of random
> physical pages for use as per-cpu space rather than a single contiguous
> block of physical memory).

We cannot resize the area by using a single larger TLB entry?

> When do we know the total amount of per-cpu memory needed?
> 1) CONFIG time?

Would be easiest.

> 2) Boot time?

We could make the TLB entry size configurable with a kernel parameter.

> 3) Arbitrary run time?

We could reserve a larger virtual space and switch TLB entries as needed?
We would need to get larger-order pages to do this.



* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 10:38                                                   ` Rusty Russell
  2009-01-28 10:56                                                     ` Tejun Heo
@ 2009-01-28 16:50                                                     ` Christoph Lameter
  2009-01-28 18:07                                                       ` Mathieu Desnoyers
  1 sibling, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-28 16:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm,
	travis, linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

On Wed, 28 Jan 2009, Rusty Russell wrote:

> AFAICT we'll need a hybrid: HAVE_NMISAFE_CPUOPS, and if not, use atomic_t
> in ftrace (which isn't NMI safe on parisc or sparc/32 anyway, but I don't think we care).

Right.


> Other than the shouting, I liked Christoph's system:
> - CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
> - _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
> - __CPU_INC = not safe against anything (eg. per_cpu(i)++)
>
> I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.

The term cpu means multiple things at this point. So yes, it may be
better to go with the glibc naming of thread-local space.



* RE: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 16:48                                                           ` Christoph Lameter
@ 2009-01-28 17:15                                                             ` Luck, Tony
  0 siblings, 0 replies; 109+ messages in thread
From: Luck, Tony @ 2009-01-28 17:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rick Jones, David Miller, tj@kernel.org, rusty@rustcorp.com.au,
	mingo@elte.hu, herbert@gondor.apana.org.au,
	akpm@linux-foundation.org, hpa@zytor.com, brgerst@gmail.com,
	ebiederm@xmission.com, travis@sgi.com,
	linux-kernel@vger.kernel.org, steiner@sgi.com, hugh@veritas.com,
	netdev@vger.kernel.org, mathieu.desnoyers@polymtl.ca,
	linux-ia64@vger.kernel.org

> > Managing a larger space could be done ... but at the expense of making
> > the Alt-DTLB miss handler do a memory lookup to find the physical address
> > of the per-cpu page needed (assuming that we allocate a bunch of random
> > physical pages for use as per-cpu space rather than a single contiguous
> > block of physical memory).
>
> We cannot resize the area by using a single larger TLB entry?

Yes we could ... the catch is that the supported TLB page sizes go
up by multiples of 4.  So if 64K is not enough the next stop is
at 256K, then 1M, then 4M, and so on.

That's why I asked when we know what total amount of per-cpu memory
is needed.  CONFIG time or boot time is good, because allocating higher
order pages is easy at boot time.  Arbitrary run time is bad because
we might have difficulty allocating a high order page for every cpu.
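The size progression Tony describes (supported sizes growing by multiples of 4 from 64K) is easy to sketch; the helper below is purely illustrative, and its name is invented for this example — it is not part of any patch in this thread:

```c
#include <stddef.h>

/* Illustrative helper (not from the patchset): round a required per-cpu
 * footprint up to the nearest IA64-supported TLB page size.  Supported
 * sizes go up by multiples of 4 starting at 64K: 64K, 256K, 1M, 4M, ... */
static size_t percpu_tlb_page_size(size_t needed)
{
	size_t size = 64 * 1024;

	while (size < needed)
		size *= 4;	/* jump to the next supported size */
	return size;
}
```

Note that a 65K footprint already forces a 256K mapping, which is why knowing the total at CONFIG or boot time matters: the jumps between supported sizes are large.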

-Tony


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 16:50                                                     ` Christoph Lameter
@ 2009-01-28 18:07                                                       ` Mathieu Desnoyers
  2009-01-29 18:33                                                         ` Christoph Lameter
  0 siblings, 1 reply; 109+ messages in thread
From: Mathieu Desnoyers @ 2009-01-28 18:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rusty Russell, Tejun Heo, Ingo Molnar, Herbert Xu, akpm, hpa,
	brgerst, ebiederm, travis, linux-kernel, steiner, hugh,
	David S. Miller, netdev

* Christoph Lameter (cl@linux-foundation.org) wrote:
> On Wed, 28 Jan 2009, Rusty Russell wrote:
> 
> > AFAICT we'll need a hybrid: HAVE_NMISAFE_CPUOPS, and if not, use atomic_t
> > in ftrace (which isn't NMI safe on parisc or sparc/32 anyway, but I don't think we care).
> 
> Right.
> 

Ideally this should be done transparently so we don't have to #ifdef
code around HAVE_NMISAFE_CPUOPS everywhere in the tracer. We might
consider declaring an intermediate type with this kind of #ifdef in the
tracer code if we are the only user, though.

> 
> > Other than the shouting, I liked Christoph's system:
> > - CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
> > - _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
> > - __CPU_INC = not safe against anything (eg. per_cpu(i)++)
> >
> > I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.
> 
> > The term cpu means multiple things at this point. So yes, it may be
> > better to go with the glibc naming of thread-local space.
> 

However using "local" for "per-cpu" could be confusing with the glibc
naming of thread local space, because "per-thread" and "per-cpu"
variables are different from a synchronization POV and we can end up
needing both (e.g. a thread local variable can never be accessed by
another thread, but a cpu local variable could be accessed by a
different CPU due to scheduling).

I'm currently thinking about implementing user-space per-cpu tracing buffers,
and the last thing I'd like is to have a "local" semantic clash between
the kernel and glibc. Ideally, we could have something like :

Atomic safe primitives (emulated with irq disable if the architecture
does not have atomic primitives) :
- atomic_thread_inc()
  * current mainline local_t local_inc().
- atomic_cpu_inc()
  * Your proposed CPU_INC.

Non-safe against interrupts, but safe against preemption :
- thread_inc()
  * no preempt_disable involved, because this deals with per-thread
    variables :
    * Simple var++
- cpu_inc()
  * disables preemption, per_cpu(i)++, enables preemption

Not safe against preemption nor interrupts :
- _thread_inc()
  * maps to thread_inc()
- _cpu_inc()
  * simple per_cpu(i)++
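As a user-space sketch only (everything below is hypothetical: in-kernel, the middle tier would use the real preempt_disable()/preempt_enable(), and the atomic tier would use irq disabling or an atomic instruction), the preemption-safe and unsafe tiers might look like:

```c
/* Hypothetical user-space model of two of the proposed tiers.  The
 * preempt_* calls are simulated with a depth counter; on one thread both
 * tiers degenerate to a plain increment, so only the structure matters. */
static int preempt_depth;
static int demo_counter;	/* stand-in for per_cpu(i) */

#define preempt_disable()	((void)preempt_depth++)
#define preempt_enable()	((void)preempt_depth--)

/* Safe against preemption but not interrupts: the proposed cpu_inc(). */
#define cpu_inc(v)				\
	do {					\
		preempt_disable();		\
		(v)++;				\
		preempt_enable();		\
	} while (0)

/* Safe against nothing: the proposed _cpu_inc(), a bare per_cpu(i)++. */
#define _cpu_inc(v)	((v)++)
```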

So maybe we don't really need thread_inc(), _thread_inc() and _cpu_inc(),
because they map to standard C operations, but we may find on some
architectures that atomic_cpu_inc() is faster than per_cpu(i)++,
and in those cases it would make sense to map e.g. _cpu_inc() to
atomic_cpu_inc().

Also note that I don't think adding _ and __ prefixes to the operations
makes it clear to the programmer and reviewer what is safe against
what.  IMHO, it will just make the code more obscure.  One level of
underscore is about the limit at which I think people can "know" what
the "unsafe" version of the primitive does.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 16:45                                                       ` Christoph Lameter
@ 2009-01-28 20:47                                                         ` David Miller
  0 siblings, 0 replies; 109+ messages in thread
From: David Miller @ 2009-01-28 20:47 UTC (permalink / raw)
  To: cl
  Cc: tj, rusty, mingo, herbert, akpm, hpa, brgerst, ebiederm, travis,
	linux-kernel, steiner, hugh, netdev, mathieu.desnoyers

From: Christoph Lameter <cl@linux-foundation.org>
Date: Wed, 28 Jan 2009 11:45:28 -0500 (EST)

> On Tue, 27 Jan 2009, David Miller wrote:
> 
> > > Why won't it scale? This is a separate TLB entry for each processor.
> >
> > The IA64 per-cpu TLB entry only covers 64k, which makes use of it for
> > dynamic per-cpu stuff out of the question.  That's why it "doesn't
> > scale".
> 
> IA64 supports varying page sizes. You can use a 128k TLB entry etc.

Good luck moving to a larger size dynamically at run time.

It really isn't a tenable solution.


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 10:56                                                     ` Tejun Heo
@ 2009-01-29  2:06                                                       ` Rusty Russell
  2009-01-31  6:11                                                         ` Tejun Heo
  0 siblings, 1 reply; 109+ messages in thread
From: Rusty Russell @ 2009-01-29  2:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

On Wednesday 28 January 2009 21:26:34 Tejun Heo wrote:
> Hello,

Hi Tejun,

> Rusty Russell wrote:
> > If the stats are only manipulated in one context, then an atomic
> > requirement is overkill (and expensive on non-x86).
> 
> Yes, it is.  I was hoping it would not be more expensive on most archs.
> It isn't on x86, at the very least, but I don't know much about other
> archs.

Hmm, you can garner this from the local_t stats which were flying around
(see "Re: local_add_return" from me, or look in the preamble to
http://ozlabs.org/~rusty/kernel/rr-latest/misc:test-local_t.patch).

Of course, if you want to be my hero, you could implement "soft" irq
disable for all archs, which would make this cheaper.

> > Other than the shouting, I liked Christoph's system:
> > - CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
> > - _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
> > - __CPU_INC = not safe against anything (eg. per_cpu(i)++)
> > 
> > I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.
> 
> I like local better too but no biggies one way or the other.

Maybe kill local_t and take the name back.  I'll leave it to you...

> > Ah, I did not realize that you celebrated Australia day :)
> 
> Hey, didn't know Australia was founded on lunar New Year's day.
> Nice. :-)

That would have been cool, but no; first time in 76 years they matched, though.

Thanks,
Rusty.


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-28 18:07                                                       ` Mathieu Desnoyers
@ 2009-01-29 18:33                                                         ` Christoph Lameter
  2009-01-29 18:48                                                           ` H. Peter Anvin
  0 siblings, 1 reply; 109+ messages in thread
From: Christoph Lameter @ 2009-01-29 18:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Rusty Russell, Tejun Heo, Ingo Molnar, Herbert Xu, akpm, hpa,
	brgerst, ebiederm, travis, linux-kernel, steiner, hugh,
	David S. Miller, netdev

On Wed, 28 Jan 2009, Mathieu Desnoyers wrote:

> > The term cpu means multiple things at this point. So yes, it may be
> > better to go with the glibc naming of thread-local space.
> >
>
> However using "local" for "per-cpu" could be confusing with the glibc
> naming of thread local space, because "per-thread" and "per-cpu"
> variables are different from a synchronization POV and we can end up
> needing both (e.g. a thread local variable can never be accessed by
> another thread, but a cpu local variable could be accessed by a
> different CPU due to scheduling).

gcc/glibc support a __thread attribute for variables. As far as I can tell,
this automatically makes gcc perform the relocation to the current
context using a segment register.

But it's a weird ABI (http://people.redhat.com/drepper/tls.pdf).  After
reading that, I agree that we should stay with the cpu ops and forget about
the local and thread stuff in gcc/glibc.
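For reference, a minimal user-space demonstration of the __thread storage class under discussion (this is the gcc/glibc TLS mechanism, not kernel code; the variable and function names are invented for illustration):

```c
#include <pthread.h>
#include <stdint.h>

/* Each thread gets its own copy of tls_counter; gcc addresses it through
 * a segment register (%fs in x86-64 user space), which is exactly the
 * per-context relocation described above. */
static __thread long tls_counter;

static void *bump_in_new_thread(void *unused)
{
	(void)unused;
	tls_counter++;			/* touches the new thread's copy */
	return (void *)(intptr_t)tls_counter;
}
```

The catch for the kernel, as noted in the follow-ups, is that per-cpu is not per-thread, and the compiler's register choice is not under kernel control.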



* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-29 18:33                                                         ` Christoph Lameter
@ 2009-01-29 18:48                                                           ` H. Peter Anvin
  0 siblings, 0 replies; 109+ messages in thread
From: H. Peter Anvin @ 2009-01-29 18:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mathieu Desnoyers, Rusty Russell, Tejun Heo, Ingo Molnar,
	Herbert Xu, akpm, brgerst, ebiederm, travis, linux-kernel,
	steiner, hugh, David S. Miller, netdev

Christoph Lameter wrote:
> 
> gcc/glibc support a __thread attribute for variables. As far as I can tell,
> this automatically makes gcc perform the relocation to the current
> context using a segment register.
> 
> But it's a weird ABI (http://people.redhat.com/drepper/tls.pdf).  After
> reading that, I agree that we should stay with the cpu ops and forget about
> the local and thread stuff in gcc/glibc.
> 

We have discussed this a number of times.  There are several issues with
it; one being that per-cpu != per-thread (we can switch CPU wherever
we're preempted), and another that many versions of gcc use hardcoded
registers, in particular %fs on 64 bits, but the kernel *has* to use %gs
since there is no swapfs instruction.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-29  2:06                                                       ` Rusty Russell
@ 2009-01-31  6:11                                                         ` Tejun Heo
  0 siblings, 0 replies; 109+ messages in thread
From: Tejun Heo @ 2009-01-31  6:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Herbert Xu, akpm, hpa, brgerst, ebiederm, cl, travis,
	linux-kernel, steiner, hugh, David S. Miller, netdev,
	Mathieu Desnoyers

Hello, Rusty.

Rusty Russell wrote:
>> Rusty Russell wrote:
>>> If the stats are only manipulated in one context, then an atomic
>>> requirement is overkill (and expensive on non-x86).
>> Yes, it is.  I was hoping it would not be more expensive on most archs.
>> It isn't on x86, at the very least, but I don't know much about other
>> archs.
> 
> Hmm, you can garner this from the local_t stats which were flying around
> (see "Re: local_add_return" from me, or look in the preamble to
> http://ozlabs.org/~rusty/kernel/rr-latest/misc:test-local_t.patch).

Ah... Great.

> Of course, if you want to be my hero, you could implement "soft" irq
> disable for all archs, which would make this cheaper.

I suppose you mean deferred execution of interrupt handlers for quick
atomicities.  Yeah, that would be nice for things like this.

>>> Other than the shouting, I liked Christoph's system:
>>> - CPU_INC = always safe (eg. local_irq_save/per_cpu(i)++/local_irq_restore)
>>> - _CPU_INC = not safe against interrupts (eg. get_cpu/per_cpu(i)++/put_cpu)
>>> - __CPU_INC = not safe against anything (eg. per_cpu(i)++)
>>>
>>> I prefer the name 'local' to the name 'cpu', but I'm not hugely fussed.
>> I like local better too but no biggies one way or the other.
> 
> Maybe kill local_t and take the name back.  I'll leave it to you...
> 
>>> Ah, I did not realize that you celebrated Australia day :)
>> Hey, didn't know Australia was founded on lunar New Year's day.
>> Nice. :-)
> 
> That would have been cool, but no; first time in 76 years they matched tho.

It was a joke.  :-)

Thanks.

-- 
tejun


* Re: [patch] add optimized generic percpu accessors
  2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
                             ` (3 preceding siblings ...)
  2009-01-15 10:26           ` Tejun Heo
@ 2009-01-31 10:30           ` Jeremy Fitzhardinge
  2009-01-31 16:00             ` Ingo Molnar
  4 siblings, 1 reply; 109+ messages in thread
From: Jeremy Fitzhardinge @ 2009-01-31 10:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Tejun Heo, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Ingo Molnar wrote:
> Tejun, could you please also add the patch below to your lineup too?
>
> It is an optimization and a cleanup, and adds the following new generic 
> percpu methods:
>
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_or() 
>   percpu_xor()
>
> and implements support for them on x86. (other architectures will fall 
> back to a default implementation)
>
> The advantage is that, for example, to read a local percpu variable, instead
> of this sequence:
>
>  return __get_cpu_var(var);
>
>  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
>  ffffffff8102ca32:	81 
>  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
>  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
>
> We can get a single instruction by using the optimized variants:
>
>  return percpu_read(var);
>
>  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
>
> I also cleaned up the x86-specific APIs and made the x86 code use these 
> new generic percpu primitives.
>
> It looks quite hard to convince the compiler to generate the optimized 
> single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> perhaps see a way to do it?
>
> The patch is against your latest zero-based percpu / pda unification tree. 
> Untested.

I have no objection to this patch overall, or the use of non-arch 
specific names.

But there is one subtle thing you're overlooking here.  The x86_*_percpu
operations are guaranteed to be atomic with respect to preemption, so
you can use them when preemption is enabled.  When they compile down to
one instruction, that happens naturally; otherwise they have to be
wrapped in preempt_disable/enable.

Otherwise, being able to access a percpu variable with one instruction is
nice, but it isn't all that efficient.  If you're going to access a variable
more than once or twice, it's more efficient to take the address and
access it via that.

So, upshot, I think the default versions should be wrapped in 
preempt_disable/enable to preserve this interface invariant.

    J


* Re: [PATCH] percpu: add optimized generic percpu accessors
  2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
                                           ` (2 preceding siblings ...)
  2009-01-15 18:46                         ` Ingo Molnar
@ 2009-01-31 10:36                         ` Jeremy Fitzhardinge
  3 siblings, 0 replies; 109+ messages in thread
From: Jeremy Fitzhardinge @ 2009-01-31 10:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh

Tejun Heo wrote:
> I also cleaned up the x86-specific APIs and made the x86 code use
> these new generic percpu primitives.
>
> tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
>     * added percpu_and() for completeness's sake
>     * made generic percpu ops atomic against preemption
>   

Ah, good.  I'm happy.

    J


* Re: [patch] add optimized generic percpu accessors
  2009-01-31 10:30           ` Jeremy Fitzhardinge
@ 2009-01-31 16:00             ` Ingo Molnar
  0 siblings, 0 replies; 109+ messages in thread
From: Ingo Molnar @ 2009-01-31 16:00 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Tejun Heo, Brian Gerst, ebiederm, cl, rusty,
	travis, linux-kernel, akpm, steiner, hugh


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> So, upshot, I think the default versions should be wrapped in 
> preempt_disable/enable to preserve this interface invariant.

The default ones already are:

#define __percpu_generic_to_op(var, val, op)                            \
do {                                                                    \
        get_cpu_var(var) op val;                                        \
        put_cpu_var(var);                                               \
} while (0)
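The same shape can be mimicked in user space to see the invariant hold: get_cpu_var()/put_cpu_var() are where the preempt_disable()/preempt_enable() pair lives. Everything below is a simulated stand-in (a depth counter for the preempt calls, one variable for one CPU's copy), not the kernel's actual definitions:

```c
/* User-space sketch of the generic percpu op.  The get/put pair carries
 * the (simulated) preemption protection, so the default percpu_add() is
 * preempt-safe even when it is not a single instruction. */
static int preempt_depth;
static long my_percpu_var;	/* stand-in for one CPU's copy of `var` */

#define preempt_disable()	((void)preempt_depth++)
#define preempt_enable()	((void)preempt_depth--)
#define get_cpu_var(var)	(*(preempt_disable(), &(var)))
#define put_cpu_var(var)	preempt_enable()

#define __percpu_generic_to_op(var, val, op)				\
do {									\
	get_cpu_var(var) op val;					\
	put_cpu_var(var);						\
} while (0)

#define percpu_add(var, val)	__percpu_generic_to_op(var, (val), +=)
```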

	Ingo


end of thread, other threads:[~2009-01-31 16:01 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-13 10:38 [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based on SMP Tejun Heo
2009-01-13 10:38 ` [PATCH 01/13] x86_64: fix pda_to_op() Tejun Heo
2009-01-13 10:38 ` [PATCH 02/13] x86: make early_per_cpu() a lvalue and use it Tejun Heo
2009-01-13 11:43   ` [PATCH 02/13 UPDATED] " Tejun Heo
2009-01-13 10:38 ` [PATCH 03/13] x86_64: Cleanup early setup_percpu references Tejun Heo
2009-01-13 10:38 ` [PATCH 04/13] x86_32: make vmlinux_32.lds.S use PERCPU() macro Tejun Heo
2009-01-13 10:38 ` [PATCH 05/13] x86_64: make percpu symbols zerobased on SMP Tejun Heo
2009-01-13 10:38 ` [PATCH 06/13] x86_64: load pointer to pda into %gs while brining up a CPU Tejun Heo
2009-01-13 10:38 ` [PATCH 07/13] x86_64: use static _cpu_pda array Tejun Heo
2009-01-13 10:38 ` [PATCH 08/13] x86_64: fold pda into percpu area on SMP Tejun Heo
2009-01-13 10:38 ` [PATCH 09/13] x86_64: merge 64 and 32 SMP percpu handling Tejun Heo
2009-01-13 10:38 ` [PATCH 10/13] x86_64: make pda a percpu variable Tejun Heo
2009-01-13 10:38 ` [PATCH 11/13] x86_64: convert pda ops to wrappers around x86 percpu accessors Tejun Heo
2009-01-13 10:38 ` [PATCH 12/13] x86_64: misc clean up after the percpu update Tejun Heo
2009-01-13 10:38 ` [PATCH 13/13] x86_32: make percpu symbols zerobased on SMP Tejun Heo
2009-01-14  0:18   ` Rusty Russell
2009-01-14  2:03     ` Tejun Heo
2009-01-13 10:48 ` [PATCHSET linux-2.6-x86:tip] x86: make percpu offsets zero-based " Tejun Heo
2009-01-13 13:27   ` Brian Gerst
2009-01-13 14:05     ` Tejun Heo
2009-01-13 14:26       ` Brian Gerst
2009-01-13 14:37         ` Tejun Heo
2009-01-14  6:57       ` H. Peter Anvin
2009-01-14  9:38         ` [patch] add optimized generic percpu accessors Ingo Molnar
2009-01-14  9:45           ` Ingo Molnar
     [not found]           ` <200901151253.44016.rusty@rustcorp.com.au>
2009-01-15  9:55             ` Ingo Molnar
2009-01-15 10:27             ` Tejun Heo
2009-01-15 10:04           ` roel kluin
2009-01-15 10:26             ` Tejun Heo
2009-01-15 11:32               ` Ingo Molnar
2009-01-15 11:36                 ` Tejun Heo
2009-01-15 12:22                   ` Ingo Molnar
2009-01-15 13:09                     ` Tejun Heo
2009-01-15 13:32                       ` Ingo Molnar
2009-01-15 13:33                         ` Ingo Molnar
2009-01-15 13:39                         ` Ingo Molnar
2009-01-15 21:54                           ` Tejun Heo
2009-01-16  1:28                             ` [PATCH x86/percpu] x86: fix build bug introduced during merge Tejun Heo
2009-01-16  3:25                               ` [PATCH x86/percpu] x86_64: initialize this_cpu_off to __per_cpu_load Tejun Heo
2009-01-16 13:16                                 ` Ingo Molnar
2009-01-16 13:47                                   ` Tejun Heo
     [not found]                                     ` <20090116221651.GA32736@elte.hu>
     [not found]                                       ` <20090116223828.GA9294@elte.hu>
     [not found]                                         ` <49716F0E.7060605@gmail.com>
     [not found]                                           ` <49716FBF.7080203@kernel.org>
2009-01-17  6:29                                             ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Tejun Heo
2009-01-17  6:32                                               ` [PATCH] linker script: add missing .data.percpu.page_aligned Tejun Heo
2009-01-17  8:21                                                 ` Ingo Molnar
2009-01-21 23:46                                                 ` Christoph Lameter
2009-01-21 23:45                                               ` [PATCH core/percpu REPOST] linker script: add missing VMLINUX_SYMBOL Christoph Lameter
2009-01-15 22:34                           ` [patch] add optimized generic percpu accessors Ingo Molnar
2009-01-15 20:31                     ` Christoph Lameter
2009-01-16  9:41                       ` Tejun Heo
2009-01-16 13:23                         ` Ingo Molnar
2009-01-16 13:49                           ` Tejun Heo
2009-01-15 10:26           ` Tejun Heo
2009-01-15 11:30             ` Ingo Molnar
2009-01-15 11:38               ` Tejun Heo
2009-01-15 12:26                 ` Ingo Molnar
2009-01-15 13:04                   ` Tejun Heo
2009-01-15 13:07                     ` Ingo Molnar
2009-01-15 13:23                       ` [PATCH] percpu: " Tejun Heo
2009-01-15 13:36                         ` Ingo Molnar
2009-01-15 17:30                         ` Andrew Morton
2009-01-15 18:02                           ` Ingo Molnar
2009-01-15 18:34                             ` Andrew Morton
2009-01-15 18:39                               ` Ingo Molnar
2009-01-15 21:53                                 ` Tejun Heo
2009-01-16  0:12                                 ` Herbert Xu
2009-01-16  0:15                                   ` Ingo Molnar
2009-01-16  0:18                                     ` Herbert Xu
     [not found]                                       ` <200901170827.33729.rusty@rustcorp.com.au>
2009-01-16 22:08                                         ` Ingo Molnar
     [not found]                                           ` <200901201328.24605.rusty@rustcorp.com.au>
2009-01-20  6:25                                             ` Tejun Heo
2009-01-20 10:36                                               ` Ingo Molnar
     [not found]                                               ` <200901271213.18605.rusty@rustcorp.com.au>
2009-01-27  2:24                                                 ` Tejun Heo
2009-01-27 13:13                                                   ` Ingo Molnar
2009-01-27 23:07                                                     ` Tejun Heo
2009-01-28  3:36                                                       ` Tejun Heo
2009-01-28  8:12                                                         ` Tejun Heo
2009-01-27 20:08                                                   ` Christoph Lameter
2009-01-27 21:47                                                     ` David Miller
2009-01-27 22:47                                                       ` Rick Jones
2009-01-28  0:17                                                         ` Luck, Tony
2009-01-28 16:48                                                           ` Christoph Lameter
2009-01-28 17:15                                                             ` Luck, Tony
2009-01-28 16:45                                                       ` Christoph Lameter
2009-01-28 20:47                                                         ` David Miller
2009-01-28 10:38                                                   ` Rusty Russell
2009-01-28 10:56                                                     ` Tejun Heo
2009-01-29  2:06                                                       ` Rusty Russell
2009-01-31  6:11                                                         ` Tejun Heo
2009-01-28 16:50                                                     ` Christoph Lameter
2009-01-28 18:07                                                       ` Mathieu Desnoyers
2009-01-29 18:33                                                         ` Christoph Lameter
2009-01-29 18:48                                                           ` H. Peter Anvin
2009-01-20 10:40                                             ` Ingo Molnar
2009-01-21  5:52                                               ` Tejun Heo
2009-01-21 10:05                                                 ` Ingo Molnar
2009-01-21 11:21                                                 ` Eric W. Biederman
2009-01-21 12:45                                                   ` Stephen Hemminger
2009-01-21 14:13                                                     ` Eric W. Biederman
2009-01-21 20:34                                                     ` David Miller
2009-01-16  1:09                                     ` H. Peter Anvin
     [not found]                                   ` <200901170804.18622.rusty@rustcorp.com.au>
2009-01-16 21:59                                     ` Ingo Molnar
2009-01-16 22:09                                     ` Ingo Molnar
2009-01-16 14:10                               ` Mark Lord
2009-01-15 18:46                         ` Ingo Molnar
2009-01-31 10:36                         ` Jeremy Fitzhardinge
2009-01-15 13:59                     ` [patch] " roel kluin
2009-01-15 21:51                       ` Tejun Heo
     [not found]                 ` <200901170748.53734.rusty@rustcorp.com.au>
2009-01-16 21:24                   ` Ingo Molnar
2009-01-31 10:30           ` Jeremy Fitzhardinge
2009-01-31 16:00             ` Ingo Molnar

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).