From: Christoph Lameter <clameter@sgi.com>
To: ak@suse.de
Cc: akpm@linux-foundation.org
Cc: travis@sgi.com
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: linux-kernel@vger.kernel.org
Subject: [rfc 03/45] Generic CPU operations: Core piece
Date: Mon, 19 Nov 2007 17:11:35 -0800
Message-ID: <20071120011332.415903723@sgi.com>
In-Reply-To: 20071120011132.143632442@sgi.com
Currently the per cpu subsystem is not able to use the atomic capabilities
of the processors we have.

This patch adds new functionality that allows the optimization of per cpu
variable handling. In particular it provides a simple way to exploit atomic
operations in order to avoid having to disable interrupts or add a per cpu
offset.
For example, current implementations may do:

	unsigned long flags;
	struct stat_struct *p;

	local_irq_save(flags);
	/* Calculate address of per processor area */
	p = CPU_PTR(stat, smp_processor_id());
	p->counter++;
	local_irq_restore(flags);

This whole segment can be replaced by a single CPU operation:

	CPU_INC(stat->counter);
And on most processors it is possible to perform the increment with
a single processor instruction. Processors have segment registers,
global registers and per cpu mappings of per cpu areas for that purpose.
The problem is that the current schemes cannot utilize those features.
local_t does not really address the issue since the offset calculation
is not solved, and local_t is x86 processor specific. The solution here
can utilize methods other than just the x86 instruction set.
On x86 the above CPU_INC translates into a single instruction:

	inc %%gs:(&stat->counter)

This instruction is interrupt safe since it either completes or does not
run at all.
The determination of the correct per cpu area for the current processor
does not require access to smp_processor_id() (expensive...). The gs
register is used to provide a processor specific offset to the respective
per cpu area where the per cpu variable resides.
Note that the counter offset into the struct was added *before* the segment
selector was applied. This is necessary to avoid an extra calculation. In the
past we first determined the address of the stats structure on the respective
processor and then added the field offset. However, the offset may as well be
added earlier.
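The equivalence of the two orderings can be sketched in plain user-space C.
This is a hypothetical illustration (the array layout and helper names are
not the kernel's), showing that adding the field offset before or after the
per-cpu displacement yields the same address:

```c
#include <assert.h>
#include <stddef.h>

struct stat_struct { long other; long counter; };

#define NR_CPUS 4
static struct stat_struct per_cpu_area[NR_CPUS];

/* displacement of a cpu's area relative to the canonical cpu 0 area */
static ptrdiff_t cpu_offset(int cpu)
{
	return (char *)&per_cpu_area[cpu] - (char *)&per_cpu_area[0];
}

/* first locate the per-cpu struct, then add the field offset */
static long *cpu_then_field(int cpu)
{
	struct stat_struct *p =
		(struct stat_struct *)((char *)&per_cpu_area[0] + cpu_offset(cpu));
	return &p->counter;
}

/* add the field offset first, then shift by the per-cpu displacement */
static long *field_then_cpu(int cpu)
{
	return (long *)((char *)&per_cpu_area[0].counter + cpu_offset(cpu));
}
```

Both functions resolve to the same address, which is why the field offset
can be folded in before the segment selector is applied.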
If stat was declared via DECLARE_PER_CPU then this patchset is capable of
convincing the linker to provide the proper base address. In that case no
calculations are necessary.
Should the stats structure be reachable via a register then the address
calculation capabilities can be leveraged to avoid calculations.
On IA64 the same results in another single instruction, exploiting the fact
that we have a virtual address that always maps to the local per cpu area:

	fetchadd &stat->counter + (VCPU_BASE - __per_cpu_base)

The access is forced into the per cpu address range reachable via the
virtualized address. Again the counter field offset is added to the offset.
The access is then, as on x86, a single instruction.
In order to be able to exploit the atomicity of these instructions we
introduce a series of new functions that take a BASE pointer (a pointer
into the area of cpu 0, which is the canonical base):
	CPU_READ()
	CPU_WRITE()
	CPU_INC()
	CPU_DEC()
	CPU_ADD()
	CPU_SUB()
	CPU_XCHG()
	CPU_CMPXCHG()
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/percpu.h | 156 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 156 insertions(+)
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2007-11-18 22:13:51.773274119 -0800
+++ linux-2.6/include/linux/percpu.h 2007-11-18 22:15:10.396773779 -0800
@@ -190,4 +190,160 @@ void cpu_free(void *cpu_pointer, unsigne
*/
void *boot_cpu_alloc(unsigned long size);
+/*
+ * Fast Atomic per cpu operations.
+ *
+ * The following operations can be overridden by arches to implement fast
+ * and efficient operations. The operations are atomic meaning that the
+ * determination of the processor, the calculation of the address and the
+ * operation on the data is an atomic operation.
+ */
+
+#ifndef CONFIG_FAST_CPU_OPS
+
+/*
+ * The fallbacks are rather slow but they are safe
+ *
+ * The first group of macros is used when it is
+ * safe to update the per cpu variable because
+ * preemption is off (per cpu variables that are not
+ * updated from interrupt context) or because
+ * interrupts are already off.
+ */
+
+#define __CPU_READ(obj) \
+({ \
+ typeof(obj) x; \
+ x = *THIS_CPU(&(obj)); \
+ (x); \
+})
+
+#define __CPU_WRITE(obj, value) \
+({ \
+ *THIS_CPU(&(obj)) = value; \
+})
+
+#define __CPU_ADD(obj, value) \
+({ \
+ *THIS_CPU(&(obj)) += value; \
+})
+
+
+#define __CPU_INC(addr) __CPU_ADD(addr, 1)
+#define __CPU_DEC(addr) __CPU_ADD(addr, -1)
+#define __CPU_SUB(addr, value) __CPU_ADD(addr, -(value))
+
+#define __CPU_CMPXCHG(obj, old, new) \
+({ \
+ typeof(obj) x; \
+ typeof(obj) *p = THIS_CPU(&(obj)); \
+ x = *p; \
+ if (x == old) \
+ *p = new; \
+ (x); \
+})
+
+#define __CPU_XCHG(obj, new) \
+({ \
+ typeof(obj) x; \
+ typeof(obj) *p = THIS_CPU(&(obj)); \
+ x = *p; \
+ *p = new; \
+ (x); \
+})
+
+/*
+ * Second group used for per cpu variables that
+ * are not updated from an interrupt context.
+ * In that case we can simply disable preemption which
+ * may be free if the kernel is compiled without preemption.
+ */
+
+#define _CPU_READ(addr) \
+({ \
+ (__CPU_READ(addr)); \
+})
+
+#define _CPU_WRITE(addr, value) \
+({ \
+ __CPU_WRITE(addr, value); \
+})
+
+#define _CPU_ADD(addr, value) \
+({ \
+ preempt_disable(); \
+ __CPU_ADD(addr, value); \
+ preempt_enable(); \
+})
+
+#define _CPU_INC(addr) _CPU_ADD(addr, 1)
+#define _CPU_DEC(addr) _CPU_ADD(addr, -1)
+#define _CPU_SUB(addr, value) _CPU_ADD(addr, -(value))
+
+#define _CPU_CMPXCHG(addr, old, new) \
+({ \
+ typeof(addr) x; \
+ preempt_disable(); \
+ x = __CPU_CMPXCHG(addr, old, new); \
+ preempt_enable(); \
+ (x); \
+})
+
+#define _CPU_XCHG(addr, new) \
+({ \
+ typeof(addr) x; \
+ preempt_disable(); \
+ x = __CPU_XCHG(addr, new); \
+ preempt_enable(); \
+ (x); \
+})
+
+/*
+ * Interrupt safe CPU functions
+ */
+
+#define CPU_READ(addr) \
+({ \
+ (__CPU_READ(addr)); \
+})
+
+#define CPU_WRITE(addr, value) \
+({ \
+ __CPU_WRITE(addr, value); \
+})
+
+#define CPU_ADD(addr, value) \
+({ \
+ unsigned long flags; \
+ local_irq_save(flags); \
+ __CPU_ADD(addr, value); \
+ local_irq_restore(flags); \
+})
+
+#define CPU_INC(addr) CPU_ADD(addr, 1)
+#define CPU_DEC(addr) CPU_ADD(addr, -1)
+#define CPU_SUB(addr, value) CPU_ADD(addr, -(value))
+
+#define CPU_CMPXCHG(addr, old, new) \
+({ \
+ unsigned long flags; \
+ typeof(addr) x; \
+ local_irq_save(flags); \
+ x = __CPU_CMPXCHG(addr, old, new); \
+ local_irq_restore(flags); \
+ (x); \
+})
+
+#define CPU_XCHG(addr, new) \
+({ \
+ unsigned long flags; \
+ typeof(addr) x; \
+ local_irq_save(flags); \
+ x = __CPU_XCHG(addr, new); \
+ local_irq_restore(flags); \
+ (x); \
+})
+
+#endif /* CONFIG_FAST_CPU_OPS */
+
#endif /* __LINUX_PERCPU_H */
--