From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: clameter@sgi.com
Cc: ak@suse.de, akpm@linux-foundation.org, travis@sgi.com,
linux-kernel@vger.kernel.org
Subject: Re: [rfc 03/45] Generic CPU operations: Core piece
Date: Mon, 19 Nov 2007 22:17:52 -0500
Message-ID: <20071120031751.GA21743@Krystal>
In-Reply-To: <20071120011332.415903723@sgi.com>
Very interesting patch! I did not expect we could mix local atomic ops
with per CPU offsets in an atomic manner. Brilliant :)
Some nitpicking follows...
* clameter@sgi.com (clameter@sgi.com) wrote:
> Currently the per cpu subsystem is not able to use the atomic capabilities
> of the processors we have.
>
> This adds new functionality that allows the optimizing of per cpu variable
> handliong. It in particular provides a simple way to exploit atomic operations
handling
> to avoid having to disable itnerrupts or add a per cpu offset.
interrupts
>
> For example, current implementations may do
>
> unsigned long flags;
> struct stat_struct *p;
>
> local_irq_save(flags);
> /* Calculate address of per processor area */
> p = CPU_PTR(stat, smp_processor_id());
> p->counter++;
> local_irq_restore(flags);
>
> This whole segment can be replaced by a single CPU operation
>
> CPU_INC(stat->counter);
>
> And on most processors it is possible to perform the increment with
> a single processor instruction. Processors have segment registers,
> global registers and per cpu mappings of per cpu areas for that purpose.
>
> The problem is that the current schemes cannot utilize those features.
> local_t is not really addressing the issue since the offset calculation
> is not solved. local_t is x86 processor specific. This solution here
> can utilize other methods than just the x86 instruction set.
>
> On x86 the above CPU_INC translates into a single instruction:
>
> inc %%gs:(&stat->counter)
>
> This instruction is interrupt safe since it can either be completed
> or not.
>
> The determination of the correct per cpu area for the current processor
> does not require access to smp_processor_id() (expensive...). The gs
> register is used to provide a processor specific offset to the respective
> per cpu area where the per cpu variabvle resides.
variable
>
> Note tha the counter offset into the struct was added *before* the segment
that
> selector was added. This is necessary to avoid calculations. In the past
> we first determined the address of the stats structure on the respective
> processor and then added the field offset. However, the offset may as
> well be added earlier.
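To make the offset-ordering point concrete, here is a small user-space model (not part of the patch). The layout, stat_areas[], area_offset[], cpu_ptr() and inc_counter() are hypothetical stand-ins for the real per cpu areas and offset table. The model only shows that adding the field offset first and the per cpu offset last lands on the same address as the old order, which is what allows the field offset to be folded into a constant.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SIM 4	/* hypothetical cpu count for the model */

struct stat_struct {
	unsigned long other;
	unsigned long counter;
};

/* Hypothetical layout: one stat_struct per cpu, back to back.
 * stat_areas[0] (cpu 0) is the canonical base. */
static struct stat_struct stat_areas[NR_CPUS_SIM];
static ptrdiff_t area_offset[NR_CPUS_SIM];	/* bytes from cpu 0's copy */

static void init_area_offsets(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		area_offset[cpu] = (char *)&stat_areas[cpu] - (char *)&stat_areas[0];
}

/* Relocate a canonical (cpu 0) pointer into the area of @cpu. */
static void *cpu_ptr(void *canonical, int cpu)
{
	return (char *)canonical + area_offset[cpu];
}

/* Increment @cpu's copy of the counter, given the canonical struct pointer. */
static void inc_counter(struct stat_struct *stat, int cpu)
{
	/* Old order: locate this cpu's struct, then add the field offset. */
	struct stat_struct *s = cpu_ptr(stat, cpu);
	unsigned long *late = &s->counter;
	/* New order: field offset folded in first, per cpu offset added last. */
	unsigned long *early = cpu_ptr(&stat->counter, cpu);

	assert(late == early);
	(*early)++;
}
```

Since &stat->counter is known at link time for a DECLARE_PER_CPU variable, only the per cpu offset remains to be applied at run time, and that is the part a segment register or a per cpu mapping can supply.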
>
> If stat was declared via DECLARE_PER_CPU then this patchset is capoable of
capable
> convincing the linker to provide the proper base address. In that case
> no calculations are necessary.
>
> Should the stats structure be reachable via a register then the address
> calculation capabilities can be leveraged to avoid calculations.
>
> On IA64 the same will result in another single instruction using the
> fact that we have a virtual address that always maps to the local per cpu
> area.
>
> fetchadd &stat->counter + (VCPU_BASE - __per_cpu_base)
>
> The access is forced into the per cpu address reachable via the virtualized
> address. Again the counter field offset is eadded to the offset. The access
added
> is then similarly a single instruction, as on x86.
>
> In order to be able to exploit the atomicity of these instructions we
> introduce a series of new functions that take a BASE pointer (a pointer
> into the area of cpu 0 which is the canonical base).
>
> CPU_READ()
> CPU_WRITE()
> CPU_INC
> CPU_DEC
> CPU_ADD
> CPU_SUB
> CPU_XCHG
> CPU_CMPXCHG
>
>
>
>
>
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> ---
> include/linux/percpu.h | 156 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 156 insertions(+)
>
> Index: linux-2.6/include/linux/percpu.h
> ===================================================================
> --- linux-2.6.orig/include/linux/percpu.h 2007-11-18 22:13:51.773274119 -0800
> +++ linux-2.6/include/linux/percpu.h 2007-11-18 22:15:10.396773779 -0800
> @@ -190,4 +190,160 @@ void cpu_free(void *cpu_pointer, unsigne
> */
> void *boot_cpu_alloc(unsigned long size);
>
> +/*
> + * Fast Atomic per cpu operations.
> + *
> + * The following operations can be overridden by arches to implement fast
> + * and efficient operations. The operations are atomic meaning that the
> + * determination of the processor, the calculation of the address and the
> + * operation on the data is an atomic operation.
> + */
> +
> +#ifndef CONFIG_FAST_CPU_OPS
> +
> +/*
> + * The fallbacks are rather slow but they are safe
> + *
> + * The first group of macros is used when it is
> + * safe to update the per cpu variable because
> + * preemption is off (per cpu variables that are not
> + * updated from interrupt cointext) or because
context
> + * interrupts are already off.
> + */
> +
> +#define __CPU_READ(obj) \
> +({ \
> + typeof(obj) x; \
> + x = *THIS_CPU(&(obj)); \
> + (x); \
> +})
> +
> +#define __CPU_WRITE(obj, value) \
> +({ \
> + *THIS_CPU(&(obj)) = value; \
> +})
> +
> +#define __CPU_ADD(obj, value) \
> +({ \
> + *THIS_CPU(&(obj)) += value; \
> +})
> +
> +
> +#define __CPU_INC(addr) __CPU_ADD(addr, 1)
> +#define __CPU_DEC(addr) __CPU_ADD(addr, -1)
> +#define __CPU_SUB(addr, value) __CPU_ADD(addr, -(value))
> +
> +#define __CPU_CMPXCHG(obj, old, new) \
> +({ \
> + typeof(obj) x; \
> + typeof(obj) *p = THIS_CPU(&(obj)); \
> + x = *p; \
> + if (x == old) \
> + *p = new; \
I think you could use extra () around old, new, etc.?
> + (x); \
> +})
> +
> +#define __CPU_XCHG(obj, new) \
> +({ \
> + typeof(obj) x; \
> + typeof(obj) *p = THIS_CPU(&(obj)); \
> + x = *p; \
> + *p = new; \
Same here.
> + (x); \
() seems unneeded here, since x is local.
> +})
> +
> +/*
> + * Second group used for per cpu variables that
> + * are not updated from an interrupt context.
> + * In that case we can simply disable preemption which
> + * may be free if the kernel is compiled without preemption.
> + */
> +
> +#define _CPU_READ(addr) \
> +({ \
> + (__CPU_READ(addr)); \
> +})
({ }) seems to be unneeded here.
> +
> +#define _CPU_WRITE(addr, value) \
> +({ \
> + __CPU_WRITE(addr, value); \
> +})
And here.
> +
> +#define _CPU_ADD(addr, value) \
> +({ \
> + preempt_disable(); \
> + __CPU_ADD(addr, value); \
> + preempt_enable(); \
> +})
> +
Add ()
> +#define _CPU_INC(addr) _CPU_ADD(addr, 1)
> +#define _CPU_DEC(addr) _CPU_ADD(addr, -1)
> +#define _CPU_SUB(addr, value) _CPU_ADD(addr, -(value))
> +
> +#define _CPU_CMPXCHG(addr, old, new) \
> +({ \
> + typeof(addr) x; \
> + preempt_disable(); \
> + x = __CPU_CMPXCHG(addr, old, new); \
Add ()
> + preempt_enable(); \
> + (x); \
> +})
> +
> +#define _CPU_XCHG(addr, new) \
> +({ \
> + typeof(addr) x; \
> + preempt_disable(); \
> + x = __CPU_XCHG(addr, new); \
()
> + preempt_enable(); \
> + (x); \
() seems unneeded here, since x is local.
> +})
> +
> +/*
> + * Interrupt safe CPU functions
> + */
> +
> +#define CPU_READ(addr) \
> +({ \
> + (__CPU_READ(addr)); \
> +})
> +
Unnecessary ({ })
> +#define CPU_WRITE(addr, value) \
> +({ \
> + __CPU_WRITE(addr, value); \
> +})
> +
> +#define CPU_ADD(addr, value) \
> +({ \
> + unsigned long flags; \
> + local_irq_save(flags); \
> + __CPU_ADD(addr, value); \
> + local_irq_restore(flags); \
> +})
> +
> +#define CPU_INC(addr) CPU_ADD(addr, 1)
> +#define CPU_DEC(addr) CPU_ADD(addr, -1)
> +#define CPU_SUB(addr, value) CPU_ADD(addr, -(value))
> +
> +#define CPU_CMPXCHG(addr, old, new) \
> +({ \
> + unsigned long flags; \
> + typeof(addr) x; \
> + local_irq_save(flags); \
> + x = __CPU_CMPXCHG(addr, old, new); \
()
> + local_irq_restore(flags); \
> + (x); \
() seems unneeded here, since x is local.
> +})
> +
> +#define CPU_XCHG(addr, new) \
> +({ \
> + unsigned long flags; \
> + typeof(addr) x; \
> + local_irq_save(flags); \
> + x = __CPU_XCHG(addr, new); \
()
> + local_irq_restore(flags); \
> + (x); \
() seems unneeded here, since x is local.
> +})
> +
> +#endif /* CONFIG_FAST_CPU_OPS */
> +
> #endif /* __LINUX_PERCPU_H */
>
> --
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68