From: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
David Miller <davem@davemloft.net>,
Eric Dumazet <dada1@cosmosbay.com>,
Peter Zijlstra <peterz@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Mike Travis <travis@sgi.com>
Subject: Re: [patch 04/41] cpu ops: Core piece for generic atomic per cpu operations
Date: Thu, 29 May 2008 21:58:44 -0700 [thread overview]
Message-ID: <20080529215844.609a3ac8.akpm@linux-foundation.org> (raw)
In-Reply-To: <20080530040011.727424512@sgi.com>
On Thu, 29 May 2008 20:56:24 -0700 Christoph Lameter <clameter@sgi.com> wrote:
> Currently the per cpu subsystem is not able to use the atomic capabilities
> that are provided by many of the available processors.
>
> This patch adds new functionality that allows the optimizing of per cpu
> variable handling. In particular it provides a simple way to exploit
> atomic operations in order to avoid having to disable interrupts or
> perform address calculations when accessing per cpu data.
>
> E.g. using our current methods we may do:
>
> unsigned long flags;
> struct stat_struct *p;
>
> local_irq_save(flags);
> /* Calculate address of per processor area */
> p = CPU_PTR(stat, smp_processor_id());
> p->counter++;
> local_irq_restore(flags);
eh? That's what local_t is for?
> The segment can be replaced by a single atomic CPU operation:
>
> CPU_INC(stat->counter);
hm, I guess this _has_ to be implemented as a macro. ho hum. But
please: "cpu_inc"?
> Most processors have instructions to perform the increment as a
> single atomic instruction. Processors may have segment registers,
> global registers or per cpu mappings of per cpu areas that can be used
> to generate atomic instructions that combine the following in a single
> operation:
>
> 1. Adding of an offset / register to a base address
> 2. Read modify write operation on the address calculated by
> the instruction.
>
> If 1+2 are combined in an instruction then the instruction is atomic
> vs interrupts. This means that percpu atomic operations do not need
> to disable interrupts to increment counters etc.
>
> The existing methods in use in the kernel cannot utilize the power of
> these atomic instructions. local_t does not really address the issue
> since the offset calculation is performed before the atomic operation.
> The operation is therefore not atomic. Disabling interrupts or
> preemption is required in order to use local_t.
Your terminology is totally confusing here.
To me, an "atomic operation" is one which is atomic wrt other CPUs:
atomic_t, for example.
Here we're talking about atomic-wrt-this-cpu-only, yes?
If so, we should invent a new term for that different concept and stick
to it like glue. How about "self-atomic"? Or "locally-atomic" in
deference to the existing local_t?
> local_t is also very specific to the x86 processor.
And alpha, m32r, mips and powerpc, methinks. Probably others, but
people just haven't got around to it.
> The solution here can
> utilize other methods than just those provided by the x86 instruction set.
>
>
>
> On x86 the above CPU_INC translates into a single instruction:
>
> inc %%gs:(&stat->counter)
>
> This instruction is interrupt safe since it either completes entirely
> or not at all. Both the adding of the offset and the read modify write
> are combined in one instruction.
>
> The determination of the correct per cpu area for the current processor
> does not require access to smp_processor_id() (expensive...). The gs
> register is used to provide a processor specific offset to the respective
> per cpu area where the per cpu variable resides.
>
> Note that the counter offset into the struct was added *before* the
> segment selector was applied. This is necessary to avoid a separate
> address calculation. In the past we first determined the address of
> the stats structure on the respective processor and then added the
> field offset. However, the field offset may just as well be added
> earlier. The adding of the per cpu offset (here through the gs
> register) must be done by the instruction used for atomic per cpu
> access.
>
>
>
> If "stat" was declared via DECLARE_PER_CPU then this patchset is capable of
> convincing the linker to provide the proper base address. In that case
> no calculations are necessary.
>
> Should the stat structure be reachable via a register then the
> addressing modes of the instruction can be leveraged to avoid a
> separate address calculation.
>
> On IA64 we can get the same combination of operations in a single instruction
> by using the virtual address that always maps to the local per cpu area:
>
> fetchadd &stat->counter + (VCPU_BASE - __per_cpu_start)
>
> The access is forced into the per cpu address reachable via the virtualized
> address. IA64 allows the embedding of an offset into the instruction. So the
> fetchadd can perform both the relocation of the pointer into the per cpu
> area as well as the atomic read modify write cycle.
>
>
>
> In order to be able to exploit the atomicity of these instructions we
> introduce a series of new functions that take either:
>
> 1. A per cpu pointer as returned by cpu_alloc() or CPU_ALLOC().
>
> 2. A per cpu variable address as returned by per_cpu_var(<percpuvarname>).
>
> CPU_READ()
> CPU_WRITE()
> CPU_INC()
> CPU_DEC()
> CPU_ADD()
> CPU_SUB()
> CPU_XCHG()
> CPU_CMPXCHG()
>
I think I'll need to come back another time to understand all that ;)
Thanks for writing it up carefully.
>
> ---
> include/linux/percpu.h | 135 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 135 insertions(+)
>
> Index: linux-2.6/include/linux/percpu.h
> ===================================================================
> --- linux-2.6.orig/include/linux/percpu.h 2008-05-28 22:31:43.000000000 -0700
> +++ linux-2.6/include/linux/percpu.h 2008-05-28 23:38:17.000000000 -0700
I wonder if all this stuff should be in a new header file.
We could get lazy and include that header from percpu.h if needed.
> @@ -179,4 +179,139 @@
> void *cpu_alloc(unsigned long size, gfp_t flags, unsigned long align);
> void cpu_free(void *cpu_pointer, unsigned long size);
>
> +/*
> + * Fast atomic per cpu operations.
> + *
> + * The following operations can be overridden by arches to implement fast
> + * and efficient operations. The operations are atomic, meaning that the
> + * determination of the processor, the calculation of the address and the
> + * operation on the data form one atomic operation.
> + *
> + * The parameter passed to the atomic per cpu operations is an lvalue not a
> + * pointer to the object.
> + */
> +#ifndef CONFIG_HAVE_CPU_OPS
If you move this functionality into a new cpu_alloc.h then the below
code goes into include/asm-generic/cpu_alloc.h and most architectures'
include/asm/cpu_alloc.h will include asm-generic/cpu_alloc.h.
include/linux/percpu.h can still include linux/cpu_alloc.h (which
includes asm/cpu_alloc.h) if needed. But it would be better to just
teach the .c files to include <linux/cpu_alloc.h>
> +/*
> + * Fallback in case the arch does not provide for atomic per cpu operations.
> + *
> + * The first group of macros is used when it is safe to update the per
> + * cpu variable because preemption is off (per cpu variables that are not
> + * updated from interrupt context) or because interrupts are already off.
> + */
> +#define __CPU_READ(var) \
> +({ \
> + (*THIS_CPU(&(var))); \
> +})
> +
> +#define __CPU_WRITE(var, value) \
> +({ \
> + *THIS_CPU(&(var)) = (value); \
> +})
> +
> +#define __CPU_ADD(var, value) \
> +({ \
> + *THIS_CPU(&(var)) += (value); \
> +})
> +
> +#define __CPU_INC(var) __CPU_ADD((var), 1)
> +#define __CPU_DEC(var) __CPU_ADD((var), -1)
> +#define __CPU_SUB(var, value) __CPU_ADD((var), -(value))
> +
> +#define __CPU_CMPXCHG(var, old, new) \
> +({ \
> + typeof(var) x; \
> + typeof(var) *p = THIS_CPU(&(var)); \
> + x = *p; \
> + if (x == (old)) \
> + *p = (new); \
> + (x); \
> +})
> +
> +#define __CPU_XCHG(var, new) \
> +({ \
> + typeof(var) x; \
> + typeof(var) *p = THIS_CPU(&(var)); \
> + x = *p; \
> + *p = (new); \
> + (x); \
> +})
> +
> +/*
> + * Second group used for per cpu variables that are not updated from an
> + * interrupt context. In that case we can simply disable preemption which
> + * may be free if the kernel is compiled without support for preemption.
> + */
> +#define _CPU_READ __CPU_READ
> +#define _CPU_WRITE __CPU_WRITE
> +
> +#define _CPU_ADD(var, value) \
> +({ \
> + preempt_disable(); \
> + __CPU_ADD((var), (value)); \
> + preempt_enable(); \
> +})
> +
> +#define _CPU_INC(var) _CPU_ADD((var), 1)
> +#define _CPU_DEC(var) _CPU_ADD((var), -1)
> +#define _CPU_SUB(var, value) _CPU_ADD((var), -(value))
> +
> +#define _CPU_CMPXCHG(var, old, new) \
> +({ \
> + typeof(var) x; \
> + preempt_disable(); \
> + x = __CPU_CMPXCHG((var), (old), (new)); \
> + preempt_enable(); \
> + (x); \
> +})
> +
> +#define _CPU_XCHG(var, new) \
> +({ \
> + typeof(var) x; \
> + preempt_disable(); \
> + x = __CPU_XCHG((var), (new)); \
> + preempt_enable(); \
> + (x); \
> +})
> +
> +/*
> + * Third group: Interrupt safe CPU functions
> + */
> +#define CPU_READ __CPU_READ
> +#define CPU_WRITE __CPU_WRITE
> +
> +#define CPU_ADD(var, value) \
> +({ \
> + unsigned long flags; \
> + local_irq_save(flags); \
> + __CPU_ADD((var), (value)); \
> + local_irq_restore(flags); \
> +})
> +
> +#define CPU_INC(var) CPU_ADD((var), 1)
> +#define CPU_DEC(var) CPU_ADD((var), -1)
> +#define CPU_SUB(var, value) CPU_ADD((var), -(value))
> +
> +#define CPU_CMPXCHG(var, old, new) \
> +({ \
> + unsigned long flags; \
> + typeof(var) x; \
> + local_irq_save(flags); \
> + x = __CPU_CMPXCHG((var), (old), (new)); \
> + local_irq_restore(flags); \
> + (x); \
> +})
> +
> +#define CPU_XCHG(var, new) \
> +({ \
> + unsigned long flags; \
> + typeof(var) x; \
> + local_irq_save(flags); \
> + x = __CPU_XCHG((var), (new)); \
> + local_irq_restore(flags); \
> + (x); \
> +})
> +
> +#endif /* CONFIG_HAVE_CPU_OPS */
> +
> #endif /* __LINUX_PERCPU_H */