public inbox for linux-arch@vger.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <dada1@cosmosbay.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mike Travis <travis@sgi.com>
Subject: Re: [patch 04/41] cpu ops: Core piece for generic atomic per cpu operations
Date: Thu, 29 May 2008 21:58:44 -0700	[thread overview]
Message-ID: <20080529215844.609a3ac8.akpm@linux-foundation.org> (raw)
In-Reply-To: <20080530040011.727424512@sgi.com>

On Thu, 29 May 2008 20:56:24 -0700 Christoph Lameter <clameter@sgi.com> wrote:

> Currently the per cpu subsystem is not able to use the atomic capabilities
> that are provided by many of the available processors.
> 
> This patch adds new functionality that allows optimizing per cpu
> variable handling. In particular it provides a simple way to exploit
> atomic operations in order to avoid having to disable interrupts or
> perform address calculations when accessing per cpu data.
> 
> F.e. Using our current methods we may do
> 
> 	unsigned long flags;
> 	struct stat_struct *p;
> 
> 	local_irq_save(flags);
> 	/* Calculate address of per processor area */
> 	p = CPU_PTR(stat, smp_processor_id());
> 	p->counter++;
> 	local_irq_restore(flags);

eh?  That's what local_t is for?

> The segment can be replaced by a single atomic CPU operation:
> 
> 	CPU_INC(stat->counter);

hm, I guess this _has_ to be implemented as a macro.  ho hum.  But
please: "cpu_inc"?

> Most processors have instructions to perform the increment using
> a single atomic instruction. Processors may have segment registers,
> global registers or per cpu mappings of per cpu areas that can be used
> to generate atomic instructions that combine the following in a single
> operation:
> 
> 1. Adding of an offset / register to a base address
> 2. Read modify write operation on the address calculated by
>    the instruction.
> 
> If 1+2 are combined in an instruction then the instruction is atomic
> vs interrupts. This means that percpu atomic operations do not need
> to disable interrupts to increment counters etc.
> 
> The existing methods in use in the kernel cannot utilize the power of
> these atomic instructions. local_t is not really addressing the issue
> since the offset calculation is performed before the atomic operation.
> The operation is therefore not atomic. Disabling interrupts or
> preemption is required in order to use local_t.

Your terminology is totally confusing here.

To me, an "atomic operation" is one which is atomic wrt other CPUs:
atomic_t, for example.

Here we're talking about atomic-wrt-this-cpu-only, yes?

If so, we should invent a new term for that different concept and stick
to it like glue.  How about "self-atomic"?  Or "locally-atomic" in
deference to the existing local_t?

> local_t is also very specific to the x86 processor.

And alpha, m32r, mips and powerpc, methinks.  Probably others, but
people just haven't got around to it.

> The solution here can
> utilize other methods than just those provided by the x86 instruction set.
> 
> 
> 
> On x86 the above CPU_INC translates into a single instruction:
> 
> 	inc %gs:(&stat->counter)
> 
> This instruction is interrupt safe since it can either be completed
> or not. Both adding of the offset and the read modify write are combined
> in one instruction.
> 
> The determination of the correct per cpu area for the current processor
> does not require access to smp_processor_id() (expensive...). The gs
> register is used to provide a processor specific offset to the respective
> per cpu area where the per cpu variable resides.
> 
> Note that the counter offset into the struct was added *before* the segment
> selector was added. This is necessary to avoid calculations.  In the past
> we first determined the address of the stats structure on the respective
> processor and then added the field offset. However, the offset may as
> well be added earlier. The adding of the per cpu offset (here through the
> gs register) must be done by the instruction used for atomic per cpu
> access.
> 
> 
> 
> If "stat" was declared via DECLARE_PER_CPU then this patchset is capable of
> convincing the linker to provide the proper base address. In that case
> no calculations are necessary.
> 
> Should the stat structure be reachable via a register then the address
> calculation capabilities can be leveraged to avoid calculations.
> 
> On IA64 we can get the same combination of operations in a single instruction
> by using the virtual address that always maps to the local per cpu area:
> 
> 	fetchadd &stat->counter + (VCPU_BASE - __per_cpu_start)
> 
> The access is forced into the per cpu address reachable via the virtualized
> address. IA64 allows the embedding of an offset into the instruction. So the
> fetchadd can perform both the relocation of the pointer into the per cpu
> area as well as the atomic read modify write cycle.
> 
> 
> 
> In order to be able to exploit the atomicity of these instructions we
> introduce a series of new functions that take either:
> 
> 1. A per cpu pointer as returned by cpu_alloc() or CPU_ALLOC().
> 
> 2. A per cpu variable address as returned by per_cpu_var(<percpuvarname>).
> 
> CPU_READ()
> CPU_WRITE()
> CPU_INC
> CPU_DEC
> CPU_ADD
> CPU_SUB
> CPU_XCHG
> CPU_CMPXCHG
> 

I think I'll need to come back another time to understand all that ;)

Thanks for writing it up carefully.

> 
> ---
>  include/linux/percpu.h |  135 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 135 insertions(+)
> 
> Index: linux-2.6/include/linux/percpu.h
> ===================================================================
> --- linux-2.6.orig/include/linux/percpu.h	2008-05-28 22:31:43.000000000 -0700
> +++ linux-2.6/include/linux/percpu.h	2008-05-28 23:38:17.000000000 -0700

I wonder if all this stuff should be in a new header file.

We could get lazy and include that header from percpu.h if needed.

> @@ -179,4 +179,139 @@
>  void *cpu_alloc(unsigned long size, gfp_t flags, unsigned long align);
>  void cpu_free(void *cpu_pointer, unsigned long size);
>  
> +/*
> + * Fast atomic per cpu operations.
> + *
> + * The following operations can be overridden by arches to implement fast
> + * and efficient operations. The operations are atomic meaning that the
> + * determination of the processor, the calculation of the address and the
> + * operation on the data is an atomic operation.
> + *
> + * The parameter passed to the atomic per cpu operations is an lvalue not a
> + * pointer to the object.
> + */
> +#ifndef CONFIG_HAVE_CPU_OPS

If you move this functionality into a new cpu_alloc.h then the below
code goes into include/asm-generic/cpu_alloc.h and most architectures'
include/asm/cpu_alloc.h will include asm-generic/cpu_alloc.h.

include/linux/percpu.h can still include linux/cpu_alloc.h (which
includes asm/cpu_alloc.h) if needed.  But it would be better to just
teach the .c files to include <linux/cpu_alloc.h>.

> +/*
> + * Fallback in case the arch does not provide for atomic per cpu operations.
> + *
> + * The first group of macros is used when it is safe to update the per
> + * cpu variable because preemption is off (per cpu variables that are not
> + * updated from interrupt context) or because interrupts are already off.
> + */
> +#define __CPU_READ(var)				\
> +({						\
> +	(*THIS_CPU(&(var)));			\
> +})
> +
> +#define __CPU_WRITE(var, value)			\
> +({						\
> +	*THIS_CPU(&(var)) = (value);		\
> +})
> +
> +#define __CPU_ADD(var, value)			\
> +({						\
> +	*THIS_CPU(&(var)) += (value);		\
> +})
> +
> +#define __CPU_INC(var) __CPU_ADD((var), 1)
> +#define __CPU_DEC(var) __CPU_ADD((var), -1)
> +#define __CPU_SUB(var, value) __CPU_ADD((var), -(value))
> +
> +#define __CPU_CMPXCHG(var, old, new)		\
> +({						\
> +	typeof(var) x;				\
> +	typeof(var) *p = THIS_CPU(&(var));	\
> +	x = *p;					\
> +	if (x == (old))				\
> +		*p = (new);			\
> +	(x);					\
> +})
> +
> +#define __CPU_XCHG(obj, new)			\
> +({						\
> +	typeof(obj) x;				\
> +	typeof(obj) *p = THIS_CPU(&(obj));	\
> +	x = *p;					\
> +	*p = (new);				\
> +	(x);					\
> +})
> +
> +/*
> + * Second group used for per cpu variables that are not updated from an
> + * interrupt context. In that case we can simply disable preemption which
> + * may be free if the kernel is compiled without support for preemption.
> + */
> +#define _CPU_READ __CPU_READ
> +#define _CPU_WRITE __CPU_WRITE
> +
> +#define _CPU_ADD(var, value)			\
> +({						\
> +	preempt_disable();			\
> +	__CPU_ADD((var), (value));		\
> +	preempt_enable();			\
> +})
> +
> +#define _CPU_INC(var) _CPU_ADD((var), 1)
> +#define _CPU_DEC(var) _CPU_ADD((var), -1)
> +#define _CPU_SUB(var, value) _CPU_ADD((var), -(value))
> +
> +#define _CPU_CMPXCHG(var, old, new)		\
> +({						\
> +	typeof(var) x;				\
> +	preempt_disable();			\
> +	x = __CPU_CMPXCHG((var), (old), (new));	\
> +	preempt_enable();			\
> +	(x);					\
> +})
> +
> +#define _CPU_XCHG(var, new)			\
> +({						\
> +	typeof(var) x;				\
> +	preempt_disable();			\
> +	x = __CPU_XCHG((var), (new));		\
> +	preempt_enable();			\
> +	(x);					\
> +})
> +
> +/*
> + * Third group: Interrupt safe CPU functions
> + */
> +#define CPU_READ __CPU_READ
> +#define CPU_WRITE __CPU_WRITE
> +
> +#define CPU_ADD(var, value)			\
> +({						\
> +	unsigned long flags;			\
> +	local_irq_save(flags);			\
> +	__CPU_ADD((var), (value));		\
> +	local_irq_restore(flags);		\
> +})
> +
> +#define CPU_INC(var) CPU_ADD((var), 1)
> +#define CPU_DEC(var) CPU_ADD((var), -1)
> +#define CPU_SUB(var, value) CPU_ADD((var), -(value))
> +
> +#define CPU_CMPXCHG(var, old, new)		\
> +({						\
> +	unsigned long flags;			\
> +	typeof(var) x;				\
> +	local_irq_save(flags);			\
> +	x = __CPU_CMPXCHG((var), (old), (new));	\
> +	local_irq_restore(flags);		\
> +	(x);					\
> +})
> +
> +#define CPU_XCHG(var, new)			\
> +({						\
> +	unsigned long flags;			\
> +	typeof(var) x;				\
> +	local_irq_save(flags);			\
> +	x = __CPU_XCHG((var), (new));		\
> +	local_irq_restore(flags);		\
> +	(x);					\
> +})
> +
> +#endif /* CONFIG_HAVE_CPU_OPS */
> +
>  #endif /* __LINUX_PERCPU_H */

