[patch 04/41] cpu ops: Core piece for generic atomic per cpu operations


From: Christoph Lameter <clameter@sgi.com>
To: akpm@linux-foundation.org
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <dada1@cosmosbay.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mike Travis <travis@sgi.com>
Subject: [patch 04/41] cpu ops: Core piece for generic atomic per cpu operations
Date: Thu, 29 May 2008 20:56:24 -0700
Message-ID: <20080530040011.727424512@sgi.com>
In-Reply-To: 20080530035620.587204923@sgi.com

Currently the per cpu subsystem is not able to use the atomic capabilities
that are provided by many of the available processors.

This patch adds new functionality that allows per cpu variable handling
to be optimized. In particular, it provides a simple way to exploit
atomic operations in order to avoid disabling interrupts or performing
address calculations to access per cpu data.

For example, using our current methods we may do:

	unsigned long flags;
	struct stat_struct *p;

	local_irq_save(flags);
	/* Calculate address of per processor area */
	p = CPU_PTR(stat, smp_processor_id());
	p->counter++;
	local_irq_restore(flags);

The segment can be replaced by a single atomic CPU operation:

	CPU_INC(stat->counter);

Most processors can perform the increment with a single atomic
instruction. Processors may have segment registers, global registers or
per cpu mappings of per cpu areas that can be used to generate atomic
instructions that combine the following in a single operation:

1. Adding of an offset / register to a base address
2. Read modify write operation on the address calculated by
   the instruction.

If 1 and 2 are combined in one instruction, then the instruction is
atomic with respect to interrupts. This means that percpu atomic
operations do not need to disable interrupts to increment counters etc.

The existing methods in use in the kernel cannot utilize the power of
these atomic instructions. local_t does not really address the issue
since the offset calculation is performed before the atomic operation,
so the combined operation is not atomic. Disabling interrupts or
preemption is therefore required in order to use local_t.

local_t is also very specific to the x86 processor. The solution here can
utilize other methods than just those provided by the x86 instruction set.

On x86 the above CPU_INC translates into a single instruction:

	inc %%gs:(&stat->counter)

This instruction is interrupt safe since it either completes or does
not. Both the adding of the offset and the read modify write are
combined in one instruction.

The determination of the correct per cpu area for the current processor
does not require access to smp_processor_id() (expensive...). The gs
register is used to provide a processor specific offset to the respective
per cpu area where the per cpu variable resides.

Note that the offset of the counter within the struct was added *before*
the segment selector was applied. This is necessary to avoid separate
calculations. In the past we first determined the address of the stats
structure on the respective processor and then added the field offset.
However, the field offset may just as well be added earlier. The adding
of the per cpu offset (here through the gs register) must be done by the
instruction used for atomic per cpu access.

If "stat" was declared via DECLARE_PER_CPU then this patchset is capable of
convincing the linker to provide the proper base address. In that case
no calculations are necessary.

If the stat structure is reachable via a register, then the address
calculation capabilities of the instruction can be leveraged to avoid
separate calculations.

On IA64 we can get the same combination of operations in a single instruction
by using the virtual address that always maps to the local per cpu area:

	fetchadd &stat->counter + (VCPU_BASE - __per_cpu_start)

The access is forced into the per cpu address reachable via the virtualized
address. IA64 allows the embedding of an offset into the instruction. So the
fetchadd can perform both the relocation of the pointer into the per cpu
area as well as the atomic read modify write cycle.

In order to be able to exploit the atomicity of these instructions we
introduce a series of new operations. Each takes an lvalue (not a
pointer) derived from either:

1. A per cpu pointer as returned by cpu_alloc() or CPU_ALLOC().

2. A per cpu variable address as returned by per_cpu_var(<percpuvarname>).

The operations are:

CPU_READ()
CPU_WRITE()
CPU_INC()
CPU_DEC()
CPU_ADD()
CPU_SUB()
CPU_XCHG()
CPU_CMPXCHG()

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/percpu.h |  135 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2008-05-28 22:31:43.000000000 -0700
+++ linux-2.6/include/linux/percpu.h	2008-05-28 23:38:17.000000000 -0700
@@ -179,4 +179,139 @@
 void *cpu_alloc(unsigned long size, gfp_t flags, unsigned long align);
 void cpu_free(void *cpu_pointer, unsigned long size);

+/*
+ * Fast atomic per cpu operations.
+ *
+ * The following operations can be overridden by arches to implement fast
+ * and efficient operations. The operations are atomic, meaning that the
+ * determination of the processor, the calculation of the address and the
+ * operation on the data are performed as one atomic operation.
+ *
+ * The parameter passed to the atomic per cpu operations is an lvalue, not a
+ * pointer to the object.
+ */
+#ifndef CONFIG_HAVE_CPU_OPS
+
+/*
+ * Fallback in case the arch does not provide for atomic per cpu operations.
+ *
+ * The first group of macros is used when it is safe to update the per
+ * cpu variable because preemption is off (per cpu variables that are not
+ * updated from interrupt context) or because interrupts are already off.
+ */
+#define __CPU_READ(var)				\
+({						\
+	(*THIS_CPU(&(var)));			\
+})
+
+#define __CPU_WRITE(var, value)			\
+({						\
+	*THIS_CPU(&(var)) = (value);		\
+})
+
+#define __CPU_ADD(var, value)			\
+({						\
+	*THIS_CPU(&(var)) += (value);		\
+})
+
+#define __CPU_INC(var) __CPU_ADD((var), 1)
+#define __CPU_DEC(var) __CPU_ADD((var), -1)
+#define __CPU_SUB(var, value) __CPU_ADD((var), -(value))
+
+#define __CPU_CMPXCHG(var, old, new)		\
+({						\
+	typeof(var) x;				\
+	typeof(var) *p = THIS_CPU(&(var));	\
+	x = *p;					\
+	if (x == (old))				\
+		*p = (new);			\
+	(x);					\
+})
+
+#define __CPU_XCHG(var, new)			\
+({						\
+	typeof(var) x;				\
+	typeof(var) *p = THIS_CPU(&(var));	\
+	x = *p;					\
+	*p = (new);				\
+	(x);					\
+})
+
+/*
+ * Second group used for per cpu variables that are not updated from an
+ * interrupt context. In that case we can simply disable preemption which
+ * may be free if the kernel is compiled without support for preemption.
+ */
+#define _CPU_READ __CPU_READ
+#define _CPU_WRITE __CPU_WRITE
+
+#define _CPU_ADD(var, value)			\
+({						\
+	preempt_disable();			\
+	__CPU_ADD((var), (value));		\
+	preempt_enable();			\
+})
+
+#define _CPU_INC(var) _CPU_ADD((var), 1)
+#define _CPU_DEC(var) _CPU_ADD((var), -1)
+#define _CPU_SUB(var, value) _CPU_ADD((var), -(value))
+
+#define _CPU_CMPXCHG(var, old, new)		\
+({						\
+	typeof(var) x;				\
+	preempt_disable();			\
+	x = __CPU_CMPXCHG((var), (old), (new));	\
+	preempt_enable();			\
+	(x);					\
+})
+
+#define _CPU_XCHG(var, new)			\
+({						\
+	typeof(var) x;				\
+	preempt_disable();			\
+	x = __CPU_XCHG((var), (new));		\
+	preempt_enable();			\
+	(x);					\
+})
+
+/*
+ * Third group: Interrupt safe CPU functions
+ */
+#define CPU_READ __CPU_READ
+#define CPU_WRITE __CPU_WRITE
+
+#define CPU_ADD(var, value)			\
+({						\
+	unsigned long flags;			\
+	local_irq_save(flags);			\
+	__CPU_ADD((var), (value));		\
+	local_irq_restore(flags);		\
+})
+
+#define CPU_INC(var) CPU_ADD((var), 1)
+#define CPU_DEC(var) CPU_ADD((var), -1)
+#define CPU_SUB(var, value) CPU_ADD((var), -(value))
+
+#define CPU_CMPXCHG(var, old, new)		\
+({						\
+	unsigned long flags;			\
+	typeof(var) x;				\
+	local_irq_save(flags);			\
+	x = __CPU_CMPXCHG((var), (old), (new));	\
+	local_irq_restore(flags);		\
+	(x);					\
+})
+
+#define CPU_XCHG(var, new)			\
+({						\
+	unsigned long flags;			\
+	typeof(var) x;				\
+	local_irq_save(flags);			\
+	x = __CPU_XCHG((var), (new));		\
+	local_irq_restore(flags);		\
+	(x);					\
+})
+
+#endif /* CONFIG_HAVE_CPU_OPS */
+
 #endif /* __LINUX_PERCPU_H */

--
