public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [rfc 00/45] [RFC] CPU ops and a rework of per cpu data handling on x86_64
@ 2007-11-20  1:11 clameter, Christoph Lameter
  2007-11-20  1:11 ` [rfc 01/45] ACPI: Avoid references to impossible processors clameter, Christoph Lameter
                   ` (45 more replies)
  0 siblings, 46 replies; 120+ messages in thread
From: clameter, Christoph Lameter @ 2007-11-20  1:11 UTC (permalink / raw)
  To: ak; +Cc: akpm, travis, Mathieu Desnoyers, linux-kernel

This is a pretty early draft stage of the patch. It works on
x86_64 only. Its a bit massive so I'd like to have some feedback
before proceeding (and maybe some help)?.

The support for other arches was not tested yet.

The patch establishes a new set of cpu operations that allow to
exploit single instruction atomicity to allow per cpu variable
modifications without disabling/enabling preempt or interrupts and
without the need to do an offset calculation in order to determine
the location of the variable on the current processor.

It then implements these operations on x86_64 after consolidating
per cpu access for allocpercpu, percpu and pda. All per
cpu data is then accessible via gs segment override.

This results in a reduction in code size of the kernel and in more efficient
operation of per cpu access.

Before:
   text    data     bss     dec     hex filename
   4041907  512371 1302360 5856638  595d7e vmlinux

After (this includes the code added for the cpu allocator!):

  text    data     bss     dec     hex filename
  3861532  527715 1298072 5687319  56c817 vmlinux


On x86_64 the segment override results in the following change for a simple
vm counter increment:

Before:

mov    %gs:0x8,%rdx		Get smp_processor_id
mov    tableoffset,%rax		Get table base
incq   varoffset(%rax,%rdx,1)	Perform the operation with a complex lookup
				adding the var offset

An interrupt or a reschedule action can move the execution thread to another
processor if interrupt or preempt is not disabled. Then the variable of
the wrong processor may be updated in a racy way.

After:

incq   %gs:varoffset(%rip)

Single instruction that is safe from interrupts or moving of the execution
thread. It will reliably operate on the current processors data area.

Other platforms can also perform address relocation plus atomic ops on
a memory location. Exploiting of the atomicity of instructions vs interrupts
is therefore possible and will reduce the cpu op processing overhead.

F.e on IA64 we have per cpu virtual mapping of the per cpu area. If
we add an offset to the per cpu area variable address then we can guarantee
that we always hit the per cpu areas local to a processor.

Other platforms (SPARC?) have registers that can be used to form addresses.
If the cpu area address is in one of those then atomic per cpu modifications
can be generated for those platforms in the same way.

Slub best performance in the fast fastpath goes from 47 cycles to 41 cycles
through the use of the segment override.


-- 

^ permalink raw reply	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2007-11-27  4:19 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-20  1:11 [rfc 00/45] [RFC] CPU ops and a rework of per cpu data handling on x86_64 clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 01/45] ACPI: Avoid references to impossible processors clameter, Christoph Lameter
2007-11-20 12:47   ` Mathieu Desnoyers
2007-11-20 20:16     ` Christoph Lameter
2007-11-20 15:29   ` Andi Kleen
2007-11-20 20:18     ` Christoph Lameter
2007-11-20  1:11 ` [rfc 02/45] cpu alloc: Simple version of the allocator (static allocations) clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 03/45] Generic CPU operations: Core piece clameter, Christoph Lameter
2007-11-20  3:17   ` Mathieu Desnoyers
2007-11-20  3:30     ` Christoph Lameter
2007-11-20  4:07       ` Mathieu Desnoyers
2007-11-20 20:36         ` Christoph Lameter
2007-11-20  1:11 ` [rfc 04/45] cpu alloc: Use in SLUB clameter, Christoph Lameter
2007-11-20 12:42   ` Mathieu Desnoyers
2007-11-20 20:44     ` Christoph Lameter
2007-11-20 21:23       ` Mathieu Desnoyers
2007-11-20 21:36         ` Christoph Lameter
2007-11-20 21:43           ` Mathieu Desnoyers
2007-11-20  1:11 ` [rfc 05/45] cpu alloc: Remove SLUB fields clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 06/45] cpu alloc: page allocator conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 07/45] cpu_alloc: Implement dynamically extendable cpu areas clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 08/45] cpu alloc: x86 support clameter, Christoph Lameter
2007-11-20  1:35   ` H. Peter Anvin
2007-11-20  2:02     ` Christoph Lameter
2007-11-20  2:18       ` H. Peter Anvin
2007-11-20  3:37       ` Nick Piggin
2007-11-20  3:59       ` Nick Piggin
2007-11-20 12:05         ` Andi Kleen
2007-11-20  3:16   ` Andi Kleen
2007-11-20  3:50     ` Christoph Lameter
2007-11-20 12:01       ` Andi Kleen
2007-11-20 20:35         ` Christoph Lameter
2007-11-20 20:59           ` Andi Kleen
2007-11-20 21:33             ` Christoph Lameter
2007-11-21  0:10               ` Christoph Lameter
2007-11-21  1:16                 ` Christoph Lameter
2007-11-21  1:36                   ` Andi Kleen
2007-11-21  2:08                     ` Christoph Lameter
2007-11-21 13:08                       ` Andi Kleen
2007-11-21 19:01                         ` Christoph Lameter
2007-11-20 20:43         ` H. Peter Anvin
2007-11-20 20:51           ` Andi Kleen
2007-11-20 20:58             ` Christoph Lameter
2007-11-20 21:06               ` H. Peter Anvin
2007-11-20 21:34                 ` Christoph Lameter
2007-11-20 21:01             ` H. Peter Anvin
2007-11-27  4:12         ` John Richard Moser
2007-11-20  1:11 ` [rfc 09/45] cpu alloc: IA64 support clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 10/45] cpu_alloc: Sparc64 support clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 11/45] cpu alloc: percpu_counter conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 12/45] cpu alloc: crash_notes conversion clameter, Christoph Lameter
2007-11-20 13:03   ` Mathieu Desnoyers
2007-11-20 20:50     ` Christoph Lameter
2007-11-20  1:11 ` [rfc 13/45] cpu alloc: workqueue conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 14/45] cpu alloc: ACPI cstate handling conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 15/45] cpu alloc: genhd statistics conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 16/45] cpu alloc: blktrace conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 17/45] cpu alloc: SRCU clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 18/45] cpu alloc: XFS counters clameter, Christoph Lameter
2007-11-20  8:12   ` Christoph Hellwig
2007-11-20 20:38     ` Christoph Lameter
2007-11-21  4:47       ` David Chinner
2007-11-21  4:50         ` Christoph Lameter
2007-11-20  1:11 ` [rfc 19/45] cpu alloc: NFS statistics clameter, Christoph Lameter
2007-11-20 13:02   ` Mathieu Desnoyers
2007-11-20 20:49     ` Christoph Lameter
2007-11-20 20:56       ` Trond Myklebust
2007-11-20 21:28         ` Mathieu Desnoyers
2007-11-20 21:48           ` Trond Myklebust
2007-11-20 21:50             ` Mathieu Desnoyers
2007-11-20 22:46               ` Trond Myklebust
2007-11-21  0:53                 ` Mathieu Desnoyers
2007-11-20 21:26       ` Mathieu Desnoyers
2007-11-20  1:11 ` [rfc 20/45] cpu alloc: neigbour statistics clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 21/45] cpu alloc: tcp statistics clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 22/45] cpu alloc: convert scatches clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 23/45] cpu alloc: dmaengine conversion clameter, Christoph Lameter
2007-11-20 12:50   ` Mathieu Desnoyers
2007-11-20 20:46     ` Christoph Lameter
2007-11-20  1:11 ` [rfc 24/45] cpu alloc: convert loopback statistics clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 25/45] cpu alloc: veth conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 26/45] cpu alloc: Chelsio statistics conversion clameter, Christoph Lameter
2007-11-20  1:11 ` [rfc 27/45] cpu alloc: convert mib handling to cpu alloc clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 28/45] cpu_alloc: convert network sockets clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 29/45] cpu alloc: Use for infiniband clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 30/45] cpu alloc: Use in the crypto subsystem clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 31/45] cpu alloc: Remove the allocpercpu functionality clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 32/45] Module handling: Use CPU_xx ops to dynamically allocate counters clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 33/45] x86_64: Use CPU ops for nmi alert counter clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 34/45] x86_64: Fold percpu area into the cpu area clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 35/45] X86_64: Declare pda as per cpu data thereby moving it " clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 36/45] X86_64: Place pda first in " clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 37/45] x86_64: Support for fast per cpu operations clameter, Christoph Lameter
2007-11-20  2:00   ` H. Peter Anvin
2007-11-20  2:03     ` Christoph Lameter
2007-11-20  2:15       ` H. Peter Anvin
2007-11-20  2:17     ` David Miller
2007-11-20  2:19       ` H. Peter Anvin
2007-11-20  3:23         ` Andi Kleen
2007-11-20  2:45     ` Paul Mackerras
2007-11-20  1:12 ` [rfc 38/45] x86_64: Remove obsolete per_cpu offset calculations clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 39/45] x86_64: Remove the data_offset field from the pda clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 40/45] x86_64: Provide per_cpu_var definition clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 41/45] VM statistics: Use CPU ops clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 43/45] x86_64: Add a CPU_OR to support or_pda() clameter, Christoph Lameter
2007-11-20  1:12 ` [rfc 44/45] Remove local_t support clameter, Christoph Lameter
2007-11-20 12:59   ` Mathieu Desnoyers
2007-11-20 20:48     ` Christoph Lameter
2007-11-20  1:12 ` [rfc 45/45] Modules: Hack to handle symbols that have a zero value clameter, Christoph Lameter
2007-11-20  2:20   ` Mathieu Desnoyers
2007-11-20  2:49     ` Christoph Lameter
2007-11-20  3:29       ` Mathieu Desnoyers
2007-11-20  1:18 ` [rfc 00/45] [RFC] CPU ops and a rework of per cpu data handling on x86_64 Christoph Lameter
2007-11-20  1:51 ` David Miller
2007-11-20  1:59   ` Christoph Lameter
2007-11-20  2:10     ` David Miller
2007-11-20  2:12       ` Christoph Lameter
2007-11-20  3:25   ` Andi Kleen
2007-11-20  3:33     ` Christoph Lameter
2007-11-20  4:04     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox