From: Christoph Lameter <clameter@sgi.com>
To: akpm@linux-foundation.org
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
David Miller <davem@davemloft.net>,
Eric Dumazet <dada1@cosmosbay.com>,
Peter Zijlstra <peterz@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Mike Travis <travis@sgi.com>
Subject: [patch 00/41] cpu alloc / cpu ops v3: Optimize per cpu access
Date: Thu, 29 May 2008 20:56:20 -0700 [thread overview]
Message-ID: <20080530035620.587204923@sgi.com> (raw)
In various places the kernel maintains arrays of pointers indexed by
processor numbers. These are used to locate objects that need to be used
when executing on a specirfic processor. Both the slab allocator
and the page allocator use these arrays and there the arrays are used in
performance critical code. The allocpercpu functionality is a simple
allocator to provide these arrays. However, there are certain drawbacks
in using such arrays:
1. The arrays become huge for large systems and may be very sparsely
populated (if they are dimensionied for NR_CPUS) on an architecture
like IA64 that allows up to 4k cpus if a kernel is then booted on a
machine that only supports 8 processors. We could nr_cpu_ids there
but we would still have to allocate all possible processors up to
the number of processor ids. cpu_alloc can deal with sparse cpu_maps.
2. The arrays cause surrounding variables to no longer fit into a single
cacheline. The layout of core data structure is typically optimized so
that variables frequently used together are placed in the same cacheline.
Arrays of pointers move these variables far apart and destroy this effect.
3. A processor frequently follows only one pointer for its own use. Thus
that cacheline with that pointer has to be kept in memory. The neighboring
pointers are all to other processors that are rarely used. So a whole
cacheline of 128 bytes may be consumed but only 8 bytes of information
is constant use. It would be better to be able to place more information
in this cacheline.
4. The lookup of the per cpu object is expensive and requires multiple
memory accesses to:
A) smp_processor_id()
B) pointer to the base of the per cpu pointer array
C) pointer to the per cpu object in the pointer array
D) the per cpu object itself.
5. Each use of allocper requires its own per cpu array. On large
system large arrays have to be allocated again and again.
6. Processor hotplug cannot effectively track the per cpu objects
since the VM cannot find all memory that was allocated for
a specific cpu. It is impossible to add or remove objects in
a consistent way. Although the allocpercpu subsystem was extended
to add that capability is not used since use would require adding
cpu hotplug callbacks to each and every use of allocpercpu in
the kernel.
The patchset here provides an cpu allocator that arranges data differently.
Objects are placed tightly in linear areas reserved for each processor.
The areas are of a fixed size so that address calculation can be used
instead of a lookup. This means that
1. The VM knows where all the per cpu variables are and it could remove
or add cpu areas as cpus come online or go offline.
2. There is only a single per cpu array that is used for the percpu area
and all per cpu allocations.
3. The lookup of a per cpu object is easy and requires memory access to
(worst case: architecture does not provide cpu ops):
A) per cpu offset from the per cpu pointer table
(if its the current processor then there is usually some
more efficient means of retrieving the offset)
B) cpu pointer to the object
C) the per cpu object itself.
4. Surrounding variables can be placed in the same cacheline.
This allow f.e. in SLUB to avoid caching objects in per cpu structures
since the kmem_cache structure is finally available without the need
to access a cache cold cacheline.
5. A single pointer can be used regardless of the number of processors
in the system.
The cpu allocator manages a fixed size data per cpu data area. The size
can be configured as needed.
The current usage of the cpu area can be seen in the field
cpu_bytes
in /proc/vmstat
The patchset is agsinst 2.6.26-rc4.
There are two arch implementation of cpu ops provides.
1. x86. Another version of the zero based x86 patches
exist by Mike.
2. IA64. Limited implementation since IA64 has
no fast RMV ops. But we can avoid the addition of the
my_cpu_offset in hotpaths.
This is a rather complex patchset and I am not sure how to merge it.
Maybe it would be best to merge a piece at a time beginning with the
basic infrastructure in the first few patches?
--
next reply other threads:[~2008-05-30 4:00 UTC|newest]
Thread overview: 163+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-30 3:56 Christoph Lameter [this message]
2008-05-30 3:56 ` [patch 01/41] cpu_alloc: Increase percpu area size to 128k Christoph Lameter
2008-06-02 17:58 ` Luck, Tony
2008-06-02 23:48 ` Rusty Russell
2008-06-10 17:22 ` Christoph Lameter
2008-06-10 17:22 ` Christoph Lameter
2008-06-10 19:54 ` Luck, Tony
2008-05-30 3:56 ` [patch 02/41] cpu alloc: The allocator Christoph Lameter
2008-05-30 4:58 ` Andrew Morton
2008-05-30 5:10 ` Christoph Lameter
2008-05-30 5:31 ` Andrew Morton
2008-06-02 9:29 ` Paul Jackson
2008-05-30 5:56 ` KAMEZAWA Hiroyuki
2008-05-30 6:16 ` Christoph Lameter
2008-06-04 14:48 ` Mike Travis
2008-05-30 5:04 ` Eric Dumazet
2008-05-30 5:20 ` Christoph Lameter
2008-05-30 5:52 ` Rusty Russell
2008-06-04 15:30 ` Mike Travis
2008-06-05 23:48 ` Rusty Russell
2008-05-30 5:54 ` Eric Dumazet
2008-06-04 14:58 ` Mike Travis
2008-06-04 15:11 ` Eric Dumazet
2008-06-06 0:32 ` Rusty Russell
2008-06-06 0:32 ` Rusty Russell
2008-06-10 17:33 ` Christoph Lameter
2008-06-10 18:05 ` Eric Dumazet
2008-06-10 18:28 ` Christoph Lameter
2008-05-30 5:46 ` Rusty Russell
2008-06-04 15:04 ` Mike Travis
2008-06-10 17:34 ` Christoph Lameter
2008-05-31 20:58 ` Pavel Machek
2008-05-30 3:56 ` [patch 03/41] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator Christoph Lameter
2008-05-30 4:58 ` Andrew Morton
2008-05-30 5:14 ` Christoph Lameter
2008-05-30 5:34 ` Andrew Morton
2008-05-30 6:08 ` Rusty Russell
2008-05-30 6:21 ` Christoph Lameter
2008-05-30 3:56 ` [patch 04/41] cpu ops: Core piece for generic atomic per cpu operations Christoph Lameter
2008-05-30 4:58 ` Andrew Morton
2008-05-30 5:17 ` Christoph Lameter
2008-05-30 5:38 ` Andrew Morton
2008-05-30 6:12 ` Christoph Lameter
2008-05-30 7:08 ` Rusty Russell
2008-05-30 18:00 ` Christoph Lameter
2008-06-02 2:00 ` Rusty Russell
2008-06-04 18:18 ` Mike Travis
2008-06-05 23:59 ` Rusty Russell
2008-06-09 19:00 ` Christoph Lameter
2008-06-09 23:27 ` Rusty Russell
2008-06-09 23:54 ` Christoph Lameter
2008-06-10 2:56 ` Rusty Russell
2008-06-10 3:18 ` Christoph Lameter
2008-06-11 0:03 ` Rusty Russell
2008-06-11 0:15 ` Christoph Lameter
2008-06-09 23:09 ` Christoph Lameter
2008-06-10 17:42 ` Christoph Lameter
2008-06-11 11:10 ` Rusty Russell
2008-06-11 23:39 ` Christoph Lameter
2008-06-12 0:58 ` Nick Piggin
2008-06-12 2:44 ` Rusty Russell
2008-06-12 3:40 ` Nick Piggin
2008-06-12 9:37 ` Martin Peschke
2008-06-12 11:21 ` Nick Piggin
2008-06-12 17:19 ` Christoph Lameter
2008-06-13 0:38 ` Rusty Russell
2008-06-13 2:27 ` Christoph Lameter
2008-06-15 10:33 ` Rusty Russell
2008-06-15 10:33 ` Rusty Russell
2008-06-16 14:52 ` Christoph Lameter
2008-06-17 0:24 ` Rusty Russell
2008-06-17 2:29 ` Christoph Lameter
2008-06-17 14:21 ` Mike Travis
2008-05-30 7:05 ` Rusty Russell
2008-05-30 6:32 ` Rusty Russell
2008-05-30 3:56 ` [patch 05/41] cpu alloc: Percpu_counter conversion Christoph Lameter
2008-05-30 6:47 ` Rusty Russell
2008-05-30 17:54 ` Christoph Lameter
2008-05-30 3:56 ` [patch 06/41] cpu alloc: crash_notes conversion Christoph Lameter
2008-05-30 3:56 ` [patch 07/41] cpu alloc: Workqueue conversion Christoph Lameter
2008-05-30 3:56 ` [patch 08/41] cpu alloc: ACPI cstate handling conversion Christoph Lameter
2008-05-30 3:56 ` [patch 09/41] cpu alloc: Genhd statistics conversion Christoph Lameter
2008-05-30 3:56 ` [patch 10/41] cpu alloc: blktrace conversion Christoph Lameter
2008-05-30 3:56 ` [patch 11/41] cpu alloc: SRCU cpu alloc conversion Christoph Lameter
2008-05-30 3:56 ` [patch 12/41] cpu alloc: XFS counter conversion Christoph Lameter
2008-05-30 3:56 ` [patch 13/41] cpu alloc: NFS statistics Christoph Lameter
2008-05-30 3:56 ` [patch 14/41] cpu alloc: Neigbour statistics Christoph Lameter
2008-05-30 3:56 ` [patch 15/41] cpu_alloc: Convert ip route statistics Christoph Lameter
2008-05-30 3:56 ` [patch 16/41] cpu alloc: Tcp statistics conversion Christoph Lameter
2008-05-30 3:56 ` [patch 17/41] cpu alloc: Convert scratches to cpu alloc Christoph Lameter
2008-05-30 3:56 ` [patch 18/41] cpu alloc: Dmaengine conversion Christoph Lameter
2008-05-30 3:56 ` [patch 19/41] cpu alloc: Convert loopback statistics Christoph Lameter
2008-05-30 3:56 ` [patch 20/41] cpu alloc: Veth conversion Christoph Lameter
2008-05-30 3:56 ` [patch 21/41] cpu alloc: Chelsio statistics conversion Christoph Lameter
2008-05-30 3:56 ` [patch 22/41] cpu alloc: Convert network sockets inuse counter Christoph Lameter
2008-05-30 3:56 ` [patch 23/41] cpu alloc: Use it for infiniband Christoph Lameter
2008-05-30 3:56 ` [patch 24/41] cpu alloc: Use in the crypto subsystem Christoph Lameter
2008-05-30 3:56 ` [patch 25/41] cpu alloc: scheduler: Convert cpuusage to cpu_alloc Christoph Lameter
2008-05-30 3:56 ` [patch 26/41] cpu alloc: Convert mib handling to cpu alloc Christoph Lameter
2008-05-30 6:47 ` Eric Dumazet
2008-05-30 18:01 ` Christoph Lameter
2008-05-30 3:56 ` [patch 27/41] cpu alloc: Remove the allocpercpu functionality Christoph Lameter
2008-05-30 4:58 ` Andrew Morton
2008-05-30 3:56 ` [patch 28/41] Module handling: Use CPU_xx ops to dynamically allocate counters Christoph Lameter
2008-05-30 3:56 ` [patch 29/41] x86_64: Use CPU ops for nmi alert counter Christoph Lameter
2008-05-30 3:56 ` [patch 30/41] Remove local_t support Christoph Lameter
2008-05-30 3:56 ` [patch 31/41] VM statistics: Use CPU ops Christoph Lameter
2008-05-30 3:56 ` [patch 32/41] cpu alloc: Use in slub Christoph Lameter
2008-05-30 3:56 ` [patch 33/41] cpu alloc: Remove slub fields Christoph Lameter
2008-05-30 3:56 ` [patch 34/41] cpu alloc: Page allocator conversion Christoph Lameter
2008-05-30 3:56 ` [patch 35/41] Support for CPU ops Christoph Lameter
2008-05-30 4:58 ` Andrew Morton
2008-05-30 5:18 ` Christoph Lameter
2008-05-30 3:56 ` [patch 36/41] Zero based percpu: Infrastructure to rebase the per cpu area to zero Christoph Lameter
2008-05-30 3:56 ` [patch 37/41] x86_64: Fold pda into per cpu area Christoph Lameter
2008-05-30 3:56 ` [patch 38/41] x86: Extend percpu ops to 64 bit Christoph Lameter
2008-05-30 3:56 ` [patch 39/41] x86: Replace cpu_pda() using percpu logic and get rid of _cpu_pda() Christoph Lameter
2008-05-30 3:57 ` [patch 40/41] x86: Replace xxx_pda() operations with x86_xx_percpu() Christoph Lameter
2008-05-30 3:57 ` [patch 41/41] x86_64: Support for cpu ops Christoph Lameter
2008-05-30 4:58 ` [patch 00/41] cpu alloc / cpu ops v3: Optimize per cpu access Andrew Morton
2008-05-30 5:03 ` Christoph Lameter
2008-05-30 5:21 ` Andrew Morton
2008-05-30 5:27 ` Christoph Lameter
2008-05-30 5:49 ` Andrew Morton
2008-05-30 6:16 ` Christoph Lameter
2008-05-30 6:51 ` KAMEZAWA Hiroyuki
2008-05-30 14:38 ` Mike Travis
2008-05-30 17:50 ` Christoph Lameter
2008-05-30 18:00 ` Matthew Wilcox
2008-05-30 18:12 ` Christoph Lameter
2008-05-30 6:01 ` Eric Dumazet
2008-05-30 6:16 ` Andrew Morton
2008-05-30 6:22 ` Christoph Lameter
2008-05-30 6:37 ` Andrew Morton
2008-05-30 11:32 ` Matthew Wilcox
2008-06-04 15:07 ` Mike Travis
2008-06-06 5:33 ` Eric Dumazet
2008-06-06 13:08 ` Mike Travis
2008-06-08 6:00 ` Rusty Russell
2008-06-09 18:44 ` Christoph Lameter
2008-06-09 19:11 ` Andi Kleen
2008-06-09 20:15 ` Eric Dumazet
2008-05-30 9:12 ` Peter Zijlstra
2008-05-30 9:18 ` Ingo Molnar
2008-05-30 18:11 ` Christoph Lameter
2008-05-30 18:40 ` Peter Zijlstra
2008-05-30 18:56 ` Christoph Lameter
2008-05-30 19:13 ` Peter Zijlstra
2008-06-01 3:25 ` Christoph Lameter
2008-06-01 8:19 ` Peter Zijlstra
2008-05-30 18:06 ` Christoph Lameter
2008-05-30 18:19 ` Peter Zijlstra
2008-05-30 18:26 ` Christoph Lameter
2008-05-30 18:47 ` Peter Zijlstra
2008-05-30 19:10 ` Christoph Lameter
2008-05-30 19:21 ` Peter Zijlstra
2008-05-30 19:35 ` Peter Zijlstra
2008-06-01 3:27 ` Christoph Lameter
2008-05-30 18:08 ` Christoph Lameter
2008-05-30 18:39 ` Peter Zijlstra
2008-05-30 18:51 ` Christoph Lameter
2008-05-30 19:00 ` Peter Zijlstra
2008-05-30 19:11 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080530035620.587204923@sgi.com \
--to=clameter@sgi.com \
--cc=akpm@linux-foundation.org \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rusty@rustcorp.com.au \
--cc=travis@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox