From: Eric Dumazet <dada1@cosmosbay.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@linux-foundation.org, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead
Date: Thu, 01 Nov 2007 08:17:58 +0100
Message-ID: <47297DA6.5070904@cosmosbay.com>
In-Reply-To: <20071101000211.970501947@sgi.com>
Christoph Lameter wrote:
> This patch increases the speed of the SLUB fastpath by
> improving the per cpu allocator and makes it usable for SLUB.
>
> Currently allocpercpu manages arrays of pointers to per cpu objects.
> This means that it has to allocate the arrays and then populate them
> as needed with objects. Although these objects are called per cpu
> objects they cannot be handled in the same way as per cpu objects
> by adding the per cpu offset of the respective cpu.
>
> The patch here changes that. We create a small memory pool in the
> percpu area and allocate from there if alloc per cpu is called.
> As a result we do not need the per cpu pointer arrays for each
> object. This reduces memory usage and also the cache footprint
> of allocpercpu users. Also the per cpu objects for a single processor
> are tightly packed next to each other decreasing cache footprint
> even further and making it possible to access multiple objects
> in the same cacheline.
>
> SLUB has the same mechanism implemented. After fixing up the
> alloccpu stuff we throw the SLUB method out and use the new
> allocpercpu handling. Then we optimize allocpercpu addressing
> by adding a new function
>
> this_cpu_ptr()
>
> that allows the determination of the per cpu pointer for the
> current processor in a more efficient way on many platforms.
>
> This increases the speed of SLUB (and likely other kernel subsystems
> that benefit from the allocpercpu enhancements):
>
>
>          SLAB   SLUB  SLUB+  SLUB-o  SLUB-a
>     8      96     86     45      44      38   3 *
>    16      84     92     49      48      43   2 *
>    32      84    106     61      59      53   +++
>    64     102    129     82      88      75   ++
>   128     147    226    188     181     176   -
>   256     200    248    207     285     204   =
>   512     300    301    260     209     250   +
>  1024     416    440    398     264     391   ++
>  2048     720    542    530     390     511   +++
>  4096    1254    342    342     336     376   3 *
>
> alloc/free test
>      SLAB   SLUB   SLUB+   SLUB-o   SLUB-a
>   137-146    151   68-72    68-74    56-58   3 *
>
> Note: The per cpu optimizations are only halfway there because of the screwed
> up way that x86_64 handles its cpu area, which causes additional cycles to be
> spent retrieving a pointer from memory and adding it to the address.
> The i386 code is much less cycle intensive being able to get to per cpu
> data using a segment prefix and if we can get that to work on x86_64
> then we may be able to get the cycle count for the fastpath down to 20-30
> cycles.
>
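The two allocpercpu layouts described in the quoted text can be contrasted with a small user-space sketch. This is only an illustration under assumptions: NR_CPUS, PERCPU_AREA_SIZE, percpu_setup(), pool_alloc() and the two accessor names are hypothetical and are not the interfaces from the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4              /* assumption: small fixed cpu count */
#define PERCPU_AREA_SIZE 4096  /* assumption: size of each cpu's pool */

/* Old layout: an array of pointers, one separately allocated object per cpu. */
struct percpu_ptr_array {
	void *ptrs[NR_CPUS];
};

static void *array_per_cpu_ptr(const struct percpu_ptr_array *a, int cpu)
{
	return a->ptrs[cpu];	/* one extra load from memory per access */
}

/* New layout: every cpu owns one area; an object is an offset into it. */
static char *percpu_base[NR_CPUS];
static size_t pool_used;

static int percpu_setup(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		percpu_base[cpu] = calloc(1, PERCPU_AREA_SIZE);
		if (!percpu_base[cpu])
			return -1;
	}
	pool_used = 0;
	return 0;
}

/*
 * Bump allocator over the shared layout. Returns the object's offset,
 * or (size_t)-1 when the pool is full. align must be a power of two.
 */
static size_t pool_alloc(size_t size, size_t align)
{
	size_t off = (pool_used + align - 1) & ~(align - 1);

	if (off + size > PERCPU_AREA_SIZE)
		return (size_t)-1;
	pool_used = off + size;
	return off;
}

/* The per cpu pointer is just base + offset: no pointer array to load. */
static void *offset_per_cpu_ptr(size_t off, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return percpu_base[cpu] + off;
}
```

In the second layout, objects allocated one after another end up adjacent within each cpu's area, which is the packing effect the quoted description refers to.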
This really sounds good, Christoph, and not only for SLUB. I suspect the 32k
limit will not be enough, because many more things would use per_cpu if only
it were reasonably fast (i.e. without so many dereferences).

I think this question has come up before and Linus has already answered it,
but I will ask it again: what about VM games on modern cpus (64-bit arches)?

Say we reserve a really huge (2^32 bytes) area on x86_64 and change the VM
layout so that each cpu maps its own per_cpu area into this area, so that the
local per_cpu data sits at the same virtual address on each cpu. Then we need
neither a segment prefix nor the addition of a 'per_cpu offset'. There is no
need to write special asm functions to read/write/increment per_cpu data, and
gcc could apply its normal optimization rules.

We would only need to add the "per_cpu offset" to get at the data of a given cpu.
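
To make the addressing idea concrete, here is a toy user-space model. It is a sketch under assumptions, not kernel code: translate() stands in for the per-cpu page tables, and LOCAL_WINDOW, GLOBAL_BASE, AREA_SIZE and every function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      4
#define AREA_SIZE    256u     /* assumption: tiny per cpu area */
#define LOCAL_WINDOW 0x100u   /* hypothetical fixed "local" virtual address */
#define GLOBAL_BASE  0x1000u  /* hypothetical base of the per cpu aliases */

/* Physical backing store: one private area per cpu. */
static unsigned char backing[NR_CPUS][AREA_SIZE];

/* Offset to add to a local address to reach the copy of a given cpu. */
static uintptr_t per_cpu_offset(int cpu)
{
	return GLOBAL_BASE + (uintptr_t)cpu * AREA_SIZE - LOCAL_WINDOW;
}

/*
 * Model of the page tables of running_cpu: the local window resolves to
 * that cpu's own area, the global aliases resolve identically everywhere.
 */
static unsigned char *translate(int running_cpu, uintptr_t vaddr)
{
	if (vaddr >= LOCAL_WINDOW && vaddr < LOCAL_WINDOW + AREA_SIZE)
		return &backing[running_cpu][vaddr - LOCAL_WINDOW];

	if (vaddr >= GLOBAL_BASE &&
	    vaddr < GLOBAL_BASE + (uintptr_t)NR_CPUS * AREA_SIZE) {
		uintptr_t off = vaddr - GLOBAL_BASE;

		return &backing[off / AREA_SIZE][off % AREA_SIZE];
	}
	return NULL;	/* unmapped in this model */
}
```

In this model a local access uses the plain address of the variable on every cpu, while reaching the copy of another cpu means adding per_cpu_offset(cpu) to that same address.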