From: Eric Dumazet <dada1@cosmosbay.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@linux-foundation.org, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead
Date: Thu, 01 Nov 2007 08:17:58 +0100
Message-ID: <47297DA6.5070904@cosmosbay.com>
In-Reply-To: <20071101000211.970501947@sgi.com>
Christoph Lameter wrote:
> This patch increases the speed of the SLUB fastpath by
> improving the per cpu allocator and makes it usable for SLUB.
>
> Currently allocpercpu manages arrays of pointers to per cpu objects.
> This means that it has to allocate the arrays and then populate them
> as needed with objects. Although these objects are called per cpu
> objects they cannot be handled in the same way as per cpu objects
> by adding the per cpu offset of the respective cpu.
>
> The patch here changes that. We create a small memory pool in the
> percpu area and allocate from there if alloc per cpu is called.
> As a result we do not need the per cpu pointer arrays for each
> object. This reduces memory usage and also the cache footprint
> of allocpercpu users. Also the per cpu objects for a single processor
> are tightly packed next to each other decreasing cache footprint
> even further and making it possible to access multiple objects
> in the same cacheline.
>
> SLUB has the same mechanism implemented. After fixing up the
> alloccpu stuff we throw the SLUB method out and use the new
> allocpercpu handling. Then we optimize allocpercpu addressing
> by adding a new function
>
> this_cpu_ptr()
>
> that allows the determination of the per cpu pointer for the
> current processor in a more efficient way on many platforms.
>
> This increases the speed of SLUB (and likely other kernel subsystems
> that benefit from the allocpercpu enhancements):
>
>
> Size  SLAB     SLUB  SLUB+  SLUB-o  SLUB-a
> 8     96       86    45     44      38      3 *
> 16    84       92    49     48      43      2 *
> 32    84       106   61     59      53      +++
> 64    102      129   82     88      75      ++
> 128   147      226   188    181     176     -
> 256   200      248   207    285     204     =
> 512   300      301   260    209     250     +
> 1024  416      440   398    264     391     ++
> 2048  720      542   530    390     511     +++
> 4096  1254     342   342    336     376     3 *
>
> alloc/free test
> SLAB     SLUB  SLUB+  SLUB-o  SLUB-a
> 137-146  151   68-72  68-74   56-58   3 *
>
> Note: The per cpu optimizations are only halfway there because of the screwed
> up way that x86_64 handles its cpu area, which causes additional cycles to be
> spent retrieving a pointer from memory and adding it to the address.
> The i386 code is much less cycle intensive, being able to get to per cpu
> data using a segment prefix, and if we can get that to work on x86_64
> then we may be able to get the cycle count for the fastpath down to 20-30
> cycles.
>
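The two addressing schemes described in the quoted cover letter can be sketched in userspace C. This is only an illustrative model under stated assumptions, not the code from the patch series: NR_CPUS, AREA_SIZE, struct percpu_array and the array_*/pool_* helpers are all hypothetical names, and malloc/calloc stand in for the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS   4	/* hypothetical CPU count for the model */
#define AREA_SIZE 256	/* hypothetical size of each per cpu pool, in bytes */

/* Scheme 1: an array of pointers, one separately allocated object per cpu. */
struct percpu_array {
	long *ptrs[NR_CPUS];
};

static struct percpu_array *array_alloc(void)
{
	struct percpu_array *a = malloc(sizeof(*a));

	if (!a)
		abort();
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		a->ptrs[cpu] = calloc(1, sizeof(long));
		if (!a->ptrs[cpu])
			abort();
	}
	return a;
}

static long *array_cpu_ptr(struct percpu_array *a, int cpu)
{
	return a->ptrs[cpu];	/* extra load from the pointer array */
}

/* Scheme 2: one contiguous pool per cpu; an object is an offset into it. */
static char *pool_base;		/* NR_CPUS areas laid out back to back */
static size_t pool_used;	/* bump pointer, identical for every area */

static int pool_init(void)
{
	pool_base = calloc(NR_CPUS, AREA_SIZE);
	pool_used = 0;
	return pool_base ? 0 : -1;
}

/* Returns the object's offset inside every cpu's area, or (size_t)-1. */
static size_t pool_alloc(size_t size, size_t align)
{
	size_t off = (pool_used + align - 1) & ~(align - 1);

	if (off + size > AREA_SIZE)
		return (size_t)-1;
	pool_used = off + size;
	return off;
}

static void *pool_cpu_ptr(size_t obj, int cpu)
{
	/* base + per cpu offset + object offset; no pointer array involved */
	return pool_base + (size_t)cpu * AREA_SIZE + obj;
}
```

In this model two objects allocated back to back from the pool end up adjacent within each cpu's area, which is the packing effect the cover letter mentions, while the pointer-array scheme scatters them wherever the allocator places them.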
This sounds really good, Christoph, and not only for SLUB. I guess the 32k limit
will not be enough, because many more things would use per_cpu if only it were
reasonably fast (i.e. not so many dereferences).

I think this question came up in the past and Linus already answered it, but I
will ask it again: what about VM games on modern cpus (64-bit arches)?

Say we reserve a really huge (2^32 bytes) area on x86_64 and change the VM layout
so that each cpu maps its own per_cpu area into this area, so that the local
per_cpu data sits at the same virtual address on each cpu. Then we need neither
a segment prefix nor the addition of a 'per_cpu offset'. There is no need to
write special asm functions to read/write/increment per_cpu data, and gcc can
use its normal rules for optimizations.

We would only need to add the 'per_cpu offset' to get the data of a given cpu.
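
To make the idea concrete, here is a rough userspace model of it in C. It is a sketch under heavy assumptions, not kernel code: there is no real MMU involved, so a hypothetical struct cpu_mm stands in for each cpu's page tables, and NR_CPUS, AREA_SIZE, vm_init(), local_var() and remote_var() are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS   4	/* hypothetical CPU count for the model */
#define AREA_SIZE 4096	/* hypothetical size of one per_cpu area */

/* Backing store: every cpu's area, reachable by anyone through an offset. */
static char *areas;

/*
 * Stand-in for a cpu's page tables: the place the fixed per_cpu window
 * resolves to on that cpu. On real hardware the MMU would do this
 * translation, and the code would use one constant virtual address.
 */
struct cpu_mm {
	char *window;
};

static struct cpu_mm cpu_mm[NR_CPUS];

static int vm_init(void)
{
	areas = calloc(NR_CPUS, AREA_SIZE);
	if (!areas)
		return -1;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_mm[cpu].window = areas + (size_t)cpu * AREA_SIZE;
	return 0;
}

/* Local access: the same variable offset on every cpu, no per_cpu offset. */
static long *local_var(const struct cpu_mm *mm, size_t var)
{
	return (long *)(mm->window + var);
}

/* Access to a given cpu's data: the only case that adds a per_cpu offset. */
static long *remote_var(int cpu, size_t var)
{
	return (long *)(areas + (size_t)cpu * AREA_SIZE + var);
}
```

In the model, code running "on" a cpu only ever calls local_var() with the variable's offset, while remote_var(cpu, var) is the explicit offset path for reaching another cpu's copy.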