linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@linux-foundation.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead
Date: Thu, 01 Nov 2007 08:17:58 +0100	[thread overview]
Message-ID: <47297DA6.5070904@cosmosbay.com> (raw)
In-Reply-To: <20071101000211.970501947@sgi.com>

Christoph Lameter a écrit :
> This patch increases the speed of the SLUB fastpath by
> improving the per cpu allocator and makes it usable for SLUB.
> 
> Currently allocpercpu manages arrays of pointer to per cpu objects.
> This means that is has to allocate the arrays and then populate them
> as needed with objects. Although these objects are called per cpu
> objects they cannot be handled in the same way as per cpu objects
> by adding the per cpu offset of the respective cpu.
> 
> The patch here changes that. We create a small memory pool in the
> percpu area and allocate from there if alloc per cpu is called.
> As a result we do not need the per cpu pointer arrays for each
> object. This reduces memory usage and also the cache foot print
> of allocpercpu users. Also the per cpu objects for a single processor
> are tightly packed next to each other decreasing cache footprint
> even further and making it possible to access multiple objects
> in the same cacheline.
> 
> SLUB has the same mechanism implemented. After fixing up the
> alloccpu stuff we throw the SLUB method out and use the new
> allocpercpu handling. Then we optimize allocpercpu addressing
> by adding a new function
> 
> 	this_cpu_ptr()
> 
> that allows the determination of the per cpu pointer for the
> current processor in an more efficient way on many platforms.
> 
> This increases the speed of SLUB (and likely other kernel subsystems
> that benefit from the allocpercpu enhancements):
> 
> 
>        SLAB    SLUB    SLUB+   SLUB-o	SLUB-a
>    8    96      86      45      44      38	3 *
>   16    84      92      49      48      43	2 *
>   32    84      106     61      59      53	+++
>   64    102     129     82      88      75	++
>  128    147     226     188     181     176	-
>  256    200     248     207     285     204	=
>  512    300     301     260     209     250	+
> 1024    416     440     398     264     391	++
> 2048    720     542     530     390     511	+++
> 4096    1254    342     342     336     376	3 *
> 
> alloc/free test
>       SLAB    SLUB    SLUB+   SLUB-o	SLUB-a
>       137-146 151     68-72   68-74	56-58	3 *
> 
> Note: The per cpu optimization are only half way there because of the screwed
> up way that x86_64 handles its cpu area that causes addditional cycles to be
> spend by retrieving a pointer from memory and adding it to the address.
> The i386 code is much less cycle intensive being able to get to per cpu
> data using a segment prefix and if we can get that to work on x86_64
> then we may be able to get the cycle count for the fastpath down to 20-30
> cycles.
> 

Really sounds good Christoph, not only for SLUB, so I guess the 32k limit is 
not enough because many things will use per_cpu if only per_cpu was reasonably 
fast (ie not so many dereferences)

I think this question already came in the past and Linus already answered it, 
but I again ask it. What about VM games with modern cpus (64 bits arches)

Say we reserve on x86_64 a really huge (2^32 bytes) area, and change VM layout 
so that each cpu maps its own per_cpu area on this area, so that the local 
per_cpu data sits in the same virtual address on each cpu. Then we dont need a 
segment prefix nor adding a 'per_cpu offset'. No need to write special asm 
functions to read/write/increment a per_cpu data and gcc could use normal 
rules for optimizations.

We only would need adding "per_cpu offset" to get data for a given cpu.


  parent reply	other threads:[~2007-11-01  7:18 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-01  0:02 [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead Christoph Lameter
2007-11-01  0:02 ` [patch 1/7] allocpercpu: Make it a true per cpu allocator by allocating from a per cpu array Christoph Lameter
2007-11-01  7:24   ` Eric Dumazet
2007-11-01 12:59     ` Christoph Lameter
2007-11-01  0:02 ` [patch 2/7] allocpercpu: Remove functions that are rarely used Christoph Lameter
2007-11-01  0:02 ` [patch 3/7] Allocpercpu: Do __percpu_disguise() only if CONFIG_DEBUG_VM is set Christoph Lameter
2007-11-01  7:25   ` Eric Dumazet
2007-11-01  0:02 ` [patch 4/7] Percpu: Add support for this_cpu_offset() to be able to create this_cpu_ptr() Christoph Lameter
2007-11-01  0:02 ` [patch 5/7] SLUB: Use allocpercpu to allocate per cpu data instead of running our own per cpu allocator Christoph Lameter
2007-11-01  0:02 ` [patch 6/7] SLUB: No need to cache kmem_cache data in kmem_cache_cpu anymore Christoph Lameter
2007-11-01  0:02 ` [patch 7/7] SLUB: Optimize per cpu access on the local cpu using this_cpu_ptr() Christoph Lameter
2007-11-01  0:24 ` [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead David Miller
2007-11-01  0:26   ` Christoph Lameter
2007-11-01  0:27     ` David Miller
2007-11-01  0:31       ` Christoph Lameter
2007-11-01  0:51         ` David Miller
2007-11-01  0:53           ` Christoph Lameter
2007-11-01  1:00             ` David Miller
2007-11-01  1:01               ` Christoph Lameter
2007-11-01  1:09                 ` David Miller
2007-11-01  1:12                   ` Christoph Lameter
2007-11-01  1:13                     ` David Miller
2007-11-01  1:21                       ` Christoph Lameter
2007-11-01  5:27                         ` David Miller
2007-11-01  4:16                       ` Christoph Lameter
2007-11-01  5:38                         ` David Miller
2007-11-01  7:01                         ` David Miller
2007-11-01  9:14                           ` David Miller
2007-11-01 13:03                             ` Christoph Lameter
2007-11-01 21:29                               ` David Miller
2007-11-01 22:15                                 ` Christoph Lameter
2007-11-01 22:38                                   ` David Miller
2007-11-01 22:48                                     ` Christoph Lameter
2007-11-01 22:58                                       ` David Miller
2007-11-02  1:06                                         ` Christoph Lameter
2007-11-02  2:51                                           ` David Miller
2007-11-02 10:28                                         ` Peter Zijlstra
2007-11-02 14:35                                           ` Christoph Lameter
2007-11-02 15:20                                             ` Peter Zijlstra
2007-11-02 15:29                                               ` Christoph Lameter
2007-11-12 10:52                                         ` Herbert Xu
2007-11-12 19:14                                           ` Christoph Lameter
2007-11-12 19:48                                             ` Eric Dumazet
2007-11-12 19:56                                               ` Christoph Lameter
2007-11-12 20:18                                                 ` Eric Dumazet
2007-11-12 22:46                                                   ` David Miller
2007-11-12 19:57                                               ` Luck, Tony
2007-11-12 20:14                                                 ` Eric Dumazet
2007-11-12 22:46                                                   ` David Miller
2007-11-12 21:28                                           ` David Miller
2007-11-01 23:00                                       ` Eric Dumazet
2007-11-02  0:58                                         ` Christoph Lameter
2007-11-02  1:40                                         ` Christoph Lameter
2007-11-01  7:17 ` Eric Dumazet [this message]
2007-11-01  7:57   ` David Miller
2007-11-01 13:01     ` Christoph Lameter
2007-11-01 21:25       ` David Miller
2007-11-01 12:57   ` Christoph Lameter
2007-11-01 21:28     ` David Miller
2007-11-01 22:11       ` Christoph Lameter
2007-11-01 22:14         ` David Miller
2007-11-01 22:16           ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47297DA6.5070904@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=penberg@cs.helsinki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).