From: Nick Piggin <npiggin@suse.de>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
Lin Ming <ming.m.lin@intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [patch] SLQB slab allocator
Date: Mon, 19 Jan 2009 06:47:30 +0100
Message-ID: <20090119054730.GA22584@wotan.suse.de>
In-Reply-To: <Pine.LNX.4.64.0901161500080.27283@quilx.com>
On Fri, Jan 16, 2009 at 03:07:52PM -0600, Christoph Lameter wrote:
> On Fri, 16 Jan 2009, Nick Piggin wrote:
>
> > > handled by the same 2M TLB covering a 32k page. If the 4k pages are
> > > dispersed then you may need 8 2M tlbs (which covers already a quarter of
> > > the available 2M TLBs on nehalem f.e.) for which the larger alloc just
> > > needs a single one.
> >
> > Yes I know that. But it's pretty theoretical IMO (and I could equally
> > describe a theoretical situation where increased fragmentation in higher
> > order slabs will result in worse TLB coverage).
>
> Theoretical only for low sizes of memory. If you have terabytes of memory
> then this becomes significant in a pretty fast way.
I don't really buy that as a general statement with no other qualifiers.
If the huge system has a correspondingly increased number of slab objects,
then the potential win actually gets much smaller as system size increases.
Say you have a 1GB RAM system with 128 2MB TLB entries, and a slab that
takes 25% of the RAM. If you optimally pack that slab into 2MB pages, then
a random access to one of its objects gets a 100% TLB hit rate; with the
objects scattered over 4K pages it is about 25%.
In a 1TB RAM system with the same proportions, you have only a ~0.1% chance
of a TLB hit in the optimally packed case and ~0.025% in the scattered case,
so there the possible gain from packing is much smaller (negligible).
And note that we're talking about the best possible packing scenario (i.e.
2MB pages vs 4K pages). The standard SLUB tuning would not get anywhere
near that.
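For the record, the arithmetic works out as follows. This is just a
throwaway userspace sketch of the numbers above; the 128x2MB TLB and the
25% slab share are the assumptions I stated, not a model of either
allocator:

#include <stdio.h>

/* TLB hit rate for a random access to a slab object, under the stated
 * assumptions: 128 data TLB entries of 2MB each, slab uses 25% of RAM. */
static void hit_rates(double ram_bytes)
{
        const double tlb_coverage = 128.0 * 2048 * 1024;   /* 128 x 2MB = 256MB */
        const double slab_bytes   = 0.25 * ram_bytes;      /* slab takes 25% of RAM */
        double packed, scattered;

        /* Optimally packed: the slab fills as few 2MB pages as possible,
         * so TLB entries can be spent entirely on slab pages. */
        packed = tlb_coverage / slab_bytes;
        if (packed > 1.0)
                packed = 1.0;

        /* Scattered over 4K pages: a random slab object is TLB-covered
         * only in proportion to how much of all RAM the TLB reaches. */
        scattered = tlb_coverage / ram_bytes;

        printf("%6.0fGB RAM: packed %.3f%%  scattered %.4f%%\n",
               ram_bytes / (1024.0 * 1024 * 1024),
               packed * 100.0, scattered * 100.0);
}

int main(void)
{
        hit_rates(1024.0 * 1024 * 1024);                /* 1GB */
        hit_rates(1024.0 * 1024 * 1024 * 1024);         /* 1TB */
        return 0;
}

which prints 100% / 25% for 1GB and roughly 0.098% / 0.024% for 1TB.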
The thing you forget, IMO, with all these doomsday scenarios about SGI's
peta-scale systems is that no matter what you do, you can't avoid the fact that
computing is about locality. Even if you totally take the TLB out of the
equation, you still have the small detail of other caches. Code that jumps
all over that 1024 TB of memory with no locality is going to suck regardless
of what the kernel ever does, due to physical limitations of hardware.
Anyway, it is trivial to tune SLQB for this on those systems that really
care, giving SLUB no advantage. So we needn't really get sidetracked with
this in the context of SLQB.
> > > It has lists of free objects that are bound to a particular page. That
> > > simplifies numa handling since all the objects in a "queue" (or page) have
> > > the same NUMA characteristics.
> >
> > The same can be said of SLQB and SLAB as well.
>
> Sorry not at all. SLAB and SLQB queue objects from different pages in the
> same queue.
The last sentence is what I was replying to. I.e. "simplification of
NUMA handling" does not follow from the SLUB implementation of per-page
freelists.
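To spell out the distinction I mean, here are simplified stand-ins (not
the actual kernel structures) for the two layouts being argued about:
SLUB's free objects hang off the page they belong to, while a SLAB/SLQB
style queue is a per-CPU list that can hold objects from many pages:

struct page;                            /* stand-in for the kernel's struct page */

struct page_bound_freelist {            /* SLUB-style: free objects bound to one page */
        struct page *page;              /* every object shares this page's NUMA node */
        void *freelist;                 /* list threaded through the page's free objects */
};

struct per_cpu_object_queue {           /* SLAB/SLQB-style per-CPU queue */
        void **objects;                 /* objects may come from many different pages, */
        unsigned int nr;                /* hence potentially from different NUMA nodes */
        unsigned int limit;             /* tunable queue size */
};

The per-page binding is one way to keep NUMA handling simple, but it is
not the only way; a queue-based design can track or drain objects per
node just as well.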
> > > was assigned to a processor. Memory wastage may only occur because
> > > each processor needs to have a separate page from which to allocate. SLAB
> like designs need to put a large number of objects in queues, which may
> keep a number of pages in the allocated pages pool although all objects
> are unused. That does not occur with SLUB.
> >
> > That's wrong. SLUB keeps completely free pages on its partial lists, and
> > also IIRC can keep free pages pinned in the per-cpu page. I have actually
> > seen SLQB use less memory than SLUB in some situations for this reason.
>
> As I said, it pins a single page in the per-cpu page and uses that in a way
> that you call a queue and I call a freelist.
And you found you had to increase the size of your pages because you
need bigger queues. (Must we argue semantics? It is a list of free
objects.)
> SLUB keeps a few pages on the partial list right now because it tries to
> avoid trips to the page allocator (which is quite slow). These could be
> eliminated if the page allocator would work effectively. However that
> number is a per node limit.
This is the practical vs theoretical distinction I'm talking about.
> SLAB and SLQB can have large quantities of objects in their queues that
> each can keep a single page out of circulation if it's the last
> object in that page. This is a per-queue thing and you have at least two
And if that were a problem, SLQB can easily be tuned at runtime to keep no
objects on its object lists. But as I said, queueing is good, so why
would anybody want to get rid of it?
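What "tuned to keep no objects" would mean, in terms of the stand-in
per_cpu_object_queue above, is roughly this (a conceptual sketch with a
hypothetical return_object_to_page() helper, not SLQB's actual code):

void return_object_to_page(void *object);       /* hypothetical helper */

/* Lowering the limit (even to 0) flushes anything above it straight back
 * to the page it came from instead of caching it on the per-CPU list. */
static void queue_set_limit(struct per_cpu_object_queue *q, unsigned int limit)
{
        q->limit = limit;
        while (q->nr > q->limit)
                return_object_to_page(q->objects[--q->nr]);
}

With the limit at zero, frees go straight back to their pages; with a
larger limit you get the batching benefits of queueing. That is the point
of making it tunable.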
> queues per cpu. SLAB has queues per cpu, per pair of cpu, per node and per
> alien node for each node. That can pin quite a number of pages on large
> systems. Note that SLAB has one per cpu whereas you have already 2 per
> cpu? In SMP configurations this may mean that SLQB has more queues than
> SLAB.
Again, this doesn't really go anywhere while we disagree on the
fundamental goodliness of queueing. This is just describing the
implementation.