* [PATCH] slob: reduce list scanning
@ 2007-07-14 5:54 Matt Mackall
2007-07-16 6:01 ` Nick Piggin
0 siblings, 1 reply; 5+ messages in thread
From: Matt Mackall @ 2007-07-14 5:54 UTC (permalink / raw)
To: linux-kernel; +Cc: Nick Piggin, akpm, Pekka Enberg, Christoph Lameter
The version of SLOB in -mm always scans its free list from the
beginning, which results in small allocations and free segments
clustering at the beginning of the list over time. This causes the
average search to scan over a large stretch at the beginning on each
allocation.
By starting each page search where the last one left off, we evenly
distribute the allocations and greatly shorten the average search.
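(Illustration only, not the patch itself: a minimal stand-alone C
sketch of the next-fit idea, using hypothetical names rather than the
kernel's types. First-fit always rescans from the list head; here the
head is rotated after each hit so the next search resumes where the
last one stopped.)

struct node {
	struct node *prev, *next;	/* circular doubly-linked list */
	int free_units;			/* unallocated units on this page */
};

static struct node head = { &head, &head, 0 };	/* empty list */

/* Unlink the head node and reinsert it just before @pos, so that
 * head.next == pos and the next scan begins at @pos. */
static void rotate_to(struct node *pos)
{
	head.prev->next = head.next;
	head.next->prev = head.prev;
	head.prev = pos->prev;
	head.next = pos;
	pos->prev->next = &head;
	pos->prev = &head;
}

/* Next-fit: take the first page with enough room, then remember
 * where we stopped by rotating the head to just past that page. */
static struct node *alloc_units(int units)
{
	struct node *n;

	for (n = head.next; n != &head; n = n->next) {
		if (n->free_units < units)
			continue;
		n->free_units -= units;
		if (n->next != &head)
			rotate_to(n->next);	/* resume here next time */
		return n;
	}
	return NULL;	/* no room anywhere; caller grabs a fresh page */
}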
Without this patch, kernel compiles on a 1.5G machine take a large
amount of system time for list scanning. With this patch, compiles are
within a few seconds of the performance of a SLAB kernel, with no
notable change in system time.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Index: mm/mm/slob.c
===================================================================
--- mm.orig/mm/slob.c 2007-07-13 17:51:25.000000000 -0500
+++ mm/mm/slob.c 2007-07-13 18:42:59.000000000 -0500
@@ -293,6 +293,7 @@ static void *slob_page_alloc(struct slob
 static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 {
 	struct slob_page *sp;
+	struct list_head *prev;
 	slob_t *b = NULL;
 	unsigned long flags;
@@ -307,12 +308,22 @@ static void *slob_alloc(size_t size, gfp
 		if (node != -1 && page_to_nid(&sp->page) != node)
 			continue;
 #endif
+		/* Enough room on this page? */
+		if (sp->units < SLOB_UNITS(size))
+			continue;
 
-		if (sp->units >= SLOB_UNITS(size)) {
-			b = slob_page_alloc(sp, size, align);
-			if (b)
-				break;
-		}
+		/* Attempt to alloc */
+		prev = sp->list.prev;
+		b = slob_page_alloc(sp, size, align);
+		if (!b)
+			continue;
+
+		/* Improve fragment distribution and reduce our average
+		 * search time by starting our next search here. (see
+		 * Knuth vol 1, sec 2.5, pg 449) */
+		if (free_slob_pages.next != prev->next)
+			list_move_tail(&free_slob_pages, prev->next);
+		break;
 	}
 	spin_unlock_irqrestore(&slob_lock, flags);
--
Mathematics is the supreme nostalgia of our time.
* Re: [PATCH] slob: reduce list scanning
2007-07-14 5:54 [PATCH] slob: reduce list scanning Matt Mackall
@ 2007-07-16 6:01 ` Nick Piggin
2007-07-16 7:22 ` Pekka Enberg
2007-07-16 16:49 ` Matt Mackall
0 siblings, 2 replies; 5+ messages in thread
From: Nick Piggin @ 2007-07-16 6:01 UTC (permalink / raw)
To: Matt Mackall; +Cc: linux-kernel, akpm, Pekka Enberg, Christoph Lameter
Matt Mackall wrote:
> The version of SLOB in -mm always scans its free list from the
> beginning, which results in small allocations and free segments
> clustering at the beginning of the list over time. This causes the
> average search to scan over a large stretch at the beginning on each
> allocation.
>
> By starting each page search where the last one left off, we evenly
> distribute the allocations and greatly shorten the average search.
>
> Without this patch, kernel compiles on a 1.5G machine take a large
> amount of system time for list scanning. With this patch, compiles are
> within a few seconds of the performance of a SLAB kernel, with no
> notable change in system time.
This looks pretty nice, and the performance results sound good too.
IMO this should probably be merged along with the previous
SLOB patches, because they removed the cyclic scanning to begin
with (so removing it may have introduced a performance
regression in some situations).
I wonder what it would take to close the performance gap further.
I still want to look at per-cpu freelists after Andrew merges
this set of patches. That may improve both cache hotness and
CPU scalability.
Actually SLOB potentially has some fundamental CPU cache hotness
advantages over the other allocators, for the same reasons as
its space advantages. It may be possible to make some workloads
faster with SLOB than with SLUB! Maybe we could remove SLAB and
SLUB then :)
--
SUSE Labs, Novell Inc.
* Re: [PATCH] slob: reduce list scanning
2007-07-16 6:01 ` Nick Piggin
@ 2007-07-16 7:22 ` Pekka Enberg
2007-07-16 8:37 ` Nick Piggin
2007-07-16 16:49 ` Matt Mackall
1 sibling, 1 reply; 5+ messages in thread
From: Pekka Enberg @ 2007-07-16 7:22 UTC (permalink / raw)
To: Nick Piggin; +Cc: Matt Mackall, linux-kernel, akpm, Christoph Lameter
On 7/16/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Actually SLOB potentially has some fundamental CPU cache hotness
> advantages over the other allocators, for the same reasons as
> its space advantages.
Because consecutive allocations hit the same cache-hot page regardless
of requested size, whereas SLUB by definition distributes allocations
to different pages (some of which may not be hot)?
* Re: [PATCH] slob: reduce list scanning
2007-07-16 7:22 ` Pekka Enberg
@ 2007-07-16 8:37 ` Nick Piggin
0 siblings, 0 replies; 5+ messages in thread
From: Nick Piggin @ 2007-07-16 8:37 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Matt Mackall, linux-kernel, akpm, Christoph Lameter
Pekka Enberg wrote:
> On 7/16/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>> Actually SLOB potentially has some fundamental CPU cache hotness
>> advantages over the other allocators, for the same reasons as
>> its space advantages.
>
>
> Because consecutive allocations hit the same cache-hot page regardless
> of requested size, whereas SLUB by definition distributes allocations
> to different pages (some of which may not be hot)?
Yeah, that, and also a newly freed slab object is quite likely to be
hot, and that memory can be used by another subsequent allocation --
not always, because the allocation heuristics may not place it there,
but there is potential that is impossible with slab allocators.
--
SUSE Labs, Novell Inc.
* Re: [PATCH] slob: reduce list scanning
2007-07-16 6:01 ` Nick Piggin
2007-07-16 7:22 ` Pekka Enberg
@ 2007-07-16 16:49 ` Matt Mackall
1 sibling, 0 replies; 5+ messages in thread
From: Matt Mackall @ 2007-07-16 16:49 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-kernel, akpm, Pekka Enberg, Christoph Lameter
On Mon, Jul 16, 2007 at 04:01:15PM +1000, Nick Piggin wrote:
> Matt Mackall wrote:
> >The version of SLOB in -mm always scans its free list from the
> >beginning, which results in small allocations and free segments
> >clustering at the beginning of the list over time. This causes the
> >average search to scan over a large stretch at the beginning on each
> >allocation.
> >
> >By starting each page search where the last one left off, we evenly
> >distribute the allocations and greatly shorten the average search.
> >
> >Without this patch, kernel compiles on a 1.5G machine take a large
> >amount of system time for list scanning. With this patch, compiles are
> >within a few seconds of the performance of a SLAB kernel, with no
> >notable change in system time.
>
> This looks pretty nice, and the performance results sound good too.
> IMO this should probably be merged along with the previous
> SLOB patches, because they removed the cyclic scanning to begin
> with (so removing it may have introduced a performance
> regression in some situations).
>
> I wonder what it would take to close the performance gap further.
> I still want to look at per-cpu freelists after Andrew merges
> this set of patches. That may improve both cache hotness and
> CPU scalability.
The idea I'm currently kicking around is having an array of spinlocks
and list heads per CPU and add an array index to the SLOB page struct.
To allocate, we loop over the array starting at the current CPU
looking for space. On failure, we add a page to the current CPU's
list. We can imagine several variants here: attempting to trylock
while scanning the list or doing no fallback at all. The first is
liable to be unhelpful if there's actually contention, the second will
consume more total memory but reduce the average scan time.
To free, we locate the list from the page struct so we can grab the
relevant lock.
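A rough user-space sketch of that shape (all names hypothetical;
pthread mutexes stand in for kernel spinlocks, and the list index
lives directly in the page struct as described above):

#include <pthread.h>
#include <stddef.h>

#define NLISTS 4			/* one slot per CPU */

struct page {
	struct page *next;		/* pages on this freelist */
	int list_idx;			/* which list owns this page */
	int free_units;
};

struct freelist {
	pthread_mutex_t lock;
	struct page *pages;
};

/* GCC range initializer, as commonly used in kernel code. */
static struct freelist lists[NLISTS] = {
	[0 ... NLISTS - 1] = { PTHREAD_MUTEX_INITIALIZER, NULL }
};

/* Allocation: scan the lists starting at the current CPU's, taking
 * each list's lock in turn. On total failure the caller would add a
 * fresh page to lists[cpu] and tag it with list_idx = cpu. */
static struct page *percpu_alloc(int cpu, int units)
{
	struct page *p;
	int i, idx;

	for (i = 0; i < NLISTS; i++) {
		idx = (cpu + i) % NLISTS;
		pthread_mutex_lock(&lists[idx].lock);
		for (p = lists[idx].pages; p; p = p->next) {
			if (p->free_units >= units) {
				p->free_units -= units;
				pthread_mutex_unlock(&lists[idx].lock);
				return p;
			}
		}
		pthread_mutex_unlock(&lists[idx].lock);
	}
	return NULL;
}

/* Free: the page itself records which list (and thus which lock)
 * it belongs to, so no scanning is needed. */
static void percpu_free(struct page *p, int units)
{
	pthread_mutex_lock(&lists[p->list_idx].lock);
	p->free_units += units;
	pthread_mutex_unlock(&lists[p->list_idx].lock);
}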
This probably also ends up being very friendly to NUMA. But it's not
clear that it's worth doing for the common case of 2 cores, where
contention may be too low to be worth the extra trouble.
> Actually SLOB potentially has some fundamental CPU cache hotness
> advantages over the other allocators, for the same reasons as
> its space advantages. It may be possible to make some workloads
> faster with SLOB than with SLUB! Maybe we could remove SLAB and
> SLUB then :)
It's all handwaving until there are actual benchmarks.
--
Mathematics is the supreme nostalgia of our time.