* [slub p3 1/7] slub: free slabs without holding locks (V2)
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
There are two situations in which slub holds a lock while releasing
pages:
A. During kmem_cache_shrink()
B. During kmem_cache_close()
For A, build a list while holding the lock and then release the pages
later. In case B we are the last remaining user of the slab, so there
is no need to take the list_lock at all.
After this patch all calls to the page allocator to free pages are
done without holding any spinlocks. kmem_cache_destroy() will still
hold the slub_lock semaphore.
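For reference, here is a minimal userspace C sketch of the pattern used for
case A: empty slabs are unlinked onto a private list while the lock is held
and handed back to the allocator only after the lock is dropped. The list,
lock and helper names (struct slab, struct node, release_slab) are
simplified stand-ins, not the actual SLUB structures.

/*
 * Minimal sketch of "collect under the lock, free after unlock".
 * Stand-ins: pthread_mutex_t for the node's list_lock, struct slab for
 * struct page, release_slab() for discard_slab().  Not actual SLUB code.
 */
#include <pthread.h>
#include <stdlib.h>

struct slab {
	struct slab *next;
	int inuse;		/* objects still allocated from this slab */
};

struct node {
	pthread_mutex_t list_lock;
	struct slab *partial;	/* singly linked partial list */
};

static void release_slab(struct slab *s)
{
	free(s);		/* models the call into the page allocator */
}

static void shrink_node(struct node *n)
{
	struct slab *discard = NULL;	/* slabs to free once the lock is dropped */
	struct slab **pp;

	pthread_mutex_lock(&n->list_lock);
	pp = &n->partial;
	while (*pp) {
		struct slab *s = *pp;

		if (!s->inuse) {
			*pp = s->next;		/* unlink the empty slab */
			s->next = discard;	/* park it on the local list */
			discard = s;
		} else
			pp = &s->next;
	}
	pthread_mutex_unlock(&n->list_lock);

	while (discard) {			/* free without holding the lock */
		struct slab *s = discard;

		discard = s->next;
		release_slab(s);
	}
}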
V1->V2. Remove kfree. Avoid locking in free_partial.
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 10:22:37.455874973 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 10:24:38.525874198 -0500
@@ -2968,13 +2968,13 @@ static void list_slab_objects(struct kme
/*
* Attempt to free all partial slabs on a node.
+ * This is called from kmem_cache_close(). We must be the last thread
+ * using the cache and therefore we do not need to lock anymore.
*/
static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
{
- unsigned long flags;
struct page *page, *h;
- spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry_safe(page, h, &n->partial, lru) {
if (!page->inuse) {
remove_partial(n, page);
@@ -2984,7 +2984,6 @@ static void free_partial(struct kmem_cac
"Objects remaining on kmem_cache_close()");
}
}
- spin_unlock_irqrestore(&n->list_lock, flags);
}
/*
@@ -3018,6 +3017,7 @@ void kmem_cache_destroy(struct kmem_cach
s->refcount--;
if (!s->refcount) {
list_del(&s->list);
+ up_write(&slub_lock);
if (kmem_cache_close(s)) {
printk(KERN_ERR "SLUB %s: %s called for cache that "
"still has objects.\n", s->name, __func__);
@@ -3026,8 +3026,8 @@ void kmem_cache_destroy(struct kmem_cach
if (s->flags & SLAB_DESTROY_BY_RCU)
rcu_barrier();
sysfs_slab_remove(s);
- }
- up_write(&slub_lock);
+ } else
+ up_write(&slub_lock);
}
EXPORT_SYMBOL(kmem_cache_destroy);
@@ -3345,23 +3345,23 @@ int kmem_cache_shrink(struct kmem_cache
* list_lock. page->inuse here is the upper limit.
*/
list_for_each_entry_safe(page, t, &n->partial, lru) {
- if (!page->inuse) {
- remove_partial(n, page);
- discard_slab(s, page);
- } else {
- list_move(&page->lru,
- slabs_by_inuse + page->inuse);
- }
+ list_move(&page->lru, slabs_by_inuse + page->inuse);
+ if (!page->inuse)
+ n->nr_partial--;
}
/*
* Rebuild the partial list with the slabs filled up most
* first and the least used slabs at the end.
*/
- for (i = objects - 1; i >= 0; i--)
+ for (i = objects - 1; i > 0; i--)
list_splice(slabs_by_inuse + i, n->partial.prev);
spin_unlock_irqrestore(&n->list_lock, flags);
+
+ /* Release empty slabs */
+ list_for_each_entry_safe(page, t, slabs_by_inuse, lru)
+ discard_slab(s, page);
}
kfree(slabs_by_inuse);
* [slub p3 2/7] slub: Remove useless statements in __slab_alloc
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, torvalds, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
Two statements in __slab_alloc() have no effect.
1. c->page is already set to NULL by deactivate_slab(), which is called
right before.
2. gfpflags are already masked in new_slab() before being passed to the
page allocator. There is no need to mask them in __slab_alloc(),
especially since the most frequent path through __slab_alloc() does not
use a gfp mask at all.
Cc: torvalds@linux-foundation.org
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 4 ----
1 file changed, 4 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 11:03:15.000000000 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 11:04:06.385859038 -0500
@@ -2064,9 +2064,6 @@ static void *__slab_alloc(struct kmem_ca
c = this_cpu_ptr(s->cpu_slab);
#endif
- /* We handle __GFP_ZERO in the caller */
- gfpflags &= ~__GFP_ZERO;
-
page = c->page;
if (!page)
goto new_slab;
@@ -2163,7 +2160,6 @@ debug:
c->freelist = get_freepointer(s, object);
deactivate_slab(s, c);
- c->page = NULL;
c->node = NUMA_NO_NODE;
local_irq_restore(flags);
return object;
* [slub p3 3/7] slub: Prepare inuse field in new_slab()
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
inuse will always be set to page->objects. There is no point in
initializing the field to zero in new_slab() and then overwriting
the value in __slab_alloc().
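A hedged illustration of the invariant this relies on, with simplified
stand-in types rather than the kernel's struct page: a freshly created slab
is immediately taken over as the cpu slab, so all of its objects count as
in use from the moment it is set up.

/* Sketch only: a new slab starts out fully "in use" because its whole
 * freelist is handed to the cpu right away.  Not the kernel code. */
struct slab_page {
	void *freelist;
	int objects;	/* total objects in the slab */
	int inuse;	/* objects owned by allocations or a cpu freelist */
	int frozen;
};

static void init_new_slab(struct slab_page *page, void *first_object, int objects)
{
	page->freelist = first_object;
	page->objects = objects;
	page->inuse = objects;	/* previously 0 plus a later fixup in __slab_alloc() */
	page->frozen = 1;
}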
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 11:04:06.385859038 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 11:04:26.025858912 -0500
@@ -1447,7 +1447,7 @@ static struct page *new_slab(struct kmem
set_freepointer(s, last, NULL);
page->freelist = start;
- page->inuse = 0;
+ page->inuse = page->objects;
page->frozen = 1;
out:
return page;
@@ -2139,7 +2139,6 @@ new_slab:
*/
object = page->freelist;
page->freelist = NULL;
- page->inuse = page->objects;
stat(s, ALLOC_SLAB);
c->node = page_to_nid(page);
@@ -2679,7 +2678,7 @@ static void early_kmem_cache_node_alloc(
n = page->freelist;
BUG_ON(!n);
page->freelist = get_freepointer(kmem_cache_node, n);
- page->inuse++;
+ page->inuse = 1;
page->frozen = 0;
kmem_cache_node->node[node] = n;
#ifdef CONFIG_SLUB_DEBUG
* [slub p3 4/7] slub: pass kmem_cache_cpu pointer to get_partial()
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
Pass the kmem_cache_cpu pointer to get_partial(). That way
we can avoid the this_cpu_write() statements.
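A rough sketch of the calling-convention change, using simplified stand-in
types (cpu_cache, slab_page) rather than the kernel's kmem_cache_cpu and
struct page: the caller resolves the per cpu pointer once and the callee
then uses plain stores instead of repeated this_cpu_write() accesses.

/* Simplified model only: pass the resolved per cpu structure down instead
 * of going through per-cpu accessors for every field update. */
struct slab_page {
	void *freelist;
	int nid;		/* node the page came from */
};

struct cpu_cache {
	void *freelist;
	struct slab_page *page;
	int node;
};

static int take_over_slab(struct cpu_cache *c, struct slab_page *page, void *freelist)
{
	if (!freelist)
		return 0;	/* nothing to take over */

	c->freelist = freelist;	/* plain stores through the passed pointer */
	c->page = page;
	c->node = page->nid;
	return 1;
}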
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 11:04:26.025858912 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 11:04:29.985858887 -0500
@@ -1557,7 +1557,8 @@ static inline void remove_partial(struct
* Must hold list_lock.
*/
static inline int acquire_slab(struct kmem_cache *s,
- struct kmem_cache_node *n, struct page *page)
+ struct kmem_cache_node *n, struct page *page,
+ struct kmem_cache_cpu *c)
{
void *freelist;
unsigned long counters;
@@ -1586,9 +1587,9 @@ static inline int acquire_slab(struct km
if (freelist) {
/* Populate the per cpu freelist */
- this_cpu_write(s->cpu_slab->freelist, freelist);
- this_cpu_write(s->cpu_slab->page, page);
- this_cpu_write(s->cpu_slab->node, page_to_nid(page));
+ c->freelist = freelist;
+ c->page = page;
+ c->node = page_to_nid(page);
return 1;
} else {
/*
@@ -1606,7 +1607,7 @@ static inline int acquire_slab(struct km
* Try to allocate a partial slab from a specific node.
*/
static struct page *get_partial_node(struct kmem_cache *s,
- struct kmem_cache_node *n)
+ struct kmem_cache_node *n, struct kmem_cache_cpu *c)
{
struct page *page;
@@ -1621,7 +1622,7 @@ static struct page *get_partial_node(str
spin_lock(&n->list_lock);
list_for_each_entry(page, &n->partial, lru)
- if (acquire_slab(s, n, page))
+ if (acquire_slab(s, n, page, c))
goto out;
page = NULL;
out:
@@ -1632,7 +1633,8 @@ out:
/*
* Get a page from somewhere. Search in increasing NUMA distances.
*/
-static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
+static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
+ struct kmem_cache_cpu *c)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
@@ -1672,7 +1674,7 @@ static struct page *get_any_partial(stru
if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
n->nr_partial > s->min_partial) {
- page = get_partial_node(s, n);
+ page = get_partial_node(s, n, c);
if (page) {
put_mems_allowed();
return page;
@@ -1687,16 +1689,17 @@ static struct page *get_any_partial(stru
/*
* Get a partial page, lock it and return it.
*/
-static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node)
+static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node,
+ struct kmem_cache_cpu *c)
{
struct page *page;
int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
- page = get_partial_node(s, get_node(s, searchnode));
+ page = get_partial_node(s, get_node(s, searchnode), c);
if (page || node != NUMA_NO_NODE)
return page;
- return get_any_partial(s, flags);
+ return get_any_partial(s, flags, c);
}
#ifdef CONFIG_PREEMPT
@@ -1765,9 +1768,6 @@ void init_kmem_cache_cpus(struct kmem_ca
for_each_possible_cpu(cpu)
per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
}
-/*
- * Remove the cpu slab
- */
/*
* Remove the cpu slab
@@ -2116,7 +2116,7 @@ load_freelist:
return object;
new_slab:
- page = get_partial(s, gfpflags, node);
+ page = get_partial(s, gfpflags, node, c);
if (page) {
stat(s, ALLOC_FROM_PARTIAL);
object = c->freelist;
* [slub p3 5/7] slub: return object pointer from get_partial() / new_slab().
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
There is no longer any need to return the pointer to a slab page from get_partial()
since the page reference can be stored in the kmem_cache_cpu structure's "page" field.
Return an object pointer instead.
That in turn allows a simplification of the spaghetti code in __slab_alloc().
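A simplified model of the new convention follows; the types and names are
stand-ins, not the kernel code. The helper stashes the page in the per cpu
structure and hands back the object list, so the caller only has to test a
single pointer.

/* Sketch only: return the object list, keep the page in the cpu struct. */
#include <stddef.h>

struct slab_page {
	void *freelist;
	int nid;
};

struct cpu_cache {
	struct slab_page *page;
	int node;
};

static void *take_partial(struct cpu_cache *c, struct slab_page *page)
{
	void *object = page ? page->freelist : NULL;

	if (!object)
		return NULL;		/* caller falls back to a new slab */

	c->page = page;			/* page reference lives in the cpu struct */
	c->node = page->nid;
	page->freelist = NULL;		/* the cpu now owns the whole freelist */
	return object;			/* caller only tests this pointer */
}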
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 133 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 73 insertions(+), 60 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 11:04:29.985858887 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 11:04:33.755858864 -0500
@@ -1554,9 +1554,11 @@ static inline void remove_partial(struct
* Lock slab, remove from the partial list and put the object into the
* per cpu freelist.
*
+ * Returns a list of objects or NULL if it fails.
+ *
* Must hold list_lock.
*/
-static inline int acquire_slab(struct kmem_cache *s,
+static inline void *acquire_slab(struct kmem_cache *s,
struct kmem_cache_node *n, struct page *page,
struct kmem_cache_cpu *c)
{
@@ -1587,10 +1589,11 @@ static inline int acquire_slab(struct km
if (freelist) {
/* Populate the per cpu freelist */
- c->freelist = freelist;
c->page = page;
c->node = page_to_nid(page);
- return 1;
+ stat(s, ALLOC_FROM_PARTIAL);
+
+ return freelist;
} else {
/*
* Slab page came from the wrong list. No object to allocate
@@ -1599,17 +1602,18 @@ static inline int acquire_slab(struct km
*/
printk(KERN_ERR "SLUB: %s : Page without available objects on"
" partial list\n", s->name);
- return 0;
+ return NULL;
}
}
/*
* Try to allocate a partial slab from a specific node.
*/
-static struct page *get_partial_node(struct kmem_cache *s,
+static void *get_partial_node(struct kmem_cache *s,
struct kmem_cache_node *n, struct kmem_cache_cpu *c)
{
struct page *page;
+ void *object;
/*
* Racy check. If we mistakenly see no partial slabs then we
@@ -1621,13 +1625,15 @@ static struct page *get_partial_node(str
return NULL;
spin_lock(&n->list_lock);
- list_for_each_entry(page, &n->partial, lru)
- if (acquire_slab(s, n, page, c))
+ list_for_each_entry(page, &n->partial, lru) {
+ object = acquire_slab(s, n, page, c);
+ if (object)
goto out;
- page = NULL;
+ }
+ object = NULL;
out:
spin_unlock(&n->list_lock);
- return page;
+ return object;
}
/*
@@ -1641,7 +1647,7 @@ static struct page *get_any_partial(stru
struct zoneref *z;
struct zone *zone;
enum zone_type high_zoneidx = gfp_zone(flags);
- struct page *page;
+ void *object;
/*
* The defrag ratio allows a configuration of the tradeoffs between
@@ -1674,10 +1680,10 @@ static struct page *get_any_partial(stru
if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
n->nr_partial > s->min_partial) {
- page = get_partial_node(s, n, c);
- if (page) {
+ object = get_partial_node(s, n, c);
+ if (object) {
put_mems_allowed();
- return page;
+ return object;
}
}
}
@@ -1689,15 +1695,15 @@ static struct page *get_any_partial(stru
/*
* Get a partial page, lock it and return it.
*/
-static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node,
+static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
struct kmem_cache_cpu *c)
{
- struct page *page;
+ void *object;
int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
- page = get_partial_node(s, get_node(s, searchnode), c);
- if (page || node != NUMA_NO_NODE)
- return page;
+ object = get_partial_node(s, get_node(s, searchnode), c);
+ if (object || node != NUMA_NO_NODE)
+ return object;
return get_any_partial(s, flags, c);
}
@@ -2027,6 +2033,35 @@ slab_out_of_memory(struct kmem_cache *s,
}
}
+static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
+ int node, struct kmem_cache_cpu **pc)
+{
+ void *object;
+ struct kmem_cache_cpu *c;
+ struct page *page = new_slab(s, flags, node);
+
+ if (page) {
+ c = __this_cpu_ptr(s->cpu_slab);
+ if (c->page)
+ flush_slab(s, c);
+
+ /*
+ * No other reference to the page yet so we can
+ * muck around with it freely without cmpxchg
+ */
+ object = page->freelist;
+ page->freelist = NULL;
+
+ stat(s, ALLOC_SLAB);
+ c->node = page_to_nid(page);
+ c->page = page;
+ *pc = c;
+ } else
+ object = NULL;
+
+ return object;
+}
+
/*
* Slow path. The lockless freelist is empty or we need to perform
* debugging duties.
@@ -2049,7 +2084,6 @@ static void *__slab_alloc(struct kmem_ca
unsigned long addr, struct kmem_cache_cpu *c)
{
void **object;
- struct page *page;
unsigned long flags;
struct page new;
unsigned long counters;
@@ -2064,8 +2098,7 @@ static void *__slab_alloc(struct kmem_ca
c = this_cpu_ptr(s->cpu_slab);
#endif
- page = c->page;
- if (!page)
+ if (!c->page)
goto new_slab;
if (unlikely(!node_match(c, node))) {
@@ -2077,8 +2110,8 @@ static void *__slab_alloc(struct kmem_ca
stat(s, ALLOC_SLOWPATH);
do {
- object = page->freelist;
- counters = page->counters;
+ object = c->page->freelist;
+ counters = c->page->counters;
new.counters = counters;
VM_BUG_ON(!new.frozen);
@@ -2090,12 +2123,12 @@ static void *__slab_alloc(struct kmem_ca
*
* If there are objects left then we retrieve them
* and use them to refill the per cpu queue.
- */
+ */
- new.inuse = page->objects;
+ new.inuse = c->page->objects;
new.frozen = object != NULL;
- } while (!__cmpxchg_double_slab(s, page,
+ } while (!__cmpxchg_double_slab(s, c->page,
object, counters,
NULL, new.counters,
"__slab_alloc"));
@@ -2109,53 +2142,33 @@ static void *__slab_alloc(struct kmem_ca
stat(s, ALLOC_REFILL);
load_freelist:
- VM_BUG_ON(!page->frozen);
c->freelist = get_freepointer(s, object);
c->tid = next_tid(c->tid);
local_irq_restore(flags);
return object;
new_slab:
- page = get_partial(s, gfpflags, node, c);
- if (page) {
- stat(s, ALLOC_FROM_PARTIAL);
- object = c->freelist;
+ object = get_partial(s, gfpflags, node, c);
- if (kmem_cache_debug(s))
- goto debug;
- goto load_freelist;
- }
+ if (unlikely(!object)) {
- page = new_slab(s, gfpflags, node);
+ object = new_slab_objects(s, gfpflags, node, &c);
- if (page) {
- c = __this_cpu_ptr(s->cpu_slab);
- if (c->page)
- flush_slab(s, c);
+ if (unlikely(!object)) {
+ if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
+ slab_out_of_memory(s, gfpflags, node);
- /*
- * No other reference to the page yet so we can
- * muck around with it freely without cmpxchg
- */
- object = page->freelist;
- page->freelist = NULL;
-
- stat(s, ALLOC_SLAB);
- c->node = page_to_nid(page);
- c->page = page;
+ local_irq_restore(flags);
+ return NULL;
+ }
+ }
- if (kmem_cache_debug(s))
- goto debug;
+ if (likely(!kmem_cache_debug(s)))
goto load_freelist;
- }
- if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
- slab_out_of_memory(s, gfpflags, node);
- local_irq_restore(flags);
- return NULL;
-debug:
- if (!object || !alloc_debug_processing(s, page, object, addr))
- goto new_slab;
+ /* Only entered in the debug case */
+ if (!alloc_debug_processing(s, c->page, object, addr))
+ goto new_slab; /* Slab failed checks. Next slab needed */
c->freelist = get_freepointer(s, object);
deactivate_slab(s, c);
* [slub p3 6/7] slub: per cpu cache for partial pages
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
partial pages. The partial page list is used in slab_free() to avoid
taking the per node lock.
In __slab_alloc() we can then take multiple partial pages off the per
node partial list in one go reducing node lock pressure.
We can also use the per cpu partial list in slab_alloc() to avoid scanning
partial lists for pages with free objects.
The main effect of a per cpu partial list is that the per node list_lock
is taken for batches of partial pages instead of individual ones.
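To make the free-path batching concrete, here is a userspace sketch of the
slot-array scheme used in this version: a just-frozen page is parked in an
empty per cpu slot with a compare-and-swap, and everything is drained to
the node list only when no slot is free. C11 atomics stand in for
this_cpu_cmpxchg(), and CPU_PARTIAL is a stand-in for s->cpu_partial; this
is a model of the idea, not the kernel code.

#include <stdatomic.h>
#include <stddef.h>

#define CPU_PARTIAL 6			/* stand-in for s->cpu_partial */

struct slab_page;			/* opaque for this sketch */

struct cpu_cache {
	_Atomic(struct slab_page *) partial[CPU_PARTIAL];
};

static void drain_to_node(struct cpu_cache *c, struct slab_page *extra)
{
	/* The kernel takes the node list_lock once here and unfreezes every
	 * parked page plus @extra; elided in this sketch. */
	for (int i = 0; i < CPU_PARTIAL; i++)
		atomic_store(&c->partial[i], NULL);
	(void)extra;
}

static void put_cpu_partial(struct cpu_cache *c, struct slab_page *page)
{
	for (int i = 0; i < CPU_PARTIAL; i++) {
		struct slab_page *expect = NULL;

		/* Racy on purpose: a slot may be filled or emptied by another
		 * cpu concurrently; any NULL slot will do. */
		if (atomic_compare_exchange_strong(&c->partial[i], &expect, page))
			return;		/* parked without touching the node lock */
	}

	/* All slots taken: batch-move everything, including this page. */
	drain_to_node(c, page);
}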
This is only a first stab at this. There are some limitations:
1. We have to scan through a percpu array of page pointers. That is fast
since the array is kept within a single cacheline.
2. The "unfreeze()" function should have common code with deactivate_slab().
Maybe those can be unified.
Future enhancements:
1. The pickup from the partial list could perhaps be done without disabling
interrupts with some work. The free path already puts the page into the
per cpu partial list without disabling interrupts.
2. __slab_free() likely has some code paths that are unnecessary now or
where code is duplicated.
3. We dump all partials if the per cpu array overflows. There must be a
better algorithm.
Performance:
Before After
./hackbench 100 process 200000
Time: 2299.072 1742.454
./hackbench 100 process 20000
Time: 224.654 182.393
./hackbench 100 process 20000
Time: 227.126 182.780
./hackbench 100 process 20000
Time: 219.608 182.899
./hackbench 10 process 20000
Time: 21.769 18.756
./hackbench 10 process 20000
Time: 21.657 18.938
./hackbench 10 process 20000
Time: 23.193 19.537
./hackbench 1 process 20000
Time: 2.337 2.263
./hackbench 1 process 20000
Time: 2.223 2.271
./hackbench 1 process 20000
Time: 2.269 2.301
Signed-off-by: Christoph Lameter <cl@linux.com>
---
include/linux/slub_def.h | 4
mm/slub.c | 347 +++++++++++++++++++++++++++++++++++++++--------
2 files changed, 294 insertions(+), 57 deletions(-)
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2011-08-01 11:03:01.405859454 -0500
+++ linux-2.6/include/linux/slub_def.h 2011-08-01 11:04:39.905858823 -0500
@@ -36,6 +36,8 @@ enum stat_item {
ORDER_FALLBACK, /* Number of times fallback was necessary */
CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */
CMPXCHG_DOUBLE_FAIL, /* Number of times that cmpxchg double did not match */
+ CPU_PARTIAL_ALLOC, /* Used cpu partial on alloc */
+ CPU_PARTIAL_FREE, /* Used cpu partial on free */
NR_SLUB_STAT_ITEMS };
struct kmem_cache_cpu {
@@ -46,6 +48,7 @@ struct kmem_cache_cpu {
#ifdef CONFIG_SLUB_STATS
unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
+ struct page *partial[]; /* Partially allocated frozen slabs */
};
struct kmem_cache_node {
@@ -79,6 +82,7 @@ struct kmem_cache {
int size; /* The size of an object including meta data */
int objsize; /* The size of an object without meta data */
int offset; /* Free pointer offset. */
+ int cpu_partial; /* Number of per cpu partial pages to keep around */
struct kmem_cache_order_objects oo;
/* Allocation and freeing of slabs */
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-01 11:04:33.755858864 -0500
+++ linux-2.6/mm/slub.c 2011-08-01 11:04:39.915858823 -0500
@@ -1560,7 +1560,7 @@ static inline void remove_partial(struct
*/
static inline void *acquire_slab(struct kmem_cache *s,
struct kmem_cache_node *n, struct page *page,
- struct kmem_cache_cpu *c)
+ int mode)
{
void *freelist;
unsigned long counters;
@@ -1575,7 +1575,8 @@ static inline void *acquire_slab(struct
freelist = page->freelist;
counters = page->counters;
new.counters = counters;
- new.inuse = page->objects;
+ if (mode)
+ new.inuse = page->objects;
VM_BUG_ON(new.frozen);
new.frozen = 1;
@@ -1586,24 +1587,7 @@ static inline void *acquire_slab(struct
"lock and freeze"));
remove_partial(n, page);
-
- if (freelist) {
- /* Populate the per cpu freelist */
- c->page = page;
- c->node = page_to_nid(page);
- stat(s, ALLOC_FROM_PARTIAL);
-
- return freelist;
- } else {
- /*
- * Slab page came from the wrong list. No object to allocate
- * from. Put it onto the correct list and continue partial
- * scan.
- */
- printk(KERN_ERR "SLUB: %s : Page without available objects on"
- " partial list\n", s->name);
- return NULL;
- }
+ return freelist;
}
/*
@@ -1612,8 +1596,9 @@ static inline void *acquire_slab(struct
static void *get_partial_node(struct kmem_cache *s,
struct kmem_cache_node *n, struct kmem_cache_cpu *c)
{
- struct page *page;
- void *object;
+ struct page *page, *page2;
+ void *object = NULL;
+ int count = 0;
/*
* Racy check. If we mistakenly see no partial slabs then we
@@ -1625,13 +1610,26 @@ static void *get_partial_node(struct kme
return NULL;
spin_lock(&n->list_lock);
- list_for_each_entry(page, &n->partial, lru) {
- object = acquire_slab(s, n, page, c);
- if (object)
- goto out;
+ list_for_each_entry_safe(page, page2, &n->partial, lru) {
+ void *t = acquire_slab(s, n, page, count == 0);
+
+ if (!t)
+ break;
+
+ if (!count) {
+ c->page = page;
+ c->node = page_to_nid(page);
+ stat(s, ALLOC_FROM_PARTIAL);
+ count++;
+ object = t;
+ } else {
+ c->partial[count++] = page;
+ page->freelist = t;
+ }
+
+ if (count >= s->cpu_partial / 2)
+ break;
}
- object = NULL;
-out:
spin_unlock(&n->list_lock);
return object;
}
@@ -1926,6 +1924,142 @@ redo:
}
}
+/*
+ * Unfreeze a page. Page cannot be full. May be empty. If n is passed then the list lock on that
+ * node was taken. The functions return the pointer to the list_lock that was eventually taken in
+ * this function.
+ *
+ * Races are limited to concurrency with __slab_free since the page is frozen and it is not the
+ * current slab used for allocation. Meaning that the number of free objects in a slab may increase
+ * but not decrease.
+ */
+struct kmem_cache_node *unfreeze(struct kmem_cache *s, struct page *page, struct kmem_cache_node *n)
+{
+ enum slab_modes { M_PARTIAL, M_FREE };
+ enum slab_modes l = M_FREE, m = M_FREE;
+ struct page new;
+ struct page old;
+
+ do {
+
+ old.freelist = page->freelist;
+ old.counters = page->counters;
+ VM_BUG_ON(!old.frozen);
+
+ new.counters = old.counters;
+ new.freelist = old.freelist;
+
+ new.frozen = 0;
+
+ if (!new.inuse && (!n || n->nr_partial < s->min_partial))
+ m = M_FREE;
+ else {
+ struct kmem_cache_node *n2 = get_node(s, page_to_nid(page));
+
+ m = M_PARTIAL;
+ if (n != n2) {
+ if (n)
+ spin_unlock(&n->list_lock);
+
+ n = n2;
+ spin_lock(&n->list_lock);
+ }
+ }
+
+ if (l != m) {
+ if (l == M_PARTIAL)
+ remove_partial(n, page);
+ else
+ add_partial(n, page, 1);
+
+ l = m;
+ }
+
+ } while (!cmpxchg_double_slab(s, page,
+ old.freelist, old.counters,
+ new.freelist, new.counters,
+ "unfreezing slab"));
+
+ if (m == M_FREE) {
+ stat(s, DEACTIVATE_EMPTY);
+ discard_slab(s, page);
+ stat(s, FREE_SLAB);
+ }
+ return n;
+}
+
+/* Unfreeze all the cpu partial slabs */
+static void unfreeze_partials(struct kmem_cache *s, struct page *page)
+{
+ int i;
+ struct kmem_cache_node *n = NULL;
+
+ if (page)
+ n = unfreeze(s, page, NULL);
+
+ for (i = 0; i < s->cpu_partial; i++) {
+ page = this_cpu_read(s->cpu_slab->partial[i]);
+
+ if (page) {
+ this_cpu_write(s->cpu_slab->partial[i], NULL);
+ n = unfreeze(s, page, n);
+ }
+
+ }
+
+ if (n)
+ spin_unlock(&n->list_lock);
+}
+
+/*
+ * Put a page that was just frozen (in __slab_free) into a partial page
+ * slot if available. This is done without interrupts disabled and without
+ * preemption disabled. The cmpxchg is racy and may put the partial page
+ * onto a random cpus partial slot.
+ *
+ * If we did not find a slot then simply move all the partials to the
+ * per node partial list.
+ */
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page)
+{
+ int i;
+ unsigned long flags;
+
+ for (i = 0; i < s->cpu_partial; i++)
+ if (this_cpu_cmpxchg(s->cpu_slab->partial[i], NULL, page) == NULL) {
+ stat(s, CPU_PARTIAL_FREE);
+ return;
+ }
+
+ /*
+ * partial array is full. Move them all (including the one we
+ * just froze) to the per node partial list.
+ */
+ local_irq_save(flags);
+ unfreeze_partials(s, page);
+ local_irq_restore(flags);
+}
+
+/*
+ * Retrieve a page from the per cpu partial slab list. This is done with
+ * interrupts disabled and therefore we can avoid the use of this cpu ops.
+ */
+static inline int get_cpu_partial(struct kmem_cache *s, struct kmem_cache_cpu *c)
+{
+ int i;
+
+ for (i = 0; i < s->cpu_partial; i++)
+ if (c->partial[i]) {
+ c->page = c->partial[i];
+ c->freelist = NULL;
+ c->partial[i] = NULL;
+ c->node = page_to_nid(c->page);
+ stat(s, CPU_PARTIAL_ALLOC);
+ return 1;
+ }
+ return 0;
+}
+
static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
stat(s, CPUSLAB_FLUSH);
@@ -1941,8 +2075,12 @@ static inline void __flush_cpu_slab(stru
{
struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
- if (likely(c && c->page))
- flush_slab(s, c);
+ if (likely(c)) {
+ if (c->page)
+ flush_slab(s, c);
+
+ unfreeze_partials(s, NULL);
+ }
}
static void flush_cpu_slab(void *d)
@@ -2066,8 +2204,6 @@ static inline void *new_slab_objects(str
* Slow path. The lockless freelist is empty or we need to perform
* debugging duties.
*
- * Interrupts are disabled.
- *
* Processing is still very fast if new objects have been freed to the
* regular freelist. In that case we simply take over the regular freelist
* as the lockless freelist and zap the regular freelist.
@@ -2100,7 +2236,7 @@ static void *__slab_alloc(struct kmem_ca
if (!c->page)
goto new_slab;
-
+redo:
if (unlikely(!node_match(c, node))) {
stat(s, ALLOC_NODE_MISMATCH);
deactivate_slab(s, c);
@@ -2133,7 +2269,7 @@ static void *__slab_alloc(struct kmem_ca
NULL, new.counters,
"__slab_alloc"));
- if (unlikely(!object)) {
+ if (!object) {
c->page = NULL;
stat(s, DEACTIVATE_BYPASS);
goto new_slab;
@@ -2148,6 +2284,11 @@ load_freelist:
return object;
new_slab:
+
+ if (get_cpu_partial(s, c))
+ goto redo;
+
+ /* Then do expensive stuff like retrieving pages from the partial lists */
object = get_partial(s, gfpflags, node, c);
if (unlikely(!object)) {
@@ -2341,16 +2482,29 @@ static void __slab_free(struct kmem_cach
was_frozen = new.frozen;
new.inuse--;
if ((!new.inuse || !prior) && !was_frozen && !n) {
- n = get_node(s, page_to_nid(page));
- /*
- * Speculatively acquire the list_lock.
- * If the cmpxchg does not succeed then we may
- * drop the list_lock without any processing.
- *
- * Otherwise the list_lock will synchronize with
- * other processors updating the list of slabs.
- */
- spin_lock_irqsave(&n->list_lock, flags);
+
+ if (!kmem_cache_debug(s) && !prior)
+
+ /*
+ * Slab was on no list before and will be partially empty
+ * We can defer the list move and instead freeze it.
+ */
+ new.frozen = 1;
+
+ else { /* Needs to be taken off a list */
+
+ n = get_node(s, page_to_nid(page));
+ /*
+ * Speculatively acquire the list_lock.
+ * If the cmpxchg does not succeed then we may
+ * drop the list_lock without any processing.
+ *
+ * Otherwise the list_lock will synchronize with
+ * other processors updating the list of slabs.
+ */
+ spin_lock_irqsave(&n->list_lock, flags);
+
+ }
}
inuse = new.inuse;
@@ -2360,7 +2514,15 @@ static void __slab_free(struct kmem_cach
"__slab_free"));
if (likely(!n)) {
- /*
+
+ /*
+ * If we just froze the page then put it onto the
+ * per cpu partial list.
+ */
+ if (new.frozen && !was_frozen)
+ put_cpu_partial(s, page);
+
+ /*
* The list lock was not taken therefore no list
* activity can be necessary.
*/
@@ -2427,7 +2589,6 @@ static __always_inline void slab_free(st
slab_free_hook(s, x);
redo:
-
/*
* Determine the currently cpus per cpu slab.
* The cpu may change afterward. However that does not matter since
@@ -2642,6 +2803,9 @@ init_kmem_cache_node(struct kmem_cache_n
static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
{
+ int size = sizeof(struct kmem_cache_cpu) + s->cpu_partial * sizeof(void *);
+ int align = 2 * sizeof(void *);
+
BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
SLUB_PAGE_SHIFT * sizeof(struct kmem_cache_cpu));
@@ -2649,9 +2813,7 @@ static inline int alloc_kmem_cache_cpus(
* Must align to double word boundary for the double cmpxchg
* instructions to work; see __pcpu_double_call_return_bool().
*/
- s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu),
- 2 * sizeof(void *));
-
+ s->cpu_slab = __alloc_percpu(size, align);
if (!s->cpu_slab)
return 0;
@@ -2917,7 +3079,16 @@ static int kmem_cache_open(struct kmem_c
* The larger the object size is, the more pages we want on the partial
* list to avoid pounding the page allocator excessively.
*/
- set_min_partial(s, ilog2(s->size));
+ set_min_partial(s, ilog2(s->size) / 2);
+
+ /* Try to fit partial page pointers into the same cacheline */
+ s->cpu_partial = min_t(int, (cache_line_size() -
+ sizeof(struct kmem_cache_cpu)) / sizeof(void *),
+ s->min_partial / 2);
+ if (s->cpu_partial < 2)
+ /* Less than two partial page pointers fit in so give up */
+ s->cpu_partial = s->min_partial / 2;
+
s->refcount = 1;
#ifdef CONFIG_NUMA
s->remote_node_defrag_ratio = 1000;
@@ -4306,12 +4477,28 @@ enum slab_stat_type {
#define SO_OBJECTS (1 << SL_OBJECTS)
#define SO_TOTAL (1 << SL_TOTAL)
+/* Determine the count of objects in a page */
+static int obj_count(struct page *page, unsigned long flags)
+{
+ if (!page)
+ return 0;
+
+ if (flags & SO_TOTAL)
+ return page->objects;
+
+ if (flags & SO_OBJECTS)
+ return page->inuse;
+
+ return 1;
+}
+
static ssize_t show_slab_objects(struct kmem_cache *s,
char *buf, unsigned long flags)
{
unsigned long total = 0;
int node;
int x;
+ int i;
unsigned long *nodes;
unsigned long *per_cpu;
@@ -4330,13 +4517,12 @@ static ssize_t show_slab_objects(struct
continue;
if (c->page) {
- if (flags & SO_TOTAL)
- x = c->page->objects;
- else if (flags & SO_OBJECTS)
- x = c->page->inuse;
- else
- x = 1;
-
+ x = obj_count(c->page, flags);
+ total += x;
+ nodes[c->node] += x;
+ }
+ for (i = 0; i < s->cpu_partial; i++) {
+ x = obj_count(c->partial[i], flags);
total += x;
nodes[c->node] += x;
}
@@ -4491,6 +4677,12 @@ static ssize_t min_partial_store(struct
}
SLAB_ATTR(min_partial);
+static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
+{
+ return sprintf(buf, "%u\n", s->cpu_partial);
+}
+SLAB_ATTR_RO(cpu_partial);
+
static ssize_t ctor_show(struct kmem_cache *s, char *buf)
{
if (!s->ctor)
@@ -4529,6 +4721,41 @@ static ssize_t objects_partial_show(stru
}
SLAB_ATTR_RO(objects_partial);
+static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
+{
+ unsigned long sum = 0;
+ int cpu;
+ int len;
+ int *data = kmalloc(nr_cpu_ids * sizeof(int), GFP_KERNEL);
+
+ if (!data)
+ return -ENOMEM;
+
+ for_each_online_cpu(cpu) {
+ unsigned x = 0;
+ int i;
+
+ for (i = 0; i < s->cpu_partial; i++)
+ if (per_cpu_ptr(s->cpu_slab, cpu)->partial[i])
+ x++;
+
+ data[cpu] = x;
+ sum += x;
+ }
+
+ len = sprintf(buf, "%lu", sum);
+
+#ifdef CONFIG_SMP
+ for_each_online_cpu(cpu) {
+ if (data[cpu] && len < PAGE_SIZE - 20)
+ len += sprintf(buf + len, " C%d=%u", cpu, data[cpu]);
+ }
+#endif
+ kfree(data);
+ return len + sprintf(buf + len, "\n");
+}
+SLAB_ATTR_RO(slabs_cpu_partial);
+
static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
{
return sprintf(buf, "%d\n", !!(s->flags & SLAB_RECLAIM_ACCOUNT));
@@ -4851,6 +5078,8 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
STAT_ATTR(ORDER_FALLBACK, order_fallback);
STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
+STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
+STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
#endif
static struct attribute *slab_attrs[] = {
@@ -4859,6 +5088,7 @@ static struct attribute *slab_attrs[] =
&objs_per_slab_attr.attr,
&order_attr.attr,
&min_partial_attr.attr,
+ &cpu_partial_attr.attr,
&objects_attr.attr,
&objects_partial_attr.attr,
&partial_attr.attr,
@@ -4871,6 +5101,7 @@ static struct attribute *slab_attrs[] =
&destroy_by_rcu_attr.attr,
&shrink_attr.attr,
&reserved_attr.attr,
+ &slabs_cpu_partial_attr.attr,
#ifdef CONFIG_SLUB_DEBUG
&total_objects_attr.attr,
&slabs_attr.attr,
@@ -4912,6 +5143,8 @@ static struct attribute *slab_attrs[] =
&order_fallback_attr.attr,
&cmpxchg_double_fail_attr.attr,
&cmpxchg_double_cpu_fail_attr.attr,
+ &cpu_partial_alloc_attr.attr,
+ &cpu_partial_free_attr.attr,
#endif
#ifdef CONFIG_FAILSLAB
&failslab_attr.attr,
* Re: [slub p3 6/7] slub: per cpu cache for partial pages
From: Christoph Lameter @ 2011-08-02 17:24 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, Tejun Heo, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
New revision of the patch. Allows dynamic configuration of the number of
objects to be kept on the per cpu partial lists (this means that the
number of partial slabs may vary based on the ratio of object availability
in those partial slabs).
This also allows us to avoid a loop over an array of page pointers in the
hot paths, which should improve performance.
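For orientation, a userspace sketch of the linked-list scheme used below:
frozen pages are chained through a next pointer and pushed onto the per cpu
head with a compare-and-swap, and approximate pages/pobjects totals are kept
in the head page so overflow can be detected without walking the list. C11
atomics stand in for this_cpu_cmpxchg(); the types are simplified stand-ins,
not the kernel's struct page and kmem_cache_cpu.

#include <stdatomic.h>
#include <stddef.h>

struct slab_page {
	struct slab_page *next;	/* next frozen partial slab */
	int pages;		/* slabs on the list, counted at the head (approx.) */
	int pobjects;		/* free objects on the list (approx.) */
	int objects;		/* total objects in this slab */
	int inuse;		/* allocated objects in this slab */
};

struct cpu_cache {
	_Atomic(struct slab_page *) partial;	/* head of the frozen chain */
	int cpu_partial_limit;			/* models s->cpu_partial */
};

static void drain_to_node(struct cpu_cache *c)
{
	/* The kernel version takes the node list_lock once and unfreezes the
	 * whole chain; elided in this sketch. */
	atomic_store(&c->partial, NULL);
}

static int put_cpu_partial(struct cpu_cache *c, struct slab_page *page, int drain)
{
	struct slab_page *old;
	int pages, pobjects;

	do {
		old = atomic_load(&c->partial);
		pages = old ? old->pages : 0;
		pobjects = old ? old->pobjects : 0;

		if (drain && pobjects > c->cpu_partial_limit) {
			drain_to_node(c);	/* flush the existing batch */
			old = NULL;
			pages = 0;
			pobjects = 0;
		}

		page->next = old;		/* chain through the page itself */
		page->pages = pages + 1;
		page->pobjects = pobjects + (page->objects - page->inuse);
	} while (!atomic_compare_exchange_weak(&c->partial, &old, page));

	return page->pobjects;
}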
Subject: slub: per cpu cache for partial pages
Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
partial pages. The partial page list is used in slab_free() to avoid
taking the per node lock.
In __slab_alloc() we can then take multiple partial pages off the per
node partial list in one go reducing node lock pressure.
We can also use the per cpu partial list in slab_alloc() to avoid scanning
partial lists for pages with free objects.
The main effect of a per cpu partial list is that the per node list_lock
is taken for batches of partial pages instead of individual ones.
1. The "unfreeze()" function should have common code with
deactivate_slab(). Maybe those can be unified.
Future enhancements:
1. The pickup from the partial list could perhaps be done without disabling
interrupts with some work. The free path already puts the page into the
per cpu partial list without disabling interrupts.
2. __slab_free() likely has some code paths that are unnecessary now or
where code is duplicated.
3. We dump all partials if the per cpu array overflows. There must be a
better algorithm.
Signed-off-by: Christoph Lameter <cl@linux.com>
---
include/linux/mm_types.h | 9 +
include/linux/slub_def.h | 4
mm/slub.c | 333 ++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 298 insertions(+), 48 deletions(-)
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2011-08-02 12:07:33.565281487 -0500
+++ linux-2.6/include/linux/slub_def.h 2011-08-02 12:07:35.225281476 -0500
@@ -36,12 +36,15 @@ enum stat_item {
ORDER_FALLBACK, /* Number of times fallback was necessary */
CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */
CMPXCHG_DOUBLE_FAIL, /* Number of times that cmpxchg double did not match */
+ CPU_PARTIAL_ALLOC, /* Used cpu partial on alloc */
+ CPU_PARTIAL_FREE, /* Used cpu partial on free */
NR_SLUB_STAT_ITEMS };
struct kmem_cache_cpu {
void **freelist; /* Pointer to next available object */
unsigned long tid; /* Globally unique transaction id */
struct page *page; /* The slab from which we are allocating */
+ struct page *partial; /* Partially allocated frozen slabs */
int node; /* The node of the page (or -1 for debug) */
#ifdef CONFIG_SLUB_STATS
unsigned stat[NR_SLUB_STAT_ITEMS];
@@ -79,6 +82,7 @@ struct kmem_cache {
int size; /* The size of an object including meta data */
int objsize; /* The size of an object without meta data */
int offset; /* Free pointer offset. */
+ int cpu_partial; /* Number of per cpu partial pages to keep around */
struct kmem_cache_order_objects oo;
/* Allocation and freeing of slabs */
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2011-08-02 12:07:33.585281487 -0500
+++ linux-2.6/mm/slub.c 2011-08-02 12:13:11.345279324 -0500
@@ -1560,7 +1560,7 @@ static inline void remove_partial(struct
*/
static inline void *acquire_slab(struct kmem_cache *s,
struct kmem_cache_node *n, struct page *page,
- struct kmem_cache_cpu *c)
+ int mode)
{
void *freelist;
unsigned long counters;
@@ -1575,7 +1575,8 @@ static inline void *acquire_slab(struct
freelist = page->freelist;
counters = page->counters;
new.counters = counters;
- new.inuse = page->objects;
+ if (mode)
+ new.inuse = page->objects;
VM_BUG_ON(new.frozen);
new.frozen = 1;
@@ -1586,34 +1587,20 @@ static inline void *acquire_slab(struct
"lock and freeze"));
remove_partial(n, page);
-
- if (freelist) {
- /* Populate the per cpu freelist */
- c->page = page;
- c->node = page_to_nid(page);
- stat(s, ALLOC_FROM_PARTIAL);
-
- return freelist;
- } else {
- /*
- * Slab page came from the wrong list. No object to allocate
- * from. Put it onto the correct list and continue partial
- * scan.
- */
- printk(KERN_ERR "SLUB: %s : Page without available objects on"
- " partial list\n", s->name);
- return NULL;
- }
+ return freelist;
}
+static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+
/*
* Try to allocate a partial slab from a specific node.
*/
static void *get_partial_node(struct kmem_cache *s,
struct kmem_cache_node *n, struct kmem_cache_cpu *c)
{
- struct page *page;
- void *object;
+ struct page *page, *page2;
+ void *object = NULL;
+ int count = 0;
/*
* Racy check. If we mistakenly see no partial slabs then we
@@ -1625,13 +1612,28 @@ static void *get_partial_node(struct kme
return NULL;
spin_lock(&n->list_lock);
- list_for_each_entry(page, &n->partial, lru) {
- object = acquire_slab(s, n, page, c);
- if (object)
- goto out;
+ list_for_each_entry_safe(page, page2, &n->partial, lru) {
+ void *t = acquire_slab(s, n, page, count == 0);
+ int available;
+
+ if (!t)
+ break;
+
+ if (!count) {
+ c->page = page;
+ c->node = page_to_nid(page);
+ stat(s, ALLOC_FROM_PARTIAL);
+ count++;
+ object = t;
+ available = page->objects - page->inuse;
+ } else {
+ page->freelist = t;
+ available = put_cpu_partial(s, page, 0);
+ }
+ if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+ break;
+
}
- object = NULL;
-out:
spin_unlock(&n->list_lock);
return object;
}
@@ -1926,6 +1928,135 @@ redo:
}
}
+/*
+ * Unfreeze a page. Page cannot be full. May be empty. If n is passed then the list lock on that
+ * node was taken. The functions return the pointer to the list_lock that was eventually taken in
+ * this function.
+ *
+ * Races are limited to concurrency with __slab_free since the page is frozen and it is not the
+ * current slab used for allocation. Meaning that the number of free objects in a slab may increase
+ * but not decrease.
+ */
+struct kmem_cache_node *unfreeze(struct kmem_cache *s, struct page *page, struct kmem_cache_node *n)
+{
+ enum slab_modes { M_PARTIAL, M_FREE };
+ enum slab_modes l = M_FREE, m = M_FREE;
+ struct page new;
+ struct page old;
+
+ do {
+
+ old.freelist = page->freelist;
+ old.counters = page->counters;
+ VM_BUG_ON(!old.frozen);
+
+ new.counters = old.counters;
+ new.freelist = old.freelist;
+
+ new.frozen = 0;
+
+ if (!new.inuse && (!n || n->nr_partial < s->min_partial))
+ m = M_FREE;
+ else {
+ struct kmem_cache_node *n2 = get_node(s, page_to_nid(page));
+
+ m = M_PARTIAL;
+ if (n != n2) {
+ if (n)
+ spin_unlock(&n->list_lock);
+
+ n = n2;
+ spin_lock(&n->list_lock);
+ }
+ }
+
+ if (l != m) {
+ if (l == M_PARTIAL)
+ remove_partial(n, page);
+ else
+ add_partial(n, page, 1);
+
+ l = m;
+ }
+
+ } while (!cmpxchg_double_slab(s, page,
+ old.freelist, old.counters,
+ new.freelist, new.counters,
+ "unfreezing slab"));
+
+ if (m == M_FREE) {
+ stat(s, DEACTIVATE_EMPTY);
+ discard_slab(s, page);
+ stat(s, FREE_SLAB);
+ }
+ return n;
+}
+
+/* Unfreeze all the cpu partial slabs */
+static void unfreeze_partials(struct kmem_cache *s)
+{
+ struct kmem_cache_node *n = NULL;
+ struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
+ struct page *page;
+
+ while ((page = c->partial)) {
+ c->partial = page->next;
+ n = unfreeze(s, page, n);
+ }
+
+ if (n)
+ spin_unlock(&n->list_lock);
+}
+
+/*
+ * Put a page that was just frozen (in __slab_free) into a partial page
+ * slot if available. This is done without interrupts disabled and without
+ * preemption disabled. The cmpxchg is racy and may put the partial page
+ * onto a random cpus partial slot.
+ *
+ * If we did not find a slot then simply move all the partials to the
+ * per node partial list.
+ */
+int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
+{
+ struct page *oldpage;
+ int pages;
+ int pobjects;
+
+ do {
+ pages = 0;
+ pobjects = 0;
+ oldpage = this_cpu_read(s->cpu_slab->partial);
+
+ if (oldpage) {
+ pobjects = oldpage->pobjects;
+ pages = oldpage->pages;
+ if (drain && pobjects > s->cpu_partial) {
+ unsigned long flags;
+ /*
+ * partial array is full. Move the existing
+ * set to the per node partial list.
+ */
+ local_irq_save(flags);
+ unfreeze_partials(s);
+ local_irq_restore(flags);
+ pobjects = 0;
+ pages = 0;
+ }
+ }
+
+ pages++;
+ pobjects += page->objects - page->inuse;
+
+ page->pages = pages;
+ page->pobjects = pobjects;
+ page->next = oldpage;
+
+ } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
+ stat(s, CPU_PARTIAL_FREE);
+ return pobjects;
+}
+
static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
stat(s, CPUSLAB_FLUSH);
@@ -1941,8 +2072,12 @@ static inline void __flush_cpu_slab(stru
{
struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
- if (likely(c && c->page))
- flush_slab(s, c);
+ if (likely(c)) {
+ if (c->page)
+ flush_slab(s, c);
+
+ unfreeze_partials(s);
+ }
}
static void flush_cpu_slab(void *d)
@@ -2066,8 +2201,6 @@ static inline void *new_slab_objects(str
* Slow path. The lockless freelist is empty or we need to perform
* debugging duties.
*
- * Interrupts are disabled.
- *
* Processing is still very fast if new objects have been freed to the
* regular freelist. In that case we simply take over the regular freelist
* as the lockless freelist and zap the regular freelist.
@@ -2100,7 +2233,7 @@ static void *__slab_alloc(struct kmem_ca
if (!c->page)
goto new_slab;
-
+redo:
if (unlikely(!node_match(c, node))) {
stat(s, ALLOC_NODE_MISMATCH);
deactivate_slab(s, c);
@@ -2133,7 +2266,7 @@ static void *__slab_alloc(struct kmem_ca
NULL, new.counters,
"__slab_alloc"));
- if (unlikely(!object)) {
+ if (!object) {
c->page = NULL;
stat(s, DEACTIVATE_BYPASS);
goto new_slab;
@@ -2148,6 +2281,17 @@ load_freelist:
return object;
new_slab:
+
+ if (c->partial) {
+ c->page = c->partial;
+ c->partial = c->page->next;
+ c->node = page_to_nid(c->page);
+ stat(s, CPU_PARTIAL_ALLOC);
+ c->freelist = NULL;
+ goto redo;
+ }
+
+ /* Then do expensive stuff like retrieving pages from the partial lists */
object = get_partial(s, gfpflags, node, c);
if (unlikely(!object)) {
@@ -2341,16 +2485,29 @@ static void __slab_free(struct kmem_cach
was_frozen = new.frozen;
new.inuse--;
if ((!new.inuse || !prior) && !was_frozen && !n) {
- n = get_node(s, page_to_nid(page));
- /*
- * Speculatively acquire the list_lock.
- * If the cmpxchg does not succeed then we may
- * drop the list_lock without any processing.
- *
- * Otherwise the list_lock will synchronize with
- * other processors updating the list of slabs.
- */
- spin_lock_irqsave(&n->list_lock, flags);
+
+ if (!kmem_cache_debug(s) && !prior)
+
+ /*
+ * Slab was on no list before and will be partially empty
+ * We can defer the list move and instead freeze it.
+ */
+ new.frozen = 1;
+
+ else { /* Needs to be taken off a list */
+
+ n = get_node(s, page_to_nid(page));
+ /*
+ * Speculatively acquire the list_lock.
+ * If the cmpxchg does not succeed then we may
+ * drop the list_lock without any processing.
+ *
+ * Otherwise the list_lock will synchronize with
+ * other processors updating the list of slabs.
+ */
+ spin_lock_irqsave(&n->list_lock, flags);
+
+ }
}
inuse = new.inuse;
@@ -2360,7 +2517,15 @@ static void __slab_free(struct kmem_cach
"__slab_free"));
if (likely(!n)) {
- /*
+
+ /*
+ * If we just froze the page then put it onto the
+ * per cpu partial list.
+ */
+ if (new.frozen && !was_frozen)
+ put_cpu_partial(s, page, 1);
+
+ /*
* The list lock was not taken therefore no list
* activity can be necessary.
*/
@@ -2427,7 +2592,6 @@ static __always_inline void slab_free(st
slab_free_hook(s, x);
redo:
-
/*
* Determine the currently cpus per cpu slab.
* The cpu may change afterward. However that does not matter since
@@ -2917,7 +3081,10 @@ static int kmem_cache_open(struct kmem_c
* The larger the object size is, the more pages we want on the partial
* list to avoid pounding the page allocator excessively.
*/
- set_min_partial(s, ilog2(s->size));
+ set_min_partial(s, ilog2(s->size) / 2);
+
+ s->cpu_partial = 50;
+
s->refcount = 1;
#ifdef CONFIG_NUMA
s->remote_node_defrag_ratio = 1000;
@@ -4325,6 +4492,7 @@ static ssize_t show_slab_objects(struct
for_each_possible_cpu(cpu) {
struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+ struct page *page;
if (!c || c->node < 0)
continue;
@@ -4340,6 +4508,13 @@ static ssize_t show_slab_objects(struct
total += x;
nodes[c->node] += x;
}
+ page = c->partial;
+
+ if (page) {
+ x = page->pobjects;
+ total += x;
+ nodes[c->node] += x;
+ }
per_cpu[c->node]++;
}
}
@@ -4491,6 +4666,27 @@ static ssize_t min_partial_store(struct
}
SLAB_ATTR(min_partial);
+static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
+{
+ return sprintf(buf, "%u\n", s->cpu_partial);
+}
+
+static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
+ size_t length)
+{
+ unsigned long objects;
+ int err;
+
+ err = strict_strtoul(buf, 10, &objects);
+ if (err)
+ return err;
+
+ s->cpu_partial = objects;
+ flush_all(s);
+ return length;
+}
+SLAB_ATTR(cpu_partial);
+
static ssize_t ctor_show(struct kmem_cache *s, char *buf)
{
if (!s->ctor)
@@ -4529,6 +4725,43 @@ static ssize_t objects_partial_show(stru
}
SLAB_ATTR_RO(objects_partial);
+static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
+{
+ unsigned long sum = 0;
+ unsigned long pages = 0;
+ int cpu;
+ int len;
+ int *data = kmalloc(nr_cpu_ids * sizeof(int), GFP_KERNEL);
+
+ if (!data)
+ return -ENOMEM;
+
+ for_each_online_cpu(cpu) {
+ unsigned x = 0;
+ struct page *page;
+
+ page = per_cpu_ptr(s->cpu_slab, cpu)->partial;
+ if (page) {
+ pages += page->pages;
+ x = page->pobjects;
+ }
+ data[cpu] = x;
+ sum += x;
+ }
+
+ len = sprintf(buf, "%lu(%lu)", sum, pages);
+
+#ifdef CONFIG_SMP
+ for_each_online_cpu(cpu) {
+ if (data[cpu] && len < PAGE_SIZE - 20)
+ len += sprintf(buf + len, " C%d=%u", cpu, data[cpu]);
+ }
+#endif
+ kfree(data);
+ return len + sprintf(buf + len, "\n");
+}
+SLAB_ATTR_RO(slabs_cpu_partial);
+
static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
{
return sprintf(buf, "%d\n", !!(s->flags & SLAB_RECLAIM_ACCOUNT));
@@ -4851,6 +5084,8 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
STAT_ATTR(ORDER_FALLBACK, order_fallback);
STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
+STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
+STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
#endif
static struct attribute *slab_attrs[] = {
@@ -4859,6 +5094,7 @@ static struct attribute *slab_attrs[] =
&objs_per_slab_attr.attr,
&order_attr.attr,
&min_partial_attr.attr,
+ &cpu_partial_attr.attr,
&objects_attr.attr,
&objects_partial_attr.attr,
&partial_attr.attr,
@@ -4871,6 +5107,7 @@ static struct attribute *slab_attrs[] =
&destroy_by_rcu_attr.attr,
&shrink_attr.attr,
&reserved_attr.attr,
+ &slabs_cpu_partial_attr.attr,
#ifdef CONFIG_SLUB_DEBUG
&total_objects_attr.attr,
&slabs_attr.attr,
@@ -4912,6 +5149,8 @@ static struct attribute *slab_attrs[] =
&order_fallback_attr.attr,
&cmpxchg_double_fail_attr.attr,
&cmpxchg_double_cpu_fail_attr.attr,
+ &cpu_partial_alloc_attr.attr,
+ &cpu_partial_free_attr.attr,
#endif
#ifdef CONFIG_FAILSLAB
&failslab_attr.attr,
Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h 2011-08-02 12:07:33.575281487 -0500
+++ linux-2.6/include/linux/mm_types.h 2011-08-02 12:07:35.225281476 -0500
@@ -79,9 +79,16 @@ struct page {
};
/* Third double word block */
- struct list_head lru; /* Pageout list, eg. active_list
+ union {
+ struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
*/
+ struct { /* SLUB pages in frozen state */
+ struct page *next; /* Next partial slab */
+ short int pages; /* Nr of partial slabs left */
+ short int pobjects; /* Approximate # of objects */
+ };
+ };
/* Remainder is not double word aligned */
union {
* [slub p3 7/7] slub: update slabinfo tools to report per cpu partial list statistics
From: Christoph Lameter @ 2011-08-01 16:28 UTC
To: Pekka Enberg
Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel
Update the slabinfo tool to report the stats on per cpu partial list usage.
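For completeness, a minimal sketch of how such a tool picks up one of the
new counters. It assumes a kernel built with CONFIG_SLUB_STATS and the
usual /sys/kernel/slab/<cache>/<stat> sysfs layout; the cache name
"kmalloc-64" is just an example, and error handling is kept to a minimum.

/* Sketch only: read a single SLUB stat counter from sysfs. */
#include <stdio.h>

static unsigned long read_slab_stat(const char *cache, const char *stat)
{
	char path[256];
	unsigned long val = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/slab/%s/%s", cache, stat);
	f = fopen(path, "r");
	if (!f)
		return 0;		/* stat not compiled in or cache gone */
	if (fscanf(f, "%lu", &val) != 1)	/* first number is the total */
		val = 0;
	fclose(f);
	return val;
}

int main(void)
{
	printf("kmalloc-64 cpu_partial_alloc=%lu cpu_partial_free=%lu\n",
	       read_slab_stat("kmalloc-64", "cpu_partial_alloc"),
	       read_slab_stat("kmalloc-64", "cpu_partial_free"));
	return 0;
}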
Signed-off-by: Christoph Lameter <cl@linux.com>
---
tools/slub/slabinfo.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux-2.6/tools/slub/slabinfo.c
===================================================================
--- linux-2.6.orig/tools/slub/slabinfo.c 2011-08-01 11:03:01.275859455 -0500
+++ linux-2.6/tools/slub/slabinfo.c 2011-08-01 11:04:45.125858790 -0500
@@ -42,6 +42,7 @@ struct slabinfo {
unsigned long deactivate_remote_frees, order_fallback;
unsigned long cmpxchg_double_cpu_fail, cmpxchg_double_fail;
unsigned long alloc_node_mismatch, deactivate_bypass;
+ unsigned long cpu_partial_alloc, cpu_partial_free;
int numa[MAX_NODES];
int numa_partial[MAX_NODES];
} slabinfo[MAX_SLABS];
@@ -455,6 +456,11 @@ static void slab_stats(struct slabinfo *
s->alloc_from_partial * 100 / total_alloc,
s->free_remove_partial * 100 / total_free);
+ printf("Cpu partial list %8lu %8lu %3lu %3lu\n",
+ s->cpu_partial_alloc, s->cpu_partial_free,
+ s->cpu_partial_alloc * 100 / total_alloc,
+ s->cpu_partial_free * 100 / total_free);
+
printf("RemoteObj/SlabFrozen %8lu %8lu %3lu %3lu\n",
s->deactivate_remote_frees, s->free_frozen,
s->deactivate_remote_frees * 100 / total_alloc,
@@ -1209,6 +1215,8 @@ static void read_slab_dir(void)
slab->order_fallback = get_obj("order_fallback");
slab->cmpxchg_double_cpu_fail = get_obj("cmpxchg_double_cpu_fail");
slab->cmpxchg_double_fail = get_obj("cmpxchg_double_fail");
+ slab->cpu_partial_alloc = get_obj("cpu_partial_alloc");
+ slab->cpu_partial_free = get_obj("cpu_partial_free");
slab->alloc_node_mismatch = get_obj("alloc_node_mismatch");
slab->deactivate_bypass = get_obj("deactivate_bypass");
chdir("..");
* Re: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
From: David Rientjes @ 2011-08-02 4:15 UTC
To: Christoph Lameter
Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
Eric Dumazet, Adrian Drzewiecki, linux-kernel
On Mon, 1 Aug 2011, Christoph Lameter wrote:
> Performance:
>
> Before After
> ./hackbench 100 process 200000
> Time: 2299.072 1742.454
> ./hackbench 100 process 20000
> Time: 224.654 182.393
> ./hackbench 100 process 20000
> Time: 227.126 182.780
> ./hackbench 100 process 20000
> Time: 219.608 182.899
> ./hackbench 10 process 20000
> Time: 21.769 18.756
> ./hackbench 10 process 20000
> Time: 21.657 18.938
> ./hackbench 10 process 20000
> Time: 23.193 19.537
> ./hackbench 1 process 20000
> Time: 2.337 2.263
> ./hackbench 1 process 20000
> Time: 2.223 2.271
> ./hackbench 1 process 20000
> Time: 2.269 2.301
>
This applied nicely to Linus' tree so I've moved to testing atop that
rather than slub/lockless on the same netperf testing environment as the
slab vs. slub comparison. The benchmarking completed without error and
here are the results:
threads before after
16 75509 75443 (-0.1%)
32 118121 117558 (-0.5%)
48 149997 149514 (-0.3%)
64 185216 186772 (+0.8%)
80 221195 222612 (+0.6%)
96 239732 241089 (+0.6%)
112 261967 266643 (+1.8%)
128 272946 281794 (+3.2%)
144 279202 289421 (+3.7%)
160 285745 297216 (+4.0%)
So the patchset certainly looks helpful, especially if it improves other
benchmarks as well.
I'll review the patches individually, starting with the cleanup patches
that can hopefully be pushed quickly while we discuss per-cpu partial
lists further.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
2011-08-02 4:15 ` [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3 David Rientjes
@ 2011-08-02 14:10 ` Christoph Lameter
2011-08-02 16:37 ` David Rientjes
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2011-08-02 14:10 UTC (permalink / raw)
To: David Rientjes
Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
Eric Dumazet, Adrian Drzewiecki, linux-kernel
On Mon, 1 Aug 2011, David Rientjes wrote:
>
> This applied nicely to Linus' tree so I've moved to testing atop that
> rather than slub/lockless on the same netperf testing environment as the
> slab vs. slub comparison. The benchmarking completed without error and
> here are the results:
>
> threads before after
> 16 75509 75443 (-0.1%)
> 32 118121 117558 (-0.5%)
> 48 149997 149514 (-0.3%)
> 64 185216 186772 (+0.8%)
> 80 221195 222612 (+0.6%)
> 96 239732 241089 (+0.6%)
> 112 261967 266643 (+1.8%)
> 128 272946 281794 (+3.2%)
> 144 279202 289421 (+3.7%)
> 160 285745 297216 (+4.0%)
>
> So the patchset certainly looks helpful, especially if it improves other
> benchmarks as well.
The problem is that the partial approach has not been fine-tuned yet for
these larger loads, and the proper tuning knobs are not implemented yet.
> I'll review the patches individually, starting with the cleanup patches
> that can hopefully be pushed quickly while we discuss per-cpu partial
> lists further.
I am currently reworking the patches to operate on a linked list instead
of a very small array of pointers to page structs. That will allow much
larger per cpu partial lists and a dynamic configuration of the sizes.
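To make the direction concrete (this is only a rough model, not the actual
patches; the field and function names are invented for the sketch), the idea is
to chain partial slabs through the page struct itself instead of storing them
in a fixed-size array:
/* Simplified model: the per cpu partial list is built by linking the
 * page structs themselves, so its length is bounded by a tunable
 * counter rather than a compile-time array size. */
struct page {
	int inuse;		/* allocated objects on this slab */
	struct page *next;	/* link for the per cpu partial list */
};
struct kmem_cache_cpu {
	struct page *partial;	/* head of the per cpu partial list */
	int nr_partial;		/* current length, checked against a limit */
};
static void put_cpu_partial(struct kmem_cache_cpu *c, struct page *page)
{
	page->next = c->partial;
	c->partial = page;
	c->nr_partial++;
}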
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
2011-08-02 14:10 ` Christoph Lameter
@ 2011-08-02 16:37 ` David Rientjes
2011-08-02 16:47 ` Christoph Lameter
0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2011-08-02 16:37 UTC (permalink / raw)
To: Christoph Lameter
Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
Eric Dumazet, Adrian Drzewiecki, linux-kernel
On Tue, 2 Aug 2011, Christoph Lameter wrote:
> > This applied nicely to Linus' tree so I've moved to testing atop that
> > rather than slub/lockless on the same netperf testing environment as the
> > slab vs. slub comparison. The benchmarking completed without error and
> > here are the results:
> >
> > threads before after
> > 16 75509 75443 (-0.1%)
> > 32 118121 117558 (-0.5%)
> > 48 149997 149514 (-0.3%)
> > 64 185216 186772 (+0.8%)
> > 80 221195 222612 (+0.6%)
> > 96 239732 241089 (+0.6%)
> > 112 261967 266643 (+1.8%)
> > 128 272946 281794 (+3.2%)
> > 144 279202 289421 (+3.7%)
> > 160 285745 297216 (+4.0%)
> >
> > So the patchset certainly looks helpful, especially if it improves other
> > benchmarks as well.
>
> The problem is that the partial approach has not been fine-tuned yet for
> these larger loads, and the proper tuning knobs are not implemented yet.
>
Aside from per-cpu partial lists, I think this particular benchmark would
benefit from two other changes in my testing environment:
- remote cpu freeing, so that objects allocated on a different cpu get
moved to a separate list that is eventually flushed back to the origin
cpu to be reallocated later, with sane heuristics to determine when to
take the necessary lock and cacheline bounce, and
- a preference to pull a slab from the partial lists only if it has a
sane number of free objects, even at the risk of a costly page
allocation, so that the fastpaths can be exercised a little more either
way (this benchmark suffers horribly when only one or two objects can
be allocated from a partial slab); a rough sketch of such a check
follows below.
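A purely illustrative version of that check (the struct and the min_free
threshold are made up here, not anything from the series):
struct page {
	int inuse;			/* objects currently allocated */
};
/* Take a partial slab only if it still has at least min_free free
 * objects; `objects` is the slab's total capacity. */
static int worth_taking(const struct page *page, int objects, int min_free)
{
	return objects - page->inuse >= min_free;
}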
> > I'll review the patches individually, starting with the cleanup patches
> > that can hopefully be pushed quickly while we discuss per-cpu partial
> > lists further.
>
> I am currently reworking the patches to operate on a linked list instead
> of a very small array of pointers to page structs. That will allow much
> larger per cpu partial lists and a dynamic configuration of the sizes.
>
Ok, so is the per-cpu partial list patch in this series worth the review
or are you going to go under the hood and rework it?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
2011-08-02 16:37 ` David Rientjes
@ 2011-08-02 16:47 ` Christoph Lameter
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-08-02 16:47 UTC (permalink / raw)
To: David Rientjes
Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
Eric Dumazet, Adrian Drzewiecki, linux-kernel
On Tue, 2 Aug 2011, David Rientjes wrote:
> Aside from per-cpu partial lists, I think this particular benchmark would
> benefit from two other changes in my testing environment:
>
> - remote cpu freeing, so that objects allocated on a different cpu get
> moved to a separate list that is eventually flushed back to the origin
> cpu to be reallocated later, with sane heuristics to determine when to
> take the necessary lock and cacheline bounce, and
Remote cpu freeing should scale better now since we only need to acquire the
cacheline of the page struct exclusively. With per cpu partials the
pressure on the node lock is pretty much gone.
> - a preference to pull a slab from the partial lists only if it has a
> sane number of free objects, even at the risk of a costly page
> allocation, so that the fastpaths can be exercised a little more either
> way (this benchmark suffers horribly when only one or two objects can
> be allocated from a partial slab).
The partial patch will pull a large number of partial slabs off the per
node list if possible.
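To illustrate what that means (again an invented model, with all locking and
the real list handling omitted), the refill keeps taking slabs until enough
free objects have been gathered:
struct page {
	int inuse;		/* allocated objects */
	int objects;		/* total objects in the slab */
	struct page *next;	/* list link */
};
struct kmem_cache_node { struct page *partial; };
struct kmem_cache_cpu  { struct page *partial; };
/* Move slabs from the node partial list to the cpu partial list until
 * roughly `want` free objects have been collected or the node list is
 * empty. */
static void refill_cpu_partial(struct kmem_cache_node *n,
			       struct kmem_cache_cpu *c, int want)
{
	int available = 0;
	struct page *page;
	while (available < want && (page = n->partial) != NULL) {
		n->partial = page->next;	/* unlink from node list */
		page->next = c->partial;	/* push onto cpu list */
		c->partial = page;
		available += page->objects - page->inuse;
	}
}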
> > I am currently reworking the patches to operate on a linked list instead
> > of a very small array of pointers to page structs. That will allow much
> > larger per cpu partial lists and a dynamic configuration of the sizes.
> >
>
> Ok, so is the per-cpu partial list patch in this series worth the review
> or are you going to go under the hood and rework it?
No, it's not worth it right now. There are already too many changes. I can
send you my current patches if you want to get directly involved. I would
certainly appreciate the help.
^ permalink raw reply [flat|nested] 13+ messages in thread