* [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
@ 2014-10-22 15:55 Christoph Lameter
2014-10-22 15:55 ` [RFC 1/4] slub: Remove __slab_alloc code duplication Christoph Lameter
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 15:55 UTC (permalink / raw)
To: akpm; +Cc: rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
We had to insert a preempt enable/disable pair in the fastpath a while ago.
This was mainly due to the amount of state that is kept for allocating from
the per cpu freelist. In particular, the page field is not covered by the
this_cpu_cmpxchg used in the fastpath to do the necessary atomic state
change for fastpath allocation and freeing.
This patchset removes the need for the page field to describe the state of
the per cpu list. The freelist pointer can be used to determine the page
struct address if necessary.
However, this currently does not work for the termination value of a list,
which is NULL and therefore the same for all slab pages. If we instead use a
valid pointer into the page with the lowest bit set, then every freelist
pointer can always be used to determine the address of the page struct, and
the page field is no longer needed in the per cpu area for a slab. Testing
for the end of the list becomes a test of whether the lowest bit is set.
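To illustrate, here is a minimal sketch of the encoding (is_end_token() and
end_token() match the helpers introduced in patch 2; freelist_page() is only
an illustration and not part of the series):

/* The lowest bit marks the end of a freelist. The remaining bits still
 * point into the slab page, so the page struct stays derivable. */
static inline bool is_end_token(const void *freelist)
{
        return (unsigned long)freelist & 1;
}

static inline void *end_token(const void *address)
{
        return (void *)((unsigned long)address | 1);
}

/* Illustration only: any freelist value, terminated or not, identifies
 * its slab page without a separate ->page field. */
static inline struct page *freelist_page(const void *freelist)
{
        return virt_to_head_page((void *)((unsigned long)freelist & ~1UL));
}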
So the second patch changes the termination pointer for freelists to do just
that, the third removes the page field, and the fourth can then remove the
preempt enable/disable. (The first patch is a small cleanup of __slab_alloc.)
There are currently a number of caveats because we are adding calls to
page_address() and virt_to_head_page() in a number of code paths. These
can hopefully be removed one way or the other.
Removing the ->page field reduces the cache footprint of the fastpath, so
overall allocator effectiveness will hopefully increase further. Also, RT
uses full preemption, which means that fairly expensive code currently has
to be inserted into the fastpath. This approach allows that code to be
removed, with a corresponding performance increase.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC 1/4] slub: Remove __slab_alloc code duplication
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
@ 2014-10-22 15:55 ` Christoph Lameter
2014-10-22 18:04 ` Thomas Gleixner
2014-10-22 15:55 ` [RFC 2/4] slub: Use end_token instead of NULL to terminate freelists Christoph Lameter
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 15:55 UTC (permalink / raw)
To: akpm; +Cc: rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
[-- Attachment #1: simplify_code --]
[-- Type: text/plain, Size: 1576 bytes --]
Somehow the two branches in __slab_alloc do the same.
Unify them.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -2280,12 +2280,8 @@ redo:
if (node != NUMA_NO_NODE && !node_present_pages(node))
searchnode = node_to_mem_node(node);
- if (unlikely(!node_match(page, searchnode))) {
- stat(s, ALLOC_NODE_MISMATCH);
- deactivate_slab(s, page, c->freelist);
- c->page = NULL;
- c->freelist = NULL;
- goto new_slab;
+ if (unlikely(!node_match(page, searchnode)))
+ goto deactivate;
}
}
@@ -2294,12 +2290,8 @@ redo:
* PFMEMALLOC but right now, we are losing the pfmemalloc
* information when the page leaves the per-cpu allocator
*/
- if (unlikely(!pfmemalloc_match(page, gfpflags))) {
- deactivate_slab(s, page, c->freelist);
- c->page = NULL;
- c->freelist = NULL;
- goto new_slab;
- }
+ if (unlikely(!pfmemalloc_match(page, gfpflags)))
+ goto deactivate;
/* must check again c->freelist in case of cpu migration or IRQ */
freelist = c->freelist;
@@ -2328,6 +2320,11 @@ load_freelist:
local_irq_restore(flags);
return freelist;
+deactivate:
+ deactivate_slab(s, page, c->freelist);
+ c->page = NULL;
+ c->freelist = NULL;
+
new_slab:
if (c->partial) {
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC 2/4] slub: Use end_token instead of NULL to terminate freelists
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
2014-10-22 15:55 ` [RFC 1/4] slub: Remove __slab_alloc code duplication Christoph Lameter
@ 2014-10-22 15:55 ` Christoph Lameter
2014-10-22 15:55 ` [RFC 3/4] slub: Drop ->page field from kmem_cache_cpu Christoph Lameter
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 15:55 UTC (permalink / raw)
To: akpm; +Cc: rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
[-- Attachment #1: slub_use_end_token_instead_of_null --]
[-- Type: text/plain, Size: 6938 bytes --]
Ending a list with NULL means that the termination value is the same
for all slab pages, whereas all other freelist pointers point into the
address space of the page. Make list termination possible by setting
the lowest bit in a freelist address, using the start address of the
page if no other address is available for termination.
This will allow us to determine the page struct address from a
freelist pointer in the future.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -132,6 +132,16 @@ static inline bool kmem_cache_has_cpu_pa
#endif
}
+static bool is_end_token(const void *freelist)
+{
+ return ((unsigned long)freelist) & 1;
+}
+
+static void *end_token(const void *address)
+{
+ return (void *)((unsigned long)address | 1);
+}
+
/*
* Issues still to be resolved:
*
@@ -234,7 +244,7 @@ static inline int check_valid_pointer(st
base = page_address(page);
if (object < base || object >= base + page->objects * s->size ||
- (object - base) % s->size) {
+ ((object - base) % s->size && !is_end_token(object))) {
return 0;
}
@@ -451,7 +461,7 @@ static void get_map(struct kmem_cache *s
void *p;
void *addr = page_address(page);
- for (p = page->freelist; p; p = get_freepointer(s, p))
+ for (p = page->freelist; !is_end_token(p); p = get_freepointer(s, p))
set_bit(slab_index(p, s, addr), map);
}
@@ -829,7 +839,7 @@ static int check_object(struct kmem_cach
* of the free objects in this slab. May cause
* another error because the object count is now wrong.
*/
- set_freepointer(s, p, NULL);
+ set_freepointer(s, p, end_token(page_address(page)));
return 0;
}
return 1;
@@ -874,7 +884,7 @@ static int on_freelist(struct kmem_cache
unsigned long max_objects;
fp = page->freelist;
- while (fp && nr <= page->objects) {
+ while (!is_end_token(fp) && nr <= page->objects) {
if (fp == search)
return 1;
if (!check_valid_pointer(s, page, fp)) {
@@ -1033,7 +1043,7 @@ bad:
*/
slab_fix(s, "Marking all objects used");
page->inuse = page->objects;
- page->freelist = NULL;
+ page->freelist = end_token(page_address(page));
}
return 0;
}
@@ -1401,7 +1411,7 @@ static struct page *new_slab(struct kmem
if (likely(idx < page->objects))
set_freepointer(s, p, p + s->size);
else
- set_freepointer(s, p, NULL);
+ set_freepointer(s, p, end_token(start));
}
page->freelist = start;
@@ -1544,12 +1554,11 @@ static inline void *acquire_slab(struct
freelist = page->freelist;
counters = page->counters;
new.counters = counters;
+ new.freelist = freelist;
*objects = new.objects - new.inuse;
if (mode) {
new.inuse = page->objects;
- new.freelist = NULL;
- } else {
- new.freelist = freelist;
+ new.freelist = end_token(freelist);
}
VM_BUG_ON(new.frozen);
@@ -1785,7 +1794,7 @@ static void deactivate_slab(struct kmem_
struct page new;
struct page old;
- if (page->freelist) {
+ if (!is_end_token(page->freelist)) {
stat(s, DEACTIVATE_REMOTE_FREES);
tail = DEACTIVATE_TO_TAIL;
}
@@ -1798,7 +1807,8 @@ static void deactivate_slab(struct kmem_
* There is no need to take the list->lock because the page
* is still frozen.
*/
- while (freelist && (nextfree = get_freepointer(s, freelist))) {
+ if (freelist)
+ while (!is_end_token(freelist) && (nextfree = get_freepointer(s, freelist))) {
void *prior;
unsigned long counters;
@@ -1816,7 +1826,8 @@ static void deactivate_slab(struct kmem_
"drain percpu freelist"));
freelist = nextfree;
- }
+ } else
+ freelist = end_token(page_address(page));
/*
* Stage two: Ensure that the page is unfrozen while the
@@ -1840,7 +1851,7 @@ redo:
/* Determine target state of the slab */
new.counters = old.counters;
- if (freelist) {
+ if (!is_end_token(freelist)) {
new.inuse--;
set_freepointer(s, freelist, old.freelist);
new.freelist = freelist;
@@ -1851,7 +1862,7 @@ redo:
if (!new.inuse && n->nr_partial >= s->min_partial)
m = M_FREE;
- else if (new.freelist) {
+ else if (!is_end_token(new.freelist)) {
m = M_PARTIAL;
if (!lock) {
lock = 1;
@@ -2169,7 +2180,7 @@ static inline void *new_slab_objects(str
freelist = get_partial(s, flags, node, c);
- if (freelist)
+ if (freelist && !is_end_token(freelist))
return freelist;
page = new_slab(s, flags, node);
@@ -2183,7 +2194,7 @@ static inline void *new_slab_objects(str
* muck around with it freely without cmpxchg
*/
freelist = page->freelist;
- page->freelist = NULL;
+ page->freelist = end_token(freelist);
stat(s, ALLOC_SLAB);
c->page = page;
@@ -2226,11 +2237,11 @@ static inline void *get_freelist(struct
VM_BUG_ON(!new.frozen);
new.inuse = page->objects;
- new.frozen = freelist != NULL;
+ new.frozen = !is_end_token(freelist);
} while (!__cmpxchg_double_slab(s, page,
freelist, counters,
- NULL, new.counters,
+ end_token(freelist), new.counters,
"get_freelist"));
return freelist;
@@ -2282,7 +2293,6 @@ redo:
if (unlikely(!node_match(page, searchnode)))
goto deactivate;
- }
}
/*
@@ -2295,12 +2305,12 @@ redo:
/* must check again c->freelist in case of cpu migration or IRQ */
freelist = c->freelist;
- if (freelist)
+ if (freelist && !is_end_token(freelist))
goto load_freelist;
freelist = get_freelist(s, page);
- if (!freelist) {
+ if (!freelist || is_end_token(freelist)) {
c->page = NULL;
stat(s, DEACTIVATE_BYPASS);
goto new_slab;
@@ -2407,7 +2418,7 @@ redo:
object = c->freelist;
page = c->page;
- if (unlikely(!object || !node_match(page, node))) {
+ if (unlikely(!object || is_end_token(object) || !node_match(page, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
stat(s, ALLOC_SLOWPATH);
} else {
@@ -2537,9 +2548,9 @@ static void __slab_free(struct kmem_cach
new.counters = counters;
was_frozen = new.frozen;
new.inuse--;
- if ((!new.inuse || !prior) && !was_frozen) {
+ if ((!new.inuse || is_end_token(prior)) && !was_frozen) {
- if (kmem_cache_has_cpu_partial(s) && !prior) {
+ if (kmem_cache_has_cpu_partial(s) && is_end_token(prior)) {
/*
* Slab was on no list before and will be
@@ -2596,7 +2607,7 @@ static void __slab_free(struct kmem_cach
* Objects left in the slab. If it was not on the partial list before
* then add it.
*/
- if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
+ if (!kmem_cache_has_cpu_partial(s) && unlikely(is_end_token(prior))) {
if (kmem_cache_debug(s))
remove_full(s, n, page);
add_partial(n, page, DEACTIVATE_TO_TAIL);
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC 3/4] slub: Drop ->page field from kmem_cache_cpu
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
2014-10-22 15:55 ` [RFC 1/4] slub: Remove __slab_alloc code duplication Christoph Lameter
2014-10-22 15:55 ` [RFC 2/4] slub: Use end_token instead of NULL to terminate freelists Christoph Lameter
@ 2014-10-22 15:55 ` Christoph Lameter
2014-10-22 15:55 ` [RFC 4/4] slub: Remove preemption disable/enable from fastpath Christoph Lameter
2014-10-23 8:09 ` [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Joonsoo Kim
4 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 15:55 UTC (permalink / raw)
To: akpm; +Cc: rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
[-- Attachment #1: slub_drop_kmem_cache_cpu_page_Field --]
[-- Type: text/plain, Size: 5514 bytes --]
Dropping the page field is possible since the page struct address
of an object or a freelist pointer can now always be calculated from
the address. No freelist pointer will be NULL anymore, so NULL can be
used to signify that the current cpu has no per cpu slab attached to
it.
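For clarity, a sketch of what c->freelist encodes after this change (the
helper below is illustrative only and not part of the patch):

/*
 * c->freelist alone now describes the per cpu state:
 *   NULL                     - no per cpu slab attached
 *   end_token(page address)  - slab attached, per cpu freelist currently empty
 *   any other value          - next free object in the attached slab
 */
static inline struct page *cpu_slab_page(struct kmem_cache_cpu *c)
{
        /* Only meaningful while c->freelist is non-NULL */
        return virt_to_head_page(c->freelist);
}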
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h
+++ linux/include/linux/slub_def.h
@@ -40,7 +40,6 @@ enum stat_item {
struct kmem_cache_cpu {
void **freelist; /* Pointer to next available object */
unsigned long tid; /* Globally unique transaction id */
- struct page *page; /* The slab from which we are allocating */
struct page *partial; /* Partially allocated frozen slabs */
#ifdef CONFIG_SLUB_STATS
unsigned stat[NR_SLUB_STAT_ITEMS];
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -1611,7 +1611,6 @@ static void *get_partial_node(struct kme
available += objects;
if (!object) {
- c->page = page;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
} else {
@@ -2049,10 +2048,9 @@ static void put_cpu_partial(struct kmem_
static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
stat(s, CPUSLAB_FLUSH);
- deactivate_slab(s, c->page, c->freelist);
+ deactivate_slab(s, virt_to_head_page(c->freelist), c->freelist);
c->tid = next_tid(c->tid);
- c->page = NULL;
c->freelist = NULL;
}
@@ -2066,7 +2064,7 @@ static inline void __flush_cpu_slab(stru
struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
if (likely(c)) {
- if (c->page)
+ if (c->freelist)
flush_slab(s, c);
unfreeze_partials(s, c);
@@ -2085,7 +2083,7 @@ static bool has_cpu_slab(int cpu, void *
struct kmem_cache *s = info;
struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
- return c->page || c->partial;
+ return c->freelist || c->partial;
}
static void flush_all(struct kmem_cache *s)
@@ -2186,7 +2184,7 @@ static inline void *new_slab_objects(str
page = new_slab(s, flags, node);
if (page) {
c = raw_cpu_ptr(s->cpu_slab);
- if (c->page)
+ if (c->freelist)
flush_slab(s, c);
/*
@@ -2197,7 +2195,6 @@ static inline void *new_slab_objects(str
page->freelist = end_token(freelist);
stat(s, ALLOC_SLAB);
- c->page = page;
*pc = c;
} else
freelist = NULL;
@@ -2280,9 +2277,10 @@ static void *__slab_alloc(struct kmem_ca
c = this_cpu_ptr(s->cpu_slab);
#endif
- page = c->page;
- if (!page)
+ if (!c->freelist || is_end_token(c->freelist))
goto new_slab;
+
+ page = virt_to_head_page(c->freelist);
redo:
if (unlikely(!node_match(page, node))) {
@@ -2311,7 +2309,7 @@ redo:
freelist = get_freelist(s, page);
if (!freelist || is_end_token(freelist)) {
- c->page = NULL;
+ c->freelist = NULL;
stat(s, DEACTIVATE_BYPASS);
goto new_slab;
}
@@ -2324,7 +2322,7 @@ load_freelist:
* page is pointing to the page from which the objects are obtained.
* That page must be frozen for per cpu allocations to work.
*/
- VM_BUG_ON(!c->page->frozen);
+ VM_BUG_ON(!virt_to_head_page(freelist)->frozen);
c->freelist = get_freepointer(s, freelist);
c->tid = next_tid(c->tid);
local_irq_restore(flags);
@@ -2332,16 +2330,15 @@ load_freelist:
deactivate:
deactivate_slab(s, page, c->freelist);
- c->page = NULL;
c->freelist = NULL;
new_slab:
if (c->partial) {
- page = c->page = c->partial;
+ page = c->partial;
c->partial = page->next;
stat(s, CPU_PARTIAL_ALLOC);
- c->freelist = NULL;
+ c->freelist = end_token(page_address(page));
goto redo;
}
@@ -2353,7 +2350,7 @@ new_slab:
return NULL;
}
- page = c->page;
+ page = virt_to_head_page(freelist);
if (likely(!kmem_cache_debug(s) && pfmemalloc_match(page, gfpflags)))
goto load_freelist;
@@ -2363,7 +2360,6 @@ new_slab:
goto new_slab; /* Slab failed checks. Next slab needed */
deactivate_slab(s, page, get_freepointer(s, freelist));
- c->page = NULL;
c->freelist = NULL;
local_irq_restore(flags);
return freelist;
@@ -2384,7 +2380,6 @@ static __always_inline void *slab_alloc_
{
void **object;
struct kmem_cache_cpu *c;
- struct page *page;
unsigned long tid;
if (slab_pre_alloc_hook(s, gfpflags))
@@ -2416,8 +2411,7 @@ redo:
preempt_enable();
object = c->freelist;
- page = c->page;
- if (unlikely(!object || is_end_token(object) || !node_match(page, node))) {
+ if (unlikely(!object || is_end_token(object) || !node_match(virt_to_head_page(object), node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
stat(s, ALLOC_SLOWPATH);
} else {
@@ -2665,7 +2659,7 @@ redo:
tid = c->tid;
preempt_enable();
- if (likely(page == c->page)) {
+ if (likely(c->freelist && page == virt_to_head_page(c->freelist))) {
set_freepointer(s, object, c->freelist);
if (unlikely(!this_cpu_cmpxchg_double(
@@ -4191,10 +4185,10 @@ static ssize_t show_slab_objects(struct
int node;
struct page *page;
- page = ACCESS_ONCE(c->page);
- if (!page)
+ if (!c->freelist)
continue;
+ page = virt_to_head_page(c->freelist);
node = page_to_nid(page);
if (flags & SO_TOTAL)
x = page->objects;
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC 4/4] slub: Remove preemption disable/enable from fastpath
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
` (2 preceding siblings ...)
2014-10-22 15:55 ` [RFC 3/4] slub: Drop ->page field from kmem_cache_cpu Christoph Lameter
@ 2014-10-22 15:55 ` Christoph Lameter
2014-10-23 8:09 ` [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Joonsoo Kim
4 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 15:55 UTC (permalink / raw)
To: akpm; +Cc: rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
[-- Attachment #1: slub_fastpath_remove_preempt --]
[-- Type: text/plain, Size: 4017 bytes --]
We can now use this_cpu_cmpxchg_double to update the two 64 bit
values that entirely describe the per cpu freelist. There is no
longer any need to disable preemption.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -2261,21 +2261,15 @@ static inline void *get_freelist(struct
* a call to the page allocator and the setup of a new slab.
*/
static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr)
{
void *freelist;
struct page *page;
unsigned long flags;
+ struct kmem_cache_cpu *c;
local_irq_save(flags);
-#ifdef CONFIG_PREEMPT
- /*
- * We may have been preempted and rescheduled on a different
- * cpu before disabling interrupts. Need to reload cpu area
- * pointer.
- */
c = this_cpu_ptr(s->cpu_slab);
-#endif
if (!c->freelist || is_end_token(c->freelist))
goto new_slab;
@@ -2379,7 +2373,6 @@ static __always_inline void *slab_alloc_
gfp_t gfpflags, int node, unsigned long addr)
{
void **object;
- struct kmem_cache_cpu *c;
unsigned long tid;
if (slab_pre_alloc_hook(s, gfpflags))
@@ -2388,31 +2381,15 @@ static __always_inline void *slab_alloc_
s = memcg_kmem_get_cache(s, gfpflags);
redo:
/*
- * Must read kmem_cache cpu data via this cpu ptr. Preemption is
- * enabled. We may switch back and forth between cpus while
- * reading from one cpu area. That does not matter as long
- * as we end up on the original cpu again when doing the cmpxchg.
- *
- * Preemption is disabled for the retrieval of the tid because that
- * must occur from the current processor. We cannot allow rescheduling
- * on a different processor between the determination of the pointer
- * and the retrieval of the tid.
- */
- preempt_disable();
- c = this_cpu_ptr(s->cpu_slab);
-
- /*
* The transaction ids are globally unique per cpu and per operation on
* a per cpu queue. Thus they can be guarantee that the cmpxchg_double
* occurs on the right processor and that there was no operation on the
* linked list in between.
*/
- tid = c->tid;
- preempt_enable();
-
- object = c->freelist;
+ tid = this_cpu_read(s->cpu_slab->tid);
+ object = this_cpu_read(s->cpu_slab->freelist);
if (unlikely(!object || is_end_token(object) || !node_match(virt_to_head_page(object), node))) {
- object = __slab_alloc(s, gfpflags, node, addr, c);
+ object = __slab_alloc(s, gfpflags, node, addr);
stat(s, ALLOC_SLOWPATH);
} else {
void *next_object = get_freepointer_safe(s, object);
@@ -2641,30 +2618,21 @@ static __always_inline void slab_free(st
struct page *page, void *x, unsigned long addr)
{
void **object = (void *)x;
- struct kmem_cache_cpu *c;
+ void *freelist;
unsigned long tid;
slab_free_hook(s, x);
redo:
- /*
- * Determine the currently cpus per cpu slab.
- * The cpu may change afterward. However that does not matter since
- * data is retrieved via this pointer. If we are on the same cpu
- * during the cmpxchg then the free will succedd.
- */
- preempt_disable();
- c = this_cpu_ptr(s->cpu_slab);
-
- tid = c->tid;
- preempt_enable();
+ tid = this_cpu_read(s->cpu_slab->tid);
+ freelist = this_cpu_read(s->cpu_slab->freelist);
- if (likely(c->freelist && page == virt_to_head_page(c->freelist))) {
- set_freepointer(s, object, c->freelist);
+ if (likely(freelist && page == virt_to_head_page(freelist))) {
+ set_freepointer(s, object, freelist);
if (unlikely(!this_cpu_cmpxchg_double(
s->cpu_slab->freelist, s->cpu_slab->tid,
- c->freelist, tid,
+ freelist, tid,
object, next_tid(tid)))) {
note_cmpxchg_failure("slab_free", s, tid);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 1/4] slub: Remove __slab_alloc code duplication
2014-10-22 15:55 ` [RFC 1/4] slub: Remove __slab_alloc code duplication Christoph Lameter
@ 2014-10-22 18:04 ` Thomas Gleixner
2014-10-22 18:27 ` Christoph Lameter
0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2014-10-22 18:04 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, penberg, iamjoonsoo
On Wed, 22 Oct 2014, Christoph Lameter wrote:
> Somehow the two branches in __slab_alloc do the same.
> Unify them.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -2280,12 +2280,8 @@ redo:
> if (node != NUMA_NO_NODE && !node_present_pages(node))
> searchnode = node_to_mem_node(node);
>
> - if (unlikely(!node_match(page, searchnode))) {
> - stat(s, ALLOC_NODE_MISMATCH);
> - deactivate_slab(s, page, c->freelist);
> - c->page = NULL;
> - c->freelist = NULL;
> - goto new_slab;
> + if (unlikely(!node_match(page, searchnode)))
> + goto deactivate;
> }
That's not compiling at all due to the left over '}' !
And shouldn't you keep the stat(); call in that code path?
Thanks,
tglx
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 1/4] slub: Remove __slab_alloc code duplication
2014-10-22 18:04 ` Thomas Gleixner
@ 2014-10-22 18:27 ` Christoph Lameter
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-22 18:27 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: akpm, linux-mm, penberg, iamjoonsoo
On Wed, 22 Oct 2014, Thomas Gleixner wrote:
> That's not compiling at all due to the left over '}' !
Argh. Stale patch.
> And shouldn't you keep the stat(); call in that code path?
True. Fixed up patch follows:
Subject: slub: Remove __slab_alloc code duplication
Somehow the two branches in __slab_alloc do the same.
Unify them.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -2282,10 +2282,7 @@ redo:
if (unlikely(!node_match(page, searchnode))) {
stat(s, ALLOC_NODE_MISMATCH);
- deactivate_slab(s, page, c->freelist);
- c->page = NULL;
- c->freelist = NULL;
- goto new_slab;
+ goto deactivate;
}
}
@@ -2294,12 +2291,8 @@ redo:
* PFMEMALLOC but right now, we are losing the pfmemalloc
* information when the page leaves the per-cpu allocator
*/
- if (unlikely(!pfmemalloc_match(page, gfpflags))) {
- deactivate_slab(s, page, c->freelist);
- c->page = NULL;
- c->freelist = NULL;
- goto new_slab;
- }
+ if (unlikely(!pfmemalloc_match(page, gfpflags)))
+ goto deactivate;
/* must check again c->freelist in case of cpu migration or IRQ */
freelist = c->freelist;
@@ -2328,6 +2321,11 @@ load_freelist:
local_irq_restore(flags);
return freelist;
+deactivate:
+ deactivate_slab(s, page, c->freelist);
+ c->page = NULL;
+ c->freelist = NULL;
+
new_slab:
if (c->partial) {
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
` (3 preceding siblings ...)
2014-10-22 15:55 ` [RFC 4/4] slub: Remove preemption disable/enable from fastpath Christoph Lameter
@ 2014-10-23 8:09 ` Joonsoo Kim
2014-10-23 14:18 ` Christoph Lameter
4 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2014-10-23 8:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Wed, Oct 22, 2014 at 10:55:17AM -0500, Christoph Lameter wrote:
> We had to insert a preempt enable/disable pair in the fastpath a while ago.
> This was mainly due to the amount of state that is kept for allocating from
> the per cpu freelist. In particular, the page field is not covered by the
> this_cpu_cmpxchg used in the fastpath to do the necessary atomic state
> change for fastpath allocation and freeing.
>
> This patchset removes the need for the page field to describe the state of
> the per cpu list. The freelist pointer can be used to determine the page
> struct address if necessary.
>
> However, this currently does not work for the termination value of a list,
> which is NULL and therefore the same for all slab pages. If we instead use a
> valid pointer into the page with the lowest bit set, then every freelist
> pointer can always be used to determine the address of the page struct, and
> the page field is no longer needed in the per cpu area for a slab. Testing
> for the end of the list becomes a test of whether the lowest bit is set.
>
> So the second patch changes the termination pointer for freelists to do just
> that, the third removes the page field, and the fourth can then remove the
> preempt enable/disable. (The first patch is a small cleanup of __slab_alloc.)
>
> There are currently a number of caveats because we are adding calls to
> page_address() and virt_to_head_page() in a number of code paths. These
> can hopefully be removed one way or the other.
>
> Removing the ->page field reduces the cache footprint of the fastpath, so
> overall allocator effectiveness will hopefully increase further. Also, RT
> uses full preemption, which means that fairly expensive code currently has
> to be inserted into the fastpath. This approach allows that code to be
> removed, with a corresponding performance increase.
>
Hello, Christoph.
Preemption disable during very short code would cause large problem for RT?
And, if page_address() and virt_to_head_page() remain as current patchset
implementation, this would work worse than before.
I looked at the patchset quickly and found another idea to remove
preemption disable. How about just retrieving s->cpu_slab->tid first,
before accessing s->cpu_slab, in slab_alloc() and slab_free()?
Retrieved tid may ensure that we aren't migrated to other CPUs so that
we can remove code for preemption disable.
The following patch implements the above idea.
Thanks.
------------->8------------------------
diff --git a/mm/slub.c b/mm/slub.c
index ae7b9f1..af622d8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2386,28 +2386,21 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
s = memcg_kmem_get_cache(s, gfpflags);
redo:
/*
+ * The transaction ids are globally unique per cpu and per operation on
+ * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
+ * occurs on the right processor and that there was no operation on the
+ * linked list in between.
+ */
+ tid = this_cpu_read(s->cpu_slab->tid);
+
+ /*
* Must read kmem_cache cpu data via this cpu ptr. Preemption is
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
- *
- * Preemption is disabled for the retrieval of the tid because that
- * must occur from the current processor. We cannot allow rescheduling
- * on a different processor between the determination of the pointer
- * and the retrieval of the tid.
*/
- preempt_disable();
c = this_cpu_ptr(s->cpu_slab);
- /*
- * The transaction ids are globally unique per cpu and per operation on
- * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
- * occurs on the right processor and that there was no operation on the
- * linked list in between.
- */
- tid = c->tid;
- preempt_enable();
-
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
@@ -2646,18 +2639,16 @@ static __always_inline void slab_free(struct kmem_cache *s,
slab_free_hook(s, x);
redo:
+ tid = this_cpu_read(s->cpu_slab->tid);
+
/*
* Determine the currently cpus per cpu slab.
* The cpu may change afterward. However that does not matter since
* data is retrieved via this pointer. If we are on the same cpu
* during the cmpxchg then the free will succedd.
*/
- preempt_disable();
c = this_cpu_ptr(s->cpu_slab);
- tid = c->tid;
- preempt_enable();
-
if (likely(page == c->page)) {
set_freepointer(s, object, c->freelist);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-23 8:09 ` [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Joonsoo Kim
@ 2014-10-23 14:18 ` Christoph Lameter
2014-10-24 4:56 ` Joonsoo Kim
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2014-10-23 14:18 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Thu, 23 Oct 2014, Joonsoo Kim wrote:
> Preemption disable during very short code would cause large problem for RT?
This is the hotpath and preempt enable/disable adds a significant number
of cycles.
> And, if page_address() and virt_to_head_page() remain as current patchset
> implementation, this would work worse than before.
Right.
> I looked at the patchset quickly and found another idea to remove
> preemption disable. How about just retrieving s->cpu_slab->tid first,
> before accessing s->cpu_slab, in slab_alloc() and slab_free()?
> Retrieved tid may ensure that we aren't migrated to other CPUs so that
> we can remove code for preemption disable.
You cannot do any of these things because you need the tid from the right
cpu and the scheduler can preempt you and reschedule you on another
processor at will. tid and c may be from different per cpu areas.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-23 14:18 ` Christoph Lameter
@ 2014-10-24 4:56 ` Joonsoo Kim
2014-10-24 14:02 ` Christoph Lameter
2014-10-24 14:41 ` Christoph Lameter
0 siblings, 2 replies; 15+ messages in thread
From: Joonsoo Kim @ 2014-10-24 4:56 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Thu, Oct 23, 2014 at 09:18:29AM -0500, Christoph Lameter wrote:
> On Thu, 23 Oct 2014, Joonsoo Kim wrote:
>
> > Preemption disable during very short code would cause large problem for RT?
>
> This is the hotpath and preempt enable/disable adds a significant number
> of cycles.
>
> > And, if page_address() and virt_to_head_page() remain as current patchset
> > implementation, this would work worse than before.
>
> Right.
>
> > I looked at the patchset quickly and found another idea to remove
> > preemption disable. How about just retrieving s->cpu_slab->tid first,
> > before accessing s->cpu_slab, in slab_alloc() and slab_free()?
> > Retrieved tid may ensure that we aren't migrated to other CPUs so that
> > we can remove code for preemption disable.
>
> You cannot do any of these things because you need the tid from the right
> cpu and the scheduler can preempt you and reschedule you on another
> processor at will. tid and c may be from different per cpu areas.
I found that you said retrieving tid first is sufficient to do
things right in old discussion. :)
https://lkml.org/lkml/2013/1/18/430
Think about the following 4 examples.
        TID     CPU_CACHE       CMPX_DOUBLE
1.      cpu0    cpu0            cpu0
2.      cpu0    cpu0            cpu1
3.      cpu0    cpu1            cpu0
4.      cpu0    cpu1            cpu1
1) has no problem and will succeed.
2, 4) would fail due to a tid mismatch.
The only complicated case is scenario 3).
In this case, the object from cpu1's cpu_cache should be
different from cpu0's, so the allocation would fail.
The only problem with this method is that it's not easy to understand.
Am I missing something?
Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-24 4:56 ` Joonsoo Kim
@ 2014-10-24 14:02 ` Christoph Lameter
2014-10-27 7:54 ` Joonsoo Kim
2014-10-24 14:41 ` Christoph Lameter
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2014-10-24 14:02 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Fri, 24 Oct 2014, Joonsoo Kim wrote:
> In this case, the object from cpu1's cpu_cache should be
> different from cpu0's, so the allocation would fail.
That is true for most object pointers unless the value is NULL. Which it
can be. But if this is the only case then the second patch + your approach
would work too.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-24 4:56 ` Joonsoo Kim
2014-10-24 14:02 ` Christoph Lameter
@ 2014-10-24 14:41 ` Christoph Lameter
2014-10-27 7:58 ` Joonsoo Kim
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2014-10-24 14:41 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
> I found that you said retrieving tid first is sufficient to do
> things right in old discussion. :)
Right but the tid can be obtained from a different processor.
One other aspect of this patchset is that it reduces the cache footprint
of the alloc and free functions. This typically results in a performance
increase for the allocator. If we can avoid the page_address() and
virt_to_head_page() stuff that is required because we drop the ->page
field in a sufficient number of places then this may be a benefit that
goes beyond the RT and CONFIG_PREEMPT case.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-24 14:02 ` Christoph Lameter
@ 2014-10-27 7:54 ` Joonsoo Kim
0 siblings, 0 replies; 15+ messages in thread
From: Joonsoo Kim @ 2014-10-27 7:54 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Fri, Oct 24, 2014 at 09:02:18AM -0500, Christoph Lameter wrote:
> On Fri, 24 Oct 2014, Joonsoo Kim wrote:
>
> > In this case, the object from cpu1's cpu_cache should be
> > different from cpu0's, so the allocation would fail.
>
> That is true for most object pointers unless the value is NULL. Which it
> can be. But if this is the only case then the second patch + your approach
> would work too.
Indeed... I missed the NULL value case.
Your second patch + mine would fix that situation, but I need to
think about it some more. :)
Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-24 14:41 ` Christoph Lameter
@ 2014-10-27 7:58 ` Joonsoo Kim
2014-10-27 13:53 ` Christoph Lameter
0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2014-10-27 7:58 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Fri, Oct 24, 2014 at 09:41:49AM -0500, Christoph Lameter wrote:
> > I found that you said retrieving tid first is sufficient to do
> > things right in old discussion. :)
>
> Right but the tid can be obtained from a different processor.
>
>
> One other aspect of this patchset is that it reduces the cache footprint
> of the alloc and free functions. This typically results in a performance
> increase for the allocator. If we can avoid the page_address() and
> virt_to_head_page() stuff that is required because we drop the ->page
> field in a sufficient number of places then this may be a benefit that
> goes beyond the RT and CONFIG_PREEMPT case.
Yeah... if we can avoid those function calls, it would be good.
But the current struct kmem_cache_cpu occupies just 32 bytes on a 64 bit
machine, and that means just 1 cacheline. Reducing the size of the struct
may have no remarkable performance benefit in this case.
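For reference, a rough sketch of the layout being discussed (sizes assume a
64 bit kernel without CONFIG_SLUB_STATS; the byte annotations are mine):

struct kmem_cache_cpu {
        void **freelist;        /* 8 bytes: pointer to next available object */
        unsigned long tid;      /* 8 bytes: globally unique transaction id */
        struct page *page;      /* 8 bytes: removed by patch 3 of this series */
        struct page *partial;   /* 8 bytes: partially allocated frozen slabs */
};                              /* 32 bytes total, 24 once ->page is dropped */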
Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)
2014-10-27 7:58 ` Joonsoo Kim
@ 2014-10-27 13:53 ` Christoph Lameter
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Lameter @ 2014-10-27 13:53 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, rostedt, linux-kernel, Thomas Gleixner, linux-mm, penberg,
iamjoonsoo
On Mon, 27 Oct 2014, Joonsoo Kim wrote:
> > One other aspect of this patchset is that it reduces the cache footprint
> > of the alloc and free functions. This typically results in a performance
> > increase for the allocator. If we can avoid the page_address() and
> > virt_to_head_page() stuff that is required because we drop the ->page
> > field in a sufficient number of places then this may be a benefit that
> > goes beyond the RT and CONFIG_PREEMPT case.
>
> Yeah... if we can avoid those function calls, it would be good.
One trick that may be possible is to have an address mask for the
page_address. If a pointer satisfies the mask requirements then it's on
the right page and we do not need to do virt_to_head_page.
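Something like the following, purely as a sketch (it assumes, for
illustration, a slab mapping aligned to its compound order, which is not
generally guaranteed):

/* Hypothetical: check whether 'object' lies on the expected slab without
 * calling virt_to_head_page(), given a precomputed base and mask. */
static inline bool object_on_slab(const void *object,
                                  unsigned long slab_base,
                                  unsigned long slab_mask)
{
        /* slab_mask would be something like ~((PAGE_SIZE << order) - 1) */
        return ((unsigned long)object & slab_mask) == slab_base;
}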
> But the current struct kmem_cache_cpu occupies just 32 bytes on a 64 bit
> machine, and that means just 1 cacheline. Reducing the size of the struct
> may have no remarkable performance benefit in this case.
Hmmm... If we also drop the partial field then a 64 byte cacheline would
fit kmem_cache_cpu structs from 4 caches. If we place them correctly then
the frequently used caches could avoid fetching up to 3 cachelines.
You are right that just dropping ->page won't do anything since the
kmem_cache_cpu struct is aligned to a double word boundary.
^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
2014-10-22 15:55 [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Christoph Lameter
2014-10-22 15:55 ` [RFC 1/4] slub: Remove __slab_alloc code duplication Christoph Lameter
2014-10-22 18:04 ` Thomas Gleixner
2014-10-22 18:27 ` Christoph Lameter
2014-10-22 15:55 ` [RFC 2/4] slub: Use end_token instead of NULL to terminate freelists Christoph Lameter
2014-10-22 15:55 ` [RFC 3/4] slub: Drop ->page field from kmem_cache_cpu Christoph Lameter
2014-10-22 15:55 ` [RFC 4/4] slub: Remove preemption disable/enable from fastpath Christoph Lameter
2014-10-23 8:09 ` [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT) Joonsoo Kim
2014-10-23 14:18 ` Christoph Lameter
2014-10-24 4:56 ` Joonsoo Kim
2014-10-24 14:02 ` Christoph Lameter
2014-10-27 7:54 ` Joonsoo Kim
2014-10-24 14:41 ` Christoph Lameter
2014-10-27 7:58 ` Joonsoo Kim
2014-10-27 13:53 ` Christoph Lameter