From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
Pekka Enberg <penberg@cs.helsinki.fi>, Tejun Heo <tj@kernel.org>,
Mel Gorman <mel@csn.ul.ie>,
mingo@elte.hu
Subject: Re: [this_cpu_xx V5 19/19] SLUB: Experimental new fastpath w/o interrupt disable
Date: Thu, 8 Oct 2009 16:37:24 -0400 [thread overview]
Message-ID: <20091008203724.GA15798@Krystal> (raw)
In-Reply-To: <alpine.DEB.1.10.0910081520210.8030@gentwo.org>
* Christoph Lameter (cl@linux-foundation.org) wrote:
> On Thu, 8 Oct 2009, Mathieu Desnoyers wrote:
>
> > OK, sounds good,
>
> Then here the full patch for review (vs. percpu-next):
>
>
> From: Christoph Lameter <cl@linux-foundation.org>
> Subject: SLUB: Experimental new fastpath w/o interrupt disable
>
> This is a bit of a different tack on things than the last version provided
> by Mathieu.
>
> Instead of using a cmpxchg we keep a state variable in the per cpu structure
> that is incremented when we enter the hot path. We can then detect that
> a thread is in the fastpath and fall back to alternate allocation / free
> technique that bypasses fastpath caching.
>
> V1->V2:
> - Use safe preempt_enable / disable.
> - Enable preempt before calling into the page allocator
> - Checkup on hotpath activity changes when returning from page allocator.
> - Add barriers.
>
> Todo:
> - Verify that this is really safe
> - Is this a benefit?
>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
>
>
> ---
> include/linux/slub_def.h | 1
> mm/slub.c | 112 +++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 96 insertions(+), 17 deletions(-)
>
> Index: linux-2.6/include/linux/slub_def.h
> ===================================================================
> --- linux-2.6.orig/include/linux/slub_def.h 2009-10-08 11:35:59.000000000 -0500
> +++ linux-2.6/include/linux/slub_def.h 2009-10-08 11:35:59.000000000 -0500
> @@ -38,6 +38,7 @@ struct kmem_cache_cpu {
> void **freelist; /* Pointer to first free per cpu object */
> struct page *page; /* The slab from which we are allocating */
> int node; /* The node of the page (or -1 for debug) */
> + int active; /* Active fastpaths */
> #ifdef CONFIG_SLUB_STATS
> unsigned stat[NR_SLUB_STAT_ITEMS];
> #endif
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2009-10-08 11:35:59.000000000 -0500
> +++ linux-2.6/mm/slub.c 2009-10-08 14:03:22.000000000 -0500
> @@ -1606,7 +1606,14 @@ static void *__slab_alloc(struct kmem_ca
> unsigned long addr)
> {
> void **object;
> - struct page *page = __this_cpu_read(s->cpu_slab->page);
> + struct page *page;
> + unsigned long flags;
> + int hotpath;
> +
> + local_irq_save(flags);
(Recommend adding)
preempt_enable_no_resched();
The preempt enable right in the middle of a big function is adding an
unnecessary barrier(), which will restrain gcc from doing its
optimizations. This might hurt performances.
I still recommend the preempt_enable_no_resched() at the beginning of
__slab_alloc(), and simply putting a check_resched() here (which saves
us the odd compiler barrier in the middle of function).
> + hotpath = __this_cpu_read(s->cpu_slab->active) != 0;
> + __this_cpu_dec(s->cpu_slab->active); /* Drop count from hotpath */
> + page = __this_cpu_read(s->cpu_slab->page);
>
> /* We handle __GFP_ZERO in the caller */
> gfpflags &= ~__GFP_ZERO;
> @@ -1627,12 +1634,24 @@ load_freelist:
> if (unlikely(SLABDEBUG && PageSlubDebug(page)))
> goto debug;
>
> - __this_cpu_write(s->cpu_slab->node, page_to_nid(page));
> - __this_cpu_write(s->cpu_slab->freelist, get_freepointer(s, object));
> - page->inuse = page->objects;
> - page->freelist = NULL;
> + if (unlikely(hotpath)) {
> + /* Object on page free list available and hotpath busy */
> + page->inuse++;
> + page->freelist = get_freepointer(s, object);
> +
> + } else {
> +
> + /* Prepare new list of objects for hotpath */
> + __this_cpu_write(s->cpu_slab->freelist, get_freepointer(s, object));
> + page->inuse = page->objects;
> + page->freelist = NULL;
> + __this_cpu_write(s->cpu_slab->node, page_to_nid(page));
> +
> + }
> +
> unlock_out:
> slab_unlock(page);
> + local_irq_restore(flags);
> stat(s, ALLOC_SLOWPATH);
> return object;
>
> @@ -1642,21 +1661,38 @@ another_slab:
> new_slab:
> page = get_partial(s, gfpflags, node);
> if (page) {
> - __this_cpu_write(s->cpu_slab->page, page);
> stat(s, ALLOC_FROM_PARTIAL);
> +
> + if (hotpath)
> + goto hot_lock;
> +
> + __this_cpu_write(s->cpu_slab->page, page);
> goto load_freelist;
> }
>
> if (gfpflags & __GFP_WAIT)
> local_irq_enable();
>
> + preempt_enable();
We could replace the above by:
if (gfpflags & __GFP_WAIT) {
local_irq_enable();
preempt_check_resched();
}
> page = new_slab(s, gfpflags, node);
> + preempt_disable();
> +
(remove the above)
> + /*
> + * We have already decremented our count. Someone else
> + * could be running right now or we were moved to a
> + * processor that is in the hotpath. So check against -1.
> + */
> + hotpath = __this_cpu_read(s->cpu_slab->active) != -1;
>
> if (gfpflags & __GFP_WAIT)
> local_irq_disable();
>
> if (page) {
> stat(s, ALLOC_SLAB);
> +
> + if (hotpath)
> + goto hot_no_lock;
> +
> if (__this_cpu_read(s->cpu_slab->page))
> flush_slab(s, __this_cpu_ptr(s->cpu_slab));
> slab_lock(page);
> @@ -1664,9 +1700,13 @@ new_slab:
> __this_cpu_write(s->cpu_slab->page, page);
> goto load_freelist;
> }
> +
> + local_irq_restore(flags);
> +
> if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
> slab_out_of_memory(s, gfpflags, node);
> return NULL;
> +
> debug:
> if (!alloc_debug_processing(s, page, object, addr))
> goto another_slab;
> @@ -1675,6 +1715,20 @@ debug:
> page->freelist = get_freepointer(s, object);
> __this_cpu_write(s->cpu_slab->node, -1);
> goto unlock_out;
> +
> + /*
> + * Hotpath is busy and we need to avoid touching
> + * hotpath variables
> + */
> +hot_no_lock:
> + slab_lock(page);
> +
> +hot_lock:
> + __ClearPageSlubFrozen(page);
> + if (get_freepointer(s, page->freelist))
> + /* Cannot put page into the hotpath. Instead back to partial */
> + add_partial(get_node(s, page_to_nid(page)), page, 0);
> + goto load_freelist;
> }
>
> /*
> @@ -1691,7 +1745,6 @@ static __always_inline void *slab_alloc(
> gfp_t gfpflags, int node, unsigned long addr)
> {
> void **object;
> - unsigned long flags;
>
> gfpflags &= gfp_allowed_mask;
>
> @@ -1701,18 +1754,27 @@ static __always_inline void *slab_alloc(
> if (should_failslab(s->objsize, gfpflags))
> return NULL;
>
> - local_irq_save(flags);
> + preempt_disable();
> +
> + irqsafe_cpu_inc(s->cpu_slab->active);
> + barrier();
> object = __this_cpu_read(s->cpu_slab->freelist);
> - if (unlikely(!object || !node_match(s, node)))
> + if (unlikely(!object || !node_match(s, node) ||
> + __this_cpu_read(s->cpu_slab->active)))
Missing a barrier() here ?
The idea is to let gcc know that "active" inc/dec and "freelist" reads
must never be reordered. Even when the decrement is done in the slow
path branch.
>
> object = __slab_alloc(s, gfpflags, node, addr);
>
> else {
> +
> __this_cpu_write(s->cpu_slab->freelist,
> get_freepointer(s, object));
> + barrier();
> + irqsafe_cpu_dec(s->cpu_slab->active);
> stat(s, ALLOC_FASTPATH);
> +
> }
> - local_irq_restore(flags);
> +
> + preempt_enable();
Could move the preempt_enable() above to the else (fast path) branch.
>
> if (unlikely((gfpflags & __GFP_ZERO) && object))
> memset(object, 0, s->objsize);
> @@ -1777,7 +1839,9 @@ static void __slab_free(struct kmem_cach
> {
> void *prior;
> void **object = (void *)x;
> + unsigned long flags;
>
> + local_irq_save(flags);
> stat(s, FREE_SLOWPATH);
> slab_lock(page);
>
> @@ -1809,6 +1873,7 @@ checks_ok:
>
> out_unlock:
> slab_unlock(page);
> + local_irq_restore(flags);
> return;
>
> slab_empty:
> @@ -1820,6 +1885,7 @@ slab_empty:
> stat(s, FREE_REMOVE_PARTIAL);
> }
> slab_unlock(page);
> + local_irq_restore(flags);
> stat(s, FREE_SLAB);
> discard_slab(s, page);
> return;
> @@ -1845,24 +1911,31 @@ static __always_inline void slab_free(st
> struct page *page, void *x, unsigned long addr)
> {
> void **object = (void *)x;
> - unsigned long flags;
>
> kmemleak_free_recursive(x, s->flags);
> - local_irq_save(flags);
> + preempt_disable();
> kmemcheck_slab_free(s, object, s->objsize);
> debug_check_no_locks_freed(object, s->objsize);
> if (!(s->flags & SLAB_DEBUG_OBJECTS))
> debug_check_no_obj_freed(object, s->objsize);
>
> + irqsafe_cpu_inc(s->cpu_slab->active);
> + barrier();
> if (likely(page == __this_cpu_read(s->cpu_slab->page) &&
> - __this_cpu_read(s->cpu_slab->node) >= 0)) {
> - set_freepointer(s, object, __this_cpu_read(s->cpu_slab->freelist));
> + __this_cpu_read(s->cpu_slab->node) >= 0) &&
> + !__this_cpu_read(s->cpu_slab->active)) {
> + set_freepointer(s, object,
> + __this_cpu_read(s->cpu_slab->freelist));
> __this_cpu_write(s->cpu_slab->freelist, object);
> + barrier();
> + irqsafe_cpu_dec(s->cpu_slab->active);
> + preempt_enable();
> stat(s, FREE_FASTPATH);
> - } else
> + } else {
Perhaps missing a barrier() in the else ?
Thanks,
Mathieu
> + irqsafe_cpu_dec(s->cpu_slab->active);
> + preempt_enable();
> __slab_free(s, page, x, addr);
> -
> - local_irq_restore(flags);
> + }
> }
>
> void kmem_cache_free(struct kmem_cache *s, void *x)
> @@ -2064,6 +2137,8 @@ static DEFINE_PER_CPU(struct kmem_cache_
>
> static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
> {
> + int cpu;
> +
> if (s < kmalloc_caches + KMALLOC_CACHES && s >= kmalloc_caches)
> /*
> * Boot time creation of the kmalloc array. Use static per cpu data
> @@ -2073,6 +2148,9 @@ static inline int alloc_kmem_cache_cpus(
> else
> s->cpu_slab = alloc_percpu(struct kmem_cache_cpu);
>
> + for_each_possible_cpu(cpu)
> + per_cpu_ptr(s->cpu_slab, cpu)->active = -1;
> +
> if (!s->cpu_slab)
> return 0;
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2009-10-08 20:38 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-06 23:36 [this_cpu_xx V5 00/19] Introduce per cpu atomic operations and avoid per cpu address arithmetic cl
2009-10-06 23:36 ` [this_cpu_xx V5 01/19] Introduce this_cpu_ptr() and generic this_cpu_* operations cl
2009-10-06 23:52 ` Tejun Heo
2009-10-07 14:23 ` Christoph Lameter
2009-10-07 15:29 ` Tejun Heo
2009-10-06 23:36 ` [this_cpu_xx V5 02/19] this_cpu: X86 optimized this_cpu operations cl
2009-10-06 23:36 ` [this_cpu_xx V5 03/19] Use this_cpu operations for SNMP statistics cl
2009-10-06 23:36 ` [this_cpu_xx V5 04/19] Use this_cpu operations for NFS statistics cl
2009-10-06 23:36 ` [this_cpu_xx V5 05/19] use this_cpu ops for network statistics cl
2009-10-06 23:37 ` [this_cpu_xx V5 06/19] this_cpu_ptr: Straight transformations cl
2009-10-06 23:37 ` [this_cpu_xx V5 07/19] this_cpu_ptr: Eliminate get/put_cpu cl
2009-10-06 23:37 ` [this_cpu_xx V5 09/19] Use this_cpu_ptr in crypto subsystem cl
2009-10-06 23:37 ` [this_cpu_xx V5 10/19] Use this_cpu ops for VM statistics cl
2009-10-06 23:37 ` [this_cpu_xx V5 11/19] RCU: Use this_cpu operations cl
2009-10-06 23:37 ` [this_cpu_xx V5 12/19] this_cpu_ops: page allocator conversion cl
2009-10-06 23:37 ` [this_cpu_xx V5 13/19] this_cpu ops: Remove pageset_notifier cl
2009-10-06 23:37 ` [this_cpu_xx V5 14/19] Use this_cpu operations in slub cl
2009-10-06 23:37 ` [this_cpu_xx V5 15/19] SLUB: Get rid of dynamic DMA kmalloc cache allocation cl
2009-10-06 23:37 ` [this_cpu_xx V5 16/19] this_cpu: Remove slub kmem_cache fields cl
2009-10-06 23:37 ` [this_cpu_xx V5 17/19] Make slub statistics use this_cpu_inc cl
2009-10-06 23:37 ` [this_cpu_xx V5 18/19] this_cpu: slub aggressive use of this_cpu operations in the hotpaths cl
2009-10-06 23:37 ` [this_cpu_xx V5 19/19] SLUB: Experimental new fastpath w/o interrupt disable cl
2009-10-07 2:54 ` Mathieu Desnoyers
2009-10-07 9:11 ` Peter Zijlstra
2009-10-07 12:46 ` Mathieu Desnoyers
2009-10-07 13:01 ` Peter Zijlstra
2009-10-07 13:31 ` Mathieu Desnoyers
2009-10-07 14:37 ` Peter Zijlstra
2009-10-07 14:21 ` Christoph Lameter
2009-10-07 14:42 ` Christoph Lameter
2009-10-07 15:02 ` Mathieu Desnoyers
2009-10-07 15:05 ` Christoph Lameter
2009-10-07 15:19 ` Mathieu Desnoyers
2009-10-07 15:21 ` Christoph Lameter
2009-10-07 15:41 ` Mathieu Desnoyers
2009-10-07 16:42 ` Christoph Lameter
2009-10-07 17:12 ` Mathieu Desnoyers
2009-10-08 7:52 ` Peter Zijlstra
2009-10-08 12:44 ` Mathieu Desnoyers
2009-10-08 12:53 ` Peter Zijlstra
2009-10-08 16:17 ` Christoph Lameter
2009-10-08 17:22 ` Mathieu Desnoyers
2009-10-08 16:11 ` Christoph Lameter
2009-10-08 17:17 ` Mathieu Desnoyers
2009-10-08 17:44 ` Christoph Lameter
2009-10-08 19:17 ` Mathieu Desnoyers
2009-10-08 19:21 ` Christoph Lameter
2009-10-08 20:37 ` Mathieu Desnoyers [this message]
2009-10-08 21:08 ` Christoph Lameter
2009-10-12 13:56 ` Mathieu Desnoyers
2009-10-12 14:52 ` Christoph Lameter
2009-10-12 15:26 ` Mathieu Desnoyers
2009-10-12 15:23 ` Christoph Lameter
2009-10-12 15:38 ` Mathieu Desnoyers
2009-10-12 15:38 ` Christoph Lameter
2009-10-12 16:05 ` Mathieu Desnoyers
2009-10-07 15:25 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091008203724.GA15798@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=penberg@cs.helsinki.fi \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.