* [PATCH RT 5/5] allow preemption in slab_alloc_node and slab_free
From: Nicholas Mc Guire @ 2014-02-10 15:40 UTC
  To: linux-rt-users
  Cc: LKML, Sebastian Andrzej Siewior, Steven Rostedt, Peter Zijlstra,
	Carsten Emde, Thomas Gleixner, Andreas Platschek


drop preempt_disable/enable in slab_alloc_node and slab_free

__slab_alloc is only called from slub.c:slab_alloc_node; it runs with
local irqs disabled so it can't be pushed off this CPU asynchronously,
and the preempt_disable/enable is thus not needed. Aside from that, the
later this_cpu_cmpxchg_double would catch such a migration event anyway.
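
For illustration, a condensed sketch of what the slab_alloc_node
fastpath looks like with this patch applied (simplified from the slub.c
of this era; statistics, debugging and error annotations are elided, so
this is not the literal kernel code):

    redo:
            c = __this_cpu_ptr(s->cpu_slab);
            tid = c->tid;

            object = c->freelist;
            page = c->page;

            if (unlikely(!object || !node_match(page, node))) {
                    /* slowpath: refill from partial lists / page allocator */
                    object = __slab_alloc(s, gfpflags, node, addr, c);
            } else {
                    /*
                     * If we were migrated after fetching c, tid no longer
                     * matches the current CPU's c->tid, the cmpxchg fails
                     * and we loop back to redo.
                     */
                    if (unlikely(!this_cpu_cmpxchg_double(
                                    s->cpu_slab->freelist, s->cpu_slab->tid,
                                    object, tid,
                                    get_freepointer_safe(s, object),
                                    next_tid(tid))))
                            goto redo;
            }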

slab_free:
 slowpath: if the allocation was on a different CPU this is detected by
  the (page == c->page) check failing (c points to the per-CPU slab).
  This path does not need a consistent reference to tid, so the slow
  path is safe without the preempt_disable/enable.
 fastpath: if the allocation was on the same CPU but we are migrated
  between fetching the cpu_slab and the actual push onto the freelist,
  this_cpu_cmpxchg_double catches the stale tid and loops via redo. So
  the fast path is also safe without the preempt_disable/enable. Both
  cases are sketched below.
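
Condensed, the two slab_free cases read like this with the patch
applied (again simplified from slub.c, with debugging and statistics
elided):

    redo:
            c = __this_cpu_ptr(s->cpu_slab);
            tid = c->tid;

            if (likely(page == c->page)) {
                    /* fastpath: push the object onto this CPU's freelist */
                    set_freepointer(s, object, c->freelist);
                    if (unlikely(!this_cpu_cmpxchg_double(
                                    s->cpu_slab->freelist, s->cpu_slab->tid,
                                    c->freelist, tid,
                                    object, next_tid(tid))))
                            goto redo;      /* raced or migrated: retry */
            } else {
                    /* slowpath: object belongs to another CPU's slab */
                    __slab_free(s, page, object, addr);
            }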

Testing:
 while : ; do ./hackbench 120 thread 10000 ; done 
Time: 296.631
Time: 298.723
Time: 301.468
Time: 303.880
Time: 301.988
Time: 300.038
Time: 299.634
Time: 301.488
 which seems to be a good way to stress-test slub

Impact on performance:
 The change could negatively impact performance if removing the
 preempt_disable/enable resulted in the slow path being taken
 significantly more often, or in more looping via goto redo. This was
 checked in two ways:
 static instrumentation:
  instrumentation was added to count how often the redo loop is taken
  (a minimal sketch of such a counter follows below). The results showed
  that the redo loop is taken very rarely (< 1 in 10000) and less often
  than with the preempt_disable/enable present. Further, the
  slowpath-to-fastpath ratio improves slightly (though it is not clear
  that this is statistically significant).
 running slab_test.c:
  the slub benchmark from Christoph Lameter and Mathieu Desnoyers was
  used, the only change being that asm/system.h was dropped from the
  list of includes. The results indicate that removing the
  preempt_disable/enable slightly reduces the cycles needed (though
  quite a few test systems would need to be checked before this can be
  confirmed).
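
The instrumentation itself is not part of this patch; a minimal sketch
of the kind of counter described above could look like this
(hypothetical names, assuming a per-CPU counter that is read out after
the test run):

    /* hypothetical instrumentation sketch, not part of this patch */
    static DEFINE_PER_CPU(unsigned long, slub_redo_count);

            /* in the fastpath, on this_cpu_cmpxchg_double failure: */
            this_cpu_inc(slub_redo_count); /* count one trip through redo */
            goto redo;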

Tested-by: Andreas Platschek <platschek@ict.tuwien.ac.at>
Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
---
 mm/slub.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 546bd9a..c422988 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2424,7 +2424,6 @@ redo:
 	 * on a different processor between the determination of the pointer
 	 * and the retrieval of the tid.
 	 */
-	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	/*
@@ -2434,7 +2433,6 @@ redo:
 	 * linked list in between.
 	 */
 	tid = c->tid;
-	preempt_enable();
 
 	object = c->freelist;
 	page = c->page;
@@ -2683,11 +2681,9 @@ redo:
 	 * data is retrieved via this pointer. If we are on the same cpu
 	 * during the cmpxchg then the free will succedd.
 	 */
-	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	tid = c->tid;
-	preempt_enable();
 
 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);
-- 
1.7.2.5



* Re: [PATCH RT 5/5] allow preemption in slab_alloc_node and slab_free
From: Sebastian Andrzej Siewior @ 2014-02-14 14:07 UTC
  To: Nicholas Mc Guire
  Cc: linux-rt-users, LKML, Steven Rostedt, Peter Zijlstra,
	Carsten Emde, Thomas Gleixner, Andreas Platschek

* Nicholas Mc Guire | 2014-02-10 16:40:16 [+0100]:

>__slab_alloc is only called from slub.c:slab_alloc_node; it runs with
>local irqs disabled so it can't be pushed off this CPU asynchronously,
>and the preempt_disable/enable is thus not needed. Aside from that, the
>later this_cpu_cmpxchg_double would catch such a migration event anyway.

Not sure what to do with this one. You do write a longer explanation of
why it is okay to drop the preemption-disabled section and why
this_cpu_cmpxchg_double() would catch a migration. And I haven't
figured out so far why we need to keep preemption disabled while
looking at c->tid but not at c->page.
However, it seems that Christoph Lameter found it important to add a
note in the comment that this preemption disable here is important.
Looking at commit 7cccd80 ("slub: tid must be retrieved from the percpu
area of the current processor") it seems that Steven Rostedt ran into
trouble, which is why we now have that preempt_disable() here.

So if you really get better performance and you haven't seen anything
bad happen, then you might want to check with Lameter & Rostedt about
your patch and about getting it merged upstream.
The commit I mentioned has been upstream since v3.11-rc1 and I can see
it in the v3.8-RT tree, so it looks serious.
I fail to see it in v3.2-RT. Steven, isn't this something we want
there, too?

Sebastian
