linux-mm.kvack.org archive mirror
* [PATCH 1/4] slub: change declare of get_slab() to inline at all times
       [not found] <yes>
@ 2012-06-08 17:23 ` Joonsoo Kim
  2012-06-08 17:23   ` [PATCH 2/4] slub: use __cmpxchg_double_slab() at interrupt disabled place Joonsoo Kim
                     ` (3 more replies)
  2012-06-22 18:22 ` [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  1 sibling, 4 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-08 17:23 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

__kmalloc and its variants are invoked very frequently
and are performance-critical functions,
so their callee functions are declared '__always_inline'.
But get_slab() currently isn't declared '__always_inline',
so __kmalloc and its variants end up calling get_slab() out of line on x86.
That is not a desirable result, so change get_slab() to be inlined at all times.
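
For illustration, a minimal standalone sketch of the difference (hypothetical
userspace code, not taken from slub.c; the helper name and formula only mirror
size_index_elem() shown in the hunk below). With plain 'static', gcc at -O2 may
keep such a helper out of line; '__always_inline' forces the call to be folded
into every caller:

#include <stdio.h>

#define __always_inline inline __attribute__((always_inline))

/* stand-in for get_slab()'s size -> index mapping */
static __always_inline int size_index_elem(size_t bytes)
{
        return (bytes - 1) / 8;
}

int main(void)
{
        /* Inspecting the generated code (e.g. with objdump -d) should show
         * no call instruction for size_index_elem() in main(). */
        printf("index element for 100 bytes: %d\n", size_index_elem(100));
        return 0;
}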

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index 71de9b5..30ceb6d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3320,7 +3320,7 @@ static inline int size_index_elem(size_t bytes)
 	return (bytes - 1) / 8;
 }
 
-static struct kmem_cache *get_slab(size_t size, gfp_t flags)
+static __always_inline struct kmem_cache *get_slab(size_t size, gfp_t flags)
 {
 	int index;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/4] slub: use __cmpxchg_double_slab() at interrupt disabled place
  2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
@ 2012-06-08 17:23   ` Joonsoo Kim
  2012-06-08 17:23   ` [PATCH 3/4] slub: refactoring unfreeze_partials() Joonsoo Kim
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-08 17:23 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

get_freelist() and unfreeze_partials() are only called with interrupts
disabled, so __cmpxchg_double_slab(), which assumes interrupts are already
disabled, is suitable there.
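
The split mirrors the common 'caller already holds the protection' convention.
A small hypothetical userspace analogue (a pthread mutex standing in for
disabled interrupts; not kernel code):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Double-underscore variant: the caller must already hold 'lock',
 * so the redundant lock/unlock is skipped -- analogous to
 * __cmpxchg_double_slab() assuming interrupts are already disabled. */
static void __counter_add(long v)
{
        counter += v;
}

/* Plain variant: safe to call from anywhere, protects itself. */
static void counter_add(long v)
{
        pthread_mutex_lock(&lock);
        __counter_add(v);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        counter_add(1);

        pthread_mutex_lock(&lock);
        __counter_add(2);       /* already inside the critical section */
        pthread_mutex_unlock(&lock);

        printf("counter = %ld\n", counter);
        return 0;
}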

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index 30ceb6d..686ed90 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1879,7 +1879,11 @@ redo:
 	}
 }
 
-/* Unfreeze all the cpu partial slabs */
+/*
+ * Unfreeze all the cpu partial slabs.
+ *
+ * This function must be called with interrupt disabled.
+ */
 static void unfreeze_partials(struct kmem_cache *s)
 {
 	struct kmem_cache_node *n = NULL;
@@ -1935,7 +1939,7 @@ static void unfreeze_partials(struct kmem_cache *s)
 				l = m;
 			}
 
-		} while (!cmpxchg_double_slab(s, page,
+		} while (!__cmpxchg_double_slab(s, page,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
 				"unfreezing slab"));
@@ -2163,6 +2167,8 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
  * The page is still frozen if the return value is not NULL.
  *
  * If this function returns NULL then the page has been unfrozen.
+ *
+ * This function must be called with interrupt disabled.
  */
 static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 {
@@ -2179,7 +2185,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
 		new.inuse = page->objects;
 		new.frozen = freelist != NULL;
 
-	} while (!cmpxchg_double_slab(s, page,
+	} while (!__cmpxchg_double_slab(s, page,
 		freelist, counters,
 		NULL, new.counters,
 		"get_freelist"));
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/4] slub: refactoring unfreeze_partials()
  2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
  2012-06-08 17:23   ` [PATCH 2/4] slub: use __cmpxchg_double_slab() at interrupt disabled place Joonsoo Kim
@ 2012-06-08 17:23   ` Joonsoo Kim
  2012-06-20  7:19     ` Pekka Enberg
  2012-06-08 17:23   ` [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab() Joonsoo Kim
  2012-06-08 19:02   ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Christoph Lameter
  3 siblings, 1 reply; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-08 17:23 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

The current implementation of unfreeze_partials() is quite complicated,
but the benefit from that complexity is insignificant. In addition, the
large amount of code inside the do {} while loop hurts the success rate of
cmpxchg_double_slab(). Because the current implementation tests the status
of the cpu partial slab and acquires the list_lock inside the do {} while
loop, we avoid taking the list_lock and gain a little benefit only when the
slab at the front of the cpu partial list is to be discarded, but that is a
rare case. And if add_partial() has been performed and cmpxchg_double_slab()
then fails, remove_partial() has to be called to undo it, depending on the
case.

I think these are disadvantages of the current implementation,
so this patch refactors unfreeze_partials().

Minimizing the code inside the do {} while loop reduces the failure rate
of cmpxchg_double_slab(). Below is the output of 'slabinfo -r kmalloc-256'
after running './perf stat -r 33 hackbench 50 process 4000 > /dev/null'.

** before **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos   182685
Unlocked Cmpxchg Double redos 0

** after **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos   177995
Unlocked Cmpxchg Double redos 1

The cmpxchg_double_slab() failure rate is slightly improved.

Below is the output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.

** before **
 Performance counter stats for './hackbench 50 process 4000' (30 runs):

     108517.190463 task-clock                #    7.926 CPUs utilized            ( +-  0.24% )
         2,919,550 context-switches          #    0.027 M/sec                    ( +-  3.07% )
           100,774 CPU-migrations            #    0.929 K/sec                    ( +-  4.72% )
           124,201 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   401,500,234,387 cycles                    #    3.700 GHz                      ( +-  0.24% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   250,576,913,354 instructions              #    0.62  insns per cycle          ( +-  0.13% )
    45,934,956,860 branches                  #  423.297 M/sec                    ( +-  0.14% )
       188,219,787 branch-misses             #    0.41% of all branches          ( +-  0.56% )

      13.691837307 seconds time elapsed                                          ( +-  0.24% )

** after **
 Performance counter stats for './hackbench 50 process 4000' (30 runs):

     107784.479767 task-clock                #    7.928 CPUs utilized            ( +-  0.22% )
         2,834,781 context-switches          #    0.026 M/sec                    ( +-  2.33% )
            93,083 CPU-migrations            #    0.864 K/sec                    ( +-  3.45% )
           123,967 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   398,781,421,836 cycles                    #    3.700 GHz                      ( +-  0.22% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   250,189,160,419 instructions              #    0.63  insns per cycle          ( +-  0.09% )
    45,855,370,128 branches                  #  425.436 M/sec                    ( +-  0.10% )
       169,881,248 branch-misses             #    0.37% of all branches          ( +-  0.43% )

      13.596272341 seconds time elapsed                                          ( +-  0.22% )

No regression is found; rather, we see a slightly better result.
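
The principle behind the refactoring can be shown in isolation: anything that
does not depend on the freshly loaded old value is hoisted out of the retry
loop, so the window between the load and the compare-and-swap stays small and
retries stay rare. A self-contained sketch using C11 atomics (hypothetical
code, not the kernel's cmpxchg_double_slab()):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic long counter;
static _Atomic long nretries;

static void add_scaled(long v, long scale)
{
        long delta = v * scale;         /* loop-invariant work, hoisted out */
        long old = atomic_load(&counter);

        /* minimal loop body: on failure, 'old' is refreshed by the CAS */
        while (!atomic_compare_exchange_weak(&counter, &old, old + delta))
                atomic_fetch_add(&nretries, 1);
}

static void *worker(void *arg)
{
        (void)arg;
        for (int i = 0; i < 100000; i++)
                add_scaled(1, 3);
        return NULL;
}

int main(void)
{
        pthread_t t[4];

        for (int i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
                pthread_join(t[i], NULL);

        printf("counter=%ld retries=%ld\n",
               atomic_load(&counter), atomic_load(&nretries));
        return 0;
}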

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index 686ed90..b5f2108 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1886,18 +1886,24 @@ redo:
  */
 static void unfreeze_partials(struct kmem_cache *s)
 {
-	struct kmem_cache_node *n = NULL;
+	struct kmem_cache_node *n = NULL, *n2 = NULL;
 	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
 	struct page *page, *discard_page = NULL;
 
 	while ((page = c->partial)) {
-		enum slab_modes { M_PARTIAL, M_FREE };
-		enum slab_modes l, m;
 		struct page new;
 		struct page old;
 
 		c->partial = page->next;
-		l = M_FREE;
+
+		n2 = get_node(s, page_to_nid(page));
+		if (n != n2) {
+			if (n)
+				spin_unlock(&n->list_lock);
+
+			n = n2;
+			spin_lock(&n->list_lock);
+		}
 
 		do {
 
@@ -1910,43 +1916,17 @@ static void unfreeze_partials(struct kmem_cache *s)
 
 			new.frozen = 0;
 
-			if (!new.inuse && (!n || n->nr_partial > s->min_partial))
-				m = M_FREE;
-			else {
-				struct kmem_cache_node *n2 = get_node(s,
-							page_to_nid(page));
-
-				m = M_PARTIAL;
-				if (n != n2) {
-					if (n)
-						spin_unlock(&n->list_lock);
-
-					n = n2;
-					spin_lock(&n->list_lock);
-				}
-			}
-
-			if (l != m) {
-				if (l == M_PARTIAL) {
-					remove_partial(n, page);
-					stat(s, FREE_REMOVE_PARTIAL);
-				} else {
-					add_partial(n, page,
-						DEACTIVATE_TO_TAIL);
-					stat(s, FREE_ADD_PARTIAL);
-				}
-
-				l = m;
-			}
-
 		} while (!__cmpxchg_double_slab(s, page,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
 				"unfreezing slab"));
 
-		if (m == M_FREE) {
+		if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
 			page->next = discard_page;
 			discard_page = page;
+		} else {
+			add_partial(n, page, DEACTIVATE_TO_TAIL);
+			stat(s, FREE_ADD_PARTIAL);
 		}
 	}
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab()
  2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
  2012-06-08 17:23   ` [PATCH 2/4] slub: use __cmpxchg_double_slab() at interrupt disabled place Joonsoo Kim
  2012-06-08 17:23   ` [PATCH 3/4] slub: refactoring unfreeze_partials() Joonsoo Kim
@ 2012-06-08 17:23   ` Joonsoo Kim
  2012-06-08 19:04     ` Christoph Lameter
  2012-06-08 19:02   ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Christoph Lameter
  3 siblings, 1 reply; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-08 17:23 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

The current implementation of deactivate_slab(), which deactivates the
freelist of the kmem_cache_cpu one object at a time, is inefficient.
This patch changes it to deactivate the freelist all at once.
There is no measurable overall performance benefit, however,
because deactivate_slab() is invoked infrequently.
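
The 'all at once' idea is an ordinary linked-list splice: walk the per-cpu
freelist once to find its tail and length, then attach the whole chain to the
page freelist with a single pointer update (a single cmpxchg in the kernel
case). A simplified, hypothetical userspace sketch, not the actual
deactivate_slab() code:

#include <stdio.h>

struct obj {
        struct obj *next;
        int val;
};

/* Splice the whole 'src' chain onto '*dst' with one head update,
 * instead of popping and pushing objects one by one. */
static int splice_all_at_once(struct obj **dst, struct obj *src)
{
        struct obj *tail = NULL;
        int nr_free = 0;

        for (struct obj *p = src; p; p = p->next) {     /* count, find tail */
                tail = p;
                nr_free++;
        }
        if (tail) {
                tail->next = *dst;      /* hook the old list behind our chain */
                *dst = src;             /* single update of the list head */
        }
        return nr_free;
}

int main(void)
{
        struct obj a = { NULL, 1 }, b = { &a, 2 }, c = { &b, 3 };
        struct obj *page_freelist = NULL;

        printf("moved %d objects:", splice_all_at_once(&page_freelist, &c));
        for (struct obj *p = page_freelist; p; p = p->next)
                printf(" %d", p->val);
        printf("\n");
        return 0;
}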

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index b5f2108..7bcb434 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1733,16 +1733,14 @@ void init_kmem_cache_cpus(struct kmem_cache *s)
  */
 static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
-	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
 	struct page *page = c->page;
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-	int lock = 0;
-	enum slab_modes l = M_NONE, m = M_NONE;
-	void *freelist;
-	void *nextfree;
-	int tail = DEACTIVATE_TO_HEAD;
+	void *freelist, *lastfree = NULL;
+	unsigned int nr_free = 0;
 	struct page new;
-	struct page old;
+	void *prior;
+	unsigned long counters;
+	int lock = 0, tail = DEACTIVATE_TO_HEAD;
 
 	if (page->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
@@ -1752,127 +1750,54 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 	c->tid = next_tid(c->tid);
 	c->page = NULL;
 	freelist = c->freelist;
-	c->freelist = NULL;
-
-	/*
-	 * Stage one: Free all available per cpu objects back
-	 * to the page freelist while it is still frozen. Leave the
-	 * last one.
-	 *
-	 * There is no need to take the list->lock because the page
-	 * is still frozen.
-	 */
-	while (freelist && (nextfree = get_freepointer(s, freelist))) {
-		void *prior;
-		unsigned long counters;
-
-		do {
-			prior = page->freelist;
-			counters = page->counters;
-			set_freepointer(s, freelist, prior);
-			new.counters = counters;
-			new.inuse--;
-			VM_BUG_ON(!new.frozen);
-
-		} while (!__cmpxchg_double_slab(s, page,
-			prior, counters,
-			freelist, new.counters,
-			"drain percpu freelist"));
-
-		freelist = nextfree;
+	while (freelist) {
+		lastfree = freelist;
+		freelist = get_freepointer(s, freelist);
+		nr_free++;
 	}
 
-	/*
-	 * Stage two: Ensure that the page is unfrozen while the
-	 * list presence reflects the actual number of objects
-	 * during unfreeze.
-	 *
-	 * We setup the list membership and then perform a cmpxchg
-	 * with the count. If there is a mismatch then the page
-	 * is not unfrozen but the page is on the wrong list.
-	 *
-	 * Then we restart the process which may have to remove
-	 * the page from the list that we just put it on again
-	 * because the number of objects in the slab may have
-	 * changed.
-	 */
-redo:
+	freelist = c->freelist;
+	c->freelist = NULL;
 
-	old.freelist = page->freelist;
-	old.counters = page->counters;
-	VM_BUG_ON(!old.frozen);
+	do {
+		if (lock) {
+			lock = 0;
+			spin_unlock(&n->list_lock);
+		}
 
-	/* Determine target state of the slab */
-	new.counters = old.counters;
-	if (freelist) {
-		new.inuse--;
-		set_freepointer(s, freelist, old.freelist);
-		new.freelist = freelist;
-	} else
-		new.freelist = old.freelist;
+		prior = page->freelist;
+		counters = page->counters;
 
-	new.frozen = 0;
+		if (lastfree)
+			set_freepointer(s, lastfree, prior);
+		else
+			freelist = prior;
 
-	if (!new.inuse && n->nr_partial > s->min_partial)
-		m = M_FREE;
-	else if (new.freelist) {
-		m = M_PARTIAL;
-		if (!lock) {
-			lock = 1;
-			/*
-			 * Taking the spinlock removes the possiblity
-			 * that acquire_slab() will see a slab page that
-			 * is frozen
-			 */
-			spin_lock(&n->list_lock);
-		}
-	} else {
-		m = M_FULL;
-		if (kmem_cache_debug(s) && !lock) {
+		new.counters = counters;
+		VM_BUG_ON(!new.frozen);
+		new.inuse -= nr_free;
+		new.frozen = 0;
+
+		if (new.inuse || n->nr_partial <= s->min_partial) {
 			lock = 1;
-			/*
-			 * This also ensures that the scanning of full
-			 * slabs from diagnostic functions will not see
-			 * any frozen slabs.
-			 */
 			spin_lock(&n->list_lock);
 		}
-	}
-
-	if (l != m) {
-
-		if (l == M_PARTIAL)
-
-			remove_partial(n, page);
-
-		else if (l == M_FULL)
-
-			remove_full(s, page);
-
-		if (m == M_PARTIAL) {
 
+	} while (!__cmpxchg_double_slab(s, page,
+				prior, counters,
+				freelist, new.counters,
+				"drain percpu freelist"));
+	if (lock) {
+		if (kmem_cache_debug(s) && !freelist) {
+			add_full(s, n, page);
+			stat(s, DEACTIVATE_FULL);
+		} else {
 			add_partial(n, page, tail);
 			stat(s, tail);
-
-		} else if (m == M_FULL) {
-
-			stat(s, DEACTIVATE_FULL);
-			add_full(s, n, page);
-
 		}
-	}
-
-	l = m;
-	if (!__cmpxchg_double_slab(s, page,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
-				"unfreezing slab"))
-		goto redo;
-
-	if (lock)
 		spin_unlock(&n->list_lock);
-
-	if (m == M_FREE) {
+	} else {
+		VM_BUG_ON(new.inuse);
 		stat(s, DEACTIVATE_EMPTY);
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] slub: change declare of get_slab() to inline at all times
  2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
                     ` (2 preceding siblings ...)
  2012-06-08 17:23   ` [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab() Joonsoo Kim
@ 2012-06-08 19:02   ` Christoph Lameter
  2012-06-09 15:57     ` JoonSoo Kim
  3 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2012-06-08 19:02 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 9 Jun 2012, Joonsoo Kim wrote:

> -static struct kmem_cache *get_slab(size_t size, gfp_t flags)
> +static __always_inline struct kmem_cache *get_slab(size_t size, gfp_t flags)

I thought that the compiler felt totally free to inline static functions
at will? This may be a matter of compiler optimization settings. I can
understand the use of always_inline in a header file but why in a .c file?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab()
  2012-06-08 17:23   ` [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab() Joonsoo Kim
@ 2012-06-08 19:04     ` Christoph Lameter
  2012-06-10 10:27       ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2012-06-08 19:04 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 9 Jun 2012, Joonsoo Kim wrote:

> The current implementation of deactivate_slab(), which deactivates the
> freelist of the kmem_cache_cpu one object at a time, is inefficient.
> This patch changes it to deactivate the freelist all at once.
> There is no measurable overall performance benefit, however,
> because deactivate_slab() is invoked infrequently.

Hmm, deactivate freelist can race with slab_free. Need to look at this in
detail.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] slub: change declare of get_slab() to inline at all times
  2012-06-08 19:02   ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Christoph Lameter
@ 2012-06-09 15:57     ` JoonSoo Kim
  2012-06-11 15:04       ` Christoph Lameter
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-06-09 15:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/6/9 Christoph Lameter <cl@linux.com>:
> On Sat, 9 Jun 2012, Joonsoo Kim wrote:
>
>> -static struct kmem_cache *get_slab(size_t size, gfp_t flags)
>> +static __always_inline struct kmem_cache *get_slab(size_t size, gfp_t flags)
>
> I thought that the compiler felt totally free to inline static functions
> at will? This may be a matter of compiler optimization settings. I can
> understand the use of always_inline in a header file but why in a .c file?

Yes, but with -O2 the compiler doesn't inline get_slab() when it is
declared just 'static'.
I think that inlining get_slab() has a performance benefit, so I add
'__always_inline' to the declaration of get_slab().
Other functions like slab_alloc() and slab_free() also use '__always_inline'
in the .c file (slub.c).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab()
  2012-06-08 19:04     ` Christoph Lameter
@ 2012-06-10 10:27       ` JoonSoo Kim
  2012-06-22 18:34         ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-06-10 10:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/6/9 Christoph Lameter <cl@linux.com>:
> On Sat, 9 Jun 2012, Joonsoo Kim wrote:
>
>> The current implementation of deactivate_slab(), which deactivates the
>> freelist of the kmem_cache_cpu one object at a time, is inefficient.
>> This patch changes it to deactivate the freelist all at once.
>> There is no measurable overall performance benefit, however,
>> because deactivate_slab() is invoked infrequently.
>
> Hmm, deactivate freelist can race with slab_free. Need to look at this in
> detail.

The implemented logic is nearly the same as the previous one;
I just merged the first step of the previous deactivate_slab() into the
second one.
In case cmpxchg_double_slab() fails, reloading page->freelist and
page->counters and recomputing inuse
ensure that a race with slab_free() is not possible.
In the case where we need a lock, we take the lock before invoking
cmpxchg_double_slab(),
so a race with slab_free() cannot occur there either.

That is my humble opinion; please give me some comments.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] slub: change declare of get_slab() to inline at all times
  2012-06-09 15:57     ` JoonSoo Kim
@ 2012-06-11 15:04       ` Christoph Lameter
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2012-06-11 15:04 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sun, 10 Jun 2012, JoonSoo Kim wrote:

> 2012/6/9 Christoph Lameter <cl@linux.com>:
> > On Sat, 9 Jun 2012, Joonsoo Kim wrote:
> >
> >> -static struct kmem_cache *get_slab(size_t size, gfp_t flags)
> >> +static __always_inline struct kmem_cache *get_slab(size_t size, gfp_t flags)
> >
> > I thought that the compiler felt totally free to inline static functions
> > at will? This may be a matter of compiler optimization settings. I can
> > understand the use of always_inline in a header file but why in a .c file?
>
> Yes, but with -O2 the compiler doesn't inline get_slab() when it is
> declared just 'static'.
> I think that inlining get_slab() has a performance benefit, so I add
> '__always_inline' to the declaration of get_slab().
> Other functions like slab_alloc() and slab_free() also use '__always_inline'
> in the .c file (slub.c).

Yea I thought about removing those since I would think that the compiler
should be doing the right thing.

Does gcc inline with higher optimization settings?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] slub: refactoring unfreeze_partials()
  2012-06-08 17:23   ` [PATCH 3/4] slub: refactoring unfreeze_partials() Joonsoo Kim
@ 2012-06-20  7:19     ` Pekka Enberg
  0 siblings, 0 replies; 37+ messages in thread
From: Pekka Enberg @ 2012-06-20  7:19 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, linux-kernel, linux-mm, David Rientjes

On Fri, Jun 8, 2012 at 8:23 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> The current implementation of unfreeze_partials() is quite complicated,
> but the benefit from that complexity is insignificant. In addition, the
> large amount of code inside the do {} while loop hurts the success rate of
> cmpxchg_double_slab(). Because the current implementation tests the status
> of the cpu partial slab and acquires the list_lock inside the do {} while
> loop, we avoid taking the list_lock and gain a little benefit only when the
> slab at the front of the cpu partial list is to be discarded, but that is a
> rare case. And if add_partial() has been performed and cmpxchg_double_slab()
> then fails, remove_partial() has to be called to undo it, depending on the
> case.
>
> I think these are disadvantages of the current implementation,
> so this patch refactors unfreeze_partials().
>
> Minimizing the code inside the do {} while loop reduces the failure rate
> of cmpxchg_double_slab(). Below is the output of 'slabinfo -r kmalloc-256'
> after running './perf stat -r 33 hackbench 50 process 4000 > /dev/null'.
>
> ** before **
> Cmpxchg_double Looping
> ------------------------
> Locked Cmpxchg Double redos   182685
> Unlocked Cmpxchg Double redos 0
>
> ** after **
> Cmpxchg_double Looping
> ------------------------
> Locked Cmpxchg Double redos   177995
> Unlocked Cmpxchg Double redos 1
>
> The cmpxchg_double_slab() failure rate is slightly improved.
>
> Below is the output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.
>
> ** before **
>  Performance counter stats for './hackbench 50 process 4000' (30 runs):
>
>     108517.190463 task-clock                #    7.926 CPUs utilized            ( +-  0.24% )
>         2,919,550 context-switches          #    0.027 M/sec                    ( +-  3.07% )
>           100,774 CPU-migrations            #    0.929 K/sec                    ( +-  4.72% )
>           124,201 page-faults               #    0.001 M/sec                    ( +-  0.15% )
>   401,500,234,387 cycles                    #    3.700 GHz                      ( +-  0.24% )
>   <not supported> stalled-cycles-frontend
>   <not supported> stalled-cycles-backend
>   250,576,913,354 instructions              #    0.62  insns per cycle          ( +-  0.13% )
>    45,934,956,860 branches                  #  423.297 M/sec                    ( +-  0.14% )
>       188,219,787 branch-misses             #    0.41% of all branches          ( +-  0.56% )
>
>      13.691837307 seconds time elapsed                                          ( +-  0.24% )
>
> ** after **
>  Performance counter stats for './hackbench 50 process 4000' (30 runs):
>
>     107784.479767 task-clock                #    7.928 CPUs utilized            ( +-  0.22% )
>         2,834,781 context-switches          #    0.026 M/sec                    ( +-  2.33% )
>            93,083 CPU-migrations            #    0.864 K/sec                    ( +-  3.45% )
>           123,967 page-faults               #    0.001 M/sec                    ( +-  0.15% )
>   398,781,421,836 cycles                    #    3.700 GHz                      ( +-  0.22% )
>   <not supported> stalled-cycles-frontend
>   <not supported> stalled-cycles-backend
>   250,189,160,419 instructions              #    0.63  insns per cycle          ( +-  0.09% )
>    45,855,370,128 branches                  #  425.436 M/sec                    ( +-  0.10% )
>       169,881,248 branch-misses             #    0.37% of all branches          ( +-  0.43% )
>
>      13.596272341 seconds time elapsed                                          ( +-  0.22% )
>
> No regression is found; rather, we see a slightly better result.
>
> Acked-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>

Applied, thanks!

> diff --git a/mm/slub.c b/mm/slub.c
> index 686ed90..b5f2108 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1886,18 +1886,24 @@ redo:
>  */
>  static void unfreeze_partials(struct kmem_cache *s)
>  {
> -       struct kmem_cache_node *n = NULL;
> +       struct kmem_cache_node *n = NULL, *n2 = NULL;
>        struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
>        struct page *page, *discard_page = NULL;
>
>        while ((page = c->partial)) {
> -               enum slab_modes { M_PARTIAL, M_FREE };
> -               enum slab_modes l, m;
>                struct page new;
>                struct page old;
>
>                c->partial = page->next;
> -               l = M_FREE;
> +
> +               n2 = get_node(s, page_to_nid(page));
> +               if (n != n2) {
> +                       if (n)
> +                               spin_unlock(&n->list_lock);
> +
> +                       n = n2;
> +                       spin_lock(&n->list_lock);
> +               }
>
>                do {
>
> @@ -1910,43 +1916,17 @@ static void unfreeze_partials(struct kmem_cache *s)
>
>                        new.frozen = 0;
>
> -                       if (!new.inuse && (!n || n->nr_partial > s->min_partial))
> -                               m = M_FREE;
> -                       else {
> -                               struct kmem_cache_node *n2 = get_node(s,
> -                                                       page_to_nid(page));
> -
> -                               m = M_PARTIAL;
> -                               if (n != n2) {
> -                                       if (n)
> -                                               spin_unlock(&n->list_lock);
> -
> -                                       n = n2;
> -                                       spin_lock(&n->list_lock);
> -                               }
> -                       }
> -
> -                       if (l != m) {
> -                               if (l == M_PARTIAL) {
> -                                       remove_partial(n, page);
> -                                       stat(s, FREE_REMOVE_PARTIAL);
> -                               } else {
> -                                       add_partial(n, page,
> -                                               DEACTIVATE_TO_TAIL);
> -                                       stat(s, FREE_ADD_PARTIAL);
> -                               }
> -
> -                               l = m;
> -                       }
> -
>                } while (!__cmpxchg_double_slab(s, page,
>                                old.freelist, old.counters,
>                                new.freelist, new.counters,
>                                "unfreezing slab"));
>
> -               if (m == M_FREE) {
> +               if (unlikely(!new.inuse && n->nr_partial > s->min_partial)) {
>                        page->next = discard_page;
>                        discard_page = page;
> +               } else {
> +                       add_partial(n, page, DEACTIVATE_TO_TAIL);
> +                       stat(s, FREE_ADD_PARTIAL);
>                }
>        }
>
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc()
       [not found] <yes>
  2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
@ 2012-06-22 18:22 ` Joonsoo Kim
  2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
                     ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-22 18:22 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

Commit 0ad9500e16fe24aa55809a2b00e0d2d0e658fc71 ('slub: prefetch
next freelist pointer in slab_alloc') added a prefetch instruction to
the fast path of allocation.

The same benefit is also available in the slow path of allocation, although
that is not a large portion of overall allocations. Nevertheless we could get
some benefit from it, so prefetch the next freelist pointer in __slab_alloc() too.
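
A rough standalone analogue of what the prefetch buys (hypothetical userspace
code, not the slub.c implementation; the kernel's prefetch_freepointer()
issues a comparable hardware prefetch hint): while the caller is still working
on the object just handed out, the cache line holding the next free object is
pulled in early:

#include <stdio.h>

struct obj {
        struct obj *next;
        char payload[120];
};

/* Hint the CPU to start fetching the next free object's cache line
 * while the caller is still working on the object we just returned. */
static inline void prefetch_next(const void *next_free)
{
        if (next_free)
                __builtin_prefetch(next_free, 0 /* read */, 3 /* keep cached */);
}

static void *freelist_pop(struct obj **freelist)
{
        struct obj *obj = *freelist;

        if (!obj)
                return NULL;
        *freelist = obj->next;
        prefetch_next(*freelist);       /* same idea as the hunk above */
        return obj;
}

int main(void)
{
        static struct obj pool[3] = { { &pool[1] }, { &pool[2] }, { NULL } };
        struct obj *freelist = &pool[0];
        void *p;

        while ((p = freelist_pop(&freelist)))
                printf("allocated %p\n", p);
        return 0;
}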

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index f96d8bc..92f1c0e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2248,6 +2248,7 @@ load_freelist:
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
+	prefetch_freepointer(s, c->freelist);
 	local_irq_restore(flags);
 	return freelist;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing
  2012-06-22 18:22 ` [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
@ 2012-06-22 18:22   ` Joonsoo Kim
  2012-07-04 13:05     ` Pekka Enberg
  2012-08-16  7:06     ` Pekka Enberg
  2012-06-22 18:22   ` [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free() Joonsoo Kim
  2012-06-22 18:45   ` [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  2 siblings, 2 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-22 18:22 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

In the current implementation, we don't touch oldpage after unfreezing,
so it remains non-NULL. When we then call this_cpu_cmpxchg()
with this stale oldpage, this_cpu_cmpxchg() mostly fails.

We can set oldpage to NULL after unfreezing,
because unfreeze_partials() ensures that all the cpu partial slabs have been
removed from the cpu partial list. With this change, we can expect
this_cpu_cmpxchg() to mostly succeed.
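
The effect is easy to demonstrate with any compare-and-swap: it only succeeds
when the expected value matches what is actually stored, so a stale snapshot
guarantees a wasted first attempt. A hypothetical userspace sketch with C11
atomics (not the per-cpu this_cpu_cmpxchg() machinery):

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
        static int old_storage, new_storage;
        int *oldpage = &old_storage;            /* our cached snapshot */
        int *newpage = &new_storage;            /* what we want to install */
        _Atomic(int *) partial = oldpage;       /* think c->partial */

        atomic_store(&partial, (int *)NULL);    /* unfreeze_partials() drained it */

        /* Stale expected value: this attempt is guaranteed to fail. */
        int *expected = oldpage;
        printf("stale oldpage:  %s\n",
               atomic_compare_exchange_strong(&partial, &expected, newpage)
               ? "succeeded" : "failed");

        /* Refreshed expected value (oldpage = NULL): succeeds first try. */
        expected = NULL;
        printf("oldpage = NULL: %s\n",
               atomic_compare_exchange_strong(&partial, &expected, newpage)
               ? "succeeded" : "failed");
        return 0;
}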

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index 92f1c0e..531d8ed 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 				local_irq_save(flags);
 				unfreeze_partials(s);
 				local_irq_restore(flags);
+				oldpage = NULL;
 				pobjects = 0;
 				pages = 0;
 				stat(s, CPU_PARTIAL_DRAIN);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-06-22 18:22 ` [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
@ 2012-06-22 18:22   ` Joonsoo Kim
  2012-07-04 13:10     ` Pekka Enberg
  2012-07-05 14:26     ` Christoph Lameter
  2012-06-22 18:45   ` [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  2 siblings, 2 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-22 18:22 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Joonsoo Kim

In some cases of __slab_free(), we need a lock for manipulating the partial
list. If freeing the object while holding the lock fails, the lock is no
longer needed, for the following reasons.

Case 1. prior is NULL and kmem_cache_debug(s) is true

In this case, another free occurred before our free succeeded.
When the slab is full (prior is NULL), the only possible operation is
slab_free(), so we can conclude that another free occurred.
It may have frozen the slab, so the lock is not needed anymore.

Case 2. inuse is NULL

In this case, acquire_slab() occurred before our free succeeded.
We hold the last object of the slab, so no operation on this slab other
than acquire_slab() is possible.
acquire_slab() freezes the slab, so the lock is not needed anymore.

These two cases explain why we don't need the lock
when freeing an object while holding the lock fails.

So, when cmpxchg_double_slab() fails, releasing the lock is more suitable.
This may reduce lock contention.

This also makes the logic somewhat simpler, as the 'was_frozen with a lock'
case can never occur. Remove it.
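
Structurally the change amounts to the following pattern: decide on each
iteration whether the update will need the list lock, and if the
compare-and-swap loses the race, drop the lock before recomputing, since the
recomputed state may no longer need it. A hypothetical userspace sketch with
a pthread mutex and C11 atomics, not the real __slab_free():

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic int inuse;       /* stand-in for the slab's object count */

static void free_one(void)
{
        bool locked = false;
        int old, new;

        do {
                if (locked) {
                        /* lost the race: release the lock and recompute,
                         * the new state may not need it at all */
                        pthread_mutex_unlock(&list_lock);
                        locked = false;
                }
                old = atomic_load(&inuse);
                new = old - 1;
                if (new == 0) {         /* list manipulation will be needed */
                        pthread_mutex_lock(&list_lock);
                        locked = true;
                }
        } while (!atomic_compare_exchange_weak(&inuse, &old, new));

        if (locked) {
                /* ... partial-list manipulation protected by the lock ... */
                pthread_mutex_unlock(&list_lock);
        }
}

int main(void)
{
        atomic_store(&inuse, 2);
        free_one();
        free_one();
        printf("inuse = %d\n", atomic_load(&inuse));
        return 0;
}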

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index 531d8ed..3e0b9db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2438,7 +2438,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 	void *prior;
 	void **object = (void *)x;
 	int was_frozen;
-	int inuse;
 	struct page new;
 	unsigned long counters;
 	struct kmem_cache_node *n = NULL;
@@ -2450,13 +2449,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 		return;
 
 	do {
+		if (unlikely(n)) {
+			spin_unlock_irqrestore(&n->list_lock, flags);
+			n = NULL;
+		}
 		prior = page->freelist;
 		counters = page->counters;
 		set_freepointer(s, object, prior);
 		new.counters = counters;
 		was_frozen = new.frozen;
 		new.inuse--;
-		if ((!new.inuse || !prior) && !was_frozen && !n) {
+		if ((!new.inuse || !prior) && !was_frozen) {
 
 			if (!kmem_cache_debug(s) && !prior)
 
@@ -2481,7 +2484,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 
 			}
 		}
-		inuse = new.inuse;
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
@@ -2507,25 +2509,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
                 return;
         }
 
+	if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
+		goto slab_empty;
+
 	/*
-	 * was_frozen may have been set after we acquired the list_lock in
-	 * an earlier loop. So we need to check it here again.
+	 * Objects left in the slab. If it was not on the partial list before
+	 * then add it.
 	 */
-	if (was_frozen)
-		stat(s, FREE_FROZEN);
-	else {
-		if (unlikely(!inuse && n->nr_partial > s->min_partial))
-                        goto slab_empty;
-
-		/*
-		 * Objects left in the slab. If it was not on the partial list before
-		 * then add it.
-		 */
-		if (unlikely(!prior)) {
-			remove_full(s, page);
-			add_partial(n, page, DEACTIVATE_TO_TAIL);
-			stat(s, FREE_ADD_PARTIAL);
-		}
+	if (kmem_cache_debug(s) && unlikely(!prior)) {
+		remove_full(s, page);
+		add_partial(n, page, DEACTIVATE_TO_TAIL);
+		stat(s, FREE_ADD_PARTIAL);
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab()
  2012-06-10 10:27       ` JoonSoo Kim
@ 2012-06-22 18:34         ` JoonSoo Kim
  0 siblings, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-06-22 18:34 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/6/10 JoonSoo Kim <js1304@gmail.com>:
> 2012/6/9 Christoph Lameter <cl@linux.com>:
>> On Sat, 9 Jun 2012, Joonsoo Kim wrote:
>>
>>> The current implementation of deactivate_slab(), which deactivates the
>>> freelist of the kmem_cache_cpu one object at a time, is inefficient.
>>> This patch changes it to deactivate the freelist all at once.
>>> There is no measurable overall performance benefit, however,
>>> because deactivate_slab() is invoked infrequently.
>>
>> Hmm, deactivate freelist can race with slab_free. Need to look at this in
>> detail.
>
> The implemented logic is nearly the same as the previous one;
> I just merged the first step of the previous deactivate_slab() into the
> second one.
> In case cmpxchg_double_slab() fails, reloading page->freelist and
> page->counters and recomputing inuse
> ensure that a race with slab_free() is not possible.
> In the case where we need a lock, we take the lock before invoking
> cmpxchg_double_slab(),
> so a race with slab_free() cannot occur there either.
>
> That is my humble opinion; please give me some comments.

Hi Pekka and Christoph.
Could you give me some comments about this, please?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-06-22 18:22 ` [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
  2012-06-22 18:22   ` [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free() Joonsoo Kim
@ 2012-06-22 18:45   ` Joonsoo Kim
  2012-07-04 12:58     ` JoonSoo Kim
  2012-07-04 13:00     ` Pekka Enberg
  2 siblings, 2 replies; 37+ messages in thread
From: Joonsoo Kim @ 2012-06-22 18:45 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	Joonsoo Kim

Commit 0ad9500e16fe24aa55809a2b00e0d2d0e658fc71 ('slub: prefetch
next freelist pointer in slab_alloc') added a prefetch instruction to
the fast path of allocation.

The same benefit is also available in the slow path of allocation, although
that is not a large portion of overall allocations. Nevertheless we could get
some benefit from it, so prefetch the next freelist pointer in __slab_alloc() too.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
---
Add 'Cc: Eric Dumazet <eric.dumazet@gmail.com>'

diff --git a/mm/slub.c b/mm/slub.c
index f96d8bc..92f1c0e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2248,6 +2248,7 @@ load_freelist:
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
+	prefetch_freepointer(s, c->freelist);
 	local_irq_restore(flags);
 	return freelist;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-06-22 18:45   ` [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
@ 2012-07-04 12:58     ` JoonSoo Kim
  2012-07-04 13:00     ` Pekka Enberg
  1 sibling, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 12:58 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	Joonsoo Kim

Hi Pekka and Christoph.
Could you give me some comments for these patches?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-06-22 18:45   ` [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
  2012-07-04 12:58     ` JoonSoo Kim
@ 2012-07-04 13:00     ` Pekka Enberg
  2012-07-04 14:30       ` JoonSoo Kim
  1 sibling, 1 reply; 37+ messages in thread
From: Pekka Enberg @ 2012-07-04 13:00 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet

On Fri, Jun 22, 2012 at 9:45 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> Commit 0ad9500e16fe24aa55809a2b00e0d2d0e658fc71 ('slub: prefetch
> next freelist pointer in slab_alloc') added a prefetch instruction to
> the fast path of allocation.
>
> The same benefit is also available in the slow path of allocation, although
> that is not a large portion of overall allocations. Nevertheless we could get
> some benefit from it, so prefetch the next freelist pointer in __slab_alloc() too.
>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> ---
> Add 'Cc: Eric Dumazet <eric.dumazet@gmail.com>'
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f96d8bc..92f1c0e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2248,6 +2248,7 @@ load_freelist:
>         VM_BUG_ON(!c->page->frozen);
>         c->freelist = get_freepointer(s, freelist);
>         c->tid = next_tid(c->tid);
> +       prefetch_freepointer(s, c->freelist);
>         local_irq_restore(flags);
>         return freelist;

Well, can you show improvement in any benchmark or workload?
Prefetching is not always an obvious win and the reason we merged
Eric's patch was that he was able to show an improvement in hackbench.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing
  2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
@ 2012-07-04 13:05     ` Pekka Enberg
  2012-07-05 14:20       ` Christoph Lameter
  2012-08-16  7:06     ` Pekka Enberg
  1 sibling, 1 reply; 37+ messages in thread
From: Pekka Enberg @ 2012-07-04 13:05 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, linux-kernel, linux-mm, David Rientjes

On Fri, Jun 22, 2012 at 9:22 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> In the current implementation, we don't touch oldpage after unfreezing,
> so it remains non-NULL. When we then call this_cpu_cmpxchg()
> with this stale oldpage, this_cpu_cmpxchg() mostly fails.
>
> We can set oldpage to NULL after unfreezing,
> because unfreeze_partials() ensures that all the cpu partial slabs have been
> removed from the cpu partial list. With this change, we can expect
> this_cpu_cmpxchg() to mostly succeed.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 92f1c0e..531d8ed 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>                                 local_irq_save(flags);
>                                 unfreeze_partials(s);
>                                 local_irq_restore(flags);
> +                               oldpage = NULL;
>                                 pobjects = 0;
>                                 pages = 0;
>                                 stat(s, CPU_PARTIAL_DRAIN);

Makes sense. Christoph, David?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-06-22 18:22   ` [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free() Joonsoo Kim
@ 2012-07-04 13:10     ` Pekka Enberg
  2012-07-04 14:48       ` JoonSoo Kim
  2012-07-05 14:26     ` Christoph Lameter
  1 sibling, 1 reply; 37+ messages in thread
From: Pekka Enberg @ 2012-07-04 13:10 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, linux-kernel, linux-mm, David Rientjes

On Fri, Jun 22, 2012 at 9:22 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> In some cases of __slab_free(), we need a lock for manipulating the partial
> list. If freeing the object while holding the lock fails, the lock is no
> longer needed, for the following reasons.
>
> Case 1. prior is NULL and kmem_cache_debug(s) is true
>
> In this case, another free occurred before our free succeeded.
> When the slab is full (prior is NULL), the only possible operation is
> slab_free(), so we can conclude that another free occurred.
> It may have frozen the slab, so the lock is not needed anymore.
>
> Case 2. inuse is NULL
>
> In this case, acquire_slab() occurred before our free succeeded.
> We hold the last object of the slab, so no operation on this slab other
> than acquire_slab() is possible.
> acquire_slab() freezes the slab, so the lock is not needed anymore.
>
> These two cases explain why we don't need the lock
> when freeing an object while holding the lock fails.
>
> So, when cmpxchg_double_slab() fails, releasing the lock is more suitable.
> This may reduce lock contention.
>
> This also makes the logic somewhat simpler, as the 'was_frozen with a lock'
> case can never occur. Remove it.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 531d8ed..3e0b9db 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2438,7 +2438,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>         void *prior;
>         void **object = (void *)x;
>         int was_frozen;
> -       int inuse;
>         struct page new;
>         unsigned long counters;
>         struct kmem_cache_node *n = NULL;
> @@ -2450,13 +2449,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>                 return;
>
>         do {
> +               if (unlikely(n)) {
> +                       spin_unlock_irqrestore(&n->list_lock, flags);
> +                       n = NULL;
> +               }
>                 prior = page->freelist;
>                 counters = page->counters;
>                 set_freepointer(s, object, prior);
>                 new.counters = counters;
>                 was_frozen = new.frozen;
>                 new.inuse--;
> -               if ((!new.inuse || !prior) && !was_frozen && !n) {
> +               if ((!new.inuse || !prior) && !was_frozen) {
>
>                         if (!kmem_cache_debug(s) && !prior)
>
> @@ -2481,7 +2484,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>
>                         }
>                 }
> -               inuse = new.inuse;
>
>         } while (!cmpxchg_double_slab(s, page,
>                 prior, counters,
> @@ -2507,25 +2509,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>                  return;
>          }
>
> +       if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
> +               goto slab_empty;
> +
>         /*
> -        * was_frozen may have been set after we acquired the list_lock in
> -        * an earlier loop. So we need to check it here again.
> +        * Objects left in the slab. If it was not on the partial list before
> +        * then add it.
>          */
> -       if (was_frozen)
> -               stat(s, FREE_FROZEN);
> -       else {
> -               if (unlikely(!inuse && n->nr_partial > s->min_partial))
> -                        goto slab_empty;
> -
> -               /*
> -                * Objects left in the slab. If it was not on the partial list before
> -                * then add it.
> -                */
> -               if (unlikely(!prior)) {
> -                       remove_full(s, page);
> -                       add_partial(n, page, DEACTIVATE_TO_TAIL);
> -                       stat(s, FREE_ADD_PARTIAL);
> -               }
> +       if (kmem_cache_debug(s) && unlikely(!prior)) {
> +               remove_full(s, page);
> +               add_partial(n, page, DEACTIVATE_TO_TAIL);
> +               stat(s, FREE_ADD_PARTIAL);
>         }
>         spin_unlock_irqrestore(&n->list_lock, flags);
>         return;

I'm confused. Does this fix a bug or is it an optimization?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 13:00     ` Pekka Enberg
@ 2012-07-04 14:30       ` JoonSoo Kim
  2012-07-04 15:08         ` Pekka Enberg
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 14:30 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet

2012/7/4 Pekka Enberg <penberg@kernel.org>:
> Well, can you show improvement in any benchmark or workload?
> Prefetching is not always an obvious win and the reason we merged
> Eric's patch was that he was able to show an improvement in hackbench.

I think that this patch has exactly the same effect as Eric's patch, so
I didn't include a benchmark result.
Eric's patch, which adds the prefetch instruction to the fastpath, works for
the second through last objects of a cpu slab.
This patch, which adds the prefetch instruction to the slowpath, works for
the first object of a cpu slab.

But I did run "./perf stat -r 20 ./hackbench 50 process 4000 > /dev/null"
and got the following output.

***** vanilla *****

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     114189.571311 task-clock                #    7.924 CPUs utilized            ( +-  0.29% )
         2,978,515 context-switches          #    0.026 M/sec                    ( +-  3.45% )
           102,635 CPU-migrations            #    0.899 K/sec                    ( +-  5.63% )
           123,948 page-faults               #    0.001 M/sec                    ( +-  0.16% )
   422,477,120,134 cycles                    #    3.700 GHz                      ( +-  0.29% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   251,943,851,074 instructions              #    0.60  insns per cycle          ( +-  0.14% )
    46,214,207,979 branches                  #  404.715 M/sec                    ( +-  0.15% )
       215,342,095 branch-misses             #    0.47% of all branches          ( +-  0.53% )

      14.409990448 seconds time elapsed                                          ( +-  0.30% )

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     114576.053284 task-clock                #    7.921 CPUs utilized            ( +-  0.35% )
         2,810,138 context-switches          #    0.025 M/sec                    ( +-  3.21% )
            85,641 CPU-migrations            #    0.747 K/sec                    ( +-  5.05% )
           124,299 page-faults               #    0.001 M/sec                    ( +-  0.18% )
   423,906,539,517 cycles                    #    3.700 GHz                      ( +-  0.35% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   251,354,351,283 instructions              #    0.59  insns per cycle          ( +-  0.13% )
    46,098,601,012 branches                  #  402.341 M/sec                    ( +-  0.13% )
       213,448,657 branch-misses             #    0.46% of all branches          ( +-  0.50% )

      14.464325969 seconds time elapsed                                          ( +-  0.34% )


***** patch applied *****

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     112935.199731 task-clock                #    7.926 CPUs utilized            ( +-  0.29% )
         2,810,157 context-switches          #    0.025 M/sec                    ( +-  2.95% )
           104,278 CPU-migrations            #    0.923 K/sec                    ( +-  6.83% )
           123,999 page-faults               #    0.001 M/sec                    ( +-  0.17% )
   417,834,406,420 cycles                    #    3.700 GHz                      ( +-  0.29% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   251,291,523,926 instructions              #    0.60  insns per cycle          ( +-  0.11% )
    46,083,091,476 branches                  #  408.049 M/sec                    ( +-  0.12% )
       213,714,228 branch-misses             #    0.46% of all branches          ( +-  0.43% )

      14.248980376 seconds time elapsed                                          ( +-  0.29% )

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     113640.944855 task-clock                #    7.926 CPUs utilized            ( +-  0.28% )
         2,776,983 context-switches          #    0.024 M/sec                    ( +-  5.66% )
            95,962 CPU-migrations            #    0.844 K/sec                    ( +- 10.69% )
           123,849 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   420,446,572,595 cycles                    #    3.700 GHz                      ( +-  0.28% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   251,174,259,429 instructions              #    0.60  insns per cycle          ( +-  0.21% )
    46,060,683,039 branches                  #  405.318 M/sec                    ( +-  0.23% )
       213,480,999 branch-misses             #    0.46% of all branches          ( +-  0.75% )

      14.336843534 seconds time elapsed                                          ( +-  0.28% )


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-04 13:10     ` Pekka Enberg
@ 2012-07-04 14:48       ` JoonSoo Kim
  0 siblings, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 14:48 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Lameter, linux-kernel, linux-mm, David Rientjes

2012/7/4 Pekka Enberg <penberg@kernel.org>:
> On Fri, Jun 22, 2012 at 9:22 PM, Joonsoo Kim <js1304@gmail.com> wrote:
>> In some case of __slab_free(), we need a lock for manipulating partial list.
>> If freeing object with a lock is failed, a lock doesn't needed anymore
>> for some reasons.
>>
>> Case 1. prior is NULL, kmem_cache_debug(s) is true
>>
>> In this case, another free is occured before our free is succeed.
>> When slab is full(prior is NULL), only possible operation is slab_free().
>> So in this case, we guess another free is occured.
>> It may make a slab frozen, so lock is not needed anymore.
>>
>> Case 2. inuse is NULL
>>
>> In this case, acquire_slab() is occured before out free is succeed.
>> We have a last object for slab, so other operation for this slab is
>> not possible except acquire_slab().
>> Acquire_slab() makes a slab frozen, so lock is not needed anymore.
>>
>> Above two reason explain why we don't need a lock
>> when freeing object with a lock is failed.
>>
>> So, when cmpxchg_double_slab() is failed, releasing a lock is more suitable.
>> This may reduce lock contention.
>>
>> This also make logic somehow simple that 'was_frozen with a lock' case
>> is never occured. Remove it.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 531d8ed..3e0b9db 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2438,7 +2438,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>>         void *prior;
>>         void **object = (void *)x;
>>         int was_frozen;
>> -       int inuse;
>>         struct page new;
>>         unsigned long counters;
>>         struct kmem_cache_node *n = NULL;
>> @@ -2450,13 +2449,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>>                 return;
>>
>>         do {
>> +               if (unlikely(n)) {
>> +                       spin_unlock_irqrestore(&n->list_lock, flags);
>> +                       n = NULL;
>> +               }
>>                 prior = page->freelist;
>>                 counters = page->counters;
>>                 set_freepointer(s, object, prior);
>>                 new.counters = counters;
>>                 was_frozen = new.frozen;
>>                 new.inuse--;
>> -               if ((!new.inuse || !prior) && !was_frozen && !n) {
>> +               if ((!new.inuse || !prior) && !was_frozen) {
>>
>>                         if (!kmem_cache_debug(s) && !prior)
>>
>> @@ -2481,7 +2484,6 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>>
>>                         }
>>                 }
>> -               inuse = new.inuse;
>>
>>         } while (!cmpxchg_double_slab(s, page,
>>                 prior, counters,
>> @@ -2507,25 +2509,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>>                  return;
>>          }
>>
>> +       if (unlikely(!new.inuse && n->nr_partial > s->min_partial))
>> +               goto slab_empty;
>> +
>>         /*
>> -        * was_frozen may have been set after we acquired the list_lock in
>> -        * an earlier loop. So we need to check it here again.
>> +        * Objects left in the slab. If it was not on the partial list before
>> +        * then add it.
>>          */
>> -       if (was_frozen)
>> -               stat(s, FREE_FROZEN);
>> -       else {
>> -               if (unlikely(!inuse && n->nr_partial > s->min_partial))
>> -                        goto slab_empty;
>> -
>> -               /*
>> -                * Objects left in the slab. If it was not on the partial list before
>> -                * then add it.
>> -                */
>> -               if (unlikely(!prior)) {
>> -                       remove_full(s, page);
>> -                       add_partial(n, page, DEACTIVATE_TO_TAIL);
>> -                       stat(s, FREE_ADD_PARTIAL);
>> -               }
>> +       if (kmem_cache_debug(s) && unlikely(!prior)) {
>> +               remove_full(s, page);
>> +               add_partial(n, page, DEACTIVATE_TO_TAIL);
>> +               stat(s, FREE_ADD_PARTIAL);
>>         }
>>         spin_unlock_irqrestore(&n->list_lock, flags);
>>         return;
>
> I'm confused. Does this fix a bug or is it an optimization?

It is an optimization to reduce lock contention, plus a code clean-up.

If we acquire the lock and cmpxchg_double_slab() then fails, we now release
the lock before retrying, which reduces lock contention.
It also removes the "was_frozen while holding the lock" case, so the logic is
slightly simpler than before.

The part of the commit message that confused you was only meant to say that
this patch does not decrease performance, for the two reasons given there.
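
To make the control flow concrete, here is a stripped-down sketch of the
retry loop after this patch. It is only an illustration: the statistics,
debug and slab-empty handling of the real __slab_free() are left out, and the
locals are assumed to be the same ones declared there.

	do {
		if (unlikely(n)) {
			/*
			 * A previous pass took the node lock but the cmpxchg
			 * failed; release it before speculating again.
			 */
			spin_unlock_irqrestore(&n->list_lock, flags);
			n = NULL;
		}
		prior = page->freelist;
		counters = page->counters;
		set_freepointer(s, object, prior);
		new.counters = counters;
		was_frozen = new.frozen;
		new.inuse--;
		if ((!new.inuse || !prior) && !was_frozen) {
			/* we may have to touch the node partial list */
			n = get_node(s, page_to_nid(page));
			spin_lock_irqsave(&n->list_lock, flags);
		}
	} while (!cmpxchg_double_slab(s, page,
			prior, counters,
			object, new.counters,
			"__slab_free"));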


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 14:30       ` JoonSoo Kim
@ 2012-07-04 15:08         ` Pekka Enberg
  2012-07-04 15:26           ` Eric Dumazet
  2012-07-04 15:45           ` JoonSoo Kim
  0 siblings, 2 replies; 37+ messages in thread
From: Pekka Enberg @ 2012-07-04 15:08 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	David Rientjes

> 2012/7/4 Pekka Enberg <penberg@kernel.org>:
>> Well, can you show improvement in any benchmark or workload?
>> Prefetching is not always an obvious win and the reason we merged
>> Eric's patch was that he was able to show an improvement in hackbench.

On Wed, Jul 4, 2012 at 5:30 PM, JoonSoo Kim <js1304@gmail.com> wrote:
> I thinks that this patch is perfectly same effect as Eric's patch, so
> doesn't include benchmark result.
> Eric's patch which add "prefetch instruction" in fastpath works for
> second ~ last object of cpu slab.
> This patch which add "prefetch instrunction" in slowpath works for
> first object of cpu slab.

Prefetching can also have a negative effect on overall performance:

http://lwn.net/Articles/444336/

> But, I do test "./perf stat -r 20 ./hackbench 50 process 4000 >
> /dev/null" and gain following outputs.
>
> ***** vanilla *****
>
>  Performance counter stats for './hackbench 50 process 4000' (20 runs):
>
>      114189.571311 task-clock                #    7.924 CPUs utilized            ( +-  0.29% )
>          2,978,515 context-switches          #    0.026 M/sec                    ( +-  3.45% )
>            102,635 CPU-migrations            #    0.899 K/sec                    ( +-  5.63% )
>            123,948 page-faults               #    0.001 M/sec                    ( +-  0.16% )
>    422,477,120,134 cycles                    #    3.700 GHz                      ( +-  0.29% )
>    <not supported> stalled-cycles-frontend
>    <not supported> stalled-cycles-backend
>    251,943,851,074 instructions              #    0.60  insns per cycle          ( +-  0.14% )
>     46,214,207,979 branches                  #  404.715 M/sec                    ( +-  0.15% )
>        215,342,095 branch-misses             #    0.47% of all branches          ( +-  0.53% )
>
>       14.409990448 seconds time elapsed                                          ( +-  0.30% )
>
>  Performance counter stats for './hackbench 50 process 4000' (20 runs):
>
>      114576.053284 task-clock                #    7.921 CPUs utilized            ( +-  0.35% )
>          2,810,138 context-switches          #    0.025 M/sec                    ( +-  3.21% )
>             85,641 CPU-migrations            #    0.747 K/sec                    ( +-  5.05% )
>            124,299 page-faults               #    0.001 M/sec                    ( +-  0.18% )
>    423,906,539,517 cycles                    #    3.700 GHz                      ( +-  0.35% )
>    <not supported> stalled-cycles-frontend
>    <not supported> stalled-cycles-backend
>    251,354,351,283 instructions              #    0.59  insns per cycle          ( +-  0.13% )
>     46,098,601,012 branches                  #  402.341 M/sec                    ( +-  0.13% )
>        213,448,657 branch-misses             #    0.46% of all branches          ( +-  0.50% )
>
>       14.464325969 seconds time elapsed                                          ( +-  0.34% )
>
>
> ***** patch applied *****
>
>  Performance counter stats for './hackbench 50 process 4000' (20 runs):
>
>      112935.199731 task-clock                #    7.926 CPUs utilized            ( +-  0.29% )
>          2,810,157 context-switches          #    0.025 M/sec                    ( +-  2.95% )
>            104,278 CPU-migrations            #    0.923 K/sec                    ( +-  6.83% )
>            123,999 page-faults               #    0.001 M/sec                    ( +-  0.17% )
>    417,834,406,420 cycles                    #    3.700 GHz                      ( +-  0.29% )
>    <not supported> stalled-cycles-frontend
>    <not supported> stalled-cycles-backend
>    251,291,523,926 instructions              #    0.60  insns per cycle          ( +-  0.11% )
>     46,083,091,476 branches                  #  408.049 M/sec                    ( +-  0.12% )
>        213,714,228 branch-misses             #    0.46% of all branches          ( +-  0.43% )
>
>       14.248980376 seconds time elapsed                                          ( +-  0.29% )
>
>  Performance counter stats for './hackbench 50 process 4000' (20 runs):
>
>      113640.944855 task-clock                #    7.926 CPUs utilized            ( +-  0.28% )
>          2,776,983 context-switches          #    0.024 M/sec                    ( +-  5.66% )
>             95,962 CPU-migrations            #    0.844 K/sec                    ( +- 10.69% )
>            123,849 page-faults               #    0.001 M/sec                    ( +-  0.15% )
>    420,446,572,595 cycles                    #    3.700 GHz                      ( +-  0.28% )
>    <not supported> stalled-cycles-frontend
>    <not supported> stalled-cycles-backend
>    251,174,259,429 instructions              #    0.60  insns per cycle          ( +-  0.21% )
>     46,060,683,039 branches                  #  405.318 M/sec                    ( +-  0.23% )
>        213,480,999 branch-misses             #    0.46% of all branches          ( +-  0.75% )
>
>       14.336843534 seconds time elapsed                                          ( +-  0.28% )

That doesn't seem like that obvious win to me... Eric, Christoph?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:08         ` Pekka Enberg
@ 2012-07-04 15:26           ` Eric Dumazet
  2012-07-04 15:48             ` JoonSoo Kim
  2012-07-04 15:45           ` JoonSoo Kim
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2012-07-04 15:26 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: JoonSoo Kim, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes

On Wed, 2012-07-04 at 18:08 +0300, Pekka Enberg wrote:

> That doesn't seem like that obvious win to me... Eric, Christoph?

It's the slow path. I am not convinced it's useful on real workloads (as
opposed to a benchmark).

I mean, if a workload hits the slow path badly, more important work should be
done at a higher level to avoid that in the first place.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:08         ` Pekka Enberg
  2012-07-04 15:26           ` Eric Dumazet
@ 2012-07-04 15:45           ` JoonSoo Kim
  2012-07-04 15:59             ` Pekka Enberg
  1 sibling, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 15:45 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	David Rientjes

2012/7/5 Pekka Enberg <penberg@kernel.org>:
>> 2012/7/4 Pekka Enberg <penberg@kernel.org>:
>>> Well, can you show improvement in any benchmark or workload?
>>> Prefetching is not always an obvious win and the reason we merged
>>> Eric's patch was that he was able to show an improvement in hackbench.
>
> On Wed, Jul 4, 2012 at 5:30 PM, JoonSoo Kim <js1304@gmail.com> wrote:
>> I thinks that this patch is perfectly same effect as Eric's patch, so
>> doesn't include benchmark result.
>> Eric's patch which add "prefetch instruction" in fastpath works for
>> second ~ last object of cpu slab.
>> This patch which add "prefetch instrunction" in slowpath works for
>> first object of cpu slab.
>
> Prefetching can also have negative effect on overall performance:
>
> http://lwn.net/Articles/444336/
>

Thanks for the pointer to that article, it is very helpful to me.

> That doesn't seem like that obvious win to me... Eric, Christoph?

Could you tell me how to test this patch more thoroughly, please?
I am a kernel newbie and still in the process of learning,
so I don't know what more I can do here.
Googling previous slub patches, I see that some people use netperf.

Is running the script below sufficient?
How is this test related to slub?

for i in `seq 1 32`
do
 netperf -H 192.168.0.8 -v 0 -l -100000 -t TCP_RR > /dev/null &
done
wait


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:26           ` Eric Dumazet
@ 2012-07-04 15:48             ` JoonSoo Kim
  2012-07-04 16:15               ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 15:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pekka Enberg, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes

2012/7/5 Eric Dumazet <eric.dumazet@gmail.com>:
> Its the slow path. I am not convinced its useful on real workloads (not
> a benchmark)
>
> I mean, if a workload hits badly slow path, some more important work
> should be done to avoid this at a higher level.
>

In the hackbench test, about 93% of allocations go through the fast path.
Is that insufficient?
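
For what it's worth, SLUB can report this split itself when CONFIG_SLUB_STATS
is enabled. A condensed sketch of the counting machinery in mm/slub.c (names
may differ slightly between versions, so treat it as approximate):

static inline void stat(const struct kmem_cache *s, enum stat_item si)
{
#ifdef CONFIG_SLUB_STATS
	__this_cpu_inc(s->cpu_slab->stat[si]);
#endif
}

	/* incremented once per allocation, on the respective path */
	stat(s, ALLOC_FASTPATH);	/* slab_alloc() fast path */
	stat(s, ALLOC_SLOWPATH);	/* __slab_alloc() */

The per-cache totals are then exposed through sysfs, which is one way to
arrive at a fast-path/slow-path ratio like the 93% above.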


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:45           ` JoonSoo Kim
@ 2012-07-04 15:59             ` Pekka Enberg
  2012-07-04 16:04               ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Pekka Enberg @ 2012-07-04 15:59 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	David Rientjes

On Wed, Jul 4, 2012 at 6:45 PM, JoonSoo Kim <js1304@gmail.com> wrote:
>> Prefetching can also have negative effect on overall performance:
>>
>> http://lwn.net/Articles/444336/
>
> Thanks for good article which is very helpful to me.
>
>> That doesn't seem like that obvious win to me... Eric, Christoph?
>
> Could you tell me how I test this patch more deeply, plz?
> I am a kernel newbie and in the process of learning.
> I doesn't know what I can do more for this.
> I googling previous patch related to slub, some people use netperf.
>
> Just do below is sufficient?
> How is this test related to slub?
>
> for in in `seq 1 32`
> do
>  netperf -H 192.168.0.8 -v 0 -l -100000 -t TCP_RR > /dev/null &
> done
> wait

The networking subsystem is sensitive to slab allocator performance
which makes netperf an interesting benchmark, that's all.

As for slab benchmarking, you might want to look at what Mel Gorman
has done in the past:

https://lkml.org/lkml/2009/2/16/252

For something like prefetch optimization, you'd really want to see a
noticeable win in some benchmark. The kind of improvement you're
seeing with your patch is likely to be lost in the noise - or even
worse, cause negative performance for real world workloads.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:59             ` Pekka Enberg
@ 2012-07-04 16:04               ` JoonSoo Kim
  0 siblings, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 16:04 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, linux-kernel, linux-mm, Eric Dumazet,
	David Rientjes

2012/7/5 Pekka Enberg <penberg@kernel.org>:
> On Wed, Jul 4, 2012 at 6:45 PM, JoonSoo Kim <js1304@gmail.com> wrote:
>>> Prefetching can also have negative effect on overall performance:
>>>
>>> http://lwn.net/Articles/444336/
>>
>> Thanks for good article which is very helpful to me.
>>
>>> That doesn't seem like that obvious win to me... Eric, Christoph?
>>
>> Could you tell me how I test this patch more deeply, plz?
>> I am a kernel newbie and in the process of learning.
>> I doesn't know what I can do more for this.
>> I googling previous patch related to slub, some people use netperf.
>>
>> Just do below is sufficient?
>> How is this test related to slub?
>>
>> for in in `seq 1 32`
>> do
>>  netperf -H 192.168.0.8 -v 0 -l -100000 -t TCP_RR > /dev/null &
>> done
>> wait
>
> The networking subsystem is sensitive to slab allocator performance
> which makes netperf an interesting benchmark, that's all.
>
> As for slab benchmarking, you might want to look at what Mel Gorman
> has done in the past:
>
> https://lkml.org/lkml/2009/2/16/252
>
> For something like prefetch optimization, you'd really want to see a
> noticeable win in some benchmark. The kind of improvement you're
> seeing with your patch is likely to be lost in the noise - or even
> worse, cause negative performance for real world workloads.

Okay.
Thanks for the comments.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 15:48             ` JoonSoo Kim
@ 2012-07-04 16:15               ` Eric Dumazet
  2012-07-04 16:24                 ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2012-07-04 16:15 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Pekka Enberg, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes

On Thu, 2012-07-05 at 00:48 +0900, JoonSoo Kim wrote:
> 2012/7/5 Eric Dumazet <eric.dumazet@gmail.com>:
> > Its the slow path. I am not convinced its useful on real workloads (not
> > a benchmark)
> >
> > I mean, if a workload hits badly slow path, some more important work
> > should be done to avoid this at a higher level.
> >
> 
> In hackbench test, fast path allocation is about to 93%.
> Is it insufficient?

7% is insufficient, I am afraid.

One prefetch() in the fast path serves 93% of the allocations,
so the added icache pressure is OK.

One prefetch() in the slow path serves 7% of the allocations; do you see the
difference?

Adding a prefetch() is usually a win when a benchmark uses the path one
million times per second.

But adding prefetches also increases kernel size, and that hurts globally
(the latency of the kernel depends on its size when CPU caches are cold).
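
For reference, the fast-path prefetch that was merged earlier looks roughly
like this (condensed from mm/slub.c of that period; next_object is the local
that slab_alloc() already computes with get_freepointer_safe()). The point
above is that this line runs for roughly 93% of allocations, whereas a
prefetch added to __slab_alloc() would only run for the remaining 7%.

static void prefetch_freepointer(const struct kmem_cache *s, void *object)
{
	/* warm the cache line holding the next object's freelist pointer */
	prefetch(object + s->offset);
}

	/* slab_alloc() fast path, after the freelist cmpxchg succeeded */
	prefetch_freepointer(s, next_object);
	stat(s, ALLOC_FASTPATH);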




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
  2012-07-04 16:15               ` Eric Dumazet
@ 2012-07-04 16:24                 ` JoonSoo Kim
  0 siblings, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-04 16:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pekka Enberg, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes

2012/7/5 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2012-07-05 at 00:48 +0900, JoonSoo Kim wrote:
>> 2012/7/5 Eric Dumazet <eric.dumazet@gmail.com>:
>> > Its the slow path. I am not convinced its useful on real workloads (not
>> > a benchmark)
>> >
>> > I mean, if a workload hits badly slow path, some more important work
>> > should be done to avoid this at a higher level.
>> >
>>
>> In hackbench test, fast path allocation is about to 93%.
>> Is it insufficient?
>
> 7% is insufficient I am afraid.
>
> One prefetch() in the fast path serves 93% of the allocations,
> so added icache pressure is ok.
>
> One prefetch() in slow path serves 7% of the allocations, do you see the
> difference ?
>
> Adding a prefetch() is usually a win when a benchmark uses the path one
> million times per second.
>
> But adding prefetches also increases kernel size and it hurts globally.
> (Latency of the kernel depends on its size, when cpu caches are cold)
>

Okay.
Thanks for the comments, they are very helpful to me.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing
  2012-07-04 13:05     ` Pekka Enberg
@ 2012-07-05 14:20       ` Christoph Lameter
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2012-07-05 14:20 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Joonsoo Kim, linux-kernel, linux-mm, David Rientjes

On Wed, 4 Jul 2012, Pekka Enberg wrote:

> On Fri, Jun 22, 2012 at 9:22 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> > In current implementation, after unfreezing, we doesn't touch oldpage,
> > so it remain 'NOT NULL'. When we call this_cpu_cmpxchg()
> > with this old oldpage, this_cpu_cmpxchg() is mostly be failed.
> >
> > We can change value of oldpage to NULL after unfreezing,
> > because unfreeze_partial() ensure that all the cpu partial slabs is removed
> > from cpu partial list. In this time, we could expect that
> > this_cpu_cmpxchg is mostly succeed.
> >
> > Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 92f1c0e..531d8ed 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >                                 local_irq_save(flags);
> >                                 unfreeze_partials(s);
> >                                 local_irq_restore(flags);
> > +                               oldpage = NULL;
> >                                 pobjects = 0;
> >                                 pages = 0;
> >                                 stat(s, CPU_PARTIAL_DRAIN);
>
> Makes sense. Christoph, David?

Acked-by: Christoph Lameter <cl@linux.com>
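
For context, a condensed sketch of put_cpu_partial() around the hunk above
(simplified from mm/slub.c of that era, so treat the details as approximate).
Without the added oldpage = NULL, unfreeze_partials() empties the per-cpu
partial list while oldpage still points at its old head, so the closing
this_cpu_cmpxchg() compares against a stale value, usually fails, and forces
an extra pass through the loop.

static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
{
	struct page *oldpage;
	int pages, pobjects;

	do {
		pages = 0;
		pobjects = 0;
		oldpage = this_cpu_read(s->cpu_slab->partial);

		if (oldpage) {
			pobjects = oldpage->pobjects;
			pages = oldpage->pages;
			if (drain && pobjects > s->cpu_partial) {
				unsigned long flags;

				local_irq_save(flags);
				/* moves every cpu partial slab to the node
				 * list, leaving s->cpu_slab->partial == NULL */
				unfreeze_partials(s);
				local_irq_restore(flags);
				oldpage = NULL;	/* the hunk above: expect the now-empty list */
				pobjects = 0;
				pages = 0;
				stat(s, CPU_PARTIAL_DRAIN);
			}
		}

		pages++;
		pobjects += page->objects - page->inuse;

		page->pages = pages;
		page->pobjects = pobjects;
		page->next = oldpage;

	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);

	return pobjects;
}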


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-06-22 18:22   ` [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free() Joonsoo Kim
  2012-07-04 13:10     ` Pekka Enberg
@ 2012-07-05 14:26     ` Christoph Lameter
  2012-07-06 14:19       ` JoonSoo Kim
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2012-07-05 14:26 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 23 Jun 2012, Joonsoo Kim wrote:

> In some case of __slab_free(), we need a lock for manipulating partial list.
> If freeing object with a lock is failed, a lock doesn't needed anymore
> for some reasons.
>
> Case 1. prior is NULL, kmem_cache_debug(s) is true
>
> In this case, another free is occured before our free is succeed.
> When slab is full(prior is NULL), only possible operation is slab_free().
> So in this case, we guess another free is occured.
> It may make a slab frozen, so lock is not needed anymore.

A free cannot freeze the slab without taking the lock. The taken lock
makes sure that the thread that first enters slab_free() will be able to
hold back the thread that wants to freeze the slab.

> Case 2. inuse is NULL
>
> In this case, acquire_slab() is occured before out free is succeed.
> We have a last object for slab, so other operation for this slab is
> not possible except acquire_slab().
> Acquire_slab() makes a slab frozen, so lock is not needed anymore.

acquire_slab() also requires lock acquisition and would be held off by
slab_free() holding the lock.

> This also make logic somehow simple that 'was_frozen with a lock' case
> is never occured. Remove it.

That is actually interesting and would be a good optimization.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-05 14:26     ` Christoph Lameter
@ 2012-07-06 14:19       ` JoonSoo Kim
  2012-07-06 14:34         ` Christoph Lameter
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-06 14:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/7/5 Christoph Lameter <cl@linux.com>:
> On Sat, 23 Jun 2012, Joonsoo Kim wrote:
>
>> In some case of __slab_free(), we need a lock for manipulating partial list.
>> If freeing object with a lock is failed, a lock doesn't needed anymore
>> for some reasons.
>>
>> Case 1. prior is NULL, kmem_cache_debug(s) is true
>>
>> In this case, another free is occured before our free is succeed.
>> When slab is full(prior is NULL), only possible operation is slab_free().
>> So in this case, we guess another free is occured.
>> It may make a slab frozen, so lock is not needed anymore.
>
> A free cannot freeze the slab without taking the lock. The taken lock
> makes sure that the thread that first enters slab_free() will be able to
> hold back the thread that wants to freeze the slab.

I don't mean that we can freeze the slab without taking the lock.
What can happen is that cmpxchg_double_slab() fails while we hold the lock,
and in that case the lock is no longer needed, so we should release it.

For example, suppose we try to free object A on CPU 1 while another process
frees object B on CPU 2 at the same time. Objects A and B are in the same
slab, and that slab is on the full list.

CPU 1                          CPU 2
prior = page->freelist;        prior = page->freelist;
...                            ...
new.inuse--;                   new.inuse--;
take the lock                  try to take the lock, fail, spin...
free succeeds                  spinning...
add_partial()                  spinning...
release the lock               take the lock
                               cmpxchg_double_slab() fails
                               retry
                               at this point the lock is no longer needed

On CPU 2 we don't need the lock anymore, because the slab is already on the
partial list.

Case 2 is similar to case 1, so I skip the explanation.

Cases 1 and 2 in the commit message cover almost all of the cases where we
retry while holding the lock, so the hunk below is reasonable.

@@ -2450,13 +2449,17 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
                return;

        do {
+               if (unlikely(n)) {
+                       spin_unlock_irqrestore(&n->list_lock, flags);
+                       n = NULL;
+               }
                prior = page->freelist;
                counters = page->counters;
                set_freepointer(s, object, prior);


>> Case 2. inuse is NULL
>>
>> In this case, acquire_slab() is occured before out free is succeed.
>> We have a last object for slab, so other operation for this slab is
>> not possible except acquire_slab().
>> Acquire_slab() makes a slab frozen, so lock is not needed anymore.
>
> acquire_slab() also requires lock acquisition and would be held of by
> slab_free holding the lock.

See the explanation above.

>> This also make logic somehow simple that 'was_frozen with a lock' case
>> is never occured. Remove it.
>
> That is actually interesting and would be a good optimization.
>

So I think the patch is valid.
Thanks for the comments.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-06 14:19       ` JoonSoo Kim
@ 2012-07-06 14:34         ` Christoph Lameter
  2012-07-06 14:59           ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2012-07-06 14:34 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Fri, 6 Jul 2012, JoonSoo Kim wrote:

> For example,
> When we try to free object A at cpu 1, another process try to free
> object B at cpu 2 at the same time.
> object A, B is in same slab, and this slab is in full list.
>
> CPU 1                           CPU 2
> prior = page->freelist;    prior = page->freelist
> ....                                  ...
> new.inuse--;                   new.inuse--;
> taking lock                      try to take the lock, but failed, so
> spinning...
> free success                   spinning...
> add_partial
> release lock                    taking lock
>                                        fail cmpxchg_double_slab
>                                        retry
>                                        currently, we don't need lock
>
> At CPU2, we don't need lock anymore, because this slab already in partial list.

For that scenario we could also simply do a trylock there and redo
the loop if we fail. But still what guarantees that another process will
not modify the page struct between fetching the data and a successful
trylock?
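
A minimal sketch of the trylock variant being floated here, under the
assumption that it would replace the unconditional spin_lock_irqsave() in
__slab_free()'s speculation loop; spin_trylock_irqsave() is the stock kernel
primitive, everything else is paraphrased and untested:

	for (;;) {
		prior = page->freelist;
		counters = page->counters;
		set_freepointer(s, object, prior);
		new.counters = counters;
		was_frozen = new.frozen;
		new.inuse--;

		n = NULL;
		if ((!new.inuse || !prior) && !was_frozen) {
			n = get_node(s, page_to_nid(page));
			if (!spin_trylock_irqsave(&n->list_lock, flags)) {
				/* lock is contended: throw away the speculative
				 * state and start over instead of spinning */
				n = NULL;
				continue;
			}
		}

		if (cmpxchg_double_slab(s, page,
				prior, counters,
				object, new.counters,
				"__slab_free"))
			break;

		/* cmpxchg lost the race: drop the lock (if taken) and retry */
		if (n) {
			spin_unlock_irqrestore(&n->list_lock, flags);
			n = NULL;
		}
	}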


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-06 14:34         ` Christoph Lameter
@ 2012-07-06 14:59           ` JoonSoo Kim
  2012-07-06 15:10             ` Christoph Lameter
  0 siblings, 1 reply; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-06 14:59 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/7/6 Christoph Lameter <cl@linux.com>:
> On Fri, 6 Jul 2012, JoonSoo Kim wrote:
>
>> For example,
>> When we try to free object A at cpu 1, another process try to free
>> object B at cpu 2 at the same time.
>> object A, B is in same slab, and this slab is in full list.
>>
>> CPU 1                           CPU 2
>> prior = page->freelist;    prior = page->freelist
>> ....                                  ...
>> new.inuse--;                   new.inuse--;
>> taking lock                      try to take the lock, but failed, so
>> spinning...
>> free success                   spinning...
>> add_partial
>> release lock                    taking lock
>>                                        fail cmpxchg_double_slab
>>                                        retry
>>                                        currently, we don't need lock
>>
>> At CPU2, we don't need lock anymore, because this slab already in partial list.
>
> For that scenario we could also simply do a trylock there and redo
> the loop if we fail. But still what guarantees that another process will
> not modify the page struct between fetching the data and a successful
> trylock?


I'm not a native English speaker, so please bear with my misunderstandings.

We don't need a guarantee that another process will not modify the page
struct between fetching the data and a successful trylock.

As I understand it, are you asking about the scenario below?

CPU A                    CPU B
take the lock
cmpxchg fails
retry
release the lock
...                      modifies the page struct
...
cmpxchg again

In this case the cmpxchg will fail and we simply redo the loop.
If we need the lock again during the redo, we re-take it.
But I think this is not the common case.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-06 14:59           ` JoonSoo Kim
@ 2012-07-06 15:10             ` Christoph Lameter
  2012-07-08 16:19               ` JoonSoo Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2012-07-06 15:10 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Fri, 6 Jul 2012, JoonSoo Kim wrote:

> >> At CPU2, we don't need lock anymore, because this slab already in partial list.
> >
> > For that scenario we could also simply do a trylock there and redo
> > the loop if we fail. But still what guarantees that another process will
> > not modify the page struct between fetching the data and a successful
> > trylock?
>
>
> I'm not familiar with English, so take my ability to understand into
> consideration.

I have a hard time understanding what you want to accomplish here.

> we don't need guarantees that another process will not modify
> the page struct between fetching the data and a successful trylock.

No we do not need that since the cmpxchg will then fail.

Maybe it would be useful to split this patch into two?

One where you introduce the dropping of the lock and the other where you
get rid of certain code paths?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free()
  2012-07-06 15:10             ` Christoph Lameter
@ 2012-07-08 16:19               ` JoonSoo Kim
  0 siblings, 0 replies; 37+ messages in thread
From: JoonSoo Kim @ 2012-07-08 16:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/7/7 Christoph Lameter <cl@linux.com>:
> On Fri, 6 Jul 2012, JoonSoo Kim wrote:
>
>> >> At CPU2, we don't need lock anymore, because this slab already in partial list.
>> >
>> > For that scenario we could also simply do a trylock there and redo
>> > the loop if we fail. But still what guarantees that another process will
>> > not modify the page struct between fetching the data and a successful
>> > trylock?
>>
>>
>> I'm not familiar with English, so take my ability to understand into
>> consideration.
>
> I have a hard time understanding what you want to accomplish here.
>
>> we don't need guarantees that another process will not modify
>> the page struct between fetching the data and a successful trylock.
>
> No we do not need that since the cmpxchg will then fail.
>
> Maybe it would be useful to split this patch into two?
>
> One where you introduce the dropping of the lock and the other where you
> get rid of certain code paths?
>

Dropping the lock is needed in order to get rid of those code paths,
so I can't split this patch into two.

Sorry for confusing everyone.
I don't think I explained my purpose well.
I will prepare a new version in which the purpose of the patch is explained better.

Thanks for the kind review.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing
  2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
  2012-07-04 13:05     ` Pekka Enberg
@ 2012-08-16  7:06     ` Pekka Enberg
  1 sibling, 0 replies; 37+ messages in thread
From: Pekka Enberg @ 2012-08-16  7:06 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, linux-kernel, linux-mm

On Sat, 23 Jun 2012, Joonsoo Kim wrote:
> In current implementation, after unfreezing, we doesn't touch oldpage,
> so it remain 'NOT NULL'. When we call this_cpu_cmpxchg()
> with this old oldpage, this_cpu_cmpxchg() is mostly be failed.
> 
> We can change value of oldpage to NULL after unfreezing,
> because unfreeze_partial() ensure that all the cpu partial slabs is removed
> from cpu partial list. In this time, we could expect that
> this_cpu_cmpxchg is mostly succeed.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 92f1c0e..531d8ed 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1968,6 +1968,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  				local_irq_save(flags);
>  				unfreeze_partials(s);
>  				local_irq_restore(flags);
> +				oldpage = NULL;
>  				pobjects = 0;
>  				pages = 0;
>  				stat(s, CPU_PARTIAL_DRAIN);

Applied, thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2012-08-16  7:06 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <yes>
2012-06-08 17:23 ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Joonsoo Kim
2012-06-08 17:23   ` [PATCH 2/4] slub: use __cmpxchg_double_slab() at interrupt disabled place Joonsoo Kim
2012-06-08 17:23   ` [PATCH 3/4] slub: refactoring unfreeze_partials() Joonsoo Kim
2012-06-20  7:19     ` Pekka Enberg
2012-06-08 17:23   ` [PATCH 4/4] slub: deactivate freelist of kmem_cache_cpu all at once in deactivate_slab() Joonsoo Kim
2012-06-08 19:04     ` Christoph Lameter
2012-06-10 10:27       ` JoonSoo Kim
2012-06-22 18:34         ` JoonSoo Kim
2012-06-08 19:02   ` [PATCH 1/4] slub: change declare of get_slab() to inline at all times Christoph Lameter
2012-06-09 15:57     ` JoonSoo Kim
2012-06-11 15:04       ` Christoph Lameter
2012-06-22 18:22 ` [PATCH 1/3] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
2012-06-22 18:22   ` [PATCH 2/3] slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing Joonsoo Kim
2012-07-04 13:05     ` Pekka Enberg
2012-07-05 14:20       ` Christoph Lameter
2012-08-16  7:06     ` Pekka Enberg
2012-06-22 18:22   ` [PATCH 3/3] slub: release a lock if freeing object with a lock is failed in __slab_free() Joonsoo Kim
2012-07-04 13:10     ` Pekka Enberg
2012-07-04 14:48       ` JoonSoo Kim
2012-07-05 14:26     ` Christoph Lameter
2012-07-06 14:19       ` JoonSoo Kim
2012-07-06 14:34         ` Christoph Lameter
2012-07-06 14:59           ` JoonSoo Kim
2012-07-06 15:10             ` Christoph Lameter
2012-07-08 16:19               ` JoonSoo Kim
2012-06-22 18:45   ` [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc() Joonsoo Kim
2012-07-04 12:58     ` JoonSoo Kim
2012-07-04 13:00     ` Pekka Enberg
2012-07-04 14:30       ` JoonSoo Kim
2012-07-04 15:08         ` Pekka Enberg
2012-07-04 15:26           ` Eric Dumazet
2012-07-04 15:48             ` JoonSoo Kim
2012-07-04 16:15               ` Eric Dumazet
2012-07-04 16:24                 ` JoonSoo Kim
2012-07-04 15:45           ` JoonSoo Kim
2012-07-04 15:59             ` Pekka Enberg
2012-07-04 16:04               ` JoonSoo Kim
