linux-mm.kvack.org archive mirror
* [rfc 00/18] slub: irqless/lockless slow allocation paths
@ 2011-11-11 20:07 Christoph Lameter
  2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
                   ` (19 more replies)
  0 siblings, 20 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

This is a patchset that makes the allocator slow path also lockless like
the free paths. However, in the process it makes the handling more
complex, so this is not a performance improvement. I am going to
drop this series unless someone comes up with a bright idea to fix the
following performance issues:

1. The per cpu state had to be reduced to two words in order to
   be able to operate without preempt disable / interrupt disable,
   purely through cmpxchg_double(). This means that the node information
   and the page struct location have to be calculated from the free
   pointer (see the sketch below). That is possible but relatively
   expensive and has to be done frequently in fast paths.

2. If the freepointer becomes NULL then the page struct location can
   no longer be determined. So a per cpu slab must be deactivated when
   the last object is retrieved from it, causing more regressions.

If these issues remain unresolved then I am fine with the way things are
right now in slub: interrupts are disabled in the slow paths and
multiple fields in the kmem_cache_cpu structure are then modified without
regard to instruction atomicity.
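
To make the cost in issue 1 concrete, here is a rough sketch of what the
lookup amounts to once kmem_cache_cpu only carries the freelist pointer
and the tid (the helper names are made up for illustration; they are not
part of the series):

static inline struct page *cpu_slab_page(void *freelist)
{
	/* virt to page translation plus the compound head lookup */
	return virt_to_head_page(freelist);
}

static inline int cpu_slab_node(void *freelist)
{
	/* additionally mask and shift the node bits out of page->flags */
	return page_to_nid(virt_to_head_page(freelist));
}

Equivalents of both end up in node_match() and in the free path, which
is where the extra cycles show up.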


* [rfc 01/18] slub: Get rid of the node field
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-14 21:42   ` Pekka Enberg
  2011-11-20 23:01   ` David Rientjes
  2011-11-11 20:07 ` [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: get_rid_of_cnode --]
[-- Type: text/plain, Size: 3407 bytes --]

The node field is always page_to_nid(c->page), so it is rather easy to
replace. Note that there will be additional overhead in various hot paths
because the node id now has to be recovered by masking a set of bits in
page->flags and shifting the result.
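
For reference, under the common configuration that encodes the node id in
the page flags (other memory models resolve it through the section table
instead), the work added to the hot paths is roughly:

static inline int page_to_nid_sketch(const struct page *page)
{
	/* sketch of what page_to_nid() boils down to here */
	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
}

which replaces what used to be a plain load of c->node.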

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/slub_def.h |    1 -
 mm/slub.c                |   15 ++++++---------
 2 files changed, 6 insertions(+), 10 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-08 09:53:04.043865616 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:10:46.111334466 -0600
@@ -1551,7 +1551,6 @@ static void *get_partial_node(struct kme
 
 		if (!object) {
 			c->page = page;
-			c->node = page_to_nid(page);
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
 			available =  page->objects - page->inuse;
@@ -2016,7 +2015,7 @@ static void flush_all(struct kmem_cache
 static inline int node_match(struct kmem_cache_cpu *c, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && c->node != node)
+	if (node != NUMA_NO_NODE && page_to_nid(c->page) != node)
 		return 0;
 #endif
 	return 1;
@@ -2105,7 +2104,6 @@ static inline void *new_slab_objects(str
 		page->freelist = NULL;
 
 		stat(s, ALLOC_SLAB);
-		c->node = page_to_nid(page);
 		c->page = page;
 		*pc = c;
 	} else
@@ -2202,7 +2200,6 @@ new_slab:
 	if (c->partial) {
 		c->page = c->partial;
 		c->partial = c->page->next;
-		c->node = page_to_nid(c->page);
 		stat(s, CPU_PARTIAL_ALLOC);
 		c->freelist = NULL;
 		goto redo;
@@ -2233,7 +2230,6 @@ new_slab:
 
 	c->freelist = get_freepointer(s, object);
 	deactivate_slab(s, c);
-	c->node = NUMA_NO_NODE;
 	local_irq_restore(flags);
 	return object;
 }
@@ -4437,9 +4433,10 @@ static ssize_t show_slab_objects(struct
 			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 			struct page *page;
 
-			if (!c || c->node < 0)
+			if (!c || !c->page)
 				continue;
 
+			node = page_to_nid(c->page);
 			if (c->page) {
 					if (flags & SO_TOTAL)
 						x = c->page->objects;
@@ -4449,16 +4446,16 @@ static ssize_t show_slab_objects(struct
 					x = 1;
 
 				total += x;
-				nodes[c->node] += x;
+				nodes[node] += x;
 			}
 			page = c->partial;
 
 			if (page) {
 				x = page->pobjects;
                                 total += x;
-                                nodes[c->node] += x;
+                                nodes[node] += x;
 			}
-			per_cpu[c->node]++;
+			per_cpu[node]++;
 		}
 	}
 
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h	2011-11-08 09:53:03.979865196 -0600
+++ linux-2.6/include/linux/slub_def.h	2011-11-09 11:10:46.121334523 -0600
@@ -45,7 +45,6 @@ struct kmem_cache_cpu {
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
 	struct page *partial;	/* Partially allocated frozen slabs */
-	int node;		/* The node of the page (or -1 for debug) */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif


* [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
  2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-20 23:10   ` David Rientjes
  2011-11-11 20:07 ` [rfc 03/18] slub: Extract get_freelist from __slab_alloc Christoph Lameter
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: separate_deactivate_slab --]
[-- Type: text/plain, Size: 2656 bytes --]

Processing of the kmem_cache_cpu fields needs to move out of deactivate_slab()
since we will be handling them with cmpxchg_double() later.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:10:46.111334466 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:10:55.671388657 -0600
@@ -1708,14 +1708,12 @@ void init_kmem_cache_cpus(struct kmem_ca
 /*
  * Remove the cpu slab
  */
-static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
+static void deactivate_slab(struct kmem_cache *s, struct page *page, void *freelist)
 {
 	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
-	struct page *page = c->page;
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 	int lock = 0;
 	enum slab_modes l = M_NONE, m = M_NONE;
-	void *freelist;
 	void *nextfree;
 	int tail = DEACTIVATE_TO_HEAD;
 	struct page new;
@@ -1726,11 +1724,6 @@ static void deactivate_slab(struct kmem_
 		tail = DEACTIVATE_TO_TAIL;
 	}
 
-	c->tid = next_tid(c->tid);
-	c->page = NULL;
-	freelist = c->freelist;
-	c->freelist = NULL;
-
 	/*
 	 * Stage one: Free all available per cpu objects back
 	 * to the page freelist while it is still frozen. Leave the
@@ -1976,7 +1969,11 @@ int put_cpu_partial(struct kmem_cache *s
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	deactivate_slab(s, c);
+	deactivate_slab(s, c->page, c->freelist);
+
+	c->tid = next_tid(c->tid);
+	c->page = NULL;
+	c->freelist = NULL;
 }
 
 /*
@@ -2151,7 +2148,9 @@ static void *__slab_alloc(struct kmem_ca
 redo:
 	if (unlikely(!node_match(c, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, c);
+		deactivate_slab(s, c->page, c->freelist);
+		c->page = NULL;
+		c->freelist = NULL;
 		goto new_slab;
 	}
 
@@ -2228,8 +2227,9 @@ new_slab:
 	if (!alloc_debug_processing(s, c->page, object, addr))
 		goto new_slab;	/* Slab failed checks. Next slab needed */
 
-	c->freelist = get_freepointer(s, object);
-	deactivate_slab(s, c);
+	deactivate_slab(s, c->page, get_freepointer(s, object));
+	c->page = NULL;
+	c->freelist = NULL;
 	local_irq_restore(flags);
 	return object;
 }


* [rfc 03/18] slub: Extract get_freelist from __slab_alloc
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
  2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
  2011-11-11 20:07 ` [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-14 21:43   ` Pekka Enberg
  2011-11-20 23:18   ` David Rientjes
  2011-11-11 20:07 ` [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: extract_get_freelist --]
[-- Type: text/plain, Size: 2750 bytes --]

get_freelist() retrieves the free objects that remote frees have put on the
page freelist, or unfreezes the slab page (effectively deactivating it) if no
more objects are available.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   57 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:10:55.671388657 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:13.471490305 -0600
@@ -2110,6 +2110,37 @@ static inline void *new_slab_objects(str
 }
 
 /*
+ * Check the page->freelist of a page and either transfer the freelist to the per cpu freelist
+ * or deactivate the page.
+ *
+ * The page is still frozen if the return value is not NULL.
+ *
+ * If this function returns NULL then the page has been unfrozen.
+ */
+static inline void *get_freelist(struct kmem_cache *s, struct page *page)
+{
+	struct page new;
+	unsigned long counters;
+	void *freelist;
+
+	do {
+		freelist = page->freelist;
+		counters = page->counters;
+		new.counters = counters;
+		VM_BUG_ON(!new.frozen);
+
+		new.inuse = page->objects;
+		new.frozen = freelist != NULL;
+
+	} while (!cmpxchg_double_slab(s, page,
+		freelist, counters,
+		NULL, new.counters,
+		"get_freelist"));
+
+	return freelist;
+}
+
+/*
  * Slow path. The lockless freelist is empty or we need to perform
  * debugging duties.
  *
@@ -2130,8 +2161,6 @@ static void *__slab_alloc(struct kmem_ca
 {
 	void **object;
 	unsigned long flags;
-	struct page new;
-	unsigned long counters;
 
 	local_irq_save(flags);
 #ifdef CONFIG_PREEMPT
@@ -2156,29 +2185,7 @@ redo:
 
 	stat(s, ALLOC_SLOWPATH);
 
-	do {
-		object = c->page->freelist;
-		counters = c->page->counters;
-		new.counters = counters;
-		VM_BUG_ON(!new.frozen);
-
-		/*
-		 * If there is no object left then we use this loop to
-		 * deactivate the slab which is simple since no objects
-		 * are left in the slab and therefore we do not need to
-		 * put the page back onto the partial list.
-		 *
-		 * If there are objects left then we retrieve them
-		 * and use them to refill the per cpu queue.
-		 */
-
-		new.inuse = c->page->objects;
-		new.frozen = object != NULL;
-
-	} while (!__cmpxchg_double_slab(s, c->page,
-			object, counters,
-			NULL, new.counters,
-			"__slab_alloc"));
+	object = get_freelist(s, c->page);
 
 	if (!object) {
 		c->page = NULL;


* [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (2 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 03/18] slub: Extract get_freelist from __slab_alloc Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-14 21:44   ` Pekka Enberg
  2011-11-20 23:22   ` David Rientjes
  2011-11-11 20:07 ` [rfc 05/18] slub: Simplify control flow in __slab_alloc() Christoph Lameter
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: use_freelist_instead_of_object --]
[-- Type: text/plain, Size: 3812 bytes --]

The variable "object" really refers to a list of objects that we
are handling. Since the lockless allocator path will depend on it
we rename the variable now.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:13.471490305 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:22.381541568 -0600
@@ -2084,7 +2084,7 @@ slab_out_of_memory(struct kmem_cache *s,
 static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
 			int node, struct kmem_cache_cpu **pc)
 {
-	void *object;
+	void *freelist;
 	struct kmem_cache_cpu *c;
 	struct page *page = new_slab(s, flags, node);
 
@@ -2097,16 +2097,16 @@ static inline void *new_slab_objects(str
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
 		 */
-		object = page->freelist;
+		freelist = page->freelist;
 		page->freelist = NULL;
 
 		stat(s, ALLOC_SLAB);
 		c->page = page;
 		*pc = c;
 	} else
-		object = NULL;
+		freelist = NULL;
 
-	return object;
+	return freelist;
 }
 
 /*
@@ -2159,7 +2159,7 @@ static inline void *get_freelist(struct
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
-	void **object;
+	void *freelist;
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -2175,6 +2175,7 @@ static void *__slab_alloc(struct kmem_ca
 	if (!c->page)
 		goto new_slab;
 redo:
+
 	if (unlikely(!node_match(c, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, c->page, c->freelist);
@@ -2185,9 +2186,9 @@ redo:
 
 	stat(s, ALLOC_SLOWPATH);
 
-	object = get_freelist(s, c->page);
+	freelist = get_freelist(s, c->page);
 
-	if (!object) {
+	if (unlikely(!freelist)) {
 		c->page = NULL;
 		stat(s, DEACTIVATE_BYPASS);
 		goto new_slab;
@@ -2196,10 +2197,15 @@ redo:
 	stat(s, ALLOC_REFILL);
 
 load_freelist:
-	c->freelist = get_freepointer(s, object);
+	/*
+	 * freelist is pointing to the list of objects to be used.
+	 * page is pointing to the page from which the objects are obtained.
+	 */
+	VM_BUG_ON(!c->page->frozen);
+	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
-	return object;
+	return freelist;
 
 new_slab:
 
@@ -2211,14 +2217,12 @@ new_slab:
 		goto redo;
 	}
 
-	/* Then do expensive stuff like retrieving pages from the partial lists */
-	object = get_partial(s, gfpflags, node, c);
+	freelist = get_partial(s, gfpflags, node, c);
 
-	if (unlikely(!object)) {
+	if (unlikely(!freelist)) {
+		freelist = new_slab_objects(s, gfpflags, node, &c);
 
-		object = new_slab_objects(s, gfpflags, node, &c);
-
-		if (unlikely(!object)) {
+		if (unlikely(!freelist)) {
 			if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
 				slab_out_of_memory(s, gfpflags, node);
 
@@ -2231,14 +2235,14 @@ new_slab:
 		goto load_freelist;
 
 	/* Only entered in the debug case */
-	if (!alloc_debug_processing(s, c->page, object, addr))
+	if (!alloc_debug_processing(s, c->page, freelist, addr))
 		goto new_slab;	/* Slab failed checks. Next slab needed */
+	deactivate_slab(s, c->page, get_freepointer(s, freelist));
 
-	deactivate_slab(s, c->page, get_freepointer(s, object));
 	c->page = NULL;
 	c->freelist = NULL;
 	local_irq_restore(flags);
-	return object;
+	return freelist;
 }
 
 /*


* [rfc 05/18] slub: Simplify control flow in __slab_alloc()
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (3 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-14 21:45   ` Pekka Enberg
  2011-11-20 23:24   ` David Rientjes
  2011-11-11 20:07 ` [rfc 06/18] slub: Use page variable instead of c->page Christoph Lameter
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: min_cleaups --]
[-- Type: text/plain, Size: 1282 bytes --]

Simplify the control flow in __slab_alloc(): flatten the nested !freelist
checks so that the out of memory handling is no longer nested inside the
get_partial() failure path.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:22.381541568 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:25.881561697 -0600
@@ -2219,16 +2219,16 @@ new_slab:
 
 	freelist = get_partial(s, gfpflags, node, c);
 
-	if (unlikely(!freelist)) {
+	if (!freelist)
 		freelist = new_slab_objects(s, gfpflags, node, &c);
 
-		if (unlikely(!freelist)) {
-			if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
-				slab_out_of_memory(s, gfpflags, node);
-
-			local_irq_restore(flags);
-			return NULL;
-		}
+
+	if (unlikely(!freelist)) {
+		if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
+			slab_out_of_memory(s, gfpflags, node);
+
+		local_irq_restore(flags);
+		return NULL;
 	}
 
 	if (likely(!kmem_cache_debug(s)))


* [rfc 06/18] slub: Use page variable instead of c->page.
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (4 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 05/18] slub: Simplify control flow in __slab_alloc() Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-14 21:46   ` Pekka Enberg
  2011-11-20 23:27   ` David Rientjes
  2011-11-11 20:07 ` [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
                   ` (13 subsequent siblings)
  19 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: use_paget --]
[-- Type: text/plain, Size: 2368 bytes --]

The kmem_cache_cpu object pointed to by c will effectively become
volatile with the later lockless patches, so extract the c->page
pointer into a local variable at the points where it is needed.
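
A minimal illustration of the hazard this prepares for (annotated sketch,
not from the patch; with the later patches the slow path runs with
preemption enabled, so the fields of c may change between accesses):

	struct page *page = c->page;	/* snapshot the pointer once */

	if (unlikely(!node_match(page, node)))
		/*
		 * Operate on the snapshot, not on c->page, which may
		 * have been changed in the meantime.
		 */
		deactivate_slab(s, page, c->freelist);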

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:25.881561697 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:32.231598204 -0600
@@ -2160,6 +2160,7 @@ static void *__slab_alloc(struct kmem_ca
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
 	void *freelist;
+	struct page *page;
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -2172,13 +2173,14 @@ static void *__slab_alloc(struct kmem_ca
 	c = this_cpu_ptr(s->cpu_slab);
 #endif
 
-	if (!c->page)
+	page = c->page;
+	if (!page)
 		goto new_slab;
 redo:
 
 	if (unlikely(!node_match(c, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, c->page, c->freelist);
+		deactivate_slab(s, page, c->freelist);
 		c->page = NULL;
 		c->freelist = NULL;
 		goto new_slab;
@@ -2186,7 +2188,7 @@ redo:
 
 	stat(s, ALLOC_SLOWPATH);
 
-	freelist = get_freelist(s, c->page);
+	freelist = get_freelist(s, page);
 
 	if (unlikely(!freelist)) {
 		c->page = NULL;
@@ -2210,8 +2212,8 @@ load_freelist:
 new_slab:
 
 	if (c->partial) {
-		c->page = c->partial;
-		c->partial = c->page->next;
+		page = c->page = c->partial;
+		c->partial = page->next;
 		stat(s, CPU_PARTIAL_ALLOC);
 		c->freelist = NULL;
 		goto redo;
@@ -2231,13 +2233,14 @@ new_slab:
 		return NULL;
 	}
 
+	page = c->page;
 	if (likely(!kmem_cache_debug(s)))
 		goto load_freelist;
 
 	/* Only entered in the debug case */
-	if (!alloc_debug_processing(s, c->page, freelist, addr))
+	if (!alloc_debug_processing(s, page, freelist, addr))
 		goto new_slab;	/* Slab failed checks. Next slab needed */
-	deactivate_slab(s, c->page, get_freepointer(s, freelist));
+	deactivate_slab(s, page, get_freepointer(s, freelist));
 
 	c->page = NULL;
 	c->freelist = NULL;


* [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (5 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 06/18] slub: Use page variable instead of c->page Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-20 23:28   ` David Rientjes
  2011-11-11 20:07 ` [rfc 08/18] slub: enable use of deactivate_slab with interrupts on Christoph Lameter
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: page_parameter_to_node_match --]
[-- Type: text/plain, Size: 1950 bytes --]

The page field in struct kmem_cache_cpu will go away soon, so it is more
convenient to pass the page struct to node_match() instead of the
kmem_cache_cpu structure.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:32.231598204 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:42.081654804 -0600
@@ -2009,10 +2009,10 @@ static void flush_all(struct kmem_cache
  * Check if the objects in a per cpu structure fit numa
  * locality expectations.
  */
-static inline int node_match(struct kmem_cache_cpu *c, int node)
+static inline int node_match(struct page *page, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && page_to_nid(c->page) != node)
+	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
 		return 0;
 #endif
 	return 1;
@@ -2178,7 +2178,7 @@ static void *__slab_alloc(struct kmem_ca
 		goto new_slab;
 redo:
 
-	if (unlikely(!node_match(c, node))) {
+	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, page, c->freelist);
 		c->page = NULL;
@@ -2263,6 +2263,7 @@ static __always_inline void *slab_alloc(
 {
 	void **object;
 	struct kmem_cache_cpu *c;
+	struct page *page;
 	unsigned long tid;
 
 	if (slab_pre_alloc_hook(s, gfpflags))
@@ -2288,7 +2289,8 @@ redo:
 	barrier();
 
 	object = c->freelist;
-	if (unlikely(!object || !node_match(c, node)))
+	page = c->page;
+	if (unlikely(!object || !node_match(page, node)))
 
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 


* [rfc 08/18] slub: enable use of deactivate_slab with interrupts on
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (6 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 09/18] slub: Run deactivate_slab with interrupts enabled Christoph Lameter
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: allocate_slab_with_irq_enabled --]
[-- Type: text/plain, Size: 2324 bytes --]

Locking needs to change a bit because we can no longer rely on interrupts
having been disabled.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:42.081654804 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:45.341673526 -0600
@@ -1718,6 +1718,7 @@ static void deactivate_slab(struct kmem_
 	int tail = DEACTIVATE_TO_HEAD;
 	struct page new;
 	struct page old;
+	unsigned long uninitialized_var(flags);
 
 	if (page->freelist) {
 		stat(s, DEACTIVATE_REMOTE_FREES);
@@ -1744,7 +1745,7 @@ static void deactivate_slab(struct kmem_
 			new.inuse--;
 			VM_BUG_ON(!new.frozen);
 
-		} while (!__cmpxchg_double_slab(s, page,
+		} while (!cmpxchg_double_slab(s, page,
 			prior, counters,
 			freelist, new.counters,
 			"drain percpu freelist"));
@@ -1794,7 +1795,7 @@ redo:
 			 * that acquire_slab() will see a slab page that
 			 * is frozen
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	} else {
 		m = M_FULL;
@@ -1805,7 +1806,7 @@ redo:
 			 * slabs from diagnostic functions will not see
 			 * any frozen slabs.
 			 */
-			spin_lock(&n->list_lock);
+			spin_lock_irqsave(&n->list_lock, flags);
 		}
 	}
 
@@ -1833,14 +1834,14 @@ redo:
 	}
 
 	l = m;
-	if (!__cmpxchg_double_slab(s, page,
+	if (!cmpxchg_double_slab(s, page,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
 				"unfreezing slab"))
 		goto redo;
 
 	if (lock)
-		spin_unlock(&n->list_lock);
+		spin_unlock_irqrestore(&n->list_lock, flags);
 
 	if (m == M_FREE) {
 		stat(s, DEACTIVATE_EMPTY);
@@ -2178,7 +2179,7 @@ static void *__slab_alloc(struct kmem_ca
 		goto new_slab;
 redo:
 
-	if (unlikely(!node_match(page, node))) {
+	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
 		deactivate_slab(s, page, c->freelist);
 		c->page = NULL;


* [rfc 09/18] slub: Run deactivate_slab with interrupts enabled
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (7 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 08/18] slub: enable use of deactivate_slab with interrupts on Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 10/18] slub: Enable use of get_partial " Christoph Lameter
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: irq_enabled_deactivate_slab --]
[-- Type: text/plain, Size: 1297 bytes --]

Do not enable and disable interrupts if we were called with interrupts
enabled.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:45.341673526 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:48.831693571 -0600
@@ -1279,10 +1279,11 @@ static struct page *allocate_slab(struct
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
+	int irqs_were_disabled = irqs_disabled();
 
 	flags &= gfp_allowed_mask;
 
-	if (flags & __GFP_WAIT)
+	if (irqs_were_disabled && flags & __GFP_WAIT)
 		local_irq_enable();
 
 	flags |= s->allocflags;
@@ -1306,7 +1307,7 @@ static struct page *allocate_slab(struct
 			stat(s, ORDER_FALLBACK);
 	}
 
-	if (flags & __GFP_WAIT)
+	if (irqs_were_disabled && flags & __GFP_WAIT)
 		local_irq_disable();
 
 	if (!page)


* [rfc 10/18] slub: Enable use of get_partial with interrupts enabled
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (8 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 09/18] slub: Run deactivate_slab with interrupts enabled Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 11/18] slub: Acquire_slab() avoid loop Christoph Lameter
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: irq_enabled_acquire_slab --]
[-- Type: text/plain, Size: 1414 bytes --]

We need to disable interrupts when taking the per node list_lock.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:48.831693571 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 11:11:53.611721013 -0600
@@ -1532,6 +1532,7 @@ static void *get_partial_node(struct kme
 {
 	struct page *page, *page2;
 	void *object = NULL;
+	unsigned long flags;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -1542,7 +1543,7 @@ static void *get_partial_node(struct kme
 	if (!n || !n->nr_partial)
 		return NULL;
 
-	spin_lock(&n->list_lock);
+	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
 		void *t = acquire_slab(s, n, page, object == NULL);
 		int available;
@@ -1563,7 +1564,7 @@ static void *get_partial_node(struct kme
 			break;
 
 	}
-	spin_unlock(&n->list_lock);
+	spin_unlock_irqrestore(&n->list_lock, flags);
 	return object;
 }
 


* [rfc 11/18] slub: Acquire_slab() avoid loop
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (9 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 10/18] slub: Enable use of get_partial " Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 12/18] slub: Remove kmem_cache_cpu dependency from acquire slab Christoph Lameter
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: simplify --]
[-- Type: text/plain, Size: 1613 bytes --]

Avoid the retry loop in acquire_slab() and simply fail if there is a
cmpxchg conflict.

This will cause the next page on the partial list to be considered.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-09 16:05:42.632059449 -0600
+++ linux-2.6/mm/slub.c	2011-11-09 16:06:06.802192215 -0600
@@ -1503,22 +1503,25 @@ static inline void *acquire_slab(struct
 	 * The old freelist is the list of objects for the
 	 * per cpu allocation list.
 	 */
-	do {
-		freelist = page->freelist;
-		counters = page->counters;
-		new.counters = counters;
-		if (mode)
-			new.inuse = page->objects;
+	freelist = page->freelist;
+	counters = page->counters;
+	new.counters = counters;
+	new.inuse = page->objects;
+	if (mode)
+		new.inuse = page->objects;
 
-		VM_BUG_ON(new.frozen);
-		new.frozen = 1;
+	VM_BUG_ON(new.frozen);
+	new.frozen = 1;
 
-	} while (!__cmpxchg_double_slab(s, page,
+	if (!__cmpxchg_double_slab(s, page,
 			freelist, counters,
 			NULL, new.counters,
-			"lock and freeze"));
+			"acquire_slab"))
+
+		return NULL;
 
 	remove_partial(n, page);
+	WARN_ON(!freelist);
 	return freelist;
 }
 


* [rfc 12/18] slub: Remove kmem_cache_cpu dependency from acquire slab
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (10 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 11/18] slub: Acquire_slab() avoid loop Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 13/18] slub: Add functions to manage per cpu freelists Christoph Lameter
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: remove_kmem_cache_cpu_dependency_from_acquire_slab --]
[-- Type: text/plain, Size: 3520 bytes --]

The page can be determined later from the object pointer
via virt_to_head_page().

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-10 13:46:55.809479604 -0600
+++ linux-2.6/mm/slub.c	2011-11-10 14:33:56.815359070 -0600
@@ -1531,7 +1531,7 @@ static int put_cpu_partial(struct kmem_c
  * Try to allocate a partial slab from a specific node.
  */
 static void *get_partial_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, struct kmem_cache_cpu *c)
+		struct kmem_cache_node *n)
 {
 	struct page *page, *page2;
 	void *object = NULL;
@@ -1555,7 +1555,6 @@ static void *get_partial_node(struct kme
 			break;
 
 		if (!object) {
-			c->page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
 			available =  page->objects - page->inuse;
@@ -1574,8 +1573,7 @@ static void *get_partial_node(struct kme
 /*
  * Get a page from somewhere. Search in increasing NUMA distances.
  */
-static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
-		struct kmem_cache_cpu *c)
+static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -1615,7 +1613,7 @@ static struct page *get_any_partial(stru
 
 		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
 				n->nr_partial > s->min_partial) {
-			object = get_partial_node(s, n, c);
+			object = get_partial_node(s, n);
 			if (object) {
 				put_mems_allowed();
 				return object;
@@ -1630,17 +1628,16 @@ static struct page *get_any_partial(stru
 /*
  * Get a partial page, lock it and return it.
  */
-static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
-		struct kmem_cache_cpu *c)
+static void *get_partial(struct kmem_cache *s, gfp_t flags, int node)
 {
 	void *object;
 	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 
-	object = get_partial_node(s, get_node(s, searchnode), c);
+	object = get_partial_node(s, get_node(s, searchnode));
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-	return get_any_partial(s, flags, c);
+	return get_any_partial(s, flags);
 }
 
 #ifdef CONFIG_PREEMPT
@@ -2088,7 +2085,7 @@ slab_out_of_memory(struct kmem_cache *s,
 }
 
 static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
-			int node, struct kmem_cache_cpu **pc)
+			int node)
 {
 	void *freelist;
 	struct kmem_cache_cpu *c;
@@ -2107,8 +2104,6 @@ static inline void *new_slab_objects(str
 		page->freelist = NULL;
 
 		stat(s, ALLOC_SLAB);
-		c->page = page;
-		*pc = c;
 	} else
 		freelist = NULL;
 
@@ -2225,10 +2220,10 @@ new_slab:
 		goto redo;
 	}
 
-	freelist = get_partial(s, gfpflags, node, c);
+	freelist = get_partial(s, gfpflags, node);
 
 	if (!freelist)
-		freelist = new_slab_objects(s, gfpflags, node, &c);
+		freelist = new_slab_objects(s, gfpflags, node);
 
 
 	if (unlikely(!freelist)) {
@@ -2239,7 +2234,8 @@ new_slab:
 		return NULL;
 	}
 
-	page = c->page;
+	page = c->page = virt_to_head_page(freelist);
+
 	if (likely(!kmem_cache_debug(s)))
 		goto load_freelist;
 


* [rfc 13/18] slub: Add functions to manage per cpu freelists
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (11 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 12/18] slub: Remove kmem_cache_cpu dependency from acquire slab Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 14/18] slub: Decomplicate the get_pointer_safe call and fixup statistics Christoph Lameter
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: newfuncs --]
[-- Type: text/plain, Size: 2236 bytes --]

Add a couple of functions that will be used later to manage the per cpu
freelists.
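
For context, patch 16 of this series ends up combining the two helpers
roughly like this (abridged sketch; the slab replacement and debug paths
are omitted):

	freelist = get_cpu_objects(s);	/* take the whole per cpu list */

	if (freelist) {
		struct page *page = virt_to_head_page(freelist);
		void *next = get_freepointer(s, freelist);

		if (!next)
			/* last object: refill from page->freelist or unfreeze */
			next = get_freelist(s, page);
		if (next)
			put_cpu_objects(s, page, next);	/* hand the rest back */
	}
	return freelist;		/* the object that was allocated */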

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-10 13:50:40.250719277 -0600
+++ linux-2.6/mm/slub.c	2011-11-10 13:56:05.602531668 -0600
@@ -2111,6 +2111,58 @@ static inline void *new_slab_objects(str
 }
 
 /*
+ * Retrieve pointer to the current freelist and
+ * zap the per cpu object list.
+ *
+ * Returns NULL if there was no object on the freelist.
+ */
+void *get_cpu_objects(struct kmem_cache *s)
+{
+	void *freelist;
+	unsigned long tid;
+
+	do {
+		struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
+
+		tid = c->tid;
+		barrier();
+		freelist = c->freelist;
+		if (!freelist)
+			return NULL;
+
+	} while (!this_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
+			freelist, tid,
+			NULL, next_tid(tid)));
+
+	return freelist;
+}
+
+/*
+ * Set the per cpu object list to the freelist. The page must
+ * be frozen.
+ *
+ * Page will be unfrozen (and the freelist object put onto the pages freelist)
+ * if the per cpu freelist has been used in the meantime.
+ */
+static inline void put_cpu_objects(struct kmem_cache *s,
+				struct page *page, void *freelist)
+{
+	unsigned long tid;
+
+	tid = this_cpu_read(s->cpu_slab->tid);
+	barrier();
+
+	VM_BUG_ON(!page->frozen);
+	if (!irqsafe_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
+		NULL, tid, freelist, next_tid(tid)))
+
+		/*
+		 * There was an intervening free or alloc. Cannot free to the
+		 * per cpu queue. Must unfreeze page.
+		 */
+		deactivate_slab(s, page, freelist);
+}
+/*
  * Check the page->freelist of a page and either transfer the freelist to the per cpu freelist
  * or deactivate the page.
  *


* [rfc 14/18] slub: Decomplicate the get_pointer_safe call and fixup statistics
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (12 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 13/18] slub: Add functions to manage per cpu freelists Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 15/18] slub: new_slab_objects() can also get objects from partial list Christoph Lameter
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: stats_and_co --]
[-- Type: text/plain, Size: 1650 bytes --]

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-10 14:06:58.436231549 -0600
+++ linux-2.6/mm/slub.c	2011-11-10 14:33:21.465160604 -0600
@@ -2349,6 +2349,8 @@ redo:
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 
 	else {
+		void *next = get_freepointer_safe(s, object);
+
 		/*
 		 * The cmpxchg will only match if there was no additional
 		 * operation and if we are on the right processor.
@@ -2364,7 +2366,7 @@ redo:
 		if (unlikely(!irqsafe_cpu_cmpxchg_double(
 				s->cpu_slab->freelist, s->cpu_slab->tid,
 				object, tid,
-				get_freepointer_safe(s, object), next_tid(tid)))) {
+				next, next_tid(tid)))) {
 
 			note_cmpxchg_failure("slab_alloc", s, tid);
 			goto redo;
@@ -4506,12 +4508,13 @@ static ssize_t show_slab_objects(struct
 			if (!c || !c->page)
 				continue;
 
-			node = page_to_nid(c->page);
-			if (c->page) {
+			page = virt_to_head_page(c->freelist);
+			node = page_to_nid(page);
+			if (page) {
 					if (flags & SO_TOTAL)
-						x = c->page->objects;
+						x = page->objects;
 				else if (flags & SO_OBJECTS)
-					x = c->page->inuse;
+					x = page->inuse;
 				else
 					x = 1;
 


* [rfc 15/18] slub: new_slab_objects() can also get objects from partial list
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (13 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 14/18] slub: Decomplicate the get_pointer_safe call and fixup statistics Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 16/18] slub: Drop page field from kmem_cache_cpu Christoph Lameter
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: move_partials_into_new_slab --]
[-- Type: text/plain, Size: 1277 bytes --]

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-10 14:16:06.249338483 -0600
+++ linux-2.6/mm/slub.c	2011-11-10 14:22:31.831510734 -0600
@@ -2089,8 +2089,14 @@ static inline void *new_slab_objects(str
 {
 	void *freelist;
 	struct kmem_cache_cpu *c;
-	struct page *page = new_slab(s, flags, node);
+	struct page *page;
+
+	freelist = get_partial(s, flags, node);
 
+	if (freelist)
+		return freelist;
+
+	page = new_slab(s, flags, node);
 	if (page) {
 		c = __this_cpu_ptr(s->cpu_slab);
 		if (c->page)
@@ -2272,10 +2278,7 @@ new_slab:
 		goto redo;
 	}
 
-	freelist = get_partial(s, gfpflags, node);
-
-	if (!freelist)
-		freelist = new_slab_objects(s, gfpflags, node);
+	freelist = new_slab_objects(s, gfpflags, node);
 
 
 	if (unlikely(!freelist)) {


* [rfc 16/18] slub: Drop page field from kmem_cache_cpu
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (14 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 15/18] slub: new_slab_objects() can also get objects from partial list Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 17/18] slub: Move __slab_free() into slab_free() Christoph Lameter
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: drop_kmem_cache_cpu_page --]
[-- Type: text/plain, Size: 6969 bytes --]

The page field can be calculated from the freelist pointer because

	page == virt_to_head_page(object)

This introduces additional inefficiencies since the determination of the
page can be complex.

We also end up with a special case for freelist == NULL because we can then
no longer determine which page is the active per cpu slab. Therefore the slab
page must be deactivated when the last object is allocated from the per cpu
list.

This patch in effect makes the slub allocation paths also lockless and no
longer requires disabling of interrupts or preemption.
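
Distilled from the slab_alloc() hunk below (abridged; the retry on a
cmpxchg failure and the slow path call are omitted), the fast path
therefore becomes:

	object = c->freelist;
	page = virt_to_head_page(object);	/* replaces the old c->page */
	next = get_freepointer_safe(s, object);

	if (irqsafe_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
					object, tid,
					next, next_tid(tid))) {
		if (!next) {
			/*
			 * Handed out the last object: try to refill the per
			 * cpu list from page->freelist, otherwise the page
			 * has been unfrozen by get_freelist().
			 */
			next = get_freelist(s, page);
			if (next)
				put_cpu_objects(s, page, next);
		}
	}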

Signed-off-by: Christoph Lameter <cl@linux.com>



---
 mm/slub.c |  150 ++++++++++++++++++++++++++------------------------------------
 1 file changed, 65 insertions(+), 85 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-10 14:22:31.000000000 -0600
+++ linux-2.6/mm/slub.c	2011-11-10 14:23:54.971978776 -0600
@@ -1972,11 +1972,11 @@ int put_cpu_partial(struct kmem_cache *s
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	deactivate_slab(s, c->page, c->freelist);
-
-	c->tid = next_tid(c->tid);
-	c->page = NULL;
-	c->freelist = NULL;
+	if (c->freelist) {
+		deactivate_slab(s, virt_to_head_page(c->freelist), c->freelist);
+		c->tid = next_tid(c->tid);
+		c->freelist = NULL;
+}
 }
 
 /*
@@ -1988,9 +1988,8 @@ static inline void __flush_cpu_slab(stru
 {
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
 
-	if (likely(c)) {
-		if (c->page)
-			flush_slab(s, c);
+	if (likely(c->freelist)) {
+		flush_slab(s, c);
 
 		unfreeze_partials(s);
 	}
@@ -2088,9 +2087,9 @@ static inline void *new_slab_objects(str
 			int node)
 {
 	void *freelist;
-	struct kmem_cache_cpu *c;
 	struct page *page;
 
+	/* Per node partial list */
 	freelist = get_partial(s, flags, node);
 
 	if (freelist)
@@ -2098,10 +2097,6 @@ static inline void *new_slab_objects(str
 
 	page = new_slab(s, flags, node);
 	if (page) {
-		c = __this_cpu_ptr(s->cpu_slab);
-		if (c->page)
-			flush_slab(s, c);
-
 		/*
 		 * No other reference to the page yet so we can
 		 * muck around with it freely without cmpxchg
@@ -2216,92 +2211,72 @@ static inline void *get_freelist(struct
  * a call to the page allocator and the setup of a new slab.
  */
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
+		unsigned long addr)
 {
 	void *freelist;
 	struct page *page;
-	unsigned long flags;
-
-	local_irq_save(flags);
-#ifdef CONFIG_PREEMPT
-	/*
-	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
-	 * pointer.
-	 */
-	c = this_cpu_ptr(s->cpu_slab);
-#endif
-
-	page = c->page;
-	if (!page)
-		goto new_slab;
-redo:
-
-	if (unlikely(!node_match(page, node))) {
-		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
-	}
 
 	stat(s, ALLOC_SLOWPATH);
 
-	freelist = get_freelist(s, page);
+retry:
+	freelist = get_cpu_objects(s);
+	/* Try per cpu partial list */
+	if (!freelist) {
+
+		page = this_cpu_read(s->cpu_slab->partial);
+		if (page && this_cpu_cmpxchg(s->cpu_slab->partial,
+				page, page->next) == page) {
+			stat(s, CPU_PARTIAL_ALLOC);
+			freelist = get_freelist(s, page);
+		}
+	} else
+		page = virt_to_head_page(freelist);
 
-	if (unlikely(!freelist)) {
-		c->page = NULL;
-		stat(s, DEACTIVATE_BYPASS);
-		goto new_slab;
+	if (freelist) {
+		if (likely(node_match(page, node)))
+			stat(s, ALLOC_REFILL);
+		else {
+			stat(s, ALLOC_NODE_MISMATCH);
+			deactivate_slab(s, page, freelist);
+			freelist = NULL;
+		}
 	}
 
-	stat(s, ALLOC_REFILL);
-
-load_freelist:
-	/*
-	 * freelist is pointing to the list of objects to be used.
-	 * page is pointing to the page from which the objects are obtained.
-	 */
-	VM_BUG_ON(!c->page->frozen);
-	c->freelist = get_freepointer(s, freelist);
-	c->tid = next_tid(c->tid);
-	local_irq_restore(flags);
-	return freelist;
-
-new_slab:
-
-	if (c->partial) {
-		page = c->page = c->partial;
-		c->partial = page->next;
-		stat(s, CPU_PARTIAL_ALLOC);
-		c->freelist = NULL;
-		goto redo;
+	/* Allocate a new slab */
+	if (!freelist) {
+		freelist = new_slab_objects(s, gfpflags, node);
+		if (freelist)
+			page = virt_to_head_page(freelist);
 	}
 
-	freelist = new_slab_objects(s, gfpflags, node);
-
-
-	if (unlikely(!freelist)) {
+	/* If nothing worked then fail */
+	if (!freelist) {
 		if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
 			slab_out_of_memory(s, gfpflags, node);
 
-		local_irq_restore(flags);
 		return NULL;
 	}
 
-	page = c->page = virt_to_head_page(freelist);
+	if (unlikely(kmem_cache_debug(s)) &&
+				!alloc_debug_processing(s, page, freelist, addr))
+			goto retry;
+
+	VM_BUG_ON(!page->frozen);
+
+	{
+		void *next = get_freepointer(s, freelist);
 
-	if (likely(!kmem_cache_debug(s)))
-		goto load_freelist;
+		if (!next)
+			/*
+			 * last object so we either unfreeze the page or
+			 * get more objects.
+			 */
+			next = get_freelist(s, page);
+
+		if (next)
+			put_cpu_objects(s, page, next);
+	}
 
-	/* Only entered in the debug case */
-	if (!alloc_debug_processing(s, page, freelist, addr))
-		goto new_slab;	/* Slab failed checks. Next slab needed */
-	deactivate_slab(s, page, get_freepointer(s, freelist));
-
-	c->page = NULL;
-	c->freelist = NULL;
-	local_irq_restore(flags);
 	return freelist;
 }
 
@@ -2320,7 +2295,7 @@ static __always_inline void *slab_alloc(
 {
 	void **object;
 	struct kmem_cache_cpu *c;
-	struct page *page;
+	struct page *page = NULL;
 	unsigned long tid;
 
 	if (slab_pre_alloc_hook(s, gfpflags))
@@ -2346,10 +2321,9 @@ redo:
 	barrier();
 
 	object = c->freelist;
-	page = c->page;
-	if (unlikely(!object || !node_match(page, node)))
+	if (unlikely(!object || !node_match((page = virt_to_head_page(object)), node)))
 
-		object = __slab_alloc(s, gfpflags, node, addr, c);
+		object = __slab_alloc(s, gfpflags, node, addr);
 
 	else {
 		void *next = get_freepointer_safe(s, object);
@@ -2375,6 +2349,12 @@ redo:
 			goto redo;
 		}
 		stat(s, ALLOC_FASTPATH);
+		if (!next) {
+			next = get_freelist(s, page);
+			if (next)
+				/* Refill the per cpu queue */
+				put_cpu_objects(s, page, next);
+		}
 	}
 
 	if (unlikely(gfpflags & __GFP_ZERO) && object)
@@ -2593,7 +2573,7 @@ redo:
 	tid = c->tid;
 	barrier();
 
-	if (likely(page == c->page)) {
+	if (c->freelist && likely(page == virt_to_head_page(c->freelist))) {
 		set_freepointer(s, object, c->freelist);
 
 		if (unlikely(!irqsafe_cpu_cmpxchg_double(


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [rfc 17/18] slub: Move __slab_free() into slab_free()
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (15 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 16/18] slub: Drop page field from kmem_cache_cpu Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-11 20:07 ` [rfc 18/18] slub: Move __slab_alloc() into slab_alloc() Christoph Lameter
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: move_kfree --]
[-- Type: text/plain, Size: 6750 bytes --]

Both functions now share variables and the control flow is easier to follow
as a single function.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |  173 ++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 84 insertions(+), 89 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-11 09:33:03.545996392 -0600
+++ linux-2.6/mm/slub.c	2011-11-11 09:42:39.619212550 -0600
@@ -2290,7 +2290,7 @@ retry:
  *
  * Otherwise we can simply pick the next object from the lockless free list.
  */
-static __always_inline void *slab_alloc(struct kmem_cache *s,
+static void *slab_alloc(struct kmem_cache *s,
 		gfp_t gfpflags, int node, unsigned long addr)
 {
 	void **object;
@@ -2421,30 +2421,69 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trac
 #endif
 
 /*
- * Slow patch handling. This may still be called frequently since objects
- * have a longer lifetime than the cpu slabs in most processing loads.
- *
- * So we still attempt to reduce cache line usage. Just take the slab
- * lock and free the item. If there is no additional partial page
- * handling required then we can return immediately.
+ * Free an object. First see if the object is from the per cpu slab.
+ * If so then it can be freed to the per cpu queue. Otherwise we have
+ * to free the object to the free queue of the slab page.
  */
-static void __slab_free(struct kmem_cache *s, struct page *page,
-			void *x, unsigned long addr)
+static void slab_free(struct kmem_cache *s,
+			struct page *page, void *x, unsigned long addr)
 {
-	void *prior;
+	struct kmem_cache_node *n = NULL;
 	void **object = (void *)x;
+	struct kmem_cache_cpu *c;
+	unsigned long tid;
+	void *prior;
 	int was_frozen;
 	int inuse;
-	struct page new;
 	unsigned long counters;
-	struct kmem_cache_node *n = NULL;
 	unsigned long uninitialized_var(flags);
+	struct page new;
 
-	stat(s, FREE_SLOWPATH);
+
+	slab_free_hook(s, x);
+
+	/*
+	 * First see if we can free to the per cpu list in kmem_cache_cpu
+	 */
+	do {
+		/*
+		 * Determine the current cpu's per cpu slab.
+		 * The cpu may change afterward. However that does not matter since
+		 * data is retrieved via this pointer. If we are on the same cpu
+		 * during the cmpxchg then the free will succeed.
+		 */
+		c = __this_cpu_ptr(s->cpu_slab);
+
+		tid = c->tid;
+		barrier();
+
+		if (!c->freelist || unlikely(page != virt_to_head_page(c->freelist)))
+			break;
+
+		set_freepointer(s, object, c->freelist);
+
+		if (likely(irqsafe_cpu_cmpxchg_double(
+				s->cpu_slab->freelist, s->cpu_slab->tid,
+				c->freelist, tid,
+				object, next_tid(tid)))) {
+
+			stat(s, FREE_FASTPATH);
+			return;
+
+		}
+
+		note_cmpxchg_failure("slab_free", s, tid);
+
+	} while (1);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
 		return;
 
+	stat(s, FREE_SLOWPATH);
+
+	/*
+	 * Put the object onto the slab page's freelist.
+	 */
 	do {
 		prior = page->freelist;
 		counters = page->counters;
@@ -2484,6 +2523,10 @@ static void __slab_free(struct kmem_cach
 		object, new.counters,
 		"__slab_free"));
 
+
+	if (was_frozen)
+		stat(s, FREE_FROZEN);
+
 	if (likely(!n)) {
 
 		/*
@@ -2497,20 +2540,37 @@ static void __slab_free(struct kmem_cach
 		 * The list lock was not taken therefore no list
 		 * activity can be necessary.
 		 */
-                if (was_frozen)
-                        stat(s, FREE_FROZEN);
-                return;
-        }
+		return;
+	}
 
 	/*
-	 * was_frozen may have been set after we acquired the list_lock in
-	 * an earlier loop. So we need to check it here again.
+	 * The list lock was taken. We have to deal with additional,
+	 * more complex processing.
 	 */
-	if (was_frozen)
-		stat(s, FREE_FROZEN);
-	else {
-		if (unlikely(!inuse && n->nr_partial > s->min_partial))
-                        goto slab_empty;
+	if (!was_frozen) {
+
+		/*
+		 * Only if the slab page was not frozen will we have to do
+		 * list update activities.
+		 */
+		if (unlikely(!inuse && n->nr_partial > s->min_partial)) {
+
+			/* Slab is now empty and could be freed */
+			if (prior) {
+				/*
+				 * Slab was on the partial list.
+				 */
+				remove_partial(n, page);
+				stat(s, FREE_REMOVE_PARTIAL);
+			} else
+				/* Slab must be on the full list */
+				remove_full(s, page);
+
+			spin_unlock_irqrestore(&n->list_lock, flags);
+			stat(s, FREE_SLAB);
+			discard_slab(s, page);
+			return;
+		}
 
 		/*
 		 * Objects left in the slab. If it was not on the partial list before
@@ -2523,71 +2583,6 @@ static void __slab_free(struct kmem_cach
 		}
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
-	return;
-
-slab_empty:
-	if (prior) {
-		/*
-		 * Slab on the partial list.
-		 */
-		remove_partial(n, page);
-		stat(s, FREE_REMOVE_PARTIAL);
-	} else
-		/* Slab must be on the full list */
-		remove_full(s, page);
-
-	spin_unlock_irqrestore(&n->list_lock, flags);
-	stat(s, FREE_SLAB);
-	discard_slab(s, page);
-}
-
-/*
- * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
- * can perform fastpath freeing without additional function calls.
- *
- * The fastpath is only possible if we are freeing to the current cpu slab
- * of this processor. This typically the case if we have just allocated
- * the item before.
- *
- * If fastpath is not possible then fall back to __slab_free where we deal
- * with all sorts of special processing.
- */
-static __always_inline void slab_free(struct kmem_cache *s,
-			struct page *page, void *x, unsigned long addr)
-{
-	void **object = (void *)x;
-	struct kmem_cache_cpu *c;
-	unsigned long tid;
-
-	slab_free_hook(s, x);
-
-redo:
-	/*
-	 * Determine the currently cpus per cpu slab.
-	 * The cpu may change afterward. However that does not matter since
-	 * data is retrieved via this pointer. If we are on the same cpu
-	 * during the cmpxchg then the free will succedd.
-	 */
-	c = __this_cpu_ptr(s->cpu_slab);
-
-	tid = c->tid;
-	barrier();
-
-	if (c->freelist && likely(page == virt_to_head_page(c->freelist))) {
-		set_freepointer(s, object, c->freelist);
-
-		if (unlikely(!irqsafe_cpu_cmpxchg_double(
-				s->cpu_slab->freelist, s->cpu_slab->tid,
-				c->freelist, tid,
-				object, next_tid(tid)))) {
-
-			note_cmpxchg_failure("slab_free", s, tid);
-			goto redo;
-		}
-		stat(s, FREE_FASTPATH);
-	} else
-		__slab_free(s, page, x, addr);
-
 }
 
 void kmem_cache_free(struct kmem_cache *s, void *x)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [rfc 18/18] slub: Move __slab_alloc() into slab_alloc()
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (16 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 17/18] slub: Move __slab_free() into slab_free() Christoph Lameter
@ 2011-11-11 20:07 ` Christoph Lameter
  2011-11-16 17:39 ` [rfc 00/18] slub: irqless/lockless slow allocation paths Eric Dumazet
  2011-11-20 23:30 ` David Rientjes
  19 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-11 20:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

[-- Attachment #1: move_alloc --]
[-- Type: text/plain, Size: 5903 bytes --]

Both functions are now quite small and share numerous variables.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 mm/slub.c |  170 ++++++++++++++++++++++++++------------------------------------
 1 file changed, 73 insertions(+), 97 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-11 09:33:05.056004788 -0600
+++ linux-2.6/mm/slub.c	2011-11-11 09:38:51.767942529 -0600
@@ -2195,100 +2195,13 @@ static inline void *get_freelist(struct
 }
 
 /*
- * Slow path. The lockless freelist is empty or we need to perform
- * debugging duties.
+ * Main allocation function. First try to allocate from the per cpu
+ * object list; if that is empty, replenish it from the per cpu
+ * partial page list, then from the per node partial list. Finally
+ * go to the page allocator if nothing else is available.
  *
- * Processing is still very fast if new objects have been freed to the
- * regular freelist. In that case we simply take over the regular freelist
- * as the lockless freelist and zap the regular freelist.
- *
- * If that is not working then we fall back to the partial lists. We take the
- * first element of the freelist as the object to allocate now and move the
- * rest of the freelist to the lockless freelist.
- *
- * And if we were unable to get a new slab from the partial slab lists then
- * we need to allocate a new slab. This is the slowest path since it involves
- * a call to the page allocator and the setup of a new slab.
- */
-static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-		unsigned long addr)
-{
-	void *freelist;
-	struct page *page;
-
-	stat(s, ALLOC_SLOWPATH);
-
-retry:
-	freelist = get_cpu_objects(s);
-	/* Try per cpu partial list */
-	if (!freelist) {
-
-		page = this_cpu_read(s->cpu_slab->partial);
-		if (page && this_cpu_cmpxchg(s->cpu_slab->partial,
-				page, page->next) == page) {
-			stat(s, CPU_PARTIAL_ALLOC);
-			freelist = get_freelist(s, page);
-		}
-	} else
-		page = virt_to_head_page(freelist);
-
-	if (freelist) {
-		if (likely(node_match(page, node)))
-			stat(s, ALLOC_REFILL);
-		else {
-			stat(s, ALLOC_NODE_MISMATCH);
-			deactivate_slab(s, page, freelist);
-			freelist = NULL;
-		}
-	}
-
-	/* Allocate a new slab */
-	if (!freelist) {
-		freelist = new_slab_objects(s, gfpflags, node);
-		if (freelist)
-			page = virt_to_head_page(freelist);
-	}
-
-	/* If nothing worked then fail */
-	if (!freelist) {
-		if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
-			slab_out_of_memory(s, gfpflags, node);
-
-		return NULL;
-	}
-
-	if (unlikely(kmem_cache_debug(s)) &&
-				!alloc_debug_processing(s, page, freelist, addr))
-			goto retry;
-
-	VM_BUG_ON(!page->frozen);
-
-	{
-		void *next = get_freepointer(s, freelist);
-
-		if (!next)
-			/*
-			 * last object so we either unfreeze the page or
-			 * get more objects.
-			 */
-			next = get_freelist(s, page);
-
-		if (next)
-			put_cpu_objects(s, page, next);
-	}
-
-	return freelist;
-}
-
-/*
- * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
- * have the fastpath folded into their functions. So no function call
- * overhead for requests that can be satisfied on the fastpath.
- *
- * The fastpath works by first checking if the lockless freelist can be used.
- * If not then __slab_alloc is called for slow processing.
- *
- * Otherwise we can simply pick the next object from the lockless free list.
+ * This is one of the most performance critical functions in the
+ * Linux kernel.
  */
 static void *slab_alloc(struct kmem_cache *s,
 		gfp_t gfpflags, int node, unsigned long addr)
@@ -2321,11 +2234,8 @@ redo:
 	barrier();
 
 	object = c->freelist;
-	if (unlikely(!object || !node_match((page = virt_to_head_page(object)), node)))
+	if (likely(object && node_match((page = virt_to_head_page(object)), node))) {
 
-		object = __slab_alloc(s, gfpflags, node, addr);
-
-	else {
 		void *next = get_freepointer_safe(s, object);
 
 		/*
@@ -2355,8 +2265,74 @@ redo:
 				/* Refill the per cpu queue */
 				put_cpu_objects(s, page, next);
 		}
+
+	} else {
+
+		void *freelist;
+
+		stat(s, ALLOC_SLOWPATH);
+
+retry:
+		freelist = get_cpu_objects(s);
+		/* Try per cpu partial list */
+		if (!freelist) {
+
+			page = this_cpu_read(s->cpu_slab->partial);
+			if (page && this_cpu_cmpxchg(s->cpu_slab->partial,
+					page, page->next) == page) {
+				stat(s, CPU_PARTIAL_ALLOC);
+				freelist = get_freelist(s, page);
+			}
+		} else
+			page = virt_to_head_page(freelist);
+
+		if (freelist) {
+			if (likely(node_match(page, node)))
+				stat(s, ALLOC_REFILL);
+			else {
+				stat(s, ALLOC_NODE_MISMATCH);
+				deactivate_slab(s, page, freelist);
+				freelist = NULL;
+			}
+		}
+
+		/* Allocate a new slab */
+		if (!freelist) {
+			freelist = new_slab_objects(s, gfpflags, node);
+			if (freelist)
+				page = virt_to_head_page(freelist);
+		}
+
+		/* If nothing worked then fail */
+		if (!freelist) {
+			if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
+				slab_out_of_memory(s, gfpflags, node);
+
+			return NULL;
+		}
+
+		if (unlikely(kmem_cache_debug(s)) &&
+				!alloc_debug_processing(s, page, freelist, addr))
+			goto retry;
+
+		VM_BUG_ON(!page->frozen);
+
+		object = freelist;
+		freelist = get_freepointer(s, freelist);
+
+		if (!freelist)
+			/*
+			 * last object so we either unfreeze the page or
+			 * get more objects.
+			 */
+			freelist = get_freelist(s, page);
+
+		if (freelist)
+			put_cpu_objects(s, page, freelist);
+
 	}
 
+
 	if (unlikely(gfpflags & __GFP_ZERO) && object)
 		memset(object, 0, s->objsize);
 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 01/18] slub: Get rid of the node field
  2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
@ 2011-11-14 21:42   ` Pekka Enberg
  2011-11-15 16:07     ` Christoph Lameter
  2011-11-20 23:01   ` David Rientjes
  1 sibling, 1 reply; 39+ messages in thread
From: Pekka Enberg @ 2011-11-14 21:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> The node field is always page_to_nid(c->page). So its rather easy to
> replace. Note that there will be additional overhead in various hot paths
> due to the need to mask a set of bits in page->flags and shift the
> result.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

This is a nice cleanup even if we never go irqless in the slowpaths.
Is page_to_nid() really that slow?

>
> ---
>  include/linux/slub_def.h |    1 -
>  mm/slub.c                |   15 ++++++---------
>  2 files changed, 6 insertions(+), 10 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2011-11-08 09:53:04.043865616 -0600
> +++ linux-2.6/mm/slub.c 2011-11-09 11:10:46.111334466 -0600
> @@ -1551,7 +1551,6 @@ static void *get_partial_node(struct kme
>
>                if (!object) {
>                        c->page = page;
> -                       c->node = page_to_nid(page);
>                        stat(s, ALLOC_FROM_PARTIAL);
>                        object = t;
>                        available =  page->objects - page->inuse;
> @@ -2016,7 +2015,7 @@ static void flush_all(struct kmem_cache
>  static inline int node_match(struct kmem_cache_cpu *c, int node)
>  {
>  #ifdef CONFIG_NUMA
> -       if (node != NUMA_NO_NODE && c->node != node)
> +       if (node != NUMA_NO_NODE && page_to_nid(c->page) != node)
>                return 0;
>  #endif
>        return 1;
> @@ -2105,7 +2104,6 @@ static inline void *new_slab_objects(str
>                page->freelist = NULL;
>
>                stat(s, ALLOC_SLAB);
> -               c->node = page_to_nid(page);
>                c->page = page;
>                *pc = c;
>        } else
> @@ -2202,7 +2200,6 @@ new_slab:
>        if (c->partial) {
>                c->page = c->partial;
>                c->partial = c->page->next;
> -               c->node = page_to_nid(c->page);
>                stat(s, CPU_PARTIAL_ALLOC);
>                c->freelist = NULL;
>                goto redo;
> @@ -2233,7 +2230,6 @@ new_slab:
>
>        c->freelist = get_freepointer(s, object);
>        deactivate_slab(s, c);
> -       c->node = NUMA_NO_NODE;
>        local_irq_restore(flags);
>        return object;
>  }
> @@ -4437,9 +4433,10 @@ static ssize_t show_slab_objects(struct
>                        struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
>                        struct page *page;
>
> -                       if (!c || c->node < 0)
> +                       if (!c || !c->page)
>                                continue;
>
> +                       node = page_to_nid(c->page);
>                        if (c->page) {
>                                        if (flags & SO_TOTAL)
>                                                x = c->page->objects;
> @@ -4449,16 +4446,16 @@ static ssize_t show_slab_objects(struct
>                                        x = 1;
>
>                                total += x;
> -                               nodes[c->node] += x;
> +                               nodes[node] += x;
>                        }
>                        page = c->partial;
>
>                        if (page) {
>                                x = page->pobjects;
>                                 total += x;
> -                                nodes[c->node] += x;
> +                                nodes[node] += x;
>                        }
> -                       per_cpu[c->node]++;
> +                       per_cpu[node]++;
>                }
>        }
>
> Index: linux-2.6/include/linux/slub_def.h
> ===================================================================
> --- linux-2.6.orig/include/linux/slub_def.h     2011-11-08 09:53:03.979865196 -0600
> +++ linux-2.6/include/linux/slub_def.h  2011-11-09 11:10:46.121334523 -0600
> @@ -45,7 +45,6 @@ struct kmem_cache_cpu {
>        unsigned long tid;      /* Globally unique transaction id */
>        struct page *page;      /* The slab from which we are allocating */
>        struct page *partial;   /* Partially allocated frozen slabs */
> -       int node;               /* The node of the page (or -1 for debug) */
>  #ifdef CONFIG_SLUB_STATS
>        unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 03/18] slub: Extract get_freelist from __slab_alloc
  2011-11-11 20:07 ` [rfc 03/18] slub: Extract get_freelist from __slab_alloc Christoph Lameter
@ 2011-11-14 21:43   ` Pekka Enberg
  2011-11-15 16:08     ` Christoph Lameter
  2011-11-20 23:18   ` David Rientjes
  1 sibling, 1 reply; 39+ messages in thread
From: Pekka Enberg @ 2011-11-14 21:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> get_freelist retrieves free objects from the page freelist (put there by remote
> frees) or deactivates a slab page if no more objects are available.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

This is also a nice cleanup. Any reason I shouldn't apply this?

>
>
> ---
>  mm/slub.c |   57 ++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 32 insertions(+), 25 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2011-11-09 11:10:55.671388657 -0600
> +++ linux-2.6/mm/slub.c 2011-11-09 11:11:13.471490305 -0600
> @@ -2110,6 +2110,37 @@ static inline void *new_slab_objects(str
>  }
>
>  /*
> + * Check the page->freelist of a page and either transfer the freelist to the per cpu freelist
> + * or deactivate the page.
> + *
> + * The page is still frozen if the return value is not NULL.
> + *
> + * If this function returns NULL then the page has been unfrozen.
> + */
> +static inline void *get_freelist(struct kmem_cache *s, struct page *page)
> +{
> +       struct page new;
> +       unsigned long counters;
> +       void *freelist;
> +
> +       do {
> +               freelist = page->freelist;
> +               counters = page->counters;
> +               new.counters = counters;
> +               VM_BUG_ON(!new.frozen);
> +
> +               new.inuse = page->objects;
> +               new.frozen = freelist != NULL;
> +
> +       } while (!cmpxchg_double_slab(s, page,
> +               freelist, counters,
> +               NULL, new.counters,
> +               "get_freelist"));
> +
> +       return freelist;
> +}
> +
> +/*
>  * Slow path. The lockless freelist is empty or we need to perform
>  * debugging duties.
>  *
> @@ -2130,8 +2161,6 @@ static void *__slab_alloc(struct kmem_ca
>  {
>        void **object;
>        unsigned long flags;
> -       struct page new;
> -       unsigned long counters;
>
>        local_irq_save(flags);
>  #ifdef CONFIG_PREEMPT
> @@ -2156,29 +2185,7 @@ redo:
>
>        stat(s, ALLOC_SLOWPATH);
>
> -       do {
> -               object = c->page->freelist;
> -               counters = c->page->counters;
> -               new.counters = counters;
> -               VM_BUG_ON(!new.frozen);
> -
> -               /*
> -                * If there is no object left then we use this loop to
> -                * deactivate the slab which is simple since no objects
> -                * are left in the slab and therefore we do not need to
> -                * put the page back onto the partial list.
> -                *
> -                * If there are objects left then we retrieve them
> -                * and use them to refill the per cpu queue.
> -                */
> -
> -               new.inuse = c->page->objects;
> -               new.frozen = object != NULL;
> -
> -       } while (!__cmpxchg_double_slab(s, c->page,
> -                       object, counters,
> -                       NULL, new.counters,
> -                       "__slab_alloc"));
> +       object = get_freelist(s, c->page);
>
>        if (!object) {
>                c->page = NULL;
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc
  2011-11-11 20:07 ` [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
@ 2011-11-14 21:44   ` Pekka Enberg
  2011-11-20 23:22   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: Pekka Enberg @ 2011-11-14 21:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> The variable "object" really refers to a list of objects that we
> are handling. Since the lockless allocator path will depend on it
> we rename the variable now.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Also a reasonable cleanup.

> ---
>  mm/slub.c |   40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2011-11-09 11:11:13.471490305 -0600
> +++ linux-2.6/mm/slub.c 2011-11-09 11:11:22.381541568 -0600
> @@ -2084,7 +2084,7 @@ slab_out_of_memory(struct kmem_cache *s,
>  static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
>                        int node, struct kmem_cache_cpu **pc)
>  {
> -       void *object;
> +       void *freelist;
>        struct kmem_cache_cpu *c;
>        struct page *page = new_slab(s, flags, node);
>
> @@ -2097,16 +2097,16 @@ static inline void *new_slab_objects(str
>                 * No other reference to the page yet so we can
>                 * muck around with it freely without cmpxchg
>                 */
> -               object = page->freelist;
> +               freelist = page->freelist;
>                page->freelist = NULL;
>
>                stat(s, ALLOC_SLAB);
>                c->page = page;
>                *pc = c;
>        } else
> -               object = NULL;
> +               freelist = NULL;
>
> -       return object;
> +       return freelist;
>  }
>
>  /*
> @@ -2159,7 +2159,7 @@ static inline void *get_freelist(struct
>  static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>                          unsigned long addr, struct kmem_cache_cpu *c)
>  {
> -       void **object;
> +       void *freelist;
>        unsigned long flags;
>
>        local_irq_save(flags);
> @@ -2175,6 +2175,7 @@ static void *__slab_alloc(struct kmem_ca
>        if (!c->page)
>                goto new_slab;
>  redo:
> +
>        if (unlikely(!node_match(c, node))) {
>                stat(s, ALLOC_NODE_MISMATCH);
>                deactivate_slab(s, c->page, c->freelist);
> @@ -2185,9 +2186,9 @@ redo:
>
>        stat(s, ALLOC_SLOWPATH);
>
> -       object = get_freelist(s, c->page);
> +       freelist = get_freelist(s, c->page);
>
> -       if (!object) {
> +       if (unlikely(!freelist)) {
>                c->page = NULL;
>                stat(s, DEACTIVATE_BYPASS);
>                goto new_slab;
> @@ -2196,10 +2197,15 @@ redo:
>        stat(s, ALLOC_REFILL);
>
>  load_freelist:
> -       c->freelist = get_freepointer(s, object);
> +       /*
> +        * freelist is pointing to the list of objects to be used.
> +        * page is pointing to the page from which the objects are obtained.
> +        */
> +       VM_BUG_ON(!c->page->frozen);
> +       c->freelist = get_freepointer(s, freelist);
>        c->tid = next_tid(c->tid);
>        local_irq_restore(flags);
> -       return object;
> +       return freelist;
>
>  new_slab:
>
> @@ -2211,14 +2217,12 @@ new_slab:
>                goto redo;
>        }
>
> -       /* Then do expensive stuff like retrieving pages from the partial lists */
> -       object = get_partial(s, gfpflags, node, c);
> +       freelist = get_partial(s, gfpflags, node, c);
>
> -       if (unlikely(!object)) {
> +       if (unlikely(!freelist)) {
> +               freelist = new_slab_objects(s, gfpflags, node, &c);
>
> -               object = new_slab_objects(s, gfpflags, node, &c);
> -
> -               if (unlikely(!object)) {
> +               if (unlikely(!freelist)) {
>                        if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
>                                slab_out_of_memory(s, gfpflags, node);
>
> @@ -2231,14 +2235,14 @@ new_slab:
>                goto load_freelist;
>
>        /* Only entered in the debug case */
> -       if (!alloc_debug_processing(s, c->page, object, addr))
> +       if (!alloc_debug_processing(s, c->page, freelist, addr))
>                goto new_slab;  /* Slab failed checks. Next slab needed */
> +       deactivate_slab(s, c->page, get_freepointer(s, freelist));
>
> -       deactivate_slab(s, c->page, get_freepointer(s, object));
>        c->page = NULL;
>        c->freelist = NULL;
>        local_irq_restore(flags);
> -       return object;
> +       return freelist;
>  }
>
>  /*
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 05/18] slub: Simplify control flow in __slab_alloc()
  2011-11-11 20:07 ` [rfc 05/18] slub: Simplify control flow in __slab_alloc() Christoph Lameter
@ 2011-11-14 21:45   ` Pekka Enberg
  2011-11-20 23:24   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: Pekka Enberg @ 2011-11-14 21:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> Simplify control flow.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Would like to merge this too.

> ---
>  mm/slub.c |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2011-11-09 11:11:22.381541568 -0600
> +++ linux-2.6/mm/slub.c 2011-11-09 11:11:25.881561697 -0600
> @@ -2219,16 +2219,16 @@ new_slab:
>
>        freelist = get_partial(s, gfpflags, node, c);
>
> -       if (unlikely(!freelist)) {
> +       if (!freelist)
>                freelist = new_slab_objects(s, gfpflags, node, &c);
>
> -               if (unlikely(!freelist)) {
> -                       if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
> -                               slab_out_of_memory(s, gfpflags, node);
> -
> -                       local_irq_restore(flags);
> -                       return NULL;
> -               }
> +
> +       if (unlikely(!freelist)) {
> +               if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
> +                       slab_out_of_memory(s, gfpflags, node);
> +
> +               local_irq_restore(flags);
> +               return NULL;
>        }
>
>        if (likely(!kmem_cache_debug(s)))
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 06/18] slub: Use page variable instead of c->page.
  2011-11-11 20:07 ` [rfc 06/18] slub: Use page variable instead of c->page Christoph Lameter
@ 2011-11-14 21:46   ` Pekka Enberg
  2011-11-20 23:27   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: Pekka Enberg @ 2011-11-14 21:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> The kmem_cache_cpu object pointed to by c will become
> volatile with the lockless patches later so extract
> the c->page pointer at certain times.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

I don't know what GCC does these days but this sort of thing used to
generate better asm in mm/slab.c. So it might be worth it to merge
this.

> ---
>  mm/slub.c |   17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2011-11-09 11:11:25.881561697 -0600
> +++ linux-2.6/mm/slub.c 2011-11-09 11:11:32.231598204 -0600
> @@ -2160,6 +2160,7 @@ static void *__slab_alloc(struct kmem_ca
>                          unsigned long addr, struct kmem_cache_cpu *c)
>  {
>        void *freelist;
> +       struct page *page;
>        unsigned long flags;
>
>        local_irq_save(flags);
> @@ -2172,13 +2173,14 @@ static void *__slab_alloc(struct kmem_ca
>        c = this_cpu_ptr(s->cpu_slab);
>  #endif
>
> -       if (!c->page)
> +       page = c->page;
> +       if (!page)
>                goto new_slab;
>  redo:
>
>        if (unlikely(!node_match(c, node))) {
>                stat(s, ALLOC_NODE_MISMATCH);
> -               deactivate_slab(s, c->page, c->freelist);
> +               deactivate_slab(s, page, c->freelist);
>                c->page = NULL;
>                c->freelist = NULL;
>                goto new_slab;
> @@ -2186,7 +2188,7 @@ redo:
>
>        stat(s, ALLOC_SLOWPATH);
>
> -       freelist = get_freelist(s, c->page);
> +       freelist = get_freelist(s, page);
>
>        if (unlikely(!freelist)) {
>                c->page = NULL;
> @@ -2210,8 +2212,8 @@ load_freelist:
>  new_slab:
>
>        if (c->partial) {
> -               c->page = c->partial;
> -               c->partial = c->page->next;
> +               page = c->page = c->partial;
> +               c->partial = page->next;
>                stat(s, CPU_PARTIAL_ALLOC);
>                c->freelist = NULL;
>                goto redo;
> @@ -2231,13 +2233,14 @@ new_slab:
>                return NULL;
>        }
>
> +       page = c->page;
>        if (likely(!kmem_cache_debug(s)))
>                goto load_freelist;
>
>        /* Only entered in the debug case */
> -       if (!alloc_debug_processing(s, c->page, freelist, addr))
> +       if (!alloc_debug_processing(s, page, freelist, addr))
>                goto new_slab;  /* Slab failed checks. Next slab needed */
> -       deactivate_slab(s, c->page, get_freepointer(s, freelist));
> +       deactivate_slab(s, page, get_freepointer(s, freelist));
>
>        c->page = NULL;
>        c->freelist = NULL;
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 01/18] slub: Get rid of the node field
  2011-11-14 21:42   ` Pekka Enberg
@ 2011-11-15 16:07     ` Christoph Lameter
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-15 16:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Mon, 14 Nov 2011, Pekka Enberg wrote:

> On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> > The node field is always page_to_nid(c->page). So its rather easy to
> > replace. Note that there will be additional overhead in various hot paths
> > due to the need to mask a set of bits in page->flags and shift the
> > result.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
>
> This is a nice cleanup even if we never go irqless in the slowpaths.
> Is page_to_nid() really that slow?

The fastpath only takes a few cycles now, so the page_to_nid() lookup
adds a relatively high overhead on top of that.
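
(For context, in the common !NODE_NOT_IN_PAGE_FLAGS configuration
page_to_nid() boils down to a page->flags load plus a shift and a
mask, roughly:

	static inline int page_to_nid(const struct page *page)
	{
		return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
	}

That is cheap in absolute terms, but not compared to a fastpath of
only a few cycles.)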



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 03/18] slub: Extract get_freelist from __slab_alloc
  2011-11-14 21:43   ` Pekka Enberg
@ 2011-11-15 16:08     ` Christoph Lameter
  2011-12-13 20:31       ` Pekka Enberg
  0 siblings, 1 reply; 39+ messages in thread
From: Christoph Lameter @ 2011-11-15 16:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Mon, 14 Nov 2011, Pekka Enberg wrote:

> On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> > get_freelist retrieves free objects from the page freelist (put there by remote
> > frees) or deactivates a slab page if no more objects are available.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
>
> This is a also a nice cleanup. Any reason I shouldn't apply this?

Cannot think of any reason not to apply this patch.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 00/18] slub: irqless/lockless slow allocation paths
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (17 preceding siblings ...)
  2011-11-11 20:07 ` [rfc 18/18] slub: Move __slab_alloc() into slab_alloc() Christoph Lameter
@ 2011-11-16 17:39 ` Eric Dumazet
  2011-11-16 17:45   ` Eric Dumazet
  2011-11-20 23:30 ` David Rientjes
  19 siblings, 1 reply; 39+ messages in thread
From: Eric Dumazet @ 2011-11-16 17:39 UTC (permalink / raw)
  To: Christoph Lameter, David Miller
  Cc: Pekka Enberg, David Rientjes, Andi Kleen, tj,
	Metathronius Galabant, Matt Mackall, Adrian Drzewiecki,
	Shaohua Li, Alex Shi, linux-mm, netdev

On Friday 11 November 2011 at 14:07 -0600, Christoph Lameter wrote:
> This is a patchset that makes the allocator slow path also lockless like
> the free paths. However, in the process it is making processing more
> complex so that this is not a performance improvement. I am going to
> drop this series unless someone comes up with a bright idea to fix the
> following performance issues:
> 
> 1. Had to reduce the per cpu state kept to two words in order to
>    be able to operate without preempt disable / interrupt disable only
>    through cmpxchg_double(). This means that the node information and
>    the page struct location have to be calculated from the free pointer.
>    That is possible but relatively expensive and has to be done frequently
>    in fast paths.
> 
> 2. If the freepointer becomes NULL then the page struct location can
>    no longer be determined. So per cpu slabs must be deactivated when
>    the last object is retrieved from them causing more regressions.
> 
> If these issues remain unresolved then I am fine with the way things are
> right now in slub. Currently interrupts are disabled in the slow paths and
> then multiple fields in the kmem_cache_cpu structure are modified without
> regard to instruction atomicity.
> 

I believe this is the wrong idea.

You are trying to make the slow path lockless when, I think, you should
not; instead, batch things a bit like SLAB does and be smart about
false sharing.
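
"Batch a bit like SLAB" means, roughly, SLAB's per cpu array caches:
objects are freed into a small per cpu array and only flushed back to
the shared lists a batch at a time, so the contended cache lines are
touched once per batch instead of once per object. A rough sketch of
the idea, loosely modeled on mm/slab.c's struct array_cache (not the
exact structure):

	struct percpu_free_batch {
		unsigned int avail;		/* objects currently cached */
		unsigned int limit;		/* flush when this many are cached */
		unsigned int batchcount;	/* objects moved per flush */
		void *entry[];			/* cached object pointers */
	};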

The lock cost is nothing compared to cache line ping pongs.

Here is a real use case I am facing right now :

In the traditional NIC driver model, the rx path uses a ring buffer of
pre-allocated skbs (256 ... 4096 elems per ring) and feeds them to the
upper stack when an interrupt signals that frames are available.

If the skb was delivered to a socket and consumed/freed by another cpu,
we had no particular problem, because the skb was part of a page that
was completely used (no free objects in it) thanks to the RX ring
buffering (frame N is only delivered to the stack once allocations
N+1 ... N+1024 have already been done).

This model has a downside: we initialize the skb at allocation time and
then add it to the ring buffer. By the time we handle the frame, the
sk_buff contents have been evicted from the cpu caches, so the cpu must
reload the sk_buff from memory before sending the skb to the stack.
This adds some latency to the receive path (about 5 cache line misses
per packet).

We now want to allocate/populate the sk_buff right before sending it to
the upper stack (see the build_skb() infrastructure in the net-next
tree:

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=b2b5ce9d1ccf1c45f8ac68e5d901112ab76ba199

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=e52fcb2462ac484e6dd6e68869536609f0216938

)

But... we now ping-pong in slab_alloc() when the skb consumer is on a
different cpu (typically the case if one cpu is fully loaded with
softirq handling under stress, or if RPS/RFS techniques are used).

So the softirq handler and the consumers compete on a heavily contended
cache line for _every_ allocation and free.
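
To get a feel for what a bouncing cache line costs, here is a small
standalone illustration (plain userspace pthread code, nothing to do
with the SLUB internals themselves): two threads hammer either the
same counter or two counters on separate cache lines.

	/* build with: gcc -O2 -pthread pingpong.c -o pingpong */
	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	#define LOOPS 50000000UL

	static unsigned long same_line __attribute__((aligned(64)));
	static unsigned long separate[2][8] __attribute__((aligned(64)));

	static void *hammer(void *arg)
	{
		unsigned long *p = arg;
		unsigned long i;

		for (i = 0; i < LOOPS; i++)
			__atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
		return NULL;
	}

	static double run(unsigned long *a, unsigned long *b)
	{
		pthread_t t1, t2;
		struct timespec s, e;

		clock_gettime(CLOCK_MONOTONIC, &s);
		pthread_create(&t1, NULL, hammer, a);
		pthread_create(&t2, NULL, hammer, b);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		clock_gettime(CLOCK_MONOTONIC, &e);
		return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) / 1e9;
	}

	int main(void)
	{
		printf("same cache line:      %.2fs\n",
		       run(&same_line, &same_line));
		printf("separate cache lines: %.2fs\n",
		       run(&separate[0][0], &separate[1][0]));
		return 0;
	}

The same-line case is typically several times slower; that difference
is what gets paid here on every allocation and free.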

Switching to SLAB solves the problem.

perf profile for SLAB (no packet drops, and 5% idle still available)
for CPU0 (the one handling softirqs): we see the normal network
functions of a network workload :)


  9.45%  [kernel]  [k] ipt_do_table
  7.81%  [kernel]  [k] __udp4_lib_lookup.clone.46
  7.11%  [kernel]  [k] build_skb
  5.85%  [tg3]     [k] tg3_poll_work
  4.39%  [kernel]  [k] udp_queue_rcv_skb
  4.37%  [kernel]  [k] sock_def_readable
  4.21%  [kernel]  [k] __sk_mem_schedule
  3.72%  [kernel]  [k] __netif_receive_skb
  3.21%  [kernel]  [k] __udp4_lib_rcv
  2.98%  [kernel]  [k] nf_iterate
  2.85%  [kernel]  [k] _raw_spin_lock
  2.85%  [kernel]  [k] ip_route_input_common
  2.83%  [kernel]  [k] sock_queue_rcv_skb
  2.77%  [kernel]  [k] ip_rcv
  2.76%  [kernel]  [k] __kmalloc
  2.03%  [kernel]  [k] kmem_cache_alloc
  1.93%  [kernel]  [k] _raw_spin_lock_irqsave
  1.76%  [kernel]  [k] eth_type_trans
  1.49%  [kernel]  [k] nf_hook_slow
  1.46%  [kernel]  [k] inet_gro_receive
  1.27%  [tg3]     [k] tg3_alloc_rx_data

With SLUB: we see contention in __slab_alloc, and packet drops.

 13.13%  [kernel]  [k] __slab_alloc.clone.56
  8.81%  [kernel]  [k] ipt_do_table
  7.41%  [kernel]  [k] __udp4_lib_lookup.clone.46
  4.64%  [tg3]     [k] tg3_poll_work
  3.93%  [kernel]  [k] build_skb
  3.65%  [kernel]  [k] udp_queue_rcv_skb
  3.33%  [kernel]  [k] __netif_receive_skb
  3.26%  [kernel]  [k] kmem_cache_alloc
  3.16%  [kernel]  [k] sock_def_readable
  3.15%  [kernel]  [k] nf_iterate
  3.13%  [kernel]  [k] __sk_mem_schedule
  2.81%  [kernel]  [k] __udp4_lib_rcv
  2.58%  [kernel]  [k] setup_object.clone.50
  2.54%  [kernel]  [k] sock_queue_rcv_skb
  2.32%  [kernel]  [k] ip_route_input_common
  2.25%  [kernel]  [k] ip_rcv
  2.14%  [kernel]  [k] _raw_spin_lock
  1.95%  [kernel]  [k] eth_type_trans
  1.55%  [kernel]  [k] inet_gro_receive
  1.50%  [kernel]  [k] ksize
  1.42%  [kernel]  [k] __kmalloc
  1.29%  [kernel]  [k] _raw_spin_lock_irqsave

Notice new_slab() is not there at all.

Adding SLUB_STATS gives :

$ cd /sys/kernel/slab/skbuff_head_cache ; grep . *
aliases:6
align:8
grep: alloc_calls: Function not implemented
alloc_fastpath:89181782 C0=89173048 C1=1599 C2=1357 C3=2140 C4=802 C5=675 C6=638 C7=1523
alloc_from_partial:412658 C0=412658
alloc_node_mismatch:0
alloc_refill:593417 C0=593189 C1=19 C2=15 C3=24 C4=51 C5=18 C6=17 C7=84
alloc_slab:2831313 C0=2831285 C1=2 C2=2 C3=2 C4=2 C5=12 C6=4 C7=4
alloc_slowpath:4430371 C0=4430112 C1=20 C2=17 C3=25 C4=57 C5=31 C6=21 C7=88
cache_dma:0
cmpxchg_double_cpu_fail:0
cmpxchg_double_fail:1 C0=1
cpu_partial:30
cpu_partial_alloc:592991 C0=592981 C2=1 C4=5 C5=2 C6=1 C7=1
cpu_partial_free:4429836 C0=592981 C1=25 C2=19 C3=23 C4=3836767 C5=6 C6=8 C7=7
cpuslab_flush:0
cpu_slabs:107
deactivate_bypass:3836954 C0=3836923 C1=1 C2=2 C3=1 C4=6 C5=13 C6=4 C7=4
deactivate_empty:2831168 C4=2831168
deactivate_full:0
deactivate_remote_frees:0
deactivate_to_head:0
deactivate_to_tail:0
destroy_by_rcu:0
free_add_partial:0
grep: free_calls: Function not implemented
free_fastpath:21192924 C0=21186268 C1=1420 C2=1204 C3=1966 C4=572 C5=349 C6=380 C7=765
free_frozen:67988498 C0=516 C1=121 C2=85 C3=841 C4=67986468 C5=215 C6=76 C7=176
free_remove_partial:18 C4=18
free_slab:2831186 C4=2831186
free_slowpath:71825749 C0=609 C1=146 C2=104 C3=864 C4=71823538 C5=221 C6=84 C7=183
hwcache_align:0
min_partial:5
objects:2494
object_size:192
objects_partial:121
objs_per_slab:21
order:0
order_fallback:0
partial:14
poison:0
reclaim_account:0
red_zone:0
reserved:0
sanity_checks:0
slabs:127
slabs_cpu_partial:99(99) C1=25(25) C2=18(18) C3=23(23) C4=16(16) C5=4(4) C6=7(7) C7=6(6)
slab_size:192
store_user:0
total_objects:2667
trace:0



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 00/18] slub: irqless/lockless slow allocation paths
  2011-11-16 17:39 ` [rfc 00/18] slub: irqless/lockless slow allocation paths Eric Dumazet
@ 2011-11-16 17:45   ` Eric Dumazet
  2011-11-20 23:32     ` David Rientjes
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Dumazet @ 2011-11-16 17:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Miller, Pekka Enberg, David Rientjes, Andi Kleen, tj,
	Metathronius Galabant, Matt Mackall, Adrian Drzewiecki,
	Shaohua Li, Alex Shi, linux-mm, netdev

On Wednesday 16 November 2011 at 18:39 +0100, Eric Dumazet wrote:

> Adding SLUB_STATS gives :
> 
> $ cd /sys/kernel/slab/skbuff_head_cache ; grep . *
> aliases:6
> align:8
> grep: alloc_calls: Function not implemented
> alloc_fastpath:89181782 C0=89173048 C1=1599 C2=1357 C3=2140 C4=802 C5=675 C6=638 C7=1523
> alloc_from_partial:412658 C0=412658
> alloc_node_mismatch:0
> alloc_refill:593417 C0=593189 C1=19 C2=15 C3=24 C4=51 C5=18 C6=17 C7=84
> alloc_slab:2831313 C0=2831285 C1=2 C2=2 C3=2 C4=2 C5=12 C6=4 C7=4
> alloc_slowpath:4430371 C0=4430112 C1=20 C2=17 C3=25 C4=57 C5=31 C6=21 C7=88
> cache_dma:0
> cmpxchg_double_cpu_fail:0
> cmpxchg_double_fail:1 C0=1
> cpu_partial:30
> cpu_partial_alloc:592991 C0=592981 C2=1 C4=5 C5=2 C6=1 C7=1
> cpu_partial_free:4429836 C0=592981 C1=25 C2=19 C3=23 C4=3836767 C5=6 C6=8 C7=7
> cpuslab_flush:0
> cpu_slabs:107
> deactivate_bypass:3836954 C0=3836923 C1=1 C2=2 C3=1 C4=6 C5=13 C6=4 C7=4
> deactivate_empty:2831168 C4=2831168
> deactivate_full:0
> deactivate_remote_frees:0
> deactivate_to_head:0
> deactivate_to_tail:0
> destroy_by_rcu:0
> free_add_partial:0
> grep: free_calls: Function not implemented
> free_fastpath:21192924 C0=21186268 C1=1420 C2=1204 C3=1966 C4=572 C5=349 C6=380 C7=765
> free_frozen:67988498 C0=516 C1=121 C2=85 C3=841 C4=67986468 C5=215 C6=76 C7=176
> free_remove_partial:18 C4=18
> free_slab:2831186 C4=2831186
> free_slowpath:71825749 C0=609 C1=146 C2=104 C3=864 C4=71823538 C5=221 C6=84 C7=183
> hwcache_align:0
> min_partial:5
> objects:2494
> object_size:192
> objects_partial:121
> objs_per_slab:21
> order:0
> order_fallback:0
> partial:14
> poison:0
> reclaim_account:0
> red_zone:0
> reserved:0
> sanity_checks:0
> slabs:127
> slabs_cpu_partial:99(99) C1=25(25) C2=18(18) C3=23(23) C4=16(16) C5=4(4) C6=7(7) C7=6(6)
> slab_size:192
> store_user:0
> total_objects:2667
> trace:0
> 

And the SLUB stats for the 2048-byte slab are even worse: almost every
alloc/free goes through the slow path.

$ cd /sys/kernel/slab/:t-0002048 ; grep . *
aliases:0
align:8
grep: alloc_calls: Function not implemented
alloc_fastpath:8199220 C0=8196915 C1=306 C2=63 C3=297 C4=319 C5=550 C6=722 C7=48
alloc_from_partial:13931406 C0=13931401 C3=1 C5=4
alloc_node_mismatch:0
alloc_refill:70871657 C0=70871629 C1=2 C3=3 C4=9 C5=11 C6=3
alloc_slab:1335 C0=1216 C1=17 C2=2 C3=15 C4=17 C5=22 C6=44 C7=2
alloc_slowpath:155455299 C0=155455144 C1=18 C2=1 C3=21 C4=27 C5=40 C6=47 C7=1
cache_dma:0
cmpxchg_double_cpu_fail:0
cmpxchg_double_fail:27341 C0=12769 C4=14572
cpu_partial:6
cpu_partial_alloc:70650909 C0=70650899 C3=3 C4=2 C5=4 C6=1
cpu_partial_free:136279924 C0=71504388 C1=13 C2=1 C3=52 C4=64775461 C5=6 C6=2 C7=1
cpuslab_flush:0
cpu_slabs:29
deactivate_bypass:84583642 C0=84583515 C1=16 C2=1 C3=18 C4=18 C5=29 C6=44 C7=1
deactivate_empty:570 C0=80 C3=34 C4=456
deactivate_full:0
deactivate_remote_frees:0
deactivate_to_head:0
deactivate_to_tail:0
destroy_by_rcu:0
free_add_partial:0
grep: free_calls: Function not implemented
free_fastpath:89153 C0=88972 C1=34 C2=35 C3=27 C4=12 C5=23 C6=30 C7=20
free_frozen:5971363 C0=554097 C1=196 C2=14 C3=730 C4=5416278 C5=16 C6=19 C7=13
free_remove_partial:401 C1=1 C4=400
free_slab:971 C0=80 C1=1 C3=34 C4=856
free_slowpath:92913113 C0=21090357 C1=212 C2=15 C3=784 C4=71821691 C5=19 C6=21 C7=14
hwcache_align:0
min_partial:5
objects:1873
object_size:2048
objects_partial:945
objs_per_slab:16
order:3
order_fallback:0
partial:306
poison:0
reclaim_account:0
red_zone:0
reserved:0
sanity_checks:0
slabs:364
slabs_cpu_partial:21(21) C0=3(3) C1=6(6) C2=1(1) C3=7(7) C5=2(2) C6=1(1) C7=1(1)
slab_size:2048
store_user:0
total_objects:5824
trace:0



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 01/18] slub: Get rid of the node field
  2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
  2011-11-14 21:42   ` Pekka Enberg
@ 2011-11-20 23:01   ` David Rientjes
  2011-11-21 17:17     ` Christoph Lameter
  1 sibling, 1 reply; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> The node field is always page_to_nid(c->page). So its rather easy to
> replace. Note that there will be additional overhead in various hot paths
> due to the need to mask a set of bits in page->flags and shift the
> result.
> 

This certainly does add overhead to the fastpath just by checking
node_match() if we're doing kmalloc_node(), and that overhead might be
higher than you expect if NODE_NOT_IN_PAGE_FLAGS.  Storing the node in
kmem_cache_cpu was always viewed as an optimization, so I'm not sure
why you'd want to get rid of it.  The changelog at least doesn't
mention any motivation.  Do we need to shrink that struct for something
else later?
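
For reference, the check in question, paraphrased from the patch 01
hunk quoted earlier in the thread (not the literal kernel code, which
keeps the #ifdef CONFIG_NUMA):

	/* before: compare against the node id cached in kmem_cache_cpu */
	static inline int node_match_before(struct kmem_cache_cpu *c, int node)
	{
		return node == NUMA_NO_NODE || c->node == node;
	}

	/* after: derive the node from the page on every check */
	static inline int node_match_after(struct kmem_cache_cpu *c, int node)
	{
		return node == NUMA_NO_NODE || page_to_nid(c->page) == node;
	}

With the node id in page->flags that is an extra load plus mask/shift
per kmalloc_node() allocation; with NODE_NOT_IN_PAGE_FLAGS it becomes
an out-of-line section-table lookup.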


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab
  2011-11-11 20:07 ` [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
@ 2011-11-20 23:10   ` David Rientjes
  0 siblings, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> Processing on fields of kmem_cache needs to be outside of deactivate_slab()
> since we will be handling that with cmpxchg_double later.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 03/18] slub: Extract get_freelist from __slab_alloc
  2011-11-11 20:07 ` [rfc 03/18] slub: Extract get_freelist from __slab_alloc Christoph Lameter
  2011-11-14 21:43   ` Pekka Enberg
@ 2011-11-20 23:18   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> get_freelist retrieves free objects from the page freelist (put there by remote
> frees) or deactivates a slab page if no more objects are available.
> 

Please also mention that you're now using cmpxchg_double_slab() to deal 
with disabling irqs now when grabbing the freelist.

> Signed-off-by: Christoph Lameter <cl@linux.com>
> 

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc
  2011-11-11 20:07 ` [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
  2011-11-14 21:44   ` Pekka Enberg
@ 2011-11-20 23:22   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:22 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> The variable "object" really refers to a list of objects that we
> are handling. Since the lockless allocator path will depend on it
> we rename the variable now.
> 

Some of this needs to be folded into the earlier patch that introduces 
get_freelist() since this patch doesn't just rename a variable, it changes 
the variable type.

> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  mm/slub.c |   40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2011-11-09 11:11:13.471490305 -0600
> +++ linux-2.6/mm/slub.c	2011-11-09 11:11:22.381541568 -0600
> @@ -2084,7 +2084,7 @@ slab_out_of_memory(struct kmem_cache *s,
>  static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
>  			int node, struct kmem_cache_cpu **pc)
>  {
> -	void *object;
> +	void *freelist;
>  	struct kmem_cache_cpu *c;
>  	struct page *page = new_slab(s, flags, node);
>  
> @@ -2097,16 +2097,16 @@ static inline void *new_slab_objects(str
>  		 * No other reference to the page yet so we can
>  		 * muck around with it freely without cmpxchg
>  		 */
> -		object = page->freelist;
> +		freelist = page->freelist;
>  		page->freelist = NULL;
>  
>  		stat(s, ALLOC_SLAB);
>  		c->page = page;
>  		*pc = c;
>  	} else
> -		object = NULL;
> +		freelist = NULL;
>  
> -	return object;
> +	return freelist;
>  }
>  
>  /*
> @@ -2159,7 +2159,7 @@ static inline void *get_freelist(struct
>  static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  			  unsigned long addr, struct kmem_cache_cpu *c)
>  {
> -	void **object;
> +	void *freelist;
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> @@ -2175,6 +2175,7 @@ static void *__slab_alloc(struct kmem_ca
>  	if (!c->page)
>  		goto new_slab;
>  redo:
> +
>  	if (unlikely(!node_match(c, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
>  		deactivate_slab(s, c->page, c->freelist);

I don't think we need this.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 05/18] slub: Simplify control flow in __slab_alloc()
  2011-11-11 20:07 ` [rfc 05/18] slub: Simplify control flow in __slab_alloc() Christoph Lameter
  2011-11-14 21:45   ` Pekka Enberg
@ 2011-11-20 23:24   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> Simplify control flow.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 06/18] slub: Use page variable instead of c->page.
  2011-11-11 20:07 ` [rfc 06/18] slub: Use page variable instead of c->page Christoph Lameter
  2011-11-14 21:46   ` Pekka Enberg
@ 2011-11-20 23:27   ` David Rientjes
  1 sibling, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> The kmem_cache_cpu object pointed to by c will become
> volatile with the lockless patches later so extract
> the c->page pointer at certain times.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: David Rientjes <rientjes@google.com>
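
The pattern being introduced, as a rough illustration rather than the literal
diff: read c->page once into a local and use the local from then on, since
the kmem_cache_cpu contents can change once the slow path no longer hides
behind disabled interrupts.

	struct page *page;

	page = c->page;		/* snapshot once */
	if (!page)
		goto new_slab;
	/* further checks and deactivation then work on "page", not c->page */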


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure
  2011-11-11 20:07 ` [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
@ 2011-11-20 23:28   ` David Rientjes
  0 siblings, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> The page field in struct kmem_cache_cpu will go away soon and so it's more
> convenient to pass the page struct to node_match() instead.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 00/18] slub: irqless/lockless slow allocation paths
  2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
                   ` (18 preceding siblings ...)
  2011-11-16 17:39 ` [rfc 00/18] slub: irqless/lockless slow allocation paths Eric Dumazet
@ 2011-11-20 23:30 ` David Rientjes
  19 siblings, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Fri, 11 Nov 2011, Christoph Lameter wrote:

> This is a patchset that makes the allocator slow path also lockless like
> the free paths. However, in the process it is making processing more
> complex so that this is not a performance improvement. I am going to
> drop this series unless someone comes up with a bright idea to fix the
> following performance issues:
> 
> 1. Had to reduce the per cpu state kept to two words in order to
>    be able to operate without preempt disable / interrupt disable only
>    through cmpxchg_double(). This means that the node information and
>    the page struct location have to be calculated from the free pointer.
>    That is possible but relatively expensive and has to be done frequently
>    in fast paths.
> 
> 2. If the freepointer becomes NULL then the page struct location can
>    no longer be determined. So per cpu slabs must be deactivated when
>    the last object is retrieved from them causing more regressions.
> 
> If these issues remain unresolved then I am fine with the way things are
> right now in slub. Currently interrupts are disabled in the slow paths and
> then multiple fields in the kmem_cache_cpu structure are modified without
> regard to instruction atomicity.
> 

I think patches 1-7 should be proposed as a separate set of cleanups that 
are an overall improvement to the slub code.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 00/18] slub: irqless/lockless slow allocation paths
  2011-11-16 17:45   ` Eric Dumazet
@ 2011-11-20 23:32     ` David Rientjes
  0 siblings, 0 replies; 39+ messages in thread
From: David Rientjes @ 2011-11-20 23:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, David Miller, Pekka Enberg, Andi Kleen, tj,
	Metathronius Galabant, Matt Mackall, Adrian Drzewiecki,
	Shaohua Li, Alex Shi, linux-mm, netdev

On Wed, 16 Nov 2011, Eric Dumazet wrote:

> > Adding SLUB_STATS gives :
> > 
> > $ cd /sys/kernel/slab/skbuff_head_cache ; grep . *
> > aliases:6
> > align:8
> > grep: alloc_calls: Function not implemented
> > alloc_fastpath:89181782 C0=89173048 C1=1599 C2=1357 C3=2140 C4=802 C5=675 C6=638 C7=1523
> > alloc_from_partial:412658 C0=412658
> > alloc_node_mismatch:0
> > alloc_refill:593417 C0=593189 C1=19 C2=15 C3=24 C4=51 C5=18 C6=17 C7=84
> > alloc_slab:2831313 C0=2831285 C1=2 C2=2 C3=2 C4=2 C5=12 C6=4 C7=4
> > alloc_slowpath:4430371 C0=4430112 C1=20 C2=17 C3=25 C4=57 C5=31 C6=21 C7=88
> > cache_dma:0
> > cmpxchg_double_cpu_fail:0
> > cmpxchg_double_fail:1 C0=1
> > cpu_partial:30
> > cpu_partial_alloc:592991 C0=592981 C2=1 C4=5 C5=2 C6=1 C7=1
> > cpu_partial_free:4429836 C0=592981 C1=25 C2=19 C3=23 C4=3836767 C5=6 C6=8 C7=7
> > cpuslab_flush:0
> > cpu_slabs:107
> > deactivate_bypass:3836954 C0=3836923 C1=1 C2=2 C3=1 C4=6 C5=13 C6=4 C7=4
> > deactivate_empty:2831168 C4=2831168
> > deactivate_full:0
> > deactivate_remote_frees:0
> > deactivate_to_head:0
> > deactivate_to_tail:0
> > destroy_by_rcu:0
> > free_add_partial:0
> > grep: free_calls: Function not implemented
> > free_fastpath:21192924 C0=21186268 C1=1420 C2=1204 C3=1966 C4=572 C5=349 C6=380 C7=765
> > free_frozen:67988498 C0=516 C1=121 C2=85 C3=841 C4=67986468 C5=215 C6=76 C7=176
> > free_remove_partial:18 C4=18
> > free_slab:2831186 C4=2831186
> > free_slowpath:71825749 C0=609 C1=146 C2=104 C3=864 C4=71823538 C5=221 C6=84 C7=183
> > hwcache_align:0
> > min_partial:5
> > objects:2494
> > object_size:192
> > objects_partial:121
> > objs_per_slab:21
> > order:0
> > order_fallback:0
> > partial:14
> > poison:0
> > reclaim_account:0
> > red_zone:0
> > reserved:0
> > sanity_checks:0
> > slabs:127
> > slabs_cpu_partial:99(99) C1=25(25) C2=18(18) C3=23(23) C4=16(16) C5=4(4) C6=7(7) C7=6(6)
> > slab_size:192
> > store_user:0
> > total_objects:2667
> > trace:0
> > 
> 
> And the SLUB stats for the 2048 bytes slab is even worse : About every
> alloc/free is slow path
> 
> $ cd /sys/kernel/slab/:t-0002048 ; grep . *
> aliases:0
> align:8
> grep: alloc_calls: Function not implemented
> alloc_fastpath:8199220 C0=8196915 C1=306 C2=63 C3=297 C4=319 C5=550 C6=722 C7=48
> alloc_from_partial:13931406 C0=13931401 C3=1 C5=4
> alloc_node_mismatch:0
> alloc_refill:70871657 C0=70871629 C1=2 C3=3 C4=9 C5=11 C6=3
> alloc_slab:1335 C0=1216 C1=17 C2=2 C3=15 C4=17 C5=22 C6=44 C7=2
> alloc_slowpath:155455299 C0=155455144 C1=18 C2=1 C3=21 C4=27 C5=40 C6=47 C7=1

I certainly sympathize with your situation; these stats are even worse 
with netperf TCP_RR where slub regresses very heavily against slab.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 01/18] slub: Get rid of the node field
  2011-11-20 23:01   ` David Rientjes
@ 2011-11-21 17:17     ` Christoph Lameter
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2011-11-21 17:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, Andi Kleen, tj, Metathronius Galabant, Matt Mackall,
	Eric Dumazet, Adrian Drzewiecki, Shaohua Li, Alex Shi, linux-mm

On Sun, 20 Nov 2011, David Rientjes wrote:

> On Fri, 11 Nov 2011, Christoph Lameter wrote:
>
> > The node field is always page_to_nid(c->page). So its rather easy to
> > replace. Note that there will be additional overhead in various hot paths
> > due to the need to mask a set of bits in page->flags and shift the
> > result.
> >
>
> This certainly does add overhead to the fastpath just by checking
> node_match() if we're doing kmalloc_node(), and that overhead might be
> higher than you expect if NODE_NOT_IN_PAGE_FLAGS.  Storing the node in
> kmem_cache_cpu was always viewed as an optimization, not sure why you'd
> want to get rid of it?  The changelog at least doesn't mention any
> motivation.  Do we need to shrink that struct for something else later or
> something?

If you would read the description of the patch series you could probably
figure it out.
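
For readers trying to follow the trade-off: the change roughly turns the
fast-path node check from a cached-integer compare into a page->flags decode.
Sketched from slub.c of this era; the exact RFC version may differ.

/* Before: node cached in kmem_cache_cpu, a single integer compare. */
static inline int node_match(struct kmem_cache_cpu *c, int node)
{
#ifdef CONFIG_NUMA
	if (node != NUMA_NO_NODE && c->node != node)
		return 0;
#endif
	return 1;
}

/*
 * After: the node is derived from the page on every check.  page_to_nid()
 * masks and shifts page->flags, or falls back to a section table lookup
 * when NODE_NOT_IN_PAGE_FLAGS is set, which is the extra fast-path cost
 * David points at.
 */
static inline int node_match(struct page *page, int node)
{
#ifdef CONFIG_NUMA
	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
		return 0;
#endif
	return 1;
}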



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc 03/18] slub: Extract get_freelist from __slab_alloc
  2011-11-15 16:08     ` Christoph Lameter
@ 2011-12-13 20:31       ` Pekka Enberg
  0 siblings, 0 replies; 39+ messages in thread
From: Pekka Enberg @ 2011-12-13 20:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, Shaohua Li,
	Alex Shi, linux-mm

On Tue, 2011-11-15 at 10:08 -0600, Christoph Lameter wrote:
> On Mon, 14 Nov 2011, Pekka Enberg wrote:
> 
> > On Fri, Nov 11, 2011 at 10:07 PM, Christoph Lameter <cl@linux.com> wrote:
> > > get_freelist retrieves free objects from the page freelist (put there by remote
> > > frees) or deactivates a slab page if no more objects are available.
> > >
> > > Signed-off-by: Christoph Lameter <cl@linux.com>
> >
> > This is a also a nice cleanup. Any reason I shouldn't apply this?
> 
> Cannot think of any reason not to apply this patch.

I ended up applying only one of the cleanups David ACK'd. I got too many
rejects when applying the other ones.

			Pekka 


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2011-12-13 20:31 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-11 20:07 [rfc 00/18] slub: irqless/lockless slow allocation paths Christoph Lameter
2011-11-11 20:07 ` [rfc 01/18] slub: Get rid of the node field Christoph Lameter
2011-11-14 21:42   ` Pekka Enberg
2011-11-15 16:07     ` Christoph Lameter
2011-11-20 23:01   ` David Rientjes
2011-11-21 17:17     ` Christoph Lameter
2011-11-11 20:07 ` [rfc 02/18] slub: Separate out kmem_cache_cpu processing from deactivate_slab Christoph Lameter
2011-11-20 23:10   ` David Rientjes
2011-11-11 20:07 ` [rfc 03/18] slub: Extract get_freelist from __slab_alloc Christoph Lameter
2011-11-14 21:43   ` Pekka Enberg
2011-11-15 16:08     ` Christoph Lameter
2011-12-13 20:31       ` Pekka Enberg
2011-11-20 23:18   ` David Rientjes
2011-11-11 20:07 ` [rfc 04/18] slub: Use freelist instead of "object" in __slab_alloc Christoph Lameter
2011-11-14 21:44   ` Pekka Enberg
2011-11-20 23:22   ` David Rientjes
2011-11-11 20:07 ` [rfc 05/18] slub: Simplify control flow in __slab_alloc() Christoph Lameter
2011-11-14 21:45   ` Pekka Enberg
2011-11-20 23:24   ` David Rientjes
2011-11-11 20:07 ` [rfc 06/18] slub: Use page variable instead of c->page Christoph Lameter
2011-11-14 21:46   ` Pekka Enberg
2011-11-20 23:27   ` David Rientjes
2011-11-11 20:07 ` [rfc 07/18] slub: pass page to node_match() instead of kmem_cache_cpu structure Christoph Lameter
2011-11-20 23:28   ` David Rientjes
2011-11-11 20:07 ` [rfc 08/18] slub: enable use of deactivate_slab with interrupts on Christoph Lameter
2011-11-11 20:07 ` [rfc 09/18] slub: Run deactivate_slab with interrupts enabled Christoph Lameter
2011-11-11 20:07 ` [rfc 10/18] slub: Enable use of get_partial " Christoph Lameter
2011-11-11 20:07 ` [rfc 11/18] slub: Acquire_slab() avoid loop Christoph Lameter
2011-11-11 20:07 ` [rfc 12/18] slub: Remove kmem_cache_cpu dependency from acquire slab Christoph Lameter
2011-11-11 20:07 ` [rfc 13/18] slub: Add functions to manage per cpu freelists Christoph Lameter
2011-11-11 20:07 ` [rfc 14/18] slub: Decomplicate the get_pointer_safe call and fixup statistics Christoph Lameter
2011-11-11 20:07 ` [rfc 15/18] slub: new_slab_objects() can also get objects from partial list Christoph Lameter
2011-11-11 20:07 ` [rfc 16/18] slub: Drop page field from kmem_cache_cpu Christoph Lameter
2011-11-11 20:07 ` [rfc 17/18] slub: Move __slab_free() into slab_free() Christoph Lameter
2011-11-11 20:07 ` [rfc 18/18] slub: Move __slab_alloc() into slab_alloc() Christoph Lameter
2011-11-16 17:39 ` [rfc 00/18] slub: irqless/lockless slow allocation paths Eric Dumazet
2011-11-16 17:45   ` Eric Dumazet
2011-11-20 23:32     ` David Rientjes
2011-11-20 23:30 ` David Rientjes

This is a public inbox; see mirroring instructions for how to clone and
mirror all data and code used for this inbox, as well as URLs for NNTP
newsgroup(s).