* [PATCH V2 0/3] slub: introducing detached freelist
@ 2015-08-24  0:58 Jesper Dangaard Brouer
  2015-08-24  0:58 ` [PATCH V2 1/3] slub: extend slowpath __slab_free() to handle bulk free Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2015-08-24  0:58 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, akpm
  Cc: aravinda, iamjoonsoo.kim, Paul E. McKenney, linux-kernel,
	Jesper Dangaard Brouer

REPOST:
 * Only updated comment in patch01 per request of Christoph Lameter.
 * No other objections have been made
 * Prev post: http://thread.gmane.org/gmane.linux.kernel.mm/135704

NEW use-case for this API is RCU-free (in addition to the existing network NIC use-case).

Introducing what I call a detached freelist, to improve the
performance of object freeing in the "slowpath" of kmem_cache_free_bulk(),
which calls __slab_free().
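
For reference, the detached freelist itself is just a small struct,
allocated on the stack of kmem_cache_free_bulk() (introduced in patch 2):

  struct detached_freelist {
  	struct page *page;
  	void *freelist;
  	void *tail_object;
  	int cnt;
  };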

The benchmarking tools are available here:
 https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
 See: slab_bulk_test0{1,2,3}.c

Compared against the existing bulk API (in AKPM's tree), we see a small
regression for small bulk sizes (between 2-5 cycles), but a huge
improvement once the slowpath is hit.

bulk- Bulk-API-before           - Bulk-API with patchset
  1 -  42 cycles(tsc) 10.520 ns - 47 cycles(tsc) 11.931 ns - improved -11.9%
  2 -  26 cycles(tsc)  6.697 ns - 29 cycles(tsc)  7.368 ns - improved -11.5%
  3 -  22 cycles(tsc)  5.589 ns - 24 cycles(tsc)  6.003 ns - improved -9.1%
  4 -  19 cycles(tsc)  4.921 ns - 22 cycles(tsc)  5.543 ns - improved -15.8%
  8 -  17 cycles(tsc)  4.499 ns - 20 cycles(tsc)  5.047 ns - improved -17.6%
 16 -  69 cycles(tsc) 17.424 ns - 20 cycles(tsc)  5.015 ns - improved 71.0%
 30 -  88 cycles(tsc) 22.075 ns - 20 cycles(tsc)  5.062 ns - improved 77.3%
 32 -  83 cycles(tsc) 20.965 ns - 20 cycles(tsc)  5.089 ns - improved 75.9%
 34 -  80 cycles(tsc) 20.039 ns - 28 cycles(tsc)  7.006 ns - improved 65.0%
 48 -  76 cycles(tsc) 19.252 ns - 31 cycles(tsc)  7.755 ns - improved 59.2%
 64 -  86 cycles(tsc) 21.523 ns - 68 cycles(tsc) 17.203 ns - improved 20.9%
128 -  97 cycles(tsc) 24.444 ns - 72 cycles(tsc) 18.195 ns - improved 25.8%
158 -  96 cycles(tsc) 24.036 ns - 73 cycles(tsc) 18.372 ns - improved 24.0%
250 - 100 cycles(tsc) 25.007 ns - 73 cycles(tsc) 18.430 ns - improved 27.0%

Patchset based on top of commit aefbef10e3ae, with the previously
accepted bulk patchset (V2) applied (available in AKPM's quilt).

Small note: the benchmarks were run on a kernel compiled with
CONFIG_FTRACE, in order to use perf probes to measure the amount of
page bulking into __slab_free() while running the "worst-case" testing
module slab_bulk_test03.c.

---

Jesper Dangaard Brouer (3):
      slub: extend slowpath __slab_free() to handle bulk free
      slub: optimize bulk slowpath free by detached freelist
      slub: build detached freelist with look-ahead


 mm/slub.c |  142 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 112 insertions(+), 30 deletions(-)

--


* [PATCH V2 1/3] slub: extend slowpath __slab_free() to handle bulk free
  2015-08-24  0:58 [PATCH V2 0/3] slub: introducing detached freelist Jesper Dangaard Brouer
@ 2015-08-24  0:58 ` Jesper Dangaard Brouer
  2015-08-24  0:59 ` [PATCH V2 2/3] slub: optimize bulk slowpath free by detached freelist Jesper Dangaard Brouer
  2015-08-24  0:59 ` [PATCH V2 3/3] slub: build detached freelist with look-ahead Jesper Dangaard Brouer
  2 siblings, 0 replies; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2015-08-24  0:58 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, akpm
  Cc: aravinda, iamjoonsoo.kim, Paul E. McKenney, linux-kernel,
	Jesper Dangaard Brouer

Make it possible to free a freelist with several objects by extending
__slab_free() with two arguments: a freelist_head pointer and an
object counter (cnt).  If the freelist_head pointer is set, then the
object argument must be the freelist tail pointer.

This allows a freelist with several objects (all within the same
slab page) to be freed using a single locked cmpxchg_double.

Micro benchmarking showed no performance reduction due to this change.
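
As an illustration (not part of the patch) of the new calling
convention: given three objects head -> mid -> tail, linked via their
freepointers and all residing in the same slab page, a caller passes
the tail as the usual object argument, the head as freelist_head, and
cnt = 3:

  /* Illustration only: free a pre-linked chain of 3 objects */
  set_freepointer(s, head, mid);
  set_freepointer(s, mid, tail);
  __slab_free(s, page, tail, _RET_IP_, head, 3);

__slab_free() then links the tail into the page's existing freelist and
installs head as the new freelist with a single cmpxchg_double.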

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V2: Per request of Christoph Lameter
 * Made it clearer that freelist objects must exist within the same page

 mm/slub.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c9305f525004..10b57a3bb895 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2573,9 +2573,14 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
  * So we still attempt to reduce cache line usage. Just take the slab
  * lock and free the item. If there is no additional partial page
  * handling required then we can return immediately.
+ *
+ * Bulk free of a freelist with several objects (all belonging to the
+ * same page) is possible by specifying the freelist_head pointer and
+ * the object as the tail pointer, plus an object count (cnt).
  */
 static void __slab_free(struct kmem_cache *s, struct page *page,
-			void *x, unsigned long addr)
+			void *x, unsigned long addr,
+			void *freelist_head, int cnt)
 {
 	void *prior;
 	void **object = (void *)x;
@@ -2584,6 +2589,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 	unsigned long counters;
 	struct kmem_cache_node *n = NULL;
 	unsigned long uninitialized_var(flags);
+	void *new_freelist = (!freelist_head) ? object : freelist_head;
 
 	stat(s, FREE_SLOWPATH);
 
@@ -2601,7 +2607,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 		set_freepointer(s, object, prior);
 		new.counters = counters;
 		was_frozen = new.frozen;
-		new.inuse--;
+		new.inuse -= cnt;
 		if ((!new.inuse || !prior) && !was_frozen) {
 
 			if (kmem_cache_has_cpu_partial(s) && !prior) {
@@ -2632,7 +2638,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
 
 	} while (!cmpxchg_double_slab(s, page,
 		prior, counters,
-		object, new.counters,
+		new_freelist, new.counters,
 		"__slab_free"));
 
 	if (likely(!n)) {
@@ -2736,7 +2742,7 @@ redo:
 		}
 		stat(s, FREE_FASTPATH);
 	} else
-		__slab_free(s, page, x, addr);
+		__slab_free(s, page, x, addr, NULL, 1);
 
 }
 
@@ -2780,7 +2786,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 			c->tid = next_tid(c->tid);
 			local_irq_enable();
 			/* Slowpath: overhead locked cmpxchg_double_slab */
-			__slab_free(s, page, object, _RET_IP_);
+			__slab_free(s, page, object, _RET_IP_, NULL, 1);
 			local_irq_disable();
 			c = this_cpu_ptr(s->cpu_slab);
 		}



* [PATCH V2 2/3] slub: optimize bulk slowpath free by detached freelist
  2015-08-24  0:58 [PATCH V2 0/3] slub: introducing detached freelist Jesper Dangaard Brouer
  2015-08-24  0:58 ` [PATCH V2 1/3] slub: extend slowpath __slab_free() to handle bulk free Jesper Dangaard Brouer
@ 2015-08-24  0:59 ` Jesper Dangaard Brouer
  2015-08-24  0:59 ` [PATCH V2 3/3] slub: build detached freelist with look-ahead Jesper Dangaard Brouer
  2 siblings, 0 replies; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2015-08-24  0:59 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, akpm
  Cc: aravinda, iamjoonsoo.kim, Paul E. McKenney, linux-kernel,
	Jesper Dangaard Brouer

This change focuses on improving the speed of object freeing in the
"slowpath" of kmem_cache_free_bulk().

The slowpath call __slab_free() has been extended with support for
bulk free, which amortizes the overhead of the locked cmpxchg_double_slab.

To use the new bulking feature of __slab_free(), we build what I call
a detached freelist.  The detached freelist takes advantage of three
properties:

 1) the free function call owns the object that is about to be freed,
    thus writing into this memory is synchronization-free.

 2) many freelists can co-exist side-by-side in the same page, each
    with a separate head pointer.

 3) it is the visibility of the head pointer that needs synchronization.

Given these properties, the brilliant part is that the detached
freelist can be constructed without any need for synchronization: the
freelist is built directly in the page's objects, while the struct
describing it (head pointer, tail and count) is allocated on the stack
of the kmem_cache_free_bulk() call.  Thus, the freelist head pointer
is not visible to other CPUs.

This implementation is fairly simple, as it only extends the detached
freelist when two consecutive objects belong to the same page.  When
it detects that an object's page does not match, it simply flushes the
local freelist and starts a new local detached freelist.  It does not
look ahead to see if further opportunities exist in the rest of the
array.

The next patch has a more advanced look-ahead approach, but it is
also more complicated.  They are split up because I want to be able to
benchmark the simple approach against the advanced one.
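
As a rough userspace sketch (not kernel code; the page lookup and the
actual free are simulated), the per-page grouping with flush-on-change
used by this patch looks like this:

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Simulated "page" identity of an object (stand-in for virt_to_head_page) */
  static void *obj_page(void *obj)
  {
  	return (void *)((uintptr_t)obj & ~(uintptr_t)4095);
  }

  /* Freepointer is stored at offset 0 inside the (free) object */
  static void set_fp(void *obj, void *fp)
  {
  	*(void **)obj = fp;
  }

  struct detached_freelist {
  	void *page;      /* page the current list belongs to */
  	void *freelist;  /* head of the list, kept on the stack */
  	void *tail;      /* first object added, end of the list */
  	int cnt;
  };

  /* Stand-in for the real bulk transfer via __slab_free() */
  static void flush(struct detached_freelist *df)
  {
  	if (df->page)
  		printf("flush page=%p cnt=%d\n", df->page, df->cnt);
  	df->page = NULL;
  }

  static void bulk_free_sim(void **p, size_t size)
  {
  	struct detached_freelist df = { 0 };

  	for (size_t i = 0; i < size; i++) {
  		void *object = p[i];

  		if (df.page && obj_page(object) == df.page) {
  			/* Same page: grow the detached freelist */
  			set_fp(object, df.freelist);
  			df.freelist = object;
  			df.cnt++;
  		} else {
  			/* Page changed: flush and start a new list */
  			flush(&df);
  			df.page = obj_page(object);
  			df.tail = object;
  			set_fp(object, NULL);
  			df.freelist = object;
  			df.cnt = 1;
  		}
  	}
  	flush(&df);
  }

  int main(void)
  {
  	void *objs[8];

  	for (int i = 0; i < 8; i++)
  		objs[i] = malloc(64);
  	bulk_free_sim(objs, 8);
  	for (int i = 0; i < 8; i++)
  		free(objs[i]);
  	return 0;
  }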

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
bulk- Fallback                  - Bulk API
  1 -  64 cycles(tsc) 16.109 ns - 47 cycles(tsc) 11.894 - improved 26.6%
  2 -  56 cycles(tsc) 14.158 ns - 45 cycles(tsc) 11.274 - improved 19.6%
  3 -  54 cycles(tsc) 13.650 ns - 23 cycles(tsc)  6.001 - improved 57.4%
  4 -  53 cycles(tsc) 13.268 ns - 21 cycles(tsc)  5.262 - improved 60.4%
  8 -  51 cycles(tsc) 12.841 ns - 18 cycles(tsc)  4.718 - improved 64.7%
 16 -  50 cycles(tsc) 12.583 ns - 19 cycles(tsc)  4.896 - improved 62.0%
 30 -  85 cycles(tsc) 21.357 ns - 26 cycles(tsc)  6.549 - improved 69.4%
 32 -  82 cycles(tsc) 20.690 ns - 25 cycles(tsc)  6.412 - improved 69.5%
 34 -  81 cycles(tsc) 20.322 ns - 25 cycles(tsc)  6.365 - improved 69.1%
 48 -  93 cycles(tsc) 23.332 ns - 28 cycles(tsc)  7.139 - improved 69.9%
 64 -  98 cycles(tsc) 24.544 ns - 62 cycles(tsc) 15.543 - improved 36.7%
128 -  96 cycles(tsc) 24.219 ns - 68 cycles(tsc) 17.143 - improved 29.2%
158 - 107 cycles(tsc) 26.817 ns - 69 cycles(tsc) 17.431 - improved 35.5%
250 - 107 cycles(tsc) 26.824 ns - 70 cycles(tsc) 17.730 - improved 34.6%
---
 mm/slub.c |   48 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 10b57a3bb895..40e4b5926311 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2756,12 +2756,26 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
+struct detached_freelist {
+	struct page *page;
+	void *freelist;
+	void *tail_object;
+	int cnt;
+};
+
 /* Note that interrupts must be enabled when calling this function. */
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 {
 	struct kmem_cache_cpu *c;
 	struct page *page;
 	int i;
+	/* Opportunistically delay updating page->freelist, hoping
+	 * next free happens to the same page.  Start building the freelist
+	 * in the page, but keep a local stack ptr to the freelist.  If
+	 * successful, several objects can be transferred to the page with a
+	 * single cmpxchg_double.
+	 */
+	struct detached_freelist df = {0};
 
 	local_irq_disable();
 	c = this_cpu_ptr(s->cpu_slab);
@@ -2778,22 +2792,42 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 
 		page = virt_to_head_page(object);
 
-		if (c->page == page) {
+		if (page == df.page) {
+			/* Opportunity to delay real free */
+			set_freepointer(s, object, df.freelist);
+			df.freelist = object;
+			df.cnt++;
+		} else if (c->page == page) {
 			/* Fastpath: local CPU free */
 			set_freepointer(s, object, c->freelist);
 			c->freelist = object;
 		} else {
-			c->tid = next_tid(c->tid);
-			local_irq_enable();
-			/* Slowpath: overhead locked cmpxchg_double_slab */
-			__slab_free(s, page, object, _RET_IP_, NULL, 1);
-			local_irq_disable();
-			c = this_cpu_ptr(s->cpu_slab);
+			/* Slowpath: Flush delayed free */
+			if (df.page) {
+				c->tid = next_tid(c->tid);
+				local_irq_enable();
+				__slab_free(s, df.page, df.tail_object,
+					    _RET_IP_, df.freelist, df.cnt);
+				local_irq_disable();
+				c = this_cpu_ptr(s->cpu_slab);
+			}
+			/* Start new round of delayed free */
+			df.page = page;
+			df.tail_object = object;
+			set_freepointer(s, object, NULL);
+			df.freelist = object;
+			df.cnt = 1;
 		}
 	}
 exit:
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
+
+	/* Flush detached freelist */
+	if (df.page) {
+		__slab_free(s, df.page, df.tail_object,
+			    _RET_IP_, df.freelist, df.cnt);
+	}
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 



* [PATCH V2 3/3] slub: build detached freelist with look-ahead
  2015-08-24  0:58 [PATCH V2 0/3] slub: introducing detached freelist Jesper Dangaard Brouer
  2015-08-24  0:58 ` [PATCH V2 1/3] slub: extend slowpath __slab_free() to handle bulk free Jesper Dangaard Brouer
  2015-08-24  0:59 ` [PATCH V2 2/3] slub: optimize bulk slowpath free by detached freelist Jesper Dangaard Brouer
@ 2015-08-24  0:59 ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2015-08-24  0:59 UTC (permalink / raw)
  To: linux-mm, Christoph Lameter, akpm
  Cc: aravinda, iamjoonsoo.kim, Paul E. McKenney, linux-kernel,
	Jesper Dangaard Brouer

This change is a more advanced use of the detached freelist.  The
bulk free array is scanned in a progressive manner with a limited
look-ahead facility.

To maintain the same performance level as the previous simple
implementation, the look-ahead has been limited to only 3 objects.
This number has been determined by experimental micro benchmarking.
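
Condensed, the look-ahead scan (simplified from build_detached_freelist()
in the patch below; picking the first unprocessed object as the start of
the list is omitted) works roughly like this:

  lookahead = 0;
  for (i = start_index; i < size; i++) {
  	if (!p[i])
  		continue;		/* already processed earlier */
  	if (virt_to_head_page(p[i]) == df->page) {
  		/* Same page: link object into the detached freelist */
  		set_freepointer(s, p[i], df->freelist);
  		df->freelist = p[i];
  		df->cnt++;
  		p[i] = NULL;		/* mark object processed */
  		if (!lookahead)
  			start_index++;	/* still in an unbroken run */
  	} else if (++lookahead >= 3) {
  		break;			/* limit the look-ahead search */
  	}
  }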

For performance, the free loop in kmem_cache_free_bulk() has been
significantly reorganized, with a focus on making the branches more
predictable for the compiler.  E.g. the per-CPU c->freelist is also
built as a detached freelist, even though freeing directly to it would
be just as fast, because this saves an unpredictable branch.

Another benefit of this change is that kmem_cache_free_bulk() runs
mostly with IRQs enabled.  Local IRQs are only disabled when updating
the per-CPU c->freelist.  This should please Thomas Gleixner.

Pitfall(1): Removed kmem debug support.

Pitfall(2): No BUG_ON() freeing NULL pointers, but the algorithm
            handles and skips these NULL pointers.

Comparison against the previous patch:
 There is some fluctuation in the benchmarks between runs.  To counter
this, I've run some specific[1] bulk sizes, repeated 100 times, and run
dmesg through Rusty's "stats"[2] tool.

Command line:
  sudo dmesg -c ;\
  for x in `seq 100`; do \
    modprobe slab_bulk_test02 bulksz=48 loops=100000 && rmmod slab_bulk_test02; \
    echo $x; \
    sleep 0.${RANDOM} ;\
  done; \
  dmesg | stats

Results:

bulk size:16, average: +2.01 cycles
 Prev: between 19-52 (average: 22.65 stddev:+/-6.9)
 This: between 19-67 (average: 24.67 stddev:+/-9.9)

bulk size:48, average: +1.54 cycles
 Prev: between 23-45 (average: 27.88 stddev:+/-4)
 This: between 24-41 (average: 29.42 stddev:+/-3.7)

bulk size:144, average: +1.73 cycles
 Prev: between 44-76 (average: 60.31 stddev:+/-7.7)
 This: between 49-80 (average: 62.04 stddev:+/-7.3)

bulk size:512, average: +8.94 cycles
 Prev: between 50-68 (average: 60.11 stddev: +/-4.3)
 This: between 56-80 (average: 69.05 stddev: +/-5.2)

bulk size:2048, average: +26.81 cycles
 Prev: between 61-73 (average: 68.10 stddev:+/-2.9)
 This: between 90-104 (average: 94.91 stddev:+/-2.1)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test02.c
[2] https://github.com/rustyrussell/stats

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
bulk- Fallback                  - Bulk API
  1 -  64 cycles(tsc) 16.144 ns - 47 cycles(tsc) 11.931 - improved 26.6%
  2 -  57 cycles(tsc) 14.397 ns - 29 cycles(tsc)  7.368 - improved 49.1%
  3 -  55 cycles(tsc) 13.797 ns - 24 cycles(tsc)  6.003 - improved 56.4%
  4 -  53 cycles(tsc) 13.500 ns - 22 cycles(tsc)  5.543 - improved 58.5%
  8 -  52 cycles(tsc) 13.008 ns - 20 cycles(tsc)  5.047 - improved 61.5%
 16 -  51 cycles(tsc) 12.763 ns - 20 cycles(tsc)  5.015 - improved 60.8%
 30 -  50 cycles(tsc) 12.743 ns - 20 cycles(tsc)  5.062 - improved 60.0%
 32 -  51 cycles(tsc) 12.908 ns - 20 cycles(tsc)  5.089 - improved 60.8%
 34 -  87 cycles(tsc) 21.936 ns - 28 cycles(tsc)  7.006 - improved 67.8%
 48 -  79 cycles(tsc) 19.840 ns - 31 cycles(tsc)  7.755 - improved 60.8%
 64 -  86 cycles(tsc) 21.669 ns - 68 cycles(tsc) 17.203 - improved 20.9%
128 - 101 cycles(tsc) 25.340 ns - 72 cycles(tsc) 18.195 - improved 28.7%
158 - 112 cycles(tsc) 28.152 ns - 73 cycles(tsc) 18.372 - improved 34.8%
250 - 110 cycles(tsc) 27.727 ns - 73 cycles(tsc) 18.430 - improved 33.6%
---
 mm/slub.c |  138 ++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 90 insertions(+), 48 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 40e4b5926311..49ae96f45670 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2763,71 +2763,113 @@ struct detached_freelist {
 	int cnt;
 };
 
-/* Note that interrupts must be enabled when calling this function. */
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+/*
+ * This function extracts objects belonging to the same page, and
+ * builds a detached freelist directly within the given page/objects.
+ * This can happen without any need for synchronization, because the
+ * objects are owned by the running process.  The freelist is built
+ * up as a single linked list in the objects.  The idea is that this
+ * detached freelist can then be bulk transferred to the real
+ * freelist(s), requiring only a single synchronization primitive.
+ */
+static inline int build_detached_freelist(
+	struct kmem_cache *s, size_t size, void **p,
+	struct detached_freelist *df, int start_index)
 {
-	struct kmem_cache_cpu *c;
 	struct page *page;
 	int i;
-	/* Opportunistically delay updating page->freelist, hoping
-	 * next free happens to the same page.  Start building the freelist
-	 * in the page, but keep a local stack ptr to the freelist.  If
-	 * successful, several objects can be transferred to the page with a
-	 * single cmpxchg_double.
-	 */
-	struct detached_freelist df = {0};
+	int lookahead = 0;
+	void *object;
 
-	local_irq_disable();
-	c = this_cpu_ptr(s->cpu_slab);
+	/* Always re-init detached_freelist */
+	do {
+		object = p[start_index];
+		if (object) {
+			/* Start new delayed freelist */
+			df->page = virt_to_head_page(object);
+			df->tail_object = object;
+			set_freepointer(s, object, NULL);
+			df->freelist = object;
+			df->cnt = 1;
+			p[start_index] = NULL; /* mark object processed */
+		} else {
+			df->page = NULL; /* Handle NULL ptr in array */
+		}
+		start_index++;
+	} while (!object && start_index < size);
 
-	for (i = 0; i < size; i++) {
-		void *object = p[i];
+	for (i = start_index; i < size; i++) {
+		object = p[i];
 
-		BUG_ON(!object);
-		/* kmem cache debug support */
-		s = cache_from_obj(s, object);
-		if (unlikely(!s))
-			goto exit;
-		slab_free_hook(s, object);
+		if (!object)
+			continue; /* Skip processed objects */
 
 		page = virt_to_head_page(object);
 
-		if (page == df.page) {
-			/* Opportunity to delay real free */
-			set_freepointer(s, object, df.freelist);
-			df.freelist = object;
-			df.cnt++;
-		} else if (c->page == page) {
-			/* Fastpath: local CPU free */
-			set_freepointer(s, object, c->freelist);
-			c->freelist = object;
+		/* df->page is always set at this point */
+		if (page == df->page) {
+			/* Opportunity to build freelist */
+			set_freepointer(s, object, df->freelist);
+			df->freelist = object;
+			df->cnt++;
+			p[i] = NULL; /* mark object processed */
+			if (!lookahead)
+				start_index++;
 		} else {
-			/* Slowpath: Flush delayed free */
-			if (df.page) {
+			/* Limit look ahead search */
+			if (++lookahead >= 3)
+				return start_index;
+			continue;
+		}
+	}
+	return start_index;
+}
+
+/* Note that interrupts must be enabled when calling this function. */
+void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	struct kmem_cache_cpu *c;
+	int iterator = 0;
+	struct detached_freelist df;
+
+	BUG_ON(!size);
+
+	/* Per CPU ptr may change afterwards */
+	c = this_cpu_ptr(s->cpu_slab);
+
+	while (likely(iterator < size)) {
+		iterator = build_detached_freelist(s, size, p, &df, iterator);
+		if (likely(df.page)) {
+		redo:
+			if (c->page == df.page) {
+				/*
+				 * Local CPU free requires disabling
+				 * IRQs.  It is possible to miss the
+				 * opportunity and instead free to
+				 * page->freelist, but it does not
+				 * matter, as page->freelist will
+				 * eventually be transferred to
+				 * c->freelist
+				 */
+				local_irq_disable();
+				c = this_cpu_ptr(s->cpu_slab); /* reload */
+				if (c->page != df.page) {
+					local_irq_enable();
+					goto redo;
+				}
+				/* Bulk transfer to CPU c->freelist */
+				set_freepointer(s, df.tail_object, c->freelist);
+				c->freelist = df.freelist;
+
 				c->tid = next_tid(c->tid);
 				local_irq_enable();
+			} else {
+				/* Bulk transfer to page->freelist */
 				__slab_free(s, df.page, df.tail_object,
 					    _RET_IP_, df.freelist, df.cnt);
-				local_irq_disable();
-				c = this_cpu_ptr(s->cpu_slab);
 			}
-			/* Start new round of delayed free */
-			df.page = page;
-			df.tail_object = object;
-			set_freepointer(s, object, NULL);
-			df.freelist = object;
-			df.cnt = 1;
 		}
 	}
-exit:
-	c->tid = next_tid(c->tid);
-	local_irq_enable();
-
-	/* Flush detached freelist */
-	if (df.page) {
-		__slab_free(s, df.page, df.tail_object,
-			    _RET_IP_, df.freelist, df.cnt);
-	}
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 


