From: Jesper Dangaard Brouer <brouer@redhat.com>
To: linux-mm@kvack.org, Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: netdev@vger.kernel.org,
Alexander Duyck <alexander.duyck@gmail.com>,
Jesper Dangaard Brouer <brouer@redhat.com>
Subject: [PATCH 0/7] slub: bulk alloc and free for slub allocator
Date: Mon, 15 Jun 2015 17:51:45 +0200
Message-ID: <20150615155053.18824.617.stgit@devil>
With this patchset the SLUB allocator now has both bulk alloc and
bulk free implemented.
(This patchset is based on DaveM's net-next tree, on top of commit
c3eee1fb1d308. The patchset was tested applied on top of the volatile
linux-next commit aa036f86e1bf ("slub bulk alloc: extract objects from
the per cpu slab").)
This mostly optimizes the "fastpath", where objects are available on
the per-CPU fastpath page. It mostly amortizes the less-heavy
non-locked cmpxchg_double used on the fastpath.
The "fallback bulking" (e.g. __kmem_cache_free_bulk) provides a good
basis for comparison, but to avoid counting the overhead of the
function call in benchmarking[1] I've used inlined versions of
these.
Tested on a (very fast) i7-4790K CPU @ 4.00GHz, so look at the cycle
counts (the nanosecond measurements are very low given the clock rate).
Baseline normal fastpath (alloc+free cost): 43 cycles(tsc) 10.814 ns
Bulk - Fallback bulking           - Fastpath bulking
   1 -  47 cycles(tsc) 11.921 ns  -  45 cycles(tsc) 11.461 ns  improved  4.3%
   2 -  46 cycles(tsc) 11.649 ns  -  28 cycles(tsc)  7.023 ns  improved 39.1%
   3 -  46 cycles(tsc) 11.550 ns  -  22 cycles(tsc)  5.671 ns  improved 52.2%
   4 -  45 cycles(tsc) 11.398 ns  -  19 cycles(tsc)  4.967 ns  improved 57.8%
   8 -  45 cycles(tsc) 11.303 ns  -  17 cycles(tsc)  4.298 ns  improved 62.2%
  16 -  44 cycles(tsc) 11.221 ns  -  17 cycles(tsc)  4.423 ns  improved 61.4%
  30 -  75 cycles(tsc) 18.894 ns  -  57 cycles(tsc) 14.497 ns  improved 24.0%
  32 -  73 cycles(tsc) 18.491 ns  -  56 cycles(tsc) 14.227 ns  improved 23.3%
  34 -  75 cycles(tsc) 18.962 ns  -  58 cycles(tsc) 14.638 ns  improved 22.7%
  48 -  80 cycles(tsc) 20.049 ns  -  64 cycles(tsc) 16.247 ns  improved 20.0%
  64 -  87 cycles(tsc) 21.929 ns  -  74 cycles(tsc) 18.598 ns  improved 14.9%
 128 -  98 cycles(tsc) 24.511 ns  -  89 cycles(tsc) 22.295 ns  improved  9.2%
 158 - 101 cycles(tsc) 25.389 ns  -  93 cycles(tsc) 23.390 ns  improved  7.9%
 250 - 104 cycles(tsc) 26.170 ns  - 100 cycles(tsc) 25.112 ns  improved  3.8%
Benchmarking shows impressive improvements in the "fastpath" with a
small number of objects in the working set. Once the working set
grows enough to activate the "slowpath" (which contains the heavier
locked cmpxchg_double), the improvement decreases.
I'm currently also working on optimizing the "slowpath" (as the
network stack use-case hits this), but this patchset should provide a
good foundation for further improvements.
The rest of my patch queue in this area needs some more work, but
preliminary results are good. I'm attending the Netfilter Workshop[2]
next week, and will hopefully return to working on further
improvements in this area afterwards.
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
[2] http://workshop.netfilter.org/2015/
---
Christoph Lameter (2):
      slab: infrastructure for bulk object allocation and freeing
      slub bulk alloc: extract objects from the per cpu slab
Jesper Dangaard Brouer (5):
      slub: reduce indention level in kmem_cache_alloc_bulk()
      slub: fix error path bug in kmem_cache_alloc_bulk
      slub: kmem_cache_alloc_bulk() move clearing outside IRQ disabled section
      slub: improve bulk alloc strategy
      slub: initial bulk free implementation
 include/linux/slab.h |   10 +++++
 mm/slab.c            |   13 +++++++
 mm/slab.h            |    9 +++++
 mm/slab_common.c     |   23 ++++++++++++
 mm/slob.c            |   13 +++++++
 mm/slub.c            |   93 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 161 insertions(+)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer