From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Christoph Lameter <cl@linux.com>,
akpm@linuxfoundation.org, rostedt@goodmis.org,
LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Pekka Enberg <penberg@kernel.org>,
brouer@redhat.com
Subject: Re: [PATCH 3/7] slub: Do not use c->page on free
Date: Tue, 16 Dec 2014 15:05:37 +0100
Message-ID: <20141216150537.25c72553@redhat.com>
In-Reply-To: <CAPAsAGyGXSP-2eY1CQS1jDpJq89kwpCuJm4ZBa3cYDGkv_oTxA@mail.gmail.com>
On Tue, 16 Dec 2014 11:54:12 +0400
Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> 2014-12-16 5:42 GMT+03:00 Joonsoo Kim <iamjoonsoo.kim@lge.com>:
> > On Mon, Dec 15, 2014 at 08:16:00AM -0600, Christoph Lameter wrote:
> >> On Mon, 15 Dec 2014, Joonsoo Kim wrote:
> >>
> >> > > +static bool same_slab_page(struct kmem_cache *s, struct page *page, void *p)
> >> > > +{
> >> > > + long d = p - page->address;
> >> > > +
> >> > > + return d > 0 && d < (1 << MAX_ORDER) && d < (compound_order(page) << PAGE_SHIFT);
> >> > > +}
> >> > > +
> >> >
> >> > Sometimes, compound_order() induces one more cacheline access, because
> >> > compound_order() accesses the second struct page in order to get the order. Is there
> >> > any way to remove this?
> >>
> >> I already have code there to avoid the access if it's within a MAX_ORDER
> >> page. We could probably go for a smaller setting there. PAGE_COSTLY_ORDER?
> >
> > That is the solution to avoid the compound_order() call when the slab of
> > the object does not match the per-cpu slab.
> >
> > What I'm asking is whether there is a way to avoid the compound_order() call when the slab
> > of the object does match the per-cpu slab.
> >
>
> Can we use page->objects for that?
>
> Like this:
>
> return d > 0 && d < page->objects * s->size;
I gave this change a quick micro-benchmark spin (with Christoph's
tool); the results are below.

Notice that "2. Kmalloc: alloc/free test" improves for small object
sizes, which brings it back to roughly what it was before this patchset.
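
For reference, a minimal userspace sketch of the check being benchmarked, assuming simplified stand-ins for the kernel structures. The two struct definitions below are hypothetical and carry only the fields the expression reads; they are not the kernel's real layouts. The point is that the bound comes from page->objects and s->size, so compound_order() is never called:

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; only the fields the check reads. */
struct kmem_cache {
	unsigned int size;	/* bytes per object slot */
};

struct page {
	char *address;		/* base address of the slab */
	unsigned int objects;	/* number of object slots in the slab */
};

/*
 * Variant suggested in the thread: bound the offset by the size of the
 * slab's object area instead of calling compound_order().
 *
 * Note: with "d > 0" exactly as quoted, a pointer equal to the slab
 * base is reported as not matching.
 *
 * In this sketch p is expected to point into (or one past) the buffer
 * that page->address refers to, so the pointer subtraction is defined.
 */
static bool same_slab_page(const struct kmem_cache *s,
			   const struct page *page, const void *p)
{
	long d = (const char *)p - page->address;

	return d > 0 && d < (long)page->objects * (long)s->size;
}
```

With a 4096-byte buffer as the slab, size = 64 and objects = 64, a pointer at offset 64 matches, while offsets 0 and 4096 do not.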
Before (with curr patchset):
============================
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 50 cycles kfree -> 60 cycles
10000 times kmalloc(16) -> 52 cycles kfree -> 60 cycles
10000 times kmalloc(32) -> 56 cycles kfree -> 64 cycles
10000 times kmalloc(64) -> 67 cycles kfree -> 72 cycles
10000 times kmalloc(128) -> 86 cycles kfree -> 79 cycles
10000 times kmalloc(256) -> 97 cycles kfree -> 110 cycles
10000 times kmalloc(512) -> 88 cycles kfree -> 114 cycles
10000 times kmalloc(1024) -> 91 cycles kfree -> 115 cycles
10000 times kmalloc(2048) -> 119 cycles kfree -> 131 cycles
10000 times kmalloc(4096) -> 159 cycles kfree -> 163 cycles
10000 times kmalloc(8192) -> 269 cycles kfree -> 226 cycles
10000 times kmalloc(16384) -> 498 cycles kfree -> 291 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 112 cycles
10000 times kmalloc(16)/kfree -> 118 cycles
10000 times kmalloc(32)/kfree -> 117 cycles
10000 times kmalloc(64)/kfree -> 122 cycles
10000 times kmalloc(128)/kfree -> 133 cycles
10000 times kmalloc(256)/kfree -> 79 cycles
10000 times kmalloc(512)/kfree -> 79 cycles
10000 times kmalloc(1024)/kfree -> 79 cycles
10000 times kmalloc(2048)/kfree -> 72 cycles
10000 times kmalloc(4096)/kfree -> 78 cycles
10000 times kmalloc(8192)/kfree -> 78 cycles
10000 times kmalloc(16384)/kfree -> 596 cycles
After (with proposed change):
=============================
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 53 cycles kfree -> 62 cycles
10000 times kmalloc(16) -> 53 cycles kfree -> 64 cycles
10000 times kmalloc(32) -> 57 cycles kfree -> 66 cycles
10000 times kmalloc(64) -> 68 cycles kfree -> 72 cycles
10000 times kmalloc(128) -> 77 cycles kfree -> 80 cycles
10000 times kmalloc(256) -> 98 cycles kfree -> 110 cycles
10000 times kmalloc(512) -> 87 cycles kfree -> 113 cycles
10000 times kmalloc(1024) -> 90 cycles kfree -> 116 cycles
10000 times kmalloc(2048) -> 116 cycles kfree -> 131 cycles
10000 times kmalloc(4096) -> 160 cycles kfree -> 164 cycles
10000 times kmalloc(8192) -> 269 cycles kfree -> 226 cycles
10000 times kmalloc(16384) -> 499 cycles kfree -> 295 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 74 cycles
10000 times kmalloc(16)/kfree -> 73 cycles
10000 times kmalloc(32)/kfree -> 73 cycles
10000 times kmalloc(64)/kfree -> 74 cycles
10000 times kmalloc(128)/kfree -> 73 cycles
10000 times kmalloc(256)/kfree -> 72 cycles
10000 times kmalloc(512)/kfree -> 73 cycles
10000 times kmalloc(1024)/kfree -> 72 cycles
10000 times kmalloc(2048)/kfree -> 73 cycles
10000 times kmalloc(4096)/kfree -> 72 cycles
10000 times kmalloc(8192)/kfree -> 72 cycles
10000 times kmalloc(16384)/kfree -> 556 cycles
(kernel 3.18.0-net-next+ SMP PREEMPT on top of f96fe225677)
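
To make the comparison easier to read, here is a small sketch that computes the per-size difference for test 2 from the figures above. The array and function names are made up for illustration; the cycle counts are copied from the two "2. Kmalloc: alloc/free test" tables:

```c
#include <stdio.h>

#define NR_SIZES 12

/* Figures copied from the "2. Kmalloc: alloc/free test" tables above. */
static const unsigned int obj_size[NR_SIZES] = {
	8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384
};
static const int before_cycles[NR_SIZES] = {
	112, 118, 117, 122, 133, 79, 79, 79, 72, 78, 78, 596
};
static const int after_cycles[NR_SIZES] = {
	74, 73, 73, 74, 73, 72, 73, 72, 73, 72, 72, 556
};

/* Percentage of cycles saved; negative if the "after" run is slower. */
static double pct_reduction(int before, int after)
{
	return 100.0 * (before - after) / before;
}

/* Print one summary line per object size. */
static void print_alloc_free_deltas(void)
{
	for (int i = 0; i < NR_SIZES; i++)
		printf("kmalloc(%u)/kfree: %d -> %d cycles (%+d, %.1f%% reduction)\n",
		       obj_size[i], before_cycles[i], after_cycles[i],
		       after_cycles[i] - before_cycles[i],
		       pct_reduction(before_cycles[i], after_cycles[i]));
}
```

For sizes 8 to 128 the combined alloc/free cost drops from 112-133 cycles to 73-74 cycles; kmalloc(2048) is the only size that gets slower, by one cycle (72 -> 73).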
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer