From: Jesper Dangaard Brouer <brouer@redhat.com>
To: linux-mm@kvack.org, Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	brouer@redhat.com
Subject: Re: [PATCH 3/3] slub: build detached freelist with look-ahead
Date: Thu, 16 Jul 2015 11:57:56 +0200
Message-ID: <20150716115756.311496af@redhat.com>
In-Reply-To: <20150715160212.17525.88123.stgit@devil>


On Wed, 15 Jul 2015 18:02:39 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Results:
[...]
> bulk- Fallback                  - Bulk API
>   1 -  64 cycles(tsc) 16.144 ns - 47 cycles(tsc) 11.931 - improved 26.6%
>   2 -  57 cycles(tsc) 14.397 ns - 29 cycles(tsc)  7.368 - improved 49.1%
>   3 -  55 cycles(tsc) 13.797 ns - 24 cycles(tsc)  6.003 - improved 56.4%
>   4 -  53 cycles(tsc) 13.500 ns - 22 cycles(tsc)  5.543 - improved 58.5%
>   8 -  52 cycles(tsc) 13.008 ns - 20 cycles(tsc)  5.047 - improved 61.5%
>  16 -  51 cycles(tsc) 12.763 ns - 20 cycles(tsc)  5.015 - improved 60.8%
>  30 -  50 cycles(tsc) 12.743 ns - 20 cycles(tsc)  5.062 - improved 60.0%
>  32 -  51 cycles(tsc) 12.908 ns - 20 cycles(tsc)  5.089 - improved 60.8%
>  34 -  87 cycles(tsc) 21.936 ns - 28 cycles(tsc)  7.006 - improved 67.8%
>  48 -  79 cycles(tsc) 19.840 ns - 31 cycles(tsc)  7.755 - improved 60.8%
>  64 -  86 cycles(tsc) 21.669 ns - 68 cycles(tsc) 17.203 - improved 20.9%
> 128 - 101 cycles(tsc) 25.340 ns - 72 cycles(tsc) 18.195 - improved 28.7%
> 158 - 112 cycles(tsc) 28.152 ns - 73 cycles(tsc) 18.372 - improved 34.8%
> 250 - 110 cycles(tsc) 27.727 ns - 73 cycles(tsc) 18.430 - improved 33.6%


Something interesting happens when I tune the slub cache...

I was wondering what happens if I "give" slub more per-CPU partial
pages.  In my benchmark, 250 is the "max" bulk working set.

I tuned slub for the 256-byte object size, telling it that each CPU's
partial list is allowed to hold 256 objects (cpu_partial):

 sudo sh -c 'echo 256 > /sys/kernel/slab/:t-0000256/cpu_partial'

Adjusting 'min_partial' also affects __slab_free(): a slab that becomes
empty is only discarded when node->nr_partial >= s->min_partial.  Thus, in
our test, min_partial=9 results in keeping up to 9 pages (32 * 9 = 288
objects) on the node's partial list:

 sudo sh -c 'echo 9   > /sys/kernel/slab/:t-0000256/min_partial'
 sudo grep -H . /sys/kernel/slab/:t-0000256/*
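
For completeness, here is a tiny helper that just wraps the commands above
and verifies the settings.  It assumes (as on my system) that the 256-byte
kmalloc cache is merged under the /sys/kernel/slab/:t-0000256 alias, and it
must run as root:

 #!/bin/sh
 # Apply the SLUB tuning used in this test and dump the resulting settings.
 # Assumption: the 256-byte cache is merged under :t-0000256 (check where
 # the kmalloc-256 symlink in /sys/kernel/slab/ points on your system).
 CACHE=/sys/kernel/slab/:t-0000256

 echo 256 > $CACHE/cpu_partial   # per-CPU partial lists may hold 256 objects
 echo 9   > $CACHE/min_partial   # keep up to 9 pages on the node partial list

 # Sanity check: objs_per_slab * min_partial should cover the bulk
 # working set of 250 objects (here 32 * 9 = 288).
 grep -H . $CACHE/object_size $CACHE/objs_per_slab \
           $CACHE/cpu_partial $CACHE/min_partial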

First, notice that the normal slub fastpath is: 47 cycles(tsc) 11.894 ns

Patch03-TUNED-run01:
bulk-  Fallback                 - Bulk-API
  1 -  63 cycles(tsc) 15.866 ns - 46 cycles(tsc) 11.653 ns - improved 27.0%
  2 -  56 cycles(tsc) 14.137 ns - 28 cycles(tsc)  7.106 ns - improved 50.0%
  3 -  54 cycles(tsc) 13.623 ns - 23 cycles(tsc)  5.845 ns - improved 57.4%
  4 -  53 cycles(tsc) 13.345 ns - 21 cycles(tsc)  5.316 ns - improved 60.4%
  8 -  51 cycles(tsc) 12.960 ns - 20 cycles(tsc)  5.187 ns - improved 60.8%
 16 -  50 cycles(tsc) 12.743 ns - 20 cycles(tsc)  5.091 ns - improved 60.0%
 30 -  80 cycles(tsc) 20.153 ns - 28 cycles(tsc)  7.054 ns - improved 65.0%
 32 -  82 cycles(tsc) 20.621 ns - 33 cycles(tsc)  8.392 ns - improved 59.8%
 34 -  80 cycles(tsc) 20.125 ns - 32 cycles(tsc)  8.046 ns - improved 60.0%
 48 -  91 cycles(tsc) 22.887 ns - 30 cycles(tsc)  7.655 ns - improved 67.0%
 64 -  85 cycles(tsc) 21.362 ns - 36 cycles(tsc)  9.141 ns - improved 57.6%
128 - 101 cycles(tsc) 25.481 ns - 33 cycles(tsc)  8.286 ns - improved 67.3%
158 - 103 cycles(tsc) 25.909 ns - 36 cycles(tsc)  9.179 ns - improved 65.0%
250 - 105 cycles(tsc) 26.481 ns - 39 cycles(tsc)  9.994 ns - improved 62.9%

Notice how ALL of the bulk sizes are now faster than the 47 cycles of
the normal slub fastpath.  This is amazing!

Somewhat strangely, the tuning didn't seem to help the fallback version.

Here are the other benchmark numbers from the patch description, with my
measurements after the above tuning added below each quote:

On Wed, 15 Jul 2015 18:02:39 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Results:
> 
> bulk size:16, average: +2.01 cycles
>  Prev: between 19-52 (average: 22.65 stddev:+/-6.9)
>  This: between 19-67 (average: 24.67 stddev:+/-9.9)

bulk16:  19-39(average: 21.68+/-4.5) cycles(tsc)
 
> bulk size:48, average: +1.54 cycles
>  Prev: between 23-45 (average: 27.88 stddev:+/-4)
>  This: between 24-41 (average: 29.42 stddev:+/-3.7)

bulk48:  25-38(average: 28.4+/-2.3) cycles(tsc)
 
> bulk size:144, average: +1.73 cycles
>  Prev: between 44-76 (average: 60.31 stddev:+/-7.7)
>  This: between 49-80 (average: 62.04 stddev:+/-7.3)

bulk144: 31-45(average: 34.54+/-3.4) cycles(tsc)

> bulk size:512, average: +8.94 cycles
>  Prev: between 50-68 (average: 60.11 stddev: +/-4.3)
>  This: between 56-80 (average: 69.05 stddev: +/-5.2)

bulk512: 38-68(average: 44.48+/-7.1) cycles(tsc)
(quite good, given that the working set we tuned for is 256)

> bulk size:2048, average: +26.81 cycles
>  Prev: between 61-73 (average: 68.10 stddev:+/-2.9)
>  This: between 90-104(average: 94.91 stddev:+/-2.1)

bulk2048: 80-87(average: 83.19+/-1.1) cycles(tsc)
 
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test02.c
> [2] https://github.com/rustyrussell/stats
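
For anyone who wants to reproduce these numbers: they come from the
slab_bulk_test02 module in [1].  A rough recipe (the build/load steps and
the exact module path are my assumptions about the prototype-kernel tree;
the results are printed to the kernel log):

 git clone https://github.com/netoptimizer/prototype-kernel
 cd prototype-kernel/kernel
 make                                 # out-of-tree build against the running kernel
 sudo insmod mm/slab_bulk_test02.ko   # benchmark runs at module load
 sudo rmmod slab_bulk_test02
 dmesg | tail -n 40                   # the cycles(tsc) / ns numbers land here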

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
