From: Pekka Enberg <penberg@cs.helsinki.fi>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>, Tejun Heo <tj@kernel.org>,
linux-kernel@vger.kernel.org,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
Mel Gorman <mel@csn.ul.ie>,
Zhang Yanmin <yanmin_zhang@linux.intel.com>
Subject: Re: [this_cpu_xx V6 7/7] this_cpu: slub aggressive use of this_cpu operations in the hotpaths
Date: Tue, 13 Oct 2009 22:44:54 +0300 [thread overview]
Message-ID: <4AD4D8B6.6010700@cs.helsinki.fi> (raw)
In-Reply-To: <alpine.DEB.1.10.0910131509170.32394@gentwo.org>
Hi Christoph,
Christoph Lameter wrote:
> Here are some cycle numbers w/o the slub patches and with. I will post the
> full test results and the patches to do these in kernel tests in a new
> thread. The regression may be due to caching behavior of SLUB that will
> not change with these patches.
>
> Alloc fastpath wins ~ 50%. kfree also has a 50% win if the fastpath is
> being used. First test does 10000 kmallocs and then frees them all.
> Second test alloc one and free one and does that 10000 times.
I wonder how reliable these numbers are. We did similar testing a while
back because we thought the kmalloc-96 caches had weird cache behavior,
but we finally figured out the anomaly was explained by the order in
which the tests were run, not by cache size.
AFAICT, we have a similar artifact in these tests as well:
> no this_cpu ops
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 239 cycles kfree -> 261 cycles
> 10000 times kmalloc(16) -> 249 cycles kfree -> 208 cycles
> 10000 times kmalloc(32) -> 215 cycles kfree -> 232 cycles
> 10000 times kmalloc(64) -> 164 cycles kfree -> 216 cycles
Notice the drop from 32 to 64 and then the jump back up at 128. One
would expect to see a roughly linear increase as the object size grows
and we hit the page allocator more often, no?
> 10000 times kmalloc(128) -> 266 cycles kfree -> 275 cycles
> 10000 times kmalloc(256) -> 478 cycles kfree -> 199 cycles
> 10000 times kmalloc(512) -> 449 cycles kfree -> 201 cycles
> 10000 times kmalloc(1024) -> 484 cycles kfree -> 398 cycles
> 10000 times kmalloc(2048) -> 475 cycles kfree -> 559 cycles
> 10000 times kmalloc(4096) -> 792 cycles kfree -> 506 cycles
> 10000 times kmalloc(8192) -> 753 cycles kfree -> 679 cycles
> 10000 times kmalloc(16384) -> 968 cycles kfree -> 712 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 292 cycles
> 10000 times kmalloc(16)/kfree -> 308 cycles
> 10000 times kmalloc(32)/kfree -> 326 cycles
> 10000 times kmalloc(64)/kfree -> 303 cycles
> 10000 times kmalloc(128)/kfree -> 257 cycles
> 10000 times kmalloc(256)/kfree -> 262 cycles
> 10000 times kmalloc(512)/kfree -> 293 cycles
> 10000 times kmalloc(1024)/kfree -> 262 cycles
> 10000 times kmalloc(2048)/kfree -> 289 cycles
> 10000 times kmalloc(4096)/kfree -> 274 cycles
> 10000 times kmalloc(8192)/kfree -> 265 cycles
> 10000 times kmalloc(16384)/kfree -> 1041 cycles
>
>
> with this_cpu_xx
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 134 cycles kfree -> 212 cycles
> 10000 times kmalloc(16) -> 109 cycles kfree -> 116 cycles
Same artifact here.
> 10000 times kmalloc(32) -> 157 cycles kfree -> 231 cycles
> 10000 times kmalloc(64) -> 168 cycles kfree -> 169 cycles
> 10000 times kmalloc(128) -> 263 cycles kfree -> 260 cycles
> 10000 times kmalloc(256) -> 430 cycles kfree -> 251 cycles
> 10000 times kmalloc(512) -> 415 cycles kfree -> 258 cycles
> 10000 times kmalloc(1024) -> 406 cycles kfree -> 432 cycles
> 10000 times kmalloc(2048) -> 457 cycles kfree -> 579 cycles
> 10000 times kmalloc(4096) -> 624 cycles kfree -> 553 cycles
> 10000 times kmalloc(8192) -> 851 cycles kfree -> 851 cycles
> 10000 times kmalloc(16384) -> 907 cycles kfree -> 722 cycles
And looking at these numbers:
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 232 cycles
> 10000 times kmalloc(16)/kfree -> 150 cycles
> 10000 times kmalloc(32)/kfree -> 278 cycles
> 10000 times kmalloc(64)/kfree -> 263 cycles
> 10000 times kmalloc(128)/kfree -> 280 cycles
> 10000 times kmalloc(256)/kfree -> 279 cycles
> 10000 times kmalloc(512)/kfree -> 299 cycles
> 10000 times kmalloc(1024)/kfree -> 289 cycles
> 10000 times kmalloc(2048)/kfree -> 288 cycles
> 10000 times kmalloc(4096)/kfree -> 321 cycles
> 10000 times kmalloc(8192)/kfree -> 285 cycles
> 10000 times kmalloc(16384)/kfree -> 1002 cycles
If there's a 50% improvement in the kmalloc() path, why does the
this_cpu() version seem to be roughly as fast as the mainline version?
Pekka