public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	hpa@zytor.com, jeremy@xensource.com, chrisw@sous-sol.org,
	zach@vmware.com, rusty@rustcorp.com.au,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
Date: Tue, 20 Jan 2009 15:03:24 +0100
Message-ID: <20090120140324.GA26424@elte.hu>
In-Reply-To: <20090120112634.GA20858@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> > Times I believe are in nanoseconds for lmbench, anyway lower is 
> > better.
> > 
> > non pv   AVG=464.22 STD=5.56
> > paravirt AVG=502.87 STD=7.36
> > 
> > Nearly 10% performance drop here, which is quite a bit... hopefully 
> > people are testing the speed of their PV implementations against 
> > non-PV bare metal :)
> 
> Ouch, that looks unacceptably expensive. All the major distros turn 
> CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express 
> promise to have no measurable runtime overhead.

Here are some more precise stats, gathered via hw counters on a 
perfcounters kernel using 'timec', running a modified version of the 
'mmap performance stress-test' app I wrote years ago.

The MM benchmark app can be downloaded from:

   http://redhat.com/~mingo/misc/mmap-perf.c

timec.c can be picked up from:

   http://redhat.com/~mingo/perfcounters/timec.c

mmap-perf conducts 1 million mmap()/munmap()/mremap() calls and, with a 
certain probability, touches the mapped area as well. The patterns are 
pseudo-random and the random seed is always initialized to the same value, 
so repeated runs produce the exact same mmap sequence.
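
To illustrate why a fixed seed makes such a workload repeatable, here is a 
minimal Python sketch. It is not the mmap-perf.c source: the function name, 
the seed, the 50/50 map/unmap split, the touch probability and the region 
sizes are all assumptions chosen for illustration, and mremap() is left out.

```python
import mmap
import random

PAGE = mmap.PAGESIZE


def run_workload(iterations, seed=1234, touch_chance=0.25, max_pages=16):
    """Map/unmap anonymous regions in a seeded pseudo-random pattern.

    All parameters are hypothetical. Returns the list of (operation, pages)
    steps so that two runs can be compared.
    """
    rng = random.Random(seed)   # fixed seed -> identical sequence on every run
    live = []                   # regions that are currently mapped
    trace = []
    for _ in range(iterations):
        if live and rng.random() < 0.5:
            # Unmap a pseudo-randomly chosen live region.
            region = live.pop(rng.randrange(len(live)))
            trace.append(("munmap", len(region) // PAGE))
            region.close()
        else:
            pages = rng.randint(1, max_pages)
            region = mmap.mmap(-1, pages * PAGE)   # anonymous mapping
            if rng.random() < touch_chance:
                # Touch one byte per page so the pages are really faulted in.
                for offset in range(0, len(region), PAGE):
                    region[offset] = 1
                trace.append(("mmap+touch", pages))
            else:
                trace.append(("mmap", pages))
            live.append(region)
    for region in live:
        region.close()
    return trace


if __name__ == "__main__":
    first = run_workload(1000)
    second = run_workload(1000)
    print("identical sequences:", first == second)
```

Because every decision is drawn from the same seeded generator, two runs 
issue the same operations in the same order, which is what keeps counts 
such as pagefaults comparable between kernels.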

I ran the test with a single thread and bound to a single core:

  # taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1

[ I ran it as root - so that kernel-space hardware-counter statistics are 
  included as well. ]
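
taskset reads its mask argument as hexadecimal, so the '2' above has only 
bit 1 set and pins the task to CPU 1. A small sketch of that decoding (the 
helper name is made up for illustration):

```python
def cpus_from_mask(mask_hex):
    """Return the CPU numbers selected by a taskset-style hex affinity mask."""
    mask = int(mask_hex, 16)   # accepts "2" as well as "0x2"
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    for mask in ("1", "2", "3", "0x10"):
        print(mask, "->", cpus_from_mask(mask))
```

On Linux the same pinning can be requested from inside a Python process 
with os.sched_setaffinity(0, {1}).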

The results are surprisingly candid about the true overhead that 
paravirt_ops (CONFIG_PARAVIRT=y) adds on a native kernel:

-------------------------------------------------------------------
| Performance counter stats for './mmap-perf'                     |
-------------------------------------------------------------------
|                |
|  x86-defconfig |   PARAVIRT=y                               delta
|------------------------------------------------------------------
|                |
|    1311.554526 |  1360.624932  task clock ticks (msecs)    +3.74%
|                |
|              1 |            1  CPU migrations
|             91 |           79  context switches
|          55945 |        55943  pagefaults
|    ..............................................................
|     3781392474 |   3918777174  CPU cycles                  +3.63%
|     1957153827 |   2161280486  instructions               +10.43%
|       50234816 |     51303520  cache references            +2.13%
|        5428258 |      5583728  cache misses                +2.86%
|                |
|    1314.782469 |  1363.694447  time elapsed (msecs)        +3.72%
|                |
-------------------------------------------------------------------

The most surprising element is that in the paravirt_ops case we run 204 
million more instructions - out of the ~2000 million instructions total. 

That's an increase of over 10%!
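
The relative deltas can be recomputed from the raw counter values in the 
table. A short sketch (the dictionary layout and function name are just for 
illustration, the numbers are copied from the table above):

```python
# (x86-defconfig, PARAVIRT=y) pairs copied from the counter table.
STATS = {
    "task clock ticks (msecs)": (1311.554526, 1360.624932),
    "CPU cycles": (3781392474, 3918777174),
    "instructions": (1957153827, 2161280486),
    "cache references": (50234816, 51303520),
    "cache misses": (5428258, 5583728),
    "time elapsed (msecs)": (1314.782469, 1363.694447),
}


def overhead_pct(base, paravirt):
    """Relative increase of the PARAVIRT=y value over the baseline, in %."""
    return (paravirt - base) / base * 100.0


if __name__ == "__main__":
    for name, (base, paravirt) in STATS.items():
        print(f"{name:26s} {overhead_pct(base, paravirt):+6.2f}%")
    base, paravirt = STATS["instructions"]
    print("extra instructions:", paravirt - base)
```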

The numbers show the expected cache risks as well, and they probably 
understate them: I ran this on an Extreme Edition CPU which has a ton of 
L2 cache [4MB], which mutes L2 cache misses quite a bit.

Note that this workload tests a broader range of MM related codepaths - 
not just pure pagefault costs.

	Ingo

ps. Measurement methodology:

    The software counters show that the test was indeed done on an idle 
    system: there are no CPU migrations (the task is affine), nor any 
    significant context-switches, and the pagefault count is essentially 
    the same as well. (because this is a fully repeatable workload.)

    The numbers are a representative sample from a series of more than 10 
    test runs, on an otherwise idle system. Measurement noise is very low:

      3906920196  CPU cycles           (events)
      3907556124  CPU cycles           (events)
      3907902335  CPU cycles           (events)
      3914423870  CPU cycles           (events)
      3915642464  CPU cycles           (events)
      3916134988  CPU cycles           (events)
      3916840093  CPU cycles           (events)
      3918777174  CPU cycles           (events)
      3918993251  CPU cycles           (events)
      3919907192  CPU cycles           (events)

    The max/min spread of 10 runs is 0.3%, so the precision of this 
    measurement is in the 0.1% range - more than enough to be conclusive.
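
    The spread figure can be checked directly from the ten cycle counts 
    listed above. A minimal sketch (variable and function names are 
    illustrative only):

```python
# The ten CPU-cycle samples listed above, in ascending order.
CYCLES = [
    3906920196, 3907556124, 3907902335, 3914423870, 3915642464,
    3916134988, 3916840093, 3918777174, 3918993251, 3919907192,
]


def spread_pct(samples):
    """Max/min spread of the samples, relative to the minimum, in percent."""
    return (max(samples) - min(samples)) / min(samples) * 100.0


if __name__ == "__main__":
    print(f"runs: {len(CYCLES)}")
    print(f"max/min spread: {spread_pct(CYCLES):.2f}%")
```

    The spread is roughly a tenth of the +3.63% cycle delta measured 
    between the two kernels.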

    The max/min spread of the instruction counts is even better: in the 
    0.01% range. (that is because exactly the same workload is executed - 
    only timer IRQs and small disturbances cause noise here.)
