Re: performance counter 20% error finding retired instruction count

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@elte.hu>
To: Vince Weaver <vince@deater.net>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: performance counter 20% error finding retired instruction count
Date: Wed, 24 Jun 2009 17:10:10 +0200	[thread overview]
Message-ID: <20090624151010.GA12799@elte.hu> (raw)
In-Reply-To: <Pine.LNX.4.64.0906240937120.10620@pianoman.cluster.toy>


* Vince Weaver <vince@deater.net> wrote:

> Hello
>
> As an aside, is it time to set up a dedicated Performance Counters
> for Linux mailing list?   (Hereafter referred to as p10c7l to avoid
> confusion with the other implementations that have already taken
> all the good abbreviated forms of the concept).

('perfcounters' is the name of the subsystem/feature and it's 
unique.)

> [...]  If/when the infrastructure appears in a released kernel, 
> there's going to be a lot of chatter by people who use performance 
> counters and suddenly find they are stuck with a huge step 
> backwards in functionality.  And asking Fortran programmers to 
> provide kernel patches probably won't be a productive response.  
> But I digress.
>
> I was trying to get an exact retired instruction count from 
> p10c7l. I am using the test million.s, available here
>
>  ( http://www.csl.cornell.edu/~vince/projects/perf_counter/million.s )
>
> It should count exactly one million instructions.
>
> Tests with valgrind and qemu show that it does.
>
> Using perfmon2 on Pentium Pro, PII, PIII, P4, Athlon32, and Phenom
> all give the proper result:
>
> tobler:~% pfmon -e retired_instructions ./million
> 1000002 RETIRED_INSTRUCTIONS
>
>    ( it is 1,000,002 +/-2 because on most x86 architectures retired
>      instruction count includes any hardware interrupts that might
>      happen at the time.  It woud be a great feature if p10c7l
>      could add some way of gathering the per-process hardware
>      instruction count statistic to help quantify that).
>
> Yet with perf on the same Athlon32 machine (using
> kernel 2.6.30-03984-g45e3e19) gives:
>
> tobler:~%perf stat ./million
>
>  Performance counter stats for './million':
>
>        1.519366  task-clock-ticks     #       0.835 CPU utilization factor
>               3  context-switches     #       0.002 M/sec
>               0  CPU-migrations       #       0.000 M/sec
>              53  page-faults          #       0.035 M/sec
>         2483822  cycles               #    1634.775 M/sec
>         1240849  instructions         #     816.689 M/sec # 0.500 per cycle
>          612685  cache-references     #     403.250 M/sec
>            3564  cache-misses         #       2.346 M/sec
>
>  Wall-clock time elapsed:     1.819226 msecs
>
> Running multiple times gives:
>    1240849
>    1257312
>    1242313
>
> Which is a varying error of at least 20% which isn't even 
> consistent.  Is this because of sampling?  The documentation 
> doesn't really warn about this as far as I can tell.
>
> Thanks for any help resolving this problem

Thanks for the question! There's still gaps in the documentation so 
let me explain the basics here:

'perf stat' counts the true cost of executing the command in 
question, including the costs of:

   fork()ing the task
   exec()-ing it
   the ELF loader resolving dynamic symbols
   the app hitting various pagefaults that instantiate its pagetables

etc.

Those operations are pretty 'noisy' on a typical CPU, with lots of 
cache effects, so the noise you see is real.

You can eliminate much of the noise by only counting user-space 
instructions, as much of the command startup cost is in 
kernel-space.

Running your test-app that way can be done the following way:

 $ perf stat --repeat 10 -e 0:1:u ./million

 Performance counter stats for './million' (10 runs):

        1002106  instructions           ( +-   0.015% )

    0.000599029  seconds time elapsed.

( note the --repeat feature of perf stat - it does a loop of command 
  executions and observes the noise and displays it. )

Those ~2100 instructions are executed by your app: as the ELF 
dynamic loader starts up your test-app.

If you have some tool that reports less than that then that tool is 
not being truthful about the true overhead of your application.

Also note that applications that only execute 1 million instructions 
are very, very rare - a modern CPU can execute billions of 
instructions, per second, per core.

So i usually test a reference app that is more realistic, that 
executes 1 billion instructions:

 $ perf stat --repeat 10 -e 0:1:u ./loop_1b_instructions

 Performance counter stats for './loop_1b_instructions' (10 runs):

     1000079797  instructions           ( +-   0.000% )

    0.239947420  seconds time elapsed.

the noise there is very low. (despite 230 milliseconds still being a 
very short runtime)

Hope this helps - thanks,

	Ingo

next prev parent reply	other threads:[~2009-06-24 15:10 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-24 13:59 performance counter 20% error finding retired instruction count Vince Weaver
2009-06-24 15:10 ` Ingo Molnar [this message]
2009-06-25  2:12   ` Vince Weaver
2009-06-25  6:50     ` Peter Zijlstra
2009-06-25  9:13       ` Ingo Molnar
2009-06-26 18:22   ` Vince Weaver
2009-06-26 19:12     ` Peter Zijlstra
2009-06-27  5:32       ` Ingo Molnar
2009-06-26 19:23     ` Vince Weaver
2009-06-27  6:04       ` performance counter ~0.4% " Ingo Molnar
2009-06-27  6:44         ` [numbers] perfmon/pfmon overhead of 17%-94% Ingo Molnar
2009-06-29 18:25           ` Vince Weaver
2009-06-29 21:02             ` Ingo Molnar
2009-07-02 21:07               ` Vince Weaver
2009-07-03  7:58                 ` Ingo Molnar
2009-07-03 21:43                   ` Vince Weaver
2009-07-03 18:31                 ` Andi Kleen
2009-07-03 21:25                   ` Vince Weaver
2009-07-03 23:40                     ` Andi Kleen
2009-06-29 23:46             ` [patch] perf_counter: Add enable-on-exec attribute Ingo Molnar
2009-06-29 23:55             ` [numbers] perfmon/pfmon overhead of 17%-94% Ingo Molnar
2009-06-30  0:05             ` Ingo Molnar
2009-06-27  6:48         ` performance counter ~0.4% error finding retired instruction count Paul Mackerras
2009-06-27 17:28           ` Ingo Molnar
2009-06-29  2:12             ` Paul Mackerras
2009-06-29  2:13               ` Paul Mackerras
2009-06-29  3:48               ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090624151010.GA12799@elte.hu \
    --to=mingo@elte.hu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox