From: Ingo Molnar <mingo@elte.hu>
To: Vince Weaver <vince@deater.net>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: performance counter 20% error finding retired instruction count
Date: Wed, 24 Jun 2009 17:10:10 +0200
Message-ID: <20090624151010.GA12799@elte.hu>
In-Reply-To: <Pine.LNX.4.64.0906240937120.10620@pianoman.cluster.toy>
* Vince Weaver <vince@deater.net> wrote:
> Hello
>
> As an aside, is it time to set up a dedicated Performance Counters
> for Linux mailing list? (Hereafter referred to as p10c7l to avoid
> confusion with the other implementations that have already taken
> all the good abbreviated forms of the concept).
('perfcounters' is the name of the subsystem/feature and it's
unique.)
> [...] If/when the infrastructure appears in a released kernel,
> there's going to be a lot of chatter by people who use performance
> counters and suddenly find they are stuck with a huge step
> backwards in functionality. And asking Fortran programmers to
> provide kernel patches probably won't be a productive response.
> But I digress.
>
> I was trying to get an exact retired instruction count from
> p10c7l. I am using the test million.s, available here
>
> ( http://www.csl.cornell.edu/~vince/projects/perf_counter/million.s )
>
> It should count exactly one million instructions.
>
> Tests with valgrind and qemu show that it does.
>
> Using perfmon2 on Pentium Pro, PII, PIII, P4, Athlon32, and Phenom
> all give the proper result:
>
> tobler:~% pfmon -e retired_instructions ./million
> 1000002 RETIRED_INSTRUCTIONS
>
> ( it is 1,000,002 +/-2 because on most x86 architectures retired
> instruction count includes any hardware interrupts that might
> happen at the time. It would be a great feature if p10c7l
> could add some way of gathering the per-process hardware
> instruction count statistic to help quantify that).
>
> Yet with perf on the same Athlon32 machine (using
> kernel 2.6.30-03984-g45e3e19) gives:
>
> tobler:~%perf stat ./million
>
> Performance counter stats for './million':
>
> 1.519366 task-clock-ticks # 0.835 CPU utilization factor
> 3 context-switches # 0.002 M/sec
> 0 CPU-migrations # 0.000 M/sec
> 53 page-faults # 0.035 M/sec
> 2483822 cycles # 1634.775 M/sec
> 1240849 instructions # 816.689 M/sec # 0.500 per cycle
> 612685 cache-references # 403.250 M/sec
> 3564 cache-misses # 2.346 M/sec
>
> Wall-clock time elapsed: 1.819226 msecs
>
> Running multiple times gives:
> 1240849
> 1257312
> 1242313
>
> Which is a varying error of at least 20% which isn't even
> consistent. Is this because of sampling? The documentation
> doesn't really warn about this as far as I can tell.
>
> Thanks for any help resolving this problem
Thanks for the question! There are still gaps in the documentation,
so let me explain the basics here:
'perf stat' counts the true cost of executing the command in
question, including the costs of:
  - fork()ing the task
  - exec()-ing it
  - the ELF loader resolving dynamic symbols
  - the app hitting various pagefaults that instantiate its pagetables
  - etc.
Those operations are pretty 'noisy' on a typical CPU, with lots of
cache effects, so the noise you see is real.
You can eliminate much of the noise by only counting user-space
instructions, as much of the command startup cost is in
kernel-space.
You can run your test-app that way as follows:
$ perf stat --repeat 10 -e 0:1:u ./million
Performance counter stats for './million' (10 runs):
1002106 instructions ( +- 0.015% )
0.000599029 seconds time elapsed.
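The `-e 0:1:u` event above is a numeric `type:config:modifiers` triple. As a minimal sketch of how such a string maps onto counter attributes, assuming the numeric syntax perf accepted at the time, the numbering of the perf ABI header (type 0 is PERF_TYPE_HARDWARE, config 1 is PERF_COUNT_HW_INSTRUCTIONS) and the u/k/h modifier rule of perf's event parser (naming any privilege level excludes the levels that were not named). The function `parse_event_spec` is hypothetical and not part of perf:

```python
# Hypothetical decoder for perf's numeric "type:config[:mods]" event syntax.
# Numbering assumed from the perf ABI header (perf_event.h).
PERF_TYPES = {0: "PERF_TYPE_HARDWARE", 1: "PERF_TYPE_SOFTWARE",
              2: "PERF_TYPE_TRACEPOINT", 3: "PERF_TYPE_HW_CACHE",
              4: "PERF_TYPE_RAW"}
HW_EVENTS = {0: "PERF_COUNT_HW_CPU_CYCLES",
             1: "PERF_COUNT_HW_INSTRUCTIONS",
             2: "PERF_COUNT_HW_CACHE_REFERENCES",
             3: "PERF_COUNT_HW_CACHE_MISSES"}


def parse_event_spec(spec):
    """Split 'type:config[:mods]' into perf_event_attr-style fields."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected type:config[:mods], got {spec!r}")
    ev_type, config = int(parts[0]), int(parts[1])
    mods = parts[2] if len(parts) == 3 else ""
    if any(m not in "ukh" for m in mods):
        raise ValueError(f"unknown modifier in {mods!r}")
    attr = {
        "type": ev_type,
        "type_name": PERF_TYPES.get(ev_type, "unknown"),
        "config": config,
        "event_name": HW_EVENTS.get(config, "unknown") if ev_type == 0 else None,
        # With no modifiers, user, kernel and hypervisor are all counted.
        "exclude_user": False,
        "exclude_kernel": False,
        "exclude_hv": False,
    }
    if mods:
        # Assumed rule: any level not named in the modifiers is excluded.
        attr["exclude_user"] = "u" not in mods
        attr["exclude_kernel"] = "k" not in mods
        attr["exclude_hv"] = "h" not in mods
    return attr


print(parse_event_spec("0:1:u"))
```

Under these assumptions `0:1:u` selects the hardware instruction counter with kernel and hypervisor execution excluded, which is why the fork/exec, loader-related syscalls and page-fault work done in the kernel drop out of the count.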
(Note the --repeat feature of perf stat: it runs the command in a
loop, observes the run-to-run noise and displays it.)
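The "+-" figure summarizes run-to-run spread. A minimal sketch, assuming it is the standard deviation of the mean expressed as a percentage of the mean (the way current perf versions compute it; the exact formula in this 2009 version may differ), applied to the three user+kernel counts from the ./million runs quoted above, alongside their relative error against the 1,000,002 pfmon reference. The helper `mean_and_noise` is hypothetical:

```python
import math
import statistics


def mean_and_noise(samples):
    """Return (mean, stddev of the mean as a percentage of the mean)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # statistics.stdev() is the sample standard deviation (n - 1 divisor);
    # dividing by sqrt(n) gives the standard deviation of the mean.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, 100.0 * sem / mean


# Counts from the three './million' runs quoted above (user + kernel).
runs = [1240849, 1257312, 1242313]
REFERENCE = 1000002  # pfmon's retired-instruction count for the same binary

mean, noise = mean_and_noise(runs)
print(f"{mean:.0f} instructions ( +- {noise:.3f}% )")
for count in runs:
    print(f"{count}: {100.0 * (count - REFERENCE) / REFERENCE:+.1f}% vs reference")
```

This separates the two effects: the runs are roughly 24% to 26% above the reference (the systematic kernel-side startup cost), while the spread between runs is well under 1%.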
The extra ~2100 instructions are executed in your app's context as
the ELF dynamic loader starts up your test-app.
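The ~2100 figure is the measured user-space count minus the loop's nominal one million instructions. A trivial sketch of that arithmetic; the helper name `startup_overhead` is hypothetical:

```python
def startup_overhead(measured, nominal):
    """Return (extra instructions, extra as a percentage of nominal)."""
    extra = measured - nominal
    return extra, 100.0 * extra / nominal


# User-space count reported by 'perf stat -e 0:1:u' versus the
# nominal one million instructions of the test loop.
extra, pct = startup_overhead(1002106, 1_000_000)
print(f"{extra} extra instructions ({pct:.2f}% of the workload)")
# prints "2106 extra instructions (0.21% of the workload)"
```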
If you have some tool that reports fewer instructions than that, then
that tool is not being truthful about the true overhead of your
application.
Also note that applications that execute only 1 million instructions
are very, very rare: a modern CPU can execute billions of
instructions per second, per core.
So I usually test with a more realistic reference app that executes
1 billion instructions:
$ perf stat --repeat 10 -e 0:1:u ./loop_1b_instructions
Performance counter stats for './loop_1b_instructions' (10 runs):
1000079797 instructions ( +- 0.000% )
0.239947420 seconds time elapsed.
The noise there is very low, despite ~240 milliseconds still being a
very short runtime.
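Dividing the two user-space counts above by their elapsed times shows why the short test is dominated by startup. A minimal sketch; note the elapsed time includes fork, exec and loader work, so these are whole-command averages rather than the loop's own throughput. The helper `instructions_per_second` is hypothetical:

```python
def instructions_per_second(instructions, seconds):
    """Average user-space instruction rate over the wall-clock time."""
    return instructions / seconds


# Figures from the two 'perf stat --repeat 10 -e 0:1:u' runs above.
results = {
    "./million": (1002106, 0.000599029),
    "./loop_1b_instructions": (1000079797, 0.239947420),
}
for name, (insns, secs) in results.items():
    rate = instructions_per_second(insns, secs)
    print(f"{name}: {rate / 1e9:.2f} billion instructions/sec")
```

The short run averages roughly 1.7 billion instructions per second and the long run roughly 4.2 billion: the fixed startup cost is amortized away once the workload runs for a realistic length of time.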
Hope this helps - thanks,
Ingo