From: Ingo Molnar <mingo@elte.hu>
To: Vince Weaver <vince@deater.net>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: performance counter 20% error finding retired instruction count
Date: Wed, 24 Jun 2009 17:10:10 +0200
Message-ID: <20090624151010.GA12799@elte.hu>
In-Reply-To: <Pine.LNX.4.64.0906240937120.10620@pianoman.cluster.toy>
* Vince Weaver <vince@deater.net> wrote:
> Hello
>
> As an aside, is it time to set up a dedicated Performance Counters
> for Linux mailing list? (Hereafter referred to as p10c7l to avoid
> confusion with the other implementations that have already taken
> all the good abbreviated forms of the concept).
('perfcounters' is the name of the subsystem/feature and it's
unique.)
> [...] If/when the infrastructure appears in a released kernel,
> there's going to be a lot of chatter by people who use performance
> counters and suddenly find they are stuck with a huge step
> backwards in functionality. And asking Fortran programmers to
> provide kernel patches probably won't be a productive response.
> But I digress.
>
> I was trying to get an exact retired instruction count from
> p10c7l. I am using the test million.s, available here
>
> ( http://www.csl.cornell.edu/~vince/projects/perf_counter/million.s )
>
> It should count exactly one million instructions.
>
> Tests with valgrind and qemu show that it does.
>
> Using perfmon2 on Pentium Pro, PII, PIII, P4, Athlon32, and Phenom
> all give the proper result:
>
> tobler:~% pfmon -e retired_instructions ./million
> 1000002 RETIRED_INSTRUCTIONS
>
> ( it is 1,000,002 +/-2 because on most x86 architectures retired
> instruction count includes any hardware interrupts that might
> happen at the time. It would be a great feature if p10c7l
> could add some way of gathering the per-process hardware
> instruction count statistic to help quantify that).
>
> Yet with perf on the same Athlon32 machine (using
> kernel 2.6.30-03984-g45e3e19) gives:
>
> tobler:~%perf stat ./million
>
> Performance counter stats for './million':
>
> 1.519366 task-clock-ticks # 0.835 CPU utilization factor
> 3 context-switches # 0.002 M/sec
> 0 CPU-migrations # 0.000 M/sec
> 53 page-faults # 0.035 M/sec
> 2483822 cycles # 1634.775 M/sec
> 1240849 instructions # 816.689 M/sec # 0.500 per cycle
> 612685 cache-references # 403.250 M/sec
> 3564 cache-misses # 2.346 M/sec
>
> Wall-clock time elapsed: 1.819226 msecs
>
> Running multiple times gives:
> 1240849
> 1257312
> 1242313
>
> Which is a varying error of at least 20% which isn't even
> consistent. Is this because of sampling? The documentation
> doesn't really warn about this as far as I can tell.
>
> Thanks for any help resolving this problem
Thanks for the question! There are still gaps in the documentation,
so let me explain the basics here:
'perf stat' counts the true cost of executing the command in
question, including the costs of:

  fork()ing the task
  exec()-ing it
  the ELF loader resolving dynamic symbols
  the app hitting various pagefaults that instantiate its pagetables

etc.
Those operations are pretty 'noisy' on a typical CPU, with lots of
cache effects, so the noise you see is real.
You can eliminate much of the noise by only counting user-space
instructions, as much of the command startup cost is in
kernel-space.
Running your test-app that way can be done the following way:

  $ perf stat --repeat 10 -e 0:1:u ./million

   Performance counter stats for './million' (10 runs):

          1002106  instructions   ( +- 0.015% )

      0.000599029  seconds time elapsed.
( note the --repeat feature of perf stat - it does a loop of command
executions and observes the noise and displays it. )
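
To make the --repeat output concrete, here is a minimal Python sketch of the kind of summary it reports: the mean of the per-run counts plus the run-to-run noise as a percentage. The input is the three full 'instructions' counts quoted in the original report. The helper name summarize_runs is made up for illustration, and using the standard error of the mean as the noise measure is an assumption, not necessarily the exact formula perf stat uses.

```python
import math
import statistics

def summarize_runs(counts):
    """Return (mean, noise_percent) for a list of per-run event counts.

    noise_percent is the standard error of the mean relative to the mean.
    This is an assumed noise measure, not necessarily what perf stat prints.
    """
    mean = statistics.mean(counts)
    if len(counts) < 2:
        # A single run carries no information about run-to-run noise.
        return mean, 0.0
    stderr = statistics.stdev(counts) / math.sqrt(len(counts))
    return mean, 100.0 * stderr / mean

# The three full (user + kernel) instruction counts quoted in the report.
runs = [1240849, 1257312, 1242313]
mean, noise = summarize_runs(runs)
print(f"{mean:.0f} instructions ( +- {noise:.3f}% )")
```

With only three runs the noise estimate is itself rough, which is one reason to use a larger --repeat count.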
Those ~2100 extra instructions are executed by your app, as the ELF
dynamic loader starts it up.
If you have some tool that reports less than that then that tool is
not being truthful about the true overhead of your application.
Also note that applications that only execute 1 million instructions
are very, very rare - a modern CPU can execute billions of
instructions, per second, per core.
So I usually test a reference app that is more realistic, one that
executes 1 billion instructions:

  $ perf stat --repeat 10 -e 0:1:u ./loop_1b_instructions

   Performance counter stats for './loop_1b_instructions' (10 runs):

       1000079797  instructions   ( +- 0.000% )

      0.239947420  seconds time elapsed.

The noise there is very low (despite 230 milliseconds still being a
very short runtime).
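
The effect of run length can be checked with simple arithmetic on the two user-space counts above. The sketch below computes how far each measured count sits above the nominal instruction count. It assumes the nominal counts are exactly 1,000,000 and 1,000,000,000 (the second is inferred from the program's name), and excess is a hypothetical helper, not part of any tool.

```python
def excess(measured, nominal):
    """Return (extra_instructions, extra_percent) of measured over nominal."""
    extra = measured - nominal
    return extra, 100.0 * extra / nominal

# User-space-only counts from the two 'perf stat --repeat 10' runs above.
for name, measured, nominal in [
    ("million", 1002106, 1_000_000),
    ("loop_1b_instructions", 1000079797, 1_000_000_000),
]:
    extra, pct = excess(measured, nominal)
    print(f"{name}: +{extra} instructions ({pct:.4f}%)")
```

The extra work is 2106 instructions (about 0.21%) for ./million and 79797 instructions (about 0.008%) for the longer loop, so the same kind of startup cost matters far less as the measured run gets longer.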
Hope this helps - thanks,
Ingo