* perf stat output
@ 2009-11-12 22:03 Lucas De Marchi
2009-11-13 8:07 ` Peter Zijlstra
2009-11-15 9:13 ` Ingo Molnar
0 siblings, 2 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-12 22:03 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo; +Cc: linux-kernel
Hi all!
Some questions about perf stat output. See example:
lucas@LMS-linux:~/programming/testprograms> perf stat -e
L1-dcache-loads -e L1-dcache-load-misses -- make -j
gcc test_schedchanges.c -o test_schedchanges
gcc -pthread test_taskaff1.c -o test_taskaff1
gcc -pthread test_taskaff2.c -o test_taskaff2
gcc -pthread test_taskaff3.c -o test_taskaff3
Performance counter stats for 'make -j':
161384667 L1-dcache-loads # 0.000 M/sec
24853791 L1-dcache-load-misses # 0.000 M/sec
0.066893389 seconds time elapsed
Why do we have both L1-dcache-loads and L1-dcache-load-misses with
0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
cache-misses -e cache-references" but values different than 0 when
running "perf stat -a" without selecting the events?
The last question: what does the "scaled from X%" mean? Is it related
to the maximum number of performance registers a processor can count
at a time?
Thanks
Lucas De Marchi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output
2009-11-12 22:03 perf stat output Lucas De Marchi
@ 2009-11-13 8:07 ` Peter Zijlstra
2009-11-13 14:46 ` Lucas De Marchi
2009-11-15 9:13 ` Ingo Molnar
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2009-11-13 8:07 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel
On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
> Hi all!
>
> Some questions about perf stat output. See example:
>
>
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
>
> Performance counter stats for 'make -j':
>
> 161384667 L1-dcache-loads # 0.000 M/sec
> 24853791 L1-dcache-load-misses # 0.000 M/sec
>
> 0.066893389 seconds time elapsed
>
> Why do we have both L1-dcache-loads and L1-dcache-load-misses with
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
> cache-misses -e cache-references" but values different than 0 when
> running "perf stat -a" without selecting the events?
No idea, you'd have to look at the code computing this M/sec stuff. I
think Ingo wrote that, so he might have an idea.
> The last question: what does the "scaled from X%" mean? Is it related
> to the maximum number of performance registers a processor can count
> at a time?
Yes, if the hardware has only 2 counters and you specify 4, we'll
round-robin those 4 onto the 2. In that case you'll see things like
scaled from ~50% because each counter will only have been on the actual
PMU for about 50% of the time.
(RR happens with tick granularity, so if your runtime is of that order
or shorter you can get funny results with some counters being 0).
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output
2009-11-13 8:07 ` Peter Zijlstra
@ 2009-11-13 14:46 ` Lucas De Marchi
0 siblings, 0 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-13 14:46 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel
On Fri, Nov 13, 2009 at 06:07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
>> Hi all!
>>
>> Some questions about perf stat output. See example:
>>
>>
>> lucas@LMS-linux:~/programming/testprograms> perf stat -e
>> L1-dcache-loads -e L1-dcache-load-misses -- make -j
>> gcc test_schedchanges.c -o test_schedchanges
>> gcc -pthread test_taskaff1.c -o test_taskaff1
>> gcc -pthread test_taskaff2.c -o test_taskaff2
>> gcc -pthread test_taskaff3.c -o test_taskaff3
>>
>> Performance counter stats for 'make -j':
>>
>> 161384667 L1-dcache-loads # 0.000 M/sec
>> 24853791 L1-dcache-load-misses # 0.000 M/sec
>>
>> 0.066893389 seconds time elapsed
>>
>> Why do we have both L1-dcache-loads and L1-dcache-load-misses with
>> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
>> cache-misses -e cache-references" but values different than 0 when
>> running "perf stat -a" without selecting the events?
>
> No idea, you'd have to look at the code computing this M/sec stuff. I
> think Ingo wrote that, so he might have an idea.
But should it be, for example, L1-dcache-loads / seconds time elapsed,
or does it have another meaning?
Lucas De Marchi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output
2009-11-12 22:03 perf stat output Lucas De Marchi
2009-11-13 8:07 ` Peter Zijlstra
@ 2009-11-15 9:13 ` Ingo Molnar
2009-11-15 14:05 ` Lucas De Marchi
1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2009-11-15 9:13 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel
* Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:
> Hi all!
>
> Some questions about perf stat output. See example:
>
>
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
>
> Performance counter stats for 'make -j':
>
> 161384667 L1-dcache-loads # 0.000 M/sec
> 24853791 L1-dcache-load-misses # 0.000 M/sec
>
> 0.066893389 seconds time elapsed
>
> Why do we have both L1-dcache-loads and L1-dcache-load-misses with
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
> cache-misses -e cache-references" but values different than 0 when
> running "perf stat -a" without selecting the events?
You need the 'task-clock' event to be able to see M/sec metrics. I.e.:
$ perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e task-clock sleep 1
Performance counter stats for 'sleep 1':
201330 L1-dcache-loads # 566.234 M/sec
29916 L1-dcache-load-misses # 84.138 M/sec
0.355560 task-clock-msecs # 0.000 CPUs
1.000621650 seconds time elapsed
I agree with you that seeing '0.000 M/sec' is confusing and incorrect as
well. One solution would be to skip the printout in that case.
You can find the latest 'perf' code in:
http://people.redhat.com/mingo/tip.git/README
( the tools/perf/ bits are backwards compatible with any perf kernel you
are running right now, so no reboot is needed. )
You can find the stats printing in tools/perf/builtin-stat.c, in the
abs_printout() function:
} else {
total = avg_stats(&runtime_nsecs_stats);
if (total)
ratio = 1000.0 * avg / total;
fprintf(stderr, " # %10.3f M/sec", ratio);
I think if runtime_nsecs_stats is zero (i.e. if no 'task-clock' events
were measured), then we might be able to skip the printout via doing
something like:
} else if (runtime_nsecs_stats.n != 0) {
Would you be interested in sending a (tested) patch for that? In theory
only that oneliner change should suffice - but i have not tested it.
Ingo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output
2009-11-15 9:13 ` Ingo Molnar
@ 2009-11-15 14:05 ` Lucas De Marchi
2009-11-15 14:27 ` Ingo Molnar
2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
0 siblings, 2 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-15 14:05 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel
Ingo, following the tested patch you asked. It's a bit different of what you
sent because branch-misses is counted in another struct.
Lucas De Marchi
--
commit 70e98c814153f0026a683c5d19674ed8185d02fa
Author: Lucas De Marchi <lucas.de.marchi@gmail.com>
Date: Sun Nov 15 11:20:56 2009 -0200
Do not print ratio when task-clock event is not counted
The ratio between the number of events and the time elapsed makes sense only
if task-clock event is counted. Otherwise it will be simply a (confusing)
# 0.000 M/sec
This patch outputs the ratio only if task-clock event is counted. Some test
examples of before and after:
Before:
[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1367818 branch-misses # 0.000 M/sec
1.001494325 seconds time elapsed
After (without task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1135044 branch-misses
1.001370775 seconds time elapsed
After (with task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
Performance counter stats for 'sleep 1':
1070111 branch-misses # 0.534 M/sec
2002.730893 task-clock-msecs # 1.999 CPUs
1.001640292 seconds time elapsed
Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
ratio = avg / total;
fprintf(stderr, " # %10.3f IPC ", ratio);
- } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+ } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+ runtime_branches_stats.n != 0) {
total = avg_stats(&runtime_branches_stats);
if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
fprintf(stderr, " # %10.3f %% ", ratio);
- } else {
+ } else if (runtime_nsecs_stats.n != 0) {
total = avg_stats(&runtime_nsecs_stats);
if (total)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: perf stat output
2009-11-15 14:05 ` Lucas De Marchi
@ 2009-11-15 14:27 ` Ingo Molnar
2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2009-11-15 14:27 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel
* Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:
> Ingo, following the tested patch you asked. It's a bit different of
> what you sent because branch-misses is counted in another struct.
Looks perfect - applied it to tip:perf/core for v2.6.33 merging, thanks
Lucas!
Ingo
^ permalink raw reply [flat|nested] 7+ messages in thread
* [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted
2009-11-15 14:05 ` Lucas De Marchi
2009-11-15 14:27 ` Ingo Molnar
@ 2009-11-15 14:30 ` tip-bot for Lucas De Marchi
1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Lucas De Marchi @ 2009-11-15 14:30 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, acme, hpa, mingo, a.p.zijlstra, lucas.de.marchi,
tglx, mingo
Commit-ID: 7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Gitweb: http://git.kernel.org/tip/7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Author: Lucas De Marchi <lucas.de.marchi@gmail.com>
AuthorDate: Sun, 15 Nov 2009 12:05:08 -0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 15 Nov 2009 15:25:50 +0100
perf stat: Do not print ratio when task-clock event is not counted
The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)
# 0.000 M/sec
This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:
Before:
[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1367818 branch-misses # 0.000 M/sec
1.001494325 seconds time elapsed
After (without task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1135044 branch-misses
1.001370775 seconds time elapsed
After (with task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
Performance counter stats for 'sleep 1':
1070111 branch-misses # 0.534 M/sec
2002.730893 task-clock-msecs # 1.999 CPUs
1.001640292 seconds time elapsed
Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20091115140507.GB21561@skywalker.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
tools/perf/builtin-stat.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
ratio = avg / total;
fprintf(stderr, " # %10.3f IPC ", ratio);
- } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+ } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+ runtime_branches_stats.n != 0) {
total = avg_stats(&runtime_branches_stats);
if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
fprintf(stderr, " # %10.3f %% ", ratio);
- } else {
+ } else if (runtime_nsecs_stats.n != 0) {
total = avg_stats(&runtime_nsecs_stats);
if (total)
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-11-15 14:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-12 22:03 perf stat output Lucas De Marchi
2009-11-13 8:07 ` Peter Zijlstra
2009-11-13 14:46 ` Lucas De Marchi
2009-11-15 9:13 ` Ingo Molnar
2009-11-15 14:05 ` Lucas De Marchi
2009-11-15 14:27 ` Ingo Molnar
2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox