* perf stat output
@ 2009-11-12 22:03 Lucas De Marchi
2009-11-13 8:07 ` Peter Zijlstra
2009-11-15 9:13 ` Ingo Molnar
0 siblings, 2 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-12 22:03 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo; +Cc: linux-kernel
Hi all!
Some questions about perf stat output. See example:
lucas@LMS-linux:~/programming/testprograms> perf stat -e
L1-dcache-loads -e L1-dcache-load-misses -- make -j
gcc test_schedchanges.c -o test_schedchanges
gcc -pthread test_taskaff1.c -o test_taskaff1
gcc -pthread test_taskaff2.c -o test_taskaff2
gcc -pthread test_taskaff3.c -o test_taskaff3
Performance counter stats for 'make -j':
161384667 L1-dcache-loads # 0.000 M/sec
24853791 L1-dcache-load-misses # 0.000 M/sec
0.066893389 seconds time elapsed
Why do we have both L1-dcache-loads and L1-dcache-load-misses with
0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
cache-misses -e cache-references" but values different than 0 when
running "perf stat -a" without selecting the events?
The last question: what does the "scaled from X%" mean? Is it related
to the maximum number of performance registers a processor can count
at a time?
Thanks
Lucas De Marchi
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: perf stat output 2009-11-12 22:03 perf stat output Lucas De Marchi @ 2009-11-13 8:07 ` Peter Zijlstra 2009-11-13 14:46 ` Lucas De Marchi 2009-11-15 9:13 ` Ingo Molnar 1 sibling, 1 reply; 7+ messages in thread From: Peter Zijlstra @ 2009-11-13 8:07 UTC (permalink / raw) To: Lucas De Marchi; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote: > Hi all! > > Some questions about perf stat output. See example: > > > lucas@LMS-linux:~/programming/testprograms> perf stat -e > L1-dcache-loads -e L1-dcache-load-misses -- make -j > gcc test_schedchanges.c -o test_schedchanges > gcc -pthread test_taskaff1.c -o test_taskaff1 > gcc -pthread test_taskaff2.c -o test_taskaff2 > gcc -pthread test_taskaff3.c -o test_taskaff3 > > Performance counter stats for 'make -j': > > 161384667 L1-dcache-loads # 0.000 M/sec > 24853791 L1-dcache-load-misses # 0.000 M/sec > > 0.066893389 seconds time elapsed > > Why do we have both L1-dcache-loads and L1-dcache-load-misses with > 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e > cache-misses -e cache-references" but values different than 0 when > running "perf stat -a" without selecting the events? No idea, you'd have to look at the code computing this M/sec stuff. I think Ingo wrote that, so he might have an idea. > The last question: what does the "scaled from X%" mean? Is it related > to the maximum number of performance registers a processor can count > at a time? Yes, if the hardware has only 2 counters and you specify 4, we'll round-robin those 4 onto the 2. In that case you'll see things like scaled from ~50% because each counter will only have been on the actual PMU for about 50% of the time. (RR happens with tick granularity, so if your runtime is of that order or shorter you can get funny results with some counters being 0). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output 2009-11-13 8:07 ` Peter Zijlstra @ 2009-11-13 14:46 ` Lucas De Marchi 0 siblings, 0 replies; 7+ messages in thread From: Lucas De Marchi @ 2009-11-13 14:46 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel On Fri, Nov 13, 2009 at 06:07, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote: >> Hi all! >> >> Some questions about perf stat output. See example: >> >> >> lucas@LMS-linux:~/programming/testprograms> perf stat -e >> L1-dcache-loads -e L1-dcache-load-misses -- make -j >> gcc test_schedchanges.c -o test_schedchanges >> gcc -pthread test_taskaff1.c -o test_taskaff1 >> gcc -pthread test_taskaff2.c -o test_taskaff2 >> gcc -pthread test_taskaff3.c -o test_taskaff3 >> >> Performance counter stats for 'make -j': >> >> 161384667 L1-dcache-loads # 0.000 M/sec >> 24853791 L1-dcache-load-misses # 0.000 M/sec >> >> 0.066893389 seconds time elapsed >> >> Why do we have both L1-dcache-loads and L1-dcache-load-misses with >> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e >> cache-misses -e cache-references" but values different than 0 when >> running "perf stat -a" without selecting the events? > > No idea, you'd have to look at the code computing this M/sec stuff. I > think Ingo wrote that, so he might have an idea. But should it be, for example, L1-dcache-loads / seconds time elapsed, or does it have another meaning? Lucas De Marchi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output 2009-11-12 22:03 perf stat output Lucas De Marchi 2009-11-13 8:07 ` Peter Zijlstra @ 2009-11-15 9:13 ` Ingo Molnar 2009-11-15 14:05 ` Lucas De Marchi 1 sibling, 1 reply; 7+ messages in thread From: Ingo Molnar @ 2009-11-15 9:13 UTC (permalink / raw) To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel * Lucas De Marchi <lucas.de.marchi@gmail.com> wrote: > Hi all! > > Some questions about perf stat output. See example: > > > lucas@LMS-linux:~/programming/testprograms> perf stat -e > L1-dcache-loads -e L1-dcache-load-misses -- make -j > gcc test_schedchanges.c -o test_schedchanges > gcc -pthread test_taskaff1.c -o test_taskaff1 > gcc -pthread test_taskaff2.c -o test_taskaff2 > gcc -pthread test_taskaff3.c -o test_taskaff3 > > Performance counter stats for 'make -j': > > 161384667 L1-dcache-loads # 0.000 M/sec > 24853791 L1-dcache-load-misses # 0.000 M/sec > > 0.066893389 seconds time elapsed > > Why do we have both L1-dcache-loads and L1-dcache-load-misses with > 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e > cache-misses -e cache-references" but values different than 0 when > running "perf stat -a" without selecting the events? You need the 'task-clock' event to be able to see M/sec metrics. I.e.: $ perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e task-clock sleep 1 Performance counter stats for 'sleep 1': 201330 L1-dcache-loads # 566.234 M/sec 29916 L1-dcache-load-misses # 84.138 M/sec 0.355560 task-clock-msecs # 0.000 CPUs 1.000621650 seconds time elapsed I agree with you that seeing '0.000 M/sec' is confusing and incorrect as well. One solution would be to skip the printout in that case. You can find the latest 'perf' code in: http://people.redhat.com/mingo/tip.git/README ( the tools/perf/ bits are backwards compatible with any perf kernel you are running right now, so no reboot is needed. ) You can find the stats printing in tools/perf/builtin-stat.c, in the abs_printout() function: } else { total = avg_stats(&runtime_nsecs_stats); if (total) ratio = 1000.0 * avg / total; fprintf(stderr, " # %10.3f M/sec", ratio); I think if runtime_nsecs_stats is zero (i.e. if no 'task-clock' events were measured), then we might be able to skip the printout via doing something like: } else if (runtime_nsecs_stats.n != 0) { Would you be interested in sending a (tested) patch for that? In theory only that oneliner change should suffice - but i have not tested it. Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: perf stat output 2009-11-15 9:13 ` Ingo Molnar @ 2009-11-15 14:05 ` Lucas De Marchi 2009-11-15 14:27 ` Ingo Molnar 2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi 0 siblings, 2 replies; 7+ messages in thread From: Lucas De Marchi @ 2009-11-15 14:05 UTC (permalink / raw) To: Ingo Molnar; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel Ingo, following the tested patch you asked. It's a bit different of what you sent because branch-misses is counted in another struct. Lucas De Marchi -- commit 70e98c814153f0026a683c5d19674ed8185d02fa Author: Lucas De Marchi <lucas.de.marchi@gmail.com> Date: Sun Nov 15 11:20:56 2009 -0200 Do not print ratio when task-clock event is not counted The ratio between the number of events and the time elapsed makes sense only if task-clock event is counted. Otherwise it will be simply a (confusing) # 0.000 M/sec This patch outputs the ratio only if task-clock event is counted. Some test examples of before and after: Before: [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1367818 branch-misses # 0.000 M/sec 1.001494325 seconds time elapsed After (without task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1135044 branch-misses 1.001370775 seconds time elapsed After (with task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1 Performance counter stats for 'sleep 1': 1070111 branch-misses # 0.534 M/sec 2002.730893 task-clock-msecs # 1.999 CPUs 1.001640292 seconds time elapsed Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c6df377..c70d720 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg) ratio = avg / total; fprintf(stderr, " # %10.3f IPC ", ratio); - } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) { + } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) && + runtime_branches_stats.n != 0) { total = avg_stats(&runtime_branches_stats); if (total) @@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg) fprintf(stderr, " # %10.3f %% ", ratio); - } else { + } else if (runtime_nsecs_stats.n != 0) { total = avg_stats(&runtime_nsecs_stats); if (total) ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: perf stat output 2009-11-15 14:05 ` Lucas De Marchi @ 2009-11-15 14:27 ` Ingo Molnar 2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi 1 sibling, 0 replies; 7+ messages in thread From: Ingo Molnar @ 2009-11-15 14:27 UTC (permalink / raw) To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel * Lucas De Marchi <lucas.de.marchi@gmail.com> wrote: > Ingo, following the tested patch you asked. It's a bit different of > what you sent because branch-misses is counted in another struct. Looks perfect - applied it to tip:perf/core for v2.6.33 merging, thanks Lucas! Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
* [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted 2009-11-15 14:05 ` Lucas De Marchi 2009-11-15 14:27 ` Ingo Molnar @ 2009-11-15 14:30 ` tip-bot for Lucas De Marchi 1 sibling, 0 replies; 7+ messages in thread From: tip-bot for Lucas De Marchi @ 2009-11-15 14:30 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, acme, hpa, mingo, a.p.zijlstra, lucas.de.marchi, tglx, mingo Commit-ID: 7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46 Gitweb: http://git.kernel.org/tip/7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46 Author: Lucas De Marchi <lucas.de.marchi@gmail.com> AuthorDate: Sun, 15 Nov 2009 12:05:08 -0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Sun, 15 Nov 2009 15:25:50 +0100 perf stat: Do not print ratio when task-clock event is not counted The ratio between the number of events and the time elapsed makes sense only if task-clock event is counted. Otherwise it will be simply a (confusing) # 0.000 M/sec This patch outputs the ratio only if task-clock event is counted. Some test examples of before and after: Before: [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1367818 branch-misses # 0.000 M/sec 1.001494325 seconds time elapsed After (without task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1 Performance counter stats for 'sleep 1': 1135044 branch-misses 1.001370775 seconds time elapsed After (with task-clock): [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1 Performance counter stats for 'sleep 1': 1070111 branch-misses # 0.534 M/sec 2002.730893 task-clock-msecs # 1.999 CPUs 1.001640292 seconds time elapsed Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20091115140507.GB21561@skywalker.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- tools/perf/builtin-stat.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c6df377..c70d720 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg) ratio = avg / total; fprintf(stderr, " # %10.3f IPC ", ratio); - } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) { + } else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) && + runtime_branches_stats.n != 0) { total = avg_stats(&runtime_branches_stats); if (total) @@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg) fprintf(stderr, " # %10.3f %% ", ratio); - } else { + } else if (runtime_nsecs_stats.n != 0) { total = avg_stats(&runtime_nsecs_stats); if (total) ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-11-15 14:31 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-11-12 22:03 perf stat output Lucas De Marchi 2009-11-13 8:07 ` Peter Zijlstra 2009-11-13 14:46 ` Lucas De Marchi 2009-11-15 9:13 ` Ingo Molnar 2009-11-15 14:05 ` Lucas De Marchi 2009-11-15 14:27 ` Ingo Molnar 2009-11-15 14:30 ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox