public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* perf stat output
@ 2009-11-12 22:03 Lucas De Marchi
  2009-11-13  8:07 ` Peter Zijlstra
  2009-11-15  9:13 ` Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-12 22:03 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo; +Cc: linux-kernel

Hi all!

Some questions about perf stat output. See example:


lucas@LMS-linux:~/programming/testprograms> perf stat -e
L1-dcache-loads -e L1-dcache-load-misses -- make -j
gcc test_schedchanges.c -o test_schedchanges
gcc -pthread test_taskaff1.c -o test_taskaff1
gcc -pthread test_taskaff2.c -o test_taskaff2
gcc -pthread test_taskaff3.c -o test_taskaff3

 Performance counter stats for 'make -j':

       161384667  L1-dcache-loads          #      0.000 M/sec
              24853791  L1-dcache-load-misses    #      0.000 M/sec

	          0.066893389  seconds time elapsed

Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
cache-misses -e cache-references" but values different than 0 when
running "perf stat -a" without selecting the events?

The last question: what does the  "scaled from X%" mean? Is it related
to the maximum number of performance registers a processor can count
at a time?


Thanks

Lucas De Marchi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf stat output
  2009-11-12 22:03 perf stat output Lucas De Marchi
@ 2009-11-13  8:07 ` Peter Zijlstra
  2009-11-13 14:46   ` Lucas De Marchi
  2009-11-15  9:13 ` Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2009-11-13  8:07 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel

On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
> Hi all!
> 
> Some questions about perf stat output. See example:
> 
> 
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
> 
>  Performance counter stats for 'make -j':
> 
>        161384667  L1-dcache-loads          #      0.000 M/sec
>               24853791  L1-dcache-load-misses    #      0.000 M/sec
> 
> 	          0.066893389  seconds time elapsed
> 
> Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
> cache-misses -e cache-references" but values different than 0 when
> running "perf stat -a" without selecting the events?

No idea, you'd have to look at the code computing this M/sec stuff. I
think Ingo wrote that, so he might have an idea.

> The last question: what does the  "scaled from X%" mean? Is it related
> to the maximum number of performance registers a processor can count
> at a time?

Yes, if the hardware has only 2 counters and you specify 4, we'll
round-robin those 4 onto the 2. In that case you'll see things like
scaled from ~50% because each counter will only have been on the actual
PMU for about 50% of the time.

(RR happens with tick granularity, so if your runtime is of that order
or shorter you can get funny results with some counters being 0).


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf stat output
  2009-11-13  8:07 ` Peter Zijlstra
@ 2009-11-13 14:46   ` Lucas De Marchi
  0 siblings, 0 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-13 14:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel

On Fri, Nov 13, 2009 at 06:07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2009-11-12 at 20:03 -0200, Lucas De Marchi wrote:
>> Hi all!
>>
>> Some questions about perf stat output. See example:
>>
>>
>> lucas@LMS-linux:~/programming/testprograms> perf stat -e
>> L1-dcache-loads -e L1-dcache-load-misses -- make -j
>> gcc test_schedchanges.c -o test_schedchanges
>> gcc -pthread test_taskaff1.c -o test_taskaff1
>> gcc -pthread test_taskaff2.c -o test_taskaff2
>> gcc -pthread test_taskaff3.c -o test_taskaff3
>>
>>  Performance counter stats for 'make -j':
>>
>>        161384667  L1-dcache-loads          #      0.000 M/sec
>>               24853791  L1-dcache-load-misses    #      0.000 M/sec
>>
>>                 0.066893389  seconds time elapsed
>>
>> Why do we have both L1-dcache-loads and  L1-dcache-load-misses with
>> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e
>> cache-misses -e cache-references" but values different than 0 when
>> running "perf stat -a" without selecting the events?
>
> No idea, you'd have to look at the code computing this M/sec stuff. I
> think Ingo wrote that, so he might have an idea.

But should it be, for example, L1-dcache-loads / seconds time elapsed,
or does it have another meaning?



Lucas De Marchi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf stat output
  2009-11-12 22:03 perf stat output Lucas De Marchi
  2009-11-13  8:07 ` Peter Zijlstra
@ 2009-11-15  9:13 ` Ingo Molnar
  2009-11-15 14:05   ` Lucas De Marchi
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2009-11-15  9:13 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel


* Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:

> Hi all!
> 
> Some questions about perf stat output. See example:
> 
> 
> lucas@LMS-linux:~/programming/testprograms> perf stat -e
> L1-dcache-loads -e L1-dcache-load-misses -- make -j
> gcc test_schedchanges.c -o test_schedchanges
> gcc -pthread test_taskaff1.c -o test_taskaff1
> gcc -pthread test_taskaff2.c -o test_taskaff2
> gcc -pthread test_taskaff3.c -o test_taskaff3
> 
>  Performance counter stats for 'make -j':
> 
>        161384667  L1-dcache-loads          #      0.000 M/sec
>               24853791  L1-dcache-load-misses    #      0.000 M/sec
> 
> 	          0.066893389  seconds time elapsed
> 
> Why do we have both L1-dcache-loads and L1-dcache-load-misses with 
> 0.000 M/sec? Also, why do we have 0 M/s when running "perf stat -a -e 
> cache-misses -e cache-references" but values different than 0 when 
> running "perf stat -a" without selecting the events?

You need the 'task-clock' event to be able to see M/sec metrics. I.e.:

$ perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e task-clock sleep 1

 Performance counter stats for 'sleep 1':

         201330  L1-dcache-loads          #    566.234 M/sec
          29916  L1-dcache-load-misses    #     84.138 M/sec
       0.355560  task-clock-msecs         #      0.000 CPUs 

    1.000621650  seconds time elapsed

I agree with you that seeing '0.000 M/sec' is confusing and incorrect as 
well. One solution would be to skip the printout in that case.

You can find the latest 'perf' code in:

  http://people.redhat.com/mingo/tip.git/README

( the tools/perf/ bits are backwards compatible with any perf kernel you 
  are running right now, so no reboot is needed. )

You can find the stats printing in tools/perf/builtin-stat.c, in the 
abs_printout() function:

        } else {
                total = avg_stats(&runtime_nsecs_stats);

                if (total)
                        ratio = 1000.0 * avg / total;

                fprintf(stderr, " # %10.3f M/sec", ratio);

I think if runtime_nsecs_stats is zero (i.e. if no 'task-clock' events 
were measured), then we might be able to skip the printout via doing 
something like:

        } else if (runtime_nsecs_stats.n != 0) {

Would you be interested in sending a (tested) patch for that? In theory 
only that oneliner change should suffice - but i have not tested it.

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf stat output
  2009-11-15  9:13 ` Ingo Molnar
@ 2009-11-15 14:05   ` Lucas De Marchi
  2009-11-15 14:27     ` Ingo Molnar
  2009-11-15 14:30     ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
  0 siblings, 2 replies; 7+ messages in thread
From: Lucas De Marchi @ 2009-11-15 14:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel

Ingo, following the tested patch you asked. It's a bit different of what you
sent because branch-misses is counted in another struct.


Lucas De Marchi

--
commit 70e98c814153f0026a683c5d19674ed8185d02fa
Author: Lucas De Marchi <lucas.de.marchi@gmail.com>
Date:   Sun Nov 15 11:20:56 2009 -0200

    Do not print ratio when task-clock event is not counted
    
    The ratio between the number of events and the time elapsed makes sense only
    if task-clock event is counted. Otherwise it will be simply a (confusing)
    
    	#      0.000 M/sec
    
    This patch outputs the ratio only if task-clock event is counted. Some test
    examples of before and after:
    
    Before:
    
    	[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
    
    	 Performance counter stats for 'sleep 1':
    
    		1367818  branch-misses            #      0.000 M/sec
    
    	    1.001494325  seconds time elapsed
    
    After (without task-clock):
    
    	[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
    
    	 Performance counter stats for 'sleep 1':
    
    		1135044  branch-misses
    
    	    1.001370775  seconds time elapsed
    
    After (with task-clock):
    
    	[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
    
    	 Performance counter stats for 'sleep 1':
    
    		1070111  branch-misses            #      0.534 M/sec
    	    2002.730893  task-clock-msecs         #      1.999 CPUs
    
    	    1.001640292  seconds time elapsed
    
    Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
 			ratio = avg / total;
 
 		fprintf(stderr, " # %10.3f IPC  ", ratio);
-	} else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+	} else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+			runtime_branches_stats.n != 0) {
 		total = avg_stats(&runtime_branches_stats);
 
 		if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
 
 		fprintf(stderr, " # %10.3f %%    ", ratio);
 
-	} else {
+	} else if (runtime_nsecs_stats.n != 0) {
 		total = avg_stats(&runtime_nsecs_stats);
 
 		if (total)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: perf stat output
  2009-11-15 14:05   ` Lucas De Marchi
@ 2009-11-15 14:27     ` Ingo Molnar
  2009-11-15 14:30     ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi
  1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2009-11-15 14:27 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel


* Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:

> Ingo, following the tested patch you asked. It's a bit different of 
> what you sent because branch-misses is counted in another struct.

Looks perfect - applied it to tip:perf/core for v2.6.33 merging, thanks 
Lucas!

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted
  2009-11-15 14:05   ` Lucas De Marchi
  2009-11-15 14:27     ` Ingo Molnar
@ 2009-11-15 14:30     ` tip-bot for Lucas De Marchi
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Lucas De Marchi @ 2009-11-15 14:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, hpa, mingo, a.p.zijlstra, lucas.de.marchi,
	tglx, mingo

Commit-ID:  7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Gitweb:     http://git.kernel.org/tip/7255fe2a42c612f2b8fe4c347f0a5f0c97d85a46
Author:     Lucas De Marchi <lucas.de.marchi@gmail.com>
AuthorDate: Sun, 15 Nov 2009 12:05:08 -0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 15 Nov 2009 15:25:50 +0100

perf stat: Do not print ratio when task-clock event is not counted

The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)

	#      0.000 M/sec

This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:

Before:

 [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1

	 Performance counter stats for 'sleep 1':

		1367818  branch-misses            #      0.000 M/sec

	    1.001494325  seconds time elapsed

After (without task-clock):

 [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1

	 Performance counter stats for 'sleep 1':

		1135044  branch-misses

	    1.001370775  seconds time elapsed

After (with task-clock):

 [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1

	 Performance counter stats for 'sleep 1':

		1070111  branch-misses            #      0.534 M/sec
	    2002.730893  task-clock-msecs         #      1.999 CPUs

	    1.001640292  seconds time elapsed

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20091115140507.GB21561@skywalker.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/builtin-stat.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c6df377..c70d720 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -357,7 +357,8 @@ static void abs_printout(int counter, double avg)
 			ratio = avg / total;
 
 		fprintf(stderr, " # %10.3f IPC  ", ratio);
-	} else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter)) {
+	} else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
+			runtime_branches_stats.n != 0) {
 		total = avg_stats(&runtime_branches_stats);
 
 		if (total)
@@ -365,7 +366,7 @@ static void abs_printout(int counter, double avg)
 
 		fprintf(stderr, " # %10.3f %%    ", ratio);
 
-	} else {
+	} else if (runtime_nsecs_stats.n != 0) {
 		total = avg_stats(&runtime_nsecs_stats);
 
 		if (total)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-11-15 14:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-12 22:03 perf stat output Lucas De Marchi
2009-11-13  8:07 ` Peter Zijlstra
2009-11-13 14:46   ` Lucas De Marchi
2009-11-15  9:13 ` Ingo Molnar
2009-11-15 14:05   ` Lucas De Marchi
2009-11-15 14:27     ` Ingo Molnar
2009-11-15 14:30     ` [tip:perf/core] perf stat: Do not print ratio when task-clock event is not counted tip-bot for Lucas De Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox