Re: perf tools miscellaneous questions

public inbox for linux-kernel@vger.kernel.org

* Re: perf tools miscellaneous questions
       [not found]         ` <fa.4sHfhlc/fMhpYgKda4IfUHZ7jMY@ifi.uio.no>
@ 2010-11-05 12:38           ` Francis Moreau
  2010-11-05 14:02             ` Vince Weaver
  0 siblings, 1 reply; 16+ messages in thread
From: Francis Moreau @ 2010-11-05 12:38 UTC (permalink / raw)
  To: Victor Jimenez
  Cc: Reid Kleckner, Frederic Weisbecker, linux-kernel, Ingo Molnar,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users

Victor Jimenez <victor.javier@bsc.es> writes:

[...]

> If you are measuring last level cache misses, I would recommend you to
> use a memory intensive application/benchmark instead of /bin/true, as
> otherwise there can be a significant variation between two runs.

I agree.

But even with a memory-intensive application, I get the same kind of result:


  $ perf stat -r3 -e cache-misses:u gzip -9 -c vmlinux.o >/dev/null

   Performance counter stats for 'gzip -9 -c vmlinux.o' (3 runs):

               950704  cache-misses               ( +-  24.925% )

         82.619412905  seconds time elapsed   ( +-   0.072% )


  $ perf stat -r3 -e llc-load-misses:u,llc-store-misses:u gzip -9 -c vmlinux.o >/dev/null

   Performance counter stats for 'gzip -9 -c vmlinux.o' (3 runs):
  
               317054  LLC-load-misses            ( +-  11.758% )
               162634  LLC-store-misses           ( +-   9.700% )

         82.657099783  seconds time elapsed   ( +-   0.167% )
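
To put the two measurements side by side, here is a small worked calculation
that uses only the counts printed above. It is a sketch: reading the "+-"
column as a relative spread around the mean is an assumption about the
perf stat -r output, and the variable and function names are made up for
illustration.

```python
# Counts copied from the two `perf stat -r3` runs above.
cache_misses = 950_704          # generic 'cache-misses' event
llc_load_misses = 317_054
llc_store_misses = 162_634

# Relative spread printed in the "+-" column for cache-misses, in percent.
cache_misses_spread_pct = 24.925


def absolute_spread(mean: float, spread_pct: float) -> float:
    """Convert a relative spread in percent into an absolute count."""
    return mean * spread_pct / 100.0


llc_total = llc_load_misses + llc_store_misses
ratio = llc_total / cache_misses

print(f"LLC load+store misses : {llc_total}")
print(f"share of cache-misses : {ratio:.1%}")
print(f"cache-misses spread   : +/- "
      f"{absolute_spread(cache_misses, cache_misses_spread_pct):.0f}")
```

The LLC load and store misses add up to roughly half of the cache-misses
count, so on this machine the two event families are evidently not counting
the same thing.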

Thanks
-- 
Francis


* Re: perf tools miscellaneous questions
  2010-11-05 12:38           ` perf tools miscellaneous questions Francis Moreau
@ 2010-11-05 14:02             ` Vince Weaver
  2010-11-06 14:44               ` Francis Moreau
  0 siblings, 1 reply; 16+ messages in thread
From: Vince Weaver @ 2010-11-05 14:02 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Stephane Eranian, linux-perf-users

On Fri, 5 Nov 2010, Francis Moreau wrote:

> Victor Jimenez <victor.javier@bsc.es> writes:
> 
> [...]
> 
> > If you are measuring last level cache misses, I would recommend you to
> > use a memory intensive application/benchmark instead of /bin/true, as
> > otherwise there can be a significant variation between two runs.
> 
> I agree.
> 
> But still with intensive application, I got the same results:

You're going to need the architecture manual for your processor, and 
you'll have to use raw events (not the kernel default ones) if you really 
want to find out what's going on.  A tool like libpfm4 can help translate 
event names to raw event codes for you.

Cache events are very tricky and they often don't return the values you 
expect.  Hardware prefetch can cause some very non-intuitive things to 
happen, and the prefetch only affects certain levels of cache.
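
As an illustration of what a raw event looks like on the perf command line:
on Intel x86, perf accepts events written as rNNNN, where the hex number
combines the unit mask (high byte) and the event select (low byte). The
helper below is a hypothetical sketch, not part of perf or libpfm4. The two
example encodings are the Intel architectural "LLC Misses" and "LLC
Reference" events; model-specific events have to be looked up in the manual.

```python
def raw_event(event_select: int, umask: int) -> str:
    """Build a perf raw event string (r<umask><event>) from an Intel
    event select and unit mask.  Hypothetical helper for illustration."""
    if not 0 <= event_select <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event select and umask must each fit in one byte")
    return f"r{(umask << 8) | event_select:04x}"


# Intel architectural events: event select 0x2E with two different umasks.
print(raw_event(0x2E, 0x41))   # LLC misses
print(raw_event(0x2E, 0x4F))   # LLC references
```

The resulting string can be passed to perf with the usual modifiers, for
example "perf stat -e r412e:u <command>".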

Vince


* Re: perf tools miscellaneous questions
  2010-11-05 14:02             ` Vince Weaver
@ 2010-11-06 14:44               ` Francis Moreau
  2010-11-06 20:50                 ` Vince Weaver
  0 siblings, 1 reply; 16+ messages in thread
From: Francis Moreau @ 2010-11-06 14:44 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Stephane Eranian, linux-perf-users

Vince Weaver <vweaver1@eecs.utk.edu> writes:

> On Fri, 5 Nov 2010, Francis Moreau wrote:
>
>> Victor Jimenez <victor.javier@bsc.es> writes:
>> 
>> [...]
>> 
>> > If you are measuring last level cache misses, I would recommend you to
>> > use a memory intensive application/benchmark instead of /bin/true, as
>> > otherwise there can be a significant variation between two runs.
>> 
>> I agree.
>> 
>> But still with intensive application, I got the same results:
>
>
> You're going to need the architecture manual for your processor, and 
> you'll have to use raw events (not the kernel default ones) if you really 
> want to find out what's going on.  A tool like libpfm4 can help translate 
> event names to raw event codes for you.

Ouch, that's a bit rude for a man page ;)

Especially since 'llc-loads-misses' is, and should be, self-explanatory.

OK, my cpu is described by:

,----
|   vendor_id	: GenuineIntel
|   cpu family	: 6
|   model	: 15
|   model name	: Intel(R) Core(TM)2 CPU         T5500  @ 1.66GHz
`----

Could you point me to the best architecture manual for it, one that
describes the raw events?

BTW, I'm wondering whether event names are consistent across the different
architectures supported by Linux.

Thanks
-- 
Francis


* Re: perf tools miscellaneous questions
  2010-11-06 14:44               ` Francis Moreau
@ 2010-11-06 20:50                 ` Vince Weaver
  2010-11-06 20:52                   ` Vince Weaver
  2010-11-08 19:43                   ` Francis Moreau
  0 siblings, 2 replies; 16+ messages in thread
From: Vince Weaver @ 2010-11-06 20:50 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Stephane Eranian, linux-perf-users

This is rapidly getting off topic, especially for linux-kernel.

On Sat, 6 Nov 2010, Francis Moreau wrote:

> Especially since 'llc-loads-misses' is, and should be, self-explanatory.

Not necessarily.  Does "last level" mean the common L2 or the shared L3?
Do the misses count prefetch misses?  Do the misses count coherency
actions, or just "normal" cache accesses?  Does your processor count 
multiple loads for some single instructions?  [Unfortunately, many do.]

Most events are poorly documented, if at all.  The Linux kernel's 
predefined event list is loosely based on the Intel architectural
events, which not every processor has, and I've heard from insiders 
that you should be very careful with the results from those events.  Also, 
as far as I know, there hasn't been much validation work on whether the 
events return useful values.  No chip company will guarantee the values 
returned by performance counters; they are more or less a bonus feature 
that works most of the time, but you never really know the accuracy of what 
you are reading out of them.

> Could you point me to the best architecture manual for it, one that
> describes the raw events?

For your Core2 you want the Intel Software Developer's Manual, volume 2B.  
Google should find it.

> BTW, I'm wondering whether event names are consistent across the different
> architectures supported by Linux.

Nope.  They aren't even consistent across the same chip company.  For 
example, Core2 and Nehalem have completely different event names, and even 
between Nehalem and Westmere there are incompatible changes.

Vince


* Re: perf tools miscellaneous questions
  2010-11-06 20:50                 ` Vince Weaver
@ 2010-11-06 20:52                   ` Vince Weaver
  2010-11-08 19:43                   ` Francis Moreau
  1 sibling, 0 replies; 16+ messages in thread
From: Vince Weaver @ 2010-11-06 20:52 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Stephane Eranian, linux-perf-users

On Sat, 6 Nov 2010, Vince Weaver wrote:

> For your Core2 you want the Intel Software Developer's Manual, volume 2B.  
> Google should find it.

typo, sorry, I meant Volume 3B.

Vince


* Re: perf tools miscellaneous questions
  2010-11-06 20:50                 ` Vince Weaver
  2010-11-06 20:52                   ` Vince Weaver
@ 2010-11-08 19:43                   ` Francis Moreau
  1 sibling, 0 replies; 16+ messages in thread
From: Francis Moreau @ 2010-11-08 19:43 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Stephane Eranian, linux-perf-users

Vince Weaver <vweaver1@eecs.utk.edu> writes:

> This is rapidly getting off topic, especially for linux-kernel.

Don't think so but feel free to remove LKML from Cc.

[...]

> Most events are poorly documented, if at all.  The Linux kernel's 
> predefined event list is loosely based on the Intel architectural
> events, which not every processor has, and I've heard from insiders 
> that you should be very careful with the results from those events.

I agree; that's why I'm trying to clarify some events.

The perf tools are cool stuff, IMHO, but it's pretty hard for me to
interpret the results. I tried to compare some numbers in my previous posts,
but so far I only get seemingly 'random' figures.

Another example is given below, where I'm trying to benchmark two functions
which do the same thing in different ways.

   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
     C-c C-c
    Performance counter stats for process id '30263':
   
                406532  cache-misses            
               4986030  L1-dcache-load-misses   
             120247366  cycles                  
   
           2.482196928  seconds time elapsed
   
   
   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
     C-c C-c
    Performance counter stats for process id '30271':
   
                459683  cache-misses            
               2513338  L1-dcache-load-misses   
             159968076  cycles                  
   
           2.129021265  seconds time elapsed

Which numbers are important here? cache-misses? L1-dcache-load-misses?

I can only say that the first run looks faster.
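
Both sessions were attached to a running process and stopped by hand, so the
raw counts cover measurement windows of different length. One possible way to
compare them, offered as a sketch rather than as the right metric, is to
normalize each count by the cycles counted in the same window. The numbers
are copied from the two outputs above; the names are made up for
illustration.

```python
# Counts copied from the two `perf stat -p` sessions above.
runs = {
    "run 1 (pid 30263)": {
        "cycles": 120_247_366,
        "l1d_load_misses": 4_986_030,
        "cache_misses": 406_532,
    },
    "run 2 (pid 30271)": {
        "cycles": 159_968_076,
        "l1d_load_misses": 2_513_338,
        "cache_misses": 459_683,
    },
}


def per_kilocycle(count: int, cycles: int) -> float:
    """Events per 1000 cycles, so windows of different length can be compared."""
    return 1000.0 * count / cycles


for name, c in runs.items():
    l1 = per_kilocycle(c["l1d_load_misses"], c["cycles"])
    llc = per_kilocycle(c["cache_misses"], c["cycles"])
    print(f"{name}: {l1:.1f} L1d load misses, "
          f"{llc:.2f} cache-misses per 1000 cycles")
```

A rate like this still says nothing about how much work each function got
done in its window. Running each variant for a fixed number of iterations
under perf stat would make the cycle counts directly comparable.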

-- 
Francis


[parent not found: <fa.eqFHAk86WhpTuYclHhngn7QZr8Y@ifi.uio.no>]

[parent not found: <fa.X++YRAJg+rmPtm3nmroZZoNU7u8@ifi.uio.no>]

[parent not found: <fa.Elk8wOfBHRdKtkHIjP9hOtzVCgQ@ifi.uio.no>]

[parent not found: <fa.U3fdWUGXguguIIyQ4v66/uKOhus@ifi.uio.no>]

[parent not found: <fa.YTQQoGhoS6d3BaZXcZMN+l9TojQ@ifi.uio.no>]

* Re: perf tools miscellaneous questions
       [not found]       ` <fa.YTQQoGhoS6d3BaZXcZMN+l9TojQ@ifi.uio.no>
@ 2010-11-04 20:58         ` Francis Moreau
  2010-11-04 22:28           ` Victor Jimenez
  0 siblings, 1 reply; 16+ messages in thread
From: Francis Moreau @ 2010-11-04 20:58 UTC (permalink / raw)
  To: Reid Kleckner
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

Francis Moreau <francis.moro@gmail.com> writes:

> Francis Moreau <francis.moro@gmail.com> writes:
>

[...]

>>
>> How could I know the number of cache level on my cpu ?
>>
>> I tried:
>>
>>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>>
>>     Performance counter stats for 'true':
>>
>>                    802  cache-misses
>>                    937  L1-dcache-load-misses
>>
>>            0.000996578  seconds time elapsed
>>
>>   $ perf stat -e cache-misses:u,L2-loads-misses:u true
>>
>>    Performance counter stats for 'true':
>>
>>                   788  cache-misses
>>                    95  LLC-load-misses
>>
>>           0.001025423  seconds time elapsed
>>
>> So it looks like you're right: in my case I have this cache
>> configuration:
>>
>
> oops sorry, I replied too early...
>
> so my cache configuration is:
>
>    L1 -> L2 -> LLC
>
> where L2 misses is given by: 'cache-misses' - 'LLC-load-misses'
>
> Is that correct ?
>
> If so, I found 'cache-misses' term very not intuitive IMHO, probably
> because I'm not an expert in cpu caches...

Well, thinking more about it, the above is wrong and I'm lost.

If 'cache-misses' counts last-level cache misses, then how should I interpret
these results?

  $ perf stat -e llc-load-misses:u,llc-store-misses:u true

   Performance counter stats for 'true':

                   94  LLC-load-misses         
                    0  LLC-store-misses        

          0.000981840  seconds time elapsed


  $ perf stat -e cache-misses:u true

   Performance counter stats for 'true':

                  796  cache-misses            

          0.001345136  seconds time elapsed

Here the 'cache-misses' value is much higher than the LLC misses one...

-- 
Francis


* Re: perf tools miscellaneous questions
  2010-11-04 20:58         ` Francis Moreau
@ 2010-11-04 22:28           ` Victor Jimenez
  0 siblings, 0 replies; 16+ messages in thread
From: Victor Jimenez @ 2010-11-04 22:28 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Reid Kleckner, Frederic Weisbecker, linux-kernel, Ingo Molnar,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users

On 11/04/2010 09:58 PM, Francis Moreau wrote:
> Francis Moreau<francis.moro@gmail.com>  writes:
>
>> Francis Moreau<francis.moro@gmail.com>  writes:
>>
> [...]
>
>>> How could I know the number of cache level on my cpu ?
>>>
>>> I tried:
>>>
>>>    $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>>>
>>>      Performance counter stats for 'true':
>>>
>>>                     802  cache-misses
>>>                     937  L1-dcache-load-misses
>>>
>>>             0.000996578  seconds time elapsed
>>>
>>>    $ perf stat -e cache-misses:u,L2-loads-misses:u true
>>>
>>>     Performance counter stats for 'true':
>>>
>>>                    788  cache-misses
>>>                     95  LLC-load-misses
>>>
>>>            0.001025423  seconds time elapsed
>>>
>>> So it looks like you're right: in my case I have this cache
>>> configuration:
>>>
>> oops sorry, I replied too early...
>>
>> so my cache configuration is:
>>
>>     L1 ->  L2 ->  LLC
>>
>> where L2 misses is given by: 'cache-misses' - 'LLC-load-misses'
>>
>> Is that correct ?
>>
>> If so, I found 'cache-misses' term very not intuitive IMHO, probably
>> because I'm not an expert in cpu caches...
> Well thinking more about it, the above is wrong and I'm lost.
>
> If 'cache-misses' is the last level cache misses then how to interpret
> these results ?
>
>    $ perf stat -e llc-load-misses:u,llc-store-misses:u true
>
>     Performance counter stats for 'true':
>
>                     94  LLC-load-misses
>                      0  LLC-store-misses
>
>            0.000981840  seconds time elapsed
>
>
>    $ perf stat -e cache-misses:u true
>
>     Performance counter stats for 'true':
>
>                    796  cache-misses
>
>            0.001345136  seconds time elapsed
>
> Here 'cache-misses' value is much more than llc misses one...
>
If you are measuring last-level cache misses, I would recommend using a 
memory-intensive application/benchmark instead of /bin/true, as 
otherwise there can be significant variation between two runs.

Victor

-- 
------------------------------------------------------------------------

      Victor Jimenez Perez
      Barcelona Supercomputing Center
      Centro Nacional de Supercomputacion
      WWW: http://www.bsc.es
      e-mail: victor.javier@bsc.es

------------------------------------------------------------------------




[parent not found: <fa.yHA7Aw03llqLWxPVYRnHvK5/dT8@ifi.uio.no>]

[parent not found: <fa.13KEqWk+Dk+jLLdFlAoZtQ2Vjuw@ifi.uio.no>]

[parent not found: <fa.AGo9lmnVDcmFVUpOFG/kfd1aYfI@ifi.uio.no>]

* Re: perf tools miscellaneous questions
       [not found]   ` <fa.AGo9lmnVDcmFVUpOFG/kfd1aYfI@ifi.uio.no>
@ 2010-11-04  8:34     ` Francis Moreau
  0 siblings, 0 replies; 16+ messages in thread
From: Francis Moreau @ 2010-11-04  8:34 UTC (permalink / raw)
  To: Reid Kleckner
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

Reid Kleckner <reid.kleckner@gmail.com> writes:

> On Wed, Nov 3, 2010 at 5:43 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>> What's exactly the 'cache-misses' event ? does it include both instructions
>>> _and_ data cache misses ? both L1 and L2 caches ?
>>>
>
>>> I was expecting so but the following command makes me wondering:
>>>
>>>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>>>     Performance counter stats for 'true':
>>>
>>>                 763  cache-misses
>>>                 874  L1-dcache-load-misses
>>>
>>>         0.000916609  seconds time elapsed
>>>
>>> Here cache-misses < L1-dcache-load-misses.
>>
>>
>>
>> Dunno, will let others answer.
>
> I think it corresponds to last level cache misses, which makes sense
> here.  The difference in the two numbers represents hits to L2 (and L3
> if it exists).

How can I find out the number of cache levels on my CPU?

I tried:

  $ perf stat -e cache-misses:u,l1d-loads-misses:u true

    Performance counter stats for 'true':

                   802  cache-misses
                   937  L1-dcache-load-misses

           0.000996578  seconds time elapsed

  $ perf stat -e cache-misses:u,L2-loads-misses:u true

   Performance counter stats for 'true':

                  788  cache-misses
                   95  LLC-load-misses

          0.001025423  seconds time elapsed
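
Independently of the counters, Linux exposes the cache hierarchy of each CPU
in sysfs, which answers the "how many levels" question directly. A minimal
sketch, assuming the usual /sys/devices/system/cpu/cpu0/cache/index*/ layout
with level, type and size files; the function name is made up for
illustration.

```python
from pathlib import Path


def cache_levels(cpu_cache_dir: str = "/sys/devices/system/cpu/cpu0/cache"):
    """Return sorted (level, type, size) tuples read from a sysfs cache dir.

    Returns an empty list if the directory is missing (non-Linux system,
    or a kernel/VM that does not expose cache information).
    """
    base = Path(cpu_cache_dir)
    if not base.is_dir():
        return []
    caches = []
    for index in base.glob("index[0-9]*"):
        try:
            level = int((index / "level").read_text().strip())
            kind = (index / "type").read_text().strip()
            size = (index / "size").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries that are incomplete or unreadable
        caches.append((level, kind, size))
    return sorted(caches)


if __name__ == "__main__":
    for level, kind, size in cache_levels():
        print(f"L{level} {kind:<12} {size}")
```

The highest level reported is the last-level cache on that machine.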

So it looks like you're right: in my case I have this cache
configuration:

--
Francis


* Re: perf tools miscellaneous questions
       [not found] ` <fa.13KEqWk+Dk+jLLdFlAoZtQ2Vjuw@ifi.uio.no>
       [not found]   ` <fa.AGo9lmnVDcmFVUpOFG/kfd1aYfI@ifi.uio.no>
@ 2010-11-04  8:52   ` Francis Moreau
  2010-11-07 21:40     ` Frederic Weisbecker
  1 sibling, 1 reply; 16+ messages in thread
From: Francis Moreau @ 2010-11-04  8:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Wed, Nov 03, 2010 at 08:28:59PM +0100, Francis Moreau wrote:
>> Hello,
>> 
>> I'm trying to use perf-tools and also to learn some internals about
>> them. So I prefer to ask all of them in one email.
>> 
>> The first one is about the list of pre-defined events given by
>> perf-list. I couldn't find any documentations that describes these
>> events so excuse me if the question is stupid.
>
>
>
> Sorry about that. We indeed need to improve a lot the documentation.
> May be this particular part could come with the future sysfs exposure
> of the events.
>

No problem, but yes, this part should be documented somewhere. And I
think the event syntax should be too, especially modifiers like 'u' or 'p'.
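
For reference, the modifier syntax used throughout this thread is an event
name, a colon, and one or more letters: 'u' (user space), 'k' (kernel),
'h' (hypervisor), and 'p', which may be repeated to request a higher precise
level. The parser below is a hypothetical sketch of that syntax, not perf's
own code. It handles only these four letters and only hardware event names
(tracepoint names contain a colon themselves and would need different
handling).

```python
def parse_event(spec: str) -> dict:
    """Split 'name:modifiers' into the event name and a summary of the
    modifiers.  Hypothetical helper; only u, k, h and p are handled."""
    name, _, mods = spec.partition(":")
    unknown = set(mods) - set("ukhp")
    if unknown:
        raise ValueError(f"unhandled modifier(s): {''.join(sorted(unknown))}")
    return {
        "name": name,
        "user": "u" in mods,         # 'u' was given
        "kernel": "k" in mods,       # 'k' was given
        "hypervisor": "h" in mods,   # 'h' was given
        "precise_level": mods.count("p"),
    }


print(parse_event("cache-misses:u"))
print(parse_event("cache-misses:uppp"))
```

When none of u, k or h is given, the event is not restricted to one
privilege level.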

>> 
>> What's the difference between 'cpu-clock' and 'task-clock' event ?
>
>
> cpu-clock is based on the total time spent on the cpu. task-clock is
> based only on the time spent on the profiled task, so that doesn't count
> time spent on other tasks, it has a per thread granularity.

OK, so 'cpu-clock' could have been named 'proc-clock', even though a task
is a process on Linux.


[...]

>> The last question is about the source code annotation done by
>> perf-report. I'm using it to locate the place in my code that generates
>> the most data cache miss events. I can read this during a perf-report
>> session:
>> 
>>    [...]
>>     0.00 :           df215:       c3                      retq
>>     0.00 :           df216:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>>     0.00 :           df21d:       00 00 00
>>    10.00 :           df220:       48 8b 75 00             mov    0x0(%rbp),%rsi
>>    80.00 :           df224:       48 89 df                mov    %rbx,%rdi
>>     0.00 :           df227:       41 ff d4                callq  *%r12
>>     0.00 :           df22a:       85 c0                   test   %eax,%eax
>>    [...]
>> 
>> If I read the output correctly, most of the dcache misses are coming from
>> 'mov %rbx, %rdi', and AFAIK this instruction can't generate any dcache
>> miss. What am I missing ?
>
>
> Perhaps you need pebs to get the very precise location on your event.
>
> perf stat -e cache-misses:up,l1d-loads-misses:up true
>
>
> I think the more you add 'p', the more precise it is.
> Like:
>
> 	perf stat -e cache-misses:uppp,l1d-loads-misses:uppp true
>
> Not sure how much it will accept though :)

Well, it doesn't even accept one, actually:

  $ perf stat -v -e cache-misses:up true
  Error: counter 0, sys_perf_event_open() syscall returned with -1 (No
  space left on device)
  No permission to collect stats.
  Consider tweaking /proc/sys/kernel/perf_event_paranoid.
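
The text in parentheses in that error is the C library's description of the
errno value returned by the failed system call, here ENOSPC, so it has
nothing to do with disk space. The permission hint that follows does not
obviously match an ENOSPC failure either, and reading the paranoid setting is
a quick way to rule it in or out. A small sketch; the function name is made
up for illustration.

```python
import errno
import os

# Map the errno value behind the message back to its symbolic name.
code = errno.ENOSPC
print(code, errno.errorcode[code], "->", os.strerror(code))


def read_paranoid(path: str = "/proc/sys/kernel/perf_event_paranoid"):
    """Return the current perf_event_paranoid level, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


print("perf_event_paranoid:", read_paranoid())
```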

Where can I find a description of PEBS?

Thanks
-- 
Francis


* Re: perf tools miscellaneous questions
  2010-11-04  8:52   ` Francis Moreau
@ 2010-11-07 21:40     ` Frederic Weisbecker
  2010-11-09 11:07       ` Francis Moreau
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-11-07 21:40 UTC (permalink / raw)
  To: Francis Moreau
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

On Thu, Nov 04, 2010 at 09:52:09AM +0100, Francis Moreau wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > On Wed, Nov 03, 2010 at 08:28:59PM +0100, Francis Moreau wrote:
> >> Hello,
> >> 
> >> I'm trying to use perf-tools and also to learn some internals about
> >> them. So I prefer to ask all of them in one email.
> >> 
> >> The first one is about the list of pre-defined events given by
> >> perf-list. I couldn't find any documentations that describes these
> >> events so excuse me if the question is stupid.
> >
> >
> >
> > Sorry about that. We indeed need to improve a lot the documentation.
> > May be this particular part could come with the future sysfs exposure
> > of the events.
> >
> 
> No problem, but yes, this part should be documented somewhere. And I
> think the event syntax should be too, especially modifiers like 'u' or 'p'.



Ah that is documented in "man perf-list".



> 
> >> 
> >> What's the difference between 'cpu-clock' and 'task-clock' event ?
> >
> >
> > cpu-clock is based on the total time spent on the cpu. task-clock is
> > based only on the time spent on the profiled task, so that doesn't count
> > time spent on other tasks, it has a per thread granularity.
> 
> OK, so 'cpu-clock' could have been named 'proc-clock', even though a task
> is a process on Linux.



Well, this is probably a matter of opinion; I think cpu-clock describes
its role better.



> 
> [...]
> 
> >> The last question is about the source code annotation done by
> >> perf-report. I'm using it to locate the place in my code that generates
> >> the most data cache miss events. I can read this during a perf-report
> >> session:
> >> 
> >>    [...]
> >>     0.00 :           df215:       c3                      retq
> >>     0.00 :           df216:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
> >>     0.00 :           df21d:       00 00 00
> >>    10.00 :           df220:       48 8b 75 00             mov    0x0(%rbp),%rsi
> >>    80.00 :           df224:       48 89 df                mov    %rbx,%rdi
> >>     0.00 :           df227:       41 ff d4                callq  *%r12
> >>     0.00 :           df22a:       85 c0                   test   %eax,%eax
> >>    [...]
> >> 
> >> If I read the output correctly, most of the dcache misses are coming from
> >> 'mov %rbx, %rdi', and AFAIK this instruction can't generate any dcache
> >> miss. What am I missing ?
> >
> >
> > Perhaps you need pebs to get the very precise location on your event.
> >
> > perf stat -e cache-misses:up,l1d-loads-misses:up true
> >
> >
> > I think the more you add 'p', the more precise it is.
> > Like:
> >
> > 	perf stat -e cache-misses:uppp,l1d-loads-misses:uppp true
> >
> > Not sure how much it will accept though :)
> 
> Well it doesn't want one actually:
> 
>   $ perf stat -v -e cache-misses:up true
>   Error: counter 0, sys_perf_event_open() syscall returned with -1 (No
>   space left on device)
>   No permission to collect stats.
>   Consider tweaking /proc/sys/kernel/perf_event_paranoid.
> 
> Where can I find a description of PEBS?


I have the same problem. But running perf record with :p
works for me, which is what we want: PEBS is useful for sampling,
not for plain counting.

Ah, and I think that won't work unless you run an Intel CPU.
Check that you have PEBS support in /proc/cpuinfo.



* Re: perf tools miscellaneous questions
  2010-11-07 21:40     ` Frederic Weisbecker
@ 2010-11-09 11:07       ` Francis Moreau
  0 siblings, 0 replies; 16+ messages in thread
From: Francis Moreau @ 2010-11-09 11:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Thu, Nov 04, 2010 at 09:52:09AM +0100, Francis Moreau wrote:

[...]

>> 
>> No problem, but yes, this part should be documented somewhere. And I
>> think the event syntax should be too, especially modifiers like 'u' or 'p'.
>
> Ah that is documented in "man perf-list".

OK, after updating my three-week-old kernel, the modifiers are now documented.

But I failed to build the documentation:

    XMLTO perf-record.1
xmlto: /home/fmoreau/linux-2.6/tools/perf/Documentation/perf-record.xml does not validate (status 3)

BTW, what does 'skid' mean? s(?) k(?) instruction delay?
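
'Skid' is not an acronym. It refers to the distance between the instruction
that triggered the event and the instruction whose address ends up in the
sample. With skid, a cache miss can be blamed on an instruction that comes
after the load that actually missed. The sketch below applies that idea to
the annotate listing quoted in this thread: it takes the most-blamed
instruction and walks back to the nearest earlier instruction with a memory
operand. The heuristic is deliberately crude (it ignores branches and
implicit stack accesses) and all names are made up for illustration.

```python
# (address, mnemonic, operands, percent of samples) from the annotate
# listing quoted in this thread.
listing = [
    (0xdf215, "retq",  "",                     0.0),
    (0xdf216, "nopw",  "%cs:0x0(%rax,%rax,1)", 0.0),
    (0xdf220, "mov",   "0x0(%rbp),%rsi",       10.0),
    (0xdf224, "mov",   "%rbx,%rdi",            80.0),
    (0xdf227, "callq", "*%r12",                0.0),
    (0xdf22a, "test",  "%eax,%eax",            0.0),
]


def touches_memory(mnemonic: str, operands: str) -> bool:
    """Crude AT&T-syntax heuristic: a parenthesised operand is a memory
    reference.  Multi-byte nops are excluded because they access nothing."""
    return "(" in operands and not mnemonic.startswith("nop")


def likely_culprit(listing, index):
    """Walk back from the blamed instruction to the nearest earlier
    instruction that touches memory; return None if there is none."""
    for addr, mnemonic, operands, _ in reversed(listing[:index]):
        if touches_memory(mnemonic, operands):
            return addr, mnemonic, operands
    return None


# Index of the instruction with the largest share of samples.
blamed = max(range(len(listing)), key=lambda i: listing[i][3])
print("blamed :", hex(listing[blamed][0]), listing[blamed][1], listing[blamed][2])
print("suspect:", likely_culprit(listing, blamed))
```

For this listing the walk back lands on the load from 0x0(%rbp) at df220,
directly before the register-to-register mov that received most of the
samples.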

[...]

> I have the same problem. But running perf record with this :p works
> for me. Which is what we want: pebs is useful for sampling, not
> counting-only.

That makes sense but I still have a problem:

   $ perf record -e cache-misses:p -p $(pgrep test)

  Error: perfcounter syscall returned with -1 (No space left on device)

  Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?


> Ah and that won't work if you don't run some intel CPU I think.  Check
> you have PEBS support in /proc/cpuinfo

It seems so:

   $ grep -qi pebs /proc/cpuinfo && echo pebs
   pebs
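
Note that the grep above matches any line of /proc/cpuinfo containing "pebs".
A slightly stricter check looks only at the entries of the flags line. A
minimal sketch with made-up function names:

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the entries of the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_pebs(cpuinfo_text: str) -> bool:
    """True if 'pebs' appears as a whole entry in the CPU flags."""
    return "pebs" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("pebs" if has_pebs(f.read()) else "no pebs flag")
    except OSError:
        print("/proc/cpuinfo not available")
```

The flag only says that the CPU advertises the feature. Whether a given
event can be sampled precisely is a separate question.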

Thanks
-- 
Francis


[parent not found: <fa.AyvjD8RxwvZsnL5ZXcZ+OzALKY8@ifi.uio.no>]

[parent not found: <fa.wlHJxDLDciTHF6/icJo+JfjJPus@ifi.uio.no>]

[parent not found: <fa.8PQyp14JyjNgKJB4NUWsi+YoZBM@ifi.uio.no>]

[parent not found: <fa.ZajcSOe/t8/XBoLV1hZ7SjSJvtI@ifi.uio.no>]

* Re: perf tools miscellaneous questions
       [not found]     ` <fa.ZajcSOe/t8/XBoLV1hZ7SjSJvtI@ifi.uio.no>
@ 2010-11-04  8:45       ` Francis Moreau
  0 siblings, 0 replies; 16+ messages in thread
From: Francis Moreau @ 2010-11-04  8:45 UTC (permalink / raw)
  To: Reid Kleckner
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

Francis Moreau <francis.moro@gmail.com> writes:

> Reid Kleckner <reid.kleckner@gmail.com> writes:
>
>> On Wed, Nov 3, 2010 at 5:43 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>>> What's exactly the 'cache-misses' event ? does it include both instructions
>>>> _and_ data cache misses ? both L1 and L2 caches ?
>>>>
>>
>>>> I was expecting so but the following command makes me wondering:
>>>>
>>>>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>>>>     Performance counter stats for 'true':
>>>>
>>>>                 763  cache-misses
>>>>                 874  L1-dcache-load-misses
>>>>
>>>>         0.000916609  seconds time elapsed
>>>>
>>>> Here cache-misses < L1-dcache-load-misses.
>>>
>>>
>>>
>>> Dunno, will let others answer.
>>
>> I think it corresponds to last level cache misses, which makes sense
>> here.  The difference in the two numbers represents hits to L2 (and L3
>> if it exists).
>
> How could I know the number of cache level on my cpu ?
>
> I tried:
>
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>
>     Performance counter stats for 'true':
>
>                    802  cache-misses
>                    937  L1-dcache-load-misses
>
>            0.000996578  seconds time elapsed
>
>   $ perf stat -e cache-misses:u,L2-loads-misses:u true
>
>    Performance counter stats for 'true':
>
>                   788  cache-misses
>                    95  LLC-load-misses
>
>           0.001025423  seconds time elapsed
>
> So it looks like you're right: in my case I have this cache
> configuration:
>

oops sorry, I replied too early...

so my cache configuration is:

   L1 -> L2 -> LLC

where the L2 misses are given by: 'cache-misses' - 'LLC-load-misses'

Is that correct ?

If so, I find the term 'cache-misses' very unintuitive, IMHO, probably
because I'm not an expert in CPU caches...

-- 
Francis


* perf tools miscellaneous questions
@ 2010-11-03 19:28 Francis Moreau
  2010-11-03 21:43 ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Francis Moreau @ 2010-11-03 19:28 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm trying to use the perf tools and also to learn some of their
internals. I have several questions, so I prefer to ask them all in one email.

The first one is about the list of predefined events given by
perf-list. I couldn't find any documentation that describes these
events, so excuse me if the question is stupid.

What's the difference between the 'cpu-clock' and 'task-clock' events?

What exactly is the 'cache-misses' event? Does it include both instruction
_and_ data cache misses? Both L1 and L2 caches?

I was expecting so, but the following command makes me wonder:

  $ perf stat -e cache-misses:u,l1d-loads-misses:u true
    Performance counter stats for 'true':

                763  cache-misses            
                874  L1-dcache-load-misses   

        0.000916609  seconds time elapsed

Here cache-misses < L1-dcache-load-misses.

The last question is about the source code annotation done by
perf-report. I'm using it to locate the place in my code that generates
the most data cache miss events. I can read this during a perf-report
session:

   [...]
    0.00 :           df215:       c3                      retq
    0.00 :           df216:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    0.00 :           df21d:       00 00 00
   10.00 :           df220:       48 8b 75 00             mov    0x0(%rbp),%rsi
   80.00 :           df224:       48 89 df                mov    %rbx,%rdi
    0.00 :           df227:       41 ff d4                callq  *%r12
    0.00 :           df22a:       85 c0                   test   %eax,%eax
   [...]

If I read the output correctly, most of the dcache misses are coming from
'mov %rbx, %rdi', and AFAIK this instruction can't generate any dcache
miss. What am I missing?

Thanks.
-- 
Francis


* Re: perf tools miscellaneous questions
  2010-11-03 19:28 Francis Moreau
@ 2010-11-03 21:43 ` Frederic Weisbecker
  2010-11-03 22:15   ` Reid Kleckner
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-11-03 21:43 UTC (permalink / raw)
  To: Francis Moreau
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

On Wed, Nov 03, 2010 at 08:28:59PM +0100, Francis Moreau wrote:
> Hello,
> 
> I'm trying to use perf-tools and also to learn some internals about
> them. So I prefer to ask all of them in one email.
> 
> The first one is about the list of pre-defined events given by
> perf-list. I couldn't find any documentation that describes these
> events, so excuse me if the question is stupid.



Sorry about that. We do indeed need to improve the documentation a lot.
Maybe this particular part could come with the future sysfs exposure
of the events.



> 
> What's the difference between the 'cpu-clock' and 'task-clock' events?


cpu-clock is based on the total time spent on the CPU. task-clock is
based only on the time spent in the profiled task, so it doesn't count
time spent in other tasks; it has per-thread granularity.

(I might be somewhat wrong in my explanation.)



> 
> What exactly is the 'cache-misses' event? Does it include both instruction
> _and_ data cache misses? Both L1 and L2 caches?
> 
> I was expecting so, but the following command makes me wonder:
> 
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>     Performance counter stats for 'true':
> 
>                 763  cache-misses            
>                 874  L1-dcache-load-misses   
> 
>         0.000916609  seconds time elapsed
> 
> Here cache-misses < L1-dcache-load-misses.



Dunno, will let others answer.



> The last question is about the source code annotation done by
> perf-report. I'm using it to locate the place in my code that generates
> the most data cache miss events. I can read this during a perf-report
> session:
> 
>    [...]
>     0.00 :           df215:       c3                      retq
>     0.00 :           df216:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>     0.00 :           df21d:       00 00 00
>    10.00 :           df220:       48 8b 75 00             mov    0x0(%rbp),%rsi
>    80.00 :           df224:       48 89 df                mov    %rbx,%rdi
>     0.00 :           df227:       41 ff d4                callq  *%r12
>     0.00 :           df22a:       85 c0                   test   %eax,%eax
>    [...]
> 
> If I read the output correctly, most of the dcache misses are coming from
> 'mov %rbx,%rdi', and AFAIK this instruction can't generate any dcache
> miss. What am I missing?


Perhaps you need PEBS to get the precise location of your event:

	perf stat -e cache-misses:up,l1d-loads-misses:up true

I think the more 'p' modifiers you add, the more precise it is.
For example:

	perf stat -e cache-misses:uppp,l1d-loads-misses:uppp true

Not sure how many it will accept though :)
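
To make the modifier syntax concrete, here is a minimal sketch of how such an event string is put together. `precise_event` is a hypothetical helper, not part of perf; it only appends the privilege modifier and the requested number of 'p' characters to an event name. Current perf documentation describes precise levels 0 to 3, so the helper rejects anything higher. Whether a given CPU and kernel actually accept a level can only be found out by running perf.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): build "<event>:<modifier><p...>".
# $1 = event name, $2 = privilege modifier such as "u", $3 = precise level (0-3).
precise_event() {
    event=$1 modifier=$2 level=$3
    case $level in
        0|1|2|3) ;;
        *) echo "precise level must be 0..3" >&2; return 1 ;;
    esac
    suffix=$modifier
    i=0
    # Append one 'p' per requested precise level.
    while [ "$i" -lt "$level" ]; do
        suffix="${suffix}p"
        i=$((i + 1))
    done
    printf '%s:%s\n' "$event" "$suffix"
}

precise_event cache-misses u 2   # prints: cache-misses:upp
```

Precise modifiers affect where a sample is attributed, so the resulting string is most useful for a sampling run such as `perf record -e "$(precise_event cache-misses u 2)" -- ./app` followed by `perf annotate`, where `./app` is a placeholder for the program being profiled.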



* Re: perf tools miscellaneous questions
  2010-11-03 21:43 ` Frederic Weisbecker
@ 2010-11-03 22:15   ` Reid Kleckner
  0 siblings, 0 replies; 16+ messages in thread
From: Reid Kleckner @ 2010-11-03 22:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Francis Moreau, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users

On Wed, Nov 3, 2010 at 5:43 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> What exactly is the 'cache-misses' event? Does it include both instruction
>> _and_ data cache misses? Both L1 and L2 caches?
>>
>> I was expecting so, but the following command makes me wonder:
>>
>>   $ perf stat -e cache-misses:u,l1d-loads-misses:u true
>>     Performance counter stats for 'true':
>>
>>                 763  cache-misses
>>                 874  L1-dcache-load-misses
>>
>>         0.000916609  seconds time elapsed
>>
>> Here cache-misses < L1-dcache-load-misses.
>
>
>
> Dunno, will let others answer.

I think it corresponds to last-level cache misses, which makes sense
here.  The difference between the two numbers represents hits in L2
(and L3 if it exists).
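
As a rough worked example of that reading, using the two counts from the quoted output: the sketch below assumes 'cache-misses' really is the last-level miss count, and the variable names are only illustrative. The result is an approximation at best, since 'cache-misses' may also include instruction fetches and stores, and a run of `true` is far too short for stable numbers.

```shell
#!/bin/sh
# Counts copied from the perf stat output quoted above.
l1d_load_misses=874
llc_misses=763

# L1 data-load misses that did not show up as last-level misses,
# i.e. (approximately) the ones served by L2 or L3.
served_below_l1=$((l1d_load_misses - llc_misses))
echo "$served_below_l1"   # prints: 111
```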

Reid


end of thread, other threads:[~2010-11-09 11:07 UTC | newest]

Thread overview: 16+ messages
     [not found] <fa.SSgtQesEhEQa5DUYUwBV3fWtoV4@ifi.uio.no>
     [not found] ` <fa.dD5ur5Phqa1TLmYBE2NVKCQMjTw@ifi.uio.no>
     [not found]   ` <fa.Xj1lA7n6nIJYL40CeRDpQzSKlfc@ifi.uio.no>
     [not found]     ` <fa.xcyA+VzIXesq6qsPU6ADM4xdCKY@ifi.uio.no>
     [not found]       ` <fa.MOg61Pcfdp2SJnwM2GFdOxP+xt0@ifi.uio.no>
     [not found]         ` <fa.4sHfhlc/fMhpYgKda4IfUHZ7jMY@ifi.uio.no>
2010-11-05 12:38           ` perf tools miscellaneous questions Francis Moreau
2010-11-05 14:02             ` Vince Weaver
2010-11-06 14:44               ` Francis Moreau
2010-11-06 20:50                 ` Vince Weaver
2010-11-06 20:52                   ` Vince Weaver
2010-11-08 19:43                   ` Francis Moreau
     [not found] <fa.eqFHAk86WhpTuYclHhngn7QZr8Y@ifi.uio.no>
     [not found] ` <fa.X++YRAJg+rmPtm3nmroZZoNU7u8@ifi.uio.no>
     [not found]   ` <fa.Elk8wOfBHRdKtkHIjP9hOtzVCgQ@ifi.uio.no>
     [not found]     ` <fa.U3fdWUGXguguIIyQ4v66/uKOhus@ifi.uio.no>
     [not found]       ` <fa.YTQQoGhoS6d3BaZXcZMN+l9TojQ@ifi.uio.no>
2010-11-04 20:58         ` Francis Moreau
2010-11-04 22:28           ` Victor Jimenez
     [not found] <fa.yHA7Aw03llqLWxPVYRnHvK5/dT8@ifi.uio.no>
     [not found] ` <fa.13KEqWk+Dk+jLLdFlAoZtQ2Vjuw@ifi.uio.no>
     [not found]   ` <fa.AGo9lmnVDcmFVUpOFG/kfd1aYfI@ifi.uio.no>
2010-11-04  8:34     ` Francis Moreau
2010-11-04  8:52   ` Francis Moreau
2010-11-07 21:40     ` Frederic Weisbecker
2010-11-09 11:07       ` Francis Moreau
     [not found] <fa.AyvjD8RxwvZsnL5ZXcZ+OzALKY8@ifi.uio.no>
     [not found] ` <fa.wlHJxDLDciTHF6/icJo+JfjJPus@ifi.uio.no>
     [not found]   ` <fa.8PQyp14JyjNgKJB4NUWsi+YoZBM@ifi.uio.no>
     [not found]     ` <fa.ZajcSOe/t8/XBoLV1hZ7SjSJvtI@ifi.uio.no>
2010-11-04  8:45       ` Francis Moreau
2010-11-03 19:28 Francis Moreau
2010-11-03 21:43 ` Frederic Weisbecker
2010-11-03 22:15   ` Reid Kleckner
