Re: perf tools miscellaneous questions


* Re: perf tools miscellaneous questions
       [not found]                 ` <m262wao6k7.fsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-11-06 20:50                   ` Vince Weaver
       [not found]                     ` <alpine.DEB.2.00.1011061642020.29635-h+XK9Y6koVLPD5dMldXnqTe48wsgrGvP@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Vince Weaver @ 2010-11-06 20:50 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

This is rapidly getting off topic, especially for linux-kernel.

On Sat, 6 Nov 2010, Francis Moreau wrote:

> Specially since 'llc-loads-misses' is and should be self speaking.

Not necessarily.  Does "last level" mean the common L2 or the shared L3?
Do the misses count prefetch misses?  Do the misses count coherency
actions or just "normal" cache accesses?  Does your processor count
multiple loads from a single instruction?  (Unfortunately, many do.)

Most events are poorly documented, if at all.  The Linux kernel's
predefined event list is loosely based on the Intel architectural
events, which not every processor has, and I've heard from insiders
that you should be very careful with the results from those events.  Also,
as far as I know, there hasn't been much validation work on whether the
events return useful values.  No chip company will guarantee the values
returned by performance counters; they are more or less a bonus feature
that works most of the time, but you never really know the accuracy of what
you are reading out of them.

> Could you point out the best architecture manual for it which describe
> the raw events ?

For your Core2 you want the Intel Software Developer's Manual, volume 2B.  
Google should find it.

> BTW, I'm wondering if event names are coherent across the different
> architectures supported by Linux.

Nope.  They aren't even consistent within the same chip company.  For
example, Core2 and Nehalem have completely different event names, and even
between Nehalem and Westmere there are incompatible changes.

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: perf tools miscellaneous questions
       [not found]                     ` <alpine.DEB.2.00.1011061642020.29635-h+XK9Y6koVLPD5dMldXnqTe48wsgrGvP@public.gmane.org>
@ 2010-11-06 20:52                       ` Vince Weaver
  2010-11-08 19:43                       ` Francis Moreau
  1 sibling, 0 replies; 7+ messages in thread
From: Vince Weaver @ 2010-11-06 20:52 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

On Sat, 6 Nov 2010, Vince Weaver wrote:

> For your Core2 you want the Intel Software Developer's Manual, volume 2B.  
> Google should find it.

typo, sorry, I meant Volume 3B.

Vince


* Re: perf tools miscellaneous questions
       [not found]     ` <m2lj59sc7a.fsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-11-07 21:40       ` Frederic Weisbecker
  2010-11-09 11:07         ` Francis Moreau
  0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2010-11-07 21:40 UTC (permalink / raw)
  To: Francis Moreau
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

On Thu, Nov 04, 2010 at 09:52:09AM +0100, Francis Moreau wrote:
> Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> 
> > On Wed, Nov 03, 2010 at 08:28:59PM +0100, Francis Moreau wrote:
> >> Hello,
> >> 
> >> I'm trying to use perf-tools and also to learn some internals about
> >> them. So I prefer to ask all of them in one email.
> >> 
> >> The first one is about the list of pre-defined events given by
> >> perf-list. I couldn't find any documentations that describes these
> >> events so excuse me if the question is stupid.
> >
> >
> >
> > Sorry about that. We indeed need to improve a lot the documentation.
> > May be this particular part could come with the future sysfs exposure
> > of the events.
> >
> 
> No problem, but yes this part should be documented somewhere. And I
> think the syntax of event too, specially the modifier like 'u' or 'p'.



Ah that is documented in "man perf-list".



> 
> >> 
> >> What's the difference between 'cpu-clock' and 'task-clock' event ?
> >
> >
> > cpu-clock is based on the total time spent on the cpu. task-clock is
> > based only on the time spent on the profiled task, so that doesn't count
> > time spent on other tasks, it has a per thread granularity.
> 
> Ok, so 'cpu-clock' could have been named 'proc-clock' even though a task
> is a processus on Linux.



Well, this is probably a matter of opinion; I think cpu-clock describes
its role better.



> 
> [...]
> 
> >> The last question is about the source code annotation done by
> >> perf-report. I'm using it to locate the place in my code that generates
> >> the most data cache miss events. I can read this during a perf-report
> >> session:
> >> 
> >>    [...]
> >>     0.00 :           df215:       c3                      retq
> >>     0.00 :           df216:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
> >>     0.00 :           df21d:       00 00 00
> >>    10.00 :           df220:       48 8b 75 00             mov    0x0(%rbp),%rsi
> >>    80.00 :           df224:       48 89 df                mov    %rbx,%rdi
> >>     0.00 :           df227:       41 ff d4                callq  *%r12
> >>     0.00 :           df22a:       85 c0                   test   %eax,%eax
> >>    [...]
> >> 
> >> If I read the output correctly, most of the dcache misses are coming from
> >> 'mov %rbx, %rdi', and AFAIK this intruction can't generate any dcache
> >> miss. What am I missing ?
> >
> >
> > Perhaps you need pebs to get the very precise location on your event.
> >
> > perf stat -e cache-misses:up,l1d-loads-misses:up true
> >
> >
> > I think the more you add 'p', the more precise it is.
> > Like:
> >
> > 	perf stat -e cache-misses:uppp,l1d-loads-misses:uppp true
> >
> > Not sure how much it will accept though :)
> 
> Well it doesn't want one actually:
> 
>   $ perf stat -v -e cache-misses:up true
>   Error: counter 0, sys_perf_event_open() syscall returned with -1 (No
>   space left on device)
>   No permission to collect stats.
>   Consider tweaking /proc/sys/kernel/perf_event_paranoid.
> 
> Where can I find a description of PEB ?


I have the same problem.  But running perf record with the :p modifier
works for me.  Which is what we want: PEBS is useful for sampling,
not for counting only.

Ah, and I think that won't work unless you run an Intel CPU.
Check that you have PEBS support in /proc/cpuinfo.
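
A minimal sketch of such a check in Python, assuming the flag is spelled
"pebs" on the "flags" line of /proc/cpuinfo, as on x86 kernels.  The helper
name has_cpu_flag is made up for illustration and is not part of perf:

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed on any 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the "flags : ..." lines, and compare whole tokens so
        # that a longer flag name containing "pebs" does not match by accident.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is not readable: treat as "flag not found".
        text = ""
    print("pebs" if has_cpu_flag(text, "pebs") else "no pebs flag found")
```

Note that the flag only says the CPU advertises PEBS; whether a given
event can be sampled precisely is a separate question.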


* Re: perf tools miscellaneous questions
       [not found]                     ` <alpine.DEB.2.00.1011061642020.29635-h+XK9Y6koVLPD5dMldXnqTe48wsgrGvP@public.gmane.org>
  2010-11-06 20:52                       ` Vince Weaver
@ 2010-11-08 19:43                       ` Francis Moreau
       [not found]                         ` <m24obr1tzf.fsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Francis Moreau @ 2010-11-08 19:43 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Victor Jimenez, Reid Kleckner, Frederic Weisbecker,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

Vince Weaver <vweaver1-qKp7vQ+Mknf2fBVCVOL8/A@public.gmane.org> writes:

> This is rapidly getting off topic, especially for linux-kernel

Don't think so but feel free to remove LKML from Cc.

[...]

> Most events are poorly documented, if at all.  And the Linux kernel 
> predefined event list is loosely based upon the intel architectural
> events, which not every processor has and I've heard from insiders saying 
> that you should be very careful for the results from those events.

I agree, that's why I try to clarify some events.

The perf tools are cool stuff, IMHO, but it's pretty hard for me to
interpret the results. I tried to compare some numbers in my previous posts,
but so far I only got some 'random' figures.

Another example is given below, where I'm trying to benchmark two functions
which do the same thing in different ways.

   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
     C-c C-c
    Performance counter stats for process id '30263':
   
                406532  cache-misses            
               4986030  L1-dcache-load-misses   
             120247366  cycles                  
   
           2.482196928  seconds time elapsed
   
   
   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
     C-c C-c
    Performance counter stats for process id '30271':
   
                459683  cache-misses            
               2513338  L1-dcache-load-misses   
             159968076  cycles                  
   
           2.129021265  seconds time elapsed

Which numbers are important here? cache-misses? L1-dcache-load-misses?

I can only say that the first run looks faster.

-- 
Francis


* Re: perf tools miscellaneous questions
       [not found]                         ` <m24obr1tzf.fsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-11-08 20:06                           ` Reid Kleckner
       [not found]                             ` <AANLkTimyzO7_MzRs0tUaWFHWEN69MRhwQL=EPFiukXci-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Reid Kleckner @ 2010-11-08 20:06 UTC (permalink / raw)
  To: Francis Moreau
  Cc: Vince Weaver, Victor Jimenez, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

-lkml

On Mon, Nov 8, 2010 at 2:43 PM, Francis Moreau <francis.moro-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Vince Weaver <vweaver1-qKp7vQ+Mknf2fBVCVOL8/A@public.gmane.org> writes:
>
>> This is rapidly getting off topic, especially for linux-kernel
>
> Don't think so but feel free to remove LKML from Cc.
>
> [...]
>
>> Most events are poorly documented, if at all.  And the Linux kernel
>> predefined event list is loosely based upon the intel architectural
>> events, which not every processor has and I've heard from insiders saying
>> that you should be very careful for the results from those events.
>
> I agree, that's why I try to clarify some events.
>
> Perf tools are cool stuffs, IMHO, but it's pretty hard for me to
> interpret results. I tried to compare some numbers in my previous posts
> but I got some 'random' figures for now.
>
> Another example is given below where I'm trying to bench a 2 functions
> which do the same thing but differently.
>
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
>     C-c C-c
>    Performance counter stats for process id '30263':
>
>                406532  cache-misses
>               4986030  L1-dcache-load-misses
>             120247366  cycles
>
>           2.482196928  seconds time elapsed
>
>
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
>     C-c C-c
>    Performance counter stats for process id '30271':
>
>                459683  cache-misses
>               2513338  L1-dcache-load-misses
>             159968076  cycles
>
>           2.129021265  seconds time elapsed
>
> Which numbers are important here ? cache-misses ? L1-dcache-load-misses
> ?

Totally depends.  In this particular piece of code, you seem to have
improved your L1 hit rate, but you've hurt your hit rate somewhere
else, so the extra memory traffic has hurt you overall.  It's
also helpful to look at the rate, and not just absolute numbers.  You
may be doing more L1 references in the second run, so you just have more
memory traffic overall.
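
To make the point about rates concrete, here is a minimal Python sketch
that parses the first perf stat output quoted above and normalizes the
miss counts per thousand cycles.  The names parse_perf_stat and
per_kilo_cycles and the regular expression are assumptions for
illustration, not part of the perf tools, and the parser only handles the
simple "count  event-name" lines shown in this thread:

```python
import re

# First perf stat output quoted earlier in this thread (process id 30263).
SAMPLE = """\
 Performance counter stats for process id '30263':

             406532  cache-misses
            4986030  L1-dcache-load-misses
          120247366  cycles

        2.482196928  seconds time elapsed
"""

# Hypothetical pattern: "<integer>  <event-name>", optionally followed by a
# "# ..." annotation.  The elapsed-time line is skipped because its number
# contains a decimal point.
_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+([A-Za-z][\w-]*)\s*(?:#.*)?$")


def parse_perf_stat(text):
    """Return {event_name: count} for the plain counter lines in `text`."""
    counts = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def per_kilo_cycles(counts, event):
    """Occurrences of `event` per 1000 cycles."""
    return 1000.0 * counts[event] / counts["cycles"]


if __name__ == "__main__":
    counts = parse_perf_stat(SAMPLE)
    for event in ("cache-misses", "L1-dcache-load-misses"):
        print(f"{event}: {per_kilo_cycles(counts, event):.2f} per 1000 cycles")
```

Normalizing by cycles is only one choice; dividing misses by the matching
reference or load count gives a miss ratio, which needs the extra counters.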

I don't know what level of cache the generic cache-misses and
-references refer to on your processor.  Unfortunately, you'd have to
go look up the source code and cross-reference it with a manual to
know for sure.  Having looked at the code, I can say that it's an
event that has to do with the higher-level caches, i.e. not L1, and
apparently it's not LLC on your machine.

Try comparing it to the numbers for L2-dcache-load-misses and
L2-dcache-store-misses.  IMO it's worth doing multiple runs to look at
*all* of the cache counters on a variety of workloads with known cache
behavior so you can get an understanding.

The reality is that these things aren't well documented either by the
manufacturer or the kernel developers, and the best way to understand
them right now is to run your own experiments.

Reid


* Re: perf tools miscellaneous questions
  2010-11-07 21:40       ` Frederic Weisbecker
@ 2010-11-09 11:07         ` Francis Moreau
  0 siblings, 0 replies; 7+ messages in thread
From: Francis Moreau @ 2010-11-09 11:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On Thu, Nov 04, 2010 at 09:52:09AM +0100, Francis Moreau wrote:

[...]

>> 
>> No problem, but yes this part should be documented somewhere. And I
>> think the syntax of event too, specially the modifier like 'u' or 'p'.
>
> Ah that is documented in "man perf-list".

Ok, after updating my 3-week-old kernel, the modifiers are now documented.

But I failed to build the man page:

    XMLTO perf-record.1
xmlto: /home/fmoreau/linux-2.6/tools/perf/Documentation/perf-record.xml does not validate (status 3)

BTW, what does 'skid' mean? s(?) k(?) instruction delay?

[...]

> I have the same problem. But running perf record with this :p works
> for me. Which is what we want: pebs is useful for sampling, not
> counting-only.

That makes sense but I still have a problem:

   $ perf record -e cache-misses:p -p $(pgrep test)

  Error: perfcounter syscall returned with -1 (No space left on device)

  Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?


> Ah and that won't work if you don't run some intel CPU I think.  Check
> you have PEBS support in /proc/cpuinfo

It seems so:

   $ grep -qi pebs /proc/cpuinfo && echo pebs
   pebs

Thanks
-- 
Francis


* Re: perf tools miscellaneous questions
       [not found]                             ` <AANLkTimyzO7_MzRs0tUaWFHWEN69MRhwQL=EPFiukXci-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-11-09 15:22                               ` Francis Moreau
  0 siblings, 0 replies; 7+ messages in thread
From: Francis Moreau @ 2010-11-09 15:22 UTC (permalink / raw)
  To: Reid Kleckner
  Cc: Vince Weaver, Victor Jimenez, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian,
	linux-perf-users-u79uwXL29TY76Z2rM5mHXA

Reid Kleckner <reid.kleckner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

[...]

> I don't know what level of cache the generic cache-misses and
> -references refer to on your processor.  Unfortunately, you'd have to
> go look up the source code and cross reference it with a manual to
> know for 100%.

Looking at the source and the Intel processor manual, the 'cache-misses'
event is one of the "Pre-defined Architectural Performance Events", with:

   - UMASK = 0x41
   - Event select = 0x2e

Here's the complete definition of 'cache-misses':

   Last Level Cache Misses — Event select 2EH, Umask 41H

   This event counts each cache miss condition for references to the
   last level cache.  The event count may include speculation, but
   excludes cache line fills due to hardware-prefetch.

   Because cache hierarchy, cache sizes and other
   implementation-specific characteristics; value comparison to estimate
   performance differences is not recommended.

Also, "Pre-defined Architectural Performance Events" means that
the definition is common across all Intel CPUs, IIUC.
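
As a side note, on x86 the raw event code that perf accepts (the rNNN
syntax described in "man perf-list") is built from these two fields, with
the event select in the low byte and the umask in the next byte.  A
minimal sketch of that arithmetic follows; the function name
raw_event_code is made up for illustration, and the layout is assumed to
be the basic Intel one with all other control bits (edge detect, invert,
counter mask) left at zero:

```python
def raw_event_code(event_select, umask):
    """Combine an x86 event select and unit mask into a perf raw code.

    Assumed layout: bits 0-7 = event select, bits 8-15 = umask.
    Other control bits (edge detect, invert, cmask) are left at zero.
    """
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event select must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    return (umask << 8) | event_select


if __name__ == "__main__":
    code = raw_event_code(event_select=0x2E, umask=0x41)
    # Print the value in the form perf expects after the 'r' prefix.
    print(f"r{code:x}")
```

With the values above this gives 0x412e, so counting the raw event
should be comparable to counting 'cache-misses' on this machine.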

> Having looked at the code, I can assert that it's an event that has to
> do with the higher level caches, ie not L1, and apparently it's not
> LLC on your machine.

Unfortunately, 'cache-misses' _is_ LLC on this machine, hence my
confusion with my previous examples using true(1) and gzip(1).

But to add more confusion please see the numbers below...

> IMO it's worth doing multiple runs to look at *all* of the cache
> counters on a variety of workloads with known cache behavior so you
> can get an understanding.

Here's a more complete run.

method 1:
             408502  cache-misses
            3040439  cache-references
           38489028  L1-dcache-loads
            6616736  L1-dcache-stores
            4948739  L1-dcache-load-misses
                241  L1-dcache-store-misses
            2998011  LLC-loads
             406115  LLC-load-misses
                171  LLC-stores
                 41  LLC-store-misses
          120654728  cycles
           82578853  instructions             #      0.684 IPC
                  0  minor-faults
                  0  major-faults
                  0  alignment-faults

method 2:
             460273  cache-misses
            1891362  cache-references
           28549238  L1-dcache-loads
            6596346  L1-dcache-stores
            3699561  L1-dcache-load-misses
                608  L1-dcache-store-misses
            1884987  LLC-loads
             459826  LLC-load-misses
                 63  LLC-stores
                 38  LLC-store-misses
          160426298  cycles
           87272047  instructions             #      0.544 IPC  
                  0  minor-faults
                  0  major-faults
                  0  alignment-faults

Now 'cache-misses' and 'LLC-{load,store}-misses' are quite similar,
sigh...

So the first method seems more efficient because it executes
fewer instructions and has fewer LLC misses, even though its L1-dcache
load misses are higher.
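
Following Reid's suggestion to look at rates rather than absolute
numbers, here is a small Python sketch of the ratios for both runs.  The
dictionaries just copy the counts listed above; the helper name ratios
and the key names of its result are illustrative only:

```python
# Counts copied from the two runs above.
METHOD_1 = {
    "L1-dcache-loads": 38489028,
    "L1-dcache-load-misses": 4948739,
    "LLC-loads": 2998011,
    "LLC-load-misses": 406115,
    "cycles": 120654728,
    "instructions": 82578853,
}
METHOD_2 = {
    "L1-dcache-loads": 28549238,
    "L1-dcache-load-misses": 3699561,
    "LLC-loads": 1884987,
    "LLC-load-misses": 459826,
    "cycles": 160426298,
    "instructions": 87272047,
}


def ratios(counts):
    """Derive miss ratios and IPC from one run's raw counts."""
    return {
        "l1_load_miss_ratio":
            counts["L1-dcache-load-misses"] / counts["L1-dcache-loads"],
        "llc_load_miss_ratio":
            counts["LLC-load-misses"] / counts["LLC-loads"],
        "ipc": counts["instructions"] / counts["cycles"],
    }


if __name__ == "__main__":
    for name, counts in (("method 1", METHOD_1), ("method 2", METHOD_2)):
        summary = ", ".join(f"{k}={v:.3f}" for k, v in ratios(counts).items())
        print(f"{name}: {summary}")
```

Seen this way, the L1 load miss ratios of the two methods are close to
each other (roughly 13% each), while the LLC load miss ratio is clearly
higher for method 2, which fits its lower IPC.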

Thanks
-- 
Francis

