From: Reid Kleckner
Subject: Re: perf tools miscellaneous questions
Date: Mon, 8 Nov 2010 15:06:51 -0500
To: Francis Moreau
Cc: Vince Weaver, Victor Jimenez, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian, linux-perf-users (-lkml)

On Mon, Nov 8, 2010 at 2:43 PM, Francis Moreau wrote:
> Vince Weaver writes:
>
>> This is rapidly getting off topic, especially for linux-kernel
>
> Don't think so but feel free to remove LKML from Cc.
>
> [...]
>
>> Most events are poorly documented, if at all. And the Linux kernel
>> predefined event list is loosely based upon the Intel architectural
>> events, which not every processor has and I've heard from insiders saying
>> that you should be very careful with the results from those events.
>
> I agree, that's why I try to clarify some events.
>
> Perf tools are cool stuff, IMHO, but it's pretty hard for me to
> interpret results. I tried to compare some numbers in my previous posts
> but I got some 'random' figures for now.
>
> Another example is given below where I'm trying to bench 2 functions
> which do the same thing but differently.
>
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
>     C-c C-c
>    Performance counter stats for process id '30263':
>
>                406532  cache-misses
>               4986030  L1-dcache-load-misses
>             120247366  cycles
>
>           2.482196928  seconds time elapsed
>
>
>   $ perf stat -e cache-misses:u,l1d-loads-misses:u,cycles:u -p $(pgrep test)
>     C-c C-c
>    Performance counter stats for process id '30271':
>
>                459683  cache-misses
>               2513338  L1-dcache-load-misses
>             159968076  cycles
>
>           2.129021265  seconds time elapsed
>
> Which numbers are important here? cache-misses? L1-dcache-load-misses?

Totally depends. In this particular piece of code, you seem to have improved your L1 hit rate, but you've hurt your hit rate somewhere else, so the extra memory traffic has hurt you overall.

It's also helpful to look at rates, and not just absolute numbers: a miss ratio (misses divided by references) says more than a raw miss count. You may be doing more L1 references in the second run, so you simply have more memory traffic overall.

I don't know which level of cache the generic cache-misses and cache-references events refer to on your processor. Unfortunately, to know for certain you'd have to look up the source code and cross-reference it with the processor manual. Having looked at the code, I can assert that it's an event that has to do with the higher-level caches, i.e. not L1, and apparently it's not the LLC on your machine. Try comparing it with the numbers for L2-dcache-load-misses and L2-dcache-store-misses.
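As a rough illustration of comparing rates rather than raw counts, here is a small C sketch. The function names are hypothetical and not part of perf. A true miss ratio needs the matching references event (for instance L1-dcache-loads) collected in the same run, which the two runs quoted above did not include, so for those numbers only time- or cycle-normalised rates can be computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of perf): normalise raw `perf stat`
 * counts so that runs of different length can be compared.
 */

/* Fraction of references that missed, e.g. L1-dcache-load-misses
 * divided by L1-dcache-loads taken from the same run. */
double miss_ratio(uint64_t misses, uint64_t references)
{
    assert(references > 0);
    return (double)misses / (double)references;
}

/* Events per second of the measurement window (the "seconds time
 * elapsed" line); relevant when perf was attached with -p and stopped
 * by hand, so each run covers a different amount of time. */
double per_second(uint64_t count, double seconds)
{
    assert(seconds > 0.0);
    return (double)count / seconds;
}

/* Events per 1000 cycles: normalises by how long the task actually ran
 * on the CPU rather than by wall-clock time. */
double per_kilocycle(uint64_t count, uint64_t cycles)
{
    assert(cycles > 0);
    return 1000.0 * (double)count / (double)cycles;
}
```

Normalising both runs this way helps separate a real change in cache behaviour from a difference in how long perf happened to be attached before C-c.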
IMO it's worth doing multiple runs that look at *all* of the cache counters on a variety of workloads with known cache behavior, so you can build an understanding of what each counter measures. The reality is that these things aren't well documented by either the manufacturer or the kernel developers, and the best way to understand them right now is to run your own experiments.

Reid
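A workload with predictable cache behaviour, of the kind suggested above, might look like the following C sketch. `walk_buffer` and its parameters are hypothetical and not from this thread; the sketch assumes a 64-byte cache line, and because it walks memory sequentially, hardware prefetchers may hide some of the misses you would otherwise expect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical microbenchmark: read one byte every `stride` bytes of a
 * `bytes`-sized buffer, `passes` times, and return the sum of the bytes
 * read. With stride set to the cache line size (assumed 64 bytes), each
 * read touches a new line, so:
 *   - a buffer well inside L1 (e.g. 16 KiB) should show few L1 misses
 *     after the first pass;
 *   - a buffer far larger than the last-level cache should show misses
 *     at every level, minus whatever the prefetchers manage to hide.
 * Aborts if the buffer cannot be allocated.
 */
uint64_t walk_buffer(size_t bytes, size_t stride, unsigned passes)
{
    assert(bytes > 0 && stride > 0);
    volatile uint8_t *buf = malloc(bytes);
    if (buf == NULL)
        abort();

    /* Fault the pages in and give every byte a known value. */
    for (size_t i = 0; i < bytes; i++)
        buf[i] = (uint8_t)i;

    uint64_t sum = 0;
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < bytes; i += stride)
            sum += buf[i]; /* volatile read, so it is not optimised away */

    free((void *)buf);
    return sum;
}
```

One way to use it: call it from a small main() that takes the buffer size from the command line, then run the binary under perf stat with the L1, LLC and generic cache events for a range of sizes (say 16 KiB up to a few hundred MiB). The size at which each counter starts to climb hints at which cache level it actually measures.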