From mboxrd@z Thu Jan 1 00:00:00 1970
From: Francis Moreau
Subject: Re: perf tools miscellaneous questions
Date: Tue, 09 Nov 2010 16:22:31 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: (Reid Kleckner's message of "Mon, 8 Nov 2010 15:06:51 -0500")
Sender: linux-perf-users-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:
To: Reid Kleckner
Cc: Vince Weaver, Victor Jimenez, Frederic Weisbecker, Ingo Molnar,
    Peter Zijlstra, Arnaldo Carvalho de Melo, Stephane Eranian,
    linux-perf-users-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Reid Kleckner writes:

[...]

> I don't know what level of cache the generic cache-misses and
> -references refer to on your processor. Unfortunately, you'd have to
> go look up the source code and cross reference it with a manual to
> know for 100%.

Looking at the source and the Intel processor manual, the 'cache-misses'
event is one of the "Pre-defined Architectural Performance Events", with:

  - UMASK = 0x41
  - Event select = 0x2e

Here's the complete definition of 'cache-misses':

    Last Level Cache Misses - Event select 2EH, Umask 41H

    This event counts each cache miss condition for references to the
    last level cache. The event count may include speculation, but
    excludes cache line fills due to hardware-prefetch. Because cache
    hierarchy, cache sizes and other implementation-specific
    characteristics; value comparison to estimate performance
    differences is not recommended.

Also, "Pre-defined Architectural Performance Events" means that the
definition is common across all Intel CPUs, IIUC.

> Having looked at the code, I can assert that it's an event that has to
> do with the higher level caches, ie not L1, and apparently it's not
> LLC on your machine.

Unfortunately, 'cache-misses' _is_ LLC on this machine, hence my
confusion with my previous examples using true(1) and gzip(1).

But to add to the confusion, please see the numbers below...
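
As an aside, the Event select / Umask pair quoted above can be turned into a raw event code and counted directly, which is one way to check whether the generic 'cache-misses' name really maps to that event. The sketch below assumes the usual x86 raw encoding (event select in bits 0-7, umask in bits 8-15) and the `rNNNN` raw-event syntax of `perf stat -e`; the function name `raw_event_code` is made up for illustration.

```python
# Sketch only: build a raw perf event code from an Event select / Umask pair.
# Assumes the x86 layout: event select in bits 0-7, umask in bits 8-15.
# raw_event_code is a hypothetical helper, not part of perf.

def raw_event_code(event_select: int, umask: int) -> str:
    """Return an 'rNNNN' string of the form accepted by `perf stat -e`."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask are 8-bit fields")
    config = (umask << 8) | event_select
    return f"r{config:x}"


if __name__ == "__main__":
    # Event select 2EH, Umask 41H: Last Level Cache Misses
    print(raw_event_code(0x2E, 0x41))
```

Counting the resulting code next to the generic name (for example `perf stat -e r412e -e cache-misses <command>`) should give near-identical counts if the two refer to the same hardware event.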
> IMO it's worth doing multiple runs to look at *all* of the cache
> counters on a variety of workloads with known cache behavior so you
> can get an understanding.

Here's a more complete run.

method 1:

       408502  cache-misses
      3040439  cache-references
     38489028  L1-dcache-loads
      6616736  L1-dcache-stores
      4948739  L1-dcache-load-misses
          241  L1-dcache-store-misses
      2998011  LLC-loads
       406115  LLC-load-misses
          171  LLC-stores
           41  LLC-store-misses
    120654728  cycles
     82578853  instructions             #  0.684 IPC
            0  minor-faults
            0  major-faults
            0  alignment-faults

method 2:

       460273  cache-misses
      1891362  cache-references
     28549238  L1-dcache-loads
      6596346  L1-dcache-stores
      3699561  L1-dcache-load-misses
          608  L1-dcache-store-misses
      1884987  LLC-loads
       459826  LLC-load-misses
           63  LLC-stores
           38  LLC-store-misses
    160426298  cycles
     87272047  instructions             #  0.544 IPC
            0  minor-faults
            0  major-faults
            0  alignment-faults

Now 'cache-misses' and 'LLC-{load,store}-misses' are quite similar,
sigh...

So the first method seems more efficient: it executes fewer
instructions, takes fewer cycles and has fewer LLC misses, even though
its L1-dcache load misses are higher.

Thanks
-- 
Francis
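
For reference, the comparison between the two runs can be reduced to a few ratios. This is only a sketch: the counter values are copied from the two listings above, while the dictionary layout and the `ratios` helper are invented for the example, and such ratios are a rough guide rather than a performance verdict.

```python
# Sketch only: derive miss ratios and IPC from the perf stat counters above.
# The data layout and the ratios() helper are hypothetical.

runs = {
    "method 1": {
        "cache-misses": 408502,
        "cache-references": 3040439,
        "L1-dcache-loads": 38489028,
        "L1-dcache-load-misses": 4948739,
        "cycles": 120654728,
        "instructions": 82578853,
    },
    "method 2": {
        "cache-misses": 460273,
        "cache-references": 1891362,
        "L1-dcache-loads": 28549238,
        "L1-dcache-load-misses": 3699561,
        "cycles": 160426298,
        "instructions": 87272047,
    },
}


def ratios(counters: dict) -> dict:
    """Return LLC miss ratio, L1d load miss ratio and IPC for one run."""
    return {
        "LLC miss ratio": counters["cache-misses"] / counters["cache-references"],
        "L1d load miss ratio": counters["L1-dcache-load-misses"]
        / counters["L1-dcache-loads"],
        "IPC": counters["instructions"] / counters["cycles"],
    }


if __name__ == "__main__":
    for name, counters in runs.items():
        print(name)
        for label, value in ratios(counters).items():
            print(f"  {label:<20} {value:.3f}")
```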