linux-perf-users.vger.kernel.org archive mirror
* perf pebs sampling through stores + period is wrong?
@ 2014-02-20 11:40 Harald Servat
  2014-02-20 21:27 ` Andi Kleen
From: Harald Servat @ 2014-02-20 11:40 UTC
  To: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]

Dear all,

   I would like to report that PEBS sampling of stores seems to
misbehave (at least as I understand it) when combined with the -c
(sample period) flag.

   I'm running Linux 3.11.0 on an Intel Sandy Bridge machine with the
following CPU info:

vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping	: 7

   For testing purposes I'm using the attached program (which simply
copies a short string over and over into a large buffer) to illustrate
the problem. When I use perf stat to count the loads and stores of this
app, I get the following output (which I trimmed manually):

  # Table 19-17 of the Intel manual [1] indicates that event number d0
  # with umask 81 counts all loads.
  $ perf stat -e r81d0 ./a.out
   671.488.050 loads

  $ perf stat -e r82d0 ./a.out # The same as before, but for stores
   356.521.360 stores

   We can see that the number of stores is roughly half the number of
loads. However, when I use perf mem record to sample every 10k loads, I
get the following:

   $ perf mem -t load record -c 10000 ./a.out
   [perf record: Woken up 1 times to write data]
   [perf record: Captured and wrote 0.047 MB perf.data (~2036 samples)]

   but when sampling every 10k stores I get

   $ perf mem -t store record -c 10000 ./a.out
   ...
   [perf record: Woken up 4 times to write data]
   [perf record: Captured and wrote 0.921 MB perf.data (~40247 samples)]

   Notice that the number of samples rose by about 20x, which seems
very odd to me: since the number of stores is about half the number of
loads, I expected about 0.5x here. Or is my assumption wrong?

   As a further test, if I omit the -c parameter (which I do need
:S), the results look more reasonable:

   $ perf mem -t load record  ./a.out
   [perf record: Woken up 1 times to write data]
   [perf record: Captured and wrote 0.172 MB perf.data (~7508 samples)]

   $ perf mem -t store record  ./a.out
   [perf record: Woken up 1 times to write data]
   [perf record: Captured and wrote 0.151 MB perf.data (~6607 samples)]

Best regards.

[1] 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf



[-- Attachment #2: memcpy.c --]
[-- Type: text/x-csrc, Size: 493 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

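/* Source string that is copied repeatedly into dest. */
const char *long_str = "This is a very long string!";
/* 1 GiB destination buffer. */
char dest[1024*1024*1024];

int main (int argc, char *argv[])
{
	int i;
	int length = strlen (long_str);

	/* Fill dest with back-to-back copies of long_str (without the NUL). */
	for (i = 0; i < 1024*1024*1024-length; i += length)
		memcpy (&dest[i], long_str, length);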

	printf ("CHECK: %c\n", dest[0*length+0]);
	printf ("CHECK: %c\n", dest[1*length+1]);
	printf ("CHECK: %c\n", dest[2*length+2]);
	printf ("CHECK: %c\n", dest[3*length+3]);

	return 0;
}

Thread overview: 3+ messages
2014-02-20 11:40 perf pebs sampling through stores + period is wrong? Harald Servat
2014-02-20 21:27 ` Andi Kleen
2014-02-21  9:45   ` Harald Servat
