* perf pebs sampling through stores + period is wrong?
@ 2014-02-20 11:40 Harald Servat
2014-02-20 21:27 ` Andi Kleen
0 siblings, 1 reply; 3+ messages in thread
From: Harald Servat @ 2014-02-20 11:40 UTC (permalink / raw)
To: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]
Dear all,
I'd like to let you know that PEBS sampling of stores seems to
behave badly (at least to my understanding) when combined with the
-c flag.
I'm running Linux 3.11.0 on an Intel Sandy Bridge machine with the
following info
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping : 7
For testing purposes I'm using the attached program (which simply
transfers data from one vector to another) to illustrate the problem.
When I use perf stat to count the loads and stores of this app, I get
this output (which I trimmed manually):
$ perf stat -e r81d0 ./a.out # Intel manual [1], table 19-17: event number d0 + umask 81 refers to all loads
671.488.050 loads
$ perf stat -e r82d0 ./a.out # The same as before, but for stores
356.521.360 stores
We can see that the number of stores is roughly half the number of
loads. However, when I use the perf mem record command to sample every
10k loads I get the following info:
$ perf mem -t load record -c 10000 ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.047 MB perf.data (~2036 samples)]
but when sampling every 10k stores I get
$ perf mem -t store record -c 10000 ./a.out
...
[perf record: Woken up 4 times to write data]
[perf record: Captured and wrote 0.921 MB perf.data (~40247 samples)]
Notice that the number of samples rose by 20x, which seems very odd
to me: the number of stores was half the number of loads, so I
expected 0.5x here. Or is my assumption wrong?
As a further test, if I omit the -c parameter (which I need :S), it
seems to work better:
$ perf mem -t load record ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.172 MB perf.data (~7508 samples)]
$ perf mem -t store record ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.151 MB perf.data (~6607 samples)]
Best regards.
[1]
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
WARNING / LEGAL TEXT: This message is intended only for the use of the
individual or entity to which it is addressed and may contain
information which is privileged, confidential, proprietary, or exempt
from disclosure under applicable law. If you are not the intended
recipient or the person responsible for delivering the message to the
intended recipient, you are strictly prohibited from disclosing,
distributing, copying, or in any way using this message. If you have
received this communication in error, please notify the sender and
destroy and delete any copies you may have received.
http://www.bsc.es/disclaimer
[-- Attachment #2: memcpy.c --]
[-- Type: text/x-csrc, Size: 493 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *long_str = "This is a very long string!";
char dest[1024*1024*1024];

int main (int argc, char *argv[])
{
	int i;
	int length = strlen (long_str);

	for (i = 0; i < 1024*1024*1024-length; i += length)
		memcpy (&dest[i], long_str, length);

	printf ("CHECK: %c\n", dest[0*length+0]);
	printf ("CHECK: %c\n", dest[1*length+1]);
	printf ("CHECK: %c\n", dest[2*length+2]);
	printf ("CHECK: %c\n", dest[3*length+3]);
	return 0;
}
* Re: perf pebs sampling through stores + period is wrong?
From: Andi Kleen @ 2014-02-20 21:27 UTC (permalink / raw)
To: Harald Servat; +Cc: linux-perf-users
Harald Servat <harald.servat@bsc.es> writes:
>
> $ perf mem -t store record -c 10000 ./a.out
> ...
> [perf record: Woken up 4 times to write data]
> [perf record: Captured and wrote 0.921 MB perf.data (~40247 samples)]
>
> Notice that the number of samples raised by 20x, which to me seems
> very odd because the number of stores was half, so I expected 0.5x
> here. Or am I supposing this the wrong way?
Likely you're throttling. 10k is far too low a period for such
measurements.
(The CPU can do multiple stores per cycle and it runs at multiple
GHz. Each PMI takes many thousands of cycles. You can do the math.)
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: perf pebs sampling through stores + period is wrong?
From: Harald Servat @ 2014-02-21 9:45 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
On 20/02/14 22:27, Andi Kleen wrote:
> Harald Servat <harald.servat@bsc.es> writes:
>>
>> $ perf mem -t store record -c 10000 ./a.out
>> ...
>> [perf record: Woken up 4 times to write data]
>> [perf record: Captured and wrote 0.921 MB perf.data (~40247 samples)]
>>
>> Notice that the number of samples raised by 20x, which to me seems
>> very odd because the number of stores was half, so I expected 0.5x
>> here. Or am I supposing this the wrong way?
>
> Likely you're throttling. 10k is a far too low period for such
> measurements
>
> (The cpu can do multiple stores per cycle and it runs at multiple
> Ghz. Each PMI takes many thousands of cycles. You can do the math.)
>
> -Andi
>
Dear Andi,
But then why aren't the loads throttling? There are far more loads in
the app than stores (as seen in the perf stat results), yet the loads
do not throttle while the stores do? Of course, the rate varies as an
app runs, and there may be phases where the number of loads/second is
larger or smaller than stores/second, but it is still confusing that
there is so much difference between loads and stores. I would expect
the loads to throttle as well.
Regards.