* Understanding perf mem -t load results
@ 2013-12-12 14:06 Manuel Selva
2013-12-15 18:27 ` Manuel Selva
2013-12-15 21:18 ` Andi Kleen
0 siblings, 2 replies; 18+ messages in thread
From: Manuel Selva @ 2013-12-12 14:06 UTC (permalink / raw)
To: linux-perf-users
Hi all,
I am trying to understand the output of the perf mem tool on my workstation with two Intel Xeon X5650 processors.
I recorded a perf.data file with memory load sampling (write sampling is not available on these processors) as follows (in the root directory of a Linux kernel source tree):
perf mem -t load rec -c 1 make -j18
Then I am reporting the results with
perf mem rep --sort=mem
97.00% 25519343 L1 hit
1.31% 43687 L3 hit
1.15% 37253 LFB hit
0.32% 3156 Remote Cache (1 hop) hit
0.14% 38579 L3 miss
0.05% 6309 L2 hit
0.03% 231 Remote RAM (1 hop) hit
0.00% 8 Local RAM hit
0.00% 2 Uncached hit
As you can see, 97% of the loads hit the L1 cache (I am sampling all loads with -c 1: is that right?). My first question is about this high L1 hit ratio and the small number of RAM requests (231 + 8). Is it realistic to have 97% L1 hits and only 239 RAM accesses when compiling a Linux kernel?
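As a rough cross-check of these totals, I suppose one could compare against aggregate counts from the generic perf cache events (a sketch; these generic events only approximately map to the underlying hardware events on this CPU):

perf stat -e L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses make -j18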
Writing this email and looking again into the Intel SDM, I am thinking that the L3 misses are what the SDM calls an "unknown L3 cache miss". As a consequence, the total number of RAM accesses would be L3 miss + Remote RAM + Local RAM; is that correct?
The second question is about the Uncached hit: is it the uncacheable memory described in the SDM? If so, I guess it is also a request to RAM.
Finally, it is not very clear to me what the Line Fill Buffer (LFB) is exactly, and I was not able to find a pointer explaining it. Do you know where I can read about this?
Thanks,
------
Manuel
* Re: Understanding perf mem -t load results
2013-12-12 14:06 Understanding perf mem -t load results Manuel Selva
@ 2013-12-15 18:27 ` Manuel Selva
2013-12-15 21:18 ` Andi Kleen
1 sibling, 0 replies; 18+ messages in thread
From: Manuel Selva @ 2013-12-15 18:27 UTC (permalink / raw)
To: linux-perf-users
Hi all,
Is there anyone here using the perf mem tool who could answer my
previous questions?
Thanks in advance,
----
Manu
On 12/12/2013 03:06 PM, Manuel Selva wrote:
> Hi all,
>
> I am trying to understand the output of the perf mem tool on my workstation with two Intel Xeon X5650 processors.
>
> I recorded a perf.data file with memory load sampling (write sampling is not available on these processors) as follows (in the root directory of a Linux kernel source tree):
>
> perf mem -t load rec -c 1 make -j18
>
> Then I am reporting the results with
>
> perf mem rep --sort=mem
>
> 97.00% 25519343 L1 hit
> 1.31% 43687 L3 hit
> 1.15% 37253 LFB hit
> 0.32% 3156 Remote Cache (1 hop) hit
> 0.14% 38579 L3 miss
> 0.05% 6309 L2 hit
> 0.03% 231 Remote RAM (1 hop) hit
> 0.00% 8 Local RAM hit
> 0.00% 2 Uncached hit
>
> As you can see, 97% of the loads hit the L1 cache (I am sampling all loads with -c 1: is that right?). My first question is about this high L1 hit ratio and the small number of RAM requests (231 + 8). Is it realistic to have 97% L1 hits and only 239 RAM accesses when compiling a Linux kernel?
>
> Writing this email and looking again into the Intel SDM, I am thinking that the L3 misses are what the SDM calls an "unknown L3 cache miss". As a consequence, the total number of RAM accesses would be L3 miss + Remote RAM + Local RAM; is that correct?
>
> The second question is about the Uncached hit: is it the uncacheable memory described in the SDM? If so, I guess it is also a request to RAM.
>
> Finally, it is not very clear to me what the Line Fill Buffer (LFB) is exactly, and I was not able to find a pointer explaining it. Do you know where I can read about this?
>
> Thanks,
>
> ------
> Manuel
* Re: Understanding perf mem -t load results
2013-12-12 14:06 Understanding perf mem -t load results Manuel Selva
2013-12-15 18:27 ` Manuel Selva
@ 2013-12-15 21:18 ` Andi Kleen
2013-12-15 22:03 ` Manuel Selva
1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2013-12-15 21:18 UTC (permalink / raw)
To: Manuel Selva; +Cc: linux-perf-users
Manuel Selva <manuel.selva@insa-lyon.fr> writes:
>
> perf mem -t load rec -c 1 make -j18
With -c1 on such a frequent event you will not sample much, but just
throttle a lot, as the PMI handler would use too much time.
Your numbers are likely garbage.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: Understanding perf mem -t load results
2013-12-15 21:18 ` Andi Kleen
@ 2013-12-15 22:03 ` Manuel Selva
2013-12-15 23:45 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-15 22:03 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
Thanks for the answer Andi.
Is there any way (I couldn't find one in the Intel documentation) to
get an approximation of the "maximum" acceptable sampling frequency?
Manu
On 12/15/2013 10:18 PM, Andi Kleen wrote:
> Manuel Selva <manuel.selva@insa-lyon.fr> writes:
>>
>> perf mem -t load rec -c 1 make -j18
>
> With -c1 on such a frequent event you will not sample much, but just
> throttle a lot, as the PMI handler would use too much time.
> Your numbers are likely garbage.
>
> -Andi
>
* Re: Understanding perf mem -t load results
2013-12-15 22:03 ` Manuel Selva
@ 2013-12-15 23:45 ` Andi Kleen
2013-12-16 9:13 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2013-12-15 23:45 UTC (permalink / raw)
To: Manuel Selva; +Cc: Andi Kleen, linux-perf-users
On Sun, Dec 15, 2013 at 11:03:13PM +0100, Manuel Selva wrote:
> Thanks for the answer Andi.
>
> Is there any way (I couldn't find one in the Intel documentation) to
> get an approximation of the "maximum" acceptable sampling frequency?
It depends on how much overhead is acceptable and how you configure
perf (which affects the cost of the handler) and what the executed
code does. You could tune it to minimize throttles.
I would start with something like 20003
If you only want the relative ratios of course using perf stat would
be better.
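As a sketch (untested), that would be something like:

perf mem -t load rec -c 20003 make -j18

You can then look for PERF_RECORD_THROTTLE events in the raw dump to
see how often throttling kicked in, e.g.:

perf report -D -i perf.data | grep -c THROTTLE

(that count includes both THROTTLE and UNTHROTTLE records).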
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Understanding perf mem -t load results
2013-12-15 23:45 ` Andi Kleen
@ 2013-12-16 9:13 ` Manuel Selva
2013-12-20 9:38 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-16 9:13 UTC (permalink / raw)
To: Andi Kleen; +Cc: Manuel Selva, linux-perf-users
Thanks again.
I guess that by "how you configure perf" you mean the thread/cpu and
kernel/user parameters? For now I am monitoring a single application
on a given node, and I am trying to set things up so that I get useful
results regardless of the overhead. I'll investigate the overhead
later.
I can't use perf stat because I need to relate perf events to my
application's code.
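For example, for a plain cycles profile I relate samples to code with
something like the following (my_app is just a placeholder here):

perf record -e cycles -c 20003 ./my_app
perf report --sort=comm,dso,symbol

and I would like the same kind of per-symbol view for memory accesses.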
Manu
2013/12/16 Andi Kleen <andi@firstfloor.org>:
> On Sun, Dec 15, 2013 at 11:03:13PM +0100, Manuel Selva wrote:
>> Thanks for the answer Andi.
>>
>> Is there any way (I couldn't find one in the Intel documentation) to
>> get an approximation of the "maximum" acceptable sampling frequency?
>
> It depends on how much overhead is acceptable and how you configure
> perf (which affects the cost of the handler) and what the executed
> code does. You could tune it to minimize throttles.
>
> I would start with something like 20003
>
> If you only want the relative ratios of course using perf stat would
> be better.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only.
* Re: Understanding perf mem -t load results
2013-12-16 9:13 ` Manuel Selva
@ 2013-12-20 9:38 ` Manuel Selva
2013-12-24 2:18 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-20 9:38 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
Hi,
I am still working on my quest to identify remote memory accesses
using perf. I am confused about the LFB samples collected with "perf
mem rec". According to this Intel blog
http://software.intel.com/en-us/blogs/2010/11/11/utilizing-load-latency-event-in-performance-monitoring-to-get-line-fill-buffer-breakdown
one could use the weight information gathered by the hardware to
identify "the potential data sources of LFB sample events".
In my experiments I get LFB samples with weights equal to or greater
than those of local RAM samples; how should I interpret these values?
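To look at the weights next to the data source I am using something
along these lines (the local_weight sort key is what I found in the
perf sources, so treat this as a sketch):

perf mem rep --sort=mem,local_weight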
Moreover, because I am more interested in identifying the sources of
remote memory accesses than in the actual load latency weight, I am
wondering whether the perf mem tool (using the
MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD event) is the best solution.
Couldn't I use sampling on other events, such as UNCORE ones, more
easily and efficiently?
Thanks again for your help,
Manu
2013/12/16 Manuel Selva <selva.manuel@gmail.com>:
> Thanks again.
>
> I guess that by "how you configure perf" you mean the thread/cpu and
> kernel/user parameters? For now I am monitoring a single application
> on a given node, and I am trying to set things up so that I get useful
> results regardless of the overhead. I'll investigate the overhead
> later.
>
> I can't use perf stat because I need to relate perf events to my
> application's code.
>
> Manu
>
> 2013/12/16 Andi Kleen <andi@firstfloor.org>:
>> On Sun, Dec 15, 2013 at 11:03:13PM +0100, Manuel Selva wrote:
>>> Thanks for the answer Andi.
>>>
>>> Is there any way (I couldn't find one in the Intel documentation) to
>>> get an approximation of the "maximum" acceptable sampling frequency?
>>
>> It depends on how much overhead is acceptable and how you configure
>> perf (which affects the cost of the handler) and what the executed
>> code does. You could tune it to minimize throttles.
>>
>> I would start with something like 20003
>>
>> If you only want the relative ratios of course using perf stat would
>> be better.
>>
>> -Andi
>>
>> --
>> ak@linux.intel.com -- Speaking for myself only.
* Re: Understanding perf mem -t load results
2013-12-20 9:38 ` Manuel Selva
@ 2013-12-24 2:18 ` Andi Kleen
2013-12-24 7:10 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2013-12-24 2:18 UTC (permalink / raw)
To: Manuel Selva; +Cc: Andi Kleen, linux-perf-users
> In my experiments I get LFB samples with weights equal to or greater
> than those of local RAM samples; how should I interpret these values?
The measured latency can include the pipeline latency.
> Moreover, because I am more interested in identifying the sources of
> remote memory accesses than in the actual load latency weight, I am
> wondering whether the perf mem tool (using the
> MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD event) is the best solution.
> Couldn't I use sampling on other events, such as UNCORE ones, more
> easily and efficiently?
You cannot use uncore events to sample on IPs.
-Andi
* Re: Understanding perf mem -t load results
2013-12-24 2:18 ` Andi Kleen
@ 2013-12-24 7:10 ` Manuel Selva
2013-12-24 7:28 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-24 7:10 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
Thanks,
"You cannot use uncore events to sample IPs" means that the values
corresponding to PERF_SAMPLE_IP are not correct ? Some benchmarks I did
by sampling ME_INST_RETIRED with PERF_SAMPLE_IP let me think that I was
able to get the source of th event. The IP value was coherent. Maybe
this is not always the case.
I am thus going to look at offcore requests to sample remote memory
accesses with correct IP values.
Manu
On 12/24/2013 03:18 AM, Andi Kleen wrote:
>> In my experiments I get LFB samples with weights equal to or greater
>> than those of local RAM samples; how should I interpret these values?
>
> The measured latency can include the pipeline latency.
>
>> Moreover, because I am more interested in identifying the sources of
>> remote memory accesses than in the actual load latency weight, I am
>> wondering whether the perf mem tool (using the
>> MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD event) is the best solution.
>> Couldn't I use sampling on other events, such as UNCORE ones, more
>> easily and efficiently?
>
> You cannot use uncore events to sample on IPs.
>
> -Andi
>
* Re: Understanding perf mem -t load results
2013-12-24 7:10 ` Manuel Selva
@ 2013-12-24 7:28 ` Andi Kleen
2013-12-24 7:42 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2013-12-24 7:28 UTC (permalink / raw)
To: Manuel Selva; +Cc: Andi Kleen, linux-perf-users
On Tue, Dec 24, 2013 at 08:10:10AM +0100, Manuel Selva wrote:
> "You cannot use uncore events to sample IPs" means that the values
> corresponding to PERF_SAMPLE_IP are not correct ? Some benchmarks I
> did by sampling ME_INST_RETIRED with PERF_SAMPLE_IP let me think
> that I was able to get the source of th event. The IP value was
> coherent. Maybe this is not always the case.
It's the IP of a random core on the socket that happens to read the uncore
registers.
-Andi
* Re: Understanding perf mem -t load results
2013-12-24 7:28 ` Andi Kleen
@ 2013-12-24 7:42 ` Manuel Selva
2013-12-24 21:27 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-24 7:42 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
On 12/24/2013 08:28 AM, Andi Kleen wrote:
> On Tue, Dec 24, 2013 at 08:10:10AM +0100, Manuel Selva wrote:
>> "You cannot use uncore events to sample IPs" means that the values
>> corresponding to PERF_SAMPLE_IP are not correct ? Some benchmarks I
>> did by sampling ME_INST_RETIRED with PERF_SAMPLE_IP let me think
>> that I was able to get the source of th event. The IP value was
>> coherent. Maybe this is not always the case.
>
> It's the IP of a random core on the socket that happens to read the uncore
> registers.
>
OK. Where should I have read this information? In the Intel Software
Developer’s Manual volume 3B? (I guess this is a hardware limitation.)
I checked what can be measured with the offcore facility. I can ask it
to count remote memory accesses, but I am not sure PEBS is enabled for
these events. If the answer is no, I can't accurately identify remote
memory accesses with offcore events, can I? In that case, I am
wondering if there is another way to do such a thing.
Manu
* Re: Understanding perf mem -t load results
2013-12-24 7:42 ` Manuel Selva
@ 2013-12-24 21:27 ` Andi Kleen
2013-12-25 10:25 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2013-12-24 21:27 UTC (permalink / raw)
To: Manuel Selva; +Cc: Andi Kleen, linux-perf-users
> I checked what can be measured with the offcore facility. I can ask
> it to count remote memory accesses, but I am not sure PEBS is enabled
> for these events. If the answer is no, I can't accurately identify
> remote memory accesses with offcore events, can I? In that case, I am
> wondering if there is another way to do such a thing.
The OFFCORE events are not PEBS enabled. However, the memory PEBS
events report the same information (with some limitations) in
the PEBS record (and perf reports this information).
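For example (a sketch), the data source carried in the PEBS record
shows up as the mem sort key in the report, so something like:

perf mem rep --sort=mem,symbol

lets you see which symbols produce e.g. "Remote RAM (1 hop) hit"
accesses.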
In general you cannot accurately profile all memory accesses.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Understanding perf mem -t load results
2013-12-24 21:27 ` Andi Kleen
@ 2013-12-25 10:25 ` Manuel Selva
2014-01-07 15:06 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2013-12-25 10:25 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
On 12/24/2013 10:27 PM, Andi Kleen wrote:
>> I checked what can be measured with the offcore facility. I can ask
>> it to count remote memory accesses, but I am not sure PEBS is enabled
>> for these events. If the answer is no, I can't accurately identify
>> remote memory accesses with offcore events, can I? In that case, I am
>> wondering if there is another way to do such a thing.
>
> The OFFCORE events are not PEBS enabled.
Ok, you confirm what I read in the Intel SDM.
> However, the memory PEBS events report the same information (with
> some limitations) in the PEBS record (and perf reports this
> information).
By memory PEBS events, do you mean the following events?
MEM_LOAD_RETIRED.L1D_MISS
MEM_LOAD_RETIRED.L1D_LINE_MISS
MEM_LOAD_RETIRED.L2_MISS
MEM_LOAD_RETIRED.L2_LINE_MISS
MEM_LOAD_RETIRED.DTLB_MISS
I can't profile memory accesses with these events, can I? What do you
mean by "perf reports this information"? Which perf tool?
>
> In general you cannot accurately profile all memory accesses.
Does it mean that the perf mem record tool is reporting wrong values
concerning the source of the event? (I checked that this tool uses the
load latency event with PEBS, and you said that the IP reported by this
uncore event is not always the correct one?)
Thanks,
----
Manu
* Re: Understanding perf mem -t load results
2013-12-25 10:25 ` Manuel Selva
@ 2014-01-07 15:06 ` Manuel Selva
2014-01-07 21:27 ` Andi Kleen
0 siblings, 1 reply; 18+ messages in thread
From: Manuel Selva @ 2014-01-07 15:06 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
Hi Andi,
Coming back from the winter break and looking at the perf user-land
tool (in the kernel sources), I am still confused about this
discussion.
The perf mem record tool uses MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD,
which I guess belongs to the memory PEBS events you mentioned (doesn't
it?), and reports information about the sources of the events. It seems
that the IP information is used for this purpose; does that mean that
what perf mem reports can be wrong because of the random IP selection
mechanism you mentioned?
Thanks for your help, and sorry for all these questions, but I can't
find an answer anywhere other than on this list.
Manu
2013/12/25 Manuel Selva <selva.manuel@gmail.com>:
>
> On 12/24/2013 10:27 PM, Andi Kleen wrote:
>>>
>>> I checked what can be measured with the offcore facility. I can ask
>>> it to count remote memory accesses, but I am not sure PEBS is
>>> enabled for these events. If the answer is no, I can't accurately
>>> identify remote memory accesses with offcore events, can I? In that
>>> case, I am wondering if there is another way to do such a thing.
>>
>>
>> The OFFCORE events are not PEBS enabled.
>
>
> Ok, you confirm what I read in the Intel SDM.
>
>
>> However, the memory PEBS events report the same information (with
>> some limitations) in the PEBS record (and perf reports this
>> information).
>
> By memory PEBS events, do you mean the following events?
>
> MEM_LOAD_RETIRED.L1D_MISS
> MEM_LOAD_RETIRED.L1D_LINE_MISS
> MEM_LOAD_RETIRED.L2_MISS
> MEM_LOAD_RETIRED.L2_LINE_MISS
> MEM_LOAD_RETIRED.DTLB_MISS
>
> I can't profile memory accesses with these events, can I? What do you
> mean by "perf reports this information"? Which perf tool?
>
>
>>
>> In general you cannot accurately profile all memory accesses.
>
>
> Does it mean that the perf mem record tool is reporting wrong values
> concerning the source of the event? (I checked that this tool uses the
> load latency event with PEBS, and you said that the IP reported by
> this uncore event is not always the correct one?)
>
> Thanks,
>
> ----
> Manu
* Re: Understanding perf mem -t load results
2014-01-07 15:06 ` Manuel Selva
@ 2014-01-07 21:27 ` Andi Kleen
2014-01-08 8:53 ` Manuel Selva
0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2014-01-07 21:27 UTC (permalink / raw)
To: Manuel Selva; +Cc: linux-perf-users
Manuel Selva <selva.manuel@gmail.com> writes:
> The perf mem record tool uses
> MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD, which I guess belongs to the
> memory PEBS events you mentioned (doesn't it?), and reports
Yes.
> information about the sources of the events. It seems that the IP
> information is used for this purpose; does that mean that what perf
> mem reports can be wrong because of the random IP selection mechanism
> you mentioned?
The CPU can theoretically execute billions of loads/stores every second.
There is no way any reporting mechanism can keep up with that.
So the only thing you can do is to sample (only collect
every N operations), with a fairly large period.
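(Rough arithmetic: at on the order of 10^9 loads per second, even a
period of 20003 still means roughly 50,000 samples per second, which is
already a lot of PMI work.)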
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: Understanding perf mem -t load results
2014-01-07 21:27 ` Andi Kleen
@ 2014-01-08 8:53 ` Manuel Selva
2014-01-08 9:50 ` Manuel Selva
2014-01-08 19:44 ` Andi Kleen
0 siblings, 2 replies; 18+ messages in thread
From: Manuel Selva @ 2014-01-08 8:53 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
Thanks.
I get the point about sampling, but my question was more about the
sources of memory accesses. In a previous answer, you told me that the
PERF_SAMPLE_IP field is filled with "the IP of a random core on the
socket that happens to read the uncore registers", so my last question
is about the results presented by the perf mem report tool.
This tool (like perf report) shows Symbol and Shared Object columns.
I am thus wondering how the entries in these columns can be correct if
the IP of each sample is wrong?
Manu
2014/1/7 Andi Kleen <andi@firstfloor.org>:
> Manuel Selva <selva.manuel@gmail.com> writes:
>
>
>> The perf mem record tool uses
>> MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD, which I guess belongs to
>> the memory PEBS events you mentioned (doesn't it?), and reports
>
> Yes.
>
>> information about the sources of the events. It seems that the IP
>> information is used for this purpose; does that mean that what perf
>> mem reports can be wrong because of the random IP selection mechanism
>> you mentioned?
>
> The CPU can theoretically execute billions of loads/stores every second.
> There is no way any reporting mechanism can keep up with that.
> So the only thing you can do is to sample (only collect
> every N operations), with a fairly large period.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only
* Re: Understanding perf mem -t load results
2014-01-08 8:53 ` Manuel Selva
@ 2014-01-08 9:50 ` Manuel Selva
2014-01-08 19:44 ` Andi Kleen
1 sibling, 0 replies; 18+ messages in thread
From: Manuel Selva @ 2014-01-08 9:50 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
OK, I misinterpreted one of your previous answers, and that is where my
confusion came from. Sorry. UNCORE events can't be used to sample on
IPs, but of course the MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD event
used by perf mem is not an UNCORE one.
Thanks
Manu
2014/1/8 Manuel Selva <selva.manuel@gmail.com>:
> Thanks.
>
> I get the point about sampling, but my question was more about the
> sources of memory accesses. In a previous answer, you told me that the
> PERF_SAMPLE_IP field is filled with "the IP of a random core on the
> socket that happens to read the uncore registers", so my last question
> is about the results presented by the perf mem report tool.
>
> This tool (like perf report) shows Symbol and Shared Object columns.
> I am thus wondering how the entries in these columns can be correct if
> the IP of each sample is wrong?
>
> Manu
>
> 2014/1/7 Andi Kleen <andi@firstfloor.org>:
>> Manuel Selva <selva.manuel@gmail.com> writes:
>>
>>
>>> The perf mem record tool uses
>>> MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD, which I guess belongs to
>>> the memory PEBS events you mentioned (doesn't it?), and reports
>>
>> Yes.
>>
>>> information about the sources of the events. It seems that the IP
>>> information is used for this purpose; does that mean that what perf
>>> mem reports can be wrong because of the random IP selection
>>> mechanism you mentioned?
>>
>> The CPU can theoretically execute billions of loads/stores every second.
>> There is no way any reporting mechanism can keep up with that.
>> So the only thing you can do is to sample (only collect
>> every N operations), with a fairly large period.
>>
>> -Andi
>>
>> --
>> ak@linux.intel.com -- Speaking for myself only
* Re: Understanding perf mem -t load results
2014-01-08 8:53 ` Manuel Selva
2014-01-08 9:50 ` Manuel Selva
@ 2014-01-08 19:44 ` Andi Kleen
1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2014-01-08 19:44 UTC (permalink / raw)
To: Manuel Selva; +Cc: linux-perf-users
Manuel Selva <selva.manuel@gmail.com> writes:
> In a previous answer, you told me that the
> PERF_SAMPLE_IP field is filled with "the IP of a random core on the
> socket that happens to read the uncore registers",
This was for uncore sampling, not perf mem.
perf mem reports the core IP.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
Thread overview: 18+ messages
2013-12-12 14:06 Understanding perf mem -t load results Manuel Selva
2013-12-15 18:27 ` Manuel Selva
2013-12-15 21:18 ` Andi Kleen
2013-12-15 22:03 ` Manuel Selva
2013-12-15 23:45 ` Andi Kleen
2013-12-16 9:13 ` Manuel Selva
2013-12-20 9:38 ` Manuel Selva
2013-12-24 2:18 ` Andi Kleen
2013-12-24 7:10 ` Manuel Selva
2013-12-24 7:28 ` Andi Kleen
2013-12-24 7:42 ` Manuel Selva
2013-12-24 21:27 ` Andi Kleen
2013-12-25 10:25 ` Manuel Selva
2014-01-07 15:06 ` Manuel Selva
2014-01-07 21:27 ` Andi Kleen
2014-01-08 8:53 ` Manuel Selva
2014-01-08 9:50 ` Manuel Selva
2014-01-08 19:44 ` Andi Kleen