Intel PEBS Load Latency Measurement

linux-perf-users.vger.kernel.org archive mirror

* Intel PEBS  Load Latency Measurement
@ 2013-09-17  7:44 Manuel Selva
  2013-09-19  8:22 ` Andi Kleen
  2013-10-28 11:28 ` Manuel Selva
  0 siblings, 2 replies; 13+ messages in thread
From: Manuel Selva @ 2013-09-17  7:44 UTC (permalink / raw)
  To: linux-perf-users

Hi all,

I am trying to use the PMU on a two-socket workstation (2x Intel Xeon X5650
processors, currently running Linux 3.6.11) to identify memory controller
imbalance.

For this purpose I successfully used some uncore events to count the
load on each memory controller through the perf_event_open system call.
I am now planning to use Intel PEBS Load Latency Measurement to identify
whether these loads result in unusually long memory latencies.

Looking at Vince Weaver's web page on kernel support for the PMU
(http://web.eece.maine.edu/~vweaver/projects/perf_events/features.html), I
saw that kernel 3.10 supports Load Latency Measurement.
Unfortunately I can't find a man page describing how this works. Before
looking at the kernel sources, I wanted to ask here for confirmation about
Load Latency Measurement support in recent Linux kernels.

Can anyone confirm that this functionality is available and usable? Does
the perf userland tool use it to provide this functionality to end users?

Thanks in advance for your help,

-- 
Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS  Load Latency Measurement
  2013-09-17  7:44 Intel PEBS Load Latency Measurement Manuel Selva
@ 2013-09-19  8:22 ` Andi Kleen
  2013-10-28 11:28 ` Manuel Selva
  1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2013-09-19  8:22 UTC (permalink / raw)
  To: Manuel Selva; +Cc: linux-perf-users

Manuel Selva <selva.manuel@gmail.com> writes:

> Is the perf userland tool using it to provide the functionality to end
> users ?

It's in the standard perf tool:

perf mem record ...
perf mem report

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS  Load Latency Measurement
  2013-09-17  7:44 Intel PEBS Load Latency Measurement Manuel Selva
  2013-09-19  8:22 ` Andi Kleen
@ 2013-10-28 11:28 ` Manuel Selva
  2013-10-29  2:36   ` Namhyung Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Manuel Selva @ 2013-10-28 11:28 UTC (permalink / raw)
  To: Manuel Selva, linux-perf-users

Hi,

I am coming back to this subject after working on other things for
several weeks. Andi pointed me to the userland tool 'perf mem',
introduced in "recent" kernels (I can't find the version), which uses
the perf_event_open system call to profile memory accesses.

I guess the answer to my question is in the code of this tool, but
before digging deeper into it, I wanted to ask you (Linux perf
experts) a few questions, to be sure I am on the right track.

For now, I have just configured a perf_event_attr to sample
PERF_COUNT_HW_INSTRUCTIONS at a given period. Can you confirm that
sample_period means "the kernel will generate a sample (with the fields
requested through sample_type) every sample_period instructions"?

Then, after calling the perf_event_open system call, I mmap the returned
file descriptor with a size of X pages (with X = 1 + 2^n).
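
For readers following along, the setup described above might look roughly
like the sketch below. It assumes a Linux system with linux/perf_event.h
available. The helper names init_instr_sampling_attr and ring_mmap_len are
made up for illustration, and the choice of sample fields is only an example;
the structure fields and constants come from the kernel header.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill attr so that one sample is generated every `period` retired
 * instructions. Helper name and chosen sample fields are illustrative. */
static void init_instr_sampling_attr(struct perf_event_attr *attr,
                                     uint64_t period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;
    attr->sample_period = period;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr->disabled = 1;        /* started later with PERF_EVENT_IOC_ENABLE */
    attr->exclude_kernel = 1;  /* user-space instructions only */
}

/* Length to pass to mmap(): one metadata page plus 2^n data pages. */
static size_t ring_mmap_len(size_t page_size, unsigned n)
{
    return page_size * (1 + ((size_t)1 << n));
}
```

With a descriptor fd returned by perf_event_open, the mapping would then be
created with mmap(NULL, ring_mmap_len(page_size, n), PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0), and recording started with
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0).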

I then start recording events with ioctl on the file descriptor returned
by perf_event_open. I am now wondering how to access the samples. My
main concern is the meaning of the data_head and data_tail fields
of the metadata page located at the beginning of the mmapped memory. I
understand that my samples are located just after this metadata page,
and that these head and tail pointers indicate where we are
in the reading of the samples. Is that correct? While reading samples,
should I use or modify these head and tail pointers, and if so, what is
the purpose of that?

I am now going to look at the perf mem code to try to understand this
on my side, but I am interested in any hint on the subject that may
help me.

Many thanks in advance for your help,

Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS  Load Latency Measurement
  2013-10-28 11:28 ` Manuel Selva
@ 2013-10-29  2:36   ` Namhyung Kim
       [not found]     ` <CALbiyZy_JE+wai7d_=r-XzE+FdHRitTiAuPmANtRt7Qpet8fTg@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2013-10-29  2:36 UTC (permalink / raw)
  To: Manuel Selva; +Cc: Manuel Selva, linux-perf-users

Hi Manuel,

On Mon, 28 Oct 2013 12:28:06 +0100, Manuel Selva wrote:
> Hi,
>
> I am coming back to this subject after working on other things for
> several weeks. Andi pointed me to the userland tool 'perf mem',
> introduced in "recent" kernels (I can't find the version), which uses
> the perf_event_open system call to profile memory accesses.
>
> I guess the answer to my question is in the code of this tool, but
> before digging deeper into it, I wanted to ask you (Linux perf
> experts) a few questions, to be sure I am on the right track.
>
> For now, I have just configured a perf_event_attr to sample
> PERF_COUNT_HW_INSTRUCTIONS at a given period. Can you confirm that
> sample_period means "the kernel will generate a sample (with the fields
> requested through sample_type) every sample_period instructions"?

Yes.

>
> Then, after calling the perf_event_open system call, I mmap the returned
> file descriptor with a size of X pages (with X = 1 + 2^n).
>
> I then start recording events with ioctl on the file descriptor
> returned by perf_event_open. I am now wondering how to access the
> samples. My main concern is the meaning of the data_head and
> data_tail fields of the metadata page located at the beginning of the
> mmapped memory. I understand that my samples are located just after
> this metadata page, and that these head and tail pointers
> indicate where we are in the reading of the samples. Is that correct?

Correct.


> While reading samples, should I use or modify these head and tail
> pointers, and if so, what is the purpose of that?

The head is updated by the kernel; you only need to update the tail after
reading.  Please see perf_record__mmap_read().
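
The head/tail protocol described above can be sketched as follows. This is
not the code of perf_record__mmap_read(), only a minimal illustration under
some assumptions: the buffer was mapped with PROT_WRITE (so the kernel honours
data_tail), the data area size is a power of two, and the GCC/Clang __atomic
builtins are available. The names record_fn, count_record, ring_copy and
drain_ring are hypothetical.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-record callback; the type name is hypothetical. */
typedef void (*record_fn)(const struct perf_event_header *hdr, void *ctx);

/* Example callback: counts the records it is given. */
static void count_record(const struct perf_event_header *hdr, void *ctx)
{
    (void)hdr;
    ++*(unsigned *)ctx;
}

/* Copy len bytes starting at ring offset off, handling wrap-around.
 * data_size must be a power of two. */
static void ring_copy(void *dst, const unsigned char *data,
                      uint64_t data_size, uint64_t off, size_t len)
{
    uint64_t start = off & (data_size - 1);
    size_t first = len;

    if (start + len > data_size)
        first = (size_t)(data_size - start);
    memcpy(dst, data + start, first);
    memcpy((unsigned char *)dst + first, data, len - first);
}

/* Consume every complete record between data_tail and data_head, then
 * publish the new tail. Returns the number of records consumed. */
static unsigned drain_ring(struct perf_event_mmap_page *meta,
                           const unsigned char *data, uint64_t data_size,
                           record_fn fn, void *ctx)
{
    /* Written by the kernel; acquire so the record bytes are visible. */
    uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = meta->data_tail;   /* written only by user space */
    union {
        struct perf_event_header hdr;
        unsigned char bytes[65536];    /* a record is at most 65535 bytes */
    } rec;
    unsigned n = 0;

    while (head - tail >= sizeof(rec.hdr)) {
        ring_copy(&rec.hdr, data, data_size, tail, sizeof(rec.hdr));
        if (rec.hdr.size < sizeof(rec.hdr) || rec.hdr.size > head - tail)
            break;                     /* malformed or incomplete record */
        ring_copy(rec.bytes, data, data_size, tail, rec.hdr.size);
        fn(&rec.hdr, ctx);
        tail += rec.hdr.size;
        n++;
    }
    /* Tell the kernel the space up to tail can be reused. */
    __atomic_store_n(&meta->data_tail, tail, __ATOMIC_RELEASE);
    return n;
}
```

Here meta would be the address returned by mmap, and data would be
(unsigned char *)meta plus one page. Copying each record out before handing
it to the callback keeps the wrap-around handling in one place, at the cost
of an extra memcpy per record.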

>
> I am now going to look at the perf mem code to try to understand this
> on my side, but I am interested in any hint on the subject that
> may help me.
>
> Many thanks in advance for your help,

Hope this helps,
Namhyung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Fwd: Intel PEBS Load Latency Measurement
       [not found]     ` <CALbiyZy_JE+wai7d_=r-XzE+FdHRitTiAuPmANtRt7Qpet8fTg@mail.gmail.com>
@ 2013-10-29  9:12       ` Manuel Selva
  2013-10-29 13:20         ` Manuel Selva
  2013-11-01  8:38         ` Fwd: " Namhyung Kim
  0 siblings, 2 replies; 13+ messages in thread
From: Manuel Selva @ 2013-10-29  9:12 UTC (permalink / raw)
  To: Namhyung Kim, linux-perf-users

Hi Namhyung,

Many thanks for your answer and for the function you pointed me to. I
think I now have all the required understanding of the perf_event_open
syscall to do what I want.

I still have two questions regarding the Intel load latency feature (I am
on a Westmere-EP Xeon X5650) and its usage by the perf mem tool.

1- In the Intel software developer's manual we can read: "load operations
are randomly selected by hardware and tagged to carry information
related to data source locality and latency". I am wondering what this
means: are we sampling at two different levels? First the
hardware chooses some load instructions to tag, and then, every X
(the sampling period, in event counts, specified by software) such tagged
instructions with a latency greater than a software-specified threshold,
we record a sample with some information. What is the sampling rate of
the hardware tagging mechanism, and is it high enough to get interesting
results?

2- How does the perf mem tool (with the load option), with the help of
the kernel of course, use this feature? After quickly browsing
the code, here is my understanding; is it correct?
The PEBS load latency feature is enabled with the minimal possible
latency (3 cycles) to sample all loads, with a given default sampling
period (one sample every x tagged load events with latency greater than
or equal to 3). In addition to these load events, the perf mem tool
asks the kernel to record events about process names and memory
mappings of code, so that the source code associated with the
instruction pointers present in samples can be retrieved offline.
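
As a rough illustration of the configuration described in point 2, an
attribute for load latency sampling could be filled in as below. This is a
sketch, not what perf mem literally does: the raw encoding 0x100b (event
0x0b, umask 0x10) is assumed to be the Nehalem/Westmere load latency event,
the threshold is assumed to be passed in config1, and the helper name
init_load_latency_attr is made up. The PERF_SAMPLE_WEIGHT and
PERF_SAMPLE_DATA_SRC flags need headers from kernel 3.10 or later.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Assumed raw encoding of the Nehalem/Westmere load latency event
 * (event 0x0b, umask 0x10); other microarchitectures differ. */
#define LDLAT_EVENT_WSM 0x100bULL

static void init_load_latency_attr(struct perf_event_attr *attr,
                                   uint64_t min_latency, uint64_t period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;
    attr->config = LDLAT_EVENT_WSM;
    attr->config1 = min_latency;   /* latency threshold in cycles */
    attr->sample_period = period;  /* one sample per `period` qualifying loads */
    attr->precise_ip = 2;          /* request precise (PEBS) sampling */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR |
                        PERF_SAMPLE_WEIGHT | PERF_SAMPLE_DATA_SRC;
    attr->mmap = 1;                /* also record code mappings ... */
    attr->comm = 1;                /* ... and process names */
    attr->disabled = 1;            /* enable later via ioctl */
}
```

Calling init_load_latency_attr(&attr, 3, period) would correspond to the
"minimal latency" setup described above. On kernels with load latency
support the event should also be listed by name (mem-loads) under
/sys/bus/event_source/devices/cpu/events/, which is a safer source for the
encoding than a hard-coded constant.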

Thanks again for your help,

Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS Load Latency Measurement
  2013-10-29  9:12       ` Fwd: " Manuel Selva
@ 2013-10-29 13:20         ` Manuel Selva
  2013-11-01  8:41           ` Namhyung Kim
  2013-11-01  8:38         ` Fwd: " Namhyung Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Manuel Selva @ 2013-10-29 13:20 UTC (permalink / raw)
  To: Namhyung Kim, linux-perf-users

One more thing I forgot to ask for is clarification about the pid
parameter. According to Vince Weaver's page: "If pid is 0, measurements
happen on the current thread, if pid is greater than 0, the process
indicated by pid is measured, and if pid is -1, all processes are
counted." According to the perf userland tool wiki page, it is also
possible to attach to a specific thread with a -i option. So I
wonder how I can use the perf_event_open system call to count events
only for a specific thread?

Thanks again

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fwd: Intel PEBS Load Latency Measurement
  2013-10-29  9:12       ` Fwd: " Manuel Selva
  2013-10-29 13:20         ` Manuel Selva
@ 2013-11-01  8:38         ` Namhyung Kim
  2013-11-06 13:06           ` Manuel Selva
  2013-11-06 13:41           ` Stephane Eranian
  1 sibling, 2 replies; 13+ messages in thread
From: Namhyung Kim @ 2013-11-01  8:38 UTC (permalink / raw)
  To: Manuel Selva; +Cc: linux-perf-users, Stephane Eranian

Hi Manuel,

I'm CC-ing Stephane who is the author of the perf mem tool.  Stephane,
could you please answer the questions below if you have some time?

Thanks,
Namhyung


On Tue, 29 Oct 2013 10:12:39 +0100, Manuel Selva wrote:
> Hi Namhyung,
>
> Many thanks for your answer and for the function you pointed me to. I
> think I now have all the required understanding of the perf_event_open
> syscall to do what I want.
>
> I still have two questions regarding the Intel load latency feature (I am
> on a Westmere-EP Xeon X5650) and its usage by the perf mem tool.
>
> 1- In the Intel software developer's manual we can read: "load operations
> are randomly selected by hardware and tagged to carry information
> related to data source locality and latency". I am wondering what this
> means: are we sampling at two different levels? First the
> hardware chooses some load instructions to tag, and then, every X
> (the sampling period, in event counts, specified by software) such tagged
> instructions with a latency greater than a software-specified threshold,
> we record a sample with some information. What is the sampling rate of
> the hardware tagging mechanism, and is it high enough to get interesting
> results?
>
> 2- How does the perf mem tool (with the load option), with the help of
> the kernel of course, use this feature? After quickly browsing
> the code, here is my understanding; is it correct?
> The PEBS load latency feature is enabled with the minimal possible
> latency (3 cycles) to sample all loads, with a given default sampling
> period (one sample every x tagged load events with latency greater than
> or equal to 3). In addition to these load events, the perf mem tool
> asks the kernel to record events about process names and memory
> mappings of code, so that the source code associated with the
> instruction pointers present in samples can be retrieved offline.
>
> Thanks again for your help,
>
> Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS Load Latency Measurement
  2013-10-29 13:20         ` Manuel Selva
@ 2013-11-01  8:41           ` Namhyung Kim
  2013-11-01  9:02             ` Manuel Selva
  2013-11-01 17:02             ` Vince Weaver
  0 siblings, 2 replies; 13+ messages in thread
From: Namhyung Kim @ 2013-11-01  8:41 UTC (permalink / raw)
  To: Manuel Selva; +Cc: linux-perf-users

On Tue, 29 Oct 2013 14:20:09 +0100, Manuel Selva wrote:
> One more thing I forgot to ask for is clarification about the pid
> parameter. According to Vince Weaver's page: "If pid is 0, measurements
> happen on the current thread, if pid is greater than 0, the process
> indicated by pid is measured, and if pid is -1, all processes are
> counted." According to the perf userland tool wiki page, it is also
> possible to attach to a specific thread with a -i option. So I
> wonder how I can use the perf_event_open system call to count events
> only for a specific thread?

From the syscall's point of view, pid is actually a tid, AFAIK, so it works
on a per-thread basis, not per process.
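
To make the point concrete, here is a small sketch of passing a thread id
where the syscall expects "pid". It assumes glibc on Linux, where
perf_event_open has no library wrapper and must be reached through
syscall(2). The helper names current_tid and perf_open_thread are made up
for this example.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the calling thread. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Open an event that follows a single thread.
 * tid  : kernel thread id to monitor (0 means the calling thread)
 * cpu  : -1, i.e. follow the thread on whatever CPU it runs
 * Returns a file descriptor, or -1 with errno set. */
static int perf_open_thread(struct perf_event_attr *attr, pid_t tid)
{
    return (int)syscall(SYS_perf_event_open, attr, tid, -1, -1, 0UL);
}
```

In the main thread of a process, current_tid() equals getpid(), which is
probably one reason the two terms get mixed up. To monitor another thread,
its tid would have to be obtained in that thread (for instance with
current_tid()) and handed to whoever calls perf_open_thread.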

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS Load Latency Measurement
  2013-11-01  8:41           ` Namhyung Kim
@ 2013-11-01  9:02             ` Manuel Selva
  2013-11-01 17:02             ` Vince Weaver
  1 sibling, 0 replies; 13+ messages in thread
From: Manuel Selva @ 2013-11-01  9:02 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users

Thanks for this info. It confirms what I finally concluded after looking
at the source code, where I saw that pid is used to get a task_struct object.

Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS Load Latency Measurement
  2013-11-01  8:41           ` Namhyung Kim
  2013-11-01  9:02             ` Manuel Selva
@ 2013-11-01 17:02             ` Vince Weaver
  2013-11-01 18:08               ` Manuel Selva
  1 sibling, 1 reply; 13+ messages in thread
From: Vince Weaver @ 2013-11-01 17:02 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Manuel Selva, linux-perf-users

On Fri, 1 Nov 2013, Namhyung Kim wrote:

> On Tue, 29 Oct 2013 14:20:09 +0100, Manuel Selva wrote:
> > One more thing I forgot to ask for is clarification about the pid
> > parameter. According to Vince Weaver's page: "If pid is 0, measurements
> > happen on the current thread, if pid is greater than 0, the process
> > indicated by pid is measured, and if pid is -1, all processes are
> > counted." According to the perf userland tool wiki page, it is also
> > possible to attach to a specific thread with a -i option. So I
> > wonder how I can use the perf_event_open system call to count events
> > only for a specific thread?
> 
> From the syscall's point of view, pid is actually a tid, AFAIK, so it works
> on a per-thread basis, not per process.

It is true the manpage is a bit confusing here, though that's mostly due
to the confusing way that Linux interchangeably uses pid/tid for process
and thread ids.  I'll see if I can get the documentation made clearer.

Vince

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Intel PEBS Load Latency Measurement
  2013-11-01 17:02             ` Vince Weaver
@ 2013-11-01 18:08               ` Manuel Selva
  0 siblings, 0 replies; 13+ messages in thread
From: Manuel Selva @ 2013-11-01 18:08 UTC (permalink / raw)
  To: Vince Weaver, Namhyung Kim; +Cc: linux-perf-users

I agree about the confusing way that Linux uses pid and tid. I guess
this comes from the (old) time when processes only had one thread.
Anyway, threads are just processes sharing some part of their memory,
and processes are just threads that don't share anything.

Your man page (the online version; I have to check why I don't have it
on my Linux workstation) was really helpful for building what I
needed on top of the perf_event_open system call without having to start
from scratch with my own kernel module, or with the msr module and the
Intel documentation. I have noted down some complementary information that
may be useful for others; if you plan to update the man page I can
send you these notes.

Thanks again to everyone on the list for your help!

Manu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fwd: Intel PEBS Load Latency Measurement
  2013-11-01  8:38         ` Fwd: " Namhyung Kim
@ 2013-11-06 13:06           ` Manuel Selva
  2013-11-06 13:41           ` Stephane Eranian
  1 sibling, 0 replies; 13+ messages in thread
From: Manuel Selva @ 2013-11-06 13:06 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users, Stephane Eranian

Hi all,

I think I got the point about the Intel SDM saying that the hardware
randomly tags load operations. This simply means that when software
specifies a sampling period of X events, the hardware chooses one load
at random in each "packet" of X load events, in order not to always
sample the same thing, for example when executing a loop. Is that correct?

Thanks,

----
Manu

* Re: Fwd: Intel PEBS Load Latency Measurement
  2013-11-01  8:38         ` Fwd: " Namhyung Kim
  2013-11-06 13:06           ` Manuel Selva
@ 2013-11-06 13:41           ` Stephane Eranian
  1 sibling, 0 replies; 13+ messages in thread
From: Stephane Eranian @ 2013-11-06 13:41 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Manuel Selva, linux-perf-users

Hi Manuel,

On Fri, Nov 1, 2013 at 9:38 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Manuel,
>
> I'm CC-ing Stephane who is the author of the perf mem tool.  Stephane,
> could you please answer the questions below if you have some time?
>
> Thanks,
> Namhyung
>
>
> On Tue, 29 Oct 2013 10:12:39 +0100, Manuel Selva wrote:
>> Hi Namhyung,
>>
>> Many thanks for your answer and the function you pointed to. I think I
>> now have all the required understanding of the perf_event_open syscall
>> to do what I want.
>>
>> I still have two questions regarding the Intel load latency feature (I
>> am on a Westmere-EP Xeon X5650) and its usage by the perf mem tool.
>>
>> 1- In the Intel software developer guide we can read: "load operations
>> are randomly selected by hardware and tagged to carry information
>> related to data source locality and latency". I am wondering what this
>> means: are we sampling at two different levels? First the hardware
>> chooses some load instructions to tag, and then, every X such tagged
>> instructions (X being the sampling period in event counts specified by
>> software) with a latency greater than a software-specified threshold,
>> we record a sample with some information. What is the sampling rate of
>> the hardware tagging mechanism? Is it enough to get some interesting
>> results?
>>
The Load latency facility combines basic PEBS + a threshold mechanism
to filter only certain types of loads based on their latencies.

The mem_trans_retired:latency_above_threshold event counts the number of
retired loads that qualify for the threshold. This is the event you are
actually sampling on. When that counter overflows, the retired load
is sampled. If you set the counter to -P, it will overflow after P occurrences
of the event. Now, it is clear that to get there you need to wait until
the load retires; otherwise you don't know the latency. Note that
latency here means instruction latency, not just data access latency.
So, I suspect underneath there is indeed some tagging mechanism.
It can track only one load at a time. To avoid bias, the tagging mechanism
uses some randomization scheme. I don't know how this tagging mechanism
actually works. But clearly the hardware may track loads that don't qualify
for the threshold; they won't increment the counter and therefore will never
be captured by perf_events.
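
The two levels described above (hardware tagging, then a counter that
overflows every P qualifying loads) can be sketched as a toy model. This is
only an illustration in Python, not how the hardware works: the tagging
probability, the ">=" comparison against the threshold, and the name
simulate_load_latency_sampling are all assumptions made for the example.

```python
import random


def simulate_load_latency_sampling(latencies, threshold, period,
                                   tag_probability=0.1, seed=0):
    """Toy model of load latency sampling; returns indices of sampled loads.

    latencies       -- latency in cycles of each retired load, in order
    threshold       -- software-programmed latency threshold
    period          -- sampling period P (counter is preloaded with -P)
    tag_probability -- hypothetical chance that the hardware tags a load
    """
    if period <= 0:
        raise ValueError("period must be positive")
    rng = random.Random(seed)
    counter = -period              # counter set to -P
    sampled = []
    for index, latency in enumerate(latencies):
        # Level 1: the hardware tracks one randomly chosen load at a time.
        if rng.random() >= tag_probability:
            continue
        # A tagged load below the threshold never increments the counter,
        # so it can never be captured (comparison is an assumption).
        if latency < threshold:
            continue
        # Level 2: the event counter advances; overflow means "sample".
        counter += 1
        if counter == 0:
            sampled.append(index)
            counter = -period      # re-arm for the next P occurrences
    return sampled


if __name__ == "__main__":
    loads = [1, 5, 10, 2, 7, 8, 3, 9]
    # Tag every load so that only the threshold and the period matter.
    print(simulate_load_latency_sampling(loads, threshold=4, period=2,
                                         tag_probability=1.0))
```

With tag_probability below 1.0 the model also shows why the number of
samples depends on the (unknown) hardware tagging rate and not only on P.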


>> 2- How does the perf mem tool (with the load option), with of course
>> the help of the kernel, use this feature? After a quick browse of
>> the code, here is my understanding. Is it correct?
>> The PEBS load latency feature is enabled with the minimal possible
>> latency (3 cycles) to sample all loads, with a given
>> default sampling period (x tagged load events with latency greater than
>> or equal to 3). In addition to these "load events", the perf mem tool
>> asks the kernel to record events about process naming and memory
>> mappings of code, to be able to retrieve offline the source code
>> associated with the instruction pointers present in samples.
>>
Yes, your description is correct. The one difference compared
with regular code sampling is that we also ask the kernel to record
data mmaps, so we get a chance to symbolize data addresses (global
variables only).
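
As a rough illustration of the setup confirmed above (load latency event
with the minimal 3-cycle threshold), here is a small Python sketch that
builds the raw event encoding. The event/umask codes and the bit layout
(event in config bits 0-7, umask in config bits 8-15, ldlat in config1
bits 0-15) are assumptions taken from my reading of the Intel PMU format
files under /sys/bus/event_source/devices/cpu/format and should be checked
there; LOAD_LATENCY_EVENTS and encode_load_latency are hypothetical names.

```python
# Assumed (event, umask) codes for the load latency event per generation.
LOAD_LATENCY_EVENTS = {
    "nehalem_westmere": (0x0B, 0x10),  # MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
    "sandy_bridge": (0xCD, 0x01),      # MEM_TRANS_RETIRED.LOAD_LATENCY
}


def encode_load_latency(arch, ldlat=3):
    """Return (config, config1, event_spec) for a load latency event.

    config     -- value for perf_event_attr.config (event | umask << 8)
    config1    -- value for perf_event_attr.config1 (latency threshold)
    event_spec -- equivalent event string for the perf tool, with the
                  'pp' modifier requesting precise (PEBS) sampling
    """
    if arch not in LOAD_LATENCY_EVENTS:
        raise ValueError("unknown architecture: %r" % (arch,))
    if not 0 < ldlat <= 0xFFFF:
        raise ValueError("ldlat must fit in 16 bits and be positive")
    event, umask = LOAD_LATENCY_EVENTS[arch]
    config = event | (umask << 8)
    config1 = ldlat
    event_spec = "cpu/event={:#x},umask={:#x},ldlat={}/pp".format(
        event, umask, ldlat)
    return config, config1, event_spec


if __name__ == "__main__":
    config, config1, spec = encode_load_latency("nehalem_westmere")
    print(hex(config), config1, spec)
```

The data mmaps mentioned above are, as far as I can tell, requested
separately through the mmap_data bit of perf_event_attr, which this sketch
does not cover.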

Hope this helps.

>> Thanks again for your help,
>>
>> Manu
>>
>>

end of thread, other threads:[~2013-11-06 13:41 UTC | newest]

Thread overview: 13+ messages
2013-09-17  7:44 Intel PEBS Load Latency Measurement Manuel Selva
2013-09-19  8:22 ` Andi Kleen
2013-10-28 11:28 ` Manuel Selva
2013-10-29  2:36   ` Namhyung Kim
     [not found]     ` <CALbiyZy_JE+wai7d_=r-XzE+FdHRitTiAuPmANtRt7Qpet8fTg@mail.gmail.com>
2013-10-29  9:12       ` Fwd: " Manuel Selva
2013-10-29 13:20         ` Manuel Selva
2013-11-01  8:41           ` Namhyung Kim
2013-11-01  9:02             ` Manuel Selva
2013-11-01 17:02             ` Vince Weaver
2013-11-01 18:08               ` Manuel Selva
2013-11-01  8:38         ` Fwd: " Namhyung Kim
2013-11-06 13:06           ` Manuel Selva
2013-11-06 13:41           ` Stephane Eranian
