xen-devel.lists.xenproject.org archive mirror
* Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-17 22:32 UTC (permalink / raw)
  To: xen-devel@lists.xen.org; +Cc: mengxu@cis.upenn.edu


Hi,

I'm a PhD student working on real-time systems.

*[My goal]*
I want to measure the cache hit/miss rate of each guest domain in Xen. I
may also want to measure some other events, say memory access rate, for
each program in each guest domain in Xen.

My machine's CPU is based on Intel's Ivy Bridge architecture.

*[The problem I'm encountering]*
I tried Intel's Performance Counter Monitor (PCM) under Linux on bare metal
to get the machine's cache access rate for each cache level, and it works
very well.

However, when I try to use PCM under Xen and run it in dom0, it does not
work. I think PCM needs to run in ring 0 to read/write the MSRs. Because
dom0 runs in ring 1, PCM running in dom0 cannot work.
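For context: on bare-metal Linux, PCM as far as I know does not need to execute RDMSR itself; it goes through the kernel's msr driver, which exposes each CPU's MSRs as /dev/cpu/N/msr and treats the file offset as the MSR index. A minimal Python sketch of that access path, assuming root and a loaded msr module (the dev_fmt parameter is a hypothetical addition so the function can be exercised against an ordinary file). In a PV dom0 the RDMSR issued by the dom0 kernel is intercepted by Xen, so this path will not necessarily return the real hardware counter values.

```python
import os
import struct

IA32_PMC0 = 0xC1  # first general-purpose performance counter (Intel SDM)


def read_msr(cpu: int, msr: int, dev_fmt: str = "/dev/cpu/{}/msr") -> int:
    """Read one 64-bit MSR through the Linux msr driver.

    The driver uses the file offset as the MSR index, so an 8-byte pread()
    at offset `msr` makes the kernel execute RDMSR on `cpu` on our behalf.
    """
    fd = os.open(dev_fmt.format(cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {msr:#x} on CPU {cpu}")
    # MSR contents come back little-endian on x86
    return struct.unpack("<Q", data)[0]
```

On a bare-metal Linux box, after modprobe msr, something like hex(read_msr(0, IA32_PMC0)) run as root would print the raw value of counter 0 on CPU 0.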

*So my question is:*
How can I run a program (say PCM) in ring 0 on Xen?

*What's in my mind is:*
Writing a hypercall that invokes PCM in Xen's kernel space, so that PCM
runs in ring 0?
But my concern is that some of PCM's calls, say printf(), may not be
available in kernel space.

Do you have any suggestions on running PCM or another performance monitoring
program in ring 0 on Xen?

*What I tried before:*
I wrote a hypercall that reads and writes the MSRs and records the cache
hit/miss events for each cache level, using Intel's performance counters.
It worked on my machine, but it is not portable to other machines, since the
event numbers may be different. That's why I think running PCM or another
existing performance monitoring program on Xen would be a better idea.
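As a reference point for the portability question, here is a sketch of how an IA32_PERFEVTSELx value is composed: event select in bits 7:0, unit mask in bits 15:8, and the USR/OS/EN flags in bits 16, 17 and 22. It only covers architectural events from the Intel SDM, whose encodings are the same on every processor that reports them in CPUID leaf 0x0A. Note that these include last-level cache references and misses but not per-level L1/L2 events, which remain model-specific. The value would be written to IA32_PERFEVTSEL0 (MSR 0x186) and the count read from IA32_PMC0 (MSR 0xC1) on the same CPU; the helper and table names are illustrative.

```python
# Flag bits of IA32_PERFEVTSELx (Intel SDM Vol. 3, architectural performance monitoring)
USR_BIT = 1 << 16  # count while CPL > 0 (user mode)
OS_BIT = 1 << 17   # count while CPL == 0 (kernel mode)
EN_BIT = 1 << 22   # enable the counter

IA32_PERFEVTSEL0 = 0x186  # event select MSR for counter 0
IA32_PMC0 = 0xC1          # counter 0 itself

# A few architectural events as (event select, unit mask)
ARCH_EVENTS = {
    "UnHalted Core Cycles": (0x3C, 0x00),
    "Instructions Retired": (0xC0, 0x00),
    "LLC Reference": (0x2E, 0x4F),
    "LLC Misses": (0x2E, 0x41),
}


def perfevtsel(event: int, umask: int, user: bool = True, kernel: bool = True) -> int:
    """Build the value to program into IA32_PERFEVTSELx for one event."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and unit mask are 8-bit fields")
    value = event | (umask << 8) | EN_BIT
    if user:
        value |= USR_BIT
    if kernel:
        value |= OS_BIT
    return value
```

For example, perfevtsel(*ARCH_EVENTS["LLC Misses"]) evaluates to 0x43412E, counting LLC misses in both user and kernel mode.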

Thank you very much for your time and help in this question!

Best,

Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Dario Faggioli @ 2014-02-18  9:14 UTC (permalink / raw)
  To: Meng Xu; +Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org


On Mon, 2014-02-17 at 17:32 -0500, Meng Xu wrote:
> Hi,
> 
Hi,

> I'm a PhD student, working on real time system. 
> 
Cool. There really seems to be a lot of interest in Real-Time
virtualization these days. :-D

> [My goal]
> I want to measure the cache hit/miss rate of each guest domain in Xen.
> I may also want to measure some other events, say memory access rate,
> for each program in each guest domain in Xen.
> 
OK. Can I, out of curiosity, ask you to detail a bit more what your
*final* goal is? (I mean, you're interested in these measurements for a
reason, not just for the sake of having them, right?)

> [The problem I'm encountering]
> I tried intel's Performance Counter Monitor (PCM) in Linux on bare
> machine to get the machine's cache access rate for each level of
> cache, it works very well. 
> 
> 
> However, when I want to use the PCM in Xen and run it in dom0, it
> cannot work. I think the PCM needs to run in ring 0 to read/write the
> MSR. Because dom0 is running in ring 1, so PCM running in dom0 cannot
> work. 
> 
Indeed.

> So my question is:
> How can I run a program (say PCM) in ring 0 on Xen? 
> 
Running "a program" in there is going to be terribly difficult. I think
you're better off trying to access the counters from dom0, and/or
(para)virtualizing them.

In fact, there is already work going on in this area, although I don't
know all the details about its current status.

> What's in my mind is:
> Writing a hypercall to call the PCM in Xen's kernel space, then the
> PCM will run in ring 0? 
> But the problem I'm concerned is that some of the PCM's instruction,
> say printf(), may not be able to run in kernel space? 
> 
Well, Xen can print, e.g., on a serial console, but again, that's not
what you want. I'm adding links to a few conversations about virtual
PMU. These are just the very first Google results, so there may well be
more:

http://xen.1045712.n5.nabble.com/Virtualization-of-the-CPU-Performance-Monitoring-Unit-td5623065.html
https://lwn.net/Articles/566159/

Boris (whom I'm Cc-ing) gave a presentation about this at the latest Xen
Developers Summit:
http://www.slideshare.net/xen_com_mgr/xen-pmu-xensummit2013

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Dario Faggioli @ 2014-02-18 11:29 UTC (permalink / raw)
  To: Meng Xu; +Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org


On Tue, 2014-02-18 at 10:14 +0100, Dario Faggioli wrote:
> Boris (which I'm Cc-ing), gave a presentation about this at latest Xen
> Developers Summit:
> http://www.slideshare.net/xen_com_mgr/xen-pmu-xensummit2013
> 
And there appears to be a new version of this work, released just
yesterday! :-)

Have a look here:
http://bugs.xenproject.org/xen/mid/%3C1392659764-22183-1-git-send-email-boris.ostrovsky@oracle.com%3E

Regards,
Dario

* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-18 15:24 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org


Hi Dario,

Thank you so much for your detailed reply! It is really helpful! I'm
looking at vPMU and perf on Xen, and will try them. :-)

The reason I want this information from the hardware performance
counters is that I want to know the interference among domains when
they are running.

In addition, when we measure the latency of accessing a large array, the
result is not what we expected. We increase the size of the array from 1 KB
to 12 MB, which covers the L1 (32 KB), L2 (256 KB) and L3 (12 MB) cache sizes.
We expect the latency of accessing the whole array to show clear steps
at around 32 KB, 256 KB and 12 MB, because the L1, L2 and L3 latencies
differ by several times.

However, we saw that the latency does not increase much when the array size
exceeds the size of L1, L2 or L3. It's weird, because if we run the same
task under Linux on bare metal, we get the expected result.
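One common way to run this kind of sweep (not necessarily what was done here) is a pointer chase: the array holds a random single cycle, so each load depends on the previous one and the hardware prefetchers largely cannot hide the miss latency. A sketch that builds such a chain with Sattolo's algorithm, plus a list of working-set sizes that extends past the 12 MB L3 so the last step has room to show up. The 64-byte stride (one element per cache line) and the 32 MB upper bound are assumptions. The timed p = next[p] loop itself would need to be written in C or similar, since Python's interpreter overhead would swamp cache latencies.

```python
import random

LINE = 64  # assumed cache-line size in bytes: one chain element per line


def working_set_sizes(lo: int = 1 << 10, hi: int = 32 << 20) -> list[int]:
    """Powers of two from lo to hi bytes (1 KB .. 32 MB by default)."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def pointer_chase_chain(n_elems: int, seed: int = 0) -> list[int]:
    """Return nxt such that i -> nxt[i] is one cycle through all n_elems slots.

    Sattolo's algorithm: Fisher-Yates with j drawn from [0, i) instead of
    [0, i], which produces a uniformly random cyclic permutation.
    """
    rng = random.Random(seed)
    nxt = list(range(n_elems))
    for i in range(n_elems - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt
```

For each size in working_set_sizes(), pointer_chase_chain(size // LINE) gives the successor table; element i would live at byte offset i * LINE in the array being timed.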

We are not sure whether this is due to virtualization overhead or to cache
misses; that's why we want to know the cache access rate of each domain.
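Once the counters are readable, the per-domain cache access rate boils down to deltas of two counters (e.g. LLC references and LLC misses) taken around the interval of interest. A minimal sketch, assuming the counts are already attributed to the domain being measured (which is what saving and restoring PMU state on vCPU context switches, as the vPMU work does, is meant to provide) and that the counters are 48 bits wide. The real width should be read from CPUID leaf 0x0A rather than hard-coded; the function names are hypothetical.

```python
def counter_delta(start: int, end: int, width: int = 48) -> int:
    """Events counted between two reads of a width-bit counter.

    Masking the difference handles a single wraparound between the reads.
    """
    mask = (1 << width) - 1
    return (end - start) & mask


def llc_miss_ratio(ref_start: int, ref_end: int,
                   miss_start: int, miss_end: int, width: int = 48) -> float:
    """Fraction of last-level cache references that missed in the interval."""
    refs = counter_delta(ref_start, ref_end, width)
    misses = counter_delta(miss_start, miss_end, width)
    return misses / refs if refs else 0.0
```

Masking the difference, rather than resetting the counters before each interval, means the readings can be taken without writing the MSRs again.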

I'd really appreciate it if you could share some of your insight on this. :-)

Thank you very much for your time!

Best,

Meng


* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Boris Ostrovsky @ 2014-02-18 16:16 UTC (permalink / raw)
  To: Meng Xu; +Cc: Dario Faggioli, mengxu@cis.upenn.edu, xen-devel@lists.xen.org


On 02/18/2014 10:24 AM, Meng Xu wrote:
> Hi Dario,
>
> Thank you so much for your detailed reply! It is really helpful! I'm 
> looking at the vPMU and perf on Xen, and will try it. :-)

You will need the Xen patches that Dario pointed you to (thanks Dario) 
plus Linux kernel and toolstack changes that I can send you in a 
separate email (they still need some cleanup but should be usable).

BTW, you mentioned in the earlier email that you wrote some code to
directly access PMU registers and didn't think the code is particularly
useful because of portability concerns. I believe basic counters (such
as those for cache misses) and controls are common across pretty much
all recent Intel processors.
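Whether those basic events are present can be checked at run time rather than assumed. CPUID leaf 0x0A reports the architectural performance monitoring version, the number and width of the general-purpose counters, and a bit vector in EBX in which a set bit means the corresponding architectural event is *not* available. A sketch of decoding it, taking raw EAX/EBX values as input (obtained, for instance, with the cpuid utility or a small C helper); the register values in the usage note are illustrative, not taken from a specific machine. Inside a guest, Xen may report this leaf as empty when vPMU is not enabled.

```python
# Architectural events in the order of CPUID.0AH:EBX bits 0..6
ARCH_EVENT_NAMES = [
    "UnHalted Core Cycles",
    "Instructions Retired",
    "UnHalted Reference Cycles",
    "LLC Reference",
    "LLC Misses",
    "Branch Instruction Retired",
    "Branch Mispredict Retired",
]


def decode_leaf_0a(eax: int, ebx: int) -> dict:
    """Decode the PMU description returned by CPUID with EAX=0x0A."""
    version = eax & 0xFF
    gp_counters = (eax >> 8) & 0xFF
    width = (eax >> 16) & 0xFF
    vector_len = (eax >> 24) & 0xFF  # how many EBX bits are meaningful
    available = [
        name
        for bit, name in enumerate(ARCH_EVENT_NAMES)
        if bit < vector_len and not (ebx >> bit) & 1  # set bit = NOT available
    ]
    return {
        "version": version,
        "gp_counters": gp_counters,
        "counter_width": width,
        "available_events": available,
    }
```

For instance, decode_leaf_0a(0x07300403, 0) describes version 3 with four 48-bit general-purpose counters and all seven architectural events available.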

>
> The reason why I want to know this information from hardware 
> performance counter is because I want to know the interference among 
> each domains when they are running.
>
> In addition, when we measure the latency of accessing a large array, 
> the result is out of our expectation. We increase the size of an array 
> from 1KB to 12MB, which covers the L1(32KB), L2(256KB) and L3(12MB) 
> cache size. We expect that the latency of accessing the whole array 
> should have clear cut at around 32KB, 256KB and 12MB because the 
> latency of L1 L2 and L3 are several times different.
>
> However, we saw the latency does not increase much when the array size 
> is larger than the size of L1, L2, and L3. It's weird because if we 
> run the same task in Linux on bare machine, it is the expected result.

Although your vCPUs are most likely not migrating, you should still make
sure that they are pinned (and that the physical processors are not oversubscribed).

And (as with any performance measurements) disable power management and 
turbo mode. These things often mess up your timing.
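The pinning part of this advice can be planned mechanically: give every vCPU of the domains under test its own physical CPU and refuse any plan that would oversubscribe. A sketch that emits xl vcpu-pin <domain> <vcpu> <cpu> commands. The domain name rt-guest and the pCPU list are hypothetical, and the exact xl vcpu-pin syntax should be checked with xl help vcpu-pin for the Xen version in use (the guest config's cpus= option is an alternative for pinning at creation time). Power management and turbo mode still have to be handled separately.

```python
def pinning_plan(vcpus: dict[str, int], pcpus: list[int]) -> list[str]:
    """One dedicated pCPU per vCPU, as `xl vcpu-pin` commands.

    vcpus maps a domain name to its number of vCPUs; pcpus lists the
    physical CPUs reserved for the experiment.
    """
    needed = sum(vcpus.values())
    if needed > len(pcpus):
        raise ValueError(
            f"{needed} vCPUs but only {len(pcpus)} pCPUs: plan would oversubscribe"
        )
    free = iter(pcpus)
    commands = []
    for domain, count in vcpus.items():
        for vcpu in range(count):
            commands.append(f"xl vcpu-pin {domain} {vcpu} {next(free)}")
    return commands
```

pinning_plan({"Domain-0": 1, "rt-guest": 2}, [0, 1, 2, 3]) yields one command per vCPU, placing dom0 on pCPU 0 and the guest on pCPUs 1 and 2.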

-boris

* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-19 14:12 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Dario Faggioli, mengxu@cis.upenn.edu, xen-devel@lists.xen.org


Hi Boris,

2014-02-18 11:16 GMT-05:00 Boris Ostrovsky <boris.ostrovsky@oracle.com>:

>  On 02/18/2014 10:24 AM, Meng Xu wrote:
>
>  Hi Dario,
>
>  Thank you so much for your detailed reply! It is really helpful! I'm
> looking at the vPMU and perf on Xen, and will try it. :-)
>
>
> You will need the Xen patches that Dario pointed you to (thanks Dario)
> plus Linux kernel and toolstack changes that I can send you in a separate
> email (they still need some cleanup but should be usable).
>

Thank you so much for pointing this out! :)



>
> BTW, you mentioned in the earlier email that you you wrote some code to
> directly access PMU registers and didn't think the code is particularly
> useful because of portability concerns. I believe basic counters (such as
> those for cache misses) and controls are common  across pretty much all
> recent Intel processors.
>

Yes, the counters are there. But when I looked at the event and umask
numbers, they differ slightly among the 2nd, 3rd and 4th generations of
Intel's CPUs, and some events do not exist in earlier CPUs. (If I coded
those differences into the Xen tool I wrote, it would be like rewriting
part of Intel's PCM. That's why I hope to run existing work in Xen.
:-) )
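A table-driven tool can at least keep those per-generation differences in one place, keyed by the family/model signature from CPUID leaf 1. A sketch of the decoding follows; the display-model rule, which folds in the extended model for family 6, is from Intel's documentation. The table lists only the mainstream desktop model numbers for the three generations mentioned; other SKUs such as server parts use different model numbers. The actual event/umask encodings per generation would still have to be transcribed from Intel's tables, so anything unknown falls back to the architectural events.

```python
# (display family, display model) -> generation; mainstream client parts only
GENERATIONS = {
    (0x6, 0x2A): "2nd gen Core (Sandy Bridge)",
    (0x6, 0x3A): "3rd gen Core (Ivy Bridge)",
    (0x6, 0x3C): "4th gen Core (Haswell)",
}


def decode_signature(eax: int) -> tuple[int, int, int]:
    """Split CPUID.01H:EAX into (display family, display model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    family = base_family + (ext_family if base_family == 0xF else 0)
    if base_family in (0x6, 0xF):
        model += ext_model << 4  # extended model only applies to families 6 and 15
    return family, model, stepping


def generation(eax: int) -> str:
    """Map a CPUID.01H:EAX signature to a generation name, if known."""
    family, model, _ = decode_signature(eax)
    return GENERATIONS.get((family, model), "unknown: use architectural events only")
```

With an Ivy Bridge signature such as 0x000306A9, decode_signature returns (6, 0x3A, 9), which selects the Ivy Bridge entry.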


>
>  The reason why I want to know this information from hardware performance
> counter is because I want to know the interference among each domains when
> they are running.
>
>  In addition, when we measure the latency of accessing a large array, the
> result is out of our expectation. We increase the size of an array from 1KB
> to 12MB, which covers the L1(32KB), L2(256KB) and L3(12MB) cache size. We
> expect that the latency of accessing the whole array should have clear cut
> at around 32KB, 256KB and 12MB because the latency of L1 L2 and L3 are
> several times different.
>
>  However, we saw the latency does not increase much when the array size
> is larger than the size of L1, L2, and L3. It's weird because if we run the
> same task in Linux on bare machine, it is the expected result.
>
>
> Although most likely your vcpus are not migrating you should still make
> sure that they are pinned (and not oversubscribed to physical processors).
>

Thanks for pointing this out!



> And (as with any performance measurements) disable power management and
> turbo mode. These things often mess up your timing.
>

Sure! 

Thank you very much for your help!

Best,

Meng


