* Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-17 22:32 UTC
To: xen-devel@lists.xen.org; +Cc: mengxu@cis.upenn.edu

Hi,

I'm a PhD student working on real-time systems.

*[My goal]*
I want to measure the cache hit/miss rate of each guest domain in Xen. I may also want to measure other events, such as the memory access rate, for each program in each guest domain. My machine's CPU uses Intel's Ivy Bridge architecture.

*[The problem I'm encountering]*
I tried Intel's Performance Counter Monitor (PCM) in Linux on the bare machine to get the cache access rate for each level of cache, and it works very well.

However, when I run PCM in dom0 on Xen, it does not work. I think PCM needs to run in ring 0 to read/write the MSRs. Because dom0 runs in ring 1, PCM running in dom0 cannot work.

*So my question is:*
How can I run a program (say PCM) in ring 0 on Xen?

*What's in my mind is:*
Writing a hypercall that calls PCM in Xen's kernel space, so that PCM runs in ring 0? My concern is that some of the functions PCM uses, say printf(), may not be able to run in kernel space.

Do you have any suggestions on running PCM or another performance monitoring program in ring 0 on Xen?

*What I tried before:*
I wrote a hypercall that reads and writes the MSRs and records the cache hit/miss events for each level of cache, using Intel's performance counters. It worked on my machine, but it is not portable to other machines since the event numbers may differ. That's why I think running PCM or another existing performance monitoring program on Xen would be a better idea.

Thank you very much for your time and help with this question!

Best,
Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
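For readers unfamiliar with how a tool like PCM reaches the MSRs on bare-metal Linux: it normally goes through the `msr` driver, which exposes one device file per CPU (`/dev/cpu/N/msr`) where the file offset is the MSR address. The sketch below illustrates that access path only. The helper name `read_msr` and the `dev_template` parameter are made up for this example, it needs root and a loaded `msr` module, and under a Xen PV dom0 such reads are presumably trapped by the hypervisor, which would explain the failure described above.

```python
import os
import struct


def read_msr(cpu: int, msr: int, dev_template: str = "/dev/cpu/{cpu}/msr") -> int:
    """Read one 64-bit MSR through the Linux msr driver.

    `dev_template` is a hypothetical knob so the function can be pointed at
    another file for testing; the default is the usual msr device path.
    """
    path = dev_template.format(cpu=cpu)
    fd = os.open(path, os.O_RDONLY)
    try:
        # The msr driver interprets the file offset as the MSR address
        # and returns the 8-byte register value.
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {msr:#x}")
    # MSR values are returned in little-endian byte order on x86.
    return struct.unpack("<Q", data)[0]
```

On a bare-metal machine, `read_msr(0, 0xC1)` would return the current value of IA32_PMC0 on CPU 0, assuming the counter has been programmed.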
* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Dario Faggioli @ 2014-02-18 9:14 UTC
To: Meng Xu; +Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org

On Mon, 2014-02-17 at 17:32 -0500, Meng Xu wrote:
> Hi,
>
Hi,

> I'm a PhD student, working on real time system.
>
Cool. There really seems to be a lot of interest in Real-Time virtualization these days. :-D

> [My goal]
> I want to measure the cache hit/miss rate of each guest domain in Xen.
> I may also want to measure some other events, say memory access rate,
> for each program in each guest domain in Xen.
>
Ok. Can I, out of curiosity, ask you to detail a bit more what your *final* goal is (I mean, you're interested in these measurements for a reason, not just for the sake of having them, right?).

> [The problem I'm encountering]
> I tried intel's Performance Counter Monitor (PCM) in Linux on bare
> machine to get the machine's cache access rate for each level of
> cache, it works very well.
>
> However, when I want to use the PCM in Xen and run it in dom0, it
> cannot work. I think the PCM needs to run in ring 0 to read/write the
> MSR. Because dom0 is running in ring 1, so PCM running in dom0 cannot
> work.
>
Indeed.

> So my question is:
> How can I run a program (say PCM) in ring 0 on Xen?
>
Running "a program" in there is going to be terribly difficult. I think you're better off trying to access the counters from dom0 and/or (para)virtualize them.

In fact, there is work going on already on this, although I don't have all the details about its current status.

> What's in my mind is:
> Writing a hypercall to call the PCM in Xen's kernel space, then the
> PCM will run in ring 0?
> But the problem I'm concerned is that some of the PCM's instruction,
> say printf(), may not be able to run in kernel space?
>
Well, Xen can print, e.g., on a serial console, but again, that's not what you want. I'm adding links to a few conversations about virtual PMU. These are just the very first Google results, so there may well be more:

http://xen.1045712.n5.nabble.com/Virtualization-of-the-CPU-Performance-Monitoring-Unit-td5623065.html
https://lwn.net/Articles/566159/

Boris (whom I'm Cc-ing) gave a presentation about this at the latest Xen Developer Summit:
http://www.slideshare.net/xen_com_mgr/xen-pmu-xensummit2013

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Dario Faggioli @ 2014-02-18 11:29 UTC
To: Meng Xu; +Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org

On Tue, 2014-02-18 at 10:14 +0100, Dario Faggioli wrote:
> Boris (which I'm Cc-ing), gave a presentation about this at latest Xen
> Developers Summit:
> http://www.slideshare.net/xen_com_mgr/xen-pmu-xensummit2013
>
And there appears to be a new version of this work, released just yesterday! :-)

Have a look here:
http://bugs.xenproject.org/xen/mid/%3C1392659764-22183-1-git-send-email-boris.ostrovsky@oracle.com%3E

Regards,
Dario
* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-18 15:24 UTC
To: Dario Faggioli; +Cc: Boris Ostrovsky, mengxu@cis.upenn.edu, xen-devel@lists.xen.org

Hi Dario,

Thank you so much for your detailed reply! It is really helpful! I'm looking at the vPMU and perf on Xen, and will try them. :-)

The reason I want this information from the hardware performance counters is that I want to know the interference among domains while they are running.

In addition, when we measured the latency of accessing a large array, the result was not what we expected. We increased the size of the array from 1KB to 12MB, which covers the L1 (32KB), L2 (256KB) and L3 (12MB) cache sizes. We expected the latency of accessing the whole array to show clear steps at around 32KB, 256KB and 12MB, because the latencies of L1, L2 and L3 differ by several times.

However, we saw that the latency does not increase much when the array size grows beyond the sizes of L1, L2 and L3. It's weird because when we run the same task in Linux on the bare machine, we get the expected result.

We are not sure whether this is because of the virtualization overhead or cache misses; that's why we want to know the cache access rate of each domain.

I would really appreciate it if you could share some of your insight on this. :-)

Thank you very much for your time!

Best,
Meng
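One common way to set up the array-latency experiment described above is pointer chasing over a random single-cycle permutation, so that every access depends on the previous one and hardware prefetchers cannot hide the miss latency. The sketch below shows only that access pattern; it is an assumption about methodology, not the benchmark Meng actually ran. The names `sattolo_cycle`, `chase` and `sweep` are invented here, and in Python the interpreter overhead dominates the timings, so a real measurement would port the same loop to C.

```python
import random
import time


def sattolo_cycle(n, seed=0):
    """Build next-index links that visit all n slots in one single cycle.

    Uses Sattolo's algorithm: like Fisher-Yates, but j is always < i,
    which guarantees the permutation consists of exactly one cycle.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the links for `steps` dependent accesses.

    Returns (average nanoseconds per access, final index).
    """
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = nxt[idx]  # each load depends on the previous one
    elapsed = time.perf_counter_ns() - start
    return elapsed / steps, idx


def sweep(sizes_bytes=(1 << 10, 32 << 10, 256 << 10, 12 << 20), slot_bytes=8):
    """Print per-access time for working sets around the L1/L2/L3 sizes.

    `slot_bytes` is only an approximation: a Python list stores object
    references, so the real memory footprint differs from a C array.
    """
    for size in sizes_bytes:
        nxt = sattolo_cycle(max(1, size // slot_bytes))
        ns_per_access, _ = chase(nxt, steps=1_000_000)
        print(f"{size:>10} bytes: {ns_per_access:.1f} ns/access")
```

Calling `sweep()` walks the four working-set sizes mentioned in the message. With a compiled version of the same loop, one would expect the steps in latency to appear near the cache boundaries on bare metal.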
* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Boris Ostrovsky @ 2014-02-18 16:16 UTC
To: Meng Xu; +Cc: Dario Faggioli, mengxu@cis.upenn.edu, xen-devel@lists.xen.org

On 02/18/2014 10:24 AM, Meng Xu wrote:
> Hi Dario,
>
> Thank you so much for your detailed reply! It is really helpful! I'm
> looking at the vPMU and perf on Xen, and will try it. :-)

You will need the Xen patches that Dario pointed you to (thanks Dario) plus Linux kernel and toolstack changes that I can send you in a separate email (they still need some cleanup but should be usable).

BTW, you mentioned in the earlier email that you wrote some code to directly access PMU registers and didn't think the code is particularly useful because of portability concerns. I believe basic counters (such as those for cache misses) and controls are common across pretty much all recent Intel processors.

> The reason why I want to know this information from hardware
> performance counter is because I want to know the interference among
> each domains when they are running.
>
> In addition, when we measure the latency of accessing a large array,
> the result is out of our expectation. We increase the size of an array
> from 1KB to 12MB, which covers the L1(32KB), L2(256KB) and L3(12MB)
> cache size. We expect that the latency of accessing the whole array
> should have clear cut at around 32KB, 256KB and 12MB because the
> latency of L1 L2 and L3 are several times different.
>
> However, we saw the latency does not increase much when the array size
> is larger than the size of L1, L2, and L3. It's weird because if we
> run the same task in Linux on bare machine, it is the expected result.

Although most likely your vcpus are not migrating, you should still make sure that they are pinned (and that the physical processors are not oversubscribed).

And (as with any performance measurements) disable power management and turbo mode. These things often mess up your timing.

-boris
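Boris's point about basic counters being common across recent Intel processors corresponds to what the Intel SDM calls architectural performance events, which keep the same event select and unit mask across generations. The sketch below shows how a value for an IA32_PERFEVTSELx register (IA32_PERFEVTSEL0 is MSR 0x186, paired with counter IA32_PMC0 at 0xC1) is assembled for the two last-level-cache events. The helper name `perfevtsel` and the `ARCH_EVENTS` table are made up for this example; this is a sketch of the encoding, not the code from Meng's hypercall or from the vPMU patches.

```python
# Selected control bits of IA32_PERFEVTSELx (architectural perfmon layout).
USR = 1 << 16  # count while running at privilege levels 1-3
OS = 1 << 17   # count while running at privilege level 0
EN = 1 << 22   # enable the counter

# Architectural events as (event select, unit mask).
ARCH_EVENTS = {
    "llc_reference": (0x2E, 0x4F),
    "llc_misses": (0x2E, 0x41),
}


def perfevtsel(event, umask, usr=True, os_=True, enable=True):
    """Assemble the 32-bit value to write into an IA32_PERFEVTSELx MSR."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and unit mask are 8-bit fields")
    value = event | (umask << 8)  # bits 7:0 event select, bits 15:8 unit mask
    if usr:
        value |= USR
    if os_:
        value |= OS
    if enable:
        value |= EN
    return value
```

For example, `hex(perfevtsel(*ARCH_EVENTS["llc_misses"]))` gives `0x43412e`. Writing that value to IA32_PERFEVTSEL0 and then reading IA32_PMC0 is the bare-metal sequence that a hypervisor-side helper would have to perform on a guest's behalf.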
* Re: Question about running a program(Intel PCM) in ring 0 on Xen
From: Meng Xu @ 2014-02-19 14:12 UTC
To: Boris Ostrovsky; +Cc: Dario Faggioli, mengxu@cis.upenn.edu, xen-devel@lists.xen.org

Hi Boris,

2014-02-18 11:16 GMT-05:00 Boris Ostrovsky <boris.ostrovsky@oracle.com>:
> You will need the Xen patches that Dario pointed you to (thanks Dario)
> plus Linux kernel and toolstack changes that I can send you in a separate
> email (they still need some cleanup but should be usable).
>
Thank you so much for pointing this out! :)

> BTW, you mentioned in the earlier email that you wrote some code to
> directly access PMU registers and didn't think the code is particularly
> useful because of portability concerns. I believe basic counters (such as
> those for cache misses) and controls are common across pretty much all
> recent Intel processors.
>
Yes, the counters are there. But when I looked at the event and umask numbers, they differ slightly among the 2nd, 3rd and 4th generations of Intel's CPUs. Some events are not available in earlier CPU generations. (If I coded those differences into the Xen tool I wrote, it would be like rewriting part of Intel's PCM; that's why I hope to run the existing work on Xen. :-) )

> Although most likely your vcpus are not migrating you should still make
> sure that they are pinned (and not oversubscribed to physical processors).
>
Thanks for pointing this out!

> And (as with any performance measurements) disable power management and
> turbo mode. These things often mess up your timing.
>
Sure! Thank you very much for your help!

Best,
Meng
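On the question of which events exist on a given CPU generation: for the architectural events, the processor itself reports availability through CPUID leaf 0AH, so a tool does not need a per-generation table for those. EAX carries the perfmon version, the number and width of the general-purpose counters, and the length of the EBX bit vector; a set bit in EBX means the corresponding event is not available. The sketch below only decodes register values that were obtained elsewhere (Python cannot execute CPUID directly). The function name `decode_cpuid_0a` and the sample register values in the usage note are illustrative assumptions, and model-specific events beyond this list would still need per-model handling, as Meng notes.

```python
# Architectural events in the bit order used by CPUID.0AH:EBX.
ARCH_EVENT_BITS = [
    "core_cycles",
    "instructions_retired",
    "reference_cycles",
    "llc_reference",
    "llc_misses",
    "branch_instructions_retired",
    "branch_misses_retired",
]


def decode_cpuid_0a(eax, ebx):
    """Decode the EAX/EBX values returned by CPUID leaf 0AH."""
    vector_len = (eax >> 24) & 0xFF  # number of valid bits in EBX
    available = [
        name
        for bit, name in enumerate(ARCH_EVENT_BITS)
        # An event is usable only if its bit is inside the vector and clear.
        if bit < vector_len and not (ebx >> bit) & 1
    ]
    return {
        "version": eax & 0xFF,
        "num_counters": (eax >> 8) & 0xFF,
        "counter_width": (eax >> 16) & 0xFF,
        "available_events": available,
    }
```

As an illustration, `decode_cpuid_0a(0x07300403, 0x0)` describes a version 3 PMU with four 48-bit general-purpose counters per logical processor and all seven architectural events available.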