From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: Branch Trace Storage for guests and VPMU initialization
Date: Thu, 26 Feb 2015 12:53:53 -0500
Message-ID: <54EF5DB1.4040205@oracle.com>
References: <5C9C3B9BEF1B354596EAE3D6800D876BA3BF8F@e1.gdata.de>
 <54ECB114.6040101@oracle.com>
 <5C9C3B9BEF1B354596EAE3D6800D876BA441F4@e1.gdata.de>
 <54EDF8E3.2060609@oracle.com>
 <5C9C3B9BEF1B354596EAE3D6800D876BA47347@e1.gdata.de>
 <54EE4A8E.3030207@oracle.com>
 <5C9C3B9BEF1B354596EAE3D6800D876BA474FB@e1.gdata.de>
In-Reply-To: <5C9C3B9BEF1B354596EAE3D6800D876BA474FB@e1.gdata.de>
To: Kevin.Mayer@gdata.de
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 02/26/2015 08:44 AM, Kevin.Mayer@gdata.de wrote:
>
>> -----Original Message-----
>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>> Sent: Wednesday, 25 February 2015 23:20
>> To: Mayer, Kevin
>> Subject: Re: AW: AW: [Xen-devel] Branch Trace Storage for guests
>> and VPMU initialization
>>
>> On 02/25/2015 01:23 PM, Kevin.Mayer@gdata.de wrote:
>>>> -----Original Message-----
>>>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>>>> Sent: Wednesday, 25 February 2015 17:32
>>>> To: Mayer, Kevin
>>>> Cc: xen-devel@lists.xen.org
>>>> Subject: Re: AW: [Xen-devel] Branch Trace Storage for guests and
>>>> VPMU initialization
>>>>
>>>> On 02/25/2015 10:12 AM, Kevin.Mayer@gdata.de wrote:
>>>>>> -----Original Message-----
>>>>>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>>>>>> Sent: Tuesday, 24 February 2015 18:13
>>>>>> To: Mayer, Kevin; xen-devel@lists.xen.org
>>>>>> Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
>>>>>> initialization
>>>>>>
>>>>>> On 02/24/2015 10:27 AM, Kevin.Mayer@gdata.de wrote:
>>>>>>> Hi guys
>>>>>>>
>>>>>>> I'm trying to set up the BTS so that I can log the branches taken
>>>>>>> in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7
>>>>>>> Sandy Bridge.
>>>>>>>
>>>>>>> I added the vpmu=bts boot parameter to my grub2 configuration and
>>>>>>> extended the libxl,libxc,domctl,... with an own command so that I
>>>>>>> can trigger the activation of the BTS whenever I want.
>>>>>>>
>>>>>> I am not sure why you are doing all these changes to Xen code. BTS
>>>>>> is supposed to be managed from the guest. For example, a Fedora HVM
>>>>>> guest will produce this:
>>>>>>
>>>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1
>>>>>> [ perf record: Woken up 3838 times to write data ]
>>>>>> [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
>>>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path
>>>>>> ffffffff8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore)
>>>>>> ffffffff8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown])
>>>>>> 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown])
>>>>>> ...
>>>>>>
>>>>> I want to be able to log the taken branches (of the guest) without
>>>>> the need to modify the guest at all.
>>>>> This means I have to do all the logic in the hypervisor, or am I wrong?
>>>> In that case, yes.
>>>> But then you have to make sure that at least
>>>> * you don't load guest's VPMU (or, at least, BTS-related
>>>>   registers) on context switch
>>>> * you don't send the interrupt to the guest (meaning that you will
>>>>   need to somehow inform dom0 of the BTS interrupt)
>>>>
>>>> and probably more.
>>>>
>>>> Essentially, you want dom0 to profile the guest. I have been working
>>>> on patches that would allow that but they are still under review.
>>>>
>>> Yes, this is exactly what I want to do.
>>> Too bad that your patches are under review. Would have been pretty
>>> helpful I think.
>>
>> To be honest, I never tested them for BTS so they may not work in that
>> mode. In fact, as you will realize by reading what I said below, they
>> probably don't ;-(
>>
>>> Maybe I should point out that I'm a total noob with xen and I
>>> definitely don't understand all parts yet.
>>> So there may be some dumb mistakes in my assumptions.
>>>
>>>>>>> In this command I do the following:
>>>>>>>
>>>>>>> I set up the memory region for the BTS Buffer and the DS Buffer
>>>>>>> Management Area using xzalloc_bytes
>>>>>>>
>>>>>> I don't think you should be allocating BTS buffers in the
>>>>>> hypervisor, they are in guest's memory.
>>>>> I agree. As I said I think this is where my main problem is at the
>>>>> moment.
>>>>> Is there any way I can allocate memory in the hypervisor in a way
>>>>> the guest can access it?
>>>>
>>>> I am not sure this is what you want since you seem to *not* want the
>>>> guest to process the samples, right?
>>>>
>>>> But yes, you can. E.g. something like what map_vcpu_info() does. (I
>>>> have no idea how you'd do this from Windows.)
>>> Right again. As you said my goal is to profile the guest from dom0. So
>>> whenever the CPU is in guest mode and a branch is taken it should be
>>> stored in the BTS, but not when the CPU is running dom0.
>>> My idea was basically to set up the memory for the BTS and the
>>> GUEST_IA32_DEBUGCTL so when there is a vmexit the logging stops and
>>> starts again when there is a vmenter.
>>> As far as I understand the IA32_DEBUGCTL gets switched between the
>>> dom0-value and the guest-value (stored in vmcs) when there is a
>>> vmexit/vmenter, right?
>>
>> Right. And now I am no longer sure whether your buffer should be in
>> hypervisor or guest's space: after VMENTER the hardware will load guest's
>> versions of IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA. I don't know
>> whether you can prevent this from happening (need to look in the spec).
>> And if that's the case then you might be able to:
>>
>> 1. Map DS area and BTS buffer in both guest and hypervisor. I believe
>> your guest will have to have this mapped since these areas will be
>> accessed via guest's EPT. As I said, I don't know how you'd do this in
>> Windows --- I know nothing about programming there. I assume it can be
>> done since there are Windows PV drivers for Xen.
>> 2. Have dom0 set appropriate bits in IA32_DEBUGCTLMSR to start tracing.
>> You will need to first pause your guest's VCPUs, then update the
>> appropriate register in VMCS (bracketed with vmx_vmcs_enter/exit) and
>> then unpause it.
>> 3. If you program BTS to generate interrupts you may need to do
>> something about it in vpmu_interrupt() to prevent those interrupts from
>> going into the guest as this will likely confuse it and it will die (the
>> interrupt I think will be an NMI, making things real bad for the guest).
>> 4. Now you should be able to read buffers from hypervisor.
> Why should I prevent the loading of guest IA32_DEBUGCTLMSR and
> MSR_IA32_DS_AREA?

I was hoping that you might then keep the buffer in hypervisor space.
But I don't think it's possible to make HW not virtualize those two
registers (DS_AREA in particular).

> The idea was to access/setup the guest IA32_DEBUGCTLMSR and
> MSR_IA32_DS_AREA when in dom0.
> So when there is a VMENTER the guest registers get loaded and the BTS
> starts to log.
> And stops of course when there is a VMEXIT.
>
> Regarding 1.
> I'm not sure how I know at which address the BTS is located in this case.
> Let's say I set up the BTS in the guest at address x. To get this
> address x I need to read the guest MSR_IA32_DS_AREA, right?

If the guest wrote this address to MSR_IA32_DS_AREA then yes.

> For this I would need to access the vcpu->arch_vcpu->hvm_vcpu->vpmu
> used by the guest since the MSR_IA32_DS_AREA isn't part of the vmcs
> (and therefore cannot be accessed by the handy __vmread()).
> Is there a good way to get this information during a vpmu_interrupt()
> (since I believe the BTINT will have to be handled there), or maybe a
> VMEXIT?

I believe the interrupt can only happen when the guest is running on the
physical processor so you should be able to get to guest's VPMU as
vcpu_vpmu(v)->context->ds_area. Same for VMEXIT.

>
> 2. I already use the vmx_vmcs_enter/exit but didn't think about pausing
> the vcpu.
> I will add that.
>
> 3. I didn't look at the BTINT yet, but this sounds reasonable.
>
>>> This would be "the guest is logging the branch traces", but it is set
>>> up and controlled from the dom0. So more or less a hybrid I think.
>>>>> Of course the guest must not be able to use this memory in its
>>>>> normal operations but just for BTS.
>>>>> Is this even possible? I am rather confused at the moment. :-D
>>>>>
>>>>>>> Then I write the pointer to the BTS Buffer into the DS Buffer
>>>>>>> Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
>>>>>>>
>>>>>>> When I use vmx_msr_write_intercept to store the value in
>>>>>>> MSR_IA32_DS_AREA the host reboots (my idea is he tries to access a
>>>>>>> vpmu-struct that isn't there in the current vcpu and panics).
>>>> Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I
>>>> thought you said that you want dom0 to do sampling.
>>>> Or are you trying to set up the DS area from your guest and control
>>>> it from dom0? I am somewhat confused.
>>> The dom0 writes to MSR_IA32_DS_AREA. I want to do all the setup and
>>> controlling from dom0 in a way that enables the guest to store branch
>>> traces in the BTS (that was set up by the dom0)
>> I think I understand why you crash hypervisor now. I mentioned above that
>> writing into vmcs requires bracketing by vmx_vmcs_enter/exit. So, in
>> addition to having new vcpu parameter to vmx_msr_write_intercept(), you
>> need to add those two. See vmx_vlapic_msr_changed(), right above
>> vmx_msr_write_intercept(). And don't forget to pause guest's vcpu (I am
>> pretty sure you need that since your guest may be running somewhere else
>> at this time).
>>
>>
>>> Sorry if my explanations are a bit confusing. I myself am confused
>>> about this part of the Xen-code.
>>>>>> Can you post hypervisor log? (hard to say how helpful it will be
>>>>>> without seeing your code changes though)
>>>>>>
>>>>> Right after enabling the BTS I get a triple fault.
>>>>> hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.
>>>> That's not host reboot, this is your guest dying.
>>> Yes
>>> When I use my own vmx_msr_write_intercept (which explicitly uses the
>>> vcpu of my guest domain instead of the "current") and my own
>>> core2_vpmu_do_wrmsr, core2_vpmu_msr_common_check I don't get a
>>> host reboot, but a dying guest when I try to enable BTS. As you said
>>> most likely because the MSR_IA32_DS_AREA points to dom0-memory and the
>>> hypervisor is not amused when a guest tries to write stuff there.
>>> When I use the built-in ones (which all use struct vcpu *v = current;)
>>> I get a host reboot.
>>> Maybe because of missing vpmu-structs, as I notice that only one
>>> vcpu_id gets initialized in vpmu_initialise during boot.
>>> So when using the built-in vmx_msr_write_intercept the writing ends in
>>> vpmu_do_wrmsr at
>>>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
>>>         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>>> and the host reboots.
>>> Maybe I need some special kind of initialization before I call
>>> vmx_msr_write_intercept?
>>> Even with
>>>     struct vcpu *current_v = current;
>>>     vpmu_initialise(current_v);
>>>     return_value = vmx_msr_write_intercept(MSR_IA32_DS_AREA,
>>>                                            ds_buffer_management_area);
>>> I get an instant host reboot at the above-mentioned
>>>     return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>> Right. Because you are trying to access VMCS from dom0 context. dom0
>> doesn't have VMCS as it is a PV guest.
>>
> I thought so, but isn't the if clause
>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> supposed to catch that?

Yes, it should. BTW, why are you explicitly calling vpmu_initialise()?
It should be called during guest VCPU initialization.

I'd need to see 'xl dmesg' output at time of reboot, possibly with
disassembly of code at RIP that presumably will be reported as causing
the reboot.

-boris

>
> Kevin
>
>> -boris
>>
>>>>>>> When I use a modified version of vmx_msr_write_intercept I don't
>>>>>>> get any crashes as long as I don't enable BTS and TR in the
>>>>>>> GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR)
>>>>>>> the guest crashes. I suppose he gets killed by the hypervisor for
>>>>>>> accessing forbidden memory.
>>>>>>>
>>>>>> Possibly because DS area points to hypervisor memory.
>>>>>>
>>>>>>
>>>>>> Having said all this, I am not sure how well BTS works. You did
>>>>>> notice this in the hypervisor log:
>>>>>>
>>>>>> (XEN) ******************************************************
>>>>>> (XEN) ** WARNING: Emulation of BTS Feature is switched on **
>>>>>> (XEN) ** Using this processor feature in a virtualized    **
>>>>>> (XEN) ** environment is not 100% safe.                    **
>>>>>> (XEN) ** Setting the DS buffer address with wrong values  **
>>>>>> (XEN) ** may lead to hypervisor hangs or crashes.         **
>>>>>> (XEN) ** It is NOT recommended for production use!        **
>>>>>> (XEN) ******************************************************
>>>>> Yes, I saw that. It doesn't state that BTS is not working at all,
>>>>> just that it is not that safe to use.
>>>>> As I understand it as long as I set the DS buffer address correctly
>>>>> I should be fine, right?
>>>>
>>>> Right. Except that I am not convinced you did set this buffer
>>>> correctly, which is possibly why your hypervisor crashed (I am not
>>>> sure I understood under what circumstances though).
>>>>
>>>> -boris
>>> We are thinking very much alike. I also am not convinced I set the
>>> buffer correctly. ^^
>>> But since I get a reboot as soon as
>>>     return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>>> gets called I don't think that the setup of the buffer is the problem
>>> (when using the original vmx_msr_write_intercept), but rather
>>> something with the setup of the vpmu.
>>> When I use my own vmx_msr_write_intercept with the d->vcpu[0] instead
>>> of current the writing succeeds but the guest crashes/gets killed when
>>> the BTS is enabled.
>>> So in this second case the setup of the buffer seems to be the problem.
>>>
>>> Kevin
>>>
>>>>> Since I don't want to use it for production that is fine with me.
>>>>> At least for now.
>>>>> Kevin
>>>>>> -boris
>>>>>>
>>>>>>
>>>>>>> The modified version of vmx_msr_write_intercept takes a
>>>>>>> vcpu-struct as a parameter and uses this instead of the current
>>>>>>> vcpu.
>>>>>>>
>>>>>>> Instead of
>>>>>>>
>>>>>>> static int vmx_msr_write_intercept(unsigned int msr, uint64_t
>>>>>>> msr_content)
>>>>>>> {
>>>>>>>
>>>>>>> struct vcpu *v = current;
>>>>>>>
>>>>>>> I just have
>>>>>>>
>>>>>>> static int own_vmx_msr_write_intercept(unsigned int msr, uint64_t
>>>>>>> msr_content, struct vcpu *v)
>>>>>>>
>>>>>>> I get this vcpu by d->vcpu[0] as I have limited my guest domain to
>>>>>>> one vcpu atm.
>>>>>>>
>>>>>>> Of course I also use similarly modified versions of the called
>>>>>>> functions (vpmu_do_wrmsr,...).
>>>>>>>
>>>>>>> I'm pretty sure that my problem is with a wrong scope/usage of the
>>>>>>> vcpus/memory, but I have no idea how to fix this.
>>>>>>>
>>>>>>> I can see a potential problem with the memory allocation (in the
>>>>>>> host) into which the cpu in guest-mode is supposed to write.
>>>>>>>
>>>>>>> Or maybe I got the principle of a vcpu/vpmu all wrong.
>>>>>>>
>>>>>>> Since I couldn't find any project that uses the BTS for the guest,
>>>>>>> I am wondering if anyone has ever done this and if it is possible
>>>>>>> at all.
>>>>>>>
>>>>>>> Any input is welcome as I am pretty much stuck atm...
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>>
>>>>>>> ____________
>>>>>>> Virus checked by G Data MailSecurity
>>>>>>> Version: AVA 25.404 dated 24.02.2015
>>>>>>> Virus news: www.antiviruslab.com
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@lists.xen.org
>>>>>>> http://lists.xen.org/xen-devel
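
For reference, the DS Buffer Management Area layout mentioned in the
thread (BTS Buffer Base at +0x0, BTS Index at +0x8) can be sketched in
plain C. This is an illustrative user-space sketch, not Xen code and not
anything posted in the thread. The names ds_area64, bts_record,
ds_area_init and bts_count are hypothetical. The remaining field
offsets, the 24-byte record format and the IA32_DEBUGCTL bit positions
are assumptions based on the 64-bit layout in the Intel SDM and should
be checked against the manual before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IA32_DEBUGCTL bits relevant to BTS (assumed from the Intel SDM). */
#define DEBUGCTL_TR    (1ULL << 6)  /* generate branch trace messages */
#define DEBUGCTL_BTS   (1ULL << 7)  /* store them in the BTS buffer   */
#define DEBUGCTL_BTINT (1ULL << 8)  /* interrupt at the threshold     */

/* One BTS record in 64-bit mode: three 8-byte fields (assumed). */
struct bts_record {
    uint64_t from;   /* address of the branch instruction */
    uint64_t to;     /* branch target                     */
    uint64_t flags;  /* bit 4: branch was predicted       */
};

/* BTS part of the 64-bit DS buffer management area. */
struct ds_area64 {
    uint64_t bts_base;        /* +0x00: start of the BTS buffer     */
    uint64_t bts_index;       /* +0x08: where the next record goes  */
    uint64_t bts_abs_max;     /* +0x10: first byte past the buffer  */
    uint64_t bts_int_thresh;  /* +0x18: index that triggers BTINT   */
};

/* The two offsets quoted in the thread, checked at compile time. */
_Static_assert(offsetof(struct ds_area64, bts_base) == 0x0, "base");
_Static_assert(offsetof(struct ds_area64, bts_index) == 0x8, "index");

/* Point an empty DS area at a buffer of nrec records. */
void ds_area_init(struct ds_area64 *ds, struct bts_record *buf, size_t nrec)
{
    uint64_t base = (uint64_t)(uintptr_t)buf;

    assert(ds != NULL && buf != NULL && nrec > 0);
    memset(ds, 0, sizeof(*ds));
    ds->bts_base = base;
    ds->bts_index = base;                    /* buffer starts empty */
    ds->bts_abs_max = base + nrec * sizeof(*buf);
    /* Threshold one record before the end; only used with BTINT. */
    ds->bts_int_thresh = ds->bts_abs_max - sizeof(*buf);
}

/* Number of complete records currently in the buffer. */
size_t bts_count(const struct ds_area64 *ds)
{
    return (size_t)((ds->bts_index - ds->bts_base) /
                    sizeof(struct bts_record));
}
```

In the thread's terms, the address of the ds_area64 structure is the
value that would go into MSR_IA32_DS_AREA, and DEBUGCTL_TR |
DEBUGCTL_BTS are the bits to set in GUEST_IA32_DEBUGCTL. Both addresses
have to be valid in the guest's address space, which is the part the
discussion above leaves open.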