From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Kevin.Mayer@gdata.de
Cc: xen-devel@lists.xen.org
Subject: Re: Branch Trace Storage for guests and VPMU initialization
Date: Thu, 26 Feb 2015 12:53:53 -0500
Message-ID: <54EF5DB1.4040205@oracle.com>
In-Reply-To: <5C9C3B9BEF1B354596EAE3D6800D876BA474FB@e1.gdata.de>
On 02/26/2015 08:44 AM, Kevin.Mayer@gdata.de wrote:
>
>> -----Original Message-----
>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>> Sent: Wednesday, 25 February 2015 23:20
>> To: Mayer, Kevin
>> Subject: Re: AW: AW: [Xen-devel] Branch Trace Storage for guests
>> and VPMU initialization
>>
>> On 02/25/2015 01:23 PM, Kevin.Mayer@gdata.de wrote:
>>>> -----Original Message-----
>>>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>>>> Sent: Wednesday, 25 February 2015 17:32
>>>> To: Mayer, Kevin
>>>> Cc: xen-devel@lists.xen.org
>>>> Subject: Re: AW: [Xen-devel] Branch Trace Storage for guests and
>>>> VPMU initialization
>>>>
>>>> On 02/25/2015 10:12 AM, Kevin.Mayer@gdata.de wrote:
>>>>>> -----Original Message-----
>>>>>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>>>>>> Sent: Tuesday, 24 February 2015 18:13
>>>>>> To: Mayer, Kevin; xen-devel@lists.xen.org
>>>>>> Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
>>>>>> initialization
>>>>>>
>>>>>> On 02/24/2015 10:27 AM, Kevin.Mayer@gdata.de wrote:
>>>>>>> Hi guys
>>>>>>>
>>>>>>> I'm trying to set up the BTS so that I can log the branches taken
>>>>>>> in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7
>>>>>>> Sandy Bridge.
>>>>>>>
>>>>>>> I added the vpmu=bts boot parameter to my grub2 configuration and
>>>>>>> extended libxl, libxc, domctl, … with a command of my own so that I
>>>>>>> can trigger the activation of the BTS whenever I want.
>>>>>>>
>>>>>> I am not sure why you are doing all these changes to Xen code. BTS
>>>>>> is supposed to be managed from the guest. For example, a Fedora
>>>>>> HVM guest will produce this:
>>>>>>
>>>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1
>>>>>> [ perf record: Woken up 3838 times to write data ]
>>>>>> [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
>>>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path
>>>>>> ffffffff8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore)
>>>>>> ffffffff8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown])
>>>>>> 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown])
>>>>>> ...
>>>>>>
>>>>> I want to be able to log the taken branches (of the guest) without
>>>>> the need to modify the guest at all.
>>>>> This means I have to do all the logic in the hypervisor, or am I wrong?
>>>> In that case, yes. But then you have to make sure that at least
>>>> * you don't load guest's VPMU (or, at least, BTS-related
>>>> registers) on context switch
>>>> * You don't send the interrupt to the guest (meaning that you will
>>>> need to somehow inform dom0 of the BTS interrupt)
>>>>
>>>> and probably more.
>>>>
>>>> Essentially, you want dom0 to profile the guest. I have been working
>>>> on patches that would allow that but they are still under review.
>>>>
>>> Yes, this is exactly what I want to do.
>>> Too bad that your patches are under review. Would have been pretty
>>> helpful I think.
>>
>> To be honest, I never tested them for BTS so they may not work in that
>> mode. In fact, as you will realize by reading what I said below, they probably
>> don't ;-(
>>
>>> Maybe I should point out that I'm a total noob with Xen and I definitely
>>> don't understand all parts yet.
>>> So there may be some dumb mistakes in my assumptions.
>>>
>>>>>>> In this command I do the following:
>>>>>>>
>>>>>>> I set up the memory region for the BTS Buffer and the DS Buffer
>>>>>>> Management Area using xzalloc_bytes
>>>>>>>
>>>>>> I don't think you should be allocating BTS buffers in the
>>>>>> hypervisor; they are in the guest's memory.
>>>>> I agree. As I said, I think this is where my main problem is at the moment.
>>>>> Is there any way I can allocate memory in the hypervisor in a way
>>>>> the guest can access it?
>>>>
>>>> I am not sure this is what you want since you seem to *not* want the
>>>> guest to process the samples, right?
>>>>
>>>> But yes, you can. E.g. something like what map_vcpu_info() does. (I
>>>> have no idea how you'd do this from Windows.)
>>> Right again. As you said, my goal is to profile the guest from dom0. So
>>> whenever the CPU is in guest mode and a branch is taken, it should be stored
>>> in the BTS, but not when the CPU is running dom0. My idea was basically to
>>> set up the memory for the BTS and the GUEST_IA32_DEBUGCTL so that
>>> logging stops on a vmexit and starts again on a vmenter.
>>> As far as I understand, IA32_DEBUGCTL gets switched between the
>>> dom0 value and the guest value (stored in the VMCS) on
>>> vmexit/vmenter, right?
>>
>> Right. And now I am no longer sure whether your buffer should be in
>> hypervisor or guest's space: after VMENTER the hardware will load guest's
>> versions of IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA. I don't know
>> whether you can prevent this from happening (need to look in the spec).
>> And if that's the case then you might be able to:
>>
>> 1. Map DS area and BTS buffer in both guest and hypervisor. I believe your
>> guest will have to have this mapped since these areas will be accessed via
>> guest's EPT. As I said, I don't know how you'd do this in Windows --- I know
>> nothing about programming there. I assume it can be done since there are
>> Windows PV drivers for Xen.
>> 2. Have dom0 set appropriate bits in IA32_DEBUGCTLMSR to start tracing.
>> You will need to first pause your guest's VCPUs, then update appropriate
>> register in VMCS (bracketed with vmx_vmcs_enter/exit) and then unpause
>> it.
>> 3. If you program BTS to generate interrupts you may need to do something
>> about it in vpmu_interrupt() to prevent those interrupts from going into the
>> guest as this will likely confuse it and it will die (the interrupt I think will be an
>> NMI, making things real bad for the guest).
>> 4. Now you should be able to read buffers from hypervisor.
> Why should I prevent the loading of guest IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA?
I was hoping that you might then keep the buffer in hypervisor space.
But I don't think it's possible to make HW not virtualize those two
registers (DS_AREA in particular).
> The idea was to access/setup the guest IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA when in dom0.
> So when there is a VMENTER the guest registers get loaded and the BTS starts to log.
> And stops of course when there is a VMEXIT.
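For reference, here is a minimal sketch of the IA32_DEBUGCTL bits involved in turning BTS on and off. It assumes the bit positions given in the Intel SDM (TR is bit 6, BTS is bit 7, BTINT is bit 8). The macro names and the helper bts_debugctl() are made up for illustration and are not Xen's; the helper only computes the value, it does not touch the VMCS.

```c
#include <stdint.h>

/*
 * IA32_DEBUGCTL bit positions as documented in the Intel SDM.
 * The macro names are illustrative, not the ones Xen uses.
 */
#define DBGCTL_TR     (1ULL << 6)  /* generate branch trace messages */
#define DBGCTL_BTS    (1ULL << 7)  /* store them in the BTS buffer */
#define DBGCTL_BTINT  (1ULL << 8)  /* interrupt at the BTS threshold */

/*
 * Hypothetical helper: compute the value to place in the guest's
 * DEBUGCTL field. Bits other than TR/BTS/BTINT are preserved.
 */
static uint64_t bts_debugctl(uint64_t old, int enable, int want_interrupt)
{
    uint64_t val = old & ~(DBGCTL_TR | DBGCTL_BTS | DBGCTL_BTINT);

    if (enable) {
        val |= DBGCTL_TR | DBGCTL_BTS;
        if (want_interrupt)
            val |= DBGCTL_BTINT;
    }
    return val;
}
```

Leaving BTINT clear makes the buffer circular, so no interrupt reaches the guest; whether that is acceptable depends on how often you can drain the buffer.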
>
> Regarding 1.
> I'm not sure how I know at which address the BTS is located in this case.
> Let's say I set up the BTS in the guest at address x. To get this address x I need to
> read the guest MSR_IA32_DS_AREA, right?
If the guest wrote this address to MSR_IA32_DS_AREA then yes.
> For this I would need to access the vcpu->arch_vcpu->hvm_vcpu->vpmu
> used by the guest since the MSR_IA32_DS_AREA isn't part of the vmcs
> (and therefore cannot be accessed by the handy __vmread()).
> Is there a good way to get this information during a vpmu_interrupt() (since I believe
> the BTINT will have to be handled there), or maybe a VMEXIT?
I believe the interrupt can only happen when the guest is running on the
physical processor so you should be able to get to guest's VPMU as
vcpu_vpmu(v)->context->ds_area. Same for VMEXIT.
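Once you have the DS area contents, the number of records in the buffer follows from the first two fields. A rough sketch, assuming the 64-bit DS format from the Intel SDM (BTS buffer base at +0x0, BTS index at +0x8, absolute maximum at +0x10, interrupt threshold at +0x18, 24-byte records); the struct and function names are illustrative only. Keep in mind that the addresses stored there are guest linear addresses, so the hypervisor would have to translate them before dereferencing anything.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * First four fields of the DS buffer management area in the 64-bit
 * format (Intel SDM layout). Names are illustrative, not Xen's.
 */
struct ds_area64 {
    uint64_t bts_buffer_base;          /* +0x00 */
    uint64_t bts_index;                /* +0x08: where the next record goes */
    uint64_t bts_absolute_maximum;     /* +0x10 */
    uint64_t bts_interrupt_threshold;  /* +0x18 */
};

/* One BTS record in the 64-bit format: three 8-byte fields. */
struct bts_record64 {
    uint64_t from;   /* address of the branch instruction */
    uint64_t to;     /* branch target */
    uint64_t flags;
};

/*
 * Number of complete records between base and index.
 * Returns 0 if the fields are inconsistent with each other.
 */
static size_t bts_record_count(const struct ds_area64 *ds)
{
    if (ds->bts_index < ds->bts_buffer_base ||
        ds->bts_index > ds->bts_absolute_maximum)
        return 0;

    return (size_t)((ds->bts_index - ds->bts_buffer_base) /
                    sizeof(struct bts_record64));
}
```

The consistency check is there because the guest controls these values; a real implementation would need stricter validation than this.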
>
> 2. I already use the vmx_vmcs_enter/exit but didn’t think about pausing the vcpu.
> I will add that.
>
> 3. I didn’t look at the BTINT yet, but this sounds reasonable.
>
>>> This would be "the guest is logging the branch traces", but it is set up and
>>> controlled from dom0. So more or less a hybrid, I think.
>>>>> Of course the guest must not be able to use this memory in its
>>>>> normal operations but just for BTS.
>>>>> Is this even possible? I am rather confused at the moment. :-D
>>>>>
>>>>>>> Then I write the pointer to the BTS Buffer into the DS Buffer
>>>>>>> Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
>>>>>>>
>>>>>>> When I use vmx_msr_write_intercept to store the value in
>>>>>>> MSR_IA32_DS_AREA the host reboots (my guess is that it tries to access a
>>>>>>> vpmu struct that isn't there in the current vcpu and panics).
>>>> Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I
>>>> thought you said that you want dom0 to do sampling. Or are you trying
>>>> to set up the DS area from your guest and control it from dom0? I am
>>>> somewhat confused.
>>> The dom0 writes to MSR_IA32_DS_AREA. I want to do all the setup and
>>> controlling from dom0 in a way that enables the guest to store branch
>>> traces in the BTS (that was setup by the dom0)
>> I think I understand why you crash hypervisor now. I mentioned above that
>> writing into vmcs requires bracketing by vmx_vmcs_enter/exit. So, in
>> addition to having new vcpu parameter to vmx_msr_write_intercept(), you
>> need to add those two. See vmx_vlapic_msr_changed(), right above
>> vmx_msr_write_intercept(). And don't forget to pause guest's vcpu (I am
>> pretty sure you need that since your guest may be running somewhere else
>> at this time).
>>
>>
>>> Sorry if my explanations are a bit confusing. I myself am confused about
>>> this part of the Xen code.
>>>>>> Can you post hypervisor log? (hard to say how helpful it will be
>>>>>> without seeing your code changes though)
>>>>>>
>>>>> Right after enabling the BTS I get a triple fault.
>>>>> hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.
>>>> That's not host reboot, this is your guest dying.
>>> Yes
>>> When I use my own vmx_msr_write_intercept (which explicitly uses the
>>> vcpu of my guest domain instead of the "current") and my own
>>> core2_vpmu_do_wrmsr and core2_vpmu_msr_common_check, I don’t get a
>>> host reboot, but a dying guest when I try to enable BTS. As you said, most
>>> likely because the MSR_IA32_DS_AREA points to dom0 memory and the
>>> hypervisor is not amused when a guest tries to write stuff there.
>>> When I use the built-in ones (which all use struct vcpu *v = current;) I get a
>>> host reboot.
>>> Maybe because of a missing vpmu struct, as I notice that only one vcpu_id
>>> gets initialized in vpmu_initialise during boot.
>>> So when using the built-in vmx_msr_write_intercept the write ends up in
>>> vpmu_do_wrmsr at
>>>     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
>>>         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>>> and the host reboots.
>>> Maybe I need some special kind of initialization before I call
>>> vmx_msr_write_intercept?
>>> Even with
>>>     struct vcpu *current_v = current;
>>>     vpmu_initialise(current_v);
>>>     return_value = vmx_msr_write_intercept(MSR_IA32_DS_AREA,
>>>                                            ds_buffer_management_area);
>>> I get an instant host reboot at the above-mentioned
>>>     return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>> Right. Because you are trying to access VMCS from dom0 context. dom0
>> doesn't have VMCS as it is a PV guest.
>>
> I thought so, but isn’t the if clause
> if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> supposed to catch that?
Yes, it should. BTW, why are you explicitly calling vpmu_initialise()?
It should be called during guest VCPU initialization.
I'd need to see 'xl dmesg' output at time of reboot, possibly with
disassembly of code at RIP that presumably will be reported as causing
the reboot.
-boris
>
> Kevin
>
>> -boris
>>
>>>>>>> When I use a modified version of vmx_msr_write_intercept I don’t get
>>>>>>> any crashes as long as I don’t enable BTS and TR in the
>>>>>>> GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR) the
>>>>>>> guest crashes. I suppose it gets killed by the hypervisor for
>>>>>>> accessing forbidden memory.
>>>>>>>
>>>>>> Possibly because the DS area points to hypervisor memory.
>>>>>>
>>>>>>
>>>>>> Having said all this, I am not sure how well BTS works. You did notice
>>>>>> this in the hypervisor log:
>>>>>>
>>>>>> (XEN) ******************************************************
>>>>>> (XEN) ** WARNING: Emulation of BTS Feature is switched on **
>>>>>> (XEN) ** Using this processor feature in a virtualized **
>>>>>> (XEN) ** environment is not 100% safe. **
>>>>>> (XEN) ** Setting the DS buffer address with wrong values **
>>>>>> (XEN) ** may lead to hypervisor hangs or crashes. **
>>>>>> (XEN) ** It is NOT recommended for production use! **
>>>>>> (XEN) ******************************************************
>>>>> Yes, I saw that. It doesn’t state that BTS is not working at all, just that it is
>>>>> not that safe to use.
>>>>> As I understand it, as long as I set the DS buffer address correctly I should
>>>>> be fine, right?
>>>>
>>>> Right. Except that I am not convinced you did set this buffer correctly,
>>>> which is possibly why your hypervisor crashed (I am not sure I
>>>> understood under what circumstances though).
>>>>
>>>> -boris
>>> We are thinking very much alike. I also am not convinced I set the buffer
>>> correctly. ^^
>>> But since I get a reboot as soon as
>>>     return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
>>> gets called, I don’t think that the setup of the buffer is the problem (when
>>> using the original vmx_msr_write_intercept), but rather something with the
>>> setup of the vpmu.
>>> When I use my own vmx_msr_write_intercept with d->vcpu[0] instead
>>> of current, the write succeeds but the guest crashes/gets killed when
>>> BTS is enabled.
>>> So in this second case the setup of the buffer seems to be the problem.
>>>
>>> Kevin
>>>
>>>>> Since I don’t want to use it in production, that is fine with me. At least for
>>>>> now.
>>>>> Kevin
>>>>>> -boris
>>>>>>
>>>>>>
>>>>>>> The modified version of vmx_msr_write_intercept takes a vcpu struct as
>>>>>>> a parameter and uses this instead of the current vcpu.
>>>>>>>
>>>>>>> Instead of
>>>>>>>
>>>>>>> static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>>>>>> {
>>>>>>>
>>>>>>> struct vcpu *v = current;
>>>>>>>
>>>>>>> I just have
>>>>>>>
>>>>>>> static int own_vmx_msr_write_intercept(unsigned int msr, uint64_t
>>>>>>> msr_content, struct vcpu *v)
>>>>>>>
>>>>>>> I get this vcpu by d->vcpu[0] as I have limited my guest domain to one
>>>>>>> vcpu atm.
>>>>>>>
>>>>>>> Of course I also use similarly modified versions of the called
>>>>>>> functions (vpmu_do_wrmsr, …).
>>>>>>>
>>>>>>> I'm pretty sure that my problem is with a wrong scope/usage of the
>>>>>>> vcpus/memory, but I have no idea how to fix this.
>>>>>>>
>>>>>>> I can see a potential problem with the memory allocation (in the host)
>>>>>>> into which the cpu in guest-mode is supposed to write.
>>>>>>>
>>>>>>> Or maybe I got the principle of a vcpu/vpmu all wrong.
>>>>>>>
>>>>>>> Since I couldn’t find any project that uses the BTS for the guest, I
>>>>>>> am wondering if anyone has ever done this and if it is possible at all.
>>>>>>>
>>>>>>> Any input is welcome as I am pretty much stuck atm…
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>>
>>>>>>> ____________
>>>>>>> Virus checked by G Data MailSecurity
>>>>>>> Version: AVA 25.404 dated 24.02.2015
>>>>>>> Virus news: www.antiviruslab.com <http://www.antiviruslab.com>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@lists.xen.org
>>>>>>> http://lists.xen.org/xen-devel