From mboxrd@z Thu Jan 1 00:00:00 1970 From: Don Slutz Subject: Re: [PATCH for-4.5 v6 04/16] xen: Add vmware_port support Date: Wed, 24 Sep 2014 12:48:31 -0400 Message-ID: <5422F5DF.5070902@terremark.com> References: <1411236447-7435-1-git-send-email-dslutz@verizon.com> <1411236447-7435-5-git-send-email-dslutz@verizon.com> <5422EAEB.2050807@eu.citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <5422EAEB.2050807@eu.citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: George Dunlap , Don Slutz , xen-devel@lists.xen.org Cc: Kevin Tian , Keir Fraser , Ian Campbell , Stefano Stabellini , Jun Nakajima , Eddie Dong , Ian Jackson , Tim Deegan , Aravind Gopalakrishnan , Jan Beulich , Andrew Cooper , Boris Ostrovsky , Suravee Suthikulpanit List-Id: xen-devel@lists.xenproject.org On 09/24/14 12:01, George Dunlap wrote: > On 09/20/2014 07:07 PM, Don Slutz wrote: >> This includes adding is_vmware_port_enabled >> >> This is a new domain_create() flag, DOMCRF_vmware_port. It is >> passed to domctl as XEN_DOMCTL_CDF_vmware_port. >> >> This enables limited support of VMware's hyper-call. >> >> This is both a more complete support then in currently provided by >> QEMU and/or KVM and less. The missing part requires QEMU changes >> and has been left out until the QEMU patches are accepted upstream. >> >> VMware's hyper-call is also known as VMware Backdoor I/O Port. >> >> Note: this support does not depend on vmware_hw being non-zero. >> >> Summary is that VMware treats "in (%dx),%eax" (or "out %eax,(%dx)") >> to port 0x5658 specially. Note: since many operations return data >> in EAX, "in (%dx),%eax" is the one to use. The other lengths like >> "in (%dx),%al" will still do things, only AL part of EAX will be >> changed. For "out %eax,(%dx)" of all lengths, EAX will remain >> unchanged. >> >> Also this instruction is allowed to be used from ring 3. To >> support this the vmexit for GP needs to be enabled. I have not >> fully tested that nested HVM is doing the right thing for this. >> >> An open source example of using this is: >> >> http://open-vm-tools.sourceforge.net/ >> >> Which only uses "inl (%dx)". Also >> >> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458 >> >> >> The support included is enough to allow VMware tools to install in a >> HVM domU. >> >> For a debug=y build there is a new command line option >> vmport_debug=. It enabled output to the console of various >> stages of handling the "in (%dx)" instruction. >> >> Signed-off-by: Don Slutz > > [snip] > >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c >> index 7b1dfe6..e2e4aad 100644 >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -510,6 +510,8 @@ int arch_domain_create(struct domain *d, unsigned >> int domcr_flags) >> d->arch.hvm_domain.mem_sharing_enabled = 0; >> d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity); >> + d->arch.hvm_domain.is_vmware_port_enabled = >> + (domcr_flags & DOMCRF_vmware_port); > > Should this be "!!(domcr..."? > I do not think it is needed, but happy to change to that. >> diff --git a/xen/arch/x86/hvm/vmware/vmport.c >> b/xen/arch/x86/hvm/vmware/vmport.c >> new file mode 100644 >> index 0000000..811c303 >> --- /dev/null >> +++ b/xen/arch/x86/hvm/vmware/vmport.c >> @@ -0,0 +1,326 @@ >> +/* >> + * HVM VMPORT emulation >> + * >> + * Copyright (C) 2012 Verizon Corporation >> + * >> + * This file is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License Version 2 (GPLv2) >> + * as published by the Free Software Foundation. >> + * >> + * This file is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + * General Public License for more details. >> . >> + */ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#include "backdoor_def.h" >> +#include "guest_msg_def.h" >> + >> +#define MAX_INST_LEN 15 >> + >> +#ifndef NDEBUG >> +unsigned int opt_vmport_debug __read_mostly; >> +integer_param("vmport_debug", opt_vmport_debug); >> +#endif >> + >> +/* More VMware defines */ >> + >> +#define VMWARE_GUI_AUTO_GRAB 0x001 >> +#define VMWARE_GUI_AUTO_UNGRAB 0x002 >> +#define VMWARE_GUI_AUTO_SCROLL 0x004 >> +#define VMWARE_GUI_AUTO_RAISE 0x008 >> +#define VMWARE_GUI_EXCHANGE_SELECTIONS 0x010 >> +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB 0x020 >> +#define VMWARE_GUI_FULL_SCREEN 0x040 >> + >> +#define VMWARE_GUI_TO_FULL_SCREEN 0x080 >> +#define VMWARE_GUI_TO_WINDOW 0x100 >> + >> +#define VMWARE_GUI_AUTO_RAISE_DISABLED 0x200 >> + >> +#define VMWARE_GUI_SYNC_TIME 0x400 >> + >> +/* When set, toolboxes should not show the cursor options page. */ >> +#define VMWARE_DISABLE_CURSOR_OPTIONS 0x800 >> + >> +void vmport_register(struct domain *d) >> +{ >> + register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport); >> +} >> + >> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t >> *val) >> +{ >> + struct cpu_user_regs *regs = guest_cpu_user_regs(); >> + uint32_t cmd = regs->rcx & 0xffff; >> + uint32_t magic = regs->rax; >> + int rc = X86EMUL_OKAY; >> + >> + if ( magic == BDOOR_MAGIC ) >> + { >> + uint64_t saved_rax = regs->rax; >> + uint64_t value; >> + >> + VMPORT_DBG_LOG(VMPORT_LOG_TRACE, >> + "VMware trace dir=%d bytes=%u ip=%"PRIx64" >> cmd=%d ax=%" >> + PRIx64" bx=%"PRIx64" cx=%"PRIx64" >> dx=%"PRIx64" si=%" >> + PRIx64" di=%"PRIx64"\n", dir, bytes, >> + regs->rip, cmd, regs->rax, regs->rbx, regs->rcx, >> + regs->rdx, regs->rsi, regs->rdi); >> + switch ( cmd ) >> + { >> + case BDOOR_CMD_GETMHZ: >> + /* ... */ >> + regs->rbx = BDOOR_MAGIC; >> + regs->rax = current->domain->arch.tsc_khz / 1000; >> + break; >> + case BDOOR_CMD_GETVERSION: >> + /* ... */ >> + regs->rbx = BDOOR_MAGIC; >> + /* VERSION_MAGIC */ >> + regs->rax = 6; >> + /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */ >> + regs->rcx = 2; >> + break; >> + case BDOOR_CMD_GETSCREENSIZE: >> + /* We have no screen size */ >> + regs->rax = 0; >> + break; >> + case BDOOR_CMD_GETHWVERSION: >> + /* vmware_hw */ >> + regs->rax = 0; >> + if ( is_hvm_vcpu(current) ) >> + { >> + struct hvm_domain *hd = >> ¤t->domain->arch.hvm_domain; >> + >> + regs->rax = hd->params[HVM_PARAM_VMWARE_HW]; >> + } >> + if ( !regs->rax ) >> + regs->rax = 4; /* Act like version 4 */ >> + break; >> + case BDOOR_CMD_GETHZ: >> + value = current->domain->arch.tsc_khz * 1000; >> + /* apic-frequency (bus speed) */ >> + regs->rcx = (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS); >> + /* High part of tsc-frequency */ >> + regs->rbx = (uint32_t)(value >> 32); >> + /* Low part of tsc-frequency */ >> + regs->rax = value; > > Either the comment or the code here is wrong -- this is clearly not > the lower 32 bits, at least on 64-bit guests. :-) > Opps, it should have included the (uint32_t) cast also. Will fix. > If the code is right -- that is, if a 32-bit guest find this truncated > automatically, but a 64-bit guest find all 64 bits here (and thus not > have to reconstruct it) -- you should make the comment more > informative; for example: > /* On 32-bit systems this will be the lower 32 bits. 64-bit systems > can just use the full value from rax. */ > > (Word-wrapped, of course.) > > Hmm -- looks like regs->rax will be clipped to 32 bits for a 4-byte IO > read? In which case the comment here should reflect this, but you > have the same basic issue for BDOOR_CMD_GETTIMEFUL regs->rdx (which > will not be clipped, I don't think). > It will also be "adjusted" for 2 or 1 byte IO. rdx does not get clipped later, but is clipped to 32bits (see below). >> + break; >> + case BDOOR_CMD_GETTIME: >> + value = get_localtime_us(current->domain); >> + /* hostUsecs */ >> + regs->rbx = (uint32_t)(value % 1000000UL); >> + /* hostSecs */ >> + regs->rax = value / 1000000ULL; >> + /* maxTimeLag */ >> + regs->rcx = 0; >> + break; >> + case BDOOR_CMD_GETTIMEFULL: >> + value = get_localtime_us(current->domain); >> + /* ... */ >> + regs->rax = BDOOR_MAGIC; >> + /* hostUsecs */ >> + regs->rbx = (uint32_t)(value % 1000000UL); >> + /* High part of hostSecs */ >> + regs->rsi = (uint32_t)((value / 1000000ULL) >> 32); >> + /* Low part of hostSecs */ >> + regs->rdx = (uint32_t)(value / 1000000ULL); > > Same here. > But the (uint32_t) does make it just 32bits. >> + /* maxTimeLag */ >> + regs->rcx = 0; >> + break; >> + case BDOOR_CMD_GETGUIOPTIONS: >> + regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB | >> + VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME | >> + VMWARE_DISABLE_CURSOR_OPTIONS; >> + break; >> + case BDOOR_CMD_SETGUIOPTIONS: >> + regs->rax = 0x0; >> + break; >> + default: >> + VMPORT_DBG_LOG(VMPORT_LOG_ERROR, >> + "VMware bytes=%d dir=%d cmd=%d", >> + bytes, dir, cmd); >> + break; >> + } >> + VMPORT_DBG_LOG(VMPORT_LOG_VMWARE_AFTER, >> + "VMware after ip=%"PRIx64" cmd=%d >> ax=%"PRIx64" bx=%" >> + PRIx64" cx=%"PRIx64" dx=%"PRIx64" >> si=%"PRIx64" di=%" >> + PRIx64"\n", >> + regs->rip, cmd, regs->rax, regs->rbx, regs->rcx, >> + regs->rdx, regs->rsi, regs->rdi); >> + if ( dir == IOREQ_READ ) >> + { >> + switch ( bytes ) >> + { >> + case 1: >> + regs->rax = (saved_rax & 0xffffff00) | (regs->rax & >> 0xff); >> + break; >> + case 2: >> + regs->rax = (saved_rax & 0xffff0000) | (regs->rax & >> 0xffff); >> + break; >> + case 4: >> + regs->rax = (uint32_t)regs->rax; >> + break; >> + } >> + *val = regs->rax; >> + } >> + else >> + regs->rax = saved_rax; >> + } >> + else >> + { >> + rc = X86EMUL_UNHANDLEABLE; >> + VMPORT_DBG_LOG(VMPORT_LOG_ERROR, >> + "Not VMware %x vs %x; ip=%"PRIx64" ax=%"PRIx64 >> + " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" >> si=%"PRIx64 >> + " di=%"PRIx64"", >> + magic, BDOOR_MAGIC, regs->rip, regs->rax, >> regs->rbx, >> + regs->rcx, regs->rdx, regs->rsi, regs->rdi); >> + } >> + >> + return rc; >> +} >> + >> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v, >> + unsigned long *inst_len, unsigned long inst_addr, >> + unsigned long ei1, unsigned long ei2) > > I've wondered, in another e-mail, whether it would make more sense to > try to re-use the normal Xen emulation code, instead of duplicating > its IO instruction decoding stuff here. > Since this is only called on a #GP, I do not what to attempt to emulate the instruction by the normal way. In fact the normal Xen emulation should say "do a #GP", not do the VMware hypercall. I will say that I had not looked into getting the normal Xen emulation "fixed" for this case. In all my testing, I have not see this issue. With the patch: Subject: [Xen-devel] [PATCH 5/6] x86/hvm: Forced Emulation Prefix for debug builds of Xen Message-ID: <1411484611-31027-6-git-send-email-andrew.cooper3@citrix.com> I need to look into this. > I think I probably wouldn't make that a blocker for acceptance, > though. However... > Thanks. >> +{ >> + ASSERT(v->domain->arch.hvm_domain.is_vmware_port_enabled); > > At the moment I think this ASSERT is misplaced; there are no checks > for this anywhere in the handler path to this point. If at any time > in the future (or for any other reason) #GP exiting gets enabled (for > example, if the introspection stuff wants to be notifed on #GPs), > you'll end up going through this path whether or not > is_vmware_port_enabled is true. > > I think you should instead "if(!is_vmware_port_enabled) return" here. > That would effectively isolate these new changes from being able to > introduce security issues for VMs which don't enable vmware_port, > making it less risky to accept as-is. > Ok, it was a return in an older version. Happy to switch back. > [snip] > >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c >> index fc1f882..6fe9389 100644 >> --- a/xen/arch/x86/hvm/vmx/vmcs.c >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c >> @@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v) >> v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK >> | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault)) >> + | (v->domain->arch.hvm_domain.is_vmware_port_enabled ? >> + (1U << TRAP_gp_fault) : 0) >> | (1U << TRAP_no_device); > > Probably not mandatory, but it might be nice to pull this bitmap logic > into a function, so that you don't have the code duplication (here and > in vmx_update_guest_cr()). > Ok, Will keep it in mind. -Don Slutz > -George