From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [PATCH for-4.5 v6 04/16] xen: Add vmware_port support Date: Wed, 24 Sep 2014 17:01:47 +0100 Message-ID: <5422EAEB.2050807@eu.citrix.com> References: <1411236447-7435-1-git-send-email-dslutz@verizon.com> <1411236447-7435-5-git-send-email-dslutz@verizon.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1411236447-7435-5-git-send-email-dslutz@verizon.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Don Slutz , xen-devel@lists.xen.org Cc: Kevin Tian , Keir Fraser , Ian Campbell , Stefano Stabellini , Jun Nakajima , Eddie Dong , Ian Jackson , Tim Deegan , Aravind Gopalakrishnan , Jan Beulich , Andrew Cooper , Boris Ostrovsky , Suravee Suthikulpanit List-Id: xen-devel@lists.xenproject.org On 09/20/2014 07:07 PM, Don Slutz wrote: > This includes adding is_vmware_port_enabled > > This is a new domain_create() flag, DOMCRF_vmware_port. It is > passed to domctl as XEN_DOMCTL_CDF_vmware_port. > > This enables limited support of VMware's hyper-call. > > This is both a more complete support then in currently provided by > QEMU and/or KVM and less. The missing part requires QEMU changes > and has been left out until the QEMU patches are accepted upstream. > > VMware's hyper-call is also known as VMware Backdoor I/O Port. > > Note: this support does not depend on vmware_hw being non-zero. > > Summary is that VMware treats "in (%dx),%eax" (or "out %eax,(%dx)") > to port 0x5658 specially. Note: since many operations return data > in EAX, "in (%dx),%eax" is the one to use. The other lengths like > "in (%dx),%al" will still do things, only AL part of EAX will be > changed. For "out %eax,(%dx)" of all lengths, EAX will remain > unchanged. > > Also this instruction is allowed to be used from ring 3. To > support this the vmexit for GP needs to be enabled. I have not > fully tested that nested HVM is doing the right thing for this. > > An open source example of using this is: > > http://open-vm-tools.sourceforge.net/ > > Which only uses "inl (%dx)". Also > > http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458 > > The support included is enough to allow VMware tools to install in a > HVM domU. > > For a debug=y build there is a new command line option > vmport_debug=. It enabled output to the console of various > stages of handling the "in (%dx)" instruction. > > Signed-off-by: Don Slutz [snip] > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index 7b1dfe6..e2e4aad 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -510,6 +510,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) > d->arch.hvm_domain.mem_sharing_enabled = 0; > > d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity); > + d->arch.hvm_domain.is_vmware_port_enabled = > + (domcr_flags & DOMCRF_vmware_port); Should this be "!!(domcr..."? > diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c > new file mode 100644 > index 0000000..811c303 > --- /dev/null > +++ b/xen/arch/x86/hvm/vmware/vmport.c > @@ -0,0 +1,326 @@ > +/* > + * HVM VMPORT emulation > + * > + * Copyright (C) 2012 Verizon Corporation > + * > + * This file is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License Version 2 (GPLv2) > + * as published by the Free Software Foundation. > + * > + * This file is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. . > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include "backdoor_def.h" > +#include "guest_msg_def.h" > + > +#define MAX_INST_LEN 15 > + > +#ifndef NDEBUG > +unsigned int opt_vmport_debug __read_mostly; > +integer_param("vmport_debug", opt_vmport_debug); > +#endif > + > +/* More VMware defines */ > + > +#define VMWARE_GUI_AUTO_GRAB 0x001 > +#define VMWARE_GUI_AUTO_UNGRAB 0x002 > +#define VMWARE_GUI_AUTO_SCROLL 0x004 > +#define VMWARE_GUI_AUTO_RAISE 0x008 > +#define VMWARE_GUI_EXCHANGE_SELECTIONS 0x010 > +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB 0x020 > +#define VMWARE_GUI_FULL_SCREEN 0x040 > + > +#define VMWARE_GUI_TO_FULL_SCREEN 0x080 > +#define VMWARE_GUI_TO_WINDOW 0x100 > + > +#define VMWARE_GUI_AUTO_RAISE_DISABLED 0x200 > + > +#define VMWARE_GUI_SYNC_TIME 0x400 > + > +/* When set, toolboxes should not show the cursor options page. */ > +#define VMWARE_DISABLE_CURSOR_OPTIONS 0x800 > + > +void vmport_register(struct domain *d) > +{ > + register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport); > +} > + > +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val) > +{ > + struct cpu_user_regs *regs = guest_cpu_user_regs(); > + uint32_t cmd = regs->rcx & 0xffff; > + uint32_t magic = regs->rax; > + int rc = X86EMUL_OKAY; > + > + if ( magic == BDOOR_MAGIC ) > + { > + uint64_t saved_rax = regs->rax; > + uint64_t value; > + > + VMPORT_DBG_LOG(VMPORT_LOG_TRACE, > + "VMware trace dir=%d bytes=%u ip=%"PRIx64" cmd=%d ax=%" > + PRIx64" bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%" > + PRIx64" di=%"PRIx64"\n", dir, bytes, > + regs->rip, cmd, regs->rax, regs->rbx, regs->rcx, > + regs->rdx, regs->rsi, regs->rdi); > + switch ( cmd ) > + { > + case BDOOR_CMD_GETMHZ: > + /* ... */ > + regs->rbx = BDOOR_MAGIC; > + regs->rax = current->domain->arch.tsc_khz / 1000; > + break; > + case BDOOR_CMD_GETVERSION: > + /* ... */ > + regs->rbx = BDOOR_MAGIC; > + /* VERSION_MAGIC */ > + regs->rax = 6; > + /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */ > + regs->rcx = 2; > + break; > + case BDOOR_CMD_GETSCREENSIZE: > + /* We have no screen size */ > + regs->rax = 0; > + break; > + case BDOOR_CMD_GETHWVERSION: > + /* vmware_hw */ > + regs->rax = 0; > + if ( is_hvm_vcpu(current) ) > + { > + struct hvm_domain *hd = ¤t->domain->arch.hvm_domain; > + > + regs->rax = hd->params[HVM_PARAM_VMWARE_HW]; > + } > + if ( !regs->rax ) > + regs->rax = 4; /* Act like version 4 */ > + break; > + case BDOOR_CMD_GETHZ: > + value = current->domain->arch.tsc_khz * 1000; > + /* apic-frequency (bus speed) */ > + regs->rcx = (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS); > + /* High part of tsc-frequency */ > + regs->rbx = (uint32_t)(value >> 32); > + /* Low part of tsc-frequency */ > + regs->rax = value; Either the comment or the code here is wrong -- this is clearly not the lower 32 bits, at least on 64-bit guests. :-) If the code is right -- that is, if a 32-bit guest find this truncated automatically, but a 64-bit guest find all 64 bits here (and thus not have to reconstruct it) -- you should make the comment more informative; for example: /* On 32-bit systems this will be the lower 32 bits. 64-bit systems can just use the full value from rax. */ (Word-wrapped, of course.) Hmm -- looks like regs->rax will be clipped to 32 bits for a 4-byte IO read? In which case the comment here should reflect this, but you have the same basic issue for BDOOR_CMD_GETTIMEFUL regs->rdx (which will not be clipped, I don't think). > + break; > + case BDOOR_CMD_GETTIME: > + value = get_localtime_us(current->domain); > + /* hostUsecs */ > + regs->rbx = (uint32_t)(value % 1000000UL); > + /* hostSecs */ > + regs->rax = value / 1000000ULL; > + /* maxTimeLag */ > + regs->rcx = 0; > + break; > + case BDOOR_CMD_GETTIMEFULL: > + value = get_localtime_us(current->domain); > + /* ... */ > + regs->rax = BDOOR_MAGIC; > + /* hostUsecs */ > + regs->rbx = (uint32_t)(value % 1000000UL); > + /* High part of hostSecs */ > + regs->rsi = (uint32_t)((value / 1000000ULL) >> 32); > + /* Low part of hostSecs */ > + regs->rdx = (uint32_t)(value / 1000000ULL); Same here. > + /* maxTimeLag */ > + regs->rcx = 0; > + break; > + case BDOOR_CMD_GETGUIOPTIONS: > + regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB | > + VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME | > + VMWARE_DISABLE_CURSOR_OPTIONS; > + break; > + case BDOOR_CMD_SETGUIOPTIONS: > + regs->rax = 0x0; > + break; > + default: > + VMPORT_DBG_LOG(VMPORT_LOG_ERROR, > + "VMware bytes=%d dir=%d cmd=%d", > + bytes, dir, cmd); > + break; > + } > + VMPORT_DBG_LOG(VMPORT_LOG_VMWARE_AFTER, > + "VMware after ip=%"PRIx64" cmd=%d ax=%"PRIx64" bx=%" > + PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64" di=%" > + PRIx64"\n", > + regs->rip, cmd, regs->rax, regs->rbx, regs->rcx, > + regs->rdx, regs->rsi, regs->rdi); > + if ( dir == IOREQ_READ ) > + { > + switch ( bytes ) > + { > + case 1: > + regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff); > + break; > + case 2: > + regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff); > + break; > + case 4: > + regs->rax = (uint32_t)regs->rax; > + break; > + } > + *val = regs->rax; > + } > + else > + regs->rax = saved_rax; > + } > + else > + { > + rc = X86EMUL_UNHANDLEABLE; > + VMPORT_DBG_LOG(VMPORT_LOG_ERROR, > + "Not VMware %x vs %x; ip=%"PRIx64" ax=%"PRIx64 > + " bx=%"PRIx64" cx=%"PRIx64" dx=%"PRIx64" si=%"PRIx64 > + " di=%"PRIx64"", > + magic, BDOOR_MAGIC, regs->rip, regs->rax, regs->rbx, > + regs->rcx, regs->rdx, regs->rsi, regs->rdi); > + } > + > + return rc; > +} > + > +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v, > + unsigned long *inst_len, unsigned long inst_addr, > + unsigned long ei1, unsigned long ei2) I've wondered, in another e-mail, whether it would make more sense to try to re-use the normal Xen emulation code, instead of duplicating its IO instruction decoding stuff here. I think I probably wouldn't make that a blocker for acceptance, though. However... > +{ > + ASSERT(v->domain->arch.hvm_domain.is_vmware_port_enabled); At the moment I think this ASSERT is misplaced; there are no checks for this anywhere in the handler path to this point. If at any time in the future (or for any other reason) #GP exiting gets enabled (for example, if the introspection stuff wants to be notifed on #GPs), you'll end up going through this path whether or not is_vmware_port_enabled is true. I think you should instead "if(!is_vmware_port_enabled) return" here. That would effectively isolate these new changes from being able to introduce security issues for VMs which don't enable vmware_port, making it less risky to accept as-is. [snip] > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index fc1f882..6fe9389 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v) > > v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK > | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault)) > + | (v->domain->arch.hvm_domain.is_vmware_port_enabled ? > + (1U << TRAP_gp_fault) : 0) > | (1U << TRAP_no_device); Probably not mandatory, but it might be nice to pull this bitmap logic into a function, so that you don't have the code duplication (here and in vmx_update_guest_cr()). -George