From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH] Xen PV-on-HVM guest support (v2)
Date: Thu, 15 Oct 2009 16:09:36 +0900
Message-ID: <4AD6CAB0.6030301@redhat.com>
References: <1255585318.12773.14.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Jan Kiszka, Gerd Hoffmann
To: Ed Swierk
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:14181 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752187AbZJOHKH (ORCPT ); Thu, 15 Oct 2009 03:10:07 -0400
In-Reply-To: <1255585318.12773.14.camel@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 10/15/2009 02:41 PM, Ed Swierk wrote:
> Support for Xen PV-on-HVM guests can be implemented almost entirely in
> userspace, except for handling one annoying MSR that maps a Xen
> hypercall blob into guest address space.
>
> A generic mechanism to delegate MSR writes to userspace seems overkill
> and risks encouraging similar MSR abuse in the future. Thus this patch
> adds special support for the Xen HVM MSR.
>
> I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
> KVM which MSR the guest will write to, as well as the starting address
> and size of the hypercall blobs (one each for 32-bit and 64-bit) that
> userspace has loaded from files. When the guest writes to the MSR, KVM
> copies one page of the blob from userspace to the guest.
>
> I've tested this patch with a hacked-up version of Gerd's userspace
> code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
> FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
>
> v2: fix ioctl struct padding; renumber CAP and ioctl constants; check
> kvm_write_guest() return value; change printks to KERN_DEBUG (I think
> they're worth keeping for debugging userspace)
>
>
>
> +#ifdef KVM_CAP_XEN_HVM
> +struct kvm_xen_hvm_config {
> +	__u32 msr;
> +	__u8 pad[2];
> +	__u8 blob_size[2];
> +	__u64 blob_addr[2];
> +};
> +#endif
>

Please change the arrays to separate variables (e.g. blob_size_32,
blob_size_64), so readers don't have to guess the meaning.

Also, reserve a bunch of space at the end in case we need more hackery.

Is the msr number really variable? Isn't it an ABI?

>  * ioctls for vcpu fds
> Index: kvm-kmod/include/linux/kvm_host.h
> ===================================================================
> --- kvm-kmod.orig/include/linux/kvm_host.h
> +++ kvm-kmod/include/linux/kvm_host.h
> @@ -236,6 +236,10 @@ struct kvm {
>  	unsigned long mmu_notifier_seq;
>  	long mmu_notifier_count;
>  #endif
> +
> +#ifdef KVM_CAP_XEN_HVM
> +	struct kvm_xen_hvm_config xen_hvm_config;
> +#endif
>  };
>

struct kvm_arch is a better place for this.

>  /* The guest did something we don't support. */
> Index: kvm-kmod/x86/x86.c
> ===================================================================
> --- kvm-kmod.orig/x86/x86.c
> +++ kvm-kmod/x86/x86.c
> @@ -875,6 +875,35 @@ static int set_msr_mce(struct kvm_vcpu *
>  	return 0;
>  }
>
> +#ifdef KVM_CAP_XEN_HVM
>

No need for the ifdef - it will always be defined for x86.

> +static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	int blob = !!(vcpu->arch.shadow_efer & EFER_LME);
>

Can use is_long_mode() for this.
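
The two suggestions so far (named per-mode fields with reserved space, and a
boolean mode test in place of array indexing) can be illustrated with a
minimal userspace sketch. This is not the layout the patch has to adopt:
the struct name, the field order, the 40 reserved bytes, and the
select_blob() helper are all assumptions made up for illustration, and the
long_mode argument merely stands in for what is_long_mode(vcpu) would
return in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: one named field per blob instead of [2] arrays,
 * explicit padding, and reserved bytes at the end for later extensions.
 */
struct xen_hvm_config_sketch {
	uint32_t msr;
	uint8_t  blob_size_32;	/* size of the 32-bit blob, in pages */
	uint8_t  blob_size_64;	/* size of the 64-bit blob, in pages */
	uint8_t  pad[2];
	uint64_t blob_addr_32;	/* userspace address of the 32-bit blob */
	uint64_t blob_addr_64;	/* userspace address of the 64-bit blob */
	uint8_t  reserved[40];	/* assumed amount; must be zero for now */
};

/* Fail the build if the layout ever changes size by accident. */
static_assert(sizeof(struct xen_hvm_config_sketch) == 64,
	      "unexpected struct size");

/* Illustrative helper type: the blob chosen for the current guest mode. */
struct blob_ref {
	uint64_t addr;
	uint8_t  npages;
};

/* Pick the 64-bit blob when the guest is in long mode, else the 32-bit one. */
static struct blob_ref select_blob(const struct xen_hvm_config_sketch *cfg,
				   bool long_mode)
{
	struct blob_ref ref;

	ref.addr   = long_mode ? cfg->blob_addr_64 : cfg->blob_addr_32;
	ref.npages = long_mode ? cfg->blob_size_64 : cfg->blob_size_32;
	return ref;
}
```

With named fields the call site no longer needs to know that index 0 means
32-bit and index 1 means 64-bit, and the fixed 64-byte size leaves room to
grow without changing the ioctl number.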
> +	u32 pnum = data & ~PAGE_MASK;
> +	u64 paddr = data & PAGE_MASK;
> +	u8 *page;
> +	int r = 1;
> +
> +	if (pnum >= vcpu->kvm->xen_hvm_config.blob_size[blob])
> +		goto out;
> +	page = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!page)
> +		goto out;
> +	if (copy_from_user(page, (u8 *)vcpu->kvm->xen_hvm_config.blob_addr[blob]
> +			   + pnum * PAGE_SIZE, PAGE_SIZE))
> +		goto out_free;
>

We want to return -EFAULT here (but make sure the entire code path allows
this).

> +	if (kvm_write_guest(vcpu->kvm, paddr, page, PAGE_SIZE))
> +		goto out_free;
> +	printk(KERN_DEBUG "kvm: copied xen hvm blob %d page %d to 0x%llx\n",
> +	       blob, pnum, paddr);
> +	r = 0;
> +out_free:
> +	kfree(page);
> +out:
> +	return r;
> +}
> +#endif
> +
>  int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  {
>  	switch (msr) {
> @@ -990,6 +1019,10 @@ int kvm_set_msr_common(struct kvm_vcpu *
>  			"0x%x data 0x%llx\n", msr, data);
>  		break;
>  	default:
> +#ifdef KVM_CAP_XEN_HVM
> +		if (msr && (msr == vcpu->kvm->xen_hvm_config.msr))
> +			return xen_hvm_config(vcpu, data);
> +#endif
>

Again, can skip the ifdef.

>  		if (!ignore_msrs) {
>  			pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
>  				msr, data);
> @@ -2453,6 +2486,17 @@ long kvm_arch_vm_ioctl(struct file *filp
>  		r = 0;
>  		break;
>  	}
> +#ifdef KVM_CAP_XEN_HVM
> +	case KVM_XEN_HVM_CONFIG: {
> +		r = -EFAULT;
> +		if (copy_from_user(&kvm->xen_hvm_config, argp,
> +				   sizeof(struct kvm_xen_hvm_config)))
> +			goto out;
> +		printk(KERN_DEBUG "kvm: configured xen hvm\n");
> +		r = 0;
> +		break;
> +	}
> +#endif
>  	default:
>  		;
>  	}
>

Do we need support for reading the msr?

IMO you can drop the debugging printk()s. I don't see how they add much
value.

Please submit the patch against a current kernel tree, not kvm-kmod.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.