From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH] Xen PV-on-HVM guest support (v2)
Date: Thu, 15 Oct 2009 16:09:36 +0900
Message-ID: <4AD6CAB0.6030301@redhat.com>
References: <1255585318.12773.14.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Jan Kiszka <jan.kiszka@web.de>,
	Gerd Hoffmann <kraxel@redhat.com>
To: Ed Swierk <eswierk@aristanetworks.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:14181 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752187AbZJOHKH (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 15 Oct 2009 03:10:07 -0400
In-Reply-To: <1255585318.12773.14.camel@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 10/15/2009 02:41 PM, Ed Swierk wrote:
> Support for Xen PV-on-HVM guests can be implemented almost entirely in
> userspace, except for handling one annoying MSR that maps a Xen
> hypercall blob into guest address space.
>
> A generic mechanism to delegate MSR writes to userspace seems overkill
> and risks encouraging similar MSR abuse in the future.  Thus this patch
> adds special support for the Xen HVM MSR.
>
> I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
> KVM which MSR the guest will write to, as well as the starting address
> and size of the hypercall blobs (one each for 32-bit and 64-bit) that
> userspace has loaded from files.  When the guest writes to the MSR, KVM
> copies one page of the blob from userspace to the guest.
>
> I've tested this patch with a hacked-up version of Gerd's userspace
> code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
> FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
>
> v2: fix ioctl struct padding; renumber CAP and ioctl constants; check
> kvm_write_guest() return value; change printks to KERN_DEBUG (I think
> they're worth keeping for debugging userspace)
>
>
>
> +#ifdef KVM_CAP_XEN_HVM
> +struct kvm_xen_hvm_config {
> +	__u32 msr;
> +	__u8 pad[2];
> +	__u8 blob_size[2];
> +	__u64 blob_addr[2];
> +};
> +#endif
>    

Please change the arrays to separate variables (e.g. blob_size_32, 
blob_size_64), so readers don't have to guess the meaning.

Also, reserve a bunch of space at the end in case we need more hackery.
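
Something along these lines, as an untested sketch.  The field names, the
field order and the amount of reserved space are only suggestions, not
anything from the posted patch, and fixed-width userspace types stand in
for __u8/__u32/__u64 so that it compiles outside the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: names, ordering and the size of the reserved area are
 * assumptions for illustration.
 */
struct kvm_xen_hvm_config {
	uint32_t msr;
	uint8_t  blob_size_32;	/* pages in the 32-bit hypercall blob */
	uint8_t  blob_size_64;	/* pages in the 64-bit hypercall blob */
	uint8_t  pad1[2];	/* keeps the 64-bit fields 8-byte aligned */
	uint64_t blob_addr_32;	/* userspace address of the 32-bit blob */
	uint64_t blob_addr_64;	/* userspace address of the 64-bit blob */
	uint8_t  reserved[32];	/* room for later extensions */
};

/* Explicit padding means 32-bit and 64-bit userspace see the same layout. */
static_assert(offsetof(struct kvm_xen_hvm_config, blob_addr_32) == 8,
	      "blob_addr_32 must sit at offset 8");
static_assert(sizeof(struct kvm_xen_hvm_config) == 56,
	      "unexpected struct size");
```

The compile-time checks are there so that a layout change which would
break the ioctl ABI fails the build instead of going unnoticed.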

Is the msr number really variable?  Isn't it part of the ABI?

>    * ioctls for vcpu fds
> Index: kvm-kmod/include/linux/kvm_host.h
> ===================================================================
> --- kvm-kmod.orig/include/linux/kvm_host.h
> +++ kvm-kmod/include/linux/kvm_host.h
> @@ -236,6 +236,10 @@ struct kvm {
>   	unsigned long mmu_notifier_seq;
>   	long mmu_notifier_count;
>   #endif
> +
> +#ifdef KVM_CAP_XEN_HVM
> +	struct kvm_xen_hvm_config xen_hvm_config;
> +#endif
>   };
>    

struct kvm_arch is a better place for this.

>   /* The guest did something we don't support. */
> Index: kvm-kmod/x86/x86.c
> ===================================================================
> --- kvm-kmod.orig/x86/x86.c
> +++ kvm-kmod/x86/x86.c
> @@ -875,6 +875,35 @@ static int set_msr_mce(struct kvm_vcpu *
>   	return 0;
>   }
>
> +#ifdef KVM_CAP_XEN_HVM
>    

No need for the ifdef - it will always be defined for x86.

> +static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	int blob = !!(vcpu->arch.shadow_efer & EFER_LME);
>    

Can use is_long_mode() for this.
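
Note the two tests are not quite the same thing.  As far as I recall,
is_long_mode() looks at EFER.LMA (long mode active), while the patch looks
at EFER.LME (long mode enabled).  A small userspace model of the
difference follows; struct vcpu_model and both helper names are made up
for the illustration and are not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ULL << 8)	/* long mode enable, set by the guest OS */
#define EFER_LMA (1ULL << 10)	/* long mode active, set once paging is on */

/* Hypothetical stand-in for the one piece of vcpu state that matters. */
struct vcpu_model {
	uint64_t shadow_efer;
};

/* What the posted patch tests: long mode merely enabled. */
static bool lme_set(const struct vcpu_model *vcpu)
{
	return (vcpu->shadow_efer & EFER_LME) != 0;
}

/* Models the is_long_mode() style check: long mode actually active. */
static bool long_mode_active(const struct vcpu_model *vcpu)
{
	return (vcpu->shadow_efer & EFER_LMA) != 0;
}
```

A guest that has set LME but not yet enabled paging is still running
32-bit code, so in that window only the LMA-based test selects the 32-bit
blob.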

> +	u32 pnum = data & ~PAGE_MASK;
> +	u64 paddr = data & PAGE_MASK;
> +	u8 *page;
> +	int r = 1;
> +
> +	if (pnum >= vcpu->kvm->xen_hvm_config.blob_size[blob])
> +		goto out;
> +	page = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!page)
> +		goto out;
> +	if (copy_from_user(page, (u8 *)vcpu->kvm->xen_hvm_config.blob_addr[blob]
> +			   + pnum * PAGE_SIZE, PAGE_SIZE))
> +		goto out_free;
>    

We want to return -EFAULT here (but make sure the entire code path 
allows this).
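
To illustrate the distinction, here is a rough userspace model of the
handler in which a host-side copy failure returns -EFAULT and guest
errors keep returning 1 as in the patch.  Everything in it (struct
blob_model, xen_hvm_config_model(), using a NULL pointer to stand for an
unreadable userspace address, and memcpy() in place of copy_from_user()
and kvm_write_guest()) is an assumption made for the sketch:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(uint64_t)(PAGE_SIZE - 1))

/* Hypothetical model of one hypercall blob; not the kernel structures. */
struct blob_model {
	const uint8_t *addr;	/* NULL models an unreadable userspace address */
	uint8_t npages;
};

/*
 * Returns 0 on success, 1 for a guest-visible failure (as r = 1 does in
 * the posted patch), or -EFAULT when the host-side copy fails.
 */
static int xen_hvm_config_model(const struct blob_model *blob, uint64_t data,
				uint8_t *guest_mem, size_t guest_size)
{
	uint32_t pnum = data & ~PAGE_MASK;	/* low 12 bits: page index in blob */
	uint64_t paddr = data & PAGE_MASK;	/* remaining bits: guest physical address */

	if (pnum >= blob->npages)
		return 1;
	if (!blob->addr)
		return -EFAULT;		/* models copy_from_user() failing */
	if (paddr > guest_size || guest_size - paddr < PAGE_SIZE)
		return 1;		/* models kvm_write_guest() failing */
	memcpy(guest_mem + paddr, blob->addr + (size_t)pnum * PAGE_SIZE,
	       PAGE_SIZE);
	return 0;
}
```

The caller then has to tell a negative errno apart from the positive
"guest did something wrong" value, which is the part of the code path
that needs checking.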

> +	if (kvm_write_guest(vcpu->kvm, paddr, page, PAGE_SIZE))
> +		goto out_free;
> +	printk(KERN_DEBUG "kvm: copied xen hvm blob %d page %d to 0x%llx\n",
> +	       blob, pnum, paddr);
> +	r = 0;
> +out_free:
> +	kfree(page);
> +out:
> +	return r;
> +}
> +#endif
> +
>   int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>   {
>   	switch (msr) {
> @@ -990,6 +1019,10 @@ int kvm_set_msr_common(struct kvm_vcpu *
>   			"0x%x data 0x%llx\n", msr, data);
>   		break;
>   	default:
> +#ifdef KVM_CAP_XEN_HVM
> +		if (msr && (msr == vcpu->kvm->xen_hvm_config.msr))
> +			return xen_hvm_config(vcpu, data);
> +#endif
>    

Again, can skip the ifdef.

>   		if (!ignore_msrs) {
>   			pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
>   				msr, data);
> @@ -2453,6 +2486,17 @@ long kvm_arch_vm_ioctl(struct file *filp
>   		r = 0;
>   		break;
>   	}
> +#ifdef KVM_CAP_XEN_HVM
> +	case KVM_XEN_HVM_CONFIG: {
> +		r = -EFAULT;
> +		if (copy_from_user(&kvm->xen_hvm_config, argp,
> +				   sizeof(struct kvm_xen_hvm_config)))
> +			goto out;
> +		printk(KERN_DEBUG "kvm: configured xen hvm\n");
> +		r = 0;
> +		break;
> +	}
> +#endif
>   	default:
>   		;
>   	}
>    

Do we need support for reading the msr?

IMO you can drop the debugging printk()s.  I don't see how they add much 
value.

Please submit the patch against a current kernel tree, not kvm-kmod.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.