public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: "Kay, Allen M" <allen.m.kay@intel.com>
Cc: kvm-devel@lists.sourceforge.net, Chris Wright <chrisw@redhat.com>,
	Avi Kivity <avik@qumranet.com>,
	Andrea Arcangeli <andrea@qumranet.com>,
	Ben-Ami Yassour1 <benami@il.ibm.com>
Subject: Re: [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
Date: Tue, 06 May 2008 19:31:01 -0500	[thread overview]
Message-ID: <4820F845.3080704@codemonkey.ws> (raw)
In-Reply-To: <1FE6DD409037234FAB833C420AA843EC01438CB3@orsmsx424.amr.corp.intel.com>

Kay, Allen M wrote:
> Kvm kernel changes.
>
> Signed-off-by: Allen M Kay <allen.m.kay@intel.com>
>
> ------
>  arch/x86/kvm/Makefile      |    2 
>  arch/x86/kvm/vtd.c         |  183
> +++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c         |    7 +
>  include/asm-x86/kvm_host.h |    3 
>  include/asm-x86/kvm_para.h |    1 
>  include/linux/kvm_host.h   |    6 +
>  virt/kvm/kvm_main.c        |    3 
>  7 files changed, 204 insertions(+), 1 deletion(-)
>
> ------
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index c97d35c..b1057fb 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o
> lapic.o \
>  	i8254.o
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o vtd.o
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
>  kvm-amd-objs = svm.o
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> new file mode 100644
> index 0000000..9a080b5
> --- /dev/null
> +++ b/arch/x86/kvm/vtd.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2006, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * Copyright (C) 2006-2008 Intel Corporation
> + * Author: Allen M. Kay <allen.m.kay@intel.com>
> + * Author: Weidong Han <weidong.han@intel.com>
> + */
> +
> +#include <linux/list.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pci.h>
> +#include <linux/dmar.h>
> +#include <linux/intel-iommu.h>
> +
> +//#define DEBUG
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev
> *dev);
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
> +void iommu_free_domain(struct dmar_domain *domain);
> +int domain_init(struct dmar_domain *domain, int guest_width);
> +int domain_context_mapping(struct dmar_domain *d,
> +			struct pci_dev *pdev);
> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
> +			u64 hpa, size_t size, int prot);
> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8
> devfn);
> +struct dmar_domain * find_domain(struct pci_dev *pdev);
>   

These definitely need to be moved to a common header.

> +
> +int kvm_iommu_map_pages(struct kvm *kvm,
> +	gfn_t base_gfn, unsigned long npages)
> +{
> +	unsigned long gpa;
> +	struct page *page;
> +	hpa_t hpa;
> +	int j, write;
> +	struct vm_area_struct *vma;
> +
> +	if (!kvm->arch.domain)
> +		return 1;
>   

In the kernel, we should be using -errno to return error codes.

> +	gpa = base_gfn << PAGE_SHIFT;
> +	page = gfn_to_page(kvm, base_gfn);
> +	hpa = page_to_phys(page);
>   

Please use gfn_to_pfn().  Keep in mind, by using gfn_to_page/gfn_to_pfn, 
you take a reference to a page.  You're leaking that reference here.

> +	printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n", gpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %llx\n", hpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
> +		npages*PAGE_SIZE);
> +
> +	for (j = 0; j < npages; j++) {
> +		gpa +=  PAGE_SIZE;
> +		page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> +		hpa = page_to_phys(page);
>   

Again, gfn_to_pfn() and you're taking a reference that I never see you 
releasing.

> +		domain_page_mapping(kvm->arch.domain, gpa, hpa,
> PAGE_SIZE,
> +					DMA_PTE_READ | DMA_PTE_WRITE);
> +		vma = find_vma(current->mm, gpa);
> +		if (!vma)
> +			return 1;
> +		write = (vma->vm_flags & VM_WRITE) != 0;
> +		get_user_pages(current, current->mm, gpa,
> +				PAGE_SIZE, write, 0, NULL, NULL);
>   

I don't quite see what you're doing here.  It looks like you're trying 
to pre-fault the page in?  gfn_to_pfn will do that for you.  You're 
taking a bunch of references here that are never getting released.

I think the general approach here is a bit faulty.  I think what we want 
to do is mlock() from userspace to ensure all the memory is present for 
the guest.  We should combine this with MMU-notifiers such that whenever 
the userspace mapping changes, we can reprogram the IOMMU.  In the case 
where we don't have MMU-notifiers, we simply hold on to the memory 
forever and never program the IOMMU.  The initial mlock() in userspace 
is somewhat of a nop here but it's important with MMU-notifiers because 
we will no longer be holding a reference for guest memory.

We have to ensure we don't swap KVM guest memory while using hardware 
pass-through, but AFAICT, we do not need to make the memory 
non-reclaimable  As long as we reprogram the IOMMU with a new, valid, 
mapping everything should be fine.  mlock() really gives us the right 
semantics.

Semantically, a PV API that supports DMA window registration simply 
mlock()s the DMA regions on behalf of the guest.  No special logic 
should be needed.

Regards,

Anthony Liguori

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone

  parent reply	other threads:[~2008-05-07  0:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-05 21:36 [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch Kay, Allen M
2008-05-06  7:37 ` Amit Shah
2008-05-06 22:01   ` Kay, Allen M
2008-05-06 10:34 ` Avi Kivity
2008-05-07  0:31 ` Anthony Liguori [this message]
2008-05-07  0:47   ` Kay, Allen M
2008-05-07  1:56     ` Anthony Liguori
2008-05-07  5:51       ` Avi Kivity
2008-05-07 19:22         ` Anthony Liguori
2008-05-11 15:31           ` Avi Kivity
2008-05-11  8:53 ` Muli Ben-Yehuda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4820F845.3080704@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=allen.m.kay@intel.com \
    --cc=andrea@qumranet.com \
    --cc=avik@qumranet.com \
    --cc=benami@il.ibm.com \
    --cc=chrisw@redhat.com \
    --cc=kvm-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox