All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: "Kay, Allen M" <allen.m.kay@intel.com>
Cc: kvm-devel@lists.sourceforge.net, Chris Wright <chrisw@redhat.com>,
	Avi Kivity <avik@qumranet.com>,
	Andrea Arcangeli <andrea@qumranet.com>,
	Ben-Ami Yassour1 <benami@il.ibm.com>
Subject: Re: [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch
Date: Tue, 06 May 2008 19:31:01 -0500	[thread overview]
Message-ID: <4820F845.3080704@codemonkey.ws> (raw)
In-Reply-To: <1FE6DD409037234FAB833C420AA843EC01438CB3@orsmsx424.amr.corp.intel.com>

Kay, Allen M wrote:
> Kvm kernel changes.
>
> Signed-off-by: Allen M Kay <allen.m.kay@intel.com>
>
> ------
>  arch/x86/kvm/Makefile      |    2 
>  arch/x86/kvm/vtd.c         |  183
> +++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c         |    7 +
>  include/asm-x86/kvm_host.h |    3 
>  include/asm-x86/kvm_para.h |    1 
>  include/linux/kvm_host.h   |    6 +
>  virt/kvm/kvm_main.c        |    3 
>  7 files changed, 204 insertions(+), 1 deletion(-)
>
> ------
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index c97d35c..b1057fb 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o
> lapic.o \
>  	i8254.o
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o vtd.o
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
>  kvm-amd-objs = svm.o
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> new file mode 100644
> index 0000000..9a080b5
> --- /dev/null
> +++ b/arch/x86/kvm/vtd.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2006, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * Copyright (C) 2006-2008 Intel Corporation
> + * Author: Allen M. Kay <allen.m.kay@intel.com>
> + * Author: Weidong Han <weidong.han@intel.com>
> + */
> +
> +#include <linux/list.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pci.h>
> +#include <linux/dmar.h>
> +#include <linux/intel-iommu.h>
> +
> +//#define DEBUG
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev
> *dev);
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
> +void iommu_free_domain(struct dmar_domain *domain);
> +int domain_init(struct dmar_domain *domain, int guest_width);
> +int domain_context_mapping(struct dmar_domain *d,
> +			struct pci_dev *pdev);
> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
> +			u64 hpa, size_t size, int prot);
> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8
> devfn);
> +struct dmar_domain * find_domain(struct pci_dev *pdev);
>   

These definitely need to be moved to a common header.

> +
> +int kvm_iommu_map_pages(struct kvm *kvm,
> +	gfn_t base_gfn, unsigned long npages)
> +{
> +	unsigned long gpa;
> +	struct page *page;
> +	hpa_t hpa;
> +	int j, write;
> +	struct vm_area_struct *vma;
> +
> +	if (!kvm->arch.domain)
> +		return 1;
>   

In the kernel, we should be using -errno to return error codes.

> +	gpa = base_gfn << PAGE_SHIFT;
> +	page = gfn_to_page(kvm, base_gfn);
> +	hpa = page_to_phys(page);
>   

Please use gfn_to_pfn().  Keep in mind, by using gfn_to_page/gfn_to_pfn, 
you take a reference to a page.  You're leaking that reference here.

> +	printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n", gpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %llx\n", hpa);
> +	printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
> +		npages*PAGE_SIZE);
> +
> +	for (j = 0; j < npages; j++) {
> +		gpa +=  PAGE_SIZE;
> +		page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> +		hpa = page_to_phys(page);
>   

Again, gfn_to_pfn() and you're taking a reference that I never see you 
releasing.

> +		domain_page_mapping(kvm->arch.domain, gpa, hpa,
> PAGE_SIZE,
> +					DMA_PTE_READ | DMA_PTE_WRITE);
> +		vma = find_vma(current->mm, gpa);
> +		if (!vma)
> +			return 1;
> +		write = (vma->vm_flags & VM_WRITE) != 0;
> +		get_user_pages(current, current->mm, gpa,
> +				PAGE_SIZE, write, 0, NULL, NULL);
>   

I don't quite see what you're doing here.  It looks like you're trying 
to pre-fault the page in?  gfn_to_pfn will do that for you.  You're 
taking a bunch of references here that are never getting released.

I think the general approach here is a bit faulty.  I think what we want 
to do is mlock() from userspace to ensure all the memory is present for 
the guest.  We should combine this with MMU-notifiers such that whenever 
the userspace mapping changes, we can reprogram the IOMMU.  In the case 
where we don't have MMU-notifiers, we simply hold on to the memory 
forever and never program the IOMMU.  The initial mlock() in userspace 
is somewhat of a nop here but it's important with MMU-notifiers because 
we will no longer be holding a reference for guest memory.

We have to ensure we don't swap KVM guest memory while using hardware 
pass-through, but AFAICT, we do not need to make the memory 
non-reclaimable  As long as we reprogram the IOMMU with a new, valid, 
mapping everything should be fine.  mlock() really gives us the right 
semantics.

Semantically, a PV API that supports DMA window registration simply 
mlock()s the DMA regions on behalf of the guest.  No special logic 
should be needed.

Regards,

Anthony Liguori

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone

  parent reply	other threads:[~2008-05-07  0:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-05 21:36 [RFC] [VTD][patch 1/3] vt-d support for pci passthrough: kvm-vtd--kernel.patch Kay, Allen M
2008-05-06  7:37 ` Amit Shah
2008-05-06 22:01   ` Kay, Allen M
2008-05-06 10:34 ` Avi Kivity
2008-05-07  0:31 ` Anthony Liguori [this message]
2008-05-07  0:47   ` Kay, Allen M
2008-05-07  1:56     ` Anthony Liguori
2008-05-07  5:51       ` Avi Kivity
2008-05-07 19:22         ` Anthony Liguori
2008-05-11 15:31           ` Avi Kivity
2008-05-11  8:53 ` Muli Ben-Yehuda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4820F845.3080704@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=allen.m.kay@intel.com \
    --cc=andrea@qumranet.com \
    --cc=avik@qumranet.com \
    --cc=benami@il.ibm.com \
    --cc=chrisw@redhat.com \
    --cc=kvm-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.