public inbox for kvm@vger.kernel.org
* PCI passthrough with VT-d - native performance
@ 2008-07-16 15:56 Ben-Ami Yassour
  2008-07-16 15:56 ` [PATCH 1/6] KVM: Introduce a callback routine for IOAPIC ack handling Ben-Ami Yassour
  0 siblings, 1 reply; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

In the last few tests that we ran with PCI passthrough and VT-d using
iperf, we were able to get the same throughput as on the native OS
with a 1G NIC (albeit with higher CPU utilization).

The following patches are the PCI-passthrough patches that Amit sent
(rebased on the latest kvm tree), followed by a few improvements and
the VT-d extension.
I am also sending the userspace patches: the patch that Amit sent for
PCI passthrough and the direct-mmio extension for userspace (note that
without the direct-mmio extension we get less than half the
throughput).

Per Avi's request, I am resending the kernel patches, with patches 3/8, 4/8, and 5/8 folded into one.

Regards,
Ben



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] KVM: Introduce a callback routine for IOAPIC ack handling
  2008-07-16 15:56 PCI passthrough with VT-d - native performance Ben-Ami Yassour
@ 2008-07-16 15:56 ` Ben-Ami Yassour
  2008-07-16 15:56   ` [PATCH 2/6] KVM: Introduce a callback routine for PIC " Ben-Ami Yassour
  0 siblings, 1 reply; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

From: Amit Shah <amit.shah@qumranet.com>

This will be useful for acking IRQs of assigned devices.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 virt/kvm/ioapic.c |    3 +++
 virt/kvm/ioapic.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index c0d2287..8ce93c7 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 	ent->fields.remote_irr = 0;
 	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
 		ioapic_service(ioapic, gsi);
+
+	if (ioapic->ack_notifier)
+		ioapic->ack_notifier(ioapic->kvm, gsi);
 }
 
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7f16675..a42743f 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -58,6 +58,7 @@ struct kvm_ioapic {
 	} redirtbl[IOAPIC_NUM_PINS];
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	void (*ack_notifier)(void *opaque, int irq);
 };
 
 #ifdef DEBUG
-- 
1.5.6



* [PATCH 2/6] KVM: Introduce a callback routine for PIC ack handling
  2008-07-16 15:56 ` [PATCH 1/6] KVM: Introduce a callback routine for IOAPIC ack handling Ben-Ami Yassour
@ 2008-07-16 15:56   ` Ben-Ami Yassour
  2008-07-16 15:56     ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
  0 siblings, 1 reply; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

From: Amit Shah <amit.shah@qumranet.com>

This is useful for acking IRQs of assigned devices.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 arch/x86/kvm/i8259.c |    6 +++++-
 arch/x86/kvm/irq.c   |    2 +-
 arch/x86/kvm/irq.h   |    3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 55e179a..c1a4110 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -159,9 +159,10 @@ static inline void pic_intack(struct kvm_kpic_state *s, int irq)
 		s->irr &= ~(1 << irq);
 }
 
-int kvm_pic_read_irq(struct kvm_pic *s)
+int kvm_pic_read_irq(struct kvm *kvm)
 {
 	int irq, irq2, intno;
+	struct kvm_pic *s = pic_irqchip(kvm);
 
 	irq = pic_get_irq(&s->pics[0]);
 	if (irq >= 0) {
@@ -186,6 +187,9 @@ int kvm_pic_read_irq(struct kvm_pic *s)
 		irq = 7;
 		intno = s->pics[0].irq_base + irq;
 	}
+	if (kvm->arch.vpic->ack_notifier)
+		kvm->arch.vpic->ack_notifier(kvm, irq);
+
 	pic_update_irq(s);
 
 	return intno;
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 0d9e552..3529620 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -72,7 +72,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 		if (kvm_apic_accept_pic_intr(v)) {
 			s = pic_irqchip(v->kvm);
 			s->output = 0;		/* PIC */
-			vector = kvm_pic_read_irq(s);
+			vector = kvm_pic_read_irq(v->kvm);
 		}
 	}
 	return vector;
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 07ff2ae..ac71a04 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -63,11 +63,12 @@ struct kvm_pic {
 	void *irq_request_opaque;
 	int output;		/* intr from master PIC */
 	struct kvm_io_device dev;
+	void (*ack_notifier)(void *opaque, int irq);
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
 void kvm_pic_set_irq(void *opaque, int irq, int level);
-int kvm_pic_read_irq(struct kvm_pic *s);
+int kvm_pic_read_irq(struct kvm *kvm);
 void kvm_pic_update_irq(struct kvm_pic *s);
 
 static inline struct kvm_pic *pic_irqchip(struct kvm *kvm)
-- 
1.5.6



* [PATCH 3/6] KVM: Handle device assignment to guests
  2008-07-16 15:56   ` [PATCH 2/6] KVM: Introduce a callback routine for PIC " Ben-Ami Yassour
@ 2008-07-16 15:56     ` Ben-Ami Yassour
  2008-07-16 15:56       ` Ben-Ami Yassour
  2008-07-17  2:00       ` Yang, Sheng
  0 siblings, 2 replies; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

From: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest ("PCI passthrough").

The device to be assigned to the guest is registered in the host
kernel, and interrupt delivery to the guest is handled there. If the
device is already assigned, or its driver is still loaded on the host,
the assignment fails and -EBUSY is returned to userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest;
the VT-d extension is required to enable the device to perform DMA.
PVDMA is an alternative approach.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Han, Weidong <weidong.han@intel.com>
---
 arch/x86/kvm/x86.c         |  267 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/kvm_host.h |   37 ++++++
 include/asm-x86/kvm_para.h |   16 +++-
 include/linux/kvm.h        |    3 +
 virt/kvm/ioapic.c          |   12 ++-
 5 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3167006..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   <avi@qumranet.com>
  *   Yaniv Kamay  <yaniv@qumranet.com>
+ *   Amit Shah    <amit.shah@qumranet.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +25,10 @@
 #include "x86.h"
 
 #include <linux/clocksource.h>
+#include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/mman.h>
@@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+DEFINE_RWLOCK(kvm_pci_pt_lock);
+
+/*
+ * Used to find a registered host PCI device (a "passthrough" device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+	struct list_head *ptr;
+	struct kvm_pci_pt_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		switch (source) {
+		case KVM_PT_SOURCE_IRQ:
+			/*
+			 * Used to find a registered host device
+			 * during interrupt context on host
+			 */
+			if (match->pt_dev.host.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_IRQ_ACK:
+			/*
+			 * Used to find a registered host device when
+			 * the guest acks an interrupt
+			 */
+			if (match->pt_dev.guest.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_UPDATE:
+			if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+			    (match->pt_dev.host.devfn == pt_pci_info->devfn))
+				return match;
+			break;
+		}
+	}
+	return NULL;
+}
+
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *int_work;
+
+	int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&int_work->pt_dev->kvm->lock);
+	kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+	mutex_unlock(&int_work->pt_dev->kvm->lock);
+	kvm_put_kvm(int_work->pt_dev->kvm);
+}
+
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *ack_work;
+
+	ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&ack_work->pt_dev->kvm->lock);
+	kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+	enable_irq(ack_work->pt_dev->host.irq);
+	mutex_unlock(&ack_work->pt_dev->kvm->lock);
+	kvm_put_kvm(ack_work->pt_dev->kvm);
+}
+
+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+	struct kvm_pci_passthrough_dev_kernel *pt_dev =
+		(struct kvm_pci_passthrough_dev_kernel *) dev_id;
+
+	kvm_get_kvm(pt_dev->kvm);
+	schedule_work(&pt_dev->int_work.work);
+	disable_irq_nosync(irq);
+	return IRQ_HANDLED;
+}
+
+/* Ack the irq line for a passthrough device */
+static void kvm_pci_pt_ack_irq(void *opaque, int irq)
+{
+	struct kvm *kvm = opaque;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+	if (irq == -1)
+		return;
+
+	read_lock(&kvm_pci_pt_lock);
+	pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL, irq,
+					 KVM_PT_SOURCE_IRQ_ACK);
+	if (!pci_pt_dev) {
+		read_unlock(&kvm_pci_pt_lock);
+		return;
+	}
+	kvm_get_kvm(kvm);
+	read_unlock(&kvm_pci_pt_lock);
+	schedule_work(&pci_pt_dev->pt_dev.ack_work.work);
+}
+
+static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
+				   struct kvm_pci_passthrough_dev *pci_pt_dev)
+{
+	int r = 0;
+	struct kvm_pci_pt_dev_list *match;
+	struct pci_dev *dev;
+
+	write_lock(&kvm_pci_pt_lock);
+
+	/* Check if this is a request to update the irq of the device
+	 * in the guest (BIOS/ kernels can dynamically reprogram irq
+	 * numbers).  This also protects us from adding the same
+	 * device twice.
+	 */
+	match = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head,
+				    &pci_pt_dev->host, 0, KVM_PT_SOURCE_UPDATE);
+	if (match) {
+		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
+		write_unlock(&kvm_pci_pt_lock);
+		goto out;
+	}
+	write_unlock(&kvm_pci_pt_lock);
+
+	match = kzalloc(sizeof(struct kvm_pci_pt_dev_list), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __func__);
+		r = -ENOMEM;
+		goto out;
+	}
+	dev = pci_get_bus_and_slot(pci_pt_dev->host.busnr,
+				   pci_pt_dev->host.devfn);
+	if (!dev) {
+		printk(KERN_INFO "%s: host device not found\n", __func__);
+		r = -EINVAL;
+		goto out_free;
+	}
+	if (pci_enable_device(dev)) {
+		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
+		r = -EBUSY;
+		goto out_put;
+	}
+	r = pci_request_regions(dev, "kvm_pt_device");
+	if (r) {
+		printk(KERN_INFO "%s: Could not get access to device regions\n",
+		       __func__);
+		goto out_put;
+	}
+	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
+	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
+	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
+	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
+	match->pt_dev.dev = dev;
+
+	write_lock(&kvm_pci_pt_lock);
+
+	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
+	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
+
+	match->pt_dev.kvm = kvm;
+	match->pt_dev.int_work.pt_dev = &match->pt_dev;
+	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
+
+	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
+
+	write_unlock(&kvm_pci_pt_lock);
+
+	if (irqchip_in_kernel(kvm)) {
+		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
+		match->pt_dev.host.irq = dev->irq;
+		if (kvm->arch.vioapic)
+			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
+		if (kvm->arch.vpic)
+			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
+
+		/* Even though this is PCI, we don't want to use shared
+		 * interrupts. Sharing host devices with guest-assigned devices
+		 * on the same interrupt line is not a happy situation: there
+		 * are going to be long delays in accepting, acking, etc.
+		 */
+		if (request_irq(dev->irq, kvm_pci_pt_dev_intr, 0,
+				"kvm_pt_device", (void *)&match->pt_dev)) {
+			printk(KERN_INFO "%s: couldn't allocate irq for pv "
+			       "device\n", __func__);
+			r = -EIO;
+			goto out_list_del;
+		}
+	}
+
+out:
+	return r;
+out_list_del:
+	list_del(&match->list);
+out_put:
+	pci_dev_put(dev);
+out_free:
+	kfree(match);
+	goto out;
+}
+
+static void kvm_free_pci_passthrough(struct kvm *kvm)
+{
+	struct list_head *ptr, *ptr2;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+	write_lock(&kvm_pci_pt_lock);
+	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
+		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		if (irqchip_in_kernel(kvm) && pci_pt_dev->pt_dev.host.irq)
+			free_irq(pci_pt_dev->pt_dev.host.irq,
+				 (void *)&pci_pt_dev->pt_dev);
+
+		if (cancel_work_sync(&pci_pt_dev->pt_dev.int_work.work))
+			/* We had pending work. That means we will have to take
+			 * care of kvm_put_kvm.
+			 */
+			kvm_put_kvm(kvm);
+
+		if (cancel_work_sync(&pci_pt_dev->pt_dev.ack_work.work))
+			/* We had pending work. That means we will have to take
+			 * care of kvm_put_kvm.
+			 */
+			kvm_put_kvm(kvm);
+	}
+
+	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
+		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		/* Search for this device got us a refcount */
+		pci_release_regions(pci_pt_dev->pt_dev.dev);
+		pci_disable_device(pci_pt_dev->pt_dev.dev);
+		pci_dev_put(pci_pt_dev->pt_dev.dev);
+
+		list_del(&pci_pt_dev->list);
+		kfree(pci_pt_dev);
+	}
+	write_unlock(&kvm_pci_pt_lock);
+}
 
 unsigned long segment_base(u16 selector)
 {
@@ -1746,6 +2000,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_UPDATE_PCI_PT_DEV: {
+		struct kvm_pci_passthrough_dev pci_pt_dev;
+
+		r = -EFAULT;
+		if (copy_from_user(&pci_pt_dev, argp, sizeof pci_pt_dev))
+			goto out;
+		r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
+		if (r)
+			goto out;
+		break;
+	}
 	case KVM_GET_PIT: {
 		struct kvm_pit_state ps;
 		r = -EFAULT;
@@ -3948,6 +4213,7 @@ struct  kvm *kvm_arch_create_vm(void)
 		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.pci_pt_dev_head);
 
 	return kvm;
 }
@@ -3980,6 +4246,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	kvm_free_pci_passthrough(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4a47859..f6973e0 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -88,6 +88,7 @@
 #define KVM_NR_VAR_MTRR 8
 
 extern spinlock_t kvm_lock;
+extern rwlock_t kvm_pci_pt_lock;
 extern struct list_head vm_list;
 
 struct kvm_vcpu;
@@ -319,6 +320,37 @@ struct kvm_mem_alias {
 	gfn_t target_gfn;
 };
 
+/* Some definitions for devices assigned to the guest by the host */
+#define KVM_PT_SOURCE_IRQ	1
+#define KVM_PT_SOURCE_IRQ_ACK	2
+#define KVM_PT_SOURCE_UPDATE	3
+
+/* For assigned devices, we schedule work in the system workqueue to
+ * inject interrupts into the guest when an interrupt occurs on the
+ * physical device and also when the guest acks the interrupt.
+ */
+struct kvm_pci_pt_work {
+	struct work_struct work;
+	struct kvm_pci_passthrough_dev_kernel *pt_dev;
+};
+
+struct kvm_pci_passthrough_dev_kernel {
+	struct kvm_pci_pt_info guest;
+	struct kvm_pci_pt_info host;
+	struct kvm_pci_pt_work int_work;
+	struct kvm_pci_pt_work ack_work;
+	struct pci_dev *dev;
+	struct kvm *kvm;
+};
+
+/* This list is to store the guest bus:device:function-irq and host
+ * bus:device:function-irq mapping for assigned devices.
+ */
+struct kvm_pci_pt_dev_list {
+	struct list_head list;
+	struct kvm_pci_passthrough_dev_kernel pt_dev;
+};
+
 struct kvm_arch{
 	int naliases;
 	struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -331,6 +363,7 @@ struct kvm_arch{
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct list_head pci_pt_dev_head;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
@@ -577,6 +610,10 @@ void kvm_disable_tdp(void);
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
 
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		    struct kvm_pci_pt_info *pt_pci_info, int irq, int source);
+
 static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index 76f3921..b6c5d00 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -142,6 +142,20 @@ static inline unsigned int kvm_arch_para_features(void)
 	return cpuid_eax(KVM_CPUID_FEATURES);
 }
 
-#endif
+#endif /* KERNEL */
 
+/* Stores information for identifying host PCI devices assigned to the
+ * guest: this is used in the host kernel and in the userspace.
+ */
+struct kvm_pci_pt_info {
+	unsigned char busnr;
+	unsigned int devfn;
+	__u32 irq;
+};
+
+/* Mapping between host and guest PCI device */
+struct kvm_pci_passthrough_dev {
+	struct kvm_pci_pt_info guest;
+	struct kvm_pci_pt_info host;
+};
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6edba45..3370e80 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -382,6 +382,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_PV_MMU 13
 #define KVM_CAP_MP_STATE 14
 #define KVM_CAP_COALESCED_MMIO 15
+#define KVM_CAP_PCI_PASSTHROUGH 16
 
 /*
  * ioctls for VM fds
@@ -411,6 +412,8 @@ struct kvm_trace_rec {
 			_IOW(KVMIO,  0x67, struct kvm_coalesced_mmio_zone)
 #define KVM_UNREGISTER_COALESCED_MMIO \
 			_IOW(KVMIO,  0x68, struct kvm_coalesced_mmio_zone)
+#define KVM_UPDATE_PCI_PT_DEV	  _IOR(KVMIO, 0x69, \
+				       struct kvm_pci_passthrough_dev)
 
 /*
  * ioctls for vcpu fds
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 8ce93c7..6ec99fd 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -288,13 +288,21 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 {
 	union ioapic_redir_entry *ent;
+	struct kvm_pci_pt_dev_list *match;
 
 	ent = &ioapic->redirtbl[gsi];
 	ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 
 	ent->fields.remote_irr = 0;
-	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
-		ioapic_service(ioapic, gsi);
+
+	read_lock(&kvm_pci_pt_lock);
+	match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL,
+				    gsi, KVM_PT_SOURCE_IRQ_ACK);
+	read_unlock(&kvm_pci_pt_lock);
+	if (!match) {
+		if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
+			ioapic_service(ioapic, gsi);
+	}
 
 	if (ioapic->ack_notifier)
 		ioapic->ack_notifier(ioapic->kvm, gsi);
-- 
1.5.6



* [PATCH 3/6] KVM: Handle device assignment to guests
  2008-07-16 15:56     ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
@ 2008-07-16 15:56       ` Ben-Ami Yassour
  2008-07-16 15:56         ` [PATCH 4/6] VT-d: changes to support KVM Ben-Ami Yassour
  2008-07-16 16:02         ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
  2008-07-17  2:00       ` Yang, Sheng
  1 sibling, 2 replies; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

From: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest ("PCI passthrough").

The device to be assigned to the guest is registered in the host
kernel, and interrupt delivery to the guest is handled there. If the
device is already assigned, or its driver is still loaded on the host,
the assignment fails and -EBUSY is returned to userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest;
the VT-d extension is required to enable the device to perform DMA.
PVDMA is an alternative approach.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Han, Weidong <weidong.han@intel.com>
---
 arch/x86/kvm/x86.c         |  267 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/kvm_host.h |   37 ++++++
 include/asm-x86/kvm_para.h |   16 +++-
 include/linux/kvm.h        |    3 +
 virt/kvm/ioapic.c          |   12 ++-
 5 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3167006..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   <avi@qumranet.com>
  *   Yaniv Kamay  <yaniv@qumranet.com>
+ *   Amit Shah    <amit.shah@qumranet.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +25,10 @@
 #include "x86.h"
 
 #include <linux/clocksource.h>
+#include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/mman.h>
@@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+DEFINE_RWLOCK(kvm_pci_pt_lock);
+
+/*
+ * Used to find a registered host PCI device (a "passthrough" device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+	struct list_head *ptr;
+	struct kvm_pci_pt_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		switch (source) {
+		case KVM_PT_SOURCE_IRQ:
+			/*
+			 * Used to find a registered host device
+			 * during interrupt context on host
+			 */
+			if (match->pt_dev.host.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_IRQ_ACK:
+			/*
+			 * Used to find a registered host device when
+			 * the guest acks an interrupt
+			 */
+			if (match->pt_dev.guest.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_UPDATE:
+			if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+			    (match->pt_dev.host.devfn == pt_pci_info->devfn))
+				return match;
+			break;
+		}
+	}
+	return NULL;
+}
+
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *int_work;
+
+	int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&int_work->pt_dev->kvm->lock);
+	kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+	mutex_unlock(&int_work->pt_dev->kvm->lock);
+	kvm_put_kvm(int_work->pt_dev->kvm);
+}
+
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *ack_work;
+
+	ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&ack_work->pt_dev->kvm->lock);
+	kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+	enable_irq(ack_work->pt_dev->host.irq);
+	mutex_unlock(&ack_work->pt_dev->kvm->lock);
+	kvm_put_kvm(ack_work->pt_dev->kvm);
+}
+
+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+	struct kvm_pci_passthrough_dev_kernel *pt_dev =
+		(struct kvm_pci_passthrough_dev_kernel *) dev_id;
+
+	kvm_get_kvm(pt_dev->kvm);
+	schedule_work(&pt_dev->int_work.work);
+	disable_irq_nosync(irq);
+	return IRQ_HANDLED;
+}
+
+/* Ack the irq line for a passthrough device */
+static void kvm_pci_pt_ack_irq(void *opaque, int irq)
+{
+	struct kvm *kvm = opaque;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+	if (irq == -1)
+		return;
+
+	read_lock(&kvm_pci_pt_lock);
+	pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL, irq,
+					 KVM_PT_SOURCE_IRQ_ACK);
+	if (!pci_pt_dev) {
+		read_unlock(&kvm_pci_pt_lock);
+		return;
+	}
+	kvm_get_kvm(kvm);
+	read_unlock(&kvm_pci_pt_lock);
+	schedule_work(&pci_pt_dev->pt_dev.ack_work.work);
+}
+
+static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
+				   struct kvm_pci_passthrough_dev *pci_pt_dev)
+{
+	int r = 0;
+	struct kvm_pci_pt_dev_list *match;
+	struct pci_dev *dev;
+
+	write_lock(&kvm_pci_pt_lock);
+
+	/* Check if this is a request to update the irq of the device
+	 * in the guest (BIOS/ kernels can dynamically reprogram irq
+	 * numbers).  This also protects us from adding the same
+	 * device twice.
+	 */
+	match = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head,
+				    &pci_pt_dev->host, 0, KVM_PT_SOURCE_UPDATE);
+	if (match) {
+		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
+		write_unlock(&kvm_pci_pt_lock);
+		goto out;
+	}
+	write_unlock(&kvm_pci_pt_lock);
+
+	match = kzalloc(sizeof(struct kvm_pci_pt_dev_list), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __func__);
+		r = -ENOMEM;
+		goto out;
+	}
+	dev = pci_get_bus_and_slot(pci_pt_dev->host.busnr,
+				   pci_pt_dev->host.devfn);
+	if (!dev) {
+		printk(KERN_INFO "%s: host device not found\n", __func__);
+		r = -EINVAL;
+		goto out_free;
+	}
+	if (pci_enable_device(dev)) {
+		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
+		r = -EBUSY;
+		goto out_put;
+	}
+	r = pci_request_regions(dev, "kvm_pt_device");
+	if (r) {
+		printk(KERN_INFO "%s: Could not get access to device regions\n",
+		       __func__);
+		goto out_put;
+	}
+	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
+	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
+	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
+	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
+	match->pt_dev.dev = dev;
+
+	write_lock(&kvm_pci_pt_lock);
+
+	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
+	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
+
+	match->pt_dev.kvm = kvm;
+	match->pt_dev.int_work.pt_dev = &match->pt_dev;
+	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
+
+	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
+
+	write_unlock(&kvm_pci_pt_lock);
+
+	if (irqchip_in_kernel(kvm)) {
+		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
+		match->pt_dev.host.irq = dev->irq;
+		if (kvm->arch.vioapic)
+			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
+		if (kvm->arch.vpic)
+			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
+
+		/* Even though this is PCI, we don't want to use shared
+		 * interrupts. Sharing host devices with guest-assigned devices
+		 * on the same interrupt line is not a happy situation: there
+		 * are going to be long delays in accepting, acking, etc.
+		 */
+		if (request_irq(dev->irq, kvm_pci_pt_dev_intr, 0,
+				"kvm_pt_device", (void *)&match->pt_dev)) {
+			printk(KERN_INFO "%s: couldn't allocate irq for pv "
+			       "device\n", __func__);
+			r = -EIO;
+			goto out_list_del;
+		}
+	}
+
+out:
+	return r;
+out_list_del:
+	list_del(&match->list);
+out_put:
+	pci_dev_put(dev);
+out_free:
+	kfree(match);
+	goto out;
+}
+
+static void kvm_free_pci_passthrough(struct kvm *kvm)
+{
+	struct list_head *ptr, *ptr2;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+	write_lock(&kvm_pci_pt_lock);
+	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
+		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		if (irqchip_in_kernel(kvm) && pci_pt_dev->pt_dev.host.irq)
+			free_irq(pci_pt_dev->pt_dev.host.irq,
+				 (void *)&pci_pt_dev->pt_dev);
+
+		if (cancel_work_sync(&pci_pt_dev->pt_dev.int_work.work))
+			/* We had pending work. That means we will have to take
+			 * care of kvm_put_kvm.
+			 */
+			kvm_put_kvm(kvm);
+
+		if (cancel_work_sync(&pci_pt_dev->pt_dev.ack_work.work))
+			/* We had pending work. That means we will have to take
+			 * care of kvm_put_kvm.
+			 */
+			kvm_put_kvm(kvm);
+	}
+
+	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
+		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		/* Search for this device got us a refcount */
+		pci_release_regions(pci_pt_dev->pt_dev.dev);
+		pci_disable_device(pci_pt_dev->pt_dev.dev);
+		pci_dev_put(pci_pt_dev->pt_dev.dev);
+
+		list_del(&pci_pt_dev->list);
+		kfree(pci_pt_dev);
+	}
+	write_unlock(&kvm_pci_pt_lock);
+}
 
 unsigned long segment_base(u16 selector)
 {
@@ -1746,6 +2000,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_UPDATE_PCI_PT_DEV: {
+		struct kvm_pci_passthrough_dev pci_pt_dev;
+
+		r = -EFAULT;
+		if (copy_from_user(&pci_pt_dev, argp, sizeof pci_pt_dev))
+			goto out;
+		r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
+		if (r)
+			goto out;
+		break;
+	}
 	case KVM_GET_PIT: {
 		struct kvm_pit_state ps;
 		r = -EFAULT;
@@ -3948,6 +4213,7 @@ struct  kvm *kvm_arch_create_vm(void)
 		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.pci_pt_dev_head);
 
 	return kvm;
 }
@@ -3980,6 +4246,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	kvm_free_pci_passthrough(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4a47859..f6973e0 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -88,6 +88,7 @@
 #define KVM_NR_VAR_MTRR 8
 
 extern spinlock_t kvm_lock;
+extern rwlock_t kvm_pci_pt_lock;
 extern struct list_head vm_list;
 
 struct kvm_vcpu;
@@ -319,6 +320,37 @@ struct kvm_mem_alias {
 	gfn_t target_gfn;
 };
 
+/* Some definitions for devices assigned to the guest by the host */
+#define KVM_PT_SOURCE_IRQ	1
+#define KVM_PT_SOURCE_IRQ_ACK	2
+#define KVM_PT_SOURCE_UPDATE	3
+
+/* For assigned devices, we schedule work in the system workqueue to
+ * inject interrupts into the guest when an interrupt occurs on the
+ * physical device and also when the guest acks the interrupt.
+ */
+struct kvm_pci_pt_work {
+	struct work_struct work;
+	struct kvm_pci_passthrough_dev_kernel *pt_dev;
+};
+
+struct kvm_pci_passthrough_dev_kernel {
+	struct kvm_pci_pt_info guest;
+	struct kvm_pci_pt_info host;
+	struct kvm_pci_pt_work int_work;
+	struct kvm_pci_pt_work ack_work;
+	struct pci_dev *dev;
+	struct kvm *kvm;
+};
+
+/* This list is to store the guest bus:device:function-irq and host
+ * bus:device:function-irq mapping for assigned devices.
+ */
+struct kvm_pci_pt_dev_list {
+	struct list_head list;
+	struct kvm_pci_passthrough_dev_kernel pt_dev;
+};
+
 struct kvm_arch{
 	int naliases;
 	struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -331,6 +363,7 @@ struct kvm_arch{
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct list_head pci_pt_dev_head;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
@@ -577,6 +610,10 @@ void kvm_disable_tdp(void);
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
 
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		    struct kvm_pci_pt_info *pt_pci_info, int irq, int source);
+
 static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index 76f3921..b6c5d00 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -142,6 +142,20 @@ static inline unsigned int kvm_arch_para_features(void)
 	return cpuid_eax(KVM_CPUID_FEATURES);
 }
 
-#endif
+#endif /* __KERNEL__ */
 
+/* Stores information for identifying host PCI devices assigned to the
+ * guest: this is used both in the host kernel and in userspace.
+ */
+struct kvm_pci_pt_info {
+	__u8 busnr;
+	__u32 devfn;
+	__u32 irq;
+};
+
+/* Mapping between host and guest PCI device */
+struct kvm_pci_passthrough_dev {
+	struct kvm_pci_pt_info guest;
+	struct kvm_pci_pt_info host;
+};
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6edba45..3370e80 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -382,6 +382,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_PV_MMU 13
 #define KVM_CAP_MP_STATE 14
 #define KVM_CAP_COALESCED_MMIO 15
+#define KVM_CAP_PCI_PASSTHROUGH 16
 
 /*
  * ioctls for VM fds
@@ -411,6 +412,8 @@ struct kvm_trace_rec {
 			_IOW(KVMIO,  0x67, struct kvm_coalesced_mmio_zone)
 #define KVM_UNREGISTER_COALESCED_MMIO \
 			_IOW(KVMIO,  0x68, struct kvm_coalesced_mmio_zone)
+#define KVM_UPDATE_PCI_PT_DEV	  _IOR(KVMIO, 0x69, \
+				       struct kvm_pci_passthrough_dev)
 
 /*
  * ioctls for vcpu fds
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 8ce93c7..6ec99fd 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -288,13 +288,21 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 {
 	union ioapic_redir_entry *ent;
+	struct kvm_pci_pt_dev_list *match;
 
 	ent = &ioapic->redirtbl[gsi];
 	ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 
 	ent->fields.remote_irr = 0;
-	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
-		ioapic_service(ioapic, gsi);
+
+	read_lock(&kvm_pci_pt_lock);
+	match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL,
+				    gsi, KVM_PT_SOURCE_IRQ_ACK);
+	read_unlock(&kvm_pci_pt_lock);
+	if (!match) {
+		if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
+			ioapic_service(ioapic, gsi);
+	}
 
 	if (ioapic->ack_notifier)
 		ioapic->ack_notifier(ioapic->kvm, gsi);
-- 
1.5.6



* [PATCH 4/6] VT-d: changes to support KVM
  2008-07-16 15:56       ` Ben-Ami Yassour
@ 2008-07-16 15:56         ` Ben-Ami Yassour
  2008-07-16 15:56           ` [PATCH 5/6] KVM: PCIPT: VT-d support Ben-Ami Yassour
  2008-07-16 16:02         ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
  1 sibling, 1 reply; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony, Kay, Allen M

From: Kay, Allen M <allen.m.kay@intel.com>

This patch extends the VT-d driver to support KVM

[Ben: fixed memory pinning]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 drivers/pci/dmar.c          |    4 +-
 drivers/pci/intel-iommu.c   |  117 ++++++++++++++-
 drivers/pci/intel-iommu.h   |  344 -----------------------------------------
 drivers/pci/iova.c          |    2 +-
 drivers/pci/iova.h          |   52 -------
 include/linux/intel-iommu.h |  355 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/iova.h        |   52 +++++++
 7 files changed, 523 insertions(+), 403 deletions(-)
 delete mode 100644 drivers/pci/intel-iommu.h
 delete mode 100644 drivers/pci/iova.h
 create mode 100644 include/linux/intel-iommu.h
 create mode 100644 include/linux/iova.h

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f941f60..a58a5b0 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -26,8 +26,8 @@
 
 #include <linux/pci.h>
 #include <linux/dmar.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 
 #undef PREFIX
 #define PREFIX "DMAR:"
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index bb06423..a566406 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -20,6 +20,7 @@
  * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  */
 
+#undef DEBUG
 #include <linux/init.h>
 #include <linux/bitmap.h>
 #include <linux/debugfs.h>
@@ -33,8 +34,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/mempool.h>
 #include <linux/timer.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 #include <asm/proto.h> /* force_iommu in this header in x86-64*/
 #include <asm/cacheflush.h>
 #include <asm/gart.h>
@@ -160,7 +161,7 @@ static inline void *alloc_domain_mem(void)
 	return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
-static inline void free_domain_mem(void *vaddr)
+static void free_domain_mem(void *vaddr)
 {
 	kmem_cache_free(iommu_domain_cache, vaddr);
 }
@@ -1414,7 +1415,7 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
  * find_domain
  * Note: we use struct pci_dev->dev.archdata.iommu stores the info
  */
-struct dmar_domain *
+static struct dmar_domain *
 find_domain(struct pci_dev *pdev)
 {
 	struct device_domain_info *info;
@@ -2431,3 +2432,111 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
+void intel_iommu_domain_exit(struct dmar_domain *domain)
+{
+	u64 end;
+
+	/* Domain 0 is reserved, so don't process it */
+	if (!domain)
+		return;
+
+	end = DOMAIN_MAX_ADDR(domain->gaw);
+	end = end & (~PAGE_MASK_4K);
+
+	/* clear ptes */
+	dma_pte_clear_range(domain, 0, end);
+
+	/* free page tables */
+	dma_pte_free_pagetable(domain, 0, end);
+
+	iommu_free_domain(domain);
+	free_domain_mem(domain);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_exit);
+
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev)
+{
+	struct dmar_drhd_unit *drhd;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+
+	drhd = dmar_find_matched_drhd_unit(pdev);
+	if (!drhd) {
+		printk(KERN_ERR "intel_iommu_domain_alloc: drhd == NULL\n");
+		return NULL;
+	}
+
+	iommu = drhd->iommu;
+	if (!iommu) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: iommu == NULL\n");
+		return NULL;
+	}
+	domain = iommu_alloc_domain(iommu);
+	if (!domain) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain == NULL\n");
+		return NULL;
+	}
+	if (domain_init(domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain_init() failed\n");
+		intel_iommu_domain_exit(domain);
+		return NULL;
+	}
+	return domain;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_alloc);
+
+int intel_iommu_context_mapping(
+	struct dmar_domain *domain, struct pci_dev *pdev)
+{
+	int rc;
+	rc = domain_context_mapping(domain, pdev);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_context_mapping);
+
+int intel_iommu_page_mapping(
+	struct dmar_domain *domain, dma_addr_t iova,
+	u64 hpa, size_t size, int prot)
+{
+	int rc;
+	rc = domain_page_mapping(domain, iova, hpa, size, prot);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_page_mapping);
+
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	detach_domain_for_dev(domain, bus, devfn);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_detach_dev);
+
+struct dmar_domain *
+intel_iommu_find_domain(struct pci_dev *pdev)
+{
+	return find_domain(pdev);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_find_domain);
+
+int intel_iommu_found(void)
+{
+	return g_num_of_iommus;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_found);
+
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64 iova)
+{
+	struct dma_pte *pte;
+	u64 pfn;
+
+	pfn = 0;
+	pte = addr_to_dma_pte(domain, iova);
+
+	if (pte)
+		pfn = dma_pte_addr(*pte);
+
+	return pfn >> PAGE_SHIFT_4K;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_iova_to_pfn);
diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h
deleted file mode 100644
index afc0ad9..0000000
--- a/drivers/pci/intel-iommu.h
+++ /dev/null
@@ -1,344 +0,0 @@
-/*
- * Copyright (c) 2006, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Copyright (C) 2006-2008 Intel Corporation
- * Author: Ashok Raj <ashok.raj@intel.com>
- * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
- */
-
-#ifndef _INTEL_IOMMU_H_
-#define _INTEL_IOMMU_H_
-
-#include <linux/types.h>
-#include <linux/msi.h>
-#include <linux/sysdev.h>
-#include "iova.h"
-#include <linux/io.h>
-
-/*
- * We need a fixed PAGE_SIZE of 4K irrespective of
- * arch PAGE_SIZE for IOMMU page tables.
- */
-#define PAGE_SHIFT_4K		(12)
-#define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
-#define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
-#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
-
-#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
-#define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
-#define DMA_64BIT_PFN		IOVA_PFN(DMA_64BIT_MASK)
-
-/*
- * Intel IOMMU register specification per version 1.0 public spec.
- */
-
-#define	DMAR_VER_REG	0x0	/* Arch version supported by this IOMMU */
-#define	DMAR_CAP_REG	0x8	/* Hardware supported capabilities */
-#define	DMAR_ECAP_REG	0x10	/* Extended capabilities supported */
-#define	DMAR_GCMD_REG	0x18	/* Global command register */
-#define	DMAR_GSTS_REG	0x1c	/* Global status register */
-#define	DMAR_RTADDR_REG	0x20	/* Root entry table */
-#define	DMAR_CCMD_REG	0x28	/* Context command reg */
-#define	DMAR_FSTS_REG	0x34	/* Fault Status register */
-#define	DMAR_FECTL_REG	0x38	/* Fault control register */
-#define	DMAR_FEDATA_REG	0x3c	/* Fault event interrupt data register */
-#define	DMAR_FEADDR_REG	0x40	/* Fault event interrupt addr register */
-#define	DMAR_FEUADDR_REG 0x44	/* Upper address register */
-#define	DMAR_AFLOG_REG	0x58	/* Advanced Fault control */
-#define	DMAR_PMEN_REG	0x64	/* Enable Protected Memory Region */
-#define	DMAR_PLMBASE_REG 0x68	/* PMRR Low addr */
-#define	DMAR_PLMLIMIT_REG 0x6c	/* PMRR low limit */
-#define	DMAR_PHMBASE_REG 0x70	/* pmrr high base addr */
-#define	DMAR_PHMLIMIT_REG 0x78	/* pmrr high limit */
-
-#define OFFSET_STRIDE		(9)
-/*
-#define dmar_readl(dmar, reg) readl(dmar + reg)
-#define dmar_readq(dmar, reg) ({ \
-		u32 lo, hi; \
-		lo = readl(dmar + reg); \
-		hi = readl(dmar + reg + 4); \
-		(((u64) hi) << 32) + lo; })
-*/
-static inline u64 dmar_readq(void __iomem *addr)
-{
-	u32 lo, hi;
-	lo = readl(addr);
-	hi = readl(addr + 4);
-	return (((u64) hi) << 32) + lo;
-}
-
-static inline void dmar_writeq(void __iomem *addr, u64 val)
-{
-	writel((u32)val, addr);
-	writel((u32)(val >> 32), addr + 4);
-}
-
-#define DMAR_VER_MAJOR(v)		(((v) & 0xf0) >> 4)
-#define DMAR_VER_MINOR(v)		((v) & 0x0f)
-
-/*
- * Decoding Capability Register
- */
-#define cap_read_drain(c)	(((c) >> 55) & 1)
-#define cap_write_drain(c)	(((c) >> 54) & 1)
-#define cap_max_amask_val(c)	(((c) >> 48) & 0x3f)
-#define cap_num_fault_regs(c)	((((c) >> 40) & 0xff) + 1)
-#define cap_pgsel_inv(c)	(((c) >> 39) & 1)
-
-#define cap_super_page_val(c)	(((c) >> 34) & 0xf)
-#define cap_super_offset(c)	(((find_first_bit(&cap_super_page_val(c), 4)) \
-					* OFFSET_STRIDE) + 21)
-
-#define cap_fault_reg_offset(c)	((((c) >> 24) & 0x3ff) * 16)
-#define cap_max_fault_reg_offset(c) \
-	(cap_fault_reg_offset(c) + cap_num_fault_regs(c) * 16)
-
-#define cap_zlr(c)		(((c) >> 22) & 1)
-#define cap_isoch(c)		(((c) >> 23) & 1)
-#define cap_mgaw(c)		((((c) >> 16) & 0x3f) + 1)
-#define cap_sagaw(c)		(((c) >> 8) & 0x1f)
-#define cap_caching_mode(c)	(((c) >> 7) & 1)
-#define cap_phmr(c)		(((c) >> 6) & 1)
-#define cap_plmr(c)		(((c) >> 5) & 1)
-#define cap_rwbf(c)		(((c) >> 4) & 1)
-#define cap_afl(c)		(((c) >> 3) & 1)
-#define cap_ndoms(c)		(((unsigned long)1) << (4 + 2 * ((c) & 0x7)))
-/*
- * Extended Capability Register
- */
-
-#define ecap_niotlb_iunits(e)	((((e) >> 24) & 0xff) + 1)
-#define ecap_iotlb_offset(e) 	((((e) >> 8) & 0x3ff) * 16)
-#define ecap_max_iotlb_offset(e) \
-	(ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
-#define ecap_coherent(e)	((e) & 0x1)
-
-
-/* IOTLB_REG */
-#define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
-#define DMA_TLB_DSI_FLUSH (((u64)2) << 60)
-#define DMA_TLB_PSI_FLUSH (((u64)3) << 60)
-#define DMA_TLB_IIRG(type) ((type >> 60) & 7)
-#define DMA_TLB_IAIG(val) (((val) >> 57) & 7)
-#define DMA_TLB_READ_DRAIN (((u64)1) << 49)
-#define DMA_TLB_WRITE_DRAIN (((u64)1) << 48)
-#define DMA_TLB_DID(id)	(((u64)((id) & 0xffff)) << 32)
-#define DMA_TLB_IVT (((u64)1) << 63)
-#define DMA_TLB_IH_NONLEAF (((u64)1) << 6)
-#define DMA_TLB_MAX_SIZE (0x3f)
-
-/* PMEN_REG */
-#define DMA_PMEN_EPM (((u32)1)<<31)
-#define DMA_PMEN_PRS (((u32)1)<<0)
-
-/* GCMD_REG */
-#define DMA_GCMD_TE (((u32)1) << 31)
-#define DMA_GCMD_SRTP (((u32)1) << 30)
-#define DMA_GCMD_SFL (((u32)1) << 29)
-#define DMA_GCMD_EAFL (((u32)1) << 28)
-#define DMA_GCMD_WBF (((u32)1) << 27)
-
-/* GSTS_REG */
-#define DMA_GSTS_TES (((u32)1) << 31)
-#define DMA_GSTS_RTPS (((u32)1) << 30)
-#define DMA_GSTS_FLS (((u32)1) << 29)
-#define DMA_GSTS_AFLS (((u32)1) << 28)
-#define DMA_GSTS_WBFS (((u32)1) << 27)
-
-/* CCMD_REG */
-#define DMA_CCMD_ICC (((u64)1) << 63)
-#define DMA_CCMD_GLOBAL_INVL (((u64)1) << 61)
-#define DMA_CCMD_DOMAIN_INVL (((u64)2) << 61)
-#define DMA_CCMD_DEVICE_INVL (((u64)3) << 61)
-#define DMA_CCMD_FM(m) (((u64)((m) & 0x3)) << 32)
-#define DMA_CCMD_MASK_NOBIT 0
-#define DMA_CCMD_MASK_1BIT 1
-#define DMA_CCMD_MASK_2BIT 2
-#define DMA_CCMD_MASK_3BIT 3
-#define DMA_CCMD_SID(s) (((u64)((s) & 0xffff)) << 16)
-#define DMA_CCMD_DID(d) ((u64)((d) & 0xffff))
-
-/* FECTL_REG */
-#define DMA_FECTL_IM (((u32)1) << 31)
-
-/* FSTS_REG */
-#define DMA_FSTS_PPF ((u32)2)
-#define DMA_FSTS_PFO ((u32)1)
-#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
-
-/* FRCD_REG, 32 bits access */
-#define DMA_FRCD_F (((u32)1) << 31)
-#define dma_frcd_type(d) ((d >> 30) & 1)
-#define dma_frcd_fault_reason(c) (c & 0xff)
-#define dma_frcd_source_id(c) (c & 0xffff)
-#define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
-
-/*
- * 0: Present
- * 1-11: Reserved
- * 12-63: Context Ptr (12 - (haw-1))
- * 64-127: Reserved
- */
-struct root_entry {
-	u64	val;
-	u64	rsvd1;
-};
-#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
-static inline bool root_present(struct root_entry *root)
-{
-	return (root->val & 1);
-}
-static inline void set_root_present(struct root_entry *root)
-{
-	root->val |= 1;
-}
-static inline void set_root_value(struct root_entry *root, unsigned long value)
-{
-	root->val |= value & PAGE_MASK_4K;
-}
-
-struct context_entry;
-static inline struct context_entry *
-get_context_addr_from_root(struct root_entry *root)
-{
-	return (struct context_entry *)
-		(root_present(root)?phys_to_virt(
-		root->val & PAGE_MASK_4K):
-		NULL);
-}
-
-/*
- * low 64 bits:
- * 0: present
- * 1: fault processing disable
- * 2-3: translation type
- * 12-63: address space root
- * high 64 bits:
- * 0-2: address width
- * 3-6: aval
- * 8-23: domain id
- */
-struct context_entry {
-	u64 lo;
-	u64 hi;
-};
-#define context_present(c) ((c).lo & 1)
-#define context_fault_disable(c) (((c).lo >> 1) & 1)
-#define context_translation_type(c) (((c).lo >> 2) & 3)
-#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
-#define context_address_width(c) ((c).hi &  7)
-#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
-
-#define context_set_present(c) do {(c).lo |= 1;} while (0)
-#define context_set_fault_enable(c) \
-	do {(c).lo &= (((u64)-1) << 2) | 1;} while (0)
-#define context_set_translation_type(c, val) \
-	do { \
-		(c).lo &= (((u64)-1) << 4) | 3; \
-		(c).lo |= ((val) & 3) << 2; \
-	} while (0)
-#define CONTEXT_TT_MULTI_LEVEL 0
-#define context_set_address_root(c, val) \
-	do {(c).lo |= (val) & PAGE_MASK_4K;} while (0)
-#define context_set_address_width(c, val) do {(c).hi |= (val) & 7;} while (0)
-#define context_set_domain_id(c, val) \
-	do {(c).hi |= ((val) & ((1 << 16) - 1)) << 8;} while (0)
-#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while (0)
-
-/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-11: available
- * 12-63: Host physcial address
- */
-struct dma_pte {
-	u64 val;
-};
-#define dma_clear_pte(p)	do {(p).val = 0;} while (0)
-
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-
-#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while (0)
-#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while (0)
-#define dma_set_pte_prot(p, prot) \
-		do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
-#define dma_pte_addr(p) ((p).val & PAGE_MASK_4K)
-#define dma_set_pte_addr(p, addr) do {\
-		(p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & 3) != 0)
-
-struct intel_iommu;
-
-struct dmar_domain {
-	int	id;			/* domain id */
-	struct intel_iommu *iommu;	/* back pointer to owning iommu */
-
-	struct list_head devices; 	/* all devices' list */
-	struct iova_domain iovad;	/* iova's that belong to this domain */
-
-	struct dma_pte	*pgd;		/* virtual address */
-	spinlock_t	mapping_lock;	/* page table lock */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
-
-#define DOMAIN_FLAG_MULTIPLE_DEVICES 1
-	int		flags;
-};
-
-/* PCI domain-device relationship */
-struct device_domain_info {
-	struct list_head link;	/* link to domain siblings */
-	struct list_head global; /* link to global list */
-	u8 bus;			/* PCI bus numer */
-	u8 devfn;		/* PCI devfn number */
-	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
-	struct dmar_domain *domain; /* pointer to domain */
-};
-
-extern int init_dmars(void);
-
-struct intel_iommu {
-	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
-	u64		cap;
-	u64		ecap;
-	unsigned long 	*domain_ids; /* bitmap of domains */
-	struct dmar_domain **domains; /* ptr to domains */
-	int		seg;
-	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
-	spinlock_t	lock; /* protect context, domain ids */
-	spinlock_t	register_lock; /* protect register handling */
-	struct root_entry *root_entry; /* virtual address */
-
-	unsigned int irq;
-	unsigned char name[7];    /* Device Name */
-	struct msi_msg saved_msg;
-	struct sys_device sysdev;
-};
-
-#ifndef CONFIG_DMAR_GFX_WA
-static inline void iommu_prepare_gfx_mapping(void)
-{
-	return;
-}
-#endif /* !CONFIG_DMAR_GFX_WA */
-
-#endif
diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
index 3ef4ac0..2287116 100644
--- a/drivers/pci/iova.c
+++ b/drivers/pci/iova.c
@@ -7,7 +7,7 @@
  * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  */
 
-#include "iova.h"
+#include <linux/iova.h>
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
diff --git a/drivers/pci/iova.h b/drivers/pci/iova.h
deleted file mode 100644
index 228f6c9..0000000
--- a/drivers/pci/iova.h
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Copyright (c) 2006, Intel Corporation.
- *
- * This file is released under the GPLv2.
- *
- * Copyright (C) 2006-2008 Intel Corporation
- * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
- *
- */
-
-#ifndef _IOVA_H_
-#define _IOVA_H_
-
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/rbtree.h>
-#include <linux/dma-mapping.h>
-
-/* IO virtual address start page frame number */
-#define IOVA_START_PFN		(1)
-
-/* iova structure */
-struct iova {
-	struct rb_node	node;
-	unsigned long	pfn_hi; /* IOMMU dish out addr hi */
-	unsigned long	pfn_lo; /* IOMMU dish out addr lo */
-};
-
-/* holds all the iova translations for a domain */
-struct iova_domain {
-	spinlock_t	iova_alloc_lock;/* Lock to protect iova  allocation */
-	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
-	struct rb_root	rbroot;		/* iova domain rbtree root */
-	struct rb_node	*cached32_node; /* Save last alloced node */
-	unsigned long	dma_32bit_pfn;
-};
-
-struct iova *alloc_iova_mem(void);
-void free_iova_mem(struct iova *iova);
-void free_iova(struct iova_domain *iovad, unsigned long pfn);
-void __free_iova(struct iova_domain *iovad, struct iova *iova);
-struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
-	unsigned long limit_pfn,
-	bool size_aligned);
-struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
-	unsigned long pfn_hi);
-void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-void init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit);
-struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
-void put_iova_domain(struct iova_domain *iovad);
-
-#endif
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
new file mode 100644
index 0000000..1490fc0
--- /dev/null
+++ b/include/linux/intel-iommu.h
@@ -0,0 +1,355 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Ashok Raj <ashok.raj@intel.com>
+ * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
+ */
+
+#ifndef _INTEL_IOMMU_H_
+#define _INTEL_IOMMU_H_
+
+#include <linux/types.h>
+#include <linux/msi.h>
+#include <linux/sysdev.h>
+#include <linux/iova.h>
+#include <linux/io.h>
+
+/*
+ * We need a fixed PAGE_SIZE of 4K irrespective of
+ * arch PAGE_SIZE for IOMMU page tables.
+ */
+#define PAGE_SHIFT_4K		(12)
+#define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
+#define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
+#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
+
+#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
+#define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
+#define DMA_64BIT_PFN		IOVA_PFN(DMA_64BIT_MASK)
+
+/*
+ * Intel IOMMU register specification per version 1.0 public spec.
+ */
+
+#define	DMAR_VER_REG	0x0	/* Arch version supported by this IOMMU */
+#define	DMAR_CAP_REG	0x8	/* Hardware supported capabilities */
+#define	DMAR_ECAP_REG	0x10	/* Extended capabilities supported */
+#define	DMAR_GCMD_REG	0x18	/* Global command register */
+#define	DMAR_GSTS_REG	0x1c	/* Global status register */
+#define	DMAR_RTADDR_REG	0x20	/* Root entry table */
+#define	DMAR_CCMD_REG	0x28	/* Context command reg */
+#define	DMAR_FSTS_REG	0x34	/* Fault Status register */
+#define	DMAR_FECTL_REG	0x38	/* Fault control register */
+#define	DMAR_FEDATA_REG	0x3c	/* Fault event interrupt data register */
+#define	DMAR_FEADDR_REG	0x40	/* Fault event interrupt addr register */
+#define	DMAR_FEUADDR_REG 0x44	/* Upper address register */
+#define	DMAR_AFLOG_REG	0x58	/* Advanced Fault control */
+#define	DMAR_PMEN_REG	0x64	/* Enable Protected Memory Region */
+#define	DMAR_PLMBASE_REG 0x68	/* PMRR Low addr */
+#define	DMAR_PLMLIMIT_REG 0x6c	/* PMRR low limit */
+#define	DMAR_PHMBASE_REG 0x70	/* pmrr high base addr */
+#define	DMAR_PHMLIMIT_REG 0x78	/* pmrr high limit */
+
+#define OFFSET_STRIDE		(9)
+/*
+#define dmar_readl(dmar, reg) readl(dmar + reg)
+#define dmar_readq(dmar, reg) ({ \
+		u32 lo, hi; \
+		lo = readl(dmar + reg); \
+		hi = readl(dmar + reg + 4); \
+		(((u64) hi) << 32) + lo; })
+*/
+static inline u64 dmar_readq(void __iomem *addr)
+{
+	u32 lo, hi;
+	lo = readl(addr);
+	hi = readl(addr + 4);
+	return (((u64) hi) << 32) + lo;
+}
+
+static inline void dmar_writeq(void __iomem *addr, u64 val)
+{
+	writel((u32)val, addr);
+	writel((u32)(val >> 32), addr + 4);
+}
+
+#define DMAR_VER_MAJOR(v)		(((v) & 0xf0) >> 4)
+#define DMAR_VER_MINOR(v)		((v) & 0x0f)
+
+/*
+ * Decoding Capability Register
+ */
+#define cap_read_drain(c)	(((c) >> 55) & 1)
+#define cap_write_drain(c)	(((c) >> 54) & 1)
+#define cap_max_amask_val(c)	(((c) >> 48) & 0x3f)
+#define cap_num_fault_regs(c)	((((c) >> 40) & 0xff) + 1)
+#define cap_pgsel_inv(c)	(((c) >> 39) & 1)
+
+#define cap_super_page_val(c)	(((c) >> 34) & 0xf)
+#define cap_super_offset(c)	(((find_first_bit(&cap_super_page_val(c), 4)) \
+					* OFFSET_STRIDE) + 21)
+
+#define cap_fault_reg_offset(c)	((((c) >> 24) & 0x3ff) * 16)
+#define cap_max_fault_reg_offset(c) \
+	(cap_fault_reg_offset(c) + cap_num_fault_regs(c) * 16)
+
+#define cap_zlr(c)		(((c) >> 22) & 1)
+#define cap_isoch(c)		(((c) >> 23) & 1)
+#define cap_mgaw(c)		((((c) >> 16) & 0x3f) + 1)
+#define cap_sagaw(c)		(((c) >> 8) & 0x1f)
+#define cap_caching_mode(c)	(((c) >> 7) & 1)
+#define cap_phmr(c)		(((c) >> 6) & 1)
+#define cap_plmr(c)		(((c) >> 5) & 1)
+#define cap_rwbf(c)		(((c) >> 4) & 1)
+#define cap_afl(c)		(((c) >> 3) & 1)
+#define cap_ndoms(c)		(((unsigned long)1) << (4 + 2 * ((c) & 0x7)))
+/*
+ * Extended Capability Register
+ */
+
+#define ecap_niotlb_iunits(e)	((((e) >> 24) & 0xff) + 1)
+#define ecap_iotlb_offset(e) 	((((e) >> 8) & 0x3ff) * 16)
+#define ecap_max_iotlb_offset(e) \
+	(ecap_iotlb_offset(e) + ecap_niotlb_iunits(e) * 16)
+#define ecap_coherent(e)	((e) & 0x1)
+
+
+/* IOTLB_REG */
+#define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
+#define DMA_TLB_DSI_FLUSH (((u64)2) << 60)
+#define DMA_TLB_PSI_FLUSH (((u64)3) << 60)
+#define DMA_TLB_IIRG(type) ((type >> 60) & 7)
+#define DMA_TLB_IAIG(val) (((val) >> 57) & 7)
+#define DMA_TLB_READ_DRAIN (((u64)1) << 49)
+#define DMA_TLB_WRITE_DRAIN (((u64)1) << 48)
+#define DMA_TLB_DID(id)	(((u64)((id) & 0xffff)) << 32)
+#define DMA_TLB_IVT (((u64)1) << 63)
+#define DMA_TLB_IH_NONLEAF (((u64)1) << 6)
+#define DMA_TLB_MAX_SIZE (0x3f)
+
+/* PMEN_REG */
+#define DMA_PMEN_EPM (((u32)1)<<31)
+#define DMA_PMEN_PRS (((u32)1)<<0)
+
+/* GCMD_REG */
+#define DMA_GCMD_TE (((u32)1) << 31)
+#define DMA_GCMD_SRTP (((u32)1) << 30)
+#define DMA_GCMD_SFL (((u32)1) << 29)
+#define DMA_GCMD_EAFL (((u32)1) << 28)
+#define DMA_GCMD_WBF (((u32)1) << 27)
+
+/* GSTS_REG */
+#define DMA_GSTS_TES (((u32)1) << 31)
+#define DMA_GSTS_RTPS (((u32)1) << 30)
+#define DMA_GSTS_FLS (((u32)1) << 29)
+#define DMA_GSTS_AFLS (((u32)1) << 28)
+#define DMA_GSTS_WBFS (((u32)1) << 27)
+
+/* CCMD_REG */
+#define DMA_CCMD_ICC (((u64)1) << 63)
+#define DMA_CCMD_GLOBAL_INVL (((u64)1) << 61)
+#define DMA_CCMD_DOMAIN_INVL (((u64)2) << 61)
+#define DMA_CCMD_DEVICE_INVL (((u64)3) << 61)
+#define DMA_CCMD_FM(m) (((u64)((m) & 0x3)) << 32)
+#define DMA_CCMD_MASK_NOBIT 0
+#define DMA_CCMD_MASK_1BIT 1
+#define DMA_CCMD_MASK_2BIT 2
+#define DMA_CCMD_MASK_3BIT 3
+#define DMA_CCMD_SID(s) (((u64)((s) & 0xffff)) << 16)
+#define DMA_CCMD_DID(d) ((u64)((d) & 0xffff))
+
+/* FECTL_REG */
+#define DMA_FECTL_IM (((u32)1) << 31)
+
+/* FSTS_REG */
+#define DMA_FSTS_PPF ((u32)2)
+#define DMA_FSTS_PFO ((u32)1)
+#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
+
+/* FRCD_REG, 32 bits access */
+#define DMA_FRCD_F (((u32)1) << 31)
+#define dma_frcd_type(d) ((d >> 30) & 1)
+#define dma_frcd_fault_reason(c) (c & 0xff)
+#define dma_frcd_source_id(c) (c & 0xffff)
+#define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
+
+/*
+ * 0: Present
+ * 1-11: Reserved
+ * 12-63: Context Ptr (12 - (haw-1))
+ * 64-127: Reserved
+ */
+struct root_entry {
+	u64	val;
+	u64	rsvd1;
+};
+#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
+static inline bool root_present(struct root_entry *root)
+{
+	return (root->val & 1);
+}
+static inline void set_root_present(struct root_entry *root)
+{
+	root->val |= 1;
+}
+static inline void set_root_value(struct root_entry *root, unsigned long value)
+{
+	root->val |= value & PAGE_MASK_4K;
+}
+
+struct context_entry;
+static inline struct context_entry *
+get_context_addr_from_root(struct root_entry *root)
+{
+	return (struct context_entry *)
+		(root_present(root)?phys_to_virt(
+		root->val & PAGE_MASK_4K):
+		NULL);
+}
+
+/*
+ * low 64 bits:
+ * 0: present
+ * 1: fault processing disable
+ * 2-3: translation type
+ * 12-63: address space root
+ * high 64 bits:
+ * 0-2: address width
+ * 3-6: aval
+ * 8-23: domain id
+ */
+struct context_entry {
+	u64 lo;
+	u64 hi;
+};
+#define context_present(c) ((c).lo & 1)
+#define context_fault_disable(c) (((c).lo >> 1) & 1)
+#define context_translation_type(c) (((c).lo >> 2) & 3)
+#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
+#define context_address_width(c) ((c).hi &  7)
+#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
+
+#define context_set_present(c) do {(c).lo |= 1;} while (0)
+#define context_set_fault_enable(c) \
+	do {(c).lo &= (((u64)-1) << 2) | 1;} while (0)
+#define context_set_translation_type(c, val) \
+	do { \
+		(c).lo &= (((u64)-1) << 4) | 3; \
+		(c).lo |= ((val) & 3) << 2; \
+	} while (0)
+#define CONTEXT_TT_MULTI_LEVEL 0
+#define context_set_address_root(c, val) \
+	do {(c).lo |= (val) & PAGE_MASK_4K;} while (0)
+#define context_set_address_width(c, val) do {(c).hi |= (val) & 7;} while (0)
+#define context_set_domain_id(c, val) \
+	do {(c).hi |= ((val) & ((1 << 16) - 1)) << 8;} while (0)
+#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while (0)
+
+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-11: available
+ * 12-63: Host physical address
+ */
+struct dma_pte {
+	u64 val;
+};
+#define dma_clear_pte(p)	do {(p).val = 0;} while (0)
+
+#define DMA_PTE_READ (1)
+#define DMA_PTE_WRITE (2)
+
+#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while (0)
+#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while (0)
+#define dma_set_pte_prot(p, prot) \
+		do {(p).val = ((p).val & ~3) | ((prot) & 3); } while (0)
+#define dma_pte_addr(p) ((p).val & PAGE_MASK_4K)
+#define dma_set_pte_addr(p, addr) do {\
+		(p).val |= ((addr) & PAGE_MASK_4K); } while (0)
+#define dma_pte_present(p) (((p).val & 3) != 0)
+
+struct intel_iommu;
+
+struct dmar_domain {
+	int	id;			/* domain id */
+	struct intel_iommu *iommu;	/* back pointer to owning iommu */
+
+	struct list_head devices; 	/* all devices' list */
+	struct iova_domain iovad;	/* iova's that belong to this domain */
+
+	struct dma_pte	*pgd;		/* virtual address */
+	spinlock_t	mapping_lock;	/* page table lock */
+	int		gaw;		/* max guest address width */
+
+	/* adjusted guest address width, 0 is level 2 30-bit */
+	int		agaw;
+
+#define DOMAIN_FLAG_MULTIPLE_DEVICES 1
+	int		flags;
+};
+
+/* PCI domain-device relationship */
+struct device_domain_info {
+	struct list_head link;	/* link to domain siblings */
+	struct list_head global; /* link to global list */
+	u8 bus;			/* PCI bus number */
+	u8 devfn;		/* PCI devfn number */
+	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
+	struct dmar_domain *domain; /* pointer to domain */
+};
+
+extern int init_dmars(void);
+
+struct intel_iommu {
+	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
+	u64		cap;
+	u64		ecap;
+	unsigned long 	*domain_ids; /* bitmap of domains */
+	struct dmar_domain **domains; /* ptr to domains */
+	int		seg;
+	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
+	spinlock_t	lock; /* protect context, domain ids */
+	spinlock_t	register_lock; /* protect register handling */
+	struct root_entry *root_entry; /* virtual address */
+
+	unsigned int irq;
+	unsigned char name[7];    /* Device Name */
+	struct msi_msg saved_msg;
+	struct sys_device sysdev;
+};
+
+#ifndef CONFIG_DMAR_GFX_WA
+static inline void iommu_prepare_gfx_mapping(void)
+{
+	return;
+}
+#endif /* !CONFIG_DMAR_GFX_WA */
+
+void intel_iommu_domain_exit(struct dmar_domain *domain);
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev);
+int intel_iommu_context_mapping(struct dmar_domain *domain,
+				struct pci_dev *pdev);
+int intel_iommu_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
+			     u64 hpa, size_t size, int prot);
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
+struct dmar_domain *intel_iommu_find_domain(struct pci_dev *pdev);
+int intel_iommu_found(void);
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64 iova);
+
+#endif
diff --git a/include/linux/iova.h b/include/linux/iova.h
new file mode 100644
index 0000000..228f6c9
--- /dev/null
+++ b/include/linux/iova.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
+ *
+ */
+
+#ifndef _IOVA_H_
+#define _IOVA_H_
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/rbtree.h>
+#include <linux/dma-mapping.h>
+
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN		(1)
+
+/* iova structure */
+struct iova {
+	struct rb_node	node;
+	unsigned long	pfn_hi; /* IOMMU dish out addr hi */
+	unsigned long	pfn_lo; /* IOMMU dish out addr lo */
+};
+
+/* holds all the iova translations for a domain */
+struct iova_domain {
+	spinlock_t	iova_alloc_lock;/* Lock to protect iova  allocation */
+	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
+	struct rb_root	rbroot;		/* iova domain rbtree root */
+	struct rb_node	*cached32_node; /* Save last alloced node */
+	unsigned long	dma_32bit_pfn;
+};
+
+struct iova *alloc_iova_mem(void);
+void free_iova_mem(struct iova *iova);
+void free_iova(struct iova_domain *iovad, unsigned long pfn);
+void __free_iova(struct iova_domain *iovad, struct iova *iova);
+struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
+	unsigned long limit_pfn,
+	bool size_aligned);
+struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
+	unsigned long pfn_hi);
+void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
+void init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit);
+struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
+void put_iova_domain(struct iova_domain *iovad);
+
+#endif
-- 
1.5.6



* [PATCH 5/6] KVM: PCIPT: VT-d support
  2008-07-16 15:56         ` [PATCH 4/6] VT-d: changes to support KVM Ben-Ami Yassour
@ 2008-07-16 15:56           ` Ben-Ami Yassour
  2008-07-16 15:56             ` [PATCH 6/6] KVM: PCIPT: VT-d: dont map mmio memory slots Ben-Ami Yassour
  0 siblings, 1 reply; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony, Kay, Allen M

From: Kay, Allen M <allen.m.kay@intel.com>

This patch includes the functions to support VT-d for passthrough
devices.

[Ben: fixed memory pinning, cleanup]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/Makefile      |    2 +-
 arch/x86/kvm/vtd.c         |  176 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c         |   11 +++
 include/asm-x86/kvm_host.h |    1 +
 include/linux/kvm_host.h   |    6 ++
 virt/kvm/kvm_main.c        |    6 ++
 6 files changed, 201 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kvm/vtd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d0e940b..5d9d079 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-	i8254.o
+	i8254.o vtd.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
new file mode 100644
index 0000000..83efb8a
--- /dev/null
+++ b/arch/x86/kvm/vtd.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Allen M. Kay <allen.m.kay@intel.com>
+ * Author: Weidong Han <weidong.han@intel.com>
+ */
+
+#include <linux/list.h>
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <linux/dmar.h>
+#include <linux/intel-iommu.h>
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm);
+
+int kvm_iommu_map_pages(struct kvm *kvm,
+			gfn_t base_gfn, unsigned long npages)
+{
+	gfn_t gfn = base_gfn;
+	pfn_t pfn;
+	int i, rc;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+	if (!domain)
+		return -EFAULT;
+
+	for (i = 0; i < npages; i++) {
+		pfn = gfn_to_pfn(kvm, gfn);
+		rc = intel_iommu_page_mapping(domain,
+					      gfn << PAGE_SHIFT,
+						      pfn << PAGE_SHIFT,
+					      PAGE_SIZE,
+					      DMA_PTE_READ |
+					      DMA_PTE_WRITE);
+		if (rc)
+			kvm_release_pfn_clean(pfn);
+
+		gfn++;
+	}
+	return 0;
+}
+
+static int kvm_iommu_map_memslots(struct kvm *kvm)
+{
+	int i, rc;
+	for (i = 0; i < kvm->nmemslots; i++) {
+		rc = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
+					 kvm->memslots[i].npages);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
+int kvm_iommu_map_guest(struct kvm *kvm,
+			struct kvm_pci_passthrough_dev *pci_pt_dev)
+{
+	struct pci_dev *pdev = NULL;
+
+	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
+	       pci_pt_dev->host.busnr,
+	       PCI_SLOT(pci_pt_dev->host.devfn),
+	       PCI_FUNC(pci_pt_dev->host.devfn));
+
+	for_each_pci_dev(pdev) {
+		if ((pdev->bus->number == pci_pt_dev->host.busnr) &&
+		    (pdev->devfn == pci_pt_dev->host.devfn)) {
+			break;
+		}
+	}
+
+	if (pdev == NULL) {
+		if (kvm->arch.intel_iommu_domain) {
+			intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
+			kvm->arch.intel_iommu_domain = NULL;
+		}
+		return -ENODEV;
+	}
+
+	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
+
+	if (kvm_iommu_map_memslots(kvm)) {
+		kvm_iommu_unmap_memslots(kvm);
+		return -EFAULT;
+	}
+
+	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
+			       pdev->bus->number, pdev->devfn);
+
+	if (intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
+					pdev)) {
+		printk(KERN_ERR "Domain context map for %s failed",
+		       pci_name(pdev));
+		return -EFAULT;
+	}
+	return 0;
+}
+
+static int kvm_iommu_put_pages(struct kvm *kvm,
+			       gfn_t base_gfn, unsigned long npages)
+{
+	gfn_t gfn = base_gfn;
+	pfn_t pfn;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+	int i;
+
+	if (!domain)
+		return -EFAULT;
+
+	for (i = 0; i < npages; i++) {
+		pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
+						     gfn << PAGE_SHIFT);
+		kvm_release_pfn_clean(pfn);
+		gfn++;
+	}
+	return 0;
+}
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm)
+{
+	int i, rc;
+	for (i = 0; i < kvm->nmemslots; i++) {
+		rc = kvm_iommu_put_pages(kvm, kvm->memslots[i].base_gfn,
+					 kvm->memslots[i].npages);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
+int kvm_iommu_unmap_guest(struct kvm *kvm)
+{
+	struct kvm_pci_pt_dev_list *entry;
+	struct pci_dev *pdev = NULL;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+	if (!domain)
+		return 0;
+
+	list_for_each_entry(entry, &kvm->arch.pci_pt_dev_head, list) {
+		printk(KERN_DEBUG "VT-d unmap: host bdf = %x:%x:%x\n",
+		       entry->pt_dev.host.busnr,
+		       PCI_SLOT(entry->pt_dev.host.devfn),
+		       PCI_FUNC(entry->pt_dev.host.devfn));
+
+		for_each_pci_dev(pdev) {
+			if ((pdev->bus->number == entry->pt_dev.host.busnr) &&
+			    (pdev->devfn == entry->pt_dev.host.devfn))
+				break;
+		}
+
+		if (pdev == NULL)
+			return -ENODEV;
+
+		/* detach kvm dmar domain */
+		intel_iommu_detach_dev(domain,
+				       pdev->bus->number, pdev->devfn);
+	}
+	kvm_iommu_unmap_memslots(kvm);
+	intel_iommu_domain_exit(domain);
+	return 0;
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65b307d..9a1caf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -33,6 +33,7 @@
 #include <linux/module.h>
 #include <linux/mman.h>
 #include <linux/highmem.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/uaccess.h>
 #include <asm/msr.h>
@@ -302,8 +303,16 @@ static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
 		}
 	}
 
+	if (intel_iommu_found()) {
+		r = kvm_iommu_map_guest(kvm, pci_pt_dev);
+		if (r)
+			goto out_intr;
+	}
+
 out:
 	return r;
+out_intr:
+	free_irq(dev->irq, &match->pt_dev);
 out_list_del:
 	list_del(&match->list);
 out_put:
@@ -4246,6 +4255,8 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	if (intel_iommu_found())
+		kvm_iommu_unmap_guest(kvm);
 	kvm_free_pci_passthrough(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index f6973e0..6185ed7 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -364,6 +364,7 @@ struct kvm_arch{
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head pci_pt_dev_head;
+	struct dmar_domain *intel_iommu_domain;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3798097..e9eae80 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -279,6 +279,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
+int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
+			unsigned long npages);
+int kvm_iommu_map_guest(struct kvm *kvm,
+			struct kvm_pci_passthrough_dev *pci_pt_dev);
+int kvm_iommu_unmap_guest(struct kvm *kvm);
+
 static inline void kvm_guest_enter(void)
 {
 	account_system_vtime(current);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 825fbd3..77d7001 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -41,6 +41,7 @@
 #include <linux/pagemap.h>
 #include <linux/mman.h>
 #include <linux/swap.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -425,6 +426,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	}
 
 	kvm_free_physmem_slot(&old, &new);
+
+	/* map the pages in iommu page table */
+	if (intel_iommu_found())
+		kvm_iommu_map_pages(kvm, base_gfn, npages);
+
 	return 0;
 
 out_free:
-- 
1.5.6



* [PATCH 6/6] KVM: PCIPT: VT-d: dont map mmio memory slots
  2008-07-16 15:56           ` [PATCH 5/6] KVM: PCIPT: VT-d support Ben-Ami Yassour
@ 2008-07-16 15:56             ` Ben-Ami Yassour
  0 siblings, 0 replies; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 15:56 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, muli, benami, weidong.han, anthony

Avoid mapping mmio memory slots.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/vtd.c         |   20 +++++++++++++-------
 include/asm-x86/kvm_host.h |    2 ++
 virt/kvm/kvm_main.c        |    2 +-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 83efb8a..77044fb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -40,14 +40,20 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 
 	for (i = 0; i < npages; i++) {
 		pfn = gfn_to_pfn(kvm, gfn);
-		rc = intel_iommu_page_mapping(domain,
-					      gfn << PAGE_SHIFT,
+		if (!is_mmio_pfn(pfn)) {
+			rc = intel_iommu_page_mapping(domain,
+						      gfn << PAGE_SHIFT,
 						      pfn << PAGE_SHIFT,
-					      PAGE_SIZE,
-					      DMA_PTE_READ |
-					      DMA_PTE_WRITE);
-		if (rc)
-			kvm_release_pfn_clean(pfn);
+						      PAGE_SIZE,
+						      DMA_PTE_READ |
+						      DMA_PTE_WRITE);
+			if (rc)
+				kvm_release_pfn_clean(pfn);
+		} else {
+			printk(KERN_DEBUG "kvm_iommu_map_page: "
+			       "invalid pfn=%lx\n", pfn);
+			return 0;
+		}
 
 		gfn++;
 	}
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 6185ed7..ee4685c 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -513,6 +513,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
 
+int is_mmio_pfn(pfn_t pfn);
+
 extern bool tdp_enabled;
 
 enum emulation_result {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 77d7001..0653ec1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -77,7 +77,7 @@ static inline int valid_vcpu(int n)
 	return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
-static inline int is_mmio_pfn(pfn_t pfn)
+inline int is_mmio_pfn(pfn_t pfn)
 {
 	if (pfn_valid(pfn))
 		return PageReserved(pfn_to_page(pfn));
-- 
1.5.6



* Re: [PATCH 3/6] KVM: Handle device assignment to guests
  2008-07-16 15:56       ` Ben-Ami Yassour
  2008-07-16 15:56         ` [PATCH 4/6] VT-d: changes to support KVM Ben-Ami Yassour
@ 2008-07-16 16:02         ` Ben-Ami Yassour
  1 sibling, 0 replies; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-16 16:02 UTC (permalink / raw)
  To: amit.shah; +Cc: kvm, Muli Ben-Yehuda, weidong.han, anthony

Please ignore this repeated patch




* Re: [PATCH 3/6] KVM: Handle device assignment to guests
  2008-07-16 15:56     ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
  2008-07-16 15:56       ` Ben-Ami Yassour
@ 2008-07-17  2:00       ` Yang, Sheng
  2008-07-22 12:23         ` Ben-Ami Yassour
  1 sibling, 1 reply; 11+ messages in thread
From: Yang, Sheng @ 2008-07-17  2:00 UTC (permalink / raw)
  To: kvm; +Cc: Ben-Ami Yassour, amit.shah, muli, weidong.han, anthony

Some comments below. :)

On Wednesday 16 July 2008 23:56:50 Ben-Ami Yassour wrote:
> From: Amit Shah <amit.shah@qumranet.com>
>
> This patch adds support for handling PCI devices that are assigned
> to the guest ("PCI passthrough").
>
> The device to be assigned to the guest is registered in the host
> kernel and interrupt delivery is handled. If a device is already
> assigned, or the device driver for it is still loaded on the host,
> the device assignment is failed by conveying a -EBUSY reply to the
> userspace.
>
> Devices that share their interrupt line are not supported at the
> moment.
>
> By itself, this patch will not make devices work within the guest.
> The VT-d extension is required to enable the device to perform DMA.
> Another alternative is PVDMA.
>
> Signed-off-by: Amit Shah <amit.shah@qumranet.com>
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> Signed-off-by: Han, Weidong <weidong.han@intel.com>
> ---
>  arch/x86/kvm/x86.c         |  267
> ++++++++++++++++++++++++++++++++++++++++++++
> include/asm-x86/kvm_host.h |   37 ++++++
>  include/asm-x86/kvm_para.h |   16 +++-
>  include/linux/kvm.h        |    3 +
>  virt/kvm/ioapic.c          |   12 ++-
>  5 files changed, 332 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3167006..65b307d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4,10 +4,12 @@
>   * derived from drivers/kvm/kvm_main.c
>   *
>   * Copyright (C) 2006 Qumranet, Inc.
> + * Copyright (C) 2008 Qumranet, Inc.
>   *
>   * Authors:
>   *   Avi Kivity   <avi@qumranet.com>
>   *   Yaniv Kamay  <yaniv@qumranet.com>
> + *   Amit Shah    <amit.shah@qumranet.com>
>   *
>   * This work is licensed under the terms of the GNU GPL, version
> 2.  See * the COPYING file in the top-level directory.
> @@ -23,8 +25,10 @@
>  #include "x86.h"
>
>  #include <linux/clocksource.h>
> +#include <linux/interrupt.h>
>  #include <linux/kvm.h>
>  #include <linux/fs.h>
> +#include <linux/pci.h>
>  #include <linux/vmalloc.h>
>  #include <linux/module.h>
>  #include <linux/mman.h>
> @@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item
> debugfs_entries[] = { { NULL }
>  };
>
[snip]
> +
> +static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
> +				   struct kvm_pci_passthrough_dev *pci_pt_dev)
> +{
> +	int r = 0;
> +	struct kvm_pci_pt_dev_list *match;
> +	struct pci_dev *dev;
> +
> +	write_lock(&kvm_pci_pt_lock);
> +
> +	/* Check if this is a request to update the irq of the device
> +	 * in the guest (BIOS/ kernels can dynamically reprogram irq
> +	 * numbers).  This also protects us from adding the same
> +	 * device twice.
> +	 */
> +	match = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head,
> +				    &pci_pt_dev->host, 0, KVM_PT_SOURCE_UPDATE);
> +	if (match) {
> +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
> +		write_unlock(&kvm_pci_pt_lock);
> +		goto out;
> +	}
> +	write_unlock(&kvm_pci_pt_lock);
> +
> +	match = kzalloc(sizeof(struct kvm_pci_pt_dev_list), GFP_KERNEL);
> +	if (match == NULL) {
> +		printk(KERN_INFO "%s: Couldn't allocate memory\n",
> +		       __func__);
> +		r = -ENOMEM;
> +		goto out;
> +	}
> +	dev = pci_get_bus_and_slot(pci_pt_dev->host.busnr,
> +				   pci_pt_dev->host.devfn);
> +	if (!dev) {
> +		printk(KERN_INFO "%s: host device not found\n", __func__);
> +		r = -EINVAL;
> +		goto out_free;
> +	}
> +	if (pci_enable_device(dev)) {
> +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> +		r = -EBUSY;
> +		goto out_put;
> +	}
> +	r = pci_request_regions(dev, "kvm_pt_device");
> +	if (r) {
> +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> +		       __func__);
> +		goto out_put;

pci_disable_device()?

> +	}
> +	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
> +	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
> +	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
> +	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
> +	match->pt_dev.dev = dev;
> +
> +	write_lock(&kvm_pci_pt_lock);
> +
> +	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
> +	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
> +
> +	match->pt_dev.kvm = kvm;
> +	match->pt_dev.int_work.pt_dev = &match->pt_dev;
> +	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
> +
> +	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
> +
> +	write_unlock(&kvm_pci_pt_lock);
> +
> +	if (irqchip_in_kernel(kvm)) {
> +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
> +		match->pt_dev.host.irq = dev->irq;
> +		if (kvm->arch.vioapic)
> +			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
> +		if (kvm->arch.vpic)
> +			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
> +
> +		/* Even though this is PCI, we don't want to use shared
> +		 * interrupts. Sharing host devices with guest-assigned devices
> +		 * on the same interrupt line is not a happy situation: there
> +		 * are going to be long delays in accepting, acking, etc.
> +		 */
> +		if (request_irq(dev->irq, kvm_pci_pt_dev_intr, 0,
> +				"kvm_pt_device", (void *)&match->pt_dev)) {
> +			printk(KERN_INFO "%s: couldn't allocate irq for pv "
> +			       "device\n", __func__);
> +			r = -EIO;
> +			goto out_list_del;
> +		}

PCI memory regions are not released if request_irq() fails.

> +	}
> +
> +out:
> +	return r;
> +out_list_del:
> +	list_del(&match->list);
> +out_put:
> +	pci_dev_put(dev);
> +out_free:
> +	kfree(match);
> +	goto out;
> +}
> +
> +static void kvm_free_pci_passthrough(struct kvm *kvm)
> +{
> +	struct list_head *ptr, *ptr2;
> +	struct kvm_pci_pt_dev_list *pci_pt_dev;
> +
> +	write_lock(&kvm_pci_pt_lock);
> +	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
> +		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
> +
> +		if (irqchip_in_kernel(kvm) && pci_pt_dev->pt_dev.host.irq)
> +			free_irq(pci_pt_dev->pt_dev.host.irq,
> +				 (void *)&pci_pt_dev->pt_dev);
> +
> +		if (cancel_work_sync(&pci_pt_dev->pt_dev.int_work.work))
> +			/* We had pending work. That means we will have to take
> +			 * care of kvm_put_kvm.
> +			 */
> +			kvm_put_kvm(kvm);

cancel_work_sync() may sleep (it calls might_sleep()), but here it runs under a spinlock...

-- 
regards
Yang, Sheng


* Re: [PATCH 3/6] KVM: Handle device assignment to guests
  2008-07-17  2:00       ` Yang, Sheng
@ 2008-07-22 12:23         ` Ben-Ami Yassour
  0 siblings, 0 replies; 11+ messages in thread
From: Ben-Ami Yassour @ 2008-07-22 12:23 UTC (permalink / raw)
  To: Yang, Sheng; +Cc: kvm, amit.shah, Muli Ben-Yehuda, weidong.han, anthony

Sheng, 

Thanks for the comments. I sent an updated version of the patches.
Please see reply below.

Thanks,
Ben

On Thu, 2008-07-17 at 10:00 +0800, Yang, Sheng wrote:
> Some comments below. :)
> 
> On Wednesday 16 July 2008 23:56:50 Ben-Ami Yassour wrote:
> > +		       __func__);
> > +		goto out_put;
> 
> pci_disable_device()?
fixed in the new version.
> 
> > +	}
> > +	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
> > +	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
> > +	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
> > +	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
> > +	match->pt_dev.dev = dev;
> > +
> > +	write_lock(&kvm_pci_pt_lock);
> > +
> > +	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
> > +	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
> > +
> > +	match->pt_dev.kvm = kvm;
> > +	match->pt_dev.int_work.pt_dev = &match->pt_dev;
> > +	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
> > +
> > +	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
> > +
> > +	write_unlock(&kvm_pci_pt_lock);
> > +
> > +	if (irqchip_in_kernel(kvm)) {
> > +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
> > +		match->pt_dev.host.irq = dev->irq;
> > +		if (kvm->arch.vioapic)
> > +			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
> > +		if (kvm->arch.vpic)
> > +			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
> > +
> > +		/* Even though this is PCI, we don't want to use shared
> > +		 * interrupts. Sharing host devices with guest-assigned devices
> > +		 * on the same interrupt line is not a happy situation: there
> > +		 * are going to be long delays in accepting, acking, etc.
> > +		 */
> > +		if (request_irq(dev->irq, kvm_pci_pt_dev_intr, 0,
> > +				"kvm_pt_device", (void *)&match->pt_dev)) {
> > +			printk(KERN_INFO "%s: couldn't allocate irq for pv "
> > +			       "device\n", __func__);
> > +			r = -EIO;
> > +			goto out_list_del;
> > +		}
> 
> PCI memory regions are not released if request_irq() fails.
> 
fixed in the new version.


> > +	struct list_head *ptr, *ptr2;
> > +	struct kvm_pci_pt_dev_list *pci_pt_dev;
> > +
> > +	write_lock(&kvm_pci_pt_lock);
> > +	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
> > +		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
> > +
> > +		if (irqchip_in_kernel(kvm) && pci_pt_dev->pt_dev.host.irq)
> > +			free_irq(pci_pt_dev->pt_dev.host.irq,
> > +				 (void *)&pci_pt_dev->pt_dev);
> > +
> > +		if (cancel_work_sync(&pci_pt_dev->pt_dev.int_work.work))
> > +			/* We had pending work. That means we will have to take
> > +			 * care of kvm_put_kvm.
> > +			 */
> > +			kvm_put_kvm(kvm);
> 
> cancel_work_sync() may sleep (it calls might_sleep()), but here it runs under a spinlock...

Due to other changes the lock was no longer needed here and removed.



end of thread, other threads:[~2008-07-22 12:24 UTC | newest]

Thread overview: 11+ messages
2008-07-16 15:56 PCI passthrough with VT-d - native performance Ben-Ami Yassour
2008-07-16 15:56 ` [PATCH 1/6] KVM: Introduce a callback routine for IOAPIC ack handling Ben-Ami Yassour
2008-07-16 15:56   ` [PATCH 2/6] KVM: Introduce a callback routine for PIC " Ben-Ami Yassour
2008-07-16 15:56     ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
2008-07-16 15:56       ` Ben-Ami Yassour
2008-07-16 15:56         ` [PATCH 4/6] VT-d: changes to support KVM Ben-Ami Yassour
2008-07-16 15:56           ` [PATCH 5/6] KVM: PCIPT: VT-d support Ben-Ami Yassour
2008-07-16 15:56             ` [PATCH 6/6] KVM: PCIPT: VT-d: dont map mmio memory slots Ben-Ami Yassour
2008-07-16 16:02         ` [PATCH 3/6] KVM: Handle device assignment to guests Ben-Ami Yassour
2008-07-17  2:00       ` Yang, Sheng
2008-07-22 12:23         ` Ben-Ami Yassour
