kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Device assignemnt: updated patches
@ 2008-07-28 16:26 Ben-Ami Yassour
  2008-07-28 16:26 ` [PATCH 1/5] KVM: PCIPT: direct mmio pfn check Ben-Ami Yassour
  2008-07-30 15:21 ` Device assignemnt: updated patches Avi Kivity
  0 siblings, 2 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony

Following are the device assignment patches with the fixes of the
comments that were sent for the previous version.

The main changes with respect to the previous version:
1. Add a missing patch that better defines mmio memory slots.
2. Change the device assignment ioctls: one ioctl for assigning a
device, and another to set the irq.

Pending comment: support shared guest interrputs.

Comments are appriciated.

Regards,
Ben



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/5] KVM: PCIPT: direct mmio pfn check
  2008-07-28 16:26 Device assignemnt: updated patches Ben-Ami Yassour
@ 2008-07-28 16:26 ` Ben-Ami Yassour
  2008-07-28 16:26   ` [PATCH 2/5] KVM: Add irq ack notifier list Ben-Ami Yassour
  2008-07-30 15:21 ` Device assignemnt: updated patches Avi Kivity
  1 sibling, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony

In some cases it is not enough to identify mmio memory slots by
pfn_valid. This patch adds checking the PageReserved as well.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
---
 virt/kvm/kvm_main.c |   22 +++++++++++++++-------
 1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a845890..e4db5b2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -76,6 +76,14 @@ static inline int valid_vcpu(int n)
 	return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
+static inline int is_mmio_pfn(pfn_t pfn)
+{
+	if (pfn_valid(pfn))
+		return PageReserved(pfn_to_page(pfn));
+
+	return true;
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put()
  */
@@ -593,7 +601,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 		}
 
 		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-		BUG_ON(pfn_valid(pfn));
+		BUG_ON(!is_mmio_pfn(pfn));
 	} else
 		pfn = page_to_pfn(page[0]);
 
@@ -607,10 +615,10 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 	pfn_t pfn;
 
 	pfn = gfn_to_pfn(kvm, gfn);
-	if (pfn_valid(pfn))
+	if (!is_mmio_pfn(pfn))
 		return pfn_to_page(pfn);
 
-	WARN_ON(!pfn_valid(pfn));
+	WARN_ON(is_mmio_pfn(pfn));
 
 	get_page(bad_page);
 	return bad_page;
@@ -626,7 +634,7 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(pfn_t pfn)
 {
-	if (pfn_valid(pfn))
+	if (!is_mmio_pfn(pfn))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
@@ -652,7 +660,7 @@ EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
 
 void kvm_set_pfn_dirty(pfn_t pfn)
 {
-	if (pfn_valid(pfn)) {
+	if (!is_mmio_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
 		if (!PageReserved(page))
 			SetPageDirty(page);
@@ -662,14 +670,14 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(pfn_t pfn)
 {
-	if (pfn_valid(pfn))
+	if (!is_mmio_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
 void kvm_get_pfn(pfn_t pfn)
 {
-	if (pfn_valid(pfn))
+	if (!is_mmio_pfn(pfn))
 		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-28 16:26 ` [PATCH 1/5] KVM: PCIPT: direct mmio pfn check Ben-Ami Yassour
@ 2008-07-28 16:26   ` Ben-Ami Yassour
  2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
  2008-07-29  7:14     ` [PATCH 2/5] KVM: Add irq ack notifier list Yang, Sheng
  0 siblings, 2 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony

FROM: Avi Kivity <avi@qumranet.com>

This can be used by kvm subsystems that are interested in when
interrupts
are acked, for example time drift compenstation.

[Ben: add notification call to the pic and ioapic]

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/i8259.c       |    1 +
 arch/x86/kvm/irq.c         |   22 ++++++++++++++++++++++
 arch/x86/kvm/irq.h         |    5 +++++
 include/asm-x86/kvm_host.h |    7 +++++++
 virt/kvm/ioapic.c          |    2 ++
 5 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 55e179a..d2a61bf 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -186,6 +186,7 @@ int kvm_pic_read_irq(struct kvm_pic *s)
 		irq = 7;
 		intno = s->pics[0].irq_base + irq;
 	}
+	kvm_notify_acked_irq(s->irq_request_opaque, irq);
 	pic_update_irq(s);
 
 	return intno;
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 0d9e552..9091195 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -111,3 +111,25 @@ void kvm_set_irq(struct kvm *kvm, int irq, int level)
 	kvm_ioapic_set_irq(kvm->arch.vioapic, irq, level);
 	kvm_pic_set_irq(pic_irqchip(kvm), irq, level);
 }
+
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
+{
+	struct kvm_irq_ack_notifier *kian;
+	struct hlist_node *n;
+
+	hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list, link)
+		if (kian->gsi == gsi)
+			kian->irq_acked(kian);
+}
+
+void kvm_register_irq_ack_notifier(struct kvm *kvm,
+				   struct kvm_irq_ack_notifier *kian)
+{
+	hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
+}
+
+void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
+				     struct kvm_irq_ack_notifier *kian)
+{
+	hlist_del(&kian->link);
+}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 07ff2ae..95fe718 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -83,6 +83,11 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
 void kvm_pic_reset(struct kvm_kpic_state *s);
 
 void kvm_set_irq(struct kvm *kvm, int irq, int level);
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
+void kvm_register_irq_ack_notifier(struct kvm *kvm,
+				   struct kvm_irq_ack_notifier *kian);
+void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
+				     struct kvm_irq_ack_notifier *kian);
 
 void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 0b6b996..1b9d335 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -319,6 +319,12 @@ struct kvm_mem_alias {
 	gfn_t target_gfn;
 };
 
+struct kvm_irq_ack_notifier {
+	struct hlist_node link;
+	unsigned gsi;
+	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
+};
+
 struct kvm_arch{
 	int naliases;
 	struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -334,6 +340,7 @@ struct kvm_arch{
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
+	struct hlist_head irq_ack_notifier_list;
 
 	int round_robin_prev_vcpu;
 	unsigned int tss_addr;
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index c0d2287..d012f6a 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -39,6 +39,7 @@
 
 #include "ioapic.h"
 #include "lapic.h"
+#include "irq.h"
 
 #if 0
 #define ioapic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg)
@@ -293,6 +294,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 	ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 
 	ent->fields.remote_irr = 0;
+	kvm_notify_acked_irq(ioapic->kvm, gsi);
 	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
 		ioapic_service(ioapic, gsi);
 }
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] KVM: pci device assignment
  2008-07-28 16:26   ` [PATCH 2/5] KVM: Add irq ack notifier list Ben-Ami Yassour
@ 2008-07-28 16:26     ` Ben-Ami Yassour
  2008-07-28 16:26       ` [PATCH 4/5] VT-d: changes to support KVM Ben-Ami Yassour
                         ` (2 more replies)
  2008-07-29  7:14     ` [PATCH 2/5] KVM: Add irq ack notifier list Yang, Sheng
  1 sibling, 3 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony

Based on a patch from: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment
is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
---
 arch/x86/kvm/x86.c         |  243 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/kvm_host.h |   16 +++
 include/linux/kvm.h        |   19 ++++
 3 files changed, 278 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f7726a0..968f949 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,14 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
+ * Copyright IBM Corporation, 2008
  *
  * Authors:
  *   Avi Kivity   <avi@qumranet.com>
  *   Yaniv Kamay  <yaniv@qumranet.com>
+ *   Amit Shah    <amit.shah@qumranet.com>
+ *   Ben-Ami Yassour <benami@il.ibm.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +27,10 @@
 #include "x86.h"
 
 #include <linux/clocksource.h>
+#include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/mman.h>
@@ -98,6 +104,219 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
+						      int assigned_dev_id)
+{
+	struct list_head *ptr;
+	struct kvm_assigned_dev_kernel *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_assigned_dev_kernel, list);
+		if (match->assigned_dev_id == assigned_dev_id)
+			return match;
+	}
+	return NULL;
+}
+
+static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+{
+	struct kvm_assigned_dev_kernel *assigned_dev;
+
+	assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
+				    interrupt_work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&assigned_dev->kvm->lock);
+	kvm_set_irq(assigned_dev->kvm,
+		    assigned_dev->guest_irq, 1);
+	mutex_unlock(&assigned_dev->kvm->lock);
+	kvm_put_kvm(assigned_dev->kvm);
+}
+
+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
+static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
+{
+	struct kvm_assigned_dev_kernel *assigned_dev =
+		(struct kvm_assigned_dev_kernel *) dev_id;
+
+	kvm_get_kvm(assigned_dev->kvm);
+	schedule_work(&assigned_dev->interrupt_work);
+	disable_irq_nosync(irq);
+	return IRQ_HANDLED;
+}
+
+/* Ack the irq line for an assigned device */
+static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
+{
+	struct kvm_assigned_dev_kernel *dev;
+
+	if (kian->gsi == -1)
+		return;
+
+	dev = container_of(kian, struct kvm_assigned_dev_kernel,
+			   ack_notifier);
+	kvm_set_irq(dev->kvm, dev->guest_irq, 0);
+	enable_irq(dev->host_irq);
+}
+
+static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
+				   struct kvm_assigned_irq
+				   *assigned_irq)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *match;
+
+	mutex_lock(&kvm->lock);
+
+	match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      assigned_irq->assigned_dev_id);
+	if (!match) {
+		mutex_unlock(&kvm->lock);
+		return -EINVAL;
+	}
+
+	if (match->irq_requested) {
+		match->guest_irq = assigned_irq->guest_irq;
+		match->ack_notifier.gsi = assigned_irq->guest_irq;
+		mutex_unlock(&kvm->lock);
+		return 0;
+	}
+
+	INIT_WORK(&match->interrupt_work,
+		  kvm_assigned_dev_interrupt_work_handler);
+
+	if (irqchip_in_kernel(kvm)) {
+		if (assigned_irq->host_irq)
+			match->host_irq = assigned_irq->host_irq;
+		else
+			match->host_irq = match->dev->irq;
+		match->guest_irq = assigned_irq->guest_irq;
+		match->ack_notifier.gsi = assigned_irq->guest_irq;
+		match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
+		kvm_register_irq_ack_notifier(kvm, &match->ack_notifier);
+
+		/* Even though this is PCI, we don't want to use shared
+		 * interrupts. Sharing host devices with guest-assigned devices
+		 * on the same interrupt line is not a happy situation: there
+		 * are going to be long delays in accepting, acking, etc.
+		 */
+		if (request_irq(match->host_irq, kvm_assigned_dev_intr, 0,
+				"kvm_assigned_device", (void *)match)) {
+			printk(KERN_INFO "%s: couldn't allocate irq for pv "
+			       "device\n", __func__);
+			r = -EIO;
+			goto out;
+		}
+	}
+
+	match->irq_requested = true;
+out:
+	mutex_unlock(&kvm->lock);
+	return r;
+}
+
+static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
+				      struct kvm_assigned_pci_dev *assigned_dev)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *match;
+	struct pci_dev *dev;
+
+	mutex_lock(&kvm->lock);
+
+	match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      assigned_dev->assigned_dev_id);
+	if (match) {
+		/* device already assigned */
+		r = -EINVAL;
+		goto out;
+	}
+
+	match = kzalloc(sizeof(struct kvm_assigned_dev_kernel), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __func__);
+		r = -ENOMEM;
+		goto out;
+	}
+	dev = pci_get_bus_and_slot(assigned_dev->busnr,
+				   assigned_dev->devfn);
+	if (!dev) {
+		printk(KERN_INFO "%s: host device not found\n", __func__);
+		r = -EINVAL;
+		goto out_free;
+	}
+	if (pci_enable_device(dev)) {
+		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
+		r = -EBUSY;
+		goto out_put;
+	}
+	r = pci_request_regions(dev, "kvm_assigned_device");
+	if (r) {
+		printk(KERN_INFO "%s: Could not get access to device regions\n",
+		       __func__);
+		goto out_disable;
+	}
+	match->assigned_dev_id = assigned_dev->assigned_dev_id;
+	match->host_busnr = assigned_dev->busnr;
+	match->host_devfn = assigned_dev->devfn;
+	match->dev = dev;
+
+	match->kvm = kvm;
+
+	list_add(&match->list, &kvm->arch.assigned_dev_head);
+
+out:
+	mutex_unlock(&kvm->lock);
+	return r;
+out_disable:
+	pci_disable_device(dev);
+out_put:
+	pci_dev_put(dev);
+out_free:
+	kfree(match);
+	mutex_unlock(&kvm->lock);
+	return r;
+}
+
+static void kvm_free_assigned_devices(struct kvm *kvm)
+{
+	struct list_head *ptr, *ptr2;
+	struct kvm_assigned_dev_kernel *assigned_dev;
+
+	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
+		assigned_dev = list_entry(ptr,
+					  struct kvm_assigned_dev_kernel,
+					  list);
+
+		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
+			free_irq(assigned_dev->host_irq,
+				 (void *)assigned_dev);
+
+			kvm_unregister_irq_ack_notifier(kvm,
+							&assigned_dev->
+							ack_notifier);
+		}
+
+		if (cancel_work_sync(&assigned_dev->interrupt_work))
+			/* We had pending work. That means we will have to take
+			 * care of kvm_put_kvm.
+			 */
+			kvm_put_kvm(kvm);
+
+		pci_release_regions(assigned_dev->dev);
+		pci_disable_device(assigned_dev->dev);
+		pci_dev_put(assigned_dev->dev);
+
+		list_del(&assigned_dev->list);
+		kfree(assigned_dev);
+	}
+}
 
 unsigned long segment_base(u16 selector)
 {
@@ -1763,6 +1982,28 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_ASSIGN_PCI_DEVICE: {
+		struct kvm_assigned_pci_dev assigned_dev;
+
+		r = -EFAULT;
+		if (copy_from_user(&assigned_dev, argp, sizeof assigned_dev))
+			goto out;
+		r = kvm_vm_ioctl_assign_device(kvm, &assigned_dev);
+		if (r)
+			goto out;
+		break;
+	}
+	case KVM_ASSIGN_IRQ: {
+		struct kvm_assigned_irq assigned_irq;
+
+		r = -EFAULT;
+		if (copy_from_user(&assigned_irq, argp, sizeof assigned_irq))
+			goto out;
+		r = kvm_vm_ioctl_assign_irq(kvm, &assigned_irq);
+		if (r)
+			goto out;
+		break;
+	}
 	case KVM_GET_PIT: {
 		struct kvm_pit_state ps;
 		r = -EFAULT;
@@ -3942,6 +4183,7 @@ struct  kvm *kvm_arch_create_vm(void)
 		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 
 	return kvm;
 }
@@ -3974,6 +4216,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	kvm_free_assigned_devices(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 1b9d335..ed8f561 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -325,6 +325,21 @@ struct kvm_irq_ack_notifier {
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+struct kvm_assigned_dev_kernel {
+	struct kvm_irq_ack_notifier ack_notifier;
+	struct work_struct interrupt_work;
+	struct list_head list;
+	struct kvm_assigned_pci_dev assigned_dev;
+	int assigned_dev_id;
+	int host_busnr;
+	int host_devfn;
+	int host_irq;
+	int guest_irq;
+	int irq_requested;
+	struct pci_dev *dev;
+	struct kvm *kvm;
+};
+
 struct kvm_arch{
 	int naliases;
 	struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -337,6 +352,7 @@ struct kvm_arch{
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct list_head assigned_dev_head;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6edba45..70a896a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -382,6 +382,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_PV_MMU 13
 #define KVM_CAP_MP_STATE 14
 #define KVM_CAP_COALESCED_MMIO 15
+#define KVM_CAP_DEVICE_ASSIGNMENT 16
 
 /*
  * ioctls for VM fds
@@ -411,6 +412,10 @@ struct kvm_trace_rec {
 			_IOW(KVMIO,  0x67, struct kvm_coalesced_mmio_zone)
 #define KVM_UNREGISTER_COALESCED_MMIO \
 			_IOW(KVMIO,  0x68, struct kvm_coalesced_mmio_zone)
+#define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
+				   struct kvm_assigned_pci_dev)
+#define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
+			    struct kvm_assigned_irq)
 
 /*
  * ioctls for vcpu fds
@@ -475,4 +480,18 @@ struct kvm_trace_rec {
 #define KVM_TRC_STLB_INVAL       (KVM_TRC_HANDLER + 0x18)
 #define KVM_TRC_PPC_INSTR        (KVM_TRC_HANDLER + 0x19)
 
+struct kvm_assigned_pci_dev {
+	__u32 assigned_dev_id;
+	__u32 busnr;
+	__u32 devfn;
+	__u32 flags;
+};
+
+struct kvm_assigned_irq {
+	__u32 assigned_dev_id;
+	__u32 host_irq;
+	__u32 guest_irq;
+	__u32 flags;
+};
+
 #endif
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] VT-d: changes to support KVM
  2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
@ 2008-07-28 16:26       ` Ben-Ami Yassour
  2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
  2008-07-29  9:19       ` [PATCH 3/5] KVM: pci device assignment Amit Shah
  2008-07-29 12:27       ` Ben-Ami Yassour
  2 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony, Kay, Allen M

From: Kay, Allen M <allen.m.kay@intel.com>

This patch extends the VT-d driver to support KVM

[Ben: fixed memory pinning]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 drivers/pci/dmar.c                           |    4 +-
 drivers/pci/intel-iommu.c                    |  117 +++++++++++++++++++++++++-
 drivers/pci/iova.c                           |    2 +-
 {drivers/pci => include/linux}/intel-iommu.h |   11 +++
 {drivers/pci => include/linux}/iova.h        |    0 
 5 files changed, 127 insertions(+), 7 deletions(-)
 rename {drivers/pci => include/linux}/intel-iommu.h (94%)
 rename {drivers/pci => include/linux}/iova.h (100%)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 8bf86ae..1df28ea 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -26,8 +26,8 @@
 
 #include <linux/pci.h>
 #include <linux/dmar.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 
 #undef PREFIX
 #define PREFIX "DMAR:"
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 8d0e60a..1eefc60 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -20,6 +20,7 @@
  * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  */
 
+#undef DEBUG
 #include <linux/init.h>
 #include <linux/bitmap.h>
 #include <linux/debugfs.h>
@@ -33,8 +34,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/mempool.h>
 #include <linux/timer.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 #include <asm/proto.h> /* force_iommu in this header in x86-64*/
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
@@ -160,7 +161,7 @@ static inline void *alloc_domain_mem(void)
 	return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
-static inline void free_domain_mem(void *vaddr)
+static void free_domain_mem(void *vaddr)
 {
 	kmem_cache_free(iommu_domain_cache, vaddr);
 }
@@ -1414,7 +1415,7 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
  * find_domain
  * Note: we use struct pci_dev->dev.archdata.iommu stores the info
  */
-struct dmar_domain *
+static struct dmar_domain *
 find_domain(struct pci_dev *pdev)
 {
 	struct device_domain_info *info;
@@ -2430,3 +2431,111 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
+void intel_iommu_domain_exit(struct dmar_domain *domain)
+{
+	u64 end;
+
+	/* Domain 0 is reserved, so dont process it */
+	if (!domain)
+		return;
+
+	end = DOMAIN_MAX_ADDR(domain->gaw);
+	end = end & (~PAGE_MASK_4K);
+
+	/* clear ptes */
+	dma_pte_clear_range(domain, 0, end);
+
+	/* free page tables */
+	dma_pte_free_pagetable(domain, 0, end);
+
+	iommu_free_domain(domain);
+	free_domain_mem(domain);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_exit);
+
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev)
+{
+	struct dmar_drhd_unit *drhd;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+
+	drhd = dmar_find_matched_drhd_unit(pdev);
+	if (!drhd) {
+		printk(KERN_ERR "intel_iommu_domain_alloc: drhd == NULL\n");
+		return NULL;
+	}
+
+	iommu = drhd->iommu;
+	if (!iommu) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: iommu == NULL\n");
+		return NULL;
+	}
+	domain = iommu_alloc_domain(iommu);
+	if (!domain) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain == NULL\n");
+		return NULL;
+	}
+	if (domain_init(domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain_init() failed\n");
+		intel_iommu_domain_exit(domain);
+		return NULL;
+	}
+	return domain;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_alloc);
+
+int intel_iommu_context_mapping(
+	struct dmar_domain *domain, struct pci_dev *pdev)
+{
+	int rc;
+	rc = domain_context_mapping(domain, pdev);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_context_mapping);
+
+int intel_iommu_page_mapping(
+	struct dmar_domain *domain, dma_addr_t iova,
+	u64 hpa, size_t size, int prot)
+{
+	int rc;
+	rc = domain_page_mapping(domain, iova, hpa, size, prot);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_page_mapping);
+
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	detach_domain_for_dev(domain, bus, devfn);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_detach_dev);
+
+struct dmar_domain *
+intel_iommu_find_domain(struct pci_dev *pdev)
+{
+	return find_domain(pdev);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_find_domain);
+
+int intel_iommu_found(void)
+{
+	return g_num_of_iommus;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_found);
+
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64 iova)
+{
+	struct dma_pte *pte;
+	u64 pfn;
+
+	pfn = 0;
+	pte = addr_to_dma_pte(domain, iova);
+
+	if (pte)
+		pfn = dma_pte_addr(*pte);
+
+	return pfn >> PAGE_SHIFT_4K;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_iova_to_pfn);
diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
index 3ef4ac0..2287116 100644
--- a/drivers/pci/iova.c
+++ b/drivers/pci/iova.c
@@ -7,7 +7,7 @@
  * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  */
 
-#include "iova.h"
+#include <linux/iova.h>
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
diff --git a/drivers/pci/intel-iommu.h b/include/linux/intel-iommu.h
similarity index 94%
rename from drivers/pci/intel-iommu.h
rename to include/linux/intel-iommu.h
index afc0ad9..1490fc0 100644
--- a/drivers/pci/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -341,4 +341,15 @@ static inline void iommu_prepare_gfx_mapping(void)
 }
 #endif /* !CONFIG_DMAR_GFX_WA */
 
+void intel_iommu_domain_exit(struct dmar_domain *domain);
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev);
+int intel_iommu_context_mapping(struct dmar_domain *domain,
+				struct pci_dev *pdev);
+int intel_iommu_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
+			     u64 hpa, size_t size, int prot);
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
+struct dmar_domain *intel_iommu_find_domain(struct pci_dev *pdev);
+int intel_iommu_found(void);
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64 iova);
+
 #endif
diff --git a/drivers/pci/iova.h b/include/linux/iova.h
similarity index 100%
rename from drivers/pci/iova.h
rename to include/linux/iova.h
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-07-28 16:26       ` [PATCH 4/5] VT-d: changes to support KVM Ben-Ami Yassour
@ 2008-07-28 16:26         ` Ben-Ami Yassour
  2008-07-28 16:32           ` Device assignment - userspace part Ben-Ami Yassour
                             ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:26 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony, Kay, Allen M

[Ben: fixed memory pinning]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/Makefile      |    2 +-
 arch/x86/kvm/vtd.c         |  180 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c         |   12 +++
 include/asm-x86/kvm_host.h |    3 +
 include/linux/kvm_host.h   |   11 +++
 virt/kvm/kvm_main.c        |    8 ++-
 6 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kvm/vtd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d0e940b..5d9d079 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-	i8254.o
+	i8254.o vtd.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
new file mode 100644
index 0000000..1ecfdfd
--- /dev/null
+++ b/arch/x86/kvm/vtd.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Copyright IBM Corporation, 2008
+ * Author: Allen M. Kay <allen.m.kay@intel.com>
+ * Author: Weidong Han <weidong.han@intel.com>
+ * Author: Ben-Ami Yassour <benami@il.ibm.com>
+ */
+
+#include <linux/list.h>
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <linux/dmar.h>
+#include <linux/intel-iommu.h>
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm);
+
+int kvm_iommu_map_pages(struct kvm *kvm,
+			gfn_t base_gfn, unsigned long npages)
+{
+	gfn_t gfn = base_gfn;
+	pfn_t pfn;
+	int i, rc;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+	if (!domain)
+		return -EFAULT;
+
+	for (i = 0; i < npages; i++) {
+		pfn = gfn_to_pfn(kvm, gfn);
+		if (!is_mmio_pfn(pfn)) {
+			rc = intel_iommu_page_mapping(domain,
+						      gfn_to_gpa(gfn),
+						      pfn_to_hpa(pfn),
+						      PAGE_SIZE,
+						      DMA_PTE_READ |
+						      DMA_PTE_WRITE);
+			if (rc) {
+				kvm_release_pfn_clean(pfn);
+				printk(KERN_DEBUG "kvm_iommu_map_pages:"
+				       "iommu failed to map pfn=%lx\n", pfn);
+				return rc;
+			}
+		} else {
+			printk(KERN_DEBUG "kvm_iommu_map_page:"
+			       "invalid pfn=%lx\n", pfn);
+			return 0;
+		}
+
+		gfn++;
+	}
+	return 0;
+}
+
+static int kvm_iommu_map_memslots(struct kvm *kvm)
+{
+	int i, rc;
+	for (i = 0; i < kvm->nmemslots; i++) {
+		rc = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
+					 kvm->memslots[i].npages);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
+int kvm_iommu_map_guest(struct kvm *kvm,
+			struct kvm_assigned_dev_kernel *assigned_dev)
+{
+	struct pci_dev *pdev = NULL;
+	int rc;
+
+	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
+	       assigned_dev->host_busnr,
+	       PCI_SLOT(assigned_dev->host_devfn),
+	       PCI_FUNC(assigned_dev->host_devfn));
+
+	pdev = assigned_dev->dev;
+
+	if (pdev == NULL) {
+		if (kvm->arch.intel_iommu_domain) {
+			intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
+			kvm->arch.intel_iommu_domain = NULL;
+		}
+		return -ENODEV;
+	}
+
+	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
+
+	rc = kvm_iommu_map_memslots(kvm);
+	if (rc) {
+		kvm_iommu_unmap_memslots(kvm);
+		return rc;
+	}
+
+	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
+			       pdev->bus->number, pdev->devfn);
+
+	rc = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
+					 pdev);
+	if (rc) {
+		printk(KERN_ERR "Domain context map for %s failed",
+		       pci_name(pdev));
+		return rc;
+	}
+	return 0;
+}
+
+static void kvm_iommu_put_pages(struct kvm *kvm,
+			       gfn_t base_gfn, unsigned long npages)
+{
+	gfn_t gfn = base_gfn;
+	pfn_t pfn;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+	int i;
+
+	for (i = 0; i < npages; i++) {
+		pfn = (pfn_t)intel_iommu_iova_to_pfn(domain,
+						     gfn_to_gpa(gfn));
+		kvm_release_pfn_clean(pfn);
+		gfn++;
+	}
+}
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm)
+{
+	int i;
+	for (i = 0; i < kvm->nmemslots; i++) {
+		kvm_iommu_put_pages(kvm, kvm->memslots[i].base_gfn,
+				    kvm->memslots[i].npages);
+	}
+	return 0;
+}
+
+int kvm_iommu_unmap_guest(struct kvm *kvm)
+{
+	struct kvm_assigned_dev_kernel *entry;
+	struct pci_dev *pdev = NULL;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+	if (!domain)
+		return 0;
+
+	list_for_each_entry(entry, &kvm->arch.assigned_dev_head, list) {
+		printk(KERN_DEBUG "VT-d unmap: host bdf = %x:%x:%x\n",
+		       entry->host_busnr,
+		       PCI_SLOT(entry->host_devfn),
+		       PCI_FUNC(entry->host_devfn));
+
+		for_each_pci_dev(pdev) {
+			if ((pdev->bus->number == entry->host_busnr) &&
+			    (pdev->devfn == entry->host_devfn))
+				break;
+		}
+
+		if (pdev == NULL)
+			return -ENODEV;
+
+		/* detach kvm dmar domain */
+		intel_iommu_detach_dev(domain,
+				       pdev->bus->number, pdev->devfn);
+	}
+	kvm_iommu_unmap_memslots(kvm);
+	intel_iommu_domain_exit(domain);
+	return 0;
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 968f949..80ebfcd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -35,6 +35,7 @@
 #include <linux/module.h>
 #include <linux/mman.h>
 #include <linux/highmem.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/uaccess.h>
 #include <asm/msr.h>
@@ -271,9 +272,18 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
+	if (intel_iommu_found()) {
+		r = kvm_iommu_map_guest(kvm, match);
+		if (r)
+			goto out_list_del;
+	}
+
 out:
 	mutex_unlock(&kvm->lock);
 	return r;
+out_list_del:
+	list_del(&match->list);
+	pci_release_regions(dev);
 out_disable:
 	pci_disable_device(dev);
 out_put:
@@ -4216,6 +4226,8 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	if (intel_iommu_found())
+		kvm_iommu_unmap_guest(kvm);
 	kvm_free_assigned_devices(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index ed8f561..5c0917b 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -353,6 +353,7 @@ struct kvm_arch{
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head assigned_dev_head;
+	struct dmar_domain *intel_iommu_domain;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
@@ -502,6 +503,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
 
+int is_mmio_pfn(pfn_t pfn);
+
 extern bool tdp_enabled;
 
 enum emulation_result {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3798097..2e2c9eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -279,6 +279,12 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
+int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
+			unsigned long npages);
+int kvm_iommu_map_guest(struct kvm *kvm,
+			struct kvm_assigned_dev_kernel *assigned_dev);
+int kvm_iommu_unmap_guest(struct kvm *kvm);
+
 static inline void kvm_guest_enter(void)
 {
 	account_system_vtime(current);
@@ -301,6 +307,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
 	return (gpa_t)gfn << PAGE_SHIFT;
 }
 
+static inline hpa_t pfn_to_hpa(pfn_t pfn)
+{
+	return (hpa_t)pfn << PAGE_SHIFT;
+}
+
 static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu)
 {
 	set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e4db5b2..94c519b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -41,6 +41,7 @@
 #include <linux/pagemap.h>
 #include <linux/mman.h>
 #include <linux/swap.h>
+#include <linux/intel-iommu.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -76,7 +77,7 @@ static inline int valid_vcpu(int n)
 	return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
-static inline int is_mmio_pfn(pfn_t pfn)
+inline int is_mmio_pfn(pfn_t pfn)
 {
 	if (pfn_valid(pfn))
 		return PageReserved(pfn_to_page(pfn));
@@ -431,6 +432,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	}
 
 	kvm_free_physmem_slot(&old, &new);
+
+	/* map the pages in iommu page table */
+	if (intel_iommu_found())
+		kvm_iommu_map_pages(kvm, base_gfn, npages);
+
 	return 0;
 
 out_free:
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Device assignment - userspace part
  2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
@ 2008-07-28 16:32           ` Ben-Ami Yassour
  2008-07-28 16:32             ` [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest Ben-Ami Yassour
  2008-07-29  7:28           ` [PATCH 5/5] This patch extends the VT-d driver to support KVM Yang, Sheng
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:32 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, muli, benami, weidong.han, anthony

Follwing is the patch for the userspace part



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
  2008-07-28 16:32           ` Device assignment - userspace part Ben-Ami Yassour
@ 2008-07-28 16:32             ` Ben-Ami Yassour
  2008-08-01  3:09               ` Han, Weidong
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-28 16:32 UTC (permalink / raw)
  To: avi
  Cc: amit.shah, kvm, muli, benami, weidong.han, anthony, Nir Peleg,
	Glauber de Oliveira Costa

Based on a patch from: Amit Shah <amit.shah@qumranet.com>

We can assign a device from the host machine to a guest. The
original code comes from Neocleus.

A new command-line option, -pcidevice is added.
For example, to invoke it for an Ethernet device sitting at
PCI bus:dev.fn 04:08.0 with host IRQ 18, use this:

        -pcidevice Ethernet/04:08.0

The host ethernet driver is to be removed before doing assigning
the device to a guest. If not, the device assignment fails but
the guest continues without the assignment.

If kvm uses the in-kernel irqchip, interrupts are routed to the
guest via the kvm module (accompanied kernel changes are
necessary).

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Nir Peleg <nir@tutis.com>
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 libkvm/libkvm-x86.c         |   13 +
 libkvm/libkvm.h             |   26 ++
 qemu/Makefile.target        |    1 +
 qemu/hw/device-assignment.c |  595 +++++++++++++++++++++++++++++++++++++++++++
 qemu/hw/device-assignment.h |   95 +++++++
 qemu/hw/isa.h               |    2 +
 qemu/hw/pc.c                |    9 +
 qemu/hw/pci.c               |   12 +
 qemu/hw/pci.h               |    1 +
 qemu/hw/piix_pci.c          |   19 ++
 qemu/vl.c                   |   17 ++
 11 files changed, 790 insertions(+), 0 deletions(-)
 create mode 100644 qemu/hw/device-assignment.c
 create mode 100644 qemu/hw/device-assignment.h

diff --git a/libkvm/libkvm-x86.c b/libkvm/libkvm-x86.c
index ea97bdd..8f7a6fb 100644
--- a/libkvm/libkvm-x86.c
+++ b/libkvm/libkvm-x86.c
@@ -126,6 +126,19 @@ static int kvm_init_tss(kvm_context_t kvm)
 	return 0;
 }
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+int kvm_assign_pci_device(kvm_context_t kvm,
+			  struct kvm_assigned_pci_dev *assigned_dev)
+{
+	return ioctl(kvm->vm_fd, KVM_ASSIGN_PCI_DEVICE, assigned_dev);
+}
+int kvm_assign_irq(kvm_context_t kvm,
+			  struct kvm_assigned_irq *assigned_irq)
+{
+	return ioctl(kvm->vm_fd, KVM_ASSIGN_IRQ, assigned_irq);
+}
+#endif
+
 int kvm_arch_create_default_phys_mem(kvm_context_t kvm,
 				       unsigned long phys_mem_bytes,
 				       void **vm_mem)
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 9f06fcc..7929d51 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -647,6 +647,32 @@ int kvm_disable_tpr_access_reporting(kvm_context_t kvm, int vcpu);
 
 int kvm_enable_vapic(kvm_context_t kvm, int vcpu, uint64_t vapic);
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+/*!
+ * \brief Notifies host kernel aboud a PCI device assigned to guest
+ *
+ * Used for PCI device assignment, this function notifies the host
+ * kernel about the assigning of the physical PCI device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_dev Parameters, like bus, devfn number, etc
+ */
+int kvm_assign_pci_device(kvm_context_t kvm,
+			  struct kvm_assigned_pci_dev *assigned_dev);
+
+/*!
+ * \brief Notifies host kernel about changes to a irq assignment
+ *
+ * Used for PCI device assignment, this function notifies the host
+ * kernel about the assigning of the irq for an assigned physical
+ * PCI device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
+ */
+int kvm_assign_irq(kvm_context_t kvm,
+		   struct kvm_assigned_irq *assigned_irq);
+#endif
 #endif
 
 #if defined(__s390__)
diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index f4df081..32aa09a 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -602,6 +602,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
+OBJS+= device-assignment.o
 ifeq ($(USE_KVM_PIT), 1)
 OBJS+= i8254-kvm.o
 endif
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
new file mode 100644
index 0000000..44120c0
--- /dev/null
+++ b/qemu/hw/device-assignment.c
@@ -0,0 +1,595 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ *
+ *  Pass a PCI device from the host to a guest VM.
+ *
+ *  Adapted for KVM by Qumranet.
+ *
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ */
+#include <stdio.h>
+#include <pthread.h>
+#include <sys/io.h>
+#include <sys/ioctl.h>
+#include <linux/types.h>
+
+/* From linux/ioport.h */
+#define IORESOURCE_IO		0x00000100	/* Resource type */
+#define IORESOURCE_MEM		0x00000200
+#define IORESOURCE_IRQ		0x00000400
+#define IORESOURCE_DMA		0x00000800
+#define IORESOURCE_PREFETCH	0x00001000	/* No side effects */
+
+#include "device-assignment.h"
+#include "irq.h"
+
+#include "qemu-kvm.h"
+#include <linux/kvm_para.h>
+extern FILE *logfile;
+
+/* #define DEVICE_ASSIGNMENT_DEBUG */
+
+#ifdef DEVICE_ASSIGNMENT_DEBUG
+#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ## args)
+#else
+#define DEBUG(fmt, args...)
+#endif
+
+#define assigned_dev_ioport_write(suffix)				\
+static void assigned_dev_ioport_write##suffix(void *opaque,             \
+					      uint32_t addr,            \
+                                              uint32_t value)		\
+{									\
+	assigned_dev_region_t *r_access =                               \
+                  (assigned_dev_region_t *)opaque;			\
+	uint32_t r_pio = (unsigned long)r_access->r_virtbase		\
+		+ (addr - r_access->e_physbase);			\
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {		\
+		fprintf(logfile, "assigned_dev_ioport_write" #suffix	\
+			": r_pio=%08x e_physbase=%08x"			\
+			" r_virtbase=%08lx value=%08x\n",		\
+			r_pio, (int)r_access->e_physbase,		\
+			(unsigned long)r_access->r_virtbase, value);	\
+	}								\
+	iopl(3);\
+	out##suffix(value, r_pio);					\
+}
+assigned_dev_ioport_write(b)
+assigned_dev_ioport_write(w)
+assigned_dev_ioport_write(l)
+
+#define assigned_dev_ioport_read(suffix)				\
+static uint32_t assigned_dev_ioport_read##suffix(void *opaque,          \
+                                                 uint32_t addr)  	\
+{									\
+	assigned_dev_region_t *r_access =                               \
+                    (assigned_dev_region_t *)opaque;     		\
+	uint32_t r_pio = (addr - r_access->e_physbase)			\
+		+ (unsigned long)r_access->r_virtbase;			\
+		uint32_t value = in##suffix(r_pio);			\
+		if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {	\
+			fprintf(logfile, "assigned_dev_ioport_read"     \
+                                #suffix	": r_pio=%08x "                 \
+                                "e_physbase=%08x r_virtbase=%08lx "	\
+				"value=%08x\n",				\
+				r_pio, (int)r_access->e_physbase,	\
+				(unsigned long)r_access->r_virtbase,    \
+                                value); 				\
+		}							\
+		return value;						\
+}
+
+assigned_dev_ioport_read(b)
+assigned_dev_ioport_read(w)
+assigned_dev_ioport_read(l)
+
+void assigned_dev_iomem_map(PCIDevice * pci_dev, int region_num,
+			    uint32_t e_phys, uint32_t e_size, int type)
+{
+	assigned_dev_t *r_dev = (assigned_dev_t *) pci_dev;
+	assigned_dev_region_t *region = &r_dev->v_addrs[region_num];
+	int first_map = (region->e_size == 0);
+	int ret = 0;
+
+	DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
+	      e_phys, r_dev->v_addrs[region_num].r_virtbase, type, e_size,
+	      region_num);
+
+	region->e_physbase = e_phys;
+	region->e_size = e_size;
+
+	if (!first_map)
+		kvm_destroy_phys_mem(kvm_context, e_phys, e_size);
+
+	if (e_size > 0)
+		ret = kvm_register_userspace_phys_mem(kvm_context,
+						      e_phys,
+						      region->r_virtbase,
+						      e_size,
+						      0);
+	if (ret != 0)
+		fprintf(logfile, "Error: create new mapping failed\n");
+}
+
+static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
+				    uint32_t addr, uint32_t size, int type)
+{
+	assigned_dev_t *r_dev = (assigned_dev_t *) pci_dev;
+	int i;
+	uint32_t ((*rf[])(void *, uint32_t)) =  { assigned_dev_ioport_readb,
+						  assigned_dev_ioport_readw,
+						  assigned_dev_ioport_readl
+	};
+	void ((*wf[])(void *, uint32_t, uint32_t)) =
+		{ assigned_dev_ioport_writeb,
+		  assigned_dev_ioport_writew,
+		  assigned_dev_ioport_writel
+		};
+
+	r_dev->v_addrs[region_num].e_physbase = addr;
+	DEBUG("assigned_dev_ioport_map: address=0x%x type=0x%x len=%d"
+	      "region_num=%d \n", addr, type, size, region_num);
+
+	for (i = 0; i < 3; i++) {
+		register_ioport_write(addr, size, 1<<i, wf[i],
+				      (void *) (r_dev->v_addrs + region_num));
+		register_ioport_read(addr, size, 1<<i, rf[i],
+				     (void *) (r_dev->v_addrs + region_num));
+	}
+}
+
+static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
+					  uint32_t val, int len)
+{
+	int fd, r;
+
+	DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+	      ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7), (uint16_t) address,
+	      val, len);
+
+	if (address == 0x4)
+		pci_default_write_config(d, address, val, len);
+
+	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+	    address == 0x3c || address == 0x3d) {
+		/* used for update-mappings (BAR emulation) */
+		pci_default_write_config(d, address, val, len);
+		return;
+	}
+
+	DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
+	      ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7), (uint16_t) address,
+	      val, len);
+	fd = ((assigned_dev_t *)d)->real_device.config_fd;
+	lseek(fd, address, SEEK_SET);
+again:
+	r = write(fd, &val, len);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: write failed, errno = %d\n", __func__,
+			errno);
+	}
+}
+
+static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
+					     int len)
+{
+	uint32_t val = 0;
+	int fd, r;
+
+	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+	    address == 0x3c || address == 0x3d) {
+		val = pci_default_read_config(d, address, len);
+		DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+		      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val,
+		      len);
+		return val;
+	}
+
+	/* vga specific, remove later */
+	if (address == 0xFC)
+		goto do_log;
+
+	fd = ((assigned_dev_t *)d)->real_device.config_fd;
+	lseek(fd, address, SEEK_SET);
+again:
+	r = read(fd, &val, len);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: read failed, errno = %d\n", __func__,
+			errno);
+	}
+
+do_log:
+	DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+	      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
+
+	/* kill the special capabilities */
+	if (address == 4 && len == 4)
+		val &= ~0x100000;
+	else if (address == 6)
+		val &= ~0x10;
+
+	return val;
+}
+
+static int assigned_dev_register_regions(pci_region_t *io_regions,
+					 unsigned long regions_num,
+					 assigned_dev_t *pci_dev)
+{
+	uint32_t i;
+	pci_region_t *cur_region = io_regions;
+
+	for (i = 0; i < regions_num; i++, cur_region++) {
+		if (!cur_region->valid)
+			continue;
+#ifdef DEVICE_ASSIGNMENT_DEBUG
+		pci_dev->v_addrs[i].debug |= DEVICE_ASSIGNMENT_DEBUG_MMIO |
+			DEVICE_ASSIGNMENT_DEBUG_PIO;
+#endif
+		pci_dev->v_addrs[i].num = i;
+
+		/* handle memory io regions */
+		if (cur_region->type & IORESOURCE_MEM) {
+			int t = cur_region->type & IORESOURCE_PREFETCH
+				? PCI_ADDRESS_SPACE_MEM_PREFETCH
+				: PCI_ADDRESS_SPACE_MEM;
+
+			/* map physical memory */
+			pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
+			pci_dev->v_addrs[i].r_virtbase =
+				mmap(NULL,
+				     (cur_region->size + 0xFFF) & 0xFFFFF000,
+				     PROT_WRITE | PROT_READ, MAP_SHARED,
+				     cur_region->resource_fd, (off_t) 0);
+
+			if ((void *) -1 == pci_dev->v_addrs[i].r_virtbase) {
+				fprintf(stderr, "Error: Couldn't mmap 0x%x!\n",
+					(uint32_t) (cur_region->base_addr));
+				return -1;
+			}
+			pci_dev->v_addrs[i].r_size = cur_region->size;
+			pci_dev->v_addrs[i].e_size = 0;
+
+			/* add offset */
+			pci_dev->v_addrs[i].r_virtbase +=
+				(cur_region->base_addr & 0xFFF);
+
+			pci_register_io_region((PCIDevice *) pci_dev, i,
+					       cur_region->size, t,
+					       assigned_dev_iomem_map);
+
+			continue;
+		}
+		/* handle port io regions */
+
+		pci_register_io_region((PCIDevice *) pci_dev, i,
+				       cur_region->size, PCI_ADDRESS_SPACE_IO,
+				       assigned_dev_ioport_map);
+
+		pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
+		pci_dev->v_addrs[i].r_virtbase =
+			(void *)(long)cur_region->base_addr;
+		/* not relevant for port io */
+		pci_dev->v_addrs[i].memory_index = 0;
+	}
+
+	/* success */
+	return 0;
+
+}
+
+static int get_real_device(assigned_dev_t *pci_dev, uint8_t r_bus,
+			   uint8_t r_dev, uint8_t r_func)
+{
+	char dir[128], name[128], comp[16];
+	int fd, r = 0;
+	FILE *f;
+	unsigned long long start, end, size, flags;
+	pci_region_t *rp;
+	pci_dev_t *dev = &pci_dev->real_device;
+
+	dev->region_number = 0;
+
+	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
+		r_bus, r_dev, r_func);
+	strcpy(name, dir);
+	strcat(name, "config");
+	fd = open(name, O_RDWR);
+	if (fd == -1) {
+		fprintf(stderr, "%s: %m\n", name);
+		return 1;
+	}
+	dev->config_fd = fd;
+again:
+	r = read(fd, pci_dev->dev.config, sizeof pci_dev->dev.config);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: read failed, errno = %d\n", __func__,
+			errno);
+	}
+
+	strcpy(name, dir);
+	strcat(name, "resource");
+	f = fopen(name, "r");
+	if (f == NULL) {
+		fprintf(stderr, "%s: %m\n", name);
+		return 1;
+	}
+
+	for (r = 0; fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) == 3;
+	     r++) {
+		rp = dev->regions + r;
+		rp->valid = 0;
+		size = end - start + 1;
+		flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
+		if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0)
+			continue;
+		if (flags & IORESOURCE_MEM) {
+			flags &= ~IORESOURCE_IO;
+			sprintf(comp, "resource%d", r);
+			strcpy(name, dir);
+			strcat(name, comp);
+			fd = open(name, O_RDWR);
+			if (fd == -1)
+				continue;		/* probably ROM */
+			rp->resource_fd = fd;
+		} else
+			flags &= ~IORESOURCE_PREFETCH;
+
+		rp->type = flags;
+		rp->valid = 1;
+		rp->base_addr = start;
+		rp->size = size;
+		DEBUG("region %d size %d start 0x%x type %d "
+		      "resource_fd %d\n", r, rp->size, start, rp->type,
+		      rp->resource_fd);
+	}
+	fclose(f);
+
+	dev->region_number = r;
+	return 0;
+}
+
+static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
+{
+	return (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
+	
+static assigned_dev_t *register_real_device(PCIBus *e_bus,
+					    const char *e_dev_name,
+					    int e_devfn, uint8_t r_bus,
+					    uint8_t r_dev,
+					    uint8_t r_func)
+{
+	int rc;
+	assigned_dev_t *pci_dev;
+	uint8_t e_device, e_intx;
+
+	DEBUG("register_real_device: Registering real physical "
+	      "device %s (devfn=0x%x)\n", e_dev_name, e_devfn);
+	
+	pci_dev = (assigned_dev_t *)
+		pci_register_device(e_bus, e_dev_name,
+				    sizeof(assigned_dev_t), e_devfn,
+				    assigned_dev_pci_read_config,
+				    assigned_dev_pci_write_config);
+
+	if (NULL == pci_dev) {
+		fprintf(stderr, "register_real_device: Error: Couldn't "
+			"register real device %s\n", e_dev_name);
+		return NULL;
+	}
+	if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
+		fprintf(stderr, "register_real_device: Error: Couldn't get "
+			"real device (%s)!\n", e_dev_name);
+		return NULL;
+	}
+
+	/* handle real device's MMIO/PIO BARs */
+	if (assigned_dev_register_regions(pci_dev->real_device.regions,
+					  pci_dev->real_device.region_number,
+					  pci_dev))
+		return NULL;
+
+	/* handle interrupt routing */
+	e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
+	e_intx = pci_dev->dev.config[0x3d] - 1;
+	pci_dev->intpin = e_intx;
+	pci_dev->run = 0;
+	pci_dev->girq = 0;
+	pci_dev->h_busnr = r_bus;
+	pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
+
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+	if (kvm_enabled()) {
+		struct kvm_assigned_pci_dev assigned_dev_data;
+		struct kvm_assigned_irq assigned_irq_data;
+
+		memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
+		assigned_dev_data.assigned_dev_id  =
+			calc_assigned_dev_id(pci_dev->h_busnr,
+					     (uint32_t)pci_dev->h_devfn);
+		assigned_dev_data.busnr = pci_dev->h_busnr;	
+		assigned_dev_data.devfn = pci_dev->h_devfn;
+		rc = kvm_assign_pci_device(kvm_context,
+					   &assigned_dev_data);
+		if (rc < 0) {
+			fprintf(stderr, "Could not notify kernel about "
+				"assigned device\n");
+			perror("pt-ioctl");
+			return NULL;
+		}
+		assigned_irq_data.assigned_dev_id  =
+			calc_assigned_dev_id(pci_dev->h_busnr,
+					     (uint8_t)pci_dev->h_devfn);
+		assigned_irq_data.guest_irq = 0;
+		assigned_irq_data.host_irq =
+			pci_dev->real_device.irq;
+		rc = kvm_assign_irq(kvm_context,
+				    &assigned_irq_data);
+		if (rc < 0) {
+			fprintf(stderr, "Could not notify kernel about "
+				"irq\n");
+			perror("pt-ioctl");
+			return NULL;
+		}
+		printf("%s:%d\n", __func__, __LINE__);
+
+	}
+#endif
+
+	fprintf(logfile, "Registered host PCI device %02x:%02x.%1x "
+		"as guest device %02x:%02x.%1x\n",
+		r_bus, r_dev, r_func,
+		pci_bus_num(e_bus), e_device, r_func);
+
+	return pci_dev;
+}
+
+#define	MAX_ASSIGNED_DEVS 4
+struct {
+	char name[128];
+	int bus;
+	int dev;
+	int func;
+	assigned_dev_t *assigned_dev;
+} assigned_devices[MAX_ASSIGNED_DEVS];
+
+int num_assigned_devices;
+extern int piix_get_irq(int);
+
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+/* The pci config space got updated. Check if irq numbers have changed
+ * for our devices
+ */
+void assigned_dev_update_irq(PCIDevice *d)
+{
+	int i, irq, r;
+	assigned_dev_t *assigned_dev;
+
+	for (i = 0; i < num_assigned_devices; i++) {
+		assigned_dev = assigned_devices[i].assigned_dev;
+		if (assigned_dev == NULL)
+			continue;
+
+		irq = pci_map_irq(&assigned_dev->dev, assigned_dev->intpin);
+		irq = piix_get_irq(irq);
+		if (irq != assigned_dev->girq) {
+			struct kvm_assigned_irq assigned_irq_data;
+			
+			memset(&assigned_irq_data, 0,
+			       sizeof(assigned_irq_data));
+			assigned_irq_data.assigned_dev_id  =
+				calc_assigned_dev_id(assigned_dev->h_busnr,
+						     (uint8_t)assigned_dev->h_devfn);
+			assigned_irq_data.guest_irq = irq;
+			assigned_irq_data.host_irq =
+				assigned_dev->real_device.irq;
+			r = kvm_assign_irq(kvm_context,
+					   &assigned_irq_data);
+			if (r < 0) {
+				perror("assigned_dev_update_irq");
+				continue;
+			}
+			assigned_dev->girq = irq;
+		}
+	}
+}
+#endif
+
+int init_device_assignment(void)
+{
+	/* Do we have any devices to be assigned? */
+	if (num_assigned_devices == 0)
+		return -1;
+
+	iopl(3);
+
+	return 0;
+}
+
+int init_assigned_device(PCIBus *bus, int *index)
+{
+	assigned_dev_t *dev = NULL;
+	int i, ret = 0;
+
+	if (*index == -1) {
+		if (init_device_assignment() < 0)
+			return -1;
+
+		*index = num_assigned_devices - 1;
+	}
+	i = *index;
+
+	dev = register_real_device(bus, assigned_devices[i].name, -1,
+				   assigned_devices[i].bus,
+				   assigned_devices[i].dev,
+				   assigned_devices[i].func);
+	if (dev == NULL) {
+		fprintf(stderr, "Error: Couldn't register device %s\n",
+			assigned_devices[i].name);
+		ret = -1;
+	}
+	assigned_devices[i].assigned_dev = dev;
+
+	--*index;
+	return ret;
+}
+
+void add_assigned_device(const char *arg)
+{
+	/* name/bus:dev.func */
+	char *cp, *cp1;
+
+	if (num_assigned_devices >= MAX_ASSIGNED_DEVS) {
+		fprintf(stderr, "Too many assigned devices (max %d)\n",
+			MAX_ASSIGNED_DEVS);
+		return;
+	}
+	strcpy(assigned_devices[num_assigned_devices].name, arg);
+	cp = strchr(assigned_devices[num_assigned_devices].name, '/');
+	if (cp == NULL)
+		goto bad;
+	*cp++ = 0;
+
+	assigned_devices[num_assigned_devices].bus = strtoul(cp, &cp1, 16);
+	if (*cp1 != ':')
+		goto bad;
+	cp = cp1 + 1;
+
+	assigned_devices[num_assigned_devices].dev = strtoul(cp, &cp1, 16);
+	if (*cp1 != '.')
+		goto bad;
+	cp = cp1 + 1;
+
+	assigned_devices[num_assigned_devices].func = strtoul(cp, &cp1, 16);
+	if (*cp1 != 0)
+		goto bad;
+
+	num_assigned_devices++;
+	return;
+bad:
+	fprintf(stderr, "assigned device arg (%s) not in the form of "
+		"name/bus:dev.func\n", arg);
+}
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
new file mode 100644
index 0000000..f80a1d5
--- /dev/null
+++ b/qemu/hw/device-assignment.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ *  Data structures for storing PCI state
+ *
+ *  Adapted to kvm by Qumranet
+ *
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ */
+
+#ifndef __DEVICE_ASSIGNMENT_H__
+#define __DEVICE_ASSIGNMENT_H__
+
+#include <sys/mman.h>
+#include "qemu-common.h"
+#include "pci.h"
+#include <linux/types.h>
+
+#define DEVICE_ASSIGNMENT_DEBUG_PIO	(0x01)
+#define DEVICE_ASSIGNMENT_DEBUG_MMIO	(0x02)
+
+/* From include/linux/pci.h in the kernel sources */
+#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
+
+typedef uint32_t pciaddr_t;
+
+#define MAX_IO_REGIONS			(6)
+
+typedef struct pci_region_s {
+	int type;	/* Memory or port I/O */
+	int valid;
+	pciaddr_t base_addr;
+	pciaddr_t size;		/* size of the region */
+	int resource_fd;
+} pci_region_t;
+
+typedef struct pci_dev_s {
+	uint8_t bus, dev, func;	/* Bus inside domain, device and function */
+	int irq;		/* IRQ number */
+	uint16_t region_number;	/* number of active regions */
+
+	/* Port I/O or MMIO Regions */
+	pci_region_t regions[MAX_IO_REGIONS];
+	int config_fd;
+} pci_dev_t;
+
+typedef struct assigned_dev_region_s {
+	target_phys_addr_t e_physbase;
+	uint32_t memory_index;
+	void *r_virtbase;	/* mmapped access address */
+	int num;		/* our index within v_addrs[] */
+	uint32_t e_size;        /* emulated size of region in bytes */
+	uint32_t r_size;        /* real size of region in bytes */
+	uint32_t debug;
+} assigned_dev_region_t;
+
+typedef struct assigned_dev_s {
+	PCIDevice dev;
+	int intpin;
+	uint8_t debug_flags;
+	assigned_dev_region_t v_addrs[PCI_NUM_REGIONS];
+	pci_dev_t real_device;
+	int run;
+	int girq;
+	char sirq[4];
+	unsigned char h_busnr;
+	unsigned int h_devfn;
+	int bound;
+} assigned_dev_t;
+
+/* Initialization functions */
+int init_assigned_device(PCIBus *bus, int *index);
+void add_assigned_device(const char *arg);
+void assigned_dev_set_vector(int irq, int vector);
+void assigned_dev_ack_mirq(int vector);
+
+#define logfile stderr
+
+#endif				/* __DEVICE_ASSIGNMENT_H__ */
diff --git a/qemu/hw/isa.h b/qemu/hw/isa.h
index 89b3004..c720f5e 100644
--- a/qemu/hw/isa.h
+++ b/qemu/hw/isa.h
@@ -1,5 +1,7 @@
 /* ISA bus */
 
+#include "hw.h"
+
 extern target_phys_addr_t isa_mem_base;
 
 int register_ioport_read(int start, int length, int size,
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 3a8269b..7b085d1 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -32,6 +32,7 @@
 #include "smbus.h"
 #include "boards.h"
 #include "console.h"
+#include "device-assignment.h"
 
 #include "qemu-kvm.h"
 
@@ -999,6 +1000,14 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
         }
     }
 
+    /* Initialize device assignment */
+    if (pci_enabled) {
+	    int r = -1;
+	    do {
+		    init_assigned_device(pci_bus, &r);
+	    } while (r >= 0);
+    }
+
     rtc_state = rtc_init(0x70, i8259[8]);
 
     qemu_register_boot_set(pc_boot_set, rtc_state);
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 92683d1..d45d0ce 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -50,6 +50,7 @@ struct PCIBus {
 
 static void pci_update_mappings(PCIDevice *d);
 static void pci_set_irq(void *opaque, int irq_num, int level);
+static void assigned_dev_update_irq(PCIDevice *d);
 
 target_phys_addr_t pci_mem_base;
 static int pci_irq_index;
@@ -453,6 +454,12 @@ void pci_default_write_config(PCIDevice *d,
         val >>= 8;
     }
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+    if (kvm_enabled() && qemu_kvm_irqchip_in_kernel() &&
+	address >= 0x60 && address <= 0x63)
+	assigned_dev_update_irq(d);
+#endif
+
     end = address + len;
     if (end > PCI_COMMAND && address < (PCI_COMMAND + 2)) {
         /* if the command register is modified, we must modify the mappings */
@@ -555,6 +562,11 @@ static void pci_set_irq(void *opaque, int irq_num, int level)
     bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0);
 }
 
+int pci_map_irq(PCIDevice *pci_dev, int pin)
+{
+	return pci_dev->bus->map_irq(pci_dev, pin);
+}
+
 /***********************************************************/
 /* monitor info on PCI */
 
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 60e4094..e11fbbf 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -81,6 +81,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
                             uint32_t size, int type,
                             PCIMapIORegionFunc *map_func);
 
+int pci_map_irq(PCIDevice *pci_dev, int pin);
 uint32_t pci_default_read_config(PCIDevice *d,
                                  uint32_t address, int len);
 void pci_default_write_config(PCIDevice *d,
diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
index 90cb3a6..9ba1d8e 100644
--- a/qemu/hw/piix_pci.c
+++ b/qemu/hw/piix_pci.c
@@ -237,6 +237,25 @@ static void piix3_set_irq(qemu_irq *pic, int irq_num, int level)
     }
 }
 
+int piix3_get_pin(int pic_irq)
+{
+    int i;
+    for (i = 0; i < 4; i++)
+	    if (piix3_dev->config[0x60+i] == pic_irq)
+		    return i;
+    return -1;
+}
+
+int piix_get_irq(int pin)
+{
+    if (piix3_dev)
+	    return piix3_dev->config[0x60+pin];
+    if (piix4_dev)
+	    return piix4_dev->config[0x60+pin];
+
+    return 0;
+}
+
 static void piix3_reset(PCIDevice *d)
 {
     uint8_t *pci_conf = d->config;
diff --git a/qemu/vl.c b/qemu/vl.c
index e1762ee..6f95386 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -37,6 +37,7 @@
 #include "qemu-char.h"
 #include "block.h"
 #include "audio/audio.h"
+#include "hw/device-assignment.h"
 #include "migration.h"
 #include "qemu-kvm.h"
 
@@ -8070,6 +8071,11 @@ static void help(int exitcode)
 #endif
 	   "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
 	   "-no-kvm-pit	    disable KVM kernel mode PIT\n"
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+	   "-pcidevice name/bus:dev.func\n"
+	   "                expose a PCI device to the guest OS.\n"
+	   "                'name' is just used for debug logs.\n"
+#endif
 #endif
 #ifdef TARGET_I386
            "-std-vga        simulate a standard VGA card with VESA Bochs Extensions\n"
@@ -8193,6 +8199,9 @@ enum {
     QEMU_OPTION_no_kvm,
     QEMU_OPTION_no_kvm_irqchip,
     QEMU_OPTION_no_kvm_pit,
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+    QEMU_OPTION_pcidevice,
+#endif
     QEMU_OPTION_no_reboot,
     QEMU_OPTION_no_shutdown,
     QEMU_OPTION_show_cursor,
@@ -8281,6 +8290,9 @@ const QEMUOption qemu_options[] = {
 #endif
     { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
     { "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+    { "pcidevice", HAS_ARG, QEMU_OPTION_pcidevice },
+#endif
 #endif
 #if defined(TARGET_PPC) || defined(TARGET_SPARC)
     { "g", 1, QEMU_OPTION_g },
@@ -9163,6 +9175,11 @@ int main(int argc, char **argv)
 		kvm_pit = 0;
 		break;
 	    }
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+	    case QEMU_OPTION_pcidevice:
+		add_assigned_device(optarg);
+		break;
+#endif
 #endif
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-28 16:26   ` [PATCH 2/5] KVM: Add irq ack notifier list Ben-Ami Yassour
  2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
@ 2008-07-29  7:14     ` Yang, Sheng
  2008-07-29  9:34       ` Amit Shah
  1 sibling, 1 reply; 36+ messages in thread
From: Yang, Sheng @ 2008-07-29  7:14 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti
  Cc: Ben-Ami Yassour, avi, amit.shah, muli, weidong.han, anthony

On Tuesday 29 July 2008 00:26:25 Ben-Ami Yassour wrote:
> FROM: Avi Kivity <avi@qumranet.com>
>
> This can be used by kvm subsystems that are interested in when
> interrupts
> are acked, for example time drift compenstation.
>
> [Ben: add notification call to the pic and ioapic]
>
> Signed-off-by: Avi Kivity <avi@qumranet.com>
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
>  arch/x86/kvm/i8259.c       |    1 +
>  arch/x86/kvm/irq.c         |   22 ++++++++++++++++++++++
>  arch/x86/kvm/irq.h         |    5 +++++
>  include/asm-x86/kvm_host.h |    7 +++++++
>  virt/kvm/ioapic.c          |    2 ++
>  5 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index 55e179a..d2a61bf 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -186,6 +186,7 @@ int kvm_pic_read_irq(struct kvm_pic *s)
>  		irq = 7;
>  		intno = s->pics[0].irq_base + irq;
>  	}
> +	kvm_notify_acked_irq(s->irq_request_opaque, irq);

It's not what I mean, sorry to not tell it clearly... Now it got 
confusing semantic.

irq_request_opaque has nothing to do with acked_irq. What I mean is 
rename irq_request_opaque to struct* kvm in struct kvm_pic, and 
modify all irq_request() calling(three of them in all) with (void 
*)kvm.

It's not a must, I just think it's clearer...

--
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
  2008-07-28 16:32           ` Device assignment - userspace part Ben-Ami Yassour
@ 2008-07-29  7:28           ` Yang, Sheng
  2008-08-05  6:01           ` Yang, Sheng
  2008-08-05 14:46           ` Han, Weidong
  3 siblings, 0 replies; 36+ messages in thread
From: Yang, Sheng @ 2008-07-29  7:28 UTC (permalink / raw)
  To: kvm
  Cc: Ben-Ami Yassour, avi, amit.shah, muli, weidong.han, anthony,
	Kay, Allen M

On Tuesday 29 July 2008 00:26:28 Ben-Ami Yassour wrote:
> [Ben: fixed memory pinning]
>
> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> Signed-off-by: Weidong Han <weidong.han@intel.com>
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
>  arch/x86/kvm/Makefile      |    2 +-
>  arch/x86/kvm/vtd.c         |  180
> ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c    
>     |   12 +++
>  include/asm-x86/kvm_host.h |    3 +
>  include/linux/kvm_host.h   |   11 +++
>  virt/kvm/kvm_main.c        |    8 ++-
>  6 files changed, 214 insertions(+), 2 deletions(-)
>  create mode 100644 arch/x86/kvm/vtd.c
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index d0e940b..5d9d079 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -11,7 +11,7 @@ endif
>  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o
> lapic.o \ -	i8254.o
> +	i8254.o vtd.o

Well, at least this depends on kernel CONFIG_DMAR, otherwise kvm 
building would be broken... 

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
  2008-07-28 16:26       ` [PATCH 4/5] VT-d: changes to support KVM Ben-Ami Yassour
@ 2008-07-29  9:19       ` Amit Shah
  2008-07-29  9:38         ` Ben-Ami Yassour
  2008-07-29 12:27       ` Ben-Ami Yassour
  2 siblings, 1 reply; 36+ messages in thread
From: Amit Shah @ 2008-07-29  9:19 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: avi, kvm, muli, weidong.han, anthony

* On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:

> +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> +				      struct kvm_assigned_pci_dev *assigned_dev)
> +{


> +	if (pci_enable_device(dev)) {
> +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> +		r = -EBUSY;
> +		goto out_put;
> +	}
> +	r = pci_request_regions(dev, "kvm_assigned_device");
> +	if (r) {
> +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> +		       __func__);
> +		goto out_disable;

Shouldn't disable here unconditionally (see my comment earlier to the previous 
patch).

> +static void kvm_free_assigned_devices(struct kvm *kvm)
> +{
> +	struct list_head *ptr, *ptr2;
> +	struct kvm_assigned_dev_kernel *assigned_dev;
> +
> +	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
> +		assigned_dev = list_entry(ptr,
> +					  struct kvm_assigned_dev_kernel,
> +					  list);
> +
> +		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
> +			free_irq(assigned_dev->host_irq,
> +				 (void *)assigned_dev);
> +
> +			kvm_unregister_irq_ack_notifier(kvm,
> +							&assigned_dev->
> +							ack_notifier);
> +		}

Unregister the notifier before freeing irq

Amit

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-29  7:14     ` [PATCH 2/5] KVM: Add irq ack notifier list Yang, Sheng
@ 2008-07-29  9:34       ` Amit Shah
  2008-07-29  9:56         ` Yang, Sheng
  0 siblings, 1 reply; 36+ messages in thread
From: Amit Shah @ 2008-07-29  9:34 UTC (permalink / raw)
  To: Yang, Sheng
  Cc: kvm, Marcelo Tosatti, Ben-Ami Yassour, avi, muli, weidong.han,
	anthony

* On Tuesday 29 Jul 2008 12:44:17 Yang, Sheng wrote:
> On Tuesday 29 July 2008 00:26:25 Ben-Ami Yassour wrote:
> > FROM: Avi Kivity <avi@qumranet.com>
> >
> > This can be used by kvm subsystems that are interested in when
> > interrupts
> > are acked, for example time drift compenstation.
> >
> > [Ben: add notification call to the pic and ioapic]
> >
> > Signed-off-by: Avi Kivity <avi@qumranet.com>
> > Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> > ---
> >  arch/x86/kvm/i8259.c       |    1 +
> >  arch/x86/kvm/irq.c         |   22 ++++++++++++++++++++++
> >  arch/x86/kvm/irq.h         |    5 +++++
> >  include/asm-x86/kvm_host.h |    7 +++++++
> >  virt/kvm/ioapic.c          |    2 ++
> >  5 files changed, 37 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> > index 55e179a..d2a61bf 100644
> > --- a/arch/x86/kvm/i8259.c
> > +++ b/arch/x86/kvm/i8259.c
> > @@ -186,6 +186,7 @@ int kvm_pic_read_irq(struct kvm_pic *s)
> >  		irq = 7;
> >  		intno = s->pics[0].irq_base + irq;
> >  	}
> > +	kvm_notify_acked_irq(s->irq_request_opaque, irq);
>
> It's not what I mean, sorry to not tell it clearly... Now it got
> confusing semantic.
>
> irq_request_opaque has nothing to do with acked_irq. What I mean is

The change here uses the irq_request_opaque field which actually is the kvm 
struct, pointed out by you, thanks for that.

> rename irq_request_opaque to struct* kvm in struct kvm_pic, and
> modify all irq_request() calling(three of them in all) with (void
> *)kvm.

'opaque' fields can be later made to point to other structures without 
changing the structure itself. This is an advantage. Will the kvm_pic struct 
be needed to change in the future? Very unlikely. So we can rename it to 
struct kvm *, however, that gives us no real benefit as against opaque (just 
readability).

Amit

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-29  9:19       ` [PATCH 3/5] KVM: pci device assignment Amit Shah
@ 2008-07-29  9:38         ` Ben-Ami Yassour
  2008-07-29 14:02           ` Avi Kivity
  2008-07-30  6:03           ` Amit Shah
  0 siblings, 2 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-29  9:38 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, Muli Ben-Yehuda, weidong.han, anthony

On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
> * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
> 
> > +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > +				      struct kvm_assigned_pci_dev *assigned_dev)
> > +{
> 
> 
> > +	if (pci_enable_device(dev)) {
> > +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> > +		r = -EBUSY;
> > +		goto out_put;
> > +	}
> > +	r = pci_request_regions(dev, "kvm_assigned_device");
> > +	if (r) {
> > +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> > +		       __func__);
> > +		goto out_disable;
> 
> Shouldn't disable here unconditionally (see my comment earlier to the previous 
> patch).
Why? the device should not be used by the host at the same time.
What is the condition that you were thinking of?

> 
> > +static void kvm_free_assigned_devices(struct kvm *kvm)
> > +{
> > +	struct list_head *ptr, *ptr2;
> > +	struct kvm_assigned_dev_kernel *assigned_dev;
> > +
> > +	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
> > +		assigned_dev = list_entry(ptr,
> > +					  struct kvm_assigned_dev_kernel,
> > +					  list);
> > +
> > +		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
> > +			free_irq(assigned_dev->host_irq,
> > +				 (void *)assigned_dev);
> > +
> > +			kvm_unregister_irq_ack_notifier(kvm,
> > +							&assigned_dev->
> > +							ack_notifier);
> > +		}
> 
> Unregister the notifier before freeing irq

I don't think that there is any importance to the order in this case,
but just in case, the order should be in reverse to the initialization
order, which is the case here.

Regards,
Ben


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-29  9:34       ` Amit Shah
@ 2008-07-29  9:56         ` Yang, Sheng
  2008-07-30  5:54           ` Amit Shah
  0 siblings, 1 reply; 36+ messages in thread
From: Yang, Sheng @ 2008-07-29  9:56 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm, Marcelo Tosatti, Ben-Ami Yassour, avi, muli, weidong.han,
	anthony

On Tuesday 29 July 2008 17:34:47 Amit Shah wrote:
> * On Tuesday 29 Jul 2008 12:44:17 Yang, Sheng wrote:
> > On Tuesday 29 July 2008 00:26:25 Ben-Ami Yassour wrote:
> > > FROM: Avi Kivity <avi@qumranet.com>
> > >
> > > This can be used by kvm subsystems that are interested in when
> > > interrupts
> > > are acked, for example time drift compenstation.
> > >
> > > [Ben: add notification call to the pic and ioapic]
> > >
> > > Signed-off-by: Avi Kivity <avi@qumranet.com>
> > > Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> > > ---
> > >  arch/x86/kvm/i8259.c       |    1 +
> > >  arch/x86/kvm/irq.c         |   22 ++++++++++++++++++++++
> > >  arch/x86/kvm/irq.h         |    5 +++++
> > >  include/asm-x86/kvm_host.h |    7 +++++++
> > >  virt/kvm/ioapic.c          |    2 ++
> > >  5 files changed, 37 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> > > index 55e179a..d2a61bf 100644
> > > --- a/arch/x86/kvm/i8259.c
> > > +++ b/arch/x86/kvm/i8259.c
> > > @@ -186,6 +186,7 @@ int kvm_pic_read_irq(struct kvm_pic *s)
> > >  		irq = 7;
> > >  		intno = s->pics[0].irq_base + irq;
> > >  	}
> > > +	kvm_notify_acked_irq(s->irq_request_opaque, irq);
> >
> > It's not what I mean, sorry to not tell it clearly... Now it got
> > confusing semantic.
> >
> > irq_request_opaque has nothing to do with acked_irq. What I mean
> > is
>
> The change here uses the irq_request_opaque field which actually is
> the kvm struct, pointed out by you, thanks for that.

Yeah, I know that... I just meant the meaning of words has no 
relevant. :)
>
> > rename irq_request_opaque to struct* kvm in struct kvm_pic, and
> > modify all irq_request() calling(three of them in all) with (void
> > *)kvm.
>
> 'opaque' fields can be later made to point to other structures
> without changing the structure itself. This is an advantage. Will
> the kvm_pic struct be needed to change in the future? Very
> unlikely. So we can rename it to struct kvm *, however, that gives
> us no real benefit as against opaque (just readability).

Yes, the readability...

I think people would be very curious about why 

void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)

got a irq_request_opaque as a parameter. It's more like a hack, which 
is not my meaning...

Anyway, it's trivial one and just a coding style. :)

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
  2008-07-28 16:26       ` [PATCH 4/5] VT-d: changes to support KVM Ben-Ami Yassour
  2008-07-29  9:19       ` [PATCH 3/5] KVM: pci device assignment Amit Shah
@ 2008-07-29 12:27       ` Ben-Ami Yassour
  2 siblings, 0 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-29 12:27 UTC (permalink / raw)
  To: avi; +Cc: amit.shah, kvm, Muli Ben-Yehuda, weidong.han, anthony

On Mon, 2008-07-28 at 19:26 +0300, Ben-Ami Yassour1 wrote:
> Based on a patch from: Amit Shah <amit.shah@qumranet.com>
> 
>  	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
>  };
>  
> +struct kvm_assigned_dev_kernel {
> +	struct kvm_irq_ack_notifier ack_notifier;
> +	struct work_struct interrupt_work;
> +	struct list_head list;
> +	struct kvm_assigned_pci_dev assigned_dev;
the field: struct kvm_assigned_pci_dev assigned_dev
is no longer used and should be deleted.
Will be updated in the next version.

Regards,
Ben



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-29  9:38         ` Ben-Ami Yassour
@ 2008-07-29 14:02           ` Avi Kivity
  2008-07-30  6:00             ` Amit Shah
  2008-07-30  6:03           ` Amit Shah
  1 sibling, 1 reply; 36+ messages in thread
From: Avi Kivity @ 2008-07-29 14:02 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: Amit Shah, kvm, Muli Ben-Yehuda, weidong.han, anthony

Ben-Ami Yassour wrote:
> On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
>   
>> * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
>>
>>     
>>> +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
>>> +				      struct kvm_assigned_pci_dev *assigned_dev)
>>> +{
>>>       
>>     
>>> +	if (pci_enable_device(dev)) {
>>> +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
>>> +		r = -EBUSY;
>>> +		goto out_put;
>>> +	}
>>> +	r = pci_request_regions(dev, "kvm_assigned_device");
>>> +	if (r) {
>>> +		printk(KERN_INFO "%s: Could not get access to device regions\n",
>>> +		       __func__);
>>> +		goto out_disable;
>>>       
>> Shouldn't disable here unconditionally (see my comment earlier to the previous 
>> patch).
>>     
> Why? the device should not be used by the host at the same time.
> What is the condition that you were thinking of?
>
>   

pci_enable_device() can succeed even if the device was already enabled, 
so Amit was probably wishing to avoid an assignment failure disabling a 
device under a driver's feet.  But I see that pci_disable_device() will 
pair with pci_enable_device() correctly (doing reference counts), so I 
think the code is correct as is.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-29  9:56         ` Yang, Sheng
@ 2008-07-30  5:54           ` Amit Shah
  2008-07-31  8:55             ` Avi Kivity
  0 siblings, 1 reply; 36+ messages in thread
From: Amit Shah @ 2008-07-30  5:54 UTC (permalink / raw)
  To: Yang, Sheng
  Cc: kvm, Marcelo Tosatti, Ben-Ami Yassour, avi, muli, weidong.han,
	anthony

* On Tuesday 29 July 2008 15:26:45 Yang, Sheng wrote:
> On Tuesday 29 July 2008 17:34:47 Amit Shah wrote:

> > > rename irq_request_opaque to struct* kvm in struct kvm_pic, and
> > > modify all irq_request() calling(three of them in all) with (void
> > > *)kvm.
> >
> > 'opaque' fields can be later made to point to other structures
> > without changing the structure itself. This is an advantage. Will
> > the kvm_pic struct be needed to change in the future? Very
> > unlikely. So we can rename it to struct kvm *, however, that gives
> > us no real benefit as against opaque (just readability).
>
> Yes, the readability...
>
> I think people would be very curious about why
>
> void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
>
> got a irq_request_opaque as a parameter. It's more like a hack, which
> is not my meaning...
>
> Anyway, it's trivial one and just a coding style. :)

However, this idiom is used in quite a few places in the kernel already so it 
shouldn't come as a big surprise to someone reading the code.

Amit

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-29 14:02           ` Avi Kivity
@ 2008-07-30  6:00             ` Amit Shah
  0 siblings, 0 replies; 36+ messages in thread
From: Amit Shah @ 2008-07-30  6:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ben-Ami Yassour, kvm, Muli Ben-Yehuda, weidong.han, anthony

* On Tuesday 29 July 2008 19:32:26 Avi Kivity wrote:
> Ben-Ami Yassour wrote:
> > On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
> >> * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
> >>> +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> >>> +				      struct kvm_assigned_pci_dev *assigned_dev)
> >>> +{
> >>>
> >>>
> >>> +	if (pci_enable_device(dev)) {
> >>> +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> >>> +		r = -EBUSY;
> >>> +		goto out_put;
> >>> +	}
> >>> +	r = pci_request_regions(dev, "kvm_assigned_device");
> >>> +	if (r) {
> >>> +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> >>> +		       __func__);
> >>> +		goto out_disable;
> >>
> >> Shouldn't disable here unconditionally (see my comment earlier to the
> >> previous patch).
> >
> > Why? the device should not be used by the host at the same time.
> > What is the condition that you were thinking of?
>
> pci_enable_device() can succeed even if the device was already enabled,
> so Amit was probably wishing to avoid an assignment failure disabling a
> device under a driver's feet.  But I see that pci_disable_device() will
> pair with pci_enable_device() correctly (doing reference counts), so I
> think the code is correct as is.

Ouch, right. I missed that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-29  9:38         ` Ben-Ami Yassour
  2008-07-29 14:02           ` Avi Kivity
@ 2008-07-30  6:03           ` Amit Shah
  2008-07-30 11:58             ` Ben-Ami Yassour
  1 sibling, 1 reply; 36+ messages in thread
From: Amit Shah @ 2008-07-30  6:03 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: avi, kvm, Muli Ben-Yehuda, weidong.han, anthony

* On Tuesday 29 July 2008 15:08:27 Ben-Ami Yassour wrote:
> On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
> > * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
> > > +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > > +				      struct kvm_assigned_pci_dev *assigned_dev)
> > > +{
> > >
> > >
> > > +	if (pci_enable_device(dev)) {
> > > +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> > > +		r = -EBUSY;
> > > +		goto out_put;
> > > +	}
> > > +	r = pci_request_regions(dev, "kvm_assigned_device");
> > > +	if (r) {
> > > +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> > > +		       __func__);
> > > +		goto out_disable;
> >
> > Shouldn't disable here unconditionally (see my comment earlier to the
> > previous patch).
>
> Why? the device should not be used by the host at the same time.
> What is the condition that you were thinking of?
>
> > > +static void kvm_free_assigned_devices(struct kvm *kvm)
> > > +{
> > > +	struct list_head *ptr, *ptr2;
> > > +	struct kvm_assigned_dev_kernel *assigned_dev;
> > > +
> > > +	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
> > > +		assigned_dev = list_entry(ptr,
> > > +					  struct kvm_assigned_dev_kernel,
> > > +					  list);
> > > +
> > > +		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
> > > +			free_irq(assigned_dev->host_irq,
> > > +				 (void *)assigned_dev);
> > > +
> > > +			kvm_unregister_irq_ack_notifier(kvm,
> > > +							&assigned_dev->
> > > +							ack_notifier);
> > > +		}
> >
> > Unregister the notifier before freeing irq
>
> I don't think that there is any importance to the order in this case,
> but just in case, the order should be in reverse to the initialization
> order, which is the case here.

Just that we shouldn't accept any new interrupts to ack because we're going to 
free the irq as the next step. This is more relevant now that we don't have 
the rwlock for the device assignment structures. If the work gets scheduled 
after we free the resources, it's going to bomb:

> +/* Ack the irq line for an assigned device */
> +static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
> +{
> +       struct kvm_assigned_dev_kernel *dev;
> +
> +       if (kian->gsi == -1)
> +               return;
> +
> +       dev = container_of(kian, struct kvm_assigned_dev_kernel,
> +                          ack_notifier);
> +       kvm_set_irq(dev->kvm, dev->guest_irq, 0);
> +       enable_irq(dev->host_irq);
> +}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-30  6:03           ` Amit Shah
@ 2008-07-30 11:58             ` Ben-Ami Yassour
  2008-08-01 11:24               ` Amit Shah
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-07-30 11:58 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, Muli Ben-Yehuda, weidong.han, anthony

On Wed, 2008-07-30 at 11:33 +0530, Amit Shah wrote:
> * On Tuesday 29 July 2008 15:08:27 Ben-Ami Yassour wrote:
> > On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
> > > * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
> > > > +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > > > +				      struct kvm_assigned_pci_dev *assigned_dev)
> > > > +{
> > > >
> > > >
> > > > +	if (pci_enable_device(dev)) {
> > > > +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> > > > +		r = -EBUSY;
> > > > +		goto out_put;
> > > > +	}
> > > > +	r = pci_request_regions(dev, "kvm_assigned_device");
> > > > +	if (r) {
> > > > +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> > > > +		       __func__);
> > > > +		goto out_disable;
> > >
> > > Shouldn't disable here unconditionally (see my comment earlier to the
> > > previous patch).
> >
> > Why? the device should not be used by the host at the same time.
> > What is the condition that you were thinking of?
> >
> > > > +static void kvm_free_assigned_devices(struct kvm *kvm)
> > > > +{
> > > > +	struct list_head *ptr, *ptr2;
> > > > +	struct kvm_assigned_dev_kernel *assigned_dev;
> > > > +
> > > > +	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
> > > > +		assigned_dev = list_entry(ptr,
> > > > +					  struct kvm_assigned_dev_kernel,
> > > > +					  list);
> > > > +
> > > > +		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
> > > > +			free_irq(assigned_dev->host_irq,
> > > > +				 (void *)assigned_dev);
> > > > +
> > > > +			kvm_unregister_irq_ack_notifier(kvm,
> > > > +							&assigned_dev->
> > > > +							ack_notifier);
> > > > +		}
> > >
> > > Unregister the notifier before freeing irq
> >
> > I don't think that there is any importance to the order in this case,
> > but just in case, the order should be in reverse to the initialization
> > order, which is the case here.
> 
> Just that we shouldn't accept any new interrupts to ack because we're going to 
> free the irq as the next step. This is more relevant now that we don't have 
> the rwlock for the device assignment structures. If the work gets scheduled 
> after we free the resources, it's going to bomb:

These are two separate things.
The ack notifier is called when the guest is signaling the virutal
IOAPIC with EOI, and it has no direct relation with the host interrupt
handler.
The connection is that when the interrupt is generated, the host inject
it to the guest, and then the guest will eventually do EOI. But we are
not going to run the guest after we started to run this termination code
so this is irrelevant. Even if it was, then the right order is first to
make sure that there are no interrupts received in the host and then
cancel the notifier (which is the reverse order of the initialization).

Regards,
Ben



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Device assignemnt: updated patches
  2008-07-28 16:26 Device assignemnt: updated patches Ben-Ami Yassour
  2008-07-28 16:26 ` [PATCH 1/5] KVM: PCIPT: direct mmio pfn check Ben-Ami Yassour
@ 2008-07-30 15:21 ` Avi Kivity
  1 sibling, 0 replies; 36+ messages in thread
From: Avi Kivity @ 2008-07-30 15:21 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: amit.shah, kvm, muli, weidong.han, anthony

Ben-Ami Yassour wrote:
> Following are the device assignment patches with the fixes of the
> comments that were sent for the previous version.
>
>   

Applied 1 and 3, thanks.  I already got irq ack notifiers from Marcelo.  
4/5 need to be updated to compile when VT-d is not built into the 
kernel, and also need retesting with mmu notifiers.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] KVM: Add irq ack notifier list
  2008-07-30  5:54           ` Amit Shah
@ 2008-07-31  8:55             ` Avi Kivity
  0 siblings, 0 replies; 36+ messages in thread
From: Avi Kivity @ 2008-07-31  8:55 UTC (permalink / raw)
  To: Amit Shah
  Cc: Yang, Sheng, kvm, Marcelo Tosatti, Ben-Ami Yassour, muli,
	weidong.han, anthony

Amit Shah wrote:
>>> 'opaque' fields can be later made to point to other structures
>>> without changing the structure itself. This is an advantage. Will
>>> the kvm_pic struct be needed to change in the future? Very
>>> unlikely. So we can rename it to struct kvm *, however, that gives
>>> us no real benefit as against opaque (just readability).
>>>       
>> Yes, the readability...
>>
>> I think people would be very curious about why
>>
>> void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
>>
>> got a irq_request_opaque as a parameter. It's more like a hack, which
>> is not my meaning...
>>
>> Anyway, it's trivial one and just a coding style. :)
>>     
>
> However, this idiom is used in quite a few places in the kernel already so it 
> shouldn't come as a big surprise to someone reading the code.
>   

Void pointers are useful where there are multiple clients for a service 
which needs a callback.  When there is just one client (and not because 
no one has written another, but because logically there is only one 
client), specifying it explicity is better.

An alternative to void pointers (which I prefer) is to embed the 
structure that holds the callback within the structure that needs to be 
passed to the callback, and use container_of().  This saves the space to 
store the void pointer, and is also cleaner, IMO.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
  2008-07-28 16:32             ` [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest Ben-Ami Yassour
@ 2008-08-01  3:09               ` Han, Weidong
  2008-08-05  9:41                 ` Ben-Ami Yassour
  0 siblings, 1 reply; 36+ messages in thread
From: Han, Weidong @ 2008-08-01  3:09 UTC (permalink / raw)
  To: Ben-Ami Yassour, avi
  Cc: amit.shah, kvm, muli, anthony, Nir Peleg,
	Glauber de Oliveira Costa

Ben-Ami Yassour wrote:
> +#define assigned_dev_ioport_write(suffix)
\
> +static void assigned_dev_ioport_write##suffix(void *opaque,         
> \ +					      uint32_t addr,
\
> +                                              uint32_t value)
\
> +{
\
> +	assigned_dev_region_t *r_access =
\
> +                  (assigned_dev_region_t *)opaque;
\
> +	uint32_t r_pio = (unsigned long)r_access->r_virtbase
\
> +		+ (addr - r_access->e_physbase);
\
> +	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
\
> +		fprintf(logfile, "assigned_dev_ioport_write" #suffix
\
> +			": r_pio=%08x e_physbase=%08x"
\
> +			" r_virtbase=%08lx value=%08x\n",
\
> +			r_pio, (int)r_access->e_physbase,
\
> +			(unsigned long)r_access->r_virtbase, value);
\
> +	}
\
> +	iopl(3);\
> +	out##suffix(value, r_pio);
\
> +}
> +assigned_dev_ioport_write(b)
> +assigned_dev_ioport_write(w)
> +assigned_dev_ioport_write(l)
> +
> +#define assigned_dev_ioport_read(suffix)
\
> +static uint32_t assigned_dev_ioport_read##suffix(void *opaque,      
> \ +                                                 uint32_t addr)
\
> +{
\
> +	assigned_dev_region_t *r_access =
\
> +                    (assigned_dev_region_t *)opaque;
\
> +	uint32_t r_pio = (addr - r_access->e_physbase)
\
> +		+ (unsigned long)r_access->r_virtbase;
\
> +		uint32_t value = in##suffix(r_pio);
\
> +		if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
\
> +			fprintf(logfile, "assigned_dev_ioport_read"
\
> +                                #suffix	": r_pio=%08x "

> \ +                                "e_physbase=%08x r_virtbase=%08lx
> "	\ +				"value=%08x\n",
\
> +				r_pio, (int)r_access->e_physbase,
\
> +				(unsigned long)r_access->r_virtbase,
\
> +                                value);
\
> +		}
\
> +		return value;
\
> +}
> +
> +assigned_dev_ioport_read(b)
> +assigned_dev_ioport_read(w)
> +assigned_dev_ioport_read(l)
> +

Need to add iopl(3) in assigned_dev_ioport_read##suffix(). ioport read
may be before ioport write. When assign USB controller to guest, I found
it read ioport before write. If there is no iopl(3) in
assigned_dev_ioport_read##suffix(), it results in "segmentation fault". 

BTW, code style in assigned_dev_ioport_read##suffix() is not correct,
pls update it.

Randy (Weidong)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] KVM: pci device assignment
  2008-07-30 11:58             ` Ben-Ami Yassour
@ 2008-08-01 11:24               ` Amit Shah
  0 siblings, 0 replies; 36+ messages in thread
From: Amit Shah @ 2008-08-01 11:24 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: avi, kvm, Muli Ben-Yehuda, weidong.han, anthony

* On Wednesday 30 Jul 2008 17:28:01 Ben-Ami Yassour wrote:
> On Wed, 2008-07-30 at 11:33 +0530, Amit Shah wrote:
> > * On Tuesday 29 July 2008 15:08:27 Ben-Ami Yassour wrote:
> > > On Tue, 2008-07-29 at 14:49 +0530, Amit Shah wrote:
> > > > * On Monday 28 Jul 2008 21:56:26 Ben-Ami Yassour wrote:
> > > > > +static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > > > > +				      struct kvm_assigned_pci_dev *assigned_dev)
> > > > > +{
> > > > >
> > > > >
> > > > > +	if (pci_enable_device(dev)) {
> > > > > +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> > > > > +		r = -EBUSY;
> > > > > +		goto out_put;
> > > > > +	}
> > > > > +	r = pci_request_regions(dev, "kvm_assigned_device");
> > > > > +	if (r) {
> > > > > +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> > > > > +		       __func__);
> > > > > +		goto out_disable;
> > > >
> > > > Shouldn't disable here unconditionally (see my comment earlier to the
> > > > previous patch).
> > >
> > > Why? the device should not be used by the host at the same time.
> > > What is the condition that you were thinking of?
> > >
> > > > > +static void kvm_free_assigned_devices(struct kvm *kvm)
> > > > > +{
> > > > > +	struct list_head *ptr, *ptr2;
> > > > > +	struct kvm_assigned_dev_kernel *assigned_dev;
> > > > > +
> > > > > +	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
> > > > > +		assigned_dev = list_entry(ptr,
> > > > > +					  struct kvm_assigned_dev_kernel,
> > > > > +					  list);
> > > > > +
> > > > > +		if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) {
> > > > > +			free_irq(assigned_dev->host_irq,
> > > > > +				 (void *)assigned_dev);
> > > > > +
> > > > > +			kvm_unregister_irq_ack_notifier(kvm,
> > > > > +							&assigned_dev->
> > > > > +							ack_notifier);
> > > > > +		}
> > > >
> > > > Unregister the notifier before freeing irq
> > >
> > > I don't think that there is any importance to the order in this case,
> > > but just in case, the order should be in reverse to the initialization
> > > order, which is the case here.
> >
> > Just that we shouldn't accept any new interrupts to ack because we're
> > going to free the irq as the next step. This is more relevant now that we
> > don't have the rwlock for the device assignment structures. If the work
> > gets scheduled after we free the resources, it's going to bomb:
>
> These are two separate things.
> The ack notifier is called when the guest is signaling the virutal
> IOAPIC with EOI, and it has no direct relation with the host interrupt
> handler.

Of course. You're right.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
  2008-07-28 16:32           ` Device assignment - userspace part Ben-Ami Yassour
  2008-07-29  7:28           ` [PATCH 5/5] This patch extends the VT-d driver to support KVM Yang, Sheng
@ 2008-08-05  6:01           ` Yang, Sheng
  2008-08-05  9:32             ` Ben-Ami Yassour
  2008-08-05 14:46           ` Han, Weidong
  3 siblings, 1 reply; 36+ messages in thread
From: Yang, Sheng @ 2008-08-05  6:01 UTC (permalink / raw)
  To: kvm
  Cc: Ben-Ami Yassour, avi, amit.shah, muli, weidong.han, anthony,
	Kay, Allen M

On Tuesday 29 July 2008 00:26:28 Ben-Ami Yassour wrote:
> [Ben: fixed memory pinning]
>
>  	kvm_free_physmem_slot(&old, &new);
> +
> +	/* map the pages in iommu page table */
> +	if (intel_iommu_found())
> +		kvm_iommu_map_pages(kvm, base_gfn, npages);
> +

Realize one serious problem here: this line pinned all memory 
regardless of if VT-d device have been assigned. That's means if VT-d 
engine is there(even without assigned device), KVM would pin all 
memory, then disable swapping capability. It's very undesired... I 
don't know if Avi aware of that.

I think we should check assigned device list here to avoid this kind 
of thing happen. Also, we may need a lock here to prevent racy with 
assign_device().

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-05  6:01           ` Yang, Sheng
@ 2008-08-05  9:32             ` Ben-Ami Yassour
  0 siblings, 0 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-05  9:32 UTC (permalink / raw)
  To: Yang, Sheng
  Cc: kvm, avi, amit.shah, Muli Ben-Yehuda, weidong.han, anthony,
	Kay, Allen M

On Tue, 2008-08-05 at 14:01 +0800, Yang, Sheng wrote:
> On Tuesday 29 July 2008 00:26:28 Ben-Ami Yassour wrote:
> > [Ben: fixed memory pinning]
> >
> >  	kvm_free_physmem_slot(&old, &new);
> > +
> > +	/* map the pages in iommu page table */
> > +	if (intel_iommu_found())
> > +		kvm_iommu_map_pages(kvm, base_gfn, npages);
> > +
> 
> Realize one serious problem here: this line pinned all memory 
> regardless of if VT-d device have been assigned. That's means if VT-d 
> engine is there(even without assigned device), KVM would pin all 
> memory, then disable swapping capability. It's very undesired... I 
> don't know if Avi aware of that.
> 
> I think we should check assigned device list here to avoid this kind 
> of thing happen. Also, we may need a lock here to prevent racy with 
> assign_device().

Agree. good catch. we will add that in the next version.

Regards,
Ben



^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
  2008-08-01  3:09               ` Han, Weidong
@ 2008-08-05  9:41                 ` Ben-Ami Yassour
  0 siblings, 0 replies; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-05  9:41 UTC (permalink / raw)
  To: Han, Weidong
  Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Nir Peleg,
	Glauber de Oliveira Costa

On Fri, 2008-08-01 at 11:09 +0800, Han, Weidong wrote:
> Ben-Ami Yassour wrote:
> > +#define assigned_dev_ioport_write(suffix)
> \
> > +static void assigned_dev_ioport_write##suffix(void *opaque,         
> > \ +					      uint32_t addr,
> \
> > +                                              uint32_t value)
> \
> > +{
> \
> > +	assigned_dev_region_t *r_access =
> \
> > +                  (assigned_dev_region_t *)opaque;
> \
> > +	uint32_t r_pio = (unsigned long)r_access->r_virtbase
> \
> > +		+ (addr - r_access->e_physbase);
> \
> > +	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> \
> > +		fprintf(logfile, "assigned_dev_ioport_write" #suffix
> \
> > +			": r_pio=%08x e_physbase=%08x"
> \
> > +			" r_virtbase=%08lx value=%08x\n",
> \
> > +			r_pio, (int)r_access->e_physbase,
> \
> > +			(unsigned long)r_access->r_virtbase, value);
> \
> > +	}
> \
> > +	iopl(3);\
> > +	out##suffix(value, r_pio);
> \
> > +}
> > +assigned_dev_ioport_write(b)
> > +assigned_dev_ioport_write(w)
> > +assigned_dev_ioport_write(l)
> > +
> > +#define assigned_dev_ioport_read(suffix)
> \
> > +static uint32_t assigned_dev_ioport_read##suffix(void *opaque,      
> > \ +                                                 uint32_t addr)
> \
> > +{
> \
> > +	assigned_dev_region_t *r_access =
> \
> > +                    (assigned_dev_region_t *)opaque;
> \
> > +	uint32_t r_pio = (addr - r_access->e_physbase)
> \
> > +		+ (unsigned long)r_access->r_virtbase;
> \
> > +		uint32_t value = in##suffix(r_pio);
> \
> > +		if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> \
> > +			fprintf(logfile, "assigned_dev_ioport_read"
> \
> > +                                #suffix	": r_pio=%08x "
> 
> > \ +                                "e_physbase=%08x r_virtbase=%08lx
> > "	\ +				"value=%08x\n",
> \
> > +				r_pio, (int)r_access->e_physbase,
> \
> > +				(unsigned long)r_access->r_virtbase,
> \
> > +                                value);
> \
> > +		}
> \
> > +		return value;
> \
> > +}
> > +
> > +assigned_dev_ioport_read(b)
> > +assigned_dev_ioport_read(w)
> > +assigned_dev_ioport_read(l)
> > +
> 
> Need to add iopl(3) in assigned_dev_ioport_read##suffix(). ioport read
> may be before ioport write. When assign USB controller to guest, I found
> it read ioport before write. If there is no iopl(3) in
> assigned_dev_ioport_read##suffix(), it results in "segmentation fault". 
OK, we can add that for the next version, but for the long run we need
to change the pio handling anyhow.

> 
> BTW, code style in assigned_dev_ioport_read##suffix() is not correct,
> pls update it.
do you refer to the indentation, or anything else?

Thanks,
Ben

> 
> Randy (Weidong)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
                             ` (2 preceding siblings ...)
  2008-08-05  6:01           ` Yang, Sheng
@ 2008-08-05 14:46           ` Han, Weidong
  2008-08-06  5:50             ` Ben-Ami Yassour
  3 siblings, 1 reply; 36+ messages in thread
From: Han, Weidong @ 2008-08-05 14:46 UTC (permalink / raw)
  To: Ben-Ami Yassour, avi; +Cc: amit.shah, kvm, muli, anthony, Kay, Allen M

Ben-Ami Yassour wrote:
> [Ben: fixed memory pinning]
> 
> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> Signed-off-by: Weidong Han <weidong.han@intel.com>
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
> +
> +int kvm_iommu_map_guest(struct kvm *kvm,
> +			struct kvm_assigned_dev_kernel *assigned_dev)
> +{
> +	struct pci_dev *pdev = NULL;
> +	int rc;
> +
> +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
> +	       assigned_dev->host_busnr,
> +	       PCI_SLOT(assigned_dev->host_devfn),
> +	       PCI_FUNC(assigned_dev->host_devfn));
> +
> +	pdev = assigned_dev->dev;
> +
> +	if (pdev == NULL) {
> +		if (kvm->arch.intel_iommu_domain) {
> +
intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> +			kvm->arch.intel_iommu_domain = NULL;
> +		}
> +		return -ENODEV;
> +	}
> +
> +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
> +
> +	rc = kvm_iommu_map_memslots(kvm);
> +	if (rc) {
> +		kvm_iommu_unmap_memslots(kvm);
> +		return rc;
> +	}
> +
> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
> +			       pdev->bus->number, pdev->devfn);

I think we should remove intel_iommu_detach_dev(), which detaches device
from iommu. If the device has been assigned to other domain already, it
should not be assigned to this domain. Maybe we need to add a check
whether the device has been assigned already.

Randy (Weidong)

> +
> +	rc = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
> +					 pdev);
> +	if (rc) {
> +		printk(KERN_ERR "Domain context map for %s failed",
> +		       pci_name(pdev));
> +		return rc;
> +	}
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-05 14:46           ` Han, Weidong
@ 2008-08-06  5:50             ` Ben-Ami Yassour
  2008-08-06  6:18               ` Han, Weidong
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-06  5:50 UTC (permalink / raw)
  To: Han, Weidong; +Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
> Ben-Ami Yassour wrote:
> > [Ben: fixed memory pinning]
> > 
> > Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> > Signed-off-by: Weidong Han <weidong.han@intel.com>
> > Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> > ---
> > +
> > +int kvm_iommu_map_guest(struct kvm *kvm,
> > +			struct kvm_assigned_dev_kernel *assigned_dev)
> > +{
> > +	struct pci_dev *pdev = NULL;
> > +	int rc;
> > +
> > +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
> > +	       assigned_dev->host_busnr,
> > +	       PCI_SLOT(assigned_dev->host_devfn),
> > +	       PCI_FUNC(assigned_dev->host_devfn));
> > +
> > +	pdev = assigned_dev->dev;
> > +
> > +	if (pdev == NULL) {
> > +		if (kvm->arch.intel_iommu_domain) {
> > +
> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> > +			kvm->arch.intel_iommu_domain = NULL;
> > +		}
> > +		return -ENODEV;
> > +	}
> > +
> > +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
> > +
> > +	rc = kvm_iommu_map_memslots(kvm);
> > +	if (rc) {
> > +		kvm_iommu_unmap_memslots(kvm);
> > +		return rc;
> > +	}
> > +
> > +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
> > +			       pdev->bus->number, pdev->devfn);
> 
> I think we should remove intel_iommu_detach_dev(), which detaches device
> from iommu. If the device has been assigned to other domain already, it
> should not be assigned to this domain. Maybe we need to add a check
> whether the device has been assigned already.

The problem is that in the generic VT-d code, the IOMMU is not updated
when a device is detached, for example when the driver releases the
device. Consider the following scenario:
1. load device driver on host
2. use device on host
3. unload device driver on host
4. Start KVM and assign the device to the guest.

In this case there is no clearing of the IOMMU on step 3, and we get
Vt-d failures (if we remove the above call).

Regards,
Ben

> 
> Randy (Weidong)
> 
> > +
> > +	rc = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
> > +					 pdev);
> > +	if (rc) {
> > +		printk(KERN_ERR "Domain context map for %s failed",
> > +		       pci_name(pdev));
> > +		return rc;
> > +	}
> > +	return 0;
> > +}
> > +


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-06  5:50             ` Ben-Ami Yassour
@ 2008-08-06  6:18               ` Han, Weidong
  2008-08-06  8:56                 ` Ben-Ami Yassour
  0 siblings, 1 reply; 36+ messages in thread
From: Han, Weidong @ 2008-08-06  6:18 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

Ben-Ami Yassour wrote:
> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour wrote:
>>> [Ben: fixed memory pinning]
>>> 
>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>> ---
>>> +
>>> +int kvm_iommu_map_guest(struct kvm *kvm,
>>> +			struct kvm_assigned_dev_kernel *assigned_dev)
>>> +{
>>> +	struct pci_dev *pdev = NULL;
>>> +	int rc;
>>> +
>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
>>> +	       assigned_dev->host_busnr,
>>> +	       PCI_SLOT(assigned_dev->host_devfn),
>>> +	       PCI_FUNC(assigned_dev->host_devfn));
>>> +
>>> +	pdev = assigned_dev->dev;
>>> +
>>> +	if (pdev == NULL) {
>>> +		if (kvm->arch.intel_iommu_domain) {
>>> +
>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
>>> +			kvm->arch.intel_iommu_domain = NULL;
>>> +		}
>>> +		return -ENODEV;
>>> +	}
>>> +
>>> +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev); +
>>> +	rc = kvm_iommu_map_memslots(kvm);
>>> +	if (rc) {
>>> +		kvm_iommu_unmap_memslots(kvm);
>>> +		return rc;
>>> +	}
>>> +
>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
>>> +			       pdev->bus->number, pdev->devfn);
>> 
>> I think we should remove intel_iommu_detach_dev(), which detaches
>> device from iommu. If the device has been assigned to other domain
>> already, it should not be assigned to this domain. Maybe we need to
>> add a check whether the device has been assigned already.
> 
> The problem is that in the generic VT-d code, the IOMMU is not updated
> when a device is detached, for example when the driver releases the
> device. Consider the following scenario:
> 1. load device driver on host
> 2. use device on host
> 3. unload device driver on host
> 4. Start KVM and assign the device to the guest.
> 
> In this case there is no clearing of the IOMMU on step 3, and we get
> Vt-d failures (if we remove the above call).
> 

I don't prefer this usage process. As we discussed before, there should
be a dummy driver to hide/unbind passthrough devices from host, only
these devices can be assigned. There should be a check whether device is
assignable or not. If the device is not hidden, or has been assigned to
other guest already, it cannot be assigned. At this point, I perfer to
remove the device driver before restart host to avoid loading driver
during booting. 

Randy (Weidong)

> Regards,
> Ben
> 
>> 
>> Randy (Weidong)
>> 
>>> +
>>> +	rc = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
>>> +					 pdev); +	if (rc) {
>>> +		printk(KERN_ERR "Domain context map for %s failed", +

>>> pci_name(pdev)); +		return rc;
>>> +	}
>>> +	return 0;
>>> +}
>>> +


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-06  6:18               ` Han, Weidong
@ 2008-08-06  8:56                 ` Ben-Ami Yassour
  2008-08-06  9:12                   ` Han, Weidong
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-06  8:56 UTC (permalink / raw)
  To: Han, Weidong; +Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
> Ben-Ami Yassour wrote:
> > On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
> >> Ben-Ami Yassour wrote:
> >>> [Ben: fixed memory pinning]
> >>> 
> >>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> >>> Signed-off-by: Weidong Han <weidong.han@intel.com>
> >>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> >>> ---
> >>> +
> >>> +int kvm_iommu_map_guest(struct kvm *kvm,
> >>> +			struct kvm_assigned_dev_kernel *assigned_dev)
> >>> +{
> >>> +	struct pci_dev *pdev = NULL;
> >>> +	int rc;
> >>> +
> >>> +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
> >>> +	       assigned_dev->host_busnr,
> >>> +	       PCI_SLOT(assigned_dev->host_devfn),
> >>> +	       PCI_FUNC(assigned_dev->host_devfn));
> >>> +
> >>> +	pdev = assigned_dev->dev;
> >>> +
> >>> +	if (pdev == NULL) {
> >>> +		if (kvm->arch.intel_iommu_domain) {
> >>> +
> >> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> >>> +			kvm->arch.intel_iommu_domain = NULL;
> >>> +		}
> >>> +		return -ENODEV;
> >>> +	}
> >>> +
> >>> +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev); +
> >>> +	rc = kvm_iommu_map_memslots(kvm);
> >>> +	if (rc) {
> >>> +		kvm_iommu_unmap_memslots(kvm);
> >>> +		return rc;
> >>> +	}
> >>> +
> >>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
> >>> +			       pdev->bus->number, pdev->devfn);
> >> 
> >> I think we should remove intel_iommu_detach_dev(), which detaches
> >> device from iommu. If the device has been assigned to other domain
> >> already, it should not be assigned to this domain. Maybe we need to
> >> add a check whether the device has been assigned already.
> > 
> > The problem is that in the generic VT-d code, the IOMMU is not updated
> > when a device is detached, for example when the driver releases the
> > device. Consider the following scenario:
> > 1. load device driver on host
> > 2. use device on host
> > 3. unload device driver on host
> > 4. Start KVM and assign the device to the guest.
> > 
> > In this case there is no clearing of the IOMMU on step 3, and we get
> > Vt-d failures (if we remove the above call).
> > 
> 
> I don't prefer this usage process. As we discussed before, there should
> be a dummy driver to hide/unbind passthrough devices from host, only
> these devices can be assigned. There should be a check whether device is
> assignable or not. If the device is not hidden, or has been assigned to
> other guest already, it cannot be assigned. At this point, I perfer to
> remove the device driver before restart host to avoid loading driver
> during booting. 

I agree, but this is an orthogonal issue. Checking if the device is
assigned has to be part of the device assignment solution in general, it
is not specific to VT-d.
As for VT-d detach, the point is that the generic VT-d code is not
detaching the domain, so we have to do that in KVM. Again, removing the
device driver is not enough (with the current VT-d code). 

Regards,
Ben

> 
> Randy (Weidong)
> 
> > Regards,
> > Ben
> > 
> >> 
> >> Randy (Weidong)
> >> 
> >>> +
> >>> +	rc = intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
> >>> +					 pdev); +	if (rc) {
> >>> +		printk(KERN_ERR "Domain context map for %s failed", +
> 
> >>> pci_name(pdev)); +		return rc;
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-06  8:56                 ` Ben-Ami Yassour
@ 2008-08-06  9:12                   ` Han, Weidong
  2008-08-06  9:42                     ` Ben-Ami Yassour
  0 siblings, 1 reply; 36+ messages in thread
From: Han, Weidong @ 2008-08-06  9:12 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

Ben-Ami Yassour wrote:
> On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour wrote:
>>> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
>>>> Ben-Ami Yassour wrote:
>>>>> [Ben: fixed memory pinning]
>>>>> 
>>>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
>>>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>>>> ---
>>>>> +
>>>>> +int kvm_iommu_map_guest(struct kvm *kvm,
>>>>> +			struct kvm_assigned_dev_kernel *assigned_dev)
>>>>> +{
>>>>> +	struct pci_dev *pdev = NULL;
>>>>> +	int rc;
>>>>> +
>>>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
>>>>> +	       assigned_dev->host_busnr,
>>>>> +	       PCI_SLOT(assigned_dev->host_devfn),
>>>>> +	       PCI_FUNC(assigned_dev->host_devfn));
>>>>> +
>>>>> +	pdev = assigned_dev->dev;
>>>>> +
>>>>> +	if (pdev == NULL) {
>>>>> +		if (kvm->arch.intel_iommu_domain) {
>>>>> +
>>>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
>>>>> +			kvm->arch.intel_iommu_domain = NULL;
>>>>> +		}
>>>>> +		return -ENODEV;
>>>>> +	}
>>>>> +
>>>>> +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev); +
>>>>> +	rc = kvm_iommu_map_memslots(kvm);
>>>>> +	if (rc) {
>>>>> +		kvm_iommu_unmap_memslots(kvm);
>>>>> +		return rc;
>>>>> +	}
>>>>> +
>>>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
>>>>> +			       pdev->bus->number, pdev->devfn);
>>>> 
>>>> I think we should remove intel_iommu_detach_dev(), which detaches
>>>> device from iommu. If the device has been assigned to other domain
>>>> already, it should not be assigned to this domain. Maybe we need to
>>>> add a check whether the device has been assigned already.
>>> 
>>> The problem is that in the generic VT-d code, the IOMMU is not
>>> updated when a device is detached, for example when the driver
>>> releases the device. Consider the following scenario:
>>> 1. load device driver on host
>>> 2. use device on host
>>> 3. unload device driver on host
>>> 4. Start KVM and assign the device to the guest.
>>> 
>>> In this case there is no clearing of the IOMMU on step 3, and we get
>>> Vt-d failures (if we remove the above call).
>>> 
>> 
>> I don't prefer this usage process. As we discussed before, there
>> should be a dummy driver to hide/unbind passthrough devices from
>> host, only these devices can be assigned. There should be a check
>> whether device is assignable or not. If the device is not hidden, or
>> has been assigned to other guest already, it cannot be assigned. At
>> this point, I perfer to remove the device driver before restart host
>> to avoid loading driver during booting.
> 
> I agree, but this is an orthogonal issue. Checking if the device is
> assigned has to be part of the device assignment solution in general,
> it is not specific to VT-d.

Yes, the check is general. When the check fails, should quit guest
creation, and prompt the reason to user.

> As for VT-d detach, the point is that the generic VT-d code is not
> detaching the domain, so we have to do that in KVM.

In intel_iommu_unmap_guest(), assigned devices will be detached from the
guest. I think it's enough.

> Again, removing the device driver is not enough (with the current VT-d
code).

Why? It works for me. e.g. I removed e1000.ko before reboot host, it
works well. BTW, for USB assignment, I add "nousb" in host grub to avoid
loading driver for USB controllers, it also works.

Randy (Weidong)







^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-06  9:12                   ` Han, Weidong
@ 2008-08-06  9:42                     ` Ben-Ami Yassour
  2008-08-07  1:21                       ` Han, Weidong
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-06  9:42 UTC (permalink / raw)
  To: Han, Weidong; +Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

On Wed, 2008-08-06 at 17:12 +0800, Han, Weidong wrote:
> Ben-Ami Yassour wrote:
> > On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
> >> Ben-Ami Yassour wrote:
> >>> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
> >>>> Ben-Ami Yassour wrote:
> >>>>> [Ben: fixed memory pinning]
> >>>>> 
> >>>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> >>>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
> >>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> >>>>> ---
> >>>>> +
> >>>>> +int kvm_iommu_map_guest(struct kvm *kvm,
> >>>>> +			struct kvm_assigned_dev_kernel *assigned_dev)
> >>>>> +{
> >>>>> +	struct pci_dev *pdev = NULL;
> >>>>> +	int rc;
> >>>>> +
> >>>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
> >>>>> +	       assigned_dev->host_busnr,
> >>>>> +	       PCI_SLOT(assigned_dev->host_devfn),
> >>>>> +	       PCI_FUNC(assigned_dev->host_devfn));
> >>>>> +
> >>>>> +	pdev = assigned_dev->dev;
> >>>>> +
> >>>>> +	if (pdev == NULL) {
> >>>>> +		if (kvm->arch.intel_iommu_domain) {
> >>>>> +
> >>>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> >>>>> +			kvm->arch.intel_iommu_domain = NULL;
> >>>>> +		}
> >>>>> +		return -ENODEV;
> >>>>> +	}
> >>>>> +
> >>>>> +	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev); +
> >>>>> +	rc = kvm_iommu_map_memslots(kvm);
> >>>>> +	if (rc) {
> >>>>> +		kvm_iommu_unmap_memslots(kvm);
> >>>>> +		return rc;
> >>>>> +	}
> >>>>> +
> >>>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
> >>>>> +			       pdev->bus->number, pdev->devfn);
> >>>> 
> >>>> I think we should remove intel_iommu_detach_dev(), which detaches
> >>>> device from iommu. If the device has been assigned to other domain
> >>>> already, it should not be assigned to this domain. Maybe we need to
> >>>> add a check whether the device has been assigned already.
> >>> 
> >>> The problem is that in the generic VT-d code, the IOMMU is not
> >>> updated when a device is detached, for example when the driver
> >>> releases the device. Consider the following scenario:
> >>> 1. load device driver on host
> >>> 2. use device on host
> >>> 3. unload device driver on host
> >>> 4. Start KVM and assign the device to the guest.
> >>> 
> >>> In this case there is no clearing of the IOMMU on step 3, and we get
> >>> Vt-d failures (if we remove the above call).
> >>> 
> >> 
> >> I don't prefer this usage process. As we discussed before, there
> >> should be a dummy driver to hide/unbind passthrough devices from
> >> host, only these devices can be assigned. There should be a check
> >> whether device is assignable or not. If the device is not hidden, or
> >> has been assigned to other guest already, it cannot be assigned. At
> >> this point, I perfer to remove the device driver before restart host
> >> to avoid loading driver during booting.
> > 
> > I agree, but this is an orthogonal issue. Checking if the device is
> > assigned has to be part of the device assignment solution in general,
> > it is not specific to VT-d.
> 
> Yes, the check is general. When the check fails, should quit guest
> creation, and prompt the reason to user.
> 
> > As for VT-d detach, the point is that the generic VT-d code is not
> > detaching the domain, so we have to do that in KVM.
> 
> In intel_iommu_unmap_guest(), assigned devices will be detached from the
> guest. I think it's enough.
> 
> > Again, removing the device driver is not enough (with the current VT-d
> code).
> 
> Why? It works for me. e.g. I removed e1000.ko before reboot host, it
> works well. BTW, for USB assignment, I add "nousb" in host grub to avoid
> loading driver for USB controllers, it also works.
What about the scenario described above:
1. load device driver on host
2. use device on host
3. unload device driver on host
4. Start KVM and assign the device to the guest.

Does it work for you as well?

Thanks,
Ben



^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-06  9:42                     ` Ben-Ami Yassour
@ 2008-08-07  1:21                       ` Han, Weidong
  2008-08-07 10:35                         ` Ben-Ami Yassour
  0 siblings, 1 reply; 36+ messages in thread
From: Han, Weidong @ 2008-08-07  1:21 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

Ben-Ami Yassour wrote:
> On Wed, 2008-08-06 at 17:12 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour wrote:
>>> On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
>>>> Ben-Ami Yassour wrote:
>>>>> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
>>>>>> Ben-Ami Yassour wrote:
>>>>>>> [Ben: fixed memory pinning]
>>>>>>> 
>>>>>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
>>>>>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
>>>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>>>>>> ---
>>>>>>> +
>>>>>>> +int kvm_iommu_map_guest(struct kvm *kvm,
>>>>>>> +			struct kvm_assigned_dev_kernel
*assigned_dev)
>>>>>>> +{
>>>>>>> +	struct pci_dev *pdev = NULL;
>>>>>>> +	int rc;
>>>>>>> +
>>>>>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf =
%x:%x:%x\n",
>>>>>>> +	       assigned_dev->host_busnr,
>>>>>>> +	       PCI_SLOT(assigned_dev->host_devfn),
>>>>>>> +	       PCI_FUNC(assigned_dev->host_devfn));
>>>>>>> +
>>>>>>> +	pdev = assigned_dev->dev;
>>>>>>> +
>>>>>>> +	if (pdev == NULL) {
>>>>>>> +		if (kvm->arch.intel_iommu_domain) {
>>>>>>> +
>>>>>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
>>>>>>> +			kvm->arch.intel_iommu_domain = NULL;
>>>>>>> +		}
>>>>>>> +		return -ENODEV;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	kvm->arch.intel_iommu_domain =
>>>>>>> intel_iommu_domain_alloc(pdev); + +	rc =
>>>>>>> kvm_iommu_map_memslots(kvm); +	if (rc) {
>>>>>>> +		kvm_iommu_unmap_memslots(kvm);
>>>>>>> +		return rc;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
>>>>>>> +			       pdev->bus->number, pdev->devfn);
>>>>>> 
>>>>>> I think we should remove intel_iommu_detach_dev(), which detaches
>>>>>> device from iommu. If the device has been assigned to other
>>>>>> domain already, it should not be assigned to this domain. Maybe
>>>>>> we need to add a check whether the device has been assigned
>>>>>> already. 
>>>>> 
>>>>> The problem is that in the generic VT-d code, the IOMMU is not
>>>>> updated when a device is detached, for example when the driver
>>>>> releases the device. Consider the following scenario:
>>>>> 1. load device driver on host
>>>>> 2. use device on host
>>>>> 3. unload device driver on host
>>>>> 4. Start KVM and assign the device to the guest.
>>>>> 
>>>>> In this case there is no clearing of the IOMMU on step 3, and we
>>>>> get Vt-d failures (if we remove the above call).
>>>>> 
>>>> 
>>>> I don't prefer this usage process. As we discussed before, there
>>>> should be a dummy driver to hide/unbind passthrough devices from
>>>> host, only these devices can be assigned. There should be a check
>>>> whether device is assignable or not. If the device is not hidden,
>>>> or has been assigned to other guest already, it cannot be
>>>> assigned. At this point, I perfer to remove the device driver
>>>> before restart host to avoid loading driver during booting.
>>> 
>>> I agree, but this is an orthogonal issue. Checking if the device is
>>> assigned has to be part of the device assignment solution in
>>> general, it is not specific to VT-d.
>> 
>> Yes, the check is general. When the check fails, should quit guest
>> creation, and prompt the reason to user.
>> 
>>> As for VT-d detach, the point is that the generic VT-d code is not
>>> detaching the domain, so we have to do that in KVM.
>> 
>> In intel_iommu_unmap_guest(), assigned devices will be detached from
>> the guest. I think it's enough. 
>> 
>>> Again, removing the device driver is not enough (with the current
>>> VT-d code). 
>> 
>> Why? It works for me. e.g. I removed e1000.ko before reboot host, it
>> works well. BTW, for USB assignment, I add "nousb" in host grub to
>> avoid loading driver for USB controllers, it also works.
> What about the scenario described above:
> 1. load device driver on host
> 2. use device on host
> 3. unload device driver on host
> 4. Start KVM and assign the device to the guest.
> 
> Does it work for you as well?
> 

I didn't try the scenario you described, it should not work if removing
the detach call. I means removing device driver before reboot host. We
should implement a mechanism to ensure assignable devices won't be set
in context, instead of adding the detach call unconditionally in
kvm_iommu_map_guest(). At this moment, I perfer to remove corresponding
device driver before reboot host to assign the device.

Randy (Weidong)



^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-07  1:21                       ` Han, Weidong
@ 2008-08-07 10:35                         ` Ben-Ami Yassour
  2008-08-12  3:29                           ` Han, Weidong
  0 siblings, 1 reply; 36+ messages in thread
From: Ben-Ami Yassour @ 2008-08-07 10:35 UTC (permalink / raw)
  To: Han, Weidong; +Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

On Thu, 2008-08-07 at 09:21 +0800, Han, Weidong wrote:
> Ben-Ami Yassour wrote:
> > On Wed, 2008-08-06 at 17:12 +0800, Han, Weidong wrote:
> >> Ben-Ami Yassour wrote:
> >>> On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
> >>>> Ben-Ami Yassour wrote:
> >>>>> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
> >>>>>> Ben-Ami Yassour wrote:
> >>>>>>> [Ben: fixed memory pinning]
> >>>>>>> 
> >>>>>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
> >>>>>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
> >>>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> >>>>>>> ---
> >>>>>>> +
> >>>>>>> +int kvm_iommu_map_guest(struct kvm *kvm,
> >>>>>>> +			struct kvm_assigned_dev_kernel
> *assigned_dev)
> >>>>>>> +{
> >>>>>>> +	struct pci_dev *pdev = NULL;
> >>>>>>> +	int rc;
> >>>>>>> +
> >>>>>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf =
> %x:%x:%x\n",
> >>>>>>> +	       assigned_dev->host_busnr,
> >>>>>>> +	       PCI_SLOT(assigned_dev->host_devfn),
> >>>>>>> +	       PCI_FUNC(assigned_dev->host_devfn));
> >>>>>>> +
> >>>>>>> +	pdev = assigned_dev->dev;
> >>>>>>> +
> >>>>>>> +	if (pdev == NULL) {
> >>>>>>> +		if (kvm->arch.intel_iommu_domain) {
> >>>>>>> +
> >>>>>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> >>>>>>> +			kvm->arch.intel_iommu_domain = NULL;
> >>>>>>> +		}
> >>>>>>> +		return -ENODEV;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	kvm->arch.intel_iommu_domain =
> >>>>>>> intel_iommu_domain_alloc(pdev); + +	rc =
> >>>>>>> kvm_iommu_map_memslots(kvm); +	if (rc) {
> >>>>>>> +		kvm_iommu_unmap_memslots(kvm);
> >>>>>>> +		return rc;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
> >>>>>>> +			       pdev->bus->number, pdev->devfn);
> >>>>>> 
> >>>>>> I think we should remove intel_iommu_detach_dev(), which detaches
> >>>>>> device from iommu. If the device has been assigned to other
> >>>>>> domain already, it should not be assigned to this domain. Maybe
> >>>>>> we need to add a check whether the device has been assigned
> >>>>>> already. 
> >>>>> 
> >>>>> The problem is that in the generic VT-d code, the IOMMU is not
> >>>>> updated when a device is detached, for example when the driver
> >>>>> releases the device. Consider the following scenario:
> >>>>> 1. load device driver on host
> >>>>> 2. use device on host
> >>>>> 3. unload device driver on host
> >>>>> 4. Start KVM and assign the device to the guest.
> >>>>> 
> >>>>> In this case there is no clearing of the IOMMU on step 3, and we
> >>>>> get Vt-d failures (if we remove the above call).
> >>>>> 
> >>>> 
> >>>> I don't prefer this usage process. As we discussed before, there
> >>>> should be a dummy driver to hide/unbind passthrough devices from
> >>>> host, only these devices can be assigned. There should be a check
> >>>> whether device is assignable or not. If the device is not hidden,
> >>>> or has been assigned to other guest already, it cannot be
> >>>> assigned. At this point, I perfer to remove the device driver
> >>>> before restart host to avoid loading driver during booting.
> >>> 
> >>> I agree, but this is an orthogonal issue. Checking if the device is
> >>> assigned has to be part of the device assignment solution in
> >>> general, it is not specific to VT-d.
> >> 
> >> Yes, the check is general. When the check fails, should quit guest
> >> creation, and prompt the reason to user.
> >> 
> >>> As for VT-d detach, the point is that the generic VT-d code is not
> >>> detaching the domain, so we have to do that in KVM.
> >> 
> >> In intel_iommu_unmap_guest(), assigned devices will be detached from
> >> the guest. I think it's enough. 
> >> 
> >>> Again, removing the device driver is not enough (with the current
> >>> VT-d code). 
> >> 
> >> Why? It works for me. e.g. I removed e1000.ko before reboot host, it
> >> works well. BTW, for USB assignment, I add "nousb" in host grub to
> >> avoid loading driver for USB controllers, it also works.
> > What about the scenario described above:
> > 1. load device driver on host
> > 2. use device on host
> > 3. unload device driver on host
> > 4. Start KVM and assign the device to the guest.
> > 
> > Does it work for you as well?
> > 
> 
> I didn't try the scenario you described, it should not work if removing
> the detach call. I means removing device driver before reboot host. We
> should implement a mechanism to ensure assignable devices won't be set
> in context, instead of adding the detach call unconditionally in
> kvm_iommu_map_guest(). At this moment, I perfer to remove corresponding
> device driver before reboot host to assign the device.
> 

I agree that the detach call is not the ideal solution.
On the other hand, I don't think that forcing the user to reboot is
reasonable.
Another approach is to update the VT-d driver code in such a way that
will make sure that the context is cleared when the driver is removed,
e.g. tie it to the device release path.

Regards,
Ben

> Randy (Weidong)
> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH 5/5] This patch extends the VT-d driver to support KVM
  2008-08-07 10:35                         ` Ben-Ami Yassour
@ 2008-08-12  3:29                           ` Han, Weidong
  0 siblings, 0 replies; 36+ messages in thread
From: Han, Weidong @ 2008-08-12  3:29 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: avi, amit.shah, kvm, Muli Ben-Yehuda, anthony, Kay, Allen M

Ben-Ami Yassour wrote:
> On Thu, 2008-08-07 at 09:21 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour wrote:
>>> On Wed, 2008-08-06 at 17:12 +0800, Han, Weidong wrote:
>>>> Ben-Ami Yassour wrote:
>>>>> On Wed, 2008-08-06 at 14:18 +0800, Han, Weidong wrote:
>>>>>> Ben-Ami Yassour wrote:
>>>>>>> On Tue, 2008-08-05 at 22:46 +0800, Han, Weidong wrote:
>>>>>>>> Ben-Ami Yassour wrote:
>>>>>>>>> [Ben: fixed memory pinning]
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
>>>>>>>>> Signed-off-by: Weidong Han <weidong.han@intel.com>
>>>>>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>>>>>>>> ---
>>>>>>>>> +
>>>>>>>>> +int kvm_iommu_map_guest(struct kvm *kvm,
>>>>>>>>> +			struct kvm_assigned_dev_kernel
*assigned_dev)
>>>>>>>>> +{
>>>>>>>>> +	struct pci_dev *pdev = NULL;
>>>>>>>>> +	int rc;
>>>>>>>>> +
>>>>>>>>> +	printk(KERN_DEBUG "VT-d direct map: host bdf =
%x:%x:%x\n",
>>>>>>>>> +	       assigned_dev->host_busnr,
>>>>>>>>> +	       PCI_SLOT(assigned_dev->host_devfn),
>>>>>>>>> +	       PCI_FUNC(assigned_dev->host_devfn));
>>>>>>>>> +
>>>>>>>>> +	pdev = assigned_dev->dev;
>>>>>>>>> +
>>>>>>>>> +	if (pdev == NULL) {
>>>>>>>>> +		if (kvm->arch.intel_iommu_domain) {
>>>>>>>>> +
>>>>>>>> intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
>>>>>>>>> +			kvm->arch.intel_iommu_domain = NULL;
>>>>>>>>> +		}
>>>>>>>>> +		return -ENODEV;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	kvm->arch.intel_iommu_domain =
>>>>>>>>> intel_iommu_domain_alloc(pdev); + +	rc =
>>>>>>>>> kvm_iommu_map_memslots(kvm); +	if (rc) {
>>>>>>>>> +		kvm_iommu_unmap_memslots(kvm);
>>>>>>>>> +		return rc;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
>>>>>>>>> +			       pdev->bus->number, pdev->devfn);
>>>>>>>> 
>>>>>>>> I think we should remove intel_iommu_detach_dev(), which
>>>>>>>> detaches device from iommu. If the device has been assigned to
>>>>>>>> other domain already, it should not be assigned to this
>>>>>>>> domain. Maybe we need to add a check whether the device has
>>>>>>>> been assigned already.
>>>>>>> 
>>>>>>> The problem is that in the generic VT-d code, the IOMMU is not
>>>>>>> updated when a device is detached, for example when the driver
>>>>>>> releases the device. Consider the following scenario:
>>>>>>> 1. load device driver on host
>>>>>>> 2. use device on host
>>>>>>> 3. unload device driver on host
>>>>>>> 4. Start KVM and assign the device to the guest.
>>>>>>> 
>>>>>>> In this case there is no clearing of the IOMMU on step 3, and we
>>>>>>> get Vt-d failures (if we remove the above call).
>>>>>>> 
>>>>>> 
>>>>>> I don't prefer this usage process. As we discussed before, there
>>>>>> should be a dummy driver to hide/unbind passthrough devices from
>>>>>> host, only these devices can be assigned. There should be a check
>>>>>> whether device is assignable or not. If the device is not hidden,
>>>>>> or has been assigned to other guest already, it cannot be
>>>>>> assigned. At this point, I perfer to remove the device driver
>>>>>> before restart host to avoid loading driver during booting.
>>>>> 
>>>>> I agree, but this is an orthogonal issue. Checking if the device
>>>>> is assigned has to be part of the device assignment solution in
>>>>> general, it is not specific to VT-d.
>>>> 
>>>> Yes, the check is general. When the check fails, should quit guest
>>>> creation, and prompt the reason to user.
>>>> 
>>>>> As for VT-d detach, the point is that the generic VT-d code is not
>>>>> detaching the domain, so we have to do that in KVM.
>>>> 
>>>> In intel_iommu_unmap_guest(), assigned devices will be detached
>>>> from the guest. I think it's enough.
>>>> 
>>>>> Again, removing the device driver is not enough (with the current
>>>>> VT-d code).
>>>> 
>>>> Why? It works for me. e.g. I removed e1000.ko before reboot host,
>>>> it works well. BTW, for USB assignment, I add "nousb" in host grub
>>>> to avoid loading driver for USB controllers, it also works.
>>> What about the scenario described above:
>>> 1. load device driver on host
>>> 2. use device on host
>>> 3. unload device driver on host
>>> 4. Start KVM and assign the device to the guest.
>>> 
>>> Does it work for you as well?
>>> 
>> 
>> I didn't try the scenario you described, it should not work if
>> removing the detach call. I means removing device driver before
>> reboot host. We should implement a mechanism to ensure assignable
>> devices won't be set in context, instead of adding the detach call
>> unconditionally in kvm_iommu_map_guest(). At this moment, I perfer
>> to remove corresponding device driver before reboot host to assign
>> the device. 
>> 
> 
> I agree that the detach call is not the ideal solution.
> On the other hand, I don't think that forcing the user to reboot is
> reasonable.
> Another approach is to update the VT-d driver code in such a way that
> will make sure that the context is cleared when the driver is removed,
> e.g. tie it to the device release path.
> 

Okay, we can consider this when implement dummy driver to hide/unbind
passthrough devices from host. And also, we need to define a clear and
friendly VT-d usage later.

Randy (Weidong)




^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2008-08-12  3:29 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-28 16:26 Device assignemnt: updated patches Ben-Ami Yassour
2008-07-28 16:26 ` [PATCH 1/5] KVM: PCIPT: direct mmio pfn check Ben-Ami Yassour
2008-07-28 16:26   ` [PATCH 2/5] KVM: Add irq ack notifier list Ben-Ami Yassour
2008-07-28 16:26     ` [PATCH 3/5] KVM: pci device assignment Ben-Ami Yassour
2008-07-28 16:26       ` [PATCH 4/5] VT-d: changes to support KVM Ben-Ami Yassour
2008-07-28 16:26         ` [PATCH 5/5] This patch extends the VT-d driver " Ben-Ami Yassour
2008-07-28 16:32           ` Device assignment - userspace part Ben-Ami Yassour
2008-07-28 16:32             ` [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest Ben-Ami Yassour
2008-08-01  3:09               ` Han, Weidong
2008-08-05  9:41                 ` Ben-Ami Yassour
2008-07-29  7:28           ` [PATCH 5/5] This patch extends the VT-d driver to support KVM Yang, Sheng
2008-08-05  6:01           ` Yang, Sheng
2008-08-05  9:32             ` Ben-Ami Yassour
2008-08-05 14:46           ` Han, Weidong
2008-08-06  5:50             ` Ben-Ami Yassour
2008-08-06  6:18               ` Han, Weidong
2008-08-06  8:56                 ` Ben-Ami Yassour
2008-08-06  9:12                   ` Han, Weidong
2008-08-06  9:42                     ` Ben-Ami Yassour
2008-08-07  1:21                       ` Han, Weidong
2008-08-07 10:35                         ` Ben-Ami Yassour
2008-08-12  3:29                           ` Han, Weidong
2008-07-29  9:19       ` [PATCH 3/5] KVM: pci device assignment Amit Shah
2008-07-29  9:38         ` Ben-Ami Yassour
2008-07-29 14:02           ` Avi Kivity
2008-07-30  6:00             ` Amit Shah
2008-07-30  6:03           ` Amit Shah
2008-07-30 11:58             ` Ben-Ami Yassour
2008-08-01 11:24               ` Amit Shah
2008-07-29 12:27       ` Ben-Ami Yassour
2008-07-29  7:14     ` [PATCH 2/5] KVM: Add irq ack notifier list Yang, Sheng
2008-07-29  9:34       ` Amit Shah
2008-07-29  9:56         ` Yang, Sheng
2008-07-30  5:54           ` Amit Shah
2008-07-31  8:55             ` Avi Kivity
2008-07-30 15:21 ` Device assignemnt: updated patches Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).