kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/15] Device assignment & MSI enhancement
@ 2008-12-25  9:09 Sheng Yang
  2008-12-25  9:09 ` [PATCH 01/15] KVM: Add MSI_ACTION flag for assigned irq Sheng Yang
                   ` (15 more replies)
  0 siblings, 16 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm

Hi Avi and Marcelo

Merry Xmas! And here is the v2 of patchset. Target at 2.6.29 for it contained
a lot of fix and improvement of current device assignment and MSI feature.

Change from V1:

Addressed Marcelo's comments, and:
1. Fix racy in kvm_free_assigned_irq(). In case to do this, I fetch one
patch (irq_fifo) from original MSI-X patchset. Indeed a nice catch of Marcelo.
:)

2. Unified kvm_set_irq() with ioapic_deliver(). It didn't save much, but
duplicate is always bothering, and I have modified bitmask for vcpu to a real
bitmap (maybe not all, just what I have seen).

And for V1:

1. Add gsi_msg mapping mechanism, which gsi can used to indicated a MSI
interrupt.(Notice API/ABI changed a little, but we don't have userspace patch
now, so it should be OK.)

2. Provide MSI disable capability.

arch/x86/kvm/lapic.c      |   11 ++-
include/linux/kvm.h       |   15 +++-
include/linux/kvm_host.h  |   26 +++++-
include/linux/kvm_types.h |   17 ++++
virt/kvm/ioapic.c         |  117 ++++++++++---------------
virt/kvm/ioapic.h         |   23 +----
virt/kvm/irq_comm.c       |  184 ++++++++++++++++++++++++++++++++++++---
virt/kvm/kvm_main.c       |  212 ++++++++++++++++++++++++++++-----------------
8 files changed, 415 insertions(+), 190 deletions(-)

Sorry for the patchset size, it's too easy to grow fast, and I am a little too
lazy to split them into more batches in the Xmas... :)

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/15] KVM: Add MSI_ACTION flag for assigned irq
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 02/15] KVM: Use kvm_free_assigned_irq() for free irq Sheng Yang
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

For MSI disable feature later.

Notice I changed ABI here, but due to no userspace patch, I think it's OK.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ef7f98e..5b965f6 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -544,6 +544,7 @@ struct kvm_assigned_irq {
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 
-#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 0)
+#define KVM_DEV_IRQ_ASSIGN_MSI_ACTION	(1 << 0)
+#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 1)
 
 #endif
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/15] KVM: Use kvm_free_assigned_irq() for free irq
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
  2008-12-25  9:09 ` [PATCH 01/15] KVM: Add MSI_ACTION flag for assigned irq Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 03/15] KVM: Add support to disable MSI for assigned device Sheng Yang
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Which is more convenient...

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/kvm_main.c |   10 ++--------
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ffd261d..cd84b3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -284,11 +284,7 @@ static int assigned_device_update_intx(struct kvm *kvm,
 		return 0;
 
 	if (irqchip_in_kernel(kvm)) {
-		if (!msi2intx &&
-		    adev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) {
-			free_irq(adev->host_irq, (void *)kvm);
-			pci_disable_msi(adev->dev);
-		}
+		kvm_free_assigned_irq(kvm, adev);
 
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
@@ -339,9 +335,7 @@ static int assigned_device_update_msi(struct kvm *kvm,
 
 	if (irqchip_in_kernel(kvm)) {
 		if (!msi2intx) {
-			if (adev->irq_requested_type &
-					KVM_ASSIGNED_DEV_HOST_INTX)
-				free_irq(adev->host_irq, (void *)adev);
+			kvm_free_assigned_irq(kvm, adev);
 
 			r = pci_enable_msi(adev->dev);
 			if (r)
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/15] KVM: Add support to disable MSI for assigned device
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
  2008-12-25  9:09 ` [PATCH 01/15] KVM: Add MSI_ACTION flag for assigned irq Sheng Yang
  2008-12-25  9:09 ` [PATCH 02/15] KVM: Use kvm_free_assigned_irq() for free irq Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 04/15] KVM: Add a route layer to convert MSI message to GSI Sheng Yang
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

MSI is always enabled by default for msi2intx=1. But if msi2intx=0, we
have to disable MSI if guest require to do so.

The patch also discard unnecessary msi2intx judgment if guest want to update
MSI state.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/kvm_main.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cd84b3e..111738b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -328,6 +328,15 @@ static int assigned_device_update_msi(struct kvm *kvm,
 		adev->irq_requested_type &= ~KVM_ASSIGNED_DEV_GUEST_MSI;
 		adev->guest_irq = airq->guest_irq;
 		adev->ack_notifier.gsi = airq->guest_irq;
+	} else {
+		/*
+		 * Guest require to disable device MSI, we disable MSI and
+		 * re-enable INTx by default again. Notice it's only for
+		 * non-msi2intx.
+		 */
+		kvm_free_assigned_irq(kvm, adev);
+		assigned_device_update_intx(kvm, adev, airq);
+		return 0;
 	}
 
 	if (adev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
@@ -399,8 +408,7 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
 		}
 	}
 
-	if ((!msi2intx &&
-	     (assigned_irq->flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSI)) ||
+	if ((assigned_irq->flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION) ||
 	    (msi2intx && match->dev->msi_enabled)) {
 #ifdef CONFIG_X86
 		r = assigned_device_update_msi(kvm, match, assigned_irq);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/15] KVM: Add a route layer to convert MSI message to GSI
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (2 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 03/15] KVM: Add support to disable MSI for assigned device Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 05/15] KVM: Using gsi_msg mapping for MSI device assignment Sheng Yang
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Avi's purpose, to use single kvm_set_irq() to deal with all interrupt, including
MSI. So here is it.

struct gsi_msg is a mapping from a special gsi(with KVM_GSI_MSG_MASK) to
MSI/MSI-X message address/data.

Now we support up to 256 gsi_msg mapping, and gsi_msg is allocated by kernel and
provide two ioctls to userspace, which is more flexiable.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm.h      |   12 ++++++++
 include/linux/kvm_host.h |   16 ++++++++++
 virt/kvm/irq_comm.c      |   70 ++++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   66 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 164 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5b965f6..bb1ce7e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -394,6 +394,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_USER_NMI 22
 #endif
 #define KVM_CAP_SET_GUEST_DEBUG 23
+#define KVM_CAP_GSI_MSG 24
 
 /*
  * ioctls for VM fds
@@ -427,6 +428,8 @@ struct kvm_trace_rec {
 				   struct kvm_assigned_pci_dev)
 #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
 			    struct kvm_assigned_irq)
+#define KVM_REQUEST_GSI_MSG _IOWR(KVMIO, 0x71, struct kvm_assigned_gsi_msg)
+#define KVM_FREE_GSI_MSG _IOR(KVMIO, 0x72, struct kvm_assigned_gsi_msg)
 
 /*
  * ioctls for vcpu fds
@@ -547,4 +550,13 @@ struct kvm_assigned_irq {
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION	(1 << 0)
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 1)
 
+struct kvm_assigned_gsi_msg {
+	__u32 gsi;
+	struct {
+		__u32 addr_lo;
+		__u32 addr_hi;
+		__u32 data;
+	} msg;
+};
+
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d63e9a4..0e5741a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -132,6 +132,10 @@ struct kvm {
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
 #endif
+	struct hlist_head gsi_msg_list;
+	struct mutex gsi_msg_lock;
+#define KVM_NR_GSI_MSG	    256
+	DECLARE_BITMAP(gsi_msg_bitmap, KVM_NR_GSI_MSG);
 };
 
 /* The guest did something we don't support. */
@@ -319,6 +323,14 @@ struct kvm_assigned_dev_kernel {
 	struct pci_dev *dev;
 	struct kvm *kvm;
 };
+
+#define KVM_GSI_MSG_MASK    0x1000000ull
+struct kvm_gsi_msg {
+	u32 gsi;
+	struct msi_msg msg;
+	struct hlist_node link;
+};
+
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -326,6 +338,10 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+int kvm_update_gsi_msg(struct kvm *kvm, struct kvm_gsi_msg *gsi_msg);
+struct kvm_gsi_msg *kvm_find_gsi_msg(struct kvm *kvm, u32 gsi);
+void kvm_free_gsi_msg(struct kvm *kvm, struct kvm_gsi_msg *gsi_msg);
+void kvm_free_gsi_msg_list(struct kvm *kvm);
 
 #ifdef CONFIG_DMAR
 int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index aa5d1e5..abfab46 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -99,3 +99,73 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 		clear_bit(irq_source_id, &kvm->arch.irq_states[i]);
 	clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
 }
+
+int kvm_update_gsi_msg(struct kvm *kvm, struct kvm_gsi_msg *gsi_msg)
+{
+	struct kvm_gsi_msg *found_msg, *new_gsi_msg;
+	int r, gsi;
+
+	mutex_lock(&kvm->gsi_msg_lock);
+	/* Find whether we need a update or a new entry */
+	found_msg = kvm_find_gsi_msg(kvm, gsi_msg->gsi);
+	if (found_msg)
+		*found_msg = *gsi_msg;
+	else {
+		gsi = find_first_zero_bit(kvm->gsi_msg_bitmap, KVM_NR_GSI_MSG);
+		if (gsi >= KVM_NR_GSI_MSG) {
+			r = -ENOSPC;
+			goto out;
+		}
+		__set_bit(gsi, kvm->gsi_msg_bitmap);
+		gsi_msg->gsi = gsi | KVM_GSI_MSG_MASK;
+		new_gsi_msg = kzalloc(sizeof(*new_gsi_msg), GFP_KERNEL);
+		if (!new_gsi_msg) {
+			r = -ENOMEM;
+			goto out;
+		}
+		*new_gsi_msg = *gsi_msg;
+		hlist_add_head(&new_gsi_msg->link, &kvm->gsi_msg_list);
+	}
+	r = 0;
+out:
+	mutex_unlock(&kvm->gsi_msg_lock);
+	return r;
+}
+
+/* Call with kvm->gsi_msg_lock hold */
+struct kvm_gsi_msg *kvm_find_gsi_msg(struct kvm *kvm, u32 gsi)
+{
+	struct kvm_gsi_msg *gsi_msg;
+	struct hlist_node *n;
+
+	if (!(gsi & KVM_GSI_MSG_MASK))
+		return NULL;
+	hlist_for_each_entry(gsi_msg, n, &kvm->gsi_msg_list, link)
+		if (gsi_msg->gsi == gsi)
+			goto out;
+	gsi_msg = NULL;
+out:
+	return gsi_msg;
+}
+
+/* Call with kvm->gsi_msg_lock hold */
+void kvm_free_gsi_msg(struct kvm *kvm, struct kvm_gsi_msg *gsi_msg)
+{
+	if (!gsi_msg)
+		return;
+	__clear_bit(gsi_msg->gsi & ~KVM_GSI_MSG_MASK, kvm->gsi_msg_bitmap);
+	hlist_del(&gsi_msg->link);
+	kfree(gsi_msg);
+}
+
+void kvm_free_gsi_msg_list(struct kvm *kvm)
+{
+	struct kvm_gsi_msg *gsi_msg;
+	struct hlist_node *pos, *n;
+
+	mutex_lock(&kvm->gsi_msg_lock);
+	hlist_for_each_entry_safe(gsi_msg, pos, n, &kvm->gsi_msg_list, link)
+		kvm_free_gsi_msg(kvm, gsi_msg);
+	mutex_unlock(&kvm->gsi_msg_lock);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 111738b..26bccf9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -813,6 +813,8 @@ static struct kvm *kvm_create_vm(void)
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 	kvm_coalesced_mmio_init(kvm);
 #endif
+	INIT_HLIST_HEAD(&kvm->gsi_msg_list);
+	mutex_init(&kvm->gsi_msg_lock);
 out:
 	return kvm;
 }
@@ -850,6 +852,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 {
 	struct mm_struct *mm = kvm->mm;
 
+	kvm_free_gsi_msg_list(kvm);
 	spin_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	spin_unlock(&kvm_lock);
@@ -1578,6 +1581,45 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 	return 0;
 }
 
+static int kvm_vm_ioctl_request_gsi_msg(struct kvm *kvm,
+					struct kvm_assigned_gsi_msg *agsi_msg)
+{
+	struct kvm_gsi_msg gsi_msg;
+	int r;
+
+	gsi_msg.gsi = agsi_msg->gsi;
+	gsi_msg.msg.address_lo = agsi_msg->msg.addr_lo;
+	gsi_msg.msg.address_hi = agsi_msg->msg.addr_hi;
+	gsi_msg.msg.data = agsi_msg->msg.data;
+
+	r = kvm_update_gsi_msg(kvm, &gsi_msg);
+	if (r == 0)
+		agsi_msg->gsi = gsi_msg.gsi;
+	return r;
+}
+
+static int kvm_vm_ioctl_free_gsi_msg(struct kvm *kvm,
+				     struct kvm_assigned_gsi_msg *agsi_msg)
+{
+	struct kvm_gsi_msg *gsi_msg;
+	u32 gsi;
+	int r;
+
+	gsi = agsi_msg->gsi;
+	mutex_lock(&kvm->gsi_msg_lock);
+	gsi_msg = kvm_find_gsi_msg(kvm, gsi);
+	if (!gsi_msg || memcmp(gsi_msg, agsi_msg, sizeof(agsi_msg)) != 0) {
+		printk(KERN_WARNING "kvm: illegal gsi->msi_msg mapping!");
+		r = -EINVAL;
+		goto out;
+	}
+	kvm_free_gsi_msg(kvm, gsi_msg);
+	r = 0;
+out:
+	mutex_unlock(&kvm->gsi_msg_lock);
+	return r;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -1860,6 +1902,30 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
+	case KVM_REQUEST_GSI_MSG: {
+		struct kvm_assigned_gsi_msg agsi_msg;
+		r = -EFAULT;
+		if (copy_from_user(&agsi_msg, argp, sizeof agsi_msg))
+			goto out;
+		r = kvm_vm_ioctl_request_gsi_msg(kvm, &agsi_msg);
+		if (r)
+			goto out;
+		r = -EFAULT;
+		if (copy_to_user(argp, &agsi_msg, sizeof agsi_msg))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_FREE_GSI_MSG: {
+		struct kvm_assigned_gsi_msg agsi_msg;
+		r = -EFAULT;
+		if (copy_from_user(&agsi_msg, argp, sizeof agsi_msg))
+			goto out;
+		r = kvm_vm_ioctl_free_gsi_msg(kvm, &agsi_msg);
+		if (r)
+			goto out;
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/15] KVM: Using gsi_msg mapping for MSI device assignment
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (3 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 04/15] KVM: Add a route layer to convert MSI message to GSI Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 06/15] KVM: Improve MSI dispatch function Sheng Yang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Convert MSI userspace interface to support gsi_msg mapping(and nobody should
be the user of the old interface...).

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    1 -
 virt/kvm/kvm_main.c      |   35 ++++++++++++++++++++++-------------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0e5741a..aa2606b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -313,7 +313,6 @@ struct kvm_assigned_dev_kernel {
 	int host_irq;
 	bool host_irq_disabled;
 	int guest_irq;
-	struct msi_msg guest_msi;
 #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI	(1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX	(1 << 8)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 26bccf9..3494861 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -92,20 +92,30 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 	int vcpu_id;
 	struct kvm_vcpu *vcpu;
 	struct kvm_ioapic *ioapic = ioapic_irqchip(dev->kvm);
-	int dest_id = (dev->guest_msi.address_lo & MSI_ADDR_DEST_ID_MASK)
-			>> MSI_ADDR_DEST_ID_SHIFT;
-	int vector = (dev->guest_msi.data & MSI_DATA_VECTOR_MASK)
-			>> MSI_DATA_VECTOR_SHIFT;
-	int dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
-				(unsigned long *)&dev->guest_msi.address_lo);
-	int trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
-				(unsigned long *)&dev->guest_msi.data);
-	int delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
-				(unsigned long *)&dev->guest_msi.data);
+	struct kvm_gsi_msg *gsi_msg;
+	int dest_id, vector, dest_mode, trig_mode, delivery_mode;
 	u32 deliver_bitmask;
 
 	BUG_ON(!ioapic);
 
+	mutex_lock(&dev->kvm->gsi_msg_lock);
+	gsi_msg = kvm_find_gsi_msg(dev->kvm, dev->guest_irq);
+	if (!gsi_msg) {
+		printk(KERN_WARNING "kvm: fail to find correlated gsi_msg\n");
+		return;
+	}
+	mutex_unlock(&dev->kvm->gsi_msg_lock);
+
+	dest_id = (gsi_msg->msg.address_lo & MSI_ADDR_DEST_ID_MASK)
+			>> MSI_ADDR_DEST_ID_SHIFT;
+	vector = (gsi_msg->msg.data & MSI_DATA_VECTOR_MASK)
+			>> MSI_DATA_VECTOR_SHIFT;
+	dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
+				(unsigned long *)&gsi_msg->msg.address_lo);
+	trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
+				(unsigned long *)&gsi_msg->msg.data);
+	delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
+				(unsigned long *)&gsi_msg->msg.data);
 	deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
 				dest_id, dest_mode);
 	/* IOAPIC delivery mode value is the same as MSI here */
@@ -316,17 +326,16 @@ static int assigned_device_update_msi(struct kvm *kvm,
 {
 	int r;
 
+	adev->guest_irq = airq->guest_irq;
+
 	if (airq->flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSI) {
 		/* x86 don't care upper address of guest msi message addr */
 		adev->irq_requested_type |= KVM_ASSIGNED_DEV_GUEST_MSI;
 		adev->irq_requested_type &= ~KVM_ASSIGNED_DEV_GUEST_INTX;
-		adev->guest_msi.address_lo = airq->guest_msi.addr_lo;
-		adev->guest_msi.data = airq->guest_msi.data;
 		adev->ack_notifier.gsi = -1;
 	} else if (msi2intx) {
 		adev->irq_requested_type |= KVM_ASSIGNED_DEV_GUEST_INTX;
 		adev->irq_requested_type &= ~KVM_ASSIGNED_DEV_GUEST_MSI;
-		adev->guest_irq = airq->guest_irq;
 		adev->ack_notifier.gsi = airq->guest_irq;
 	} else {
 		/*
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/15] KVM: Improve MSI dispatch function
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (4 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 05/15] KVM: Using gsi_msg mapping for MSI device assignment Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 07/15] KVM: Using ioapic_irqchip() macro for kvm_set_irq Sheng Yang
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Prepare to merge with kvm_set_irq().

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/kvm_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3494861..599257e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -87,7 +87,7 @@ static bool kvm_rebooting;
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 
 #ifdef CONFIG_X86
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi)
 {
 	int vcpu_id;
 	struct kvm_vcpu *vcpu;
@@ -99,7 +99,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 	BUG_ON(!ioapic);
 
 	mutex_lock(&dev->kvm->gsi_msg_lock);
-	gsi_msg = kvm_find_gsi_msg(dev->kvm, dev->guest_irq);
+	gsi_msg = kvm_find_gsi_msg(dev->kvm, gsi);
 	if (!gsi_msg) {
 		printk(KERN_WARNING "kvm: fail to find correlated gsi_msg\n");
 		return;
@@ -143,7 +143,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 	}
 }
 #else
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev) {}
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi) {}
 #endif
 
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
@@ -178,7 +178,7 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 			    assigned_dev->guest_irq, 1);
 	else if (assigned_dev->irq_requested_type &
 				KVM_ASSIGNED_DEV_GUEST_MSI) {
-		assigned_device_msi_dispatch(assigned_dev);
+		assigned_device_msi_dispatch(assigned_dev, assigned_dev->guest_irq);
 		enable_irq(assigned_dev->host_irq);
 		assigned_dev->host_irq_disabled = false;
 	}
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/15] KVM: Using ioapic_irqchip() macro for kvm_set_irq
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (5 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 06/15] KVM: Improve MSI dispatch function Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 08/15] KVM: Merge MSI handling to kvm_set_irq Sheng Yang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang


Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/irq_comm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index abfab46..47243ef 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -39,7 +39,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 	 * IOAPIC.  So set the bit in both. The guest will ignore
 	 * writes to the unused one.
 	 */
-	kvm_ioapic_set_irq(kvm->arch.vioapic, irq, !!(*irq_state));
+	kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state));
 #ifdef CONFIG_X86
 	kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
 #endif
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/15] KVM: Merge MSI handling to kvm_set_irq
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (6 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 07/15] KVM: Using ioapic_irqchip() macro for kvm_set_irq Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 09/15] KVM: Split IOAPIC structure Sheng Yang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Using kvm_set_irq to handle all interrupt injection.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    2 +-
 virt/kvm/irq_comm.c      |   98 +++++++++++++++++++++++++++++++++++++++-------
 virt/kvm/kvm_main.c      |   77 +++---------------------------------
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index aa2606b..5b671b6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -330,7 +330,7 @@ struct kvm_gsi_msg {
 	struct hlist_node link;
 };
 
-void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
+void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 47243ef..63cdf01 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -20,28 +20,96 @@
  */
 
 #include <linux/kvm_host.h>
+
+#ifdef CONFIG_X86
+#include <asm/msidef.h>
+#endif
+
 #include "irq.h"
 
 #include "ioapic.h"
 
 /* This should be called with the kvm->lock mutex held */
-void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
+void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 {
-	unsigned long *irq_state = (unsigned long *)&kvm->arch.irq_states[irq];
-
-	/* Logical OR for level trig interrupt */
-	if (level)
-		set_bit(irq_source_id, irq_state);
-	else
-		clear_bit(irq_source_id, irq_state);
-
-	/* Not possible to detect if the guest uses the PIC or the
-	 * IOAPIC.  So set the bit in both. The guest will ignore
-	 * writes to the unused one.
-	 */
-	kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state));
+	unsigned long *irq_state;
+#ifdef CONFIG_X86
+	int vcpu_id;
+	struct kvm_vcpu *vcpu;
+	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
+	struct kvm_gsi_msg *gsi_msg;
+	int dest_id, vector, dest_mode, trig_mode, delivery_mode;
+	u32 deliver_bitmask;
+
+	BUG_ON(!ioapic);
+#endif
+
+	if (!(gsi & KVM_GSI_MSG_MASK)) {
+		int irq = gsi;
+
+		irq_state = (unsigned long *)&kvm->arch.irq_states[irq];
+
+		/* Logical OR for level trig interrupt */
+		if (level)
+			set_bit(irq_source_id, irq_state);
+		else
+			clear_bit(irq_source_id, irq_state);
+
+		/* Not possible to detect if the guest uses the PIC or the
+		 * IOAPIC.  So set the bit in both. The guest will ignore
+		 * writes to the unused one.
+		 */
+		kvm_ioapic_set_irq(ioapic, irq, !!(*irq_state));
 #ifdef CONFIG_X86
-	kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
+		kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
+#endif
+		return;
+	}
+
+#ifdef CONFIG_X86
+	mutex_lock(&kvm->gsi_msg_lock);
+	gsi_msg = kvm_find_gsi_msg(kvm, gsi);
+	mutex_unlock(&kvm->gsi_msg_lock);
+	if (!gsi_msg) {
+		printk(KERN_WARNING "kvm: fail to find correlated gsi_msg\n");
+		return;
+	}
+
+	dest_id = (gsi_msg->msg.address_lo & MSI_ADDR_DEST_ID_MASK)
+			>> MSI_ADDR_DEST_ID_SHIFT;
+	vector = (gsi_msg->msg.data & MSI_DATA_VECTOR_MASK)
+			>> MSI_DATA_VECTOR_SHIFT;
+	dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
+				(unsigned long *)&gsi_msg->msg.address_lo);
+	trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
+				(unsigned long *)&gsi_msg->msg.data);
+	delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
+				(unsigned long *)&gsi_msg->msg.data);
+	deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
+				dest_id, dest_mode);
+	/* IOAPIC delivery mode value is the same as MSI here */
+	switch (delivery_mode) {
+	case IOAPIC_LOWEST_PRIORITY:
+		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
+				deliver_bitmask);
+		if (vcpu != NULL)
+			kvm_apic_set_irq(vcpu, vector, trig_mode);
+		else
+			printk(KERN_INFO "kvm: null lowest priority vcpu!\n");
+		break;
+	case IOAPIC_FIXED:
+		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+			if (!(deliver_bitmask & (1 << vcpu_id)))
+				continue;
+			deliver_bitmask &= ~(1 << vcpu_id);
+			vcpu = ioapic->kvm->vcpus[vcpu_id];
+			if (vcpu)
+				kvm_apic_set_irq(vcpu, vector, trig_mode);
+		}
+		break;
+	default:
+		printk(KERN_INFO "kvm: unsupported MSI delivery mode\n");
+	}
 #endif
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 599257e..a51e630 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -47,10 +47,6 @@
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 
-#ifdef CONFIG_X86
-#include <asm/msidef.h>
-#endif
-
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 #include "coalesced_mmio.h"
 #endif
@@ -86,66 +82,6 @@ static bool kvm_rebooting;
 
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 
-#ifdef CONFIG_X86
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi)
-{
-	int vcpu_id;
-	struct kvm_vcpu *vcpu;
-	struct kvm_ioapic *ioapic = ioapic_irqchip(dev->kvm);
-	struct kvm_gsi_msg *gsi_msg;
-	int dest_id, vector, dest_mode, trig_mode, delivery_mode;
-	u32 deliver_bitmask;
-
-	BUG_ON(!ioapic);
-
-	mutex_lock(&dev->kvm->gsi_msg_lock);
-	gsi_msg = kvm_find_gsi_msg(dev->kvm, gsi);
-	if (!gsi_msg) {
-		printk(KERN_WARNING "kvm: fail to find correlated gsi_msg\n");
-		return;
-	}
-	mutex_unlock(&dev->kvm->gsi_msg_lock);
-
-	dest_id = (gsi_msg->msg.address_lo & MSI_ADDR_DEST_ID_MASK)
-			>> MSI_ADDR_DEST_ID_SHIFT;
-	vector = (gsi_msg->msg.data & MSI_DATA_VECTOR_MASK)
-			>> MSI_DATA_VECTOR_SHIFT;
-	dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
-				(unsigned long *)&gsi_msg->msg.address_lo);
-	trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
-				(unsigned long *)&gsi_msg->msg.data);
-	delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
-				(unsigned long *)&gsi_msg->msg.data);
-	deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
-				dest_id, dest_mode);
-	/* IOAPIC delivery mode value is the same as MSI here */
-	switch (delivery_mode) {
-	case IOAPIC_LOWEST_PRIORITY:
-		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-				deliver_bitmask);
-		if (vcpu != NULL)
-			kvm_apic_set_irq(vcpu, vector, trig_mode);
-		else
-			printk(KERN_INFO "kvm: null lowest priority vcpu!\n");
-		break;
-	case IOAPIC_FIXED:
-		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-			if (!(deliver_bitmask & (1 << vcpu_id)))
-				continue;
-			deliver_bitmask &= ~(1 << vcpu_id);
-			vcpu = ioapic->kvm->vcpus[vcpu_id];
-			if (vcpu)
-				kvm_apic_set_irq(vcpu, vector, trig_mode);
-		}
-		break;
-	default:
-		printk(KERN_INFO "kvm: unsupported MSI delivery mode\n");
-	}
-}
-#else
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi) {}
-#endif
-
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
 						      int assigned_dev_id)
 {
@@ -172,16 +108,15 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 	 * finer-grained lock, update this
 	 */
 	mutex_lock(&assigned_dev->kvm->lock);
-	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_INTX)
-		kvm_set_irq(assigned_dev->kvm,
-			    assigned_dev->irq_source_id,
-			    assigned_dev->guest_irq, 1);
-	else if (assigned_dev->irq_requested_type &
-				KVM_ASSIGNED_DEV_GUEST_MSI) {
-		assigned_device_msi_dispatch(assigned_dev, assigned_dev->guest_irq);
+
+	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
+		    assigned_dev->guest_irq, 1);
+
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
 		enable_irq(assigned_dev->host_irq);
 		assigned_dev->host_irq_disabled = false;
 	}
+
 	mutex_unlock(&assigned_dev->kvm->lock);
 	kvm_put_kvm(assigned_dev->kvm);
 }
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/15] KVM: Split IOAPIC structure
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (7 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 08/15] KVM: Merge MSI handling to kvm_set_irq Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 10/15] KVM: Unified the delivery of IOAPIC and MSI Sheng Yang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Prepared for reuse ioapic_redir_entry for MSI.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_types.h |   17 +++++++++++++++++
 virt/kvm/ioapic.c         |    6 +++---
 virt/kvm/ioapic.h         |   17 +----------------
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 9b6f395..f07de1a 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -53,4 +53,21 @@ struct kvm_pio_request {
 	int rep;
 };
 
+union kvm_ioapic_redirect_entry {
+	u64 bits;
+	struct {
+		u8 vector;
+		u8 delivery_mode:3;
+		u8 dest_mode:1;
+		u8 delivery_status:1;
+		u8 polarity:1;
+		u8 remote_irr:1;
+		u8 trig_mode:1;
+		u8 mask:1;
+		u8 reserve:7;
+		u8 reserved[4];
+		u8 dest_id;
+	} fields;
+};
+
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 23b81cf..ebb2ab5 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -85,7 +85,7 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 
 static void ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
-	union ioapic_redir_entry *pent;
+	union kvm_ioapic_redirect_entry *pent;
 
 	pent = &ioapic->redirtbl[idx];
 
@@ -272,7 +272,7 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 {
 	u32 old_irr = ioapic->irr;
 	u32 mask = 1 << irq;
-	union ioapic_redir_entry entry;
+	union kvm_ioapic_redirect_entry entry;
 
 	if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
 		entry = ioapic->redirtbl[irq];
@@ -291,7 +291,7 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi,
 				    int trigger_mode)
 {
-	union ioapic_redir_entry *ent;
+	union kvm_ioapic_redirect_entry *ent;
 
 	ent = &ioapic->redirtbl[gsi];
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 49c9581..ee5b0bd 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -40,22 +40,7 @@ struct kvm_ioapic {
 	u32 id;
 	u32 irr;
 	u32 pad;
-	union ioapic_redir_entry {
-		u64 bits;
-		struct {
-			u8 vector;
-			u8 delivery_mode:3;
-			u8 dest_mode:1;
-			u8 delivery_status:1;
-			u8 polarity:1;
-			u8 remote_irr:1;
-			u8 trig_mode:1;
-			u8 mask:1;
-			u8 reserve:7;
-			u8 reserved[4];
-			u8 dest_id;
-		} fields;
-	} redirtbl[IOAPIC_NUM_PINS];
+	union kvm_ioapic_redirect_entry redirtbl[IOAPIC_NUM_PINS];
 	struct kvm_io_device dev;
 	struct kvm *kvm;
 	void (*ack_notifier)(void *opaque, int irq);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/15] KVM: Unified the delivery of IOAPIC and MSI
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (8 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 09/15] KVM: Split IOAPIC structure Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 11/15] KVM: Change API of kvm_ioapic_get_delivery_bitmask Sheng Yang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Duplicate code is always bothering...

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    3 ++
 virt/kvm/ioapic.c        |   84 +++++++++++++++++-----------------------------
 virt/kvm/irq_comm.c      |   75 ++++++++++++++++++++++++----------------
 3 files changed, 79 insertions(+), 83 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5b671b6..4f92317 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -330,6 +330,9 @@ struct kvm_gsi_msg {
 	struct hlist_node link;
 };
 
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+				   union kvm_ioapic_redirect_entry *entry,
+				   u32 *deliver_bitmask);
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ebb2ab5..af9f5de 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -195,75 +195,53 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
-	u8 dest = ioapic->redirtbl[irq].fields.dest_id;
-	u8 dest_mode = ioapic->redirtbl[irq].fields.dest_mode;
-	u8 delivery_mode = ioapic->redirtbl[irq].fields.delivery_mode;
-	u8 vector = ioapic->redirtbl[irq].fields.vector;
-	u8 trig_mode = ioapic->redirtbl[irq].fields.trig_mode;
+	union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
 	u32 deliver_bitmask;
 	struct kvm_vcpu *vcpu;
 	int vcpu_id, r = 0;
 
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 		     "vector=%x trig_mode=%x\n",
-		     dest, dest_mode, delivery_mode, vector, trig_mode);
+		     entry.fields.dest, entry.fields.dest_mode,
+		     entry.fields.delivery_mode, entry.fields.vector,
+		     entry.fields.trig_mode);
 
-	deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, dest,
-							  dest_mode);
+	kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
 	if (!deliver_bitmask) {
 		ioapic_debug("no target on destination\n");
 		return 0;
 	}
 
-	switch (delivery_mode) {
-	case IOAPIC_LOWEST_PRIORITY:
-		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-				deliver_bitmask);
+	/* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
-		if (irq == 0)
-			vcpu = ioapic->kvm->vcpus[0];
+	if (irq == 0)
+		deliver_bitmask = 1 << 0;
 #endif
-		if (vcpu != NULL)
-			r = ioapic_inj_irq(ioapic, vcpu, vector,
-				       trig_mode, delivery_mode);
-		else
-			ioapic_debug("null lowest prio vcpu: "
-				     "mask=%x vector=%x delivery_mode=%x\n",
-				     deliver_bitmask, vector, IOAPIC_LOWEST_PRIORITY);
-		break;
-	case IOAPIC_FIXED:
-#ifdef CONFIG_X86
-		if (irq == 0)
-			deliver_bitmask = 1;
-#endif
-		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-			if (!(deliver_bitmask & (1 << vcpu_id)))
-				continue;
-			deliver_bitmask &= ~(1 << vcpu_id);
-			vcpu = ioapic->kvm->vcpus[vcpu_id];
-			if (vcpu) {
-				r = ioapic_inj_irq(ioapic, vcpu, vector,
-					       trig_mode, delivery_mode);
-			}
-		}
-		break;
-	case IOAPIC_NMI:
-		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-			if (!(deliver_bitmask & (1 << vcpu_id)))
-				continue;
-			deliver_bitmask &= ~(1 << vcpu_id);
-			vcpu = ioapic->kvm->vcpus[vcpu_id];
-			if (vcpu)
+
+	for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+		if (!(deliver_bitmask & (1 << vcpu_id)))
+			continue;
+		deliver_bitmask &= ~(1 << vcpu_id);
+		vcpu = ioapic->kvm->vcpus[vcpu_id];
+		if (vcpu) {
+			if (entry.fields.delivery_mode ==
+					IOAPIC_LOWEST_PRIORITY ||
+			    entry.fields.delivery_mode == IOAPIC_FIXED)
+				r = ioapic_inj_irq(ioapic, vcpu,
+						   entry.fields.vector,
+						   entry.fields.trig_mode,
+						   entry.fields.delivery_mode);
+			else if (entry.fields.delivery_mode == IOAPIC_NMI)
 				ioapic_inj_nmi(vcpu);
 			else
-				ioapic_debug("NMI to vcpu %d failed\n",
-						vcpu->vcpu_id);
-		}
-		break;
-	default:
-		printk(KERN_WARNING "Unsupported delivery mode %d\n",
-		       delivery_mode);
-		break;
+				ioapic_debug("unsupported delivery mode %x!\n",
+					     entry.fields.delivery_mode);
+		} else
+			ioapic_debug("null destination vcpu: "
+				     "mask=%x vector=%x delivery_mode=%x\n",
+				     entry.fields.deliver_bitmask,
+				     entry.fields.vector,
+				     entry.fields.delivery_mode);
 	}
 	return r;
 }
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 63cdf01..d89d8b2 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -29,6 +29,29 @@
 
 #include "ioapic.h"
 
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+				   union kvm_ioapic_redirect_entry *entry,
+				   u32 *deliver_bitmask)
+{
+	struct kvm_vcpu *vcpu;
+
+	*deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
+				entry->fields.dest_id, entry->fields.dest_mode);
+	switch (entry->fields.delivery_mode) {
+	case IOAPIC_LOWEST_PRIORITY:
+		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
+				entry->fields.vector, *deliver_bitmask);
+		*deliver_bitmask = 1 << vcpu->vcpu_id;
+		break;
+	case IOAPIC_FIXED:
+	case IOAPIC_NMI:
+		break;
+	default:
+		printk(KERN_INFO "kvm: unsupported delivery mode\n");
+		*deliver_bitmask = 0;
+	}
+}
+
 /* This should be called with the kvm->lock mutex held */
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 {
@@ -38,7 +61,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 	struct kvm_vcpu *vcpu;
 	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
 	struct kvm_gsi_msg *gsi_msg;
-	int dest_id, vector, dest_mode, trig_mode, delivery_mode;
+	union kvm_ioapic_redirect_entry entry;
 	u32 deliver_bitmask;
 
 	BUG_ON(!ioapic);
@@ -75,40 +98,32 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 		return;
 	}
 
-	dest_id = (gsi_msg->msg.address_lo & MSI_ADDR_DEST_ID_MASK)
+	entry.bits = 0;
+	entry.fields.dest_id = (gsi_msg->msg.address_lo & MSI_ADDR_DEST_ID_MASK)
 			>> MSI_ADDR_DEST_ID_SHIFT;
-	vector = (gsi_msg->msg.data & MSI_DATA_VECTOR_MASK)
+	entry.fields.vector = (gsi_msg->msg.data & MSI_DATA_VECTOR_MASK)
 			>> MSI_DATA_VECTOR_SHIFT;
-	dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
+	entry.fields.dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
 				(unsigned long *)&gsi_msg->msg.address_lo);
-	trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
+	entry.fields.trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
 				(unsigned long *)&gsi_msg->msg.data);
-	delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
+	entry.fields.delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
 				(unsigned long *)&gsi_msg->msg.data);
-	deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
-				dest_id, dest_mode);
-	/* IOAPIC delivery mode value is the same as MSI here */
-	switch (delivery_mode) {
-	case IOAPIC_LOWEST_PRIORITY:
-		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-				deliver_bitmask);
-		if (vcpu != NULL)
-			kvm_apic_set_irq(vcpu, vector, trig_mode);
-		else
-			printk(KERN_INFO "kvm: null lowest priority vcpu!\n");
-		break;
-	case IOAPIC_FIXED:
-		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-			if (!(deliver_bitmask & (1 << vcpu_id)))
-				continue;
-			deliver_bitmask &= ~(1 << vcpu_id);
-			vcpu = ioapic->kvm->vcpus[vcpu_id];
-			if (vcpu)
-				kvm_apic_set_irq(vcpu, vector, trig_mode);
-		}
-		break;
-	default:
-		printk(KERN_INFO "kvm: unsupported MSI delivery mode\n");
+
+	kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
+
+	if (!deliver_bitmask) {
+		printk(KERN_WARNING "kvm: no destination for MSI delivery!");
+		return;
+	}
+	for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+		if (!(deliver_bitmask & (1 << vcpu_id)))
+			continue;
+		deliver_bitmask &= ~(1 << vcpu_id);
+		vcpu = ioapic->kvm->vcpus[vcpu_id];
+		if (vcpu)
+			kvm_apic_set_irq(vcpu, entry.fields.vector,
+					 entry.fields.trig_mode);
 	}
 #endif
 }
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/15] KVM: Change API of kvm_ioapic_get_delivery_bitmask
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (9 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 10/15] KVM: Unified the delivery of IOAPIC and MSI Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 12/15] KVM: Update intr delivery func to accept unsigned long* bitmap Sheng Yang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

In order to use with bit ops.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/ioapic.c   |   17 ++++++++---------
 virt/kvm/ioapic.h   |    4 ++--
 virt/kvm/irq_comm.c |    5 +++--
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index af9f5de..ebd5ba6 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -153,22 +153,22 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
 	kvm_vcpu_kick(vcpu);
 }
 
-u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-				    u8 dest_mode)
+void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
+				     u8 dest_mode, u32 *mask)
 {
-	u32 mask = 0;
 	int i;
 	struct kvm *kvm = ioapic->kvm;
 	struct kvm_vcpu *vcpu;
 
 	ioapic_debug("dest %d dest_mode %d\n", dest, dest_mode);
 
+	*mask = 0;
 	if (dest_mode == 0) {	/* Physical mode. */
 		if (dest == 0xFF) {	/* Broadcast. */
 			for (i = 0; i < KVM_MAX_VCPUS; ++i)
 				if (kvm->vcpus[i] && kvm->vcpus[i]->arch.apic)
-					mask |= 1 << i;
-			return mask;
+					*mask |= 1 << i;
+			return;
 		}
 		for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 			vcpu = kvm->vcpus[i];
@@ -176,7 +176,7 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 				continue;
 			if (kvm_apic_match_physical_addr(vcpu->arch.apic, dest)) {
 				if (vcpu->arch.apic)
-					mask = 1 << i;
+					*mask = 1 << i;
 				break;
 			}
 		}
@@ -187,10 +187,9 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 				continue;
 			if (vcpu->arch.apic &&
 			    kvm_apic_match_logical_addr(vcpu->arch.apic, dest))
-				mask |= 1 << vcpu->vcpu_id;
+				*mask |= 1 << vcpu->vcpu_id;
 		}
-	ioapic_debug("mask %x\n", mask);
-	return mask;
+	ioapic_debug("mask %x\n", *mask);
 }
 
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index ee5b0bd..e107dbb 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -70,7 +70,7 @@ void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
-u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-				u8 dest_mode);
+void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
+				     u8 dest_mode, u32 *mask);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index d89d8b2..1949587 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -35,8 +35,9 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 {
 	struct kvm_vcpu *vcpu;
 
-	*deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
-				entry->fields.dest_id, entry->fields.dest_mode);
+	kvm_ioapic_get_delivery_bitmask(ioapic, entry->fields.dest_id,
+					entry->fields.dest_mode,
+					deliver_bitmask);
 	switch (entry->fields.delivery_mode) {
 	case IOAPIC_LOWEST_PRIORITY:
 		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/15] KVM: Update intr delivery func to accept unsigned long* bitmap
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (10 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 11/15] KVM: Change API of kvm_ioapic_get_delivery_bitmask Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 13/15] KVM: bit ops for deliver_bitmap Sheng Yang
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is
increased.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/lapic.c     |    8 ++++----
 include/linux/kvm_host.h |    2 +-
 virt/kvm/ioapic.c        |    4 ++--
 virt/kvm/ioapic.h        |    4 ++--
 virt/kvm/irq_comm.c      |    6 +++---
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..c1e4935 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -403,7 +403,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 }
 
 static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
-				       unsigned long bitmap)
+				       unsigned long *bitmap)
 {
 	int last;
 	int next;
@@ -415,7 +415,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
 	do {
 		if (++next == KVM_MAX_VCPUS)
 			next = 0;
-		if (kvm->vcpus[next] == NULL || !test_bit(next, &bitmap))
+		if (kvm->vcpus[next] == NULL || !test_bit(next, bitmap))
 			continue;
 		apic = kvm->vcpus[next]->arch.apic;
 		if (apic && apic_enabled(apic))
@@ -431,7 +431,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-		unsigned long bitmap)
+		unsigned long *bitmap)
 {
 	struct kvm_lapic *apic;
 
@@ -502,7 +502,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 	}
 
 	if (delivery_mode == APIC_DM_LOWEST) {
-		target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
+		target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
 		if (target != NULL)
 			__apic_accept_irq(target->arch.apic, delivery_mode,
 					  vector, level, trig_mode);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4f92317..fbf102c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -332,7 +332,7 @@ struct kvm_gsi_msg {
 
 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 				   union kvm_ioapic_redirect_entry *entry,
-				   u32 *deliver_bitmask);
+				   unsigned long *deliver_bitmask);
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ebd5ba6..164a746 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -154,7 +154,7 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
 }
 
 void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-				     u8 dest_mode, u32 *mask)
+				     u8 dest_mode, unsigned long *mask)
 {
 	int i;
 	struct kvm *kvm = ioapic->kvm;
@@ -195,7 +195,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
 	union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
-	u32 deliver_bitmask;
+	unsigned long deliver_bitmask;
 	struct kvm_vcpu *vcpu;
 	int vcpu_id, r = 0;
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index e107dbb..c418a7f 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -65,12 +65,12 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-				       unsigned long bitmap);
+				       unsigned long *bitmap);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
 void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-				     u8 dest_mode, u32 *mask);
+				     u8 dest_mode, unsigned long *mask);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 1949587..e74d679 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -31,7 +31,7 @@
 
 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 				   union kvm_ioapic_redirect_entry *entry,
-				   u32 *deliver_bitmask)
+				   unsigned long *deliver_bitmask)
 {
 	struct kvm_vcpu *vcpu;
 
@@ -41,7 +41,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 	switch (entry->fields.delivery_mode) {
 	case IOAPIC_LOWEST_PRIORITY:
 		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
-				entry->fields.vector, *deliver_bitmask);
+				entry->fields.vector, deliver_bitmask);
 		*deliver_bitmask = 1 << vcpu->vcpu_id;
 		break;
 	case IOAPIC_FIXED:
@@ -63,7 +63,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
 	struct kvm_gsi_msg *gsi_msg;
 	union kvm_ioapic_redirect_entry entry;
-	u32 deliver_bitmask;
+	unsigned long deliver_bitmask;
 
 	BUG_ON(!ioapic);
 #endif
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/15] KVM: bit ops for deliver_bitmap
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (11 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 12/15] KVM: Update intr delivery func to accept unsigned long* bitmap Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25  9:09 ` [PATCH 14/15] KVM: Using kfifo for irq recording Sheng Yang
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

It's also convenient when we extend KVM supported vcpu number in the future.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/lapic.c |    7 ++++---
 virt/kvm/ioapic.c    |   24 +++++++++++++-----------
 virt/kvm/irq_comm.c  |   16 ++++++++--------
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c1e4935..359e02c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -477,9 +477,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
 	struct kvm_vcpu *target;
 	struct kvm_vcpu *vcpu;
-	unsigned long lpr_map = 0;
+	DECLARE_BITMAP(lpr_map, KVM_MAX_VCPUS);
 	int i;
 
+	bitmap_zero(lpr_map, KVM_MAX_VCPUS);
 	apic_debug("icr_high 0x%x, icr_low 0x%x, "
 		   "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
 		   "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
@@ -494,7 +495,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 		if (vcpu->arch.apic &&
 		    apic_match_dest(vcpu, apic, short_hand, dest, dest_mode)) {
 			if (delivery_mode == APIC_DM_LOWEST)
-				set_bit(vcpu->vcpu_id, &lpr_map);
+				set_bit(vcpu->vcpu_id, lpr_map);
 			else
 				__apic_accept_irq(vcpu->arch.apic, delivery_mode,
 						  vector, level, trig_mode);
@@ -502,7 +503,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 	}
 
 	if (delivery_mode == APIC_DM_LOWEST) {
-		target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
+		target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
 		if (target != NULL)
 			__apic_accept_irq(target->arch.apic, delivery_mode,
 					  vector, level, trig_mode);
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 164a746..bf83f5e 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -195,7 +195,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
 	union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
-	unsigned long deliver_bitmask;
+	DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
 	struct kvm_vcpu *vcpu;
 	int vcpu_id, r = 0;
 
@@ -205,22 +205,24 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 		     entry.fields.delivery_mode, entry.fields.vector,
 		     entry.fields.trig_mode);
 
-	kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
-	if (!deliver_bitmask) {
-		ioapic_debug("no target on destination\n");
-		return 0;
-	}
+	bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
 
 	/* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
 	if (irq == 0)
-		deliver_bitmask = 1 << 0;
+		set_bit(0, deliver_bitmask);
+	else
 #endif
+		kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
+
+	if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) >= KVM_MAX_VCPUS) {
+		ioapic_debug("no target on destination\n");
+		return 0;
+	}
 
-	for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-		if (!(deliver_bitmask & (1 << vcpu_id)))
-			continue;
-		deliver_bitmask &= ~(1 << vcpu_id);
+	while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
+			< KVM_MAX_VCPUS) {
+		clear_bit(vcpu_id, deliver_bitmask);
 		vcpu = ioapic->kvm->vcpus[vcpu_id];
 		if (vcpu) {
 			if (entry.fields.delivery_mode ==
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index e74d679..ecda2c1 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -42,7 +42,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 	case IOAPIC_LOWEST_PRIORITY:
 		vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
 				entry->fields.vector, deliver_bitmask);
-		*deliver_bitmask = 1 << vcpu->vcpu_id;
+		set_bit(vcpu->vcpu_id, deliver_bitmask);
 		break;
 	case IOAPIC_FIXED:
 	case IOAPIC_NMI:
@@ -63,11 +63,12 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
 	struct kvm_gsi_msg *gsi_msg;
 	union kvm_ioapic_redirect_entry entry;
-	unsigned long deliver_bitmask;
+	DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
 
 	BUG_ON(!ioapic);
 #endif
 
+	bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
 	if (!(gsi & KVM_GSI_MSG_MASK)) {
 		int irq = gsi;
 
@@ -111,16 +112,15 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 	entry.fields.delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
 				(unsigned long *)&gsi_msg->msg.data);
 
-	kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
+	kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
 
-	if (!deliver_bitmask) {
+	if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) >= KVM_MAX_VCPUS) {
 		printk(KERN_WARNING "kvm: no destination for MSI delivery!");
 		return;
 	}
-	for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-		if (!(deliver_bitmask & (1 << vcpu_id)))
-			continue;
-		deliver_bitmask &= ~(1 << vcpu_id);
+	while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
+			< KVM_MAX_VCPUS) {
+		clear_bit(vcpu_id, deliver_bitmask);
 		vcpu = ioapic->kvm->vcpus[vcpu_id];
 		if (vcpu)
 			kvm_apic_set_irq(vcpu, entry.fields.vector,
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/15] KVM: Using kfifo for irq recording
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (12 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 13/15] KVM: bit ops for deliver_bitmap Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-26  2:29   ` [PATCH 14/15] KVM: Replace host_irq_disable with a new flag Sheng Yang
  2008-12-25  9:09 ` [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq Sheng Yang
  2008-12-25  9:13 ` [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
  15 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

For MSI-X, we have to deal with multiply IRQ with same IRQ handler, so it's
necessary to record the IRQ that trigger the IRQ handler.

And this one is also useful for fixing kvm_free_assigned_irq().

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    4 ++++
 virt/kvm/kvm_main.c      |   30 +++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fbf102c..84b11d5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -17,6 +17,7 @@
 #include <linux/preempt.h>
 #include <linux/marker.h>
 #include <linux/msi.h>
+#include <linux/kfifo.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
 	int host_irq;
 	bool host_irq_disabled;
 	int guest_irq;
+#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
+	struct kfifo *irq_fifo;
+	spinlock_t irq_fifo_lock;
 #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI	(1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX	(1 << 8)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a51e630..1863942 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -99,6 +99,8 @@ static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *h
 static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 {
 	struct kvm_assigned_dev_kernel *assigned_dev;
+	int irq;
+	u32 gsi;
 
 	assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
 				    interrupt_work);
@@ -109,14 +111,22 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 	 */
 	mutex_lock(&assigned_dev->kvm->lock);
 
-	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
-		    assigned_dev->guest_irq, 1);
+handle_irq:
+	kfifo_get(assigned_dev->irq_fifo,
+		  (unsigned char *)&irq, sizeof(int));
+
+	gsi = assigned_dev->guest_irq;
+
+	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id, gsi, 1);
 
 	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
 		enable_irq(assigned_dev->host_irq);
 		assigned_dev->host_irq_disabled = false;
 	}
 
+	if (kfifo_len(assigned_dev->irq_fifo) != 0)
+		goto handle_irq;
+
 	mutex_unlock(&assigned_dev->kvm->lock);
 	kvm_put_kvm(assigned_dev->kvm);
 }
@@ -128,6 +138,9 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 
 	kvm_get_kvm(assigned_dev->kvm);
 
+	kfifo_put(assigned_dev->irq_fifo,
+		  (unsigned char *)&irq, sizeof(int));
+
 	schedule_work(&assigned_dev->interrupt_work);
 
 	disable_irq_nosync(irq);
@@ -201,6 +214,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 	pci_dev_put(assigned_dev->dev);
 
 	list_del(&assigned_dev->list);
+	kfifo_free(assigned_dev->irq_fifo);
 	kfree(assigned_dev);
 }
 
@@ -448,15 +462,25 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
+	spin_lock_init(&match->irq_fifo_lock);
+	match->irq_fifo = kfifo_alloc(sizeof(unsigned char) *
+				      KVM_ASSIGNED_DEV_IRQ_FIFO_LEN,
+				      GFP_KERNEL | __GFP_ZERO,
+				      &match->irq_fifo_lock);
+	if (!match->irq_fifo)
+		goto out_list_del;
+
 	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
 		r = kvm_iommu_map_guest(kvm, match);
 		if (r)
-			goto out_list_del;
+			goto out_fifo_del;
 	}
 
 out:
 	mutex_unlock(&kvm->lock);
 	return r;
+out_fifo_del:
+	kfifo_free(match->irq_fifo);
 out_list_del:
 	list_del(&match->list);
 	pci_release_regions(dev);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (13 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 14/15] KVM: Using kfifo for irq recording Sheng Yang
@ 2008-12-25  9:09 ` Sheng Yang
  2008-12-25 11:56   ` Sheng Yang
  2008-12-25  9:13 ` [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
  15 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:09 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Thanks to Marcelo's observation, The following code have potential issue:

if (cancel_work_sync(&assigned_dev->interrupt_work))
	kvm_put_kvm(kvm);

In fact, cancel_work_sync() would return true either work struct is only
scheduled or the callback of work struct is executed. This code only
consider the former situation.

Also, we have a window between cancel_work_sync() and free_irq. This patch fixs
them two.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/kvm_main.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1863942..ed10f15 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,10 +186,28 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	if (!assigned_dev->irq_requested_type)
 		return;
 
-	if (cancel_work_sync(&assigned_dev->interrupt_work))
-		/* We had pending work. That means we will have to take
-		 * care of kvm_put_kvm.
-		 */
+	/*
+	 * We need to ensure: kvm_put_kvm() paired with kvm_get_kvm() in
+	 * kvm_assigned_dev_intr, and no more interrupt after we cancelled
+	 * current one.
+	 *
+	 * Here we have two possiblities for cancel_work_sync() return true:
+	 * 1. The work is scheduled, but callback haven't been called.  We need
+	 * to call kvm_put_kvm() here. And IRQ is already disabled without
+	 * doubt.
+	 *
+	 * 2. The callback have executed, here we don't need to call
+	 * kvm_put_kvm(), but we may need to disable irq(e.g. for MSI).
+	 *
+	 * We judge the two condition according to if we have pending IRQs in
+	 * irq_fifo.
+	 */
+	if (kfifo_len(assigned_dev->irq_fifo) == 0 &&
+	    (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI))
+		disable_irq_nosync(assigned_dev->host_irq);
+
+	if (cancel_work_sync(&assigned_dev->interrupt_work) &&
+	    kfifo_len(assigned_dev->irq_fifo) != 0)
 		kvm_put_kvm(kvm);
 
 	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/15] Device assignment & MSI enhancement
  2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
                   ` (14 preceding siblings ...)
  2008-12-25  9:09 ` [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq Sheng Yang
@ 2008-12-25  9:13 ` Sheng Yang
  15 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-25  9:13 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, Marcelo Tosatti

On Thursday 25 December 2008 17:09:24 Sheng Yang wrote:
> Hi Avi and Marcelo
>
> Merry Xmas! And here is the v2 of patchset. Target at 2.6.29 for it
> contained a lot of fix and improvement of current device assignment and MSI
> feature.
>
> Change from V1:
>
> Addressed Marcelo's comments, and:
> 1. Fix racy in kvm_free_assigned_irq(). In case to do this, I fetch one
> patch (irq_fifo) from original MSI-X patchset. Indeed a nice catch of
> Marcelo.
>
> :)
>
> 2. Unified kvm_set_irq() with ioapic_deliver(). It didn't save much, but
> duplicate is always bothering, and I have modified bitmask for vcpu to a
> real bitmap (maybe not all, just what I have seen).

Forgot to mention, I didn't change API for guest to disable MSI which is a 
part of Marcelo's comments, for I think single interface named "update" with 
some flags represent the current bit state is enough for now...

-- 
regards
Yang, Sheng

>
> And for V1:
>
> 1. Add gsi_msg mapping mechanism, which gsi can used to indicated a MSI
> interrupt.(Notice API/ABI changed a little, but we don't have userspace
> patch now, so it should be OK.)
>
> 2. Provide MSI disable capability.
>
> arch/x86/kvm/lapic.c      |   11 ++-
> include/linux/kvm.h       |   15 +++-
> include/linux/kvm_host.h  |   26 +++++-
> include/linux/kvm_types.h |   17 ++++
> virt/kvm/ioapic.c         |  117 ++++++++++---------------
> virt/kvm/ioapic.h         |   23 +----
> virt/kvm/irq_comm.c       |  184 ++++++++++++++++++++++++++++++++++++---
> virt/kvm/kvm_main.c       |  212
> ++++++++++++++++++++++++++++----------------- 8 files changed, 415
> insertions(+), 190 deletions(-)
>
> Sorry for the patchset size, it's too easy to grow fast, and I am a little
> too lazy to split them into more batches in the Xmas... :)
>
> --
> regards
> Yang, Sheng
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-25  9:09 ` [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq Sheng Yang
@ 2008-12-25 11:56   ` Sheng Yang
  2008-12-26  2:30     ` Sheng Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-25 11:56 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm

On Thu, Dec 25, 2008 at 05:09:39PM +0800, Sheng Yang wrote:
> Thanks to Marcelo's observation, The following code have potential issue:
> 
> if (cancel_work_sync(&assigned_dev->interrupt_work))
> 	kvm_put_kvm(kvm);
> 
> In fact, cancel_work_sync() would return true either work struct is only
> scheduled or the callback of work struct is executed. This code only
> consider the former situation.
> 
> Also, we have a window between cancel_work_sync() and free_irq. This patch fixs
> them two.
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---

Still got trouble here. Something can happened after kfifo_len check.

> +	if (kfifo_len(assigned_dev->irq_fifo) == 0 &&
> +	    (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI))
> +		disable_irq_nosync(assigned_dev->host_irq);
> +
> +	if (cancel_work_sync(&assigned_dev->interrupt_work) &&
> +	    kfifo_len(assigned_dev->irq_fifo) != 0)
>  		kvm_put_kvm(kvm);

I think just disable_irq for all is OK, though it would result in nested
disable sometime.

How about this:

--
From: Sheng Yang <sheng@linux.intel.com>
Date: Thu, 25 Dec 2008 19:45:03 +0800
Subject: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq

Thanks to Marcelo's observation, The following code have potential issue:

if (cancel_work_sync(&assigned_dev->interrupt_work))
	kvm_put_kvm(kvm);

In fact, cancel_work_sync() would return true either work struct is only
scheduled or the callback of work struct is executed. This code only
consider the former situation.

Also, we have a window between cancel_work_sync() and free_irq. This patch fixs
them two.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 virt/kvm/kvm_main.c |   25 +++++++++++++++++++++----
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1863942..f4859b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,10 +186,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	if (!assigned_dev->irq_requested_type)
 		return;

-	if (cancel_work_sync(&assigned_dev->interrupt_work))
-		/* We had pending work. That means we will have to take
-		 * care of kvm_put_kvm.
-		 */
+	/*
+	 * We need to ensure: kvm_put_kvm() paired with kvm_get_kvm() in
+	 * kvm_assigned_dev_intr, and no more interrupt after we cancelled
+	 * current one.
+	 *
+	 * Here we have two possiblities for cancel_work_sync() return true:
+	 * 1. The work is scheduled, but callback haven't been called.  We need
+	 * to call kvm_put_kvm() here. And IRQ is already disabled without
+	 * doubt.
+	 *
+	 * 2. The callback have executed, here we don't need to call
+	 * kvm_put_kvm(), but we may need to disable irq(e.g. for MSI).
+	 *
+	 * We judge the two condition according to if we have pending IRQs in
+	 * irq_fifo. And we disable irq here anyway, and it may resulted in
+	 * nested disable, but it's fine, for we are going to free it.
+	 */
+	disable_irq_nosync(assigned_dev->host_irq);
+
+	if (cancel_work_sync(&assigned_dev->interrupt_work) &&
+	    kfifo_len(assigned_dev->irq_fifo) != 0)
 		kvm_put_kvm(kvm);

 	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
--
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/15] KVM: Replace host_irq_disable with a new flag
  2008-12-25  9:09 ` [PATCH 14/15] KVM: Using kfifo for irq recording Sheng Yang
@ 2008-12-26  2:29   ` Sheng Yang
  0 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-26  2:29 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

(I discard irq_fifo and change a method to fix this problem)

We can reused the field "state" later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    3 ++-
 virt/kvm/kvm_main.c      |    8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fbf102c..58e4b7e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -311,13 +311,14 @@ struct kvm_assigned_dev_kernel {
 	int host_busnr;
 	int host_devfn;
 	int host_irq;
-	bool host_irq_disabled;
 	int guest_irq;
 #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI	(1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX	(1 << 8)
 #define KVM_ASSIGNED_DEV_HOST_MSI	(1 << 9)
 	unsigned long irq_requested_type;
+#define KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED	(1 << 0)
+	unsigned long state;
 	int irq_source_id;
 	struct pci_dev *dev;
 	struct kvm *kvm;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a51e630..065af2d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -114,7 +114,7 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 
 	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
 		enable_irq(assigned_dev->host_irq);
-		assigned_dev->host_irq_disabled = false;
+		assigned_dev->state &= ~KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED;
 	}
 
 	mutex_unlock(&assigned_dev->kvm->lock);
@@ -131,7 +131,7 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 	schedule_work(&assigned_dev->interrupt_work);
 
 	disable_irq_nosync(irq);
-	assigned_dev->host_irq_disabled = true;
+	assigned_dev->state |= KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED;
 
 	return IRQ_HANDLED;
 }
@@ -152,9 +152,9 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 	/* The guest irq may be shared so this ack may be
 	 * from another device.
 	 */
-	if (dev->host_irq_disabled) {
+	if (dev->state & KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED) {
 		enable_irq(dev->host_irq);
-		dev->host_irq_disabled = false;
+		dev->state &= ~KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED;
 	}
 }
 
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-25 11:56   ` Sheng Yang
@ 2008-12-26  2:30     ` Sheng Yang
  2008-12-27 20:06       ` Marcelo Tosatti
  0 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-26  2:30 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Thanks to Marcelo's observation, The following code have potential issue:

if (cancel_work_sync(&assigned_dev->interrupt_work))
	kvm_put_kvm(kvm);

In fact, cancel_work_sync() would return true either work struct is only
scheduled or the callback of work struct is executed. This code only
consider the former situation.

Also, we have a window between cancel_work_sync() and free_irq. This patch fixs
them two.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |   34 ++++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 58e4b7e..e0775b9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_assigned_dev_kernel {
 #define KVM_ASSIGNED_DEV_HOST_MSI	(1 << 9)
 	unsigned long irq_requested_type;
 #define KVM_ASSIGNED_DEV_HOST_IRQ_DISABLED	(1 << 0)
+#define KVM_ASSIGNED_DEV_IRQ_GOT_KVM		(1 << 1)
 	unsigned long state;
 	int irq_source_id;
 	struct pci_dev *dev;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 065af2d..9ffa601 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -119,6 +119,7 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 
 	mutex_unlock(&assigned_dev->kvm->lock);
 	kvm_put_kvm(assigned_dev->kvm);
+	assigned_dev->state &= ~KVM_ASSIGNED_DEV_IRQ_GOT_KVM;
 }
 
 static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
@@ -126,7 +127,15 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 	struct kvm_assigned_dev_kernel *assigned_dev =
 		(struct kvm_assigned_dev_kernel *) dev_id;
 
+	/*
+	 * In kvm_free_device_irq, cancel_work_sync return true if:
+	 * 1. work is scheduled, and then cancelled.
+	 * 2. work callback is executed.
+	 *
+	 * We need to call kvm_put_kvm() for the former, but not the later.
+	 */
 	kvm_get_kvm(assigned_dev->kvm);
+	assigned_dev->state |= KVM_ASSIGNED_DEV_IRQ_GOT_KVM;
 
 	schedule_work(&assigned_dev->interrupt_work);
 
@@ -173,10 +182,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	if (!assigned_dev->irq_requested_type)
 		return;
 
-	if (cancel_work_sync(&assigned_dev->interrupt_work))
-		/* We had pending work. That means we will have to take
-		 * care of kvm_put_kvm.
-		 */
+	/*
+	 * We need to ensure: kvm_put_kvm() paired with kvm_get_kvm() in
+	 * kvm_assigned_dev_intr, and no more interrupt after we cancelled
+	 * current one.
+	 *
+	 * Here we have two possiblities for cancel_work_sync() return true:
+	 * 1. The work is scheduled, but callback haven't been called.  We need
+	 * to call kvm_put_kvm() here. And IRQ is already disabled without
+	 * doubt.
+	 *
+	 * 2. The callback have executed, here we don't need to call
+	 * kvm_put_kvm(), but we may need to disable irq(e.g. for MSI).
+	 *
+	 * We judge the two condition according assigned_dev->state. And we
+	 * disable irq here anyway, and it may resulted in IRQ nested disable,
+	 * but it's fine, for we are going to free it.
+	 */
+	disable_irq_nosync(assigned_dev->host_irq);
+
+	if (cancel_work_sync(&assigned_dev->interrupt_work) &&
+	    assigned_dev->state & KVM_ASSIGNED_DEV_IRQ_GOT_KVM)
 		kvm_put_kvm(kvm);
 
 	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-26  2:30     ` Sheng Yang
@ 2008-12-27 20:06       ` Marcelo Tosatti
  2008-12-27 20:15         ` Marcelo Tosatti
  2008-12-28 11:24         ` Sheng Yang
  0 siblings, 2 replies; 37+ messages in thread
From: Marcelo Tosatti @ 2008-12-27 20:06 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, kvm

On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> Thanks to Marcelo's observation, The following code have potential issue:
> 
> if (cancel_work_sync(&assigned_dev->interrupt_work))
> 	kvm_put_kvm(kvm);
> 
> In fact, cancel_work_sync() would return true either work struct is only
> scheduled or the callback of work struct is executed. This code only
> consider the former situation.

Why not simply drop the reference inc / dec from irq handler/work
function?

Just make sure that there is no queued/executing work left behind on
vm shutdown. Don't think an additional reference is necessary. Or am I
missing something?

> Also, we have a window between cancel_work_sync() and free_irq. 

This one looks OK.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-27 20:06       ` Marcelo Tosatti
@ 2008-12-27 20:15         ` Marcelo Tosatti
  2008-12-28 11:24         ` Sheng Yang
  1 sibling, 0 replies; 37+ messages in thread
From: Marcelo Tosatti @ 2008-12-27 20:15 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, kvm

On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > Thanks to Marcelo's observation, The following code have potential issue:
> > 
> > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > 	kvm_put_kvm(kvm);
> > 
> > In fact, cancel_work_sync() would return true either work struct is only
> > scheduled or the callback of work struct is executed. This code only
> > consider the former situation.
> 
> Why not simply drop the reference inc / dec from irq handler/work
> function?
> 
> Just make sure that there is no queued/executing work left behind on
> vm shutdown. Don't think an additional reference is necessary. Or am I
> missing something?

And maybe drop this issue from the patchset and fix it separately, since
it is a bugfix.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-27 20:06       ` Marcelo Tosatti
  2008-12-27 20:15         ` Marcelo Tosatti
@ 2008-12-28 11:24         ` Sheng Yang
  2008-12-28 12:57           ` Avi Kivity
  2008-12-29  5:42           ` Amit Shah
  1 sibling, 2 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-28 11:24 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, Amit Shah, Han, Weidong

On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > Thanks to Marcelo's observation, The following code have potential issue:
> > 
> > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > 	kvm_put_kvm(kvm);
> > 
> > In fact, cancel_work_sync() would return true either work struct is only
> > scheduled or the callback of work struct is executed. This code only
> > consider the former situation.
> 
> Why not simply drop the reference inc / dec from irq handler/work
> function?

Sorry, I don't know the answer. After checking the code, I also think it's a
little strange to increase refernce count here, and I think we won't suppose
work_handler can release the kvm struct.

Maybe Avi knows? Or Amit and Weidong?

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-28 11:24         ` Sheng Yang
@ 2008-12-28 12:57           ` Avi Kivity
  2008-12-29  5:42           ` Amit Shah
  1 sibling, 0 replies; 37+ messages in thread
From: Avi Kivity @ 2008-12-28 12:57 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, kvm, Amit Shah, Han, Weidong

Sheng Yang wrote:
>>> if (cancel_work_sync(&assigned_dev->interrupt_work))
>>> 	kvm_put_kvm(kvm);
>>>
>>> In fact, cancel_work_sync() would return true either work struct is only
>>> scheduled or the callback of work struct is executed. This code only
>>> consider the former situation.
>>>       
>> Why not simply drop the reference inc / dec from irq handler/work
>> function?
>>     
>
> Sorry, I don't know the answer. After checking the code, I also think it's a
> little strange to increase refernce count here, and I think we won't suppose
> work_handler can release the kvm struct.
>
> Maybe Avi knows? Or Amit and Weidong?
>   

Not sure what the reasoning was, but it does seem like reference 
counting can be safely dropped from interrupt_work.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-28 11:24         ` Sheng Yang
  2008-12-28 12:57           ` Avi Kivity
@ 2008-12-29  5:42           ` Amit Shah
  2008-12-29 12:23             ` Sheng Yang
  2008-12-29 13:20             ` Avi Kivity
  1 sibling, 2 replies; 37+ messages in thread
From: Amit Shah @ 2008-12-29  5:42 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Sun, Dec 28, 2008 at 07:24:02PM +0800, Sheng Yang wrote:
> On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> > On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > > Thanks to Marcelo's observation, The following code have potential issue:
> > > 
> > > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > > 	kvm_put_kvm(kvm);
> > > 
> > > In fact, cancel_work_sync() would return true either work struct is only
> > > scheduled or the callback of work struct is executed. This code only
> > > consider the former situation.
> > 
> > Why not simply drop the reference inc / dec from irq handler/work
> > function?
> 
> Sorry, I don't know the answer. After checking the code, I also think it's a
> little strange to increase refernce count here, and I think we won't suppose
> work_handler can release the kvm struct.

At the time of developing that code, this was my observation:

I see from the call chain kvm_put_kvm->...->kvm_arch_destroy_vm, no locks are 
taken to actually destroy the vm. We can't be in ioctls, sure. But shouldn't 
the mutex be taken to ensure there's nothing else going on while destroying?

At least with the workqueue model, we could be called asynchronously in kernel 
context and I would have held the mutex and about to inject interrupts while 
everything is being wiped off underneath. However, the workqueue model tries 
its best to schedule the work on the same CPU, though we can't use that 
guarantee to ensure things will be fine.

---
So I had to get a ref to the current vm till we had any pending work scheduled. I think I put in comments in the code, but sadly most of my comments we stripped out before the merge.

Amit

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29  5:42           ` Amit Shah
@ 2008-12-29 12:23             ` Sheng Yang
  2008-12-29 13:37               ` Avi Kivity
  2008-12-29 15:20               ` Marcelo Tosatti
  2008-12-29 13:20             ` Avi Kivity
  1 sibling, 2 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-29 12:23 UTC (permalink / raw)
  To: Amit Shah; +Cc: Marcelo Tosatti, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Monday 29 December 2008 13:42:22 Amit Shah wrote:
> On Sun, Dec 28, 2008 at 07:24:02PM +0800, Sheng Yang wrote:
> > On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> > > On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > > > Thanks to Marcelo's observation, The following code have potential
> > > > issue:
> > > >
> > > > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > > > 	kvm_put_kvm(kvm);
> > > >
> > > > In fact, cancel_work_sync() would return true either work struct is
> > > > only scheduled or the callback of work struct is executed. This code
> > > > only consider the former situation.
> > >
> > > Why not simply drop the reference inc / dec from irq handler/work
> > > function?
> >
> > Sorry, I don't know the answer. After checking the code, I also think
> > it's a little strange to increase refernce count here, and I think we
> > won't suppose work_handler can release the kvm struct.
>
> At the time of developing that code, this was my observation:
>
> I see from the call chain kvm_put_kvm->...->kvm_arch_destroy_vm, no locks
> are taken to actually destroy the vm. We can't be in ioctls, sure. But
> shouldn't the mutex be taken to ensure there's nothing else going on while
> destroying?
>
> At least with the workqueue model, we could be called asynchronously in
> kernel context and I would have held the mutex and about to inject
> interrupts while everything is being wiped off underneath. However, the
> workqueue model tries its best to schedule the work on the same CPU, though
> we can't use that guarantee to ensure things will be fine.
>
> ---
> So I had to get a ref to the current vm till we had any pending work
> scheduled. I think I put in comments in the code, but sadly most of my
> comments we stripped out before the merge.
>
Not quite understand...

The free assigned device in the destroy path of VM, so as free irq. And we got 
cancel_work_sync() in free irq which can sync with the execution of scheduled 
work. And now before cancel_work_sync(), we disable the interrupt so that no 
more schedule work happen again. So after cancel_work_sync(), everything(I 
think it's irq handler and schedule work here) asynchronously should quiet 
down.

Or I miss something?

-- 
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29  5:42           ` Amit Shah
  2008-12-29 12:23             ` Sheng Yang
@ 2008-12-29 13:20             ` Avi Kivity
  1 sibling, 0 replies; 37+ messages in thread
From: Avi Kivity @ 2008-12-29 13:20 UTC (permalink / raw)
  To: Amit Shah; +Cc: Sheng Yang, Marcelo Tosatti, kvm, Amit Shah, Han, Weidong

Amit Shah wrote:
> I see from the call chain kvm_put_kvm->...->kvm_arch_destroy_vm, no locks are 
> taken to actually destroy the vm. We can't be in ioctls, sure. But shouldn't 
> the mutex be taken to ensure there's nothing else going on while destroying?
>
>   

Locks are useless to guard against something happening concurrent with 
destruction, since we're about to destroy the lock.

> At least with the workqueue model, we could be called asynchronously in kernel 
> context and I would have held the mutex and about to inject interrupts while 
> everything is being wiped off underneath. However, the workqueue model tries 
> its best to schedule the work on the same CPU, though we can't use that 
> guarantee to ensure things will be fine.
>
> ---
> So I had to get a ref to the current vm till we had any pending work scheduled. 

I think that's the right thing to do.

> I think I put in comments in the code, but sadly most of my comments we stripped out before the merge.
>   

Pity.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29 12:23             ` Sheng Yang
@ 2008-12-29 13:37               ` Avi Kivity
  2008-12-29 13:49                 ` Sheng Yang
  2008-12-29 15:20               ` Marcelo Tosatti
  1 sibling, 1 reply; 37+ messages in thread
From: Avi Kivity @ 2008-12-29 13:37 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Amit Shah, Marcelo Tosatti, kvm, Amit Shah, Han, Weidong

Sheng Yang wrote:
> The free assigned device in the destroy path of VM, so as free irq. And we got 
> cancel_work_sync() in free irq which can sync with the execution of scheduled 
> work. And now before cancel_work_sync(), we disable the interrupt so that no 
> more schedule work happen again. So after cancel_work_sync(), everything(I 
> think it's irq handler and schedule work here) asynchronously should quiet 
> down.
>
> Or I miss something?
>   

Suppose the work_struct gets scheduled, but is delayed somewhere in the 
scheduler.  Some kill -9s the VM, and it starts getting destroyed.  
cancel_work_sync() can no longer truly cancel the work, so it has to 
schedule and wait for its completion.

So now we have kvm_assigned_dev_interrupt_work_handler() running in a 
partially destroyed VM.  It may work or not, but it's a fragile 
situation (changing the order of destruction of components will likely 
break things) and it's easy to avoid by keeping the reference count 
elevated.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29 13:37               ` Avi Kivity
@ 2008-12-29 13:49                 ` Sheng Yang
  0 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2008-12-29 13:49 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, Amit Shah, Marcelo Tosatti, Amit Shah, Han, Weidong

On Monday 29 December 2008 21:37:52 Avi Kivity wrote:
> Sheng Yang wrote:
> > The free assigned device in the destroy path of VM, so as free irq. And
> > we got cancel_work_sync() in free irq which can sync with the execution
> > of scheduled work. And now before cancel_work_sync(), we disable the
> > interrupt so that no more schedule work happen again. So after
> > cancel_work_sync(), everything(I think it's irq handler and schedule work
> > here) asynchronously should quiet down.
> >
> > Or I miss something?
>
> Suppose the work_struct gets scheduled, but is delayed somewhere in the
> scheduler.  Some kill -9s the VM, and it starts getting destroyed.
> cancel_work_sync() can no longer truly cancel the work, so it has to
> schedule and wait for its completion.
>
> So now we have kvm_assigned_dev_interrupt_work_handler() running in a
> partially destroyed VM.  It may work or not, but it's a fragile
> situation (changing the order of destruction of components will likely
> break things) and it's easy to avoid by keeping the reference count
> elevated.

OK, got it. Thanks for explaining!

-- 
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29 12:23             ` Sheng Yang
  2008-12-29 13:37               ` Avi Kivity
@ 2008-12-29 15:20               ` Marcelo Tosatti
  2008-12-30  2:14                 ` Sheng Yang
  1 sibling, 1 reply; 37+ messages in thread
From: Marcelo Tosatti @ 2008-12-29 15:20 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Mon, Dec 29, 2008 at 08:23:28PM +0800, Sheng Yang wrote:
> On Monday 29 December 2008 13:42:22 Amit Shah wrote:
> > On Sun, Dec 28, 2008 at 07:24:02PM +0800, Sheng Yang wrote:
> > > On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> > > > On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > > > > Thanks to Marcelo's observation, The following code have potential
> > > > > issue:
> > > > >
> > > > > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > > > > 	kvm_put_kvm(kvm);
> > > > >
> > > > > In fact, cancel_work_sync() would return true either work struct is
> > > > > only scheduled or the callback of work struct is executed. This code
> > > > > only consider the former situation.
> > > >
> > > > Why not simply drop the reference inc / dec from irq handler/work
> > > > function?
> > >
> > > Sorry, I don't know the answer. After checking the code, I also think
> > > it's a little strange to increase refernce count here, and I think we
> > > won't suppose work_handler can release the kvm struct.
> >
> > At the time of developing that code, this was my observation:
> >
> > I see from the call chain kvm_put_kvm->...->kvm_arch_destroy_vm, no locks
> > are taken to actually destroy the vm. We can't be in ioctls, sure. But
> > shouldn't the mutex be taken to ensure there's nothing else going on while
> > destroying?
> >
> > At least with the workqueue model, we could be called asynchronously in
> > kernel context and I would have held the mutex and about to inject
> > interrupts while everything is being wiped off underneath. However, the
> > workqueue model tries its best to schedule the work on the same CPU, though
> > we can't use that guarantee to ensure things will be fine.
> >
> > ---
> > So I had to get a ref to the current vm till we had any pending work
> > scheduled. I think I put in comments in the code, but sadly most of my
> > comments we stripped out before the merge.
> >
> Not quite understand...
> 
> The free assigned device in the destroy path of VM, so as free irq. And we got 
> cancel_work_sync() in free irq which can sync with the execution of scheduled 
> work. And now before cancel_work_sync(), we disable the interrupt so that no 
> more schedule work happen again. So after cancel_work_sync(), everything(I 
> think it's irq handler and schedule work here) asynchronously should quiet 
> down.
> 
> Or I miss something?

Thats right. As long as you disable the irq and cancel pending work
before freeing the data structures those paths use.

There is one remaining issue: kvm_assigned_dev_interrupt_work_handler
can re-enable the interrupt for KVM_ASSIGNED_DEV_GUEST_MSI case. Perhaps
you need a new flag to indicate shutdown (so the host IRQ won't be
reenabled).



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-29 15:20               ` Marcelo Tosatti
@ 2008-12-30  2:14                 ` Sheng Yang
  2008-12-30 16:45                   ` Marcelo Tosatti
  0 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-30  2:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Monday 29 December 2008 23:20:57 Marcelo Tosatti wrote:
> On Mon, Dec 29, 2008 at 08:23:28PM +0800, Sheng Yang wrote:
> > On Monday 29 December 2008 13:42:22 Amit Shah wrote:
> > > On Sun, Dec 28, 2008 at 07:24:02PM +0800, Sheng Yang wrote:
> > > > On Sat, Dec 27, 2008 at 06:06:26PM -0200, Marcelo Tosatti wrote:
> > > > > On Fri, Dec 26, 2008 at 10:30:07AM +0800, Sheng Yang wrote:
> > > > > > Thanks to Marcelo's observation, The following code have
> > > > > > potential issue:
> > > > > >
> > > > > > if (cancel_work_sync(&assigned_dev->interrupt_work))
> > > > > > 	kvm_put_kvm(kvm);
> > > > > >
> > > > > > In fact, cancel_work_sync() would return true either work struct
> > > > > > is only scheduled or the callback of work struct is executed.
> > > > > > This code only consider the former situation.
> > > > >
> > > > > Why not simply drop the reference inc / dec from irq handler/work
> > > > > function?
> > > >
> > > > Sorry, I don't know the answer. After checking the code, I also think
> > > > it's a little strange to increase refernce count here, and I think we
> > > > won't suppose work_handler can release the kvm struct.
> > >
> > > At the time of developing that code, this was my observation:
> > >
> > > I see from the call chain kvm_put_kvm->...->kvm_arch_destroy_vm, no
> > > locks are taken to actually destroy the vm. We can't be in ioctls,
> > > sure. But shouldn't the mutex be taken to ensure there's nothing else
> > > going on while destroying?
> > >
> > > At least with the workqueue model, we could be called asynchronously in
> > > kernel context and I would have held the mutex and about to inject
> > > interrupts while everything is being wiped off underneath. However, the
> > > workqueue model tries its best to schedule the work on the same CPU,
> > > though we can't use that guarantee to ensure things will be fine.
> > >
> > > ---
> > > So I had to get a ref to the current vm till we had any pending work
> > > scheduled. I think I put in comments in the code, but sadly most of my
> > > comments we stripped out before the merge.
> >
> > Not quite understand...
> >
> > The free assigned device in the destroy path of VM, so as free irq. And
> > we got cancel_work_sync() in free irq which can sync with the execution
> > of scheduled work. And now before cancel_work_sync(), we disable the
> > interrupt so that no more schedule work happen again. So after
> > cancel_work_sync(), everything(I think it's irq handler and schedule work
> > here) asynchronously should quiet down.
> >
> > Or I miss something?
>
> Thats right. As long as you disable the irq and cancel pending work
> before freeing the data structures those paths use.
>
> There is one remaining issue: kvm_assigned_dev_interrupt_work_handler
> can re-enable the interrupt for KVM_ASSIGNED_DEV_GUEST_MSI case. Perhaps
> you need a new flag to indicate shutdown (so the host IRQ won't be
> reenabled).

Is it already covered by disable_irq_no_sync() before cancel_work_sync()? I've 
noted this in my comment: the irq may be disabled nested(once for MSI and 
twice for INTx), but I think it's fine for we're going to free it.

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-30  2:14                 ` Sheng Yang
@ 2008-12-30 16:45                   ` Marcelo Tosatti
  2008-12-31  5:43                     ` Sheng Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Marcelo Tosatti @ 2008-12-30 16:45 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Tue, Dec 30, 2008 at 10:14:09AM +0800, Sheng Yang wrote:
> > There is one remaining issue: kvm_assigned_dev_interrupt_work_handler
> > can re-enable the interrupt for KVM_ASSIGNED_DEV_GUEST_MSI case. Perhaps
> > you need a new flag to indicate shutdown (so the host IRQ won't be
> > reenabled).
> 
> Is it already covered by disable_irq_no_sync() before cancel_work_sync()? I've 
> noted this in my comment: the irq may be disabled nested(once for MSI and 
> twice for INTx), but I think it's fine for we're going to free it.

The problem is that the irq can be re-enabled in
kvm_assigned_dev_interrupt_work_handler:


context 1               |    context 2

disable_irq_nosync
                            kvm_assigned_dev_interrupt_work_handler
                            enable_irq
cancel_work_sync

free_irq
                            
So between cancel_work_sync and free_irq kvm_assigned_dev_intr can run
and schedule work.

I guess it is OK to take the kvm mutex in kvm_free_assigned_irq (but
better verify), so it can:

mutex_lock
assigned_dev->irq_requested_type = 0;
mutex_unlock

disable_irq_nosync
cancel_work_sync
free_irq

So that the work handler won't re-enable the interrupt.

Other than that the latest patchset looks good.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-30 16:45                   ` Marcelo Tosatti
@ 2008-12-31  5:43                     ` Sheng Yang
  2009-01-02  0:10                       ` Marcelo Tosatti
  0 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2008-12-31  5:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Wednesday 31 December 2008 00:45:51 Marcelo Tosatti wrote:
> On Tue, Dec 30, 2008 at 10:14:09AM +0800, Sheng Yang wrote:
> > > There is one remaining issue: kvm_assigned_dev_interrupt_work_handler
> > > can re-enable the interrupt for KVM_ASSIGNED_DEV_GUEST_MSI case.
> > > Perhaps you need a new flag to indicate shutdown (so the host IRQ won't
> > > be reenabled).
> >
> > Is it already covered by disable_irq_no_sync() before cancel_work_sync()?
> > I've noted this in my comment: the irq may be disabled nested(once for
> > MSI and twice for INTx), but I think it's fine for we're going to free
> > it.
>
> The problem is that the irq can be re-enabled in
> kvm_assigned_dev_interrupt_work_handler:
>
>
> context 1               |    context 2
>
> disable_irq_nosync
>                             kvm_assigned_dev_interrupt_work_handler
>                             enable_irq
> cancel_work_sync
>
> free_irq
>

Um... My understanding is a little different...

Before context1 execute disable_irq_nosync(), in irq handler, 
disable_irq_nosync() would also been executed. So only one enable_irq() is not 
really enable irq here, which can cover the window at all. That's what I means 
nested disable irq...

-- 
regards
Yang, Sheng

> So between cancel_work_sync and free_irq kvm_assigned_dev_intr can run
> and schedule work.
>
> I guess it is OK to take the kvm mutex in kvm_free_assigned_irq (but
> better verify), so it can:
>
> mutex_lock
> assigned_dev->irq_requested_type = 0;
> mutex_unlock
>
> disable_irq_nosync
> cancel_work_sync
> free_irq
>
> So that the work handler won't re-enable the interrupt.
>
> Other than that the latest patchset looks good.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2008-12-31  5:43                     ` Sheng Yang
@ 2009-01-02  0:10                       ` Marcelo Tosatti
  2009-01-05  7:07                         ` Sheng Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Marcelo Tosatti @ 2009-01-02  0:10 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Wed, Dec 31, 2008 at 01:43:54PM +0800, Sheng Yang wrote:
> On Wednesday 31 December 2008 00:45:51 Marcelo Tosatti wrote:
> > On Tue, Dec 30, 2008 at 10:14:09AM +0800, Sheng Yang wrote:
> > > > There is one remaining issue: kvm_assigned_dev_interrupt_work_handler
> > > > can re-enable the interrupt for KVM_ASSIGNED_DEV_GUEST_MSI case.
> > > > Perhaps you need a new flag to indicate shutdown (so the host IRQ won't
> > > > be reenabled).
> > >
> > > Is it already covered by disable_irq_no_sync() before cancel_work_sync()?
> > > I've noted this in my comment: the irq may be disabled nested(once for
> > > MSI and twice for INTx), but I think it's fine for we're going to free
> > > it.
> >
> > The problem is that the irq can be re-enabled in
> > kvm_assigned_dev_interrupt_work_handler:
> >
> >
> > context 1               |    context 2
> >
> > disable_irq_nosync
> >                             kvm_assigned_dev_interrupt_work_handler
> >                             enable_irq
> > cancel_work_sync
> >
> > free_irq
> >
> 
> Um... My understanding is a little different...
> 
> Before context1 execute disable_irq_nosync(), in irq handler, 
> disable_irq_nosync() would also been executed. So only one enable_irq() is not 
> really enable irq here, which can cover the window at all. That's what I means 
> nested disable irq...

You're right. There have been two disable_irq calls.

OK, looks good to me.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2009-01-02  0:10                       ` Marcelo Tosatti
@ 2009-01-05  7:07                         ` Sheng Yang
  2009-01-05 13:27                           ` Avi Kivity
  0 siblings, 1 reply; 37+ messages in thread
From: Sheng Yang @ 2009-01-05  7:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Amit Shah, Avi Kivity, kvm, Amit Shah, Han, Weidong

On Friday 02 January 2009 08:10:37 Marcelo Tosatti wrote:
> On Wed, Dec 31, 2008 at 01:43:54PM +0800, Sheng Yang wrote:
> > On Wednesday 31 December 2008 00:45:51 Marcelo Tosatti wrote:
> > > On Tue, Dec 30, 2008 at 10:14:09AM +0800, Sheng Yang wrote:
> > > > > There is one remaining issue:
> > > > > kvm_assigned_dev_interrupt_work_handler can re-enable the interrupt
> > > > > for KVM_ASSIGNED_DEV_GUEST_MSI case. Perhaps you need a new flag to
> > > > > indicate shutdown (so the host IRQ won't be reenabled).
> > > >
> > > > Is it already covered by disable_irq_no_sync() before
> > > > cancel_work_sync()? I've noted this in my comment: the irq may be
> > > > disabled nested(once for MSI and twice for INTx), but I think it's
> > > > fine for we're going to free it.
> > >
> > > The problem is that the irq can be re-enabled in
> > > kvm_assigned_dev_interrupt_work_handler:
> > >
> > >
> > > context 1               |    context 2
> > >
> > > disable_irq_nosync
> > >                             kvm_assigned_dev_interrupt_work_handler
> > >                             enable_irq
> > > cancel_work_sync
> > >
> > > free_irq
> >
> > Um... My understanding is a little different...
> >
> > Before context1 execute disable_irq_nosync(), in irq handler,
> > disable_irq_nosync() would also been executed. So only one enable_irq()
> > is not really enable irq here, which can cover the window at all. That's
> > what I means nested disable irq...
>
> You're right. There have been two disable_irq calls.
>
> OK, looks good to me.

So Avi, can you check in the patchset?

Thanks.

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2009-01-05  7:07                         ` Sheng Yang
@ 2009-01-05 13:27                           ` Avi Kivity
  2009-01-06  1:25                             ` Sheng Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Avi Kivity @ 2009-01-05 13:27 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Amit Shah, kvm, Amit Shah, Han, Weidong

Sheng Yang wrote:
> So Avi, can you check in the patchset?
>   

I had some comments about simplifying the userspace interface (or is 
that another patchset?)

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq
  2009-01-05 13:27                           ` Avi Kivity
@ 2009-01-06  1:25                             ` Sheng Yang
  0 siblings, 0 replies; 37+ messages in thread
From: Sheng Yang @ 2009-01-06  1:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, Amit Shah, Han, Weidong

On Monday 05 January 2009 21:27:00 Avi Kivity wrote:
> Sheng Yang wrote:
> > So Avi, can you check in the patchset?
>
> I had some comments about simplifying the userspace interface (or is
> that another patchset?)

Oh, I means another patchset while Marcelo reply to the old thread. The 
patchset titled "[PATCH 0/3] Fix racy in kvm_free_assigned_irq".

-- 
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2009-01-06  1:25 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-25  9:09 [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang
2008-12-25  9:09 ` [PATCH 01/15] KVM: Add MSI_ACTION flag for assigned irq Sheng Yang
2008-12-25  9:09 ` [PATCH 02/15] KVM: Use kvm_free_assigned_irq() for free irq Sheng Yang
2008-12-25  9:09 ` [PATCH 03/15] KVM: Add support to disable MSI for assigned device Sheng Yang
2008-12-25  9:09 ` [PATCH 04/15] KVM: Add a route layer to convert MSI message to GSI Sheng Yang
2008-12-25  9:09 ` [PATCH 05/15] KVM: Using gsi_msg mapping for MSI device assignment Sheng Yang
2008-12-25  9:09 ` [PATCH 06/15] KVM: Improve MSI dispatch function Sheng Yang
2008-12-25  9:09 ` [PATCH 07/15] KVM: Using ioapic_irqchip() macro for kvm_set_irq Sheng Yang
2008-12-25  9:09 ` [PATCH 08/15] KVM: Merge MSI handling to kvm_set_irq Sheng Yang
2008-12-25  9:09 ` [PATCH 09/15] KVM: Split IOAPIC structure Sheng Yang
2008-12-25  9:09 ` [PATCH 10/15] KVM: Unified the delivery of IOAPIC and MSI Sheng Yang
2008-12-25  9:09 ` [PATCH 11/15] KVM: Change API of kvm_ioapic_get_delivery_bitmask Sheng Yang
2008-12-25  9:09 ` [PATCH 12/15] KVM: Update intr delivery func to accept unsigned long* bitmap Sheng Yang
2008-12-25  9:09 ` [PATCH 13/15] KVM: bit ops for deliver_bitmap Sheng Yang
2008-12-25  9:09 ` [PATCH 14/15] KVM: Using kfifo for irq recording Sheng Yang
2008-12-26  2:29   ` [PATCH 14/15] KVM: Replace host_irq_disable with a new flag Sheng Yang
2008-12-25  9:09 ` [PATCH 15/15] KVM: Fix racy in kvm_free_assigned_irq Sheng Yang
2008-12-25 11:56   ` Sheng Yang
2008-12-26  2:30     ` Sheng Yang
2008-12-27 20:06       ` Marcelo Tosatti
2008-12-27 20:15         ` Marcelo Tosatti
2008-12-28 11:24         ` Sheng Yang
2008-12-28 12:57           ` Avi Kivity
2008-12-29  5:42           ` Amit Shah
2008-12-29 12:23             ` Sheng Yang
2008-12-29 13:37               ` Avi Kivity
2008-12-29 13:49                 ` Sheng Yang
2008-12-29 15:20               ` Marcelo Tosatti
2008-12-30  2:14                 ` Sheng Yang
2008-12-30 16:45                   ` Marcelo Tosatti
2008-12-31  5:43                     ` Sheng Yang
2009-01-02  0:10                       ` Marcelo Tosatti
2009-01-05  7:07                         ` Sheng Yang
2009-01-05 13:27                           ` Avi Kivity
2009-01-06  1:25                             ` Sheng Yang
2008-12-29 13:20             ` Avi Kivity
2008-12-25  9:13 ` [PATCH 0/15] Device assignment & MSI enhancement Sheng Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).