[KVM PATCH v8 0/3] irqfd fixes and enhancements

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [KVM PATCH v8 0/3] irqfd fixes and enhancements
@ 2009-07-01 16:08 Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Gregory Haskins @ 2009-07-01 16:08 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, mst, avi, davidel

(Applies to kvm.git/master:beeaacd1)

The following is the latest attempt to fix the races in irqfd/eventfd, as
well as restore DEASSIGN support.  For more details, please read the patch
headers.

As always, this series has been tested against the kvm-eventfd unit test
and everything appears to be functioning properly. You can download this
test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

Kind Regards,
-Greg


[Changelog:

	v8:
	   *) Rebased to kvm.git/master:beeaacd1)
	   *) Dropped Davide's patch (2/5 in v7) since it's now upstream
	   *) Folded v7's 1/5 and 3/5 together, and added a single
	      eventfd hunk to convert wake_up_locked_polled to wake_up_polled
	   *) Dropped irqfd->active bit in favor of irqfd_is_active() function
	   *) Cleaned up comments in 1/3
	   *) Dropped v7's 5/5 (slow-work)
	   *) Added new patch (3/3) which makes the cleanup-wq's creation
	      dynamic so to avoid the resource penalty for guests that do
	      not use irqfd.

	v7:
	   *) Addressed minor-nit feedback from Michael
	   *) Cleaned up patch headers
	   *) Re-added separate slow-work feature patch to end for comparison

	v6:
	   *) Removed slow-work in favor of using a dedicated single-thread
              workqueue.
	   *) Condensed cleanup path to always use deferred shutdown
	   *) Saved about 56 lines over v5, with the following diffstat:

 	   include/linux/kvm_host.h |    2 
 	   virt/kvm/eventfd.c       |  248 ++++++++++++++++++-----------------------------
 	   2 files changed, 97 insertions(+), 153 deletions(-)
	v5:
           Untracked..
]

---

Gregory Haskins (3):
      KVM: create irqfd-cleanup-wq on demand
      KVM: add irqfd DEASSIGN feature
      KVM: Fix races in irqfd using new eventfd_kref_get interface


 fs/eventfd.c             |    7 -
 include/linux/kvm.h      |    2 
 include/linux/kvm_host.h |    6 +
 virt/kvm/eventfd.c       |  281 ++++++++++++++++++++++++++++++++++++----------
 4 files changed, 229 insertions(+), 67 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
  2009-07-01 16:08 [KVM PATCH v8 0/3] irqfd fixes and enhancements Gregory Haskins
@ 2009-07-01 16:09 ` Gregory Haskins
  2009-07-01 20:21   ` Gregory Haskins
  2009-07-02 14:16   ` Avi Kivity
  2009-07-01 16:09 ` [KVM PATCH v8 2/3] KVM: add irqfd DEASSIGN feature Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand Gregory Haskins
  2 siblings, 2 replies; 9+ messages in thread
From: Gregory Haskins @ 2009-07-01 16:09 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, mst, avi, davidel

eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
"release" callback.  This lets eventfd clients know if the eventfd is about
to go away and is very useful particularly for in-kernel clients.  However,
until recently it is not possible to use this feature of eventfd in a
race-free way.

This patch utilizes a new eventfd interface to rectify the problem.  It also
converts the eventfd POLLHUP generation code to use the locked variant
of wakeup.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Davide Libenzi <davidel@xmailserver.org>
---

 fs/eventfd.c             |    7 --
 include/linux/kvm_host.h |    5 +
 virt/kvm/eventfd.c       |  187 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 134 insertions(+), 65 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index d9849a1..31d12de 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -105,12 +105,7 @@ static int eventfd_release(struct inode *inode, struct file *file)
 {
 	struct eventfd_ctx *ctx = file->private_data;
 
-	/*
-	 * No need to hold the lock here, since we are on the file cleanup
-	 * path and the ones still attached to the wait queue will be
-	 * serialized by wake_up_locked_poll().
-	 */
-	wake_up_locked_poll(&ctx->wqh, POLLHUP);
+	wake_up_poll(&ctx->wqh, POLLHUP);
 	eventfd_ctx_put(ctx);
 	return 0;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1a8952f..7605bc4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -141,7 +141,10 @@ struct kvm {
 	struct kvm_io_bus mmio_bus;
 	struct kvm_io_bus pio_bus;
 #ifdef CONFIG_HAVE_KVM_EVENTFD
-	struct list_head irqfds;
+	struct {
+		spinlock_t        lock;
+		struct list_head  items;
+	} irqfds;
 #endif
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index a9e7de7..05ce447 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -28,7 +28,6 @@
 #include <linux/file.h>
 #include <linux/list.h>
 #include <linux/eventfd.h>
-#include <linux/srcu.h>
 
 /*
  * --------------------------------------------------------------------
@@ -37,66 +36,86 @@
  * Credit goes to Avi Kivity for the original idea.
  * --------------------------------------------------------------------
  */
+
 struct _irqfd {
-	struct mutex              lock;
-	struct srcu_struct        srcu;
 	struct kvm               *kvm;
+	struct eventfd_ctx       *eventfd;
 	int                       gsi;
 	struct list_head          list;
 	poll_table                pt;
 	wait_queue_head_t        *wqh;
 	wait_queue_t              wait;
 	struct work_struct        inject;
+	struct work_struct        shutdown;
 };
 
+static struct workqueue_struct *irqfd_cleanup_wq;
+
 static void
 irqfd_inject(struct work_struct *work)
 {
 	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
-	struct kvm *kvm;
-	int idx;
-
-	idx = srcu_read_lock(&irqfd->srcu);
-
-	kvm = rcu_dereference(irqfd->kvm);
-	if (kvm) {
-		mutex_lock(&kvm->irq_lock);
-		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
-		mutex_unlock(&kvm->irq_lock);
-	}
+	struct kvm *kvm = irqfd->kvm;
 
-	srcu_read_unlock(&irqfd->srcu, idx);
+	mutex_lock(&kvm->irq_lock);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	mutex_unlock(&kvm->irq_lock);
 }
 
+/*
+ * Race-free decouple logic (ordering is critical)
+ */
 static void
-irqfd_disconnect(struct _irqfd *irqfd)
+irqfd_shutdown(struct work_struct *work)
 {
-	struct kvm *kvm;
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
 
-	mutex_lock(&irqfd->lock);
+	/*
+	 * Synchronize with the wait-queue and unhook ourselves to prevent
+	 * further events.
+	 */
+	remove_wait_queue(irqfd->wqh, &irqfd->wait);
+
+	/*
+	 * We know no new events will be scheduled at this point, so block
+	 * until all previously outstanding events have completed
+	 */
+	flush_work(&irqfd->inject);
+
+	/*
+	 * It is now safe to release the object's resources
+	 */
+	eventfd_ctx_put(irqfd->eventfd);
+	kfree(irqfd);
+}
 
-	kvm = rcu_dereference(irqfd->kvm);
-	rcu_assign_pointer(irqfd->kvm, NULL);
 
-	mutex_unlock(&irqfd->lock);
+/* assumes kvm->irqfds.lock is held */
+static bool
+irqfd_is_active(struct _irqfd *irqfd)
+{
+	return list_empty(&irqfd->list) ? false : true;
+}
 
-	if (!kvm)
-		return;
+/*
+ * Mark the irqfd as inactive and schedule it for removal
+ *
+ * assumes kvm->irqfds.lock is held
+ */
+static void
+irqfd_deactivate(struct _irqfd *irqfd)
+{
+	BUG_ON(!irqfd_is_active(irqfd));
 
-	mutex_lock(&kvm->lock);
-	list_del(&irqfd->list);
-	mutex_unlock(&kvm->lock);
+	list_del_init(&irqfd->list);
 
-	/*
-	 * It is important to not drop the kvm reference until the next grace
-	 * period because there might be lockless references in flight up
-	 * until then
-	 */
-	synchronize_srcu(&irqfd->srcu);
-	kvm_put_kvm(kvm);
+	queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
 }
 
+/*
+ * Called with wqh->lock held and interrupts disabled
+ */
 static int
 irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
@@ -104,25 +123,29 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLIN)
-		/*
-		 * The POLLIN wake_up is called with interrupts disabled.
-		 * Therefore we need to defer the IRQ injection until later
-		 * since we need to acquire the kvm->lock to do so.
-		 */
+		/* An event has been signaled, inject an interrupt */
 		schedule_work(&irqfd->inject);
 
 	if (flags & POLLHUP) {
+		/* The eventfd is closing, detach from KVM */
+		struct kvm *kvm = irqfd->kvm;
+		unsigned long flags;
+
+		spin_lock_irqsave(&kvm->irqfds.lock, flags);
+
 		/*
-		 * The POLLHUP is called unlocked, so it theoretically should
-		 * be safe to remove ourselves from the wqh using the locked
-		 * variant of remove_wait_queue()
+		 * We must check if someone deactivated the irqfd before
+		 * we could acquire the irqfds.lock since the item is
+		 * deactivated from the KVM side before it is unhooked from
+		 * the wait-queue.  If it is already deactivated, we can
+		 * simply return knowing the other side will cleanup for us.
+		 * We cannot race against the irqfd going away since the
+		 * other side is required to acquire wqh->lock, which we hold
 		 */
-		remove_wait_queue(irqfd->wqh, &irqfd->wait);
-		flush_work(&irqfd->inject);
-		irqfd_disconnect(irqfd);
+		if (irqfd_is_active(irqfd))
+			irqfd_deactivate(irqfd);
 
-		cleanup_srcu_struct(&irqfd->srcu);
-		kfree(irqfd);
+		spin_unlock_irqrestore(&kvm->irqfds.lock, flags);
 	}
 
 	return 0;
@@ -143,6 +166,7 @@ kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 {
 	struct _irqfd *irqfd;
 	struct file *file = NULL;
+	struct eventfd_ctx *eventfd = NULL;
 	int ret;
 	unsigned int events;
 
@@ -150,12 +174,11 @@ kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 	if (!irqfd)
 		return -ENOMEM;
 
-	mutex_init(&irqfd->lock);
-	init_srcu_struct(&irqfd->srcu);
 	irqfd->kvm = kvm;
 	irqfd->gsi = gsi;
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
+	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 
 	file = eventfd_fget(fd);
 	if (IS_ERR(file)) {
@@ -163,6 +186,14 @@ kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 		goto fail;
 	}
 
+	eventfd = eventfd_ctx_fileget(file);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto fail;
+	}
+
+	irqfd->eventfd = eventfd;
+
 	/*
 	 * Install our own custom wake-up handling so we are notified via
 	 * a callback whenever someone signals the underlying eventfd
@@ -172,14 +203,13 @@ kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 
 	events = file->f_op->poll(file, &irqfd->pt);
 
-	kvm_get_kvm(kvm);
-
-	mutex_lock(&kvm->lock);
-	list_add_tail(&irqfd->list, &kvm->irqfds);
-	mutex_unlock(&kvm->lock);
+	spin_lock_irq(&kvm->irqfds.lock);
+	list_add_tail(&irqfd->list, &kvm->irqfds.items);
+	spin_unlock_irq(&kvm->irqfds.lock);
 
 	/*
-	 * Check if there was an event already queued
+	 * Check if there was an event already pending on the eventfd
+	 * before we registered, and trigger it as if we didn't miss it.
 	 */
 	if (events & POLLIN)
 		schedule_work(&irqfd->inject);
@@ -193,6 +223,9 @@ kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 	return 0;
 
 fail:
+	if (eventfd && !IS_ERR(eventfd))
+		eventfd_ctx_put(eventfd);
+
 	if (file && !IS_ERR(file))
 		fput(file);
 
@@ -203,14 +236,52 @@ fail:
 void
 kvm_irqfd_init(struct kvm *kvm)
 {
-	INIT_LIST_HEAD(&kvm->irqfds);
+	spin_lock_init(&kvm->irqfds.lock);
+	INIT_LIST_HEAD(&kvm->irqfds.items);
 }
 
+/*
+ * This function is called as the kvm VM fd is being released. Shutdown all
+ * irqfds that still remain open
+ */
 void
 kvm_irqfd_release(struct kvm *kvm)
 {
 	struct _irqfd *irqfd, *tmp;
 
-	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list)
-		irqfd_disconnect(irqfd);
+	spin_lock_irq(&kvm->irqfds.lock);
+
+	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list)
+		irqfd_deactivate(irqfd);
+
+	spin_unlock_irq(&kvm->irqfds.lock);
+
+	/*
+	 * Block until we know all outstanding shutdown jobs have completed
+	 * since we do not take a kvm* reference.
+	 */
+	flush_workqueue(irqfd_cleanup_wq);
+
 }
+
+/*
+ * create a host-wide workqueue for issuing deferred shutdown requests
+ * aggregated from all vm* instances. We need our own isolated single-thread
+ * queue to prevent deadlock against flushing the normal work-queue.
+ */
+static int __init irqfd_module_init(void)
+{
+	irqfd_cleanup_wq = create_singlethread_workqueue("kvm-irqfd-cleanup");
+	if (!irqfd_cleanup_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit irqfd_module_exit(void)
+{
+	destroy_workqueue(irqfd_cleanup_wq);
+}
+
+module_init(irqfd_module_init);
+module_exit(irqfd_module_exit);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
  2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
@ 2009-07-01 20:21   ` Gregory Haskins
  2009-07-02 14:16   ` Avi Kivity
  1 sibling, 0 replies; 9+ messages in thread
From: Gregory Haskins @ 2009-07-01 20:21 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, mst, avi, davidel

[-- Attachment #1: Type: text/plain, Size: 387 bytes --]

Gregory Haskins wrote:
>  
> +	eventfd = eventfd_ctx_fileget(file);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	irqfd->eventfd = eventfd;
> +
>   

<sigh>

Just noticed the typo (return "eventfd" but error-check "file").  Looks
like I will need at least a v9.  Is there any other feedback before I
push out the fix for v8?

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
  2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
  2009-07-01 20:21   ` Gregory Haskins
@ 2009-07-02 14:16   ` Avi Kivity
  2009-07-02 14:27     ` Gregory Haskins
  1 sibling, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-07-02 14:16 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, mst, davidel

On 07/01/2009 07:09 PM, Gregory Haskins wrote:
> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> "release" callback.  This lets eventfd clients know if the eventfd is about
> to go away and is very useful particularly for in-kernel clients.  However,
> until recently it is not possible to use this feature of eventfd in a
> race-free way.
>
> This patch utilizes a new eventfd interface to rectify the problem.  It also
> converts the eventfd POLLHUP generation code to use the locked variant
> of wakeup.
>
> Signed-off-by: Gregory Haskins<ghaskins@novell.com>
> CC: Davide Libenzi<davidel@xmailserver.org>
> ---
>
>   fs/eventfd.c             |    7 --
>   include/linux/kvm_host.h |    5 +
>   virt/kvm/eventfd.c       |  187 ++++++++++++++++++++++++++++++++--------------
>   3 files changed, 134 insertions(+), 65 deletions(-)
>    


Please split the eventfd.c hunk into a separate patch.  When preparing 
the 2.6.32 submission, I'll fold that into the patch into its antipatch 
and they'll disappear.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
  2009-07-02 14:16   ` Avi Kivity
@ 2009-07-02 14:27     ` Gregory Haskins
  2009-07-02 14:42       ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Haskins @ 2009-07-02 14:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, mst, davidel

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

Avi Kivity wrote:
> On 07/01/2009 07:09 PM, Gregory Haskins wrote:
>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to
>> generate a
>> "release" callback.  This lets eventfd clients know if the eventfd is
>> about
>> to go away and is very useful particularly for in-kernel clients. 
>> However,
>> until recently it is not possible to use this feature of eventfd in a
>> race-free way.
>>
>> This patch utilizes a new eventfd interface to rectify the problem. 
>> It also
>> converts the eventfd POLLHUP generation code to use the locked variant
>> of wakeup.
>>
>> Signed-off-by: Gregory Haskins<ghaskins@novell.com>
>> CC: Davide Libenzi<davidel@xmailserver.org>
>> ---
>>
>>   fs/eventfd.c             |    7 --
>>   include/linux/kvm_host.h |    5 +
>>   virt/kvm/eventfd.c       |  187
>> ++++++++++++++++++++++++++++++++--------------
>>   3 files changed, 134 insertions(+), 65 deletions(-)
>>    
>
>
> Please split the eventfd.c hunk into a separate patch.  When preparing
> the 2.6.32 submission, I'll fold that into the patch into its
> antipatch and they'll disappear.
>
Ok, but note that that means I should probably split 1/3 back out into
1/5 (prepare), 2/5 (eventfd hunk), 3/5 (fix irqfd) again like I had in
v7 so that the series is bisectable.  Is that ok?

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
  2009-07-02 14:27     ` Gregory Haskins
@ 2009-07-02 14:42       ` Avi Kivity
  0 siblings, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-07-02 14:42 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, mst, davidel

On 07/02/2009 05:27 PM, Gregory Haskins wrote:
>
>    
>> Please split the eventfd.c hunk into a separate patch.  When preparing
>> the 2.6.32 submission, I'll fold that into the patch into its
>> antipatch and they'll disappear.
>>
>>      
> Ok, but note that that means I should probably split 1/3 back out into
> 1/5 (prepare), 2/5 (eventfd hunk), 3/5 (fix irqfd) again like I had in
> v7 so that the series is bisectable.  Is that ok?
>
>    

Sure.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [KVM PATCH v8 2/3] KVM: add irqfd DEASSIGN feature
  2009-07-01 16:08 [KVM PATCH v8 0/3] irqfd fixes and enhancements Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
@ 2009-07-01 16:09 ` Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand Gregory Haskins
  2 siblings, 0 replies; 9+ messages in thread
From: Gregory Haskins @ 2009-07-01 16:09 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, mst, avi, davidel

DEASSIGN allows us to optionally disassociate an IRQFD from its underlying
eventfd without destroying the eventfd in the process.  This is useful
for conditions like live-migration which may have an eventfd associated
with a device and an IRQFD.  We need to be able to decouple the guest
from the event source while not perturbing the event source itself.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---

 include/linux/kvm.h |    2 ++
 virt/kvm/eventfd.c  |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 69d3d73..76c6408 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -461,6 +461,8 @@ struct kvm_x86_mce {
 };
 #endif
 
+#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+
 struct kvm_irqfd {
 	__u32 fd;
 	__u32 gsi;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 05ce447..0fd200c 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -161,8 +161,8 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
-int
-kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+static int
+kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
 	struct _irqfd *irqfd;
 	struct file *file = NULL;
@@ -241,6 +241,48 @@ kvm_irqfd_init(struct kvm *kvm)
 }
 
 /*
+ * shutdown any irqfd's that match fd+gsi
+ */
+static int
+kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
+{
+	struct _irqfd *irqfd, *tmp;
+	struct eventfd_ctx *eventfd;
+
+	eventfd = eventfd_ctx_fdget(fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	spin_lock_irq(&kvm->irqfds.lock);
+
+	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
+		if (irqfd->eventfd == eventfd && irqfd->gsi == gsi)
+			irqfd_deactivate(irqfd);
+	}
+
+	spin_unlock_irq(&kvm->irqfds.lock);
+	eventfd_ctx_put(eventfd);
+
+	/*
+	 * Block until we know all outstanding shutdown jobs have completed
+	 * so that we guarantee there will not be any more interrupts on this
+	 * gsi once this deassign function returns.
+	 */
+	flush_workqueue(irqfd_cleanup_wq);
+
+	return 0;
+}
+
+int
+kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+{
+	if (flags & KVM_IRQFD_FLAG_DEASSIGN)
+		return kvm_irqfd_deassign(kvm, fd, gsi);
+
+	return kvm_irqfd_assign(kvm, fd, gsi);
+}
+
+/*
  * This function is called as the kvm VM fd is being released. Shutdown all
  * irqfds that still remain open
  */


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand
  2009-07-01 16:08 [KVM PATCH v8 0/3] irqfd fixes and enhancements Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
  2009-07-01 16:09 ` [KVM PATCH v8 2/3] KVM: add irqfd DEASSIGN feature Gregory Haskins
@ 2009-07-01 16:09 ` Gregory Haskins
  2009-07-02 14:22   ` Avi Kivity
  2 siblings, 1 reply; 9+ messages in thread
From: Gregory Haskins @ 2009-07-01 16:09 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, mst, avi, davidel

We currently create this wq on module_init, which may be wasteful if the
host never creates a guest that uses irqfd.  This patch changes the
algorithm so that the workqueue is only created when at least one guest
is using irqfd.  The queue is cleaned up when the last guest using irqfd
is shutdown.

To keep things simple, we only check whether the guest has tried to create
an irqfd, not whether there are actually irqfds active.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 include/linux/kvm_host.h |    1 
 virt/kvm/eventfd.c       |  100 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 75 insertions(+), 26 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7605bc4..0b0b6ac 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -144,6 +144,7 @@ struct kvm {
 	struct {
 		spinlock_t        lock;
 		struct list_head  items;
+		int               init:1;
 	} irqfds;
 #endif
 	struct kvm_vm_stat stat;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 0fd200c..87f615b 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -49,7 +49,16 @@ struct _irqfd {
 	struct work_struct        shutdown;
 };
 
-static struct workqueue_struct *irqfd_cleanup_wq;
+struct _irqfd_cleanup {
+	struct mutex             lock;
+	int                      refs;
+	struct workqueue_struct *wq;
+};
+
+static struct _irqfd_cleanup irqfd_cleanup = {
+	.lock = __MUTEX_INITIALIZER(irqfd_cleanup.lock),
+	.refs = 0,
+};
 
 static void
 irqfd_inject(struct work_struct *work)
@@ -110,7 +119,7 @@ irqfd_deactivate(struct _irqfd *irqfd)
 
 	list_del_init(&irqfd->list);
 
-	queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
+	queue_work(irqfd_cleanup.wq, &irqfd->shutdown);
 }
 
 /*
@@ -161,6 +170,62 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
+/*
+ * create a host-wide workqueue for issuing deferred shutdown requests
+ * aggregated from all vm* instances. We need our own isolated single-thread
+ * queue to prevent deadlock against flushing the normal work-queue.
+ */
+static int
+irqfd_cleanup_init(struct kvm *kvm)
+{
+	int ret = 0;
+
+	mutex_lock(&irqfd_cleanup.lock);
+
+	/*
+	 * Check the current init state from within the lock so that we
+	 * sync all users to the thread creation.
+	 */
+	if (kvm->irqfds.init)
+		goto out;
+
+	if (!irqfd_cleanup.refs) {
+		struct workqueue_struct *wq;
+
+		wq = create_singlethread_workqueue("kvm-irqfd-cleanup");
+		if (!wq) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		irqfd_cleanup.wq = wq;
+	}
+
+	irqfd_cleanup.refs++;
+	kvm->irqfds.init = true;
+
+out:
+	mutex_unlock(&irqfd_cleanup.lock);
+
+	return ret;
+}
+
+static void
+irqfd_cleanup_release(struct kvm *kvm)
+{
+	if (!kvm->irqfds.init)
+		return;
+
+	mutex_lock(&irqfd_cleanup.lock);
+
+	if (!(--irqfd_cleanup.refs))
+		destroy_workqueue(irqfd_cleanup.wq);
+
+	mutex_unlock(&irqfd_cleanup.lock);
+
+	kvm->irqfds.init = false;
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
@@ -170,6 +235,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 	int ret;
 	unsigned int events;
 
+	ret = irqfd_cleanup_init(kvm);
+	if (ret < 0)
+		return ret;
+
 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
 	if (!irqfd)
 		return -ENOMEM;
@@ -268,7 +337,7 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
 	 * so that we guarantee there will not be any more interrupts on this
 	 * gsi once this deassign function returns.
 	 */
-	flush_workqueue(irqfd_cleanup_wq);
+	flush_workqueue(irqfd_cleanup.wq);
 
 	return 0;
 }
@@ -302,28 +371,7 @@ kvm_irqfd_release(struct kvm *kvm)
 	 * Block until we know all outstanding shutdown jobs have completed
 	 * since we do not take a kvm* reference.
 	 */
-	flush_workqueue(irqfd_cleanup_wq);
-
-}
-
-/*
- * create a host-wide workqueue for issuing deferred shutdown requests
- * aggregated from all vm* instances. We need our own isolated single-thread
- * queue to prevent deadlock against flushing the normal work-queue.
- */
-static int __init irqfd_module_init(void)
-{
-	irqfd_cleanup_wq = create_singlethread_workqueue("kvm-irqfd-cleanup");
-	if (!irqfd_cleanup_wq)
-		return -ENOMEM;
-
-	return 0;
-}
+	flush_workqueue(irqfd_cleanup.wq);
+	irqfd_cleanup_release(kvm);
 
-static void __exit irqfd_module_exit(void)
-{
-	destroy_workqueue(irqfd_cleanup_wq);
 }
-
-module_init(irqfd_module_init);
-module_exit(irqfd_module_exit);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand
  2009-07-01 16:09 ` [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand Gregory Haskins
@ 2009-07-02 14:22   ` Avi Kivity
  0 siblings, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-07-02 14:22 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, mst, davidel

On 07/01/2009 07:09 PM, Gregory Haskins wrote:
> We currently create this wq on module_init, which may be wasteful if the
> host never creates a guest that uses irqfd.  This patch changes the
> algorithm so that the workqueue is only created when at least one guest
> is using irqfd.  The queue is cleaned up when the last guest using irqfd
> is shutdown.
>
> To keep things simple, we only check whether the guest has tried to create
> an irqfd, not whether there are actually irqfds active.
>    

> +++ b/include/linux/kvm_host.h
> @@ -144,6 +144,7 @@ struct kvm {
>   	struct {
>   		spinlock_t        lock;
>   		struct list_head  items;
> +		int               init:1;
>    

Since you're rebasing anyway... change the bitfield to a bool.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-07-02 14:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-01 16:08 [KVM PATCH v8 0/3] irqfd fixes and enhancements Gregory Haskins
2009-07-01 16:09 ` [KVM PATCH v8 1/3] KVM: Fix races in irqfd using new eventfd_kref_get interface Gregory Haskins
2009-07-01 20:21   ` Gregory Haskins
2009-07-02 14:16   ` Avi Kivity
2009-07-02 14:27     ` Gregory Haskins
2009-07-02 14:42       ` Avi Kivity
2009-07-01 16:09 ` [KVM PATCH v8 2/3] KVM: add irqfd DEASSIGN feature Gregory Haskins
2009-07-01 16:09 ` [KVM PATCH v8 3/3] KVM: create irqfd-cleanup-wq on demand Gregory Haskins
2009-07-02 14:22   ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox