All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gregory Haskins <ghaskins@novell.com>
To: davidel@xmailserver.org
Cc: mst@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, avi@redhat.com,
	paulmck@linux.vnet.ibm.com, mingo@elte.hu
Subject: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
Date: Fri, 19 Jun 2009 14:51:38 -0400	[thread overview]
Message-ID: <20090619185138.31118.14916.stgit@dev.haskins.net> (raw)
In-Reply-To: <20090619183534.31118.30934.stgit@dev.haskins.net>

eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
notifier->release() callback.  This lets notification clients know if
the eventfd is about to go away and is very useful particularly for
in-kernel clients.  However, as it stands today it is not possible to
use the notification API in a race-free way.  This patch adds some
additional logic to the notification subsystem to rectify this problem.

Background:
-----------------------
Eventfd currently only has one reference count mechanism: fget/fput.  This
in of itself is normally fine.  However, if a client expects to be
notified if the eventfd is closed, it cannot hold a fget() reference
itself or the underlying f_ops->release() callback will never be invoked
by VFS.  Therefore we have this somewhat unusual situation where we may
hold a pointer to an eventfd object (by virtue of having a waiter registered
in its wait-queue), but no reference.  This makes it nearly impossible to
design a mutual decoupling algorithm: you cannot unhook one side from the
other (or vice versa) without racing.

The first problem was dealt with by essentially "donating" a surrogate
object to eventfd.  In other words, when a client attached to eventfd
and then later detached, it would decouple internally in a race free way
and then leave part of the object still attached to the eventfd.  This
decoupled object would correctly detect the end-of-life of the eventfd
object at some point in the future and be deallocated: However, we cannot
guarantee that this operation would not race with a potential rmmod of the
client, and is therefore broken.

Solution Details:
-----------------------

1) We add a private kref to the internal eventfd_ctx object.  This
   reference can be (transparently) held by notification clients without
   affecting the ability for VFS to indicate ->release() notification.

2) We convert the current lockless POLLHUP to a more traditional locked
   variant (*) so that we can ensure a race free mutual-decouple
   algorithm without requiring an surrogate object.

3) We guard the decouple algorithm with an atomic bit-clear to ensure
   mutual exclusion of the decoupling and reference-drop.

4) We hold a reference to the underlying eventfd_ctx until all paths
   have satisfactorily completed to ensure we do not race with eventfd
   going away.

Between these points, we believe we now have a race-free release
mechanism.

[*] Clients that previously assumed the ->release() could sleep will
need to be refactored.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: Michael S. Tsirkin <mst@redhat.com>
---

 fs/eventfd.c            |   62 +++++++++++++++++++++++++++++++++--------------
 include/linux/eventfd.h |    3 ++
 2 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 3d7fb16..934efee 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,8 +16,10 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/kref.h>
 
 struct eventfd_ctx {
+	struct kref kref;
 	wait_queue_head_t wqh;
 	/*
 	 * Every time that a write(2) is performed on an eventfd, the
@@ -57,17 +59,24 @@ int eventfd_signal(struct file *file, int n)
 	return n;
 }
 
+static void _eventfd_release(struct kref *kref)
+{
+	struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+	kfree(ctx);
+}
+
+static void _eventfd_put(struct kref *kref)
+{
+	kref_put(kref, &_eventfd_release);
+}
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
 	struct eventfd_ctx *ctx = file->private_data;
 
-	/*
-	 * No need to hold the lock here, since we are on the file cleanup
-	 * path and the ones still attached to the wait queue will be
-	 * serialized by wake_up_locked_poll().
-	 */
-	wake_up_locked_poll(&ctx->wqh, POLLHUP);
-	kfree(ctx);
+	wake_up_poll(&ctx->wqh, POLLHUP);
+	_eventfd_put(&ctx->kref);
 	return 0;
 }
 
@@ -222,6 +231,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 	if (!ctx)
 		return -ENOMEM;
 
+	kref_init(&ctx->kref);
 	init_waitqueue_head(&ctx->wqh);
 	ctx->count = count;
 	ctx->flags = flags;
@@ -242,6 +252,15 @@ SYSCALL_DEFINE1(eventfd, unsigned int, count)
 	return sys_eventfd2(count, 0);
 }
 
+enum {
+	eventfd_notifier_flag_active,
+};
+
+static int test_and_clear_active(struct eventfd_notifier *en)
+{
+	return test_and_clear_bit(eventfd_notifier_flag_active, &en->flags);
+}
+
 static int eventfd_notifier_wakeup(wait_queue_t *wait, unsigned mode,
 				   int sync, void *key)
 {
@@ -251,19 +270,15 @@ static int eventfd_notifier_wakeup(wait_queue_t *wait, unsigned mode,
 	en = container_of(wait, struct eventfd_notifier, wait);
 
 	if (flags & POLLIN)
-		/*
-		 * The POLLIN wake_up is called with interrupts disabled.
-		 */
 		en->ops->signal(en);
 
 	if (flags & POLLHUP) {
-		/*
-		 * The POLLHUP is called unlocked, so it theoretically should
-		 * be safe to remove ourselves from the wqh using the locked
-		 * variant of remove_wait_queue()
-		 */
-		remove_wait_queue(en->wqh, &en->wait);
-		en->ops->release(en);
+
+		if (test_and_clear_active(en)) {
+			__remove_wait_queue(en->wqh, &en->wait);
+			_eventfd_put(en->eventfd);
+			en->ops->release(en);
+		}
 	}
 
 	return 0;
@@ -283,11 +298,14 @@ static void eventfd_notifier_ptable_enqueue(struct file *file,
 
 int eventfd_notifier_register(struct file *file, struct eventfd_notifier *en)
 {
+	struct eventfd_ctx *ctx;
 	unsigned int events;
 
 	if (file->f_op != &eventfd_fops)
 		return -EINVAL;
 
+	ctx = file->private_data;
+
 	/*
 	 * Install our own custom wake-up handling so we are notified via
 	 * a callback whenever someone signals the underlying eventfd
@@ -297,12 +315,20 @@ int eventfd_notifier_register(struct file *file, struct eventfd_notifier *en)
 
 	events = file->f_op->poll(file, &en->pt);
 
+	kref_get(&ctx->kref);
+	en->eventfd = &ctx->kref;
+
+	set_bit(eventfd_notifier_flag_active, &en->flags);
+
 	return (events & POLLIN) ? 1 : 0;
 }
 EXPORT_SYMBOL_GPL(eventfd_notifier_register);
 
 void eventfd_notifier_unregister(struct eventfd_notifier *en)
 {
-	remove_wait_queue(en->wqh, &en->wait);
+	if (test_and_clear_active(en)) {
+		remove_wait_queue(en->wqh, &en->wait);
+		_eventfd_put(en->eventfd);
+	}
 }
 EXPORT_SYMBOL_GPL(eventfd_notifier_unregister);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index cb23969..2b6e239 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -12,6 +12,7 @@
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/list.h>
+#include <linux/kref.h>
 
 struct eventfd_notifier;
 
@@ -24,6 +25,8 @@ struct eventfd_notifier {
 	poll_table                         pt;
 	wait_queue_head_t                 *wqh;
 	wait_queue_t                       wait;
+	struct kref                       *eventfd;
+	unsigned long                      flags;
 	const struct eventfd_notifier_ops *ops;
 };
 


  parent reply	other threads:[~2009-06-19 18:52 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16  2:29 [KVM-RFC PATCH 0/2] eventfd enhancements for irqfd/iosignalfd Gregory Haskins
2009-06-16  2:29 ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Gregory Haskins
2009-06-16 14:02   ` Michael S. Tsirkin
2009-06-16 14:11     ` Gregory Haskins
2009-06-16 14:38       ` Michael S. Tsirkin
2009-06-16 14:48         ` Gregory Haskins
2009-06-16 14:54           ` Gregory Haskins
2009-06-16 15:16             ` Michael S. Tsirkin
2009-06-16 14:55           ` Michael S. Tsirkin
2009-06-16 15:20             ` Gregory Haskins
2009-06-16 15:41               ` Michael S. Tsirkin
2009-06-16 16:17                 ` Gregory Haskins
2009-06-16 16:19                   ` Davide Libenzi
2009-06-16 17:01                     ` Gregory Haskins
2009-06-17 16:38                       ` Davide Libenzi
2009-06-17 17:28                         ` Gregory Haskins
2009-06-17 17:44                           ` Davide Libenzi
2009-06-17 19:17                             ` Gregory Haskins
2009-06-17 19:50                               ` Davide Libenzi
2009-06-17 21:48                                 ` Gregory Haskins
2009-06-17 23:21                                   ` Davide Libenzi
2009-06-18  6:23                                     ` Michael S. Tsirkin
2009-06-18 17:52                                       ` Davide Libenzi
2009-06-18 14:01                                     ` Gregory Haskins
2009-06-18 14:01                                       ` Gregory Haskins
2009-06-18 17:44                                       ` Davide Libenzi
2009-06-18 19:04                                         ` Gregory Haskins
2009-06-18 19:04                                           ` Gregory Haskins
2009-06-18 22:03                                           ` Davide Libenzi
2009-06-18 22:47                                             ` Gregory Haskins
2009-06-18 22:47                                               ` Gregory Haskins
2009-06-19 18:51                                             ` Gregory Haskins
2009-06-19 18:51                                               ` [PATCH 1/3] eventfd: Allow waiters to be notified about the eventfd file* going away Gregory Haskins
2009-06-19 18:51                                               ` [PATCH 2/3] eventfd: add generalized notifier interface Gregory Haskins
2009-06-19 18:51                                               ` Gregory Haskins [this message]
2009-06-19 19:10                                                 ` [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions Davide Libenzi
2009-06-19 21:16                                                   ` Gregory Haskins
2009-06-19 21:26                                                     ` Davide Libenzi
2009-06-19 21:49                                                       ` Gregory Haskins
2009-06-19 21:54                                                         ` Davide Libenzi
2009-06-19 22:47                                                           ` Davide Libenzi
2009-06-20  2:09                                                             ` Gregory Haskins
2009-06-20 21:17                                                               ` Davide Libenzi
2009-06-20 22:11                                                                 ` Davide Libenzi
2009-06-20 23:48                                                                   ` Davide Libenzi
2009-06-21  1:14                                                                     ` Gregory Haskins
2009-06-21 16:51                                                                       ` Davide Libenzi
2009-06-21 18:39                                                                         ` Gregory Haskins
2009-06-21 23:54                                                                           ` Davide Libenzi
2009-06-22 16:05                                                                             ` Gregory Haskins
2009-06-22 16:05                                                                               ` Gregory Haskins
2009-06-22 17:01                                                                               ` Davide Libenzi
2009-06-22 17:43                                                                                 ` Gregory Haskins
2009-06-22 17:43                                                                                   ` Gregory Haskins
2009-06-22 18:03                                                                                   ` Davide Libenzi
2009-06-22 18:31                                                                                     ` Gregory Haskins
2009-06-22 18:31                                                                                       ` Gregory Haskins
2009-06-22 18:40                                                                                       ` Davide Libenzi
2009-06-22 18:41                                                                                     ` Michael S. Tsirkin
2009-06-22 18:51                                                                                       ` Davide Libenzi
2009-06-22 19:05                                                                                         ` Michael S. Tsirkin
2009-06-22 19:26                                                                                           ` Gregory Haskins
2009-06-22 19:29                                                                                             ` Davide Libenzi
2009-06-22 20:06                                                                                               ` Gregory Haskins
2009-06-22 22:53                                                                                                 ` Davide Libenzi
2009-06-23  1:03                                                                                                   ` Gregory Haskins
2009-06-23  1:17                                                                                                     ` Davide Libenzi
2009-06-23  1:26                                                                                                       ` Gregory Haskins
2009-06-23  1:26                                                                                                         ` Gregory Haskins
2009-06-23 14:29                                                                                                         ` Davide Libenzi
2009-06-23 14:37                                                                                                           ` Gregory Haskins
2009-06-23 14:37                                                                                                             ` Gregory Haskins
2009-06-23 14:35                                                                                                             ` Davide Libenzi
2009-06-23 14:42                                                                                                               ` Gregory Haskins
2009-06-23 14:42                                                                                                                 ` Gregory Haskins
2009-06-23 15:04                                                                                                               ` Michael S. Tsirkin
2009-06-22 20:28                                                                                             ` Michael S. Tsirkin
2009-06-22 19:16                                                                                         ` Gregory Haskins
2009-06-22 19:54                                                                                           ` Davide Libenzi
2009-06-24  3:25                                                                                     ` Rusty Russell
2009-06-24 22:45                                                                                       ` Davide Libenzi
2009-06-25 11:42                                                                                         ` Rusty Russell
2009-06-25 16:34                                                                                           ` Davide Libenzi
2009-06-25 17:32                                                                                             ` Gregory Haskins
2009-06-25 18:26                                                                                               ` Michael S. Tsirkin
2009-06-25 18:41                                                                                                 ` Gregory Haskins
2009-06-26 11:23                                                                                                   ` Michael S. Tsirkin
2009-06-23  3:25                                                                             ` Rusty Russell
2009-06-23 14:31                                                                               ` Davide Libenzi
2009-06-25  0:19                                                                                 ` Davide Libenzi
2009-06-21  1:05                                                                 ` Gregory Haskins
2009-06-16 17:54                   ` [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface Michael S. Tsirkin
2009-06-16 18:09                     ` Gregory Haskins
2009-06-17 14:45                       ` Michael S. Tsirkin
2009-06-17 15:02                         ` Gregory Haskins
2009-06-17 16:25                           ` Michael S. Tsirkin
2009-06-17 16:41                             ` Gregory Haskins
2009-06-16 14:17     ` Gregory Haskins
2009-06-16 14:22       ` Gregory Haskins
2009-06-16 14:40     ` Gregory Haskins
2009-06-16 14:46       ` Michael S. Tsirkin
2009-06-18  9:03       ` Avi Kivity
2009-06-18 11:43         ` Gregory Haskins
2009-06-16  2:30 ` [KVM-RFC PATCH 2/2] eventfd: add module reference counting support for registered notifiers Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090619185138.31118.14916.stgit@dev.haskins.net \
    --to=ghaskins@novell.com \
    --cc=avi@redhat.com \
    --cc=davidel@xmailserver.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mst@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.