qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/2] aio: Use epoll in aio_poll()
@ 2015-10-13 11:10 Fam Zheng
  2015-10-13 11:10 ` [Qemu-devel] [PATCH v2 1/2] aio: Introduce aio_context_setup Fam Zheng
  2015-10-13 11:10 ` [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c Fam Zheng
  0 siblings, 2 replies; 5+ messages in thread
From: Fam Zheng @ 2015-10-13 11:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, qemu-block, Stefan Hajnoczi

v2: Merge aio-epoll.c into aio-posix.c. [Paolo]
    Capture some benchmark data in commit log.

This series adds the ability to use epoll in aio_poll() on Linux. It is
switched on dynamically rather than statically for two reasons: 1) when the
number of fds is low, epoll offers little advantage; 2) when an
epoll-incompatible fd needs to be handled, we must fall back to ppoll.
Epoll is enabled once the number of fds reaches a threshold.


Fam Zheng (2):
  aio: Introduce aio_context_setup
  aio: Introduce aio-epoll.c

 aio-posix.c         | 177 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 aio-win32.c         |   4 ++
 async.c             |  13 +++-
 include/block/aio.h |  14 +++++
 4 files changed, 204 insertions(+), 4 deletions(-)

-- 
2.6.1


* [Qemu-devel] [PATCH v2 1/2] aio: Introduce aio_context_setup
  2015-10-13 11:10 [Qemu-devel] [PATCH v2 0/2] aio: Use epoll in aio_poll() Fam Zheng
@ 2015-10-13 11:10 ` Fam Zheng
  2015-10-13 11:10 ` [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c Fam Zheng
  1 sibling, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2015-10-13 11:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, qemu-block, Stefan Hajnoczi

This is the place to initialize platform-specific bits of AioContext.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         |  4 ++++
 aio-win32.c         |  4 ++++
 async.c             | 13 +++++++++++--
 include/block/aio.h |  8 ++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index d477033..597cb35 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -297,3 +297,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/aio-win32.c b/aio-win32.c
index 50a6867..4742af3 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -363,3 +363,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     aio_context_release(ctx);
     return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/async.c b/async.c
index efce14b..f4b1410 100644
--- a/async.c
+++ b/async.c
@@ -320,12 +320,18 @@ AioContext *aio_context_new(Error **errp)
 {
     int ret;
     AioContext *ctx;
+    Error *local_err = NULL;
+
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+    aio_context_setup(ctx, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto fail;
+    }
     ret = event_notifier_init(&ctx->notifier, false);
     if (ret < 0) {
-        g_source_destroy(&ctx->source);
         error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-        return NULL;
+        goto fail;
     }
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
@@ -339,6 +345,9 @@ AioContext *aio_context_new(Error **errp)
     ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
 
     return ctx;
+fail:
+    g_source_destroy(&ctx->source);
+    return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio.h b/include/block/aio.h
index 400b1b0..a6fd5ad 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -373,4 +373,12 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
+/**
+ * aio_context_setup:
+ * @ctx: the aio context
+ *
+ * Initialize the aio context.
+ */
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 #endif
-- 
2.6.1


* [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c
  2015-10-13 11:10 [Qemu-devel] [PATCH v2 0/2] aio: Use epoll in aio_poll() Fam Zheng
  2015-10-13 11:10 ` [Qemu-devel] [PATCH v2 1/2] aio: Introduce aio_context_setup Fam Zheng
@ 2015-10-13 11:10 ` Fam Zheng
  2015-10-16  9:32   ` Stefan Hajnoczi
  1 sibling, 1 reply; 5+ messages in thread
From: Fam Zheng @ 2015-10-13 11:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, qemu-block, Stefan Hajnoczi

To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own event loop. This approach also
provides both compile-time and run-time switchability:

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a large number of fds, or when more devices are
hot-plugged, epoll kicks in once the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty-based stdio. In that
case, we fall back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll is
enabled from 64 disks onward). Numbers are in MB/s.

===============================================
             |     master     |     epoll
             |                |
scsi disks # | read    randrw | read    randrw
-------------|----------------|----------------
1            | 74      36     | 75      37
8            | 74      36     | 73      37
32           | 65      32     | 63      32
64           | 52      25     | 66      32
128          | 42      21     | 54      27
256          | 26      15     | 38      19
===============================================

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/block/aio.h |   6 ++
 2 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 597cb35..3e88d7f 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,9 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#ifdef CONFIG_EPOLL
+#include <sys/epoll.h>
+#endif
 
 struct AioHandler
 {
@@ -28,6 +31,150 @@ struct AioHandler
     QLIST_ENTRY(AioHandler) node;
 };
 
+#ifdef CONFIG_EPOLL
+
+/* The fd count threshold for switching to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+static void aio_epoll_disable(AioContext *ctx)
+{
+    ctx->epoll_available = false;
+    if (!ctx->epoll_enabled) {
+        return;
+    }
+    ctx->epoll_enabled = false;
+    close(ctx->epollfd);
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+    return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+           (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+           (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+           (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static bool aio_epoll_try_enable(AioContext *ctx)
+{
+    AioHandler *node;
+    struct epoll_event event;
+    if (!ctx->epoll_available) {
+        return false;
+    }
+
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+        int r;
+        if (node->deleted || !node->pfd.events) {
+            continue;
+        }
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        event.data.ptr = node;
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+        if (r) {
+            return false;
+        }
+    }
+    ctx->epoll_enabled = true;
+    return true;
+}
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+    struct epoll_event event;
+    int r;
+
+    if (!ctx->epoll_enabled) {
+        return;
+    }
+    if (!node->pfd.events) {
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
+        assert(!r);
+    } else {
+        event.data.ptr = node;
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        if (is_new) {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+            if (r) {
+                aio_epoll_disable(ctx);
+            }
+        } else {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
+            assert(!r);
+        }
+    }
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+                     unsigned npfd, int64_t timeout)
+{
+    AioHandler *node;
+    int i, ret = 0;
+    struct epoll_event events[128];
+
+    assert(npfd == 1);
+    assert(pfds[0].fd == ctx->epollfd);
+    if (timeout > 0) {
+        ret = qemu_poll_ns(pfds, npfd, timeout);
+    }
+    if (timeout <= 0 || ret > 0) {
+        ret = epoll_wait(ctx->epollfd, events,
+                         sizeof(events) / sizeof(events[0]),
+                         timeout);
+        if (ret <= 0) {
+            goto out;
+        }
+        for (i = 0; i < ret; i++) {
+            int ev = events[i].events;
+            node = events[i].data.ptr;
+            node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+                (ev & EPOLLOUT ? G_IO_OUT : 0) |
+                (ev & EPOLLHUP ? G_IO_HUP : 0) |
+                (ev & EPOLLERR ? G_IO_ERR : 0);
+        }
+    }
+out:
+    return ret;
+}
+
+static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
+                                 unsigned npfd, int64_t timeout)
+{
+    if (!ctx->epoll_available) {
+        return false;
+    }
+    if (ctx->epoll_enabled) {
+        return true;
+    }
+    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+        if (aio_epoll_try_enable(ctx)) {
+            return true;
+        } else {
+            aio_epoll_disable(ctx);
+        }
+    }
+    return false;
+}
+
+#else
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+                     unsigned npfd, int64_t timeout)
+{
+    assert(false);
+}
+
+static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
+                          unsigned npfd, int64_t timeout)
+{
+    return false;
+}
+
+#endif
+
 static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
     AioHandler *node;
@@ -48,6 +195,7 @@ void aio_set_fd_handler(AioContext *ctx,
                         void *opaque)
 {
     AioHandler *node;
+    bool is_new = false;
 
     node = find_aio_handler(ctx, fd);
 
@@ -77,6 +225,7 @@ void aio_set_fd_handler(AioContext *ctx,
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
 
             g_source_add_poll(&ctx->source, &node->pfd);
+            is_new = true;
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -87,6 +236,7 @@ void aio_set_fd_handler(AioContext *ctx,
         node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
     }
 
+    aio_epoll_update(ctx, node, is_new);
     aio_notify(ctx);
 }
 
@@ -257,7 +407,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     /* fill pollfds */
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        if (!node->deleted && node->pfd.events) {
+        if (!node->deleted && node->pfd.events && !ctx->epoll_enabled) {
             add_pollfd(node);
         }
     }
@@ -268,7 +418,17 @@ bool aio_poll(AioContext *ctx, bool blocking)
     if (timeout) {
         aio_context_release(ctx);
     }
-    ret = qemu_poll_ns((GPollFD *)pollfds, npfd, timeout);
+    if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
+        AioHandler epoll_handler;
+
+        epoll_handler.pfd.fd = ctx->epollfd;
+        epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
+        npfd = 0;
+        add_pollfd(&epoll_handler);
+        ret = aio_epoll(ctx, pollfds, npfd, timeout);
+    } else  {
+        ret = qemu_poll_ns(pollfds, npfd, timeout);
+    }
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
     }
@@ -300,4 +460,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 void aio_context_setup(AioContext *ctx, Error **errp)
 {
+#ifdef CONFIG_EPOLL
+    assert(!ctx->epollfd);
+    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (ctx->epollfd == -1) {
+        ctx->epoll_available = false;
+    } else {
+        ctx->epoll_available = true;
+    }
+#endif
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index a6fd5ad..45137e7 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -122,6 +122,12 @@ struct AioContext {
 
     /* TimerLists for calling timers - one per clock type */
     QEMUTimerListGroup tlg;
+
+#ifdef CONFIG_EPOLL
+    int epollfd;
+    bool epoll_enabled;
+    bool epoll_available;
+#endif
 };
 
 /**
-- 
2.6.1


* Re: [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c
  2015-10-13 11:10 ` [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c Fam Zheng
@ 2015-10-16  9:32   ` Stefan Hajnoczi
  2015-10-16 10:10     ` Fam Zheng
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2015-10-16  9:32 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, pbonzini, qemu-devel, qemu-block

On Tue, Oct 13, 2015 at 07:10:55PM +0800, Fam Zheng wrote:
> +static bool aio_epoll_try_enable(AioContext *ctx)
> +{
> +    AioHandler *node;
> +    struct epoll_event event;
> +    if (!ctx->epoll_available) {
> +        return false;
> +    }

Why check this here since aio_epoll_check_poll() already checks it?

> +static int aio_epoll(AioContext *ctx, GPollFD *pfds,
> +                     unsigned npfd, int64_t timeout)
> +{
> +    AioHandler *node;
> +    int i, ret = 0;
> +    struct epoll_event events[128];

The strategy is to support up to 128 events per epoll_wait(2) call and
then wait for the next event loop iteration to harvest any remaining
events?

I just want to make sure I understand how this constant affects epoll
behavior.


* Re: [Qemu-devel] [PATCH v2 2/2] aio: Introduce aio-epoll.c
  2015-10-16  9:32   ` Stefan Hajnoczi
@ 2015-10-16 10:10     ` Fam Zheng
  0 siblings, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2015-10-16 10:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, qemu-block

On Fri, 10/16 11:32, Stefan Hajnoczi wrote:
> On Tue, Oct 13, 2015 at 07:10:55PM +0800, Fam Zheng wrote:
> > +static bool aio_epoll_try_enable(AioContext *ctx)
> > +{
> > +    AioHandler *node;
> > +    struct epoll_event event;
> > +    if (!ctx->epoll_available) {
> > +        return false;
> > +    }
> 
> Why check this here since aio_epoll_check_poll() already checks it?

You're right, it's redundant. I will remove it.

> 
> > +static int aio_epoll(AioContext *ctx, GPollFD *pfds,
> > +                     unsigned npfd, int64_t timeout)
> > +{
> > +    AioHandler *node;
> > +    int i, ret = 0;
> > +    struct epoll_event events[128];
> 
> The strategy is to support up to 128 events per epoll_wait(2) call and
> then wait for the next event loop iteration to harvest any remaining
> events?

Yes.

