qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/2] iohandler: Convert to GSource and use epoll on Linux
@ 2014-09-25 17:21 Fam Zheng
  2014-09-25 17:21 ` [Qemu-devel] [PATCH 1/2] iohandler: Convert I/O handler to GSource Fam Zheng
  2014-09-25 17:21 ` [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource Fam Zheng
  0 siblings, 2 replies; 6+ messages in thread
From: Fam Zheng @ 2014-09-25 17:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

ppoll is not scalable: its complexity is O(n) in the number of polled fds. When
we have many virtio queues, the main loop can be slowed down.

epoll, which is O(1), could solve this problem well.

In order to do this, we need to factor out an interface between the main loop
and the iohandler code. What we have now is not a good fit, so patch 1 converts
it to a GSource, which is attached to the main context. The POSIX
implementation, which behaves identically to the old code, attaches each
iohandler fd to the iohandler GSource. The fds are then automatically polled by
g_poll, or extracted into the ppoll call.

Patch 2 adds another GSource which attaches a single epoll fd. The epoll fd set
manages all the iohandler fds. Each time the epoll fd is poked by main loop
polling, we call epoll_wait on it to dispatch the fds that are ready to
read/write.

A pitfall is that certain types of fds can't be added to an epoll set. One
such case is a regular file or a here-document stdin fd. We have to special
case these (epoll_ctl returns -1 with errno set to EPERM), and always notify
such fds in the dispatch function.
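
A rough sketch of that fallback detection (illustrative only; see patch 2 for
the actual code):

    struct epoll_event ev = {
        .events   = EPOLLIN | EPOLLOUT,
        .data.ptr = ioh,
    };
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, ioh->fd, &ev) < 0) {
        if (errno == EPERM) {
            /* e.g. a regular file: it can't be added to the epoll set,
             * so treat it as always ready and dispatch unconditionally */
            ioh->fallback = true;
        } else {
            perror("epoll_ctl add");
            abort();
        }
    }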

To show the difference, let's compare the iodepth=1 read test on virtio-blk
by toggling the virtio-serial module (which adds or removes tens of
ioeventfds).

Before:

case            rw         bs         bw         iops       latency   
----------------------------------------------------------------------
vserial=on      read       64k        1346       21548      45        
vserial=off     read       64k        1956       31305      30        

After:
case            rw         bs         bw         iops       latency   
----------------------------------------------------------------------
vserial=on      read       64k        1727       27647      34        
vserial=off     read       64k        1741       27868      34        

The vserial=on case is better than before because we turned "ppoll with many
fds" into ("ppoll with a few fds" + "epoll with a few fds").

The vserial=off case is worse than before because we turned "ppoll with a few
fds" into ("ppoll with a few fds" + "epoll with a few fds").

To be optimal in both cases, we may dynamically enable/disable epoll depending
on the number of active ioeventfds. But that's not done yet.

Fam


Fam Zheng (2):
  iohandler: Convert I/O handler to GSource
  iohandler: Add Linux implementation of iohandler GSource

 Makefile.objs            |   2 +
 include/qemu/iohandler.h |  65 +++++++++++++++
 include/qemu/main-loop.h |   2 -
 iohandler-linux.c        | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 iohandler-posix.c        | 150 +++++++++++++++++++++++++++++++++
 iohandler.c              |  90 ++------------------
 main-loop.c              |   6 +-
 7 files changed, 442 insertions(+), 86 deletions(-)
 create mode 100644 include/qemu/iohandler.h
 create mode 100644 iohandler-linux.c
 create mode 100644 iohandler-posix.c

-- 
1.9.3


* [Qemu-devel] [PATCH 1/2] iohandler: Convert I/O handler to GSource
  2014-09-25 17:21 [Qemu-devel] [PATCH 0/2] iohandler: Convert to GSource and use epoll on Linux Fam Zheng
@ 2014-09-25 17:21 ` Fam Zheng
  2014-09-25 17:21 ` [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource Fam Zheng
  1 sibling, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2014-09-25 17:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Previously, I/O handler fds were hooked into the main loop by:

 1) qemu_iohandler_fill to add the list of io handler fds to g_poll.

 2) qemu_iohandler_poll to check the revent values and do the dispatch.

This patch moves all the fds into a GSource, which is attached to the
main event loop. This way we don't have to rebuild the whole list of fds
on every iteration, and there is a cleaner interface between us and the
main loop. Furthermore, it makes adding a Linux-specific implementation
(epoll) a lot easier.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs            |   2 +-
 include/qemu/iohandler.h |  52 ++++++++++++++++
 include/qemu/main-loop.h |   2 -
 iohandler-posix.c        | 150 +++++++++++++++++++++++++++++++++++++++++++++++
 iohandler.c              |  90 +++-------------------------
 main-loop.c              |   6 +-
 6 files changed, 215 insertions(+), 87 deletions(-)
 create mode 100644 include/qemu/iohandler.h
 create mode 100644 iohandler-posix.c

diff --git a/Makefile.objs b/Makefile.objs
index 97db978..55dbc36 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -8,7 +8,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
-block-obj-y += main-loop.o iohandler.o qemu-timer.o
+block-obj-y += main-loop.o iohandler.o qemu-timer.o iohandler-posix.o
 block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
diff --git a/include/qemu/iohandler.h b/include/qemu/iohandler.h
new file mode 100644
index 0000000..e2af47d
--- /dev/null
+++ b/include/qemu/iohandler.h
@@ -0,0 +1,52 @@
+/*
+ * QEMU I/O Handler
+ *
+ * Copyright (c) 2014 Red Hat, Inc.
+ *
+ * Author: Fam Zheng <famz@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_IOHANDLER_H
+#define QEMU_IOHANDLER_H
+
+#include "qemu/main-loop.h"
+
+typedef struct IOHandlerRecord {
+    IOCanReadHandler *fd_read_poll;
+    IOHandler *fd_read;
+    IOHandler *fd_write;
+    void *opaque;
+    QLIST_ENTRY(IOHandlerRecord) next;
+    int fd;
+    bool deleted;
+    GPollFD gpfd;
+    bool attached;
+} IOHandlerRecord;
+
+typedef struct {
+    GSource source;
+
+    QLIST_HEAD(, IOHandlerRecord) io_handlers;
+} IOHandlerSource;
+
+GSource *qemu_iohandler_get_source(void);
+
+#endif
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 62c68c0..065944c 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -302,8 +302,6 @@ void qemu_mutex_unlock_iothread(void);
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
-void qemu_iohandler_fill(GArray *pollfds);
-void qemu_iohandler_poll(GArray *pollfds, int rc);
 
 QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
 void qemu_bh_schedule_idle(QEMUBH *bh);
diff --git a/iohandler-posix.c b/iohandler-posix.c
new file mode 100644
index 0000000..0fac91f
--- /dev/null
+++ b/iohandler-posix.c
@@ -0,0 +1,150 @@
+/*
+ * I/O Handler posix implementation
+ *
+ * Copyright (c) 2014 Red Hat, Inc.
+ *
+ * Author: Fam Zheng <famz@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#include "qemu-common.h"
+#include "qemu/iohandler.h"
+
+/* Prepare for the poll: synchronize the IO handler list with the GSource. */
+static gboolean iohandler_source_prepare(GSource *source, gint *timeout)
+{
+    IOHandlerRecord *ioh;
+    IOHandlerSource *s = (IOHandlerSource *)source;
+
+    QLIST_FOREACH(ioh, &s->io_handlers, next) {
+        bool add = false, remove = false;
+
+        if (ioh->deleted) {
+            remove = ioh->attached;
+        } else {
+            int events = 0;
+            if (ioh->fd_read &&
+                (!ioh->fd_read_poll ||
+                 ioh->fd_read_poll(ioh->opaque) != 0)) {
+                events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
+            }
+            if (ioh->fd_write) {
+                events |= G_IO_OUT | G_IO_ERR;
+            }
+            ioh->gpfd.events = events;
+
+            remove = !events;
+            add = events && !ioh->attached;
+        }
+        if (remove) {
+            assert(!add);
+            g_source_remove_poll(source, &ioh->gpfd);
+            ioh->attached = false;
+        }
+        if (add) {
+            g_source_add_poll(source, &ioh->gpfd);
+            ioh->attached = true;
+        }
+    }
+    *timeout = -1;
+    return false;
+}
+
+static gboolean iohandler_source_check(GSource *source)
+{
+    IOHandlerRecord *ioh;
+    IOHandlerSource *s = (IOHandlerSource *)source;
+
+    QLIST_FOREACH(ioh, &s->io_handlers, next) {
+        int events;
+        if (!ioh->attached) {
+            continue;
+        }
+        events = ioh->gpfd.revents & ioh->gpfd.events;
+        if (ioh->fd_read &&
+                (events & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
+                (!ioh->fd_read_poll || ioh->fd_read_poll(ioh->opaque) != 0)) {
+            return true;
+        }
+        if (ioh->fd_write && (events & (G_IO_OUT | G_IO_ERR))) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static gboolean iohandler_source_dispatch(GSource *source,
+                                          GSourceFunc callback,
+                                          gpointer data)
+{
+    IOHandlerRecord *pioh, *ioh;
+    IOHandlerSource *s = (IOHandlerSource *)source;
+    int ret = false;
+
+    assert(callback == NULL);
+
+    QLIST_FOREACH_SAFE(ioh, &s->io_handlers, next, pioh) {
+        int revents = 0;
+
+        if (ioh->deleted) {
+            assert(!ioh->attached);
+        } else {
+            revents = ioh->gpfd.events & ioh->gpfd.revents;
+        }
+        if (ioh->fd_read && (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
+            ioh->fd_read(ioh->opaque);
+            ret = true;
+        }
+        if (ioh->fd_write && (revents & (G_IO_OUT | G_IO_ERR))) {
+            ioh->fd_write(ioh->opaque);
+            ret = true;
+        }
+
+        /* Do this last in case read/write handlers marked it for deletion */
+        if (ioh->deleted) {
+            if (ioh->attached) {
+                g_source_remove_poll(source, &ioh->gpfd);
+            }
+            QLIST_REMOVE(ioh, next);
+            g_free(ioh);
+        }
+    }
+    return ret;
+}
+
+static GSourceFuncs iohandler_source_funcs = {
+    iohandler_source_prepare,
+    iohandler_source_check,
+    iohandler_source_dispatch,
+    /* finalize */ NULL
+};
+
+GSource *qemu_iohandler_get_source(void)
+{
+    static IOHandlerSource *ioh_source;
+    if (!ioh_source) {
+        GSource *source = g_source_new(&iohandler_source_funcs,
+                                       sizeof(IOHandlerSource));
+        ioh_source = (IOHandlerSource *)source;
+        QLIST_INIT(&ioh_source->io_handlers);
+    }
+    return &ioh_source->source;
+}
diff --git a/iohandler.c b/iohandler.c
index cca614f..79982ee 100644
--- a/iohandler.c
+++ b/iohandler.c
@@ -27,26 +27,12 @@
 #include "qemu/queue.h"
 #include "block/aio.h"
 #include "qemu/main-loop.h"
+#include "qemu/iohandler.h"
 
 #ifndef _WIN32
 #include <sys/wait.h>
 #endif
 
-typedef struct IOHandlerRecord {
-    IOCanReadHandler *fd_read_poll;
-    IOHandler *fd_read;
-    IOHandler *fd_write;
-    void *opaque;
-    QLIST_ENTRY(IOHandlerRecord) next;
-    int fd;
-    int pollfds_idx;
-    bool deleted;
-} IOHandlerRecord;
-
-static QLIST_HEAD(, IOHandlerRecord) io_handlers =
-    QLIST_HEAD_INITIALIZER(io_handlers);
-
-
 /* XXX: fd_read_poll should be suppressed, but an API change is
    necessary in the character devices to suppress fd_can_read(). */
 int qemu_set_fd_handler2(int fd,
@@ -56,31 +42,33 @@ int qemu_set_fd_handler2(int fd,
                          void *opaque)
 {
     IOHandlerRecord *ioh;
+    IOHandlerSource *source = (IOHandlerSource *)qemu_iohandler_get_source();
 
     assert(fd >= 0);
 
     if (!fd_read && !fd_write) {
-        QLIST_FOREACH(ioh, &io_handlers, next) {
+        QLIST_FOREACH(ioh, &source->io_handlers, next) {
             if (ioh->fd == fd) {
                 ioh->deleted = 1;
                 break;
             }
         }
     } else {
-        QLIST_FOREACH(ioh, &io_handlers, next) {
+        QLIST_FOREACH(ioh, &source->io_handlers, next) {
             if (ioh->fd == fd)
                 goto found;
         }
         ioh = g_malloc0(sizeof(IOHandlerRecord));
-        QLIST_INSERT_HEAD(&io_handlers, ioh, next);
+        QLIST_INSERT_HEAD(&source->io_handlers, ioh, next);
     found:
         ioh->fd = fd;
         ioh->fd_read_poll = fd_read_poll;
         ioh->fd_read = fd_read;
         ioh->fd_write = fd_write;
         ioh->opaque = opaque;
-        ioh->pollfds_idx = -1;
-        ioh->deleted = 0;
+        ioh->deleted = false;
+        ioh->attached = false;
+        ioh->gpfd.fd = fd;
         qemu_notify_event();
     }
     return 0;
@@ -94,68 +82,6 @@ int qemu_set_fd_handler(int fd,
     return qemu_set_fd_handler2(fd, NULL, fd_read, fd_write, opaque);
 }
 
-void qemu_iohandler_fill(GArray *pollfds)
-{
-    IOHandlerRecord *ioh;
-
-    QLIST_FOREACH(ioh, &io_handlers, next) {
-        int events = 0;
-
-        if (ioh->deleted)
-            continue;
-        if (ioh->fd_read &&
-            (!ioh->fd_read_poll ||
-             ioh->fd_read_poll(ioh->opaque) != 0)) {
-            events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
-        }
-        if (ioh->fd_write) {
-            events |= G_IO_OUT | G_IO_ERR;
-        }
-        if (events) {
-            GPollFD pfd = {
-                .fd = ioh->fd,
-                .events = events,
-            };
-            ioh->pollfds_idx = pollfds->len;
-            g_array_append_val(pollfds, pfd);
-        } else {
-            ioh->pollfds_idx = -1;
-        }
-    }
-}
-
-void qemu_iohandler_poll(GArray *pollfds, int ret)
-{
-    if (ret > 0) {
-        IOHandlerRecord *pioh, *ioh;
-
-        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
-            int revents = 0;
-
-            if (!ioh->deleted && ioh->pollfds_idx != -1) {
-                GPollFD *pfd = &g_array_index(pollfds, GPollFD,
-                                              ioh->pollfds_idx);
-                revents = pfd->revents;
-            }
-
-            if (!ioh->deleted && ioh->fd_read &&
-                (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
-                ioh->fd_read(ioh->opaque);
-            }
-            if (!ioh->deleted && ioh->fd_write &&
-                (revents & (G_IO_OUT | G_IO_ERR))) {
-                ioh->fd_write(ioh->opaque);
-            }
-
-            /* Do this last in case read/write handlers marked it for deletion */
-            if (ioh->deleted) {
-                QLIST_REMOVE(ioh, next);
-                g_free(ioh);
-            }
-        }
-    }
-}
-
 /* reaping of zombies.  right now we're not passing the status to
    anyone, but it would be possible to add a callback.  */
 #ifndef _WIN32
diff --git a/main-loop.c b/main-loop.c
index 53393a4..4f81e9f 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -28,6 +28,7 @@
 #include "sysemu/qtest.h"
 #include "slirp/libslirp.h"
 #include "qemu/main-loop.h"
+#include "qemu/iohandler.h"
 #include "block/aio.h"
 
 #ifndef _WIN32
@@ -148,6 +149,9 @@ int qemu_init_main_loop(Error **errp)
     src = aio_get_g_source(qemu_aio_context);
     g_source_attach(src, NULL);
     g_source_unref(src);
+    src = qemu_iohandler_get_source();
+    g_source_attach(src, NULL);
+    g_source_unref(src);
     return 0;
 }
 
@@ -474,7 +478,6 @@ int main_loop_wait(int nonblocking)
 #ifdef CONFIG_SLIRP
     slirp_pollfds_fill(gpollfds, &timeout);
 #endif
-    qemu_iohandler_fill(gpollfds);
 
     if (timeout == UINT32_MAX) {
         timeout_ns = -1;
@@ -487,7 +490,6 @@ int main_loop_wait(int nonblocking)
                                           &main_loop_tlg));
 
     ret = os_host_main_loop_wait(timeout_ns);
-    qemu_iohandler_poll(gpollfds, ret);
 #ifdef CONFIG_SLIRP
     slirp_pollfds_poll(gpollfds, (ret < 0));
 #endif
-- 
1.9.3


* [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource
  2014-09-25 17:21 [Qemu-devel] [PATCH 0/2] iohandler: Convert to GSource and use epoll on Linux Fam Zheng
  2014-09-25 17:21 ` [Qemu-devel] [PATCH 1/2] iohandler: Convert I/O handler to GSource Fam Zheng
@ 2014-09-25 17:21 ` Fam Zheng
  2014-09-25 19:45   ` Paolo Bonzini
  1 sibling, 1 reply; 6+ messages in thread
From: Fam Zheng @ 2014-09-25 17:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

The Linux-specific epoll(7) interface has constant complexity, whereas
ppoll/g_poll is linear in the number of fds.

The event loop is more efficient with epoll, because we only need to
poll on a few fds now.

Sometimes EPOLL_CTL_ADD returns -1 with errno = EPERM, when the target
file descriptor doesn't support epoll; such fds are always ready for
read/write. We mark them and always dispatch them.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs            |   4 +-
 include/qemu/iohandler.h |  13 +++
 iohandler-linux.c        | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 iohandler-linux.c

diff --git a/Makefile.objs b/Makefile.objs
index 55dbc36..3244c65 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -8,7 +8,9 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
-block-obj-y += main-loop.o iohandler.o qemu-timer.o iohandler-posix.o
+block-obj-y += main-loop.o iohandler.o qemu-timer.o
+block-obj-$(call lnot,$(CONFIG_LINUX)) += iohandler-posix.o
+block-obj-$(CONFIG_LINUX) += iohandler-linux.o
 block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
diff --git a/include/qemu/iohandler.h b/include/qemu/iohandler.h
index e2af47d..a879796 100644
--- a/include/qemu/iohandler.h
+++ b/include/qemu/iohandler.h
@@ -27,7 +27,11 @@
 #ifndef QEMU_IOHANDLER_H
 #define QEMU_IOHANDLER_H
 
+#include "config-host.h"
 #include "qemu/main-loop.h"
+#ifdef CONFIG_LINUX
+#include <sys/epoll.h>
+#endif
 
 typedef struct IOHandlerRecord {
     IOCanReadHandler *fd_read_poll;
@@ -39,11 +43,20 @@ typedef struct IOHandlerRecord {
     bool deleted;
     GPollFD gpfd;
     bool attached;
+#ifdef CONFIG_LINUX
+    struct epoll_event epoll_event;
+    bool fallback;
+#endif
+
 } IOHandlerRecord;
 
 typedef struct {
     GSource source;
 
+#ifdef CONFIG_LINUX
+    GPollFD epollfd;
+#endif
+
     QLIST_HEAD(, IOHandlerRecord) io_handlers;
 } IOHandlerSource;
 
diff --git a/iohandler-linux.c b/iohandler-linux.c
new file mode 100644
index 0000000..61f569b
--- /dev/null
+++ b/iohandler-linux.c
@@ -0,0 +1,213 @@
+/*
+ * I/O Handler Linux implementation
+ *
+ * Copyright (c) 2014 Red Hat, Inc.
+ *
+ * Author: Fam Zheng <famz@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+#include "qemu-common.h"
+#include "qemu/iohandler.h"
+
+static int iohandler_get_events(IOHandlerSource *s, IOHandlerRecord *ioh)
+{
+    int events = 0;
+
+    if (!ioh->deleted) {
+        if (ioh->fd_read &&
+                (!ioh->fd_read_poll ||
+                 ioh->fd_read_poll(ioh->opaque) != 0)) {
+            events |= EPOLLIN | EPOLLHUP | EPOLLERR;
+        }
+        if (ioh->fd_write) {
+            events |= EPOLLOUT | EPOLLERR;
+        }
+    }
+
+    return events;
+}
+
+static gboolean iohandler_source_prepare(GSource *source, gint *timeout)
+{
+    IOHandlerRecord *ioh;
+    IOHandlerSource *s = (IOHandlerSource *)source;
+    int old_events, new_events, r;
+
+    QLIST_FOREACH(ioh, &s->io_handlers, next) {
+        old_events = ioh->epoll_event.events;
+        new_events = iohandler_get_events(s, ioh) & (EPOLLIN | EPOLLOUT);
+        ioh->epoll_event.events = new_events;
+        ioh->epoll_event.data.ptr = ioh;
+        if (old_events != new_events) {
+            if (!old_events) {
+                r = epoll_ctl(s->epollfd.fd, EPOLL_CTL_ADD, ioh->fd,
+                              &ioh->epoll_event);
+                if (r) {
+                    if (errno == EPERM) {
+                        /* Some fds don't work with epoll, let's mark it as
+                         * always ready. */
+                        ioh->fallback = true;
+                    } else {
+                        perror("epoll_ctl add");
+                        abort();
+                    }
+                }
+            } else if (new_events) {
+                /* Modify could fail when the fd is not available
+                 * any more. */
+                r = epoll_ctl(s->epollfd.fd, EPOLL_CTL_MOD, ioh->fd,
+                              &ioh->epoll_event);
+            }
+        }
+    }
+
+    *timeout = -1;
+    return false;
+}
+
+static gboolean iohandler_source_check(GSource *source)
+{
+    IOHandlerRecord *ioh;
+    IOHandlerSource *s = (IOHandlerSource *)source;
+
+    QLIST_FOREACH(ioh, &s->io_handlers, next) {
+        int events;
+        events = s->epollfd.revents;
+        if (ioh->fd_read &&
+                (events & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
+                (!ioh->fd_read_poll || ioh->fd_read_poll(ioh->opaque) != 0)) {
+            return true;
+        }
+        if (ioh->fd_write && (events & (G_IO_OUT | G_IO_ERR))) {
+            return true;
+        }
+        if (ioh->fallback) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static inline void iohandler_dispatch_event(IOHandlerSource *s,
+                                            struct epoll_event *ev)
+{
+    IOHandlerRecord *ioh = ev->data.ptr;
+    int revents;
+
+    if (!ioh->deleted) {
+        revents = iohandler_get_events(s, ioh) & ev->events;
+
+        if (ioh->fd_read && (revents & (EPOLLIN | EPOLLHUP | EPOLLERR))) {
+            ioh->fd_read(ioh->opaque);
+        }
+        if (ioh->fd_write && (revents & (EPOLLOUT | EPOLLERR))) {
+            ioh->fd_write(ioh->opaque);
+        }
+    }
+
+    /* Do this last in case read/write handlers marked it for deletion */
+    if (ioh->deleted) {
+        /* Delete could fail when the fd is not available
+         * any more. */
+        epoll_ctl(s->epollfd.fd, EPOLL_CTL_DEL, ioh->fd,
+                      &ioh->epoll_event);
+        QLIST_REMOVE(ioh, next);
+        g_free(ioh);
+    }
+}
+
+#define MAX_EVENTS 10
+
+static gboolean iohandler_source_dispatch(GSource *source,
+                                          GSourceFunc callback,
+                                          gpointer data)
+{
+    IOHandlerSource *s = (IOHandlerSource *)source;
+    struct epoll_event events[MAX_EVENTS];
+    int i, r, revents;
+    sigset_t origmask;
+    IOHandlerRecord *ioh;
+
+    assert(callback == NULL);
+
+    while (true) {
+        r = epoll_pwait(s->epollfd.fd, events, MAX_EVENTS, 0, &origmask);
+        if (r < 0) {
+            break;
+        } else if (r == 0) {
+            break;
+        } else {
+            for (i = 0; i < r; i++) {
+                iohandler_dispatch_event(s, &events[i]);
+            }
+            if (r < MAX_EVENTS) {
+                break;
+            }
+        }
+    }
+
+    QLIST_FOREACH(ioh, &s->io_handlers, next) {
+        if (!ioh->fallback) {
+            continue;
+        }
+        revents = iohandler_get_events(s, ioh);
+
+        if (ioh->fd_read && (revents & (EPOLLIN | EPOLLHUP | EPOLLERR))) {
+            ioh->fd_read(ioh->opaque);
+        }
+        if (ioh->fd_write && (revents & (EPOLLOUT | EPOLLERR))) {
+            ioh->fd_write(ioh->opaque);
+        }
+    }
+
+    return true;
+}
+
+static GSourceFuncs iohandler_source_funcs = {
+    iohandler_source_prepare,
+    iohandler_source_check,
+    iohandler_source_dispatch,
+    /* finalize */ NULL
+};
+
+GSource *qemu_iohandler_get_source(void)
+{
+    static IOHandlerSource *ioh_source;
+    if (!ioh_source) {
+        int epollfd;
+        GSource *source = g_source_new(&iohandler_source_funcs,
+                                       sizeof(IOHandlerSource));
+        ioh_source = (IOHandlerSource *)source;
+        QLIST_INIT(&ioh_source->io_handlers);
+        epollfd = epoll_create(1);
+        if (epollfd == -1) {
+            perror("epoll_create");
+            exit(1);
+        }
+        ioh_source->epollfd = (GPollFD) {
+            .fd = epollfd,
+            .events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR,
+        };
+        g_source_add_poll(source, &ioh_source->epollfd);
+    }
+    return &ioh_source->source;
+}
-- 
1.9.3


* Re: [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource
  2014-09-25 17:21 ` [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource Fam Zheng
@ 2014-09-25 19:45   ` Paolo Bonzini
  2014-09-26  1:23     ` Fam Zheng
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2014-09-25 19:45 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

On 25/09/2014 19:21, Fam Zheng wrote:
> +    while (true) {
> +        r = epoll_pwait(s->epollfd.fd, events, MAX_EVENTS, 0, &origmask);

You can save a syscall by doing this just once.  You would just get a
readable epoll file descriptor on the next main loop iteration.  Also,
origmask is an input parameter, so you need to use epoll_wait.
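
Something along these lines (untested sketch):

    r = epoll_wait(s->epollfd.fd, events, MAX_EVENTS, 0);
    for (i = 0; i < r; i++) {
        iohandler_dispatch_event(s, &events[i]);
    }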

Also, as an extra optimization perhaps you can make a second list with
iohandlers that were modified or have a read_poll handler, and only call
iohandler_get_events on that one.

Perhaps this together lowers the cost of epoll enough to more easily
reach the break-even point.

Though I wonder if it would be acceptable to make ioeventfd off by
default for virtio-serial...

Thanks,

Paolo

> +        if (r < 0) {
> +            break;
> +        } else if (r == 0) {
> +            break;
> +        } else {
> +            for (i = 0; i < r; i++) {
> +                iohandler_dispatch_event(s, &events[i]);
> +            }
> +            if (r < MAX_EVENTS) {
> +                break;
> +            }
> +        }
> +    }
> +


* Re: [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource
  2014-09-25 19:45   ` Paolo Bonzini
@ 2014-09-26  1:23     ` Fam Zheng
  2014-09-26  7:42       ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Fam Zheng @ 2014-09-26  1:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Thu, 09/25 21:45, Paolo Bonzini wrote:
> On 25/09/2014 19:21, Fam Zheng wrote:
> > +    while (true) {
> > +        r = epoll_pwait(s->epollfd.fd, events, MAX_EVENTS, 0, &origmask);
> 
> You can save a syscall by doing this just once.  You would just get a
> readable epoll file descriptor on the next main loop iteration.

There is only one call here if r < MAX_EVENTS, which is the normal case.

> Also,
> origmask is an input parameter, so you need to use epoll_wait.

Yeah, I missed the point completely. :p

> 
> Also, as an extra optimization perhaps you can make a second list with
> iohandlers that were modified or have a read_poll handler, and only call
> iohandler_get_events on that one.

Sounds good, but I need to benchmark it to tell :)

There isn't a lot of computation if there is no read_poll.

> 
> Perhaps this together lowers the cost of epoll enough to more easily
> reach the break-even point.

Dynamic switching between ppoll and epoll shouldn't be hard to implement, but
I'm not sure how to write the condition. Anyway, do you think it is a good idea?

Fam

> 
> Though I wonder if it would be acceptable to make ioeventfd off by
> default for virtio-serial...
> 
> Thanks,
> 
> Paolo


* Re: [Qemu-devel] [PATCH 2/2] iohandler: Add Linux implementation of iohandler GSource
  2014-09-26  1:23     ` Fam Zheng
@ 2014-09-26  7:42       ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2014-09-26  7:42 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 26/09/2014 03:23, Fam Zheng wrote:
> > Also, as an extra optimization perhaps you can make a second list with
> > iohandlers that were modified or have a read_poll handler, and only call
> > iohandler_get_events on that one.
> 
> Sounds good, but I need to benchmark it to tell :)
> 
> There isn't a lot of computation if there is no read_poll.
> 
> > Perhaps this together lowers the cost of epoll enough to more easily
> > reach the break-even point.
> 
> Dynamic swithing between ppoll and epoll shoudn't be hard to implement, but I'm
> not sure how to write the condition.

Perhaps just a count of the number of polled file descriptors.
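
Roughly (illustrative only; the helper and the threshold are made up):

    #define IOHANDLER_EPOLL_THRESHOLD 8   /* arbitrary */

    static bool iohandler_use_epoll(IOHandlerSource *s)
    {
        IOHandlerRecord *ioh;
        int n = 0;

        QLIST_FOREACH(ioh, &s->io_handlers, next) {
            n++;
        }
        return n > IOHANDLER_EPOLL_THRESHOLD;
    }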

> Anyway, do you think it is a good idea?

It depends.  I think first of all we need to profile it for low file
descriptor counts, and see if obvious sources of overhead stand out
either in QEMU or in the kernel...

There's time anyway, since we also need a Win32 implementation.

Paolo

