* [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
@ 2016-12-01 19:26 Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 01/13] aio: add flag to skip fds to aio_dispatch() Stefan Hajnoczi
` (16 more replies)
0 siblings, 17 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
v4:
* Added poll time self-tuning algorithm [Christian and Paolo]
* Try a single iteration of polling to avoid non-blocking ppoll(2)/epoll_wait(2) [Paolo]
* Reordered patches to make performance analysis easier - see below
v3:
* Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
* Disable guest->host virtqueue notification during polling [Christian]
* Rebased on top of my virtio-blk/scsi virtqueue notification disable patches
v2:
* Uninitialized node->deleted gone [Fam]
* Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
created a weird step pattern [Fam]
* Unified with AioHandler, dropped AioPollHandler struct [Paolo]
(actually I think Paolo had more in mind but this is the first step)
* Only poll when all event loop resources support it [Paolo]
* Added run_poll_handlers_begin/end trace events for perf analysis
* Sorry, Christian, no virtqueue kick suppression yet
Recent performance investigation work done by Karl Rister shows that the
guest->host notification takes around 20 us. This is more than the "overhead"
of QEMU itself (e.g. block layer).
One way to avoid the costly exit is to use polling instead of notification.
The main drawback of polling is that it consumes CPU resources. For polling to
improve performance, the host must have spare CPU cycles available on physical
CPUs that aren't used by the guest.
This is an experimental AioContext polling implementation. It adds a polling
callback into the event loop. Polling functions are implemented for virtio-blk
virtqueue guest->host kick and Linux AIO completion.
The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
to poll before entering the usual blocking ppoll(2)/epoll_wait(2) syscall. Try
setting this parameter to the typical time from old request completion to the
next virtqueue kick. By default no polling is done, so you must set this
parameter to enable busy polling.
Patch series overview:
Here are convenient points in the patch series at which to do performance
benchmarking to see how various pieces interact.
1. Basic -object iothread,poll-max-ns=NUM parameter without self-tuning:
aio: add flag to skip fds to aio_dispatch()
aio: add AioPollFn and io_poll() interface
aio: add polling mode to AioContext
virtio: poll virtqueues for new buffers
linux-aio: poll ring for completions
iothread: add polling parameters
2. My "[PATCH 0/3] virtio: disable notifications in blk and scsi" series is
needed as a base. This is unrelated to polling but we'll also disable
notifications during polling later:
virtio-blk: suppress virtqueue kick during processing
virtio-scsi: suppress virtqueue kick during processing
3. Disable virtqueue notifications (reduce vmexits) during polling:
virtio: turn vq->notification into a nested counter
aio: add .io_poll_begin/end() callbacks
virtio: disable virtqueue notifications during polling
4. Self-tuning, adjusts polling time depending on wait times:
aio: self-tune polling time
iothread: add poll-grow and poll-shrink parameters
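A rough sketch of what such self-tuning could look like, with hypothetical
names and constants (the actual algorithm lives in the "aio: self-tune polling
time" patch and uses the poll-grow/poll-shrink parameters; it may differ in
detail): grow the polling window when an event arrived shortly after polling
gave up, shrink it when blocking waited a long time and polling was wasted
effort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning state for illustration only. */
typedef struct {
    int64_t poll_ns;      /* current polling window */
    int64_t poll_max_ns;  /* cap set via poll-max-ns */
    int64_t poll_grow;    /* growth factor (poll-grow) */
    int64_t poll_shrink;  /* shrink divisor (poll-shrink) */
} PollTuner;

/* Adjust the polling window after an iteration that fell back to blocking.
 * If the event arrived soon after polling gave up, a longer window might
 * have caught it, so grow; if we blocked for a long time, polling was
 * wasted effort, so shrink. */
static void tune_poll_time(PollTuner *t, bool poll_succeeded,
                           int64_t block_ns)
{
    if (poll_succeeded) {
        return;                 /* the current window is working */
    }
    if (block_ns < t->poll_max_ns) {
        if (t->poll_ns == 0) {
            t->poll_ns = 1000;  /* arbitrary starting point for the sketch */
        } else {
            t->poll_ns *= t->poll_grow;
        }
        if (t->poll_ns > t->poll_max_ns) {
            t->poll_ns = t->poll_max_ns;
        }
    } else {
        t->poll_ns /= t->poll_shrink;
    }
}
```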
Stefan Hajnoczi (13):
aio: add flag to skip fds to aio_dispatch()
aio: add AioPollFn and io_poll() interface
aio: add polling mode to AioContext
virtio: poll virtqueues for new buffers
linux-aio: poll ring for completions
iothread: add polling parameters
virtio-blk: suppress virtqueue kick during processing
virtio-scsi: suppress virtqueue kick during processing
virtio: turn vq->notification into a nested counter
aio: add .io_poll_begin/end() callbacks
virtio: disable virtqueue notifications during polling
aio: self-tune polling time
iothread: add poll-grow and poll-shrink parameters
include/block/aio.h | 53 +++++++-
include/sysemu/iothread.h | 5 +
aio-posix.c | 308 +++++++++++++++++++++++++++++++++++++++-----
aio-win32.c | 32 ++++-
async.c | 21 ++-
block/curl.c | 8 +-
block/iscsi.c | 3 +-
block/linux-aio.c | 19 ++-
block/nbd-client.c | 8 +-
block/nfs.c | 7 +-
block/sheepdog.c | 26 ++--
block/ssh.c | 4 +-
block/win32-aio.c | 4 +-
hw/block/virtio-blk.c | 18 ++-
hw/scsi/virtio-scsi.c | 36 +++---
hw/virtio/virtio.c | 54 ++++++--
iohandler.c | 2 +-
iothread.c | 84 ++++++++++++
nbd/server.c | 9 +-
stubs/set-fd-handler.c | 1 +
tests/test-aio.c | 4 +-
util/event_notifier-posix.c | 2 +-
trace-events | 6 +
23 files changed, 604 insertions(+), 110 deletions(-)
--
2.9.3
* [Qemu-devel] [PATCH v4 01/13] aio: add flag to skip fds to aio_dispatch()
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 02/13] aio: add AioPollFn and io_poll() interface Stefan Hajnoczi
` (15 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
Polling mode will not call ppoll(2)/epoll_wait(2). Therefore we know
there are no fds ready and should avoid looping over fd handlers in
aio_dispatch().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 6 +++++-
aio-posix.c | 14 ++++++++++----
aio-win32.c | 6 ++++--
async.c | 2 +-
4 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index c7ae27c..200f607 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -295,8 +295,12 @@ bool aio_pending(AioContext *ctx);
/* Dispatch any pending callbacks from the GSource attached to the AioContext.
*
* This is used internally in the implementation of the GSource.
+ *
+ * @dispatch_fds: true to process fds, false to skip them
+ * (can be used as an optimization by callers that know there
+ * are no fds ready)
*/
-bool aio_dispatch(AioContext *ctx);
+bool aio_dispatch(AioContext *ctx, bool dispatch_fds);
/* Progress in completing AIO work to occur. This can issue new pending
* aio as a result of executing I/O completion or bh callbacks.
diff --git a/aio-posix.c b/aio-posix.c
index e13b9ab..4ac2346 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -290,9 +290,13 @@ bool aio_pending(AioContext *ctx)
return false;
}
-bool aio_dispatch(AioContext *ctx)
+/*
+ * Note that dispatch_fds == false has the side-effect of postponing the
+ * freeing of deleted handlers.
+ */
+bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
{
- AioHandler *node;
+ AioHandler *node = NULL;
bool progress = false;
/*
@@ -308,7 +312,9 @@ bool aio_dispatch(AioContext *ctx)
* We have to walk very carefully in case aio_set_fd_handler is
* called while we're walking.
*/
- node = QLIST_FIRST(&ctx->aio_handlers);
+ if (dispatch_fds) {
+ node = QLIST_FIRST(&ctx->aio_handlers);
+ }
while (node) {
AioHandler *tmp;
int revents;
@@ -473,7 +479,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers--;
/* Run dispatch even if there were no readable fds to run timers */
- if (aio_dispatch(ctx)) {
+ if (aio_dispatch(ctx, ret > 0)) {
progress = true;
}
diff --git a/aio-win32.c b/aio-win32.c
index c8c249e..3ef8ea4 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -271,12 +271,14 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
return progress;
}
-bool aio_dispatch(AioContext *ctx)
+bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
{
bool progress;
progress = aio_bh_poll(ctx);
- progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
+ if (dispatch_fds) {
+ progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
+ }
progress |= timerlistgroup_run_timers(&ctx->tlg);
return progress;
}
diff --git a/async.c b/async.c
index b2de360..52bdb8f 100644
--- a/async.c
+++ b/async.c
@@ -251,7 +251,7 @@ aio_ctx_dispatch(GSource *source,
AioContext *ctx = (AioContext *) source;
assert(callback == NULL);
- aio_dispatch(ctx);
+ aio_dispatch(ctx, true);
return true;
}
--
2.9.3
* [Qemu-devel] [PATCH v4 02/13] aio: add AioPollFn and io_poll() interface
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 01/13] aio: add flag to skip fds to aio_dispatch() Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 03/13] aio: add polling mode to AioContext Stefan Hajnoczi
` (14 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_notifier() is used in the next patch.
Keep this code change separate due to the number of files it touches.
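For illustration, here is what a poll function matching the new AioPollFn
signature might look like for a device that exposes producer/consumer indices
in shared memory. FakeVirtQueue is a made-up stand-in, not a QEMU type; the
real virtio poll function is added in a later patch in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the typedef added in this patch: a poll callback returns true
 * when it found and handled an event without a syscall. */
typedef bool AioPollFn(void *opaque);

/* Hypothetical device state for illustration only. */
typedef struct {
    unsigned produced;  /* index advanced by the "guest" */
    unsigned consumed;  /* index advanced by the handler */
} FakeVirtQueue;

/* Check shared state directly instead of waiting for the eventfd to
 * become readable. */
static bool fake_vq_poll(void *opaque)
{
    FakeVirtQueue *vq = opaque;
    if (vq->produced != vq->consumed) {
        vq->consumed = vq->produced;  /* "process" the new buffers */
        return true;
    }
    return false;
}
```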
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 5 ++++-
aio-posix.c | 8 +++++---
async.c | 5 +++--
block/curl.c | 8 ++++----
block/iscsi.c | 3 ++-
block/linux-aio.c | 4 ++--
block/nbd-client.c | 8 ++++----
block/nfs.c | 7 ++++---
block/sheepdog.c | 26 +++++++++++++-------------
block/ssh.c | 4 ++--
block/win32-aio.c | 4 ++--
hw/virtio/virtio.c | 4 ++--
iohandler.c | 2 +-
nbd/server.c | 9 ++++-----
stubs/set-fd-handler.c | 1 +
tests/test-aio.c | 4 ++--
util/event_notifier-posix.c | 2 +-
17 files changed, 56 insertions(+), 48 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index 200f607..9690b71 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -44,6 +44,7 @@ void qemu_aio_ref(void *p);
typedef struct AioHandler AioHandler;
typedef void QEMUBHFunc(void *opaque);
+typedef bool AioPollFn(void *opaque);
typedef void IOHandler(void *opaque);
struct ThreadPool;
@@ -329,6 +330,7 @@ void aio_set_fd_handler(AioContext *ctx,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
+ AioPollFn *io_poll,
void *opaque);
/* Register an event notifier and associated callbacks. Behaves very similarly
@@ -341,7 +343,8 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
bool is_external,
- EventNotifierHandler *io_read);
+ EventNotifierHandler *io_read,
+ AioPollFn *io_poll);
/* Return a GSource that lets the main loop poll the file descriptors attached
* to this AioContext.
diff --git a/aio-posix.c b/aio-posix.c
index 4ac2346..feab8e8 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -200,6 +200,7 @@ void aio_set_fd_handler(AioContext *ctx,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
+ AioPollFn *io_poll,
void *opaque)
{
AioHandler *node;
@@ -258,10 +259,11 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
bool is_external,
- EventNotifierHandler *io_read)
+ EventNotifierHandler *io_read,
+ AioPollFn *io_poll)
{
- aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
- is_external, (IOHandler *)io_read, NULL, notifier);
+ aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
+ (IOHandler *)io_read, NULL, io_poll, notifier);
}
bool aio_prepare(AioContext *ctx)
diff --git a/async.c b/async.c
index 52bdb8f..678845f 100644
--- a/async.c
+++ b/async.c
@@ -282,7 +282,7 @@ aio_ctx_finalize(GSource *source)
}
qemu_mutex_unlock(&ctx->bh_lock);
- aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
+ aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
event_notifier_cleanup(&ctx->notifier);
qemu_rec_mutex_destroy(&ctx->lock);
qemu_mutex_destroy(&ctx->bh_lock);
@@ -366,7 +366,8 @@ AioContext *aio_context_new(Error **errp)
aio_set_event_notifier(ctx, &ctx->notifier,
false,
(EventNotifierHandler *)
- event_notifier_dummy_cb);
+ event_notifier_dummy_cb,
+ NULL);
#ifdef CONFIG_LINUX_AIO
ctx->linux_aio = NULL;
#endif
diff --git a/block/curl.c b/block/curl.c
index 0404c1b..792fef8 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -192,19 +192,19 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
switch (action) {
case CURL_POLL_IN:
aio_set_fd_handler(s->aio_context, fd, false,
- curl_multi_read, NULL, state);
+ curl_multi_read, NULL, NULL, state);
break;
case CURL_POLL_OUT:
aio_set_fd_handler(s->aio_context, fd, false,
- NULL, curl_multi_do, state);
+ NULL, curl_multi_do, NULL, state);
break;
case CURL_POLL_INOUT:
aio_set_fd_handler(s->aio_context, fd, false,
- curl_multi_read, curl_multi_do, state);
+ curl_multi_read, curl_multi_do, NULL, state);
break;
case CURL_POLL_REMOVE:
aio_set_fd_handler(s->aio_context, fd, false,
- NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL);
break;
}
diff --git a/block/iscsi.c b/block/iscsi.c
index 0960929..6aeeb9e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -362,6 +362,7 @@ iscsi_set_events(IscsiLun *iscsilun)
false,
(ev & POLLIN) ? iscsi_process_read : NULL,
(ev & POLLOUT) ? iscsi_process_write : NULL,
+ NULL,
iscsilun);
iscsilun->events = ev;
}
@@ -1526,7 +1527,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
IscsiLun *iscsilun = bs->opaque;
aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
iscsilun->events = 0;
if (iscsilun->nop_timer) {
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 1685ec2..69c4ed5 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -439,7 +439,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
{
- aio_set_event_notifier(old_context, &s->e, false, NULL);
+ aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
qemu_bh_delete(s->completion_bh);
}
@@ -448,7 +448,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
s->aio_context = new_context;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
aio_set_event_notifier(new_context, &s->e, false,
- qemu_laio_completion_cb);
+ qemu_laio_completion_cb, NULL);
}
LinuxAioState *laio_init(void)
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 3779c6c..06f1532 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -145,7 +145,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
aio_context = bdrv_get_aio_context(bs);
aio_set_fd_handler(aio_context, s->sioc->fd, false,
- nbd_reply_ready, nbd_restart_write, bs);
+ nbd_reply_ready, nbd_restart_write, NULL, bs);
if (qiov) {
qio_channel_set_cork(s->ioc, true);
rc = nbd_send_request(s->ioc, request);
@@ -161,7 +161,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
rc = nbd_send_request(s->ioc, request);
}
aio_set_fd_handler(aio_context, s->sioc->fd, false,
- nbd_reply_ready, NULL, bs);
+ nbd_reply_ready, NULL, NULL, bs);
s->send_coroutine = NULL;
qemu_co_mutex_unlock(&s->send_mutex);
return rc;
@@ -366,14 +366,14 @@ void nbd_client_detach_aio_context(BlockDriverState *bs)
{
aio_set_fd_handler(bdrv_get_aio_context(bs),
nbd_get_client_session(bs)->sioc->fd,
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
}
void nbd_client_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
{
aio_set_fd_handler(new_context, nbd_get_client_session(bs)->sioc->fd,
- false, nbd_reply_ready, NULL, bs);
+ false, nbd_reply_ready, NULL, NULL, bs);
}
void nbd_client_close(BlockDriverState *bs)
diff --git a/block/nfs.c b/block/nfs.c
index d082783..fd30eb3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -197,7 +197,8 @@ static void nfs_set_events(NFSClient *client)
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false,
(ev & POLLIN) ? nfs_process_read : NULL,
- (ev & POLLOUT) ? nfs_process_write : NULL, client);
+ (ev & POLLOUT) ? nfs_process_write : NULL,
+ NULL, client);
}
client->events = ev;
@@ -395,7 +396,7 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
NFSClient *client = bs->opaque;
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
client->events = 0;
}
@@ -415,7 +416,7 @@ static void nfs_client_close(NFSClient *client)
nfs_close(client->context, client->fh);
}
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
nfs_destroy_context(client->context);
}
memset(client, 0, sizeof(NFSClient));
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 4c9af89..5637e0c 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -664,7 +664,7 @@ static coroutine_fn void do_co_req(void *opaque)
co = qemu_coroutine_self();
aio_set_fd_handler(srco->aio_context, sockfd, false,
- NULL, restart_co_req, co);
+ NULL, restart_co_req, NULL, co);
ret = send_co_req(sockfd, hdr, data, wlen);
if (ret < 0) {
@@ -672,7 +672,7 @@ static coroutine_fn void do_co_req(void *opaque)
}
aio_set_fd_handler(srco->aio_context, sockfd, false,
- restart_co_req, NULL, co);
+ restart_co_req, NULL, NULL, co);
ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
if (ret != sizeof(*hdr)) {
@@ -698,7 +698,7 @@ out:
/* there is at most one request for this sockfd, so it is safe to
* set each handler to NULL. */
aio_set_fd_handler(srco->aio_context, sockfd, false,
- NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL);
srco->ret = ret;
srco->finished = true;
@@ -760,7 +760,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
AIOReq *aio_req, *next;
aio_set_fd_handler(s->aio_context, s->fd, false, NULL,
- NULL, NULL);
+ NULL, NULL, NULL);
close(s->fd);
s->fd = -1;
@@ -964,7 +964,7 @@ static int get_sheep_fd(BDRVSheepdogState *s, Error **errp)
}
aio_set_fd_handler(s->aio_context, fd, false,
- co_read_response, NULL, s);
+ co_read_response, NULL, NULL, s);
return fd;
}
@@ -1226,7 +1226,7 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
qemu_co_mutex_lock(&s->lock);
s->co_send = qemu_coroutine_self();
aio_set_fd_handler(s->aio_context, s->fd, false,
- co_read_response, co_write_request, s);
+ co_read_response, co_write_request, NULL, s);
socket_set_cork(s->fd, 1);
/* send a header */
@@ -1245,7 +1245,7 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
out:
socket_set_cork(s->fd, 0);
aio_set_fd_handler(s->aio_context, s->fd, false,
- co_read_response, NULL, s);
+ co_read_response, NULL, NULL, s);
s->co_send = NULL;
qemu_co_mutex_unlock(&s->lock);
}
@@ -1396,7 +1396,7 @@ static void sd_detach_aio_context(BlockDriverState *bs)
BDRVSheepdogState *s = bs->opaque;
aio_set_fd_handler(s->aio_context, s->fd, false, NULL,
- NULL, NULL);
+ NULL, NULL, NULL);
}
static void sd_attach_aio_context(BlockDriverState *bs,
@@ -1406,7 +1406,7 @@ static void sd_attach_aio_context(BlockDriverState *bs,
s->aio_context = new_context;
aio_set_fd_handler(new_context, s->fd, false,
- co_read_response, NULL, s);
+ co_read_response, NULL, NULL, s);
}
/* TODO Convert to fine grained options */
@@ -1520,7 +1520,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
return 0;
out:
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
if (s->fd >= 0) {
closesocket(s->fd);
}
@@ -1559,7 +1559,7 @@ static void sd_reopen_commit(BDRVReopenState *state)
if (s->fd) {
aio_set_fd_handler(s->aio_context, s->fd, false,
- NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL);
closesocket(s->fd);
}
@@ -1583,7 +1583,7 @@ static void sd_reopen_abort(BDRVReopenState *state)
if (re_s->fd) {
aio_set_fd_handler(s->aio_context, re_s->fd, false,
- NULL, NULL, NULL);
+ NULL, NULL, NULL, NULL);
closesocket(re_s->fd);
}
@@ -1972,7 +1972,7 @@ static void sd_close(BlockDriverState *bs)
}
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
closesocket(s->fd);
g_free(s->host_spec);
}
diff --git a/block/ssh.c b/block/ssh.c
index 15ed281..e0edf20 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -911,7 +911,7 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
rd_handler, wr_handler);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
- false, rd_handler, wr_handler, co);
+ false, rd_handler, wr_handler, NULL, co);
}
static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
@@ -919,7 +919,7 @@ static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
{
DPRINTF("s->sock=%d", s->sock);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
- false, NULL, NULL, NULL);
+ false, NULL, NULL, NULL, NULL);
}
/* A non-blocking call returned EAGAIN, so yield, ensuring the
diff --git a/block/win32-aio.c b/block/win32-aio.c
index 95e3ab1..8cdf73b 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -175,7 +175,7 @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile)
void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
AioContext *old_context)
{
- aio_set_event_notifier(old_context, &aio->e, false, NULL);
+ aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL);
aio->is_aio_context_attached = false;
}
@@ -184,7 +184,7 @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
{
aio->is_aio_context_attached = true;
aio_set_event_notifier(new_context, &aio->e, false,
- win32_aio_completion_cb);
+ win32_aio_completion_cb, NULL);
}
QEMUWin32AIOState *win32_aio_init(void)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 1af2de2..6f8ca25 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2053,9 +2053,9 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
if (handle_output) {
vq->handle_aio_output = handle_output;
aio_set_event_notifier(ctx, &vq->host_notifier, true,
- virtio_queue_host_notifier_aio_read);
+ virtio_queue_host_notifier_aio_read, NULL);
} else {
- aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL);
+ aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
/* Test and clear notifier before after disabling event,
* in case poll callback didn't have time to run. */
virtio_queue_host_notifier_aio_read(&vq->host_notifier);
diff --git a/iohandler.c b/iohandler.c
index f2fc8a9..eb625d9 100644
--- a/iohandler.c
+++ b/iohandler.c
@@ -63,7 +63,7 @@ void qemu_set_fd_handler(int fd,
{
iohandler_init();
aio_set_fd_handler(iohandler_ctx, fd, false,
- fd_read, fd_write, opaque);
+ fd_read, fd_write, NULL, opaque);
}
/* reaping of zombies. right now we're not passing the status to
diff --git a/nbd/server.c b/nbd/server.c
index 5b76261..efe5cb8 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1366,19 +1366,18 @@ static void nbd_restart_write(void *opaque)
static void nbd_set_handlers(NBDClient *client)
{
if (client->exp && client->exp->ctx) {
- aio_set_fd_handler(client->exp->ctx, client->sioc->fd,
- true,
+ aio_set_fd_handler(client->exp->ctx, client->sioc->fd, true,
client->can_read ? nbd_read : NULL,
client->send_coroutine ? nbd_restart_write : NULL,
- client);
+ NULL, client);
}
}
static void nbd_unset_handlers(NBDClient *client)
{
if (client->exp && client->exp->ctx) {
- aio_set_fd_handler(client->exp->ctx, client->sioc->fd,
- true, NULL, NULL, NULL);
+ aio_set_fd_handler(client->exp->ctx, client->sioc->fd, true, NULL,
+ NULL, NULL, NULL);
}
}
diff --git a/stubs/set-fd-handler.c b/stubs/set-fd-handler.c
index 06a5da4..acbe65c 100644
--- a/stubs/set-fd-handler.c
+++ b/stubs/set-fd-handler.c
@@ -15,6 +15,7 @@ void aio_set_fd_handler(AioContext *ctx,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
+ AioPollFn *io_poll,
void *opaque)
{
abort();
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 5be99f8..2754f15 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -128,7 +128,7 @@ static void *test_acquire_thread(void *opaque)
static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
EventNotifierHandler *handler)
{
- aio_set_event_notifier(ctx, notifier, false, handler);
+ aio_set_event_notifier(ctx, notifier, false, handler, NULL);
}
static void dummy_notifier_read(EventNotifier *n)
@@ -388,7 +388,7 @@ static void test_aio_external_client(void)
for (i = 1; i < 3; i++) {
EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
event_notifier_init(&data.e, false);
- aio_set_event_notifier(ctx, &data.e, true, event_ready_cb);
+ aio_set_event_notifier(ctx, &data.e, true, event_ready_cb, NULL);
event_notifier_set(&data.e);
for (j = 0; j < i; j++) {
aio_disable_external(ctx);
diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
index c1f0d79..f2aacfc 100644
--- a/util/event_notifier-posix.c
+++ b/util/event_notifier-posix.c
@@ -95,7 +95,7 @@ int event_notifier_set_handler(EventNotifier *e,
EventNotifierHandler *handler)
{
aio_set_fd_handler(iohandler_get_aio_context(), e->rfd, is_external,
- (IOHandler *)handler, NULL, e);
+ (IOHandler *)handler, NULL, NULL, e);
return 0;
}
--
2.9.3
* [Qemu-devel] [PATCH v4 03/13] aio: add polling mode to AioContext
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 01/13] aio: add flag to skip fds to aio_dispatch() Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 02/13] aio: add AioPollFn and io_poll() interface Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers Stefan Hajnoczi
` (13 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The AioContext event loop uses ppoll(2) or epoll_wait(2) to wait until file
descriptors become ready or a timer expires. In cases like virtqueues, Linux
AIO, and ThreadPool it is technically possible to wait for events via
polling (i.e. continuously checking for events without blocking).
Polling can be faster than blocking syscalls because file descriptors,
the process scheduler, and system calls are bypassed.
The main disadvantage of polling is that it increases CPU utilization. In a
classic polling configuration a full host CPU thread might run at 100% to
respond to events as quickly as possible. This patch implements
a timeout so we fall back to blocking syscalls if polling detects no
activity. After the timeout no CPU cycles are wasted on polling until
the next event loop iteration.
The run_poll_handlers_begin() and run_poll_handlers_end() trace events
are added to aid performance analysis and troubleshooting. If you need
to know whether polling mode is being used, trace these events to find
out.
Note that the AioContext is now re-acquired before disabling notify_me
in the non-polling case. This makes the code cleaner since notify_me
was enabled outside the non-polling AioContext release region. This
change is correct since it's safe to keep notify_me enabled longer
(disabling is an optimization) but potentially causes unnecessary
event_notifier_set() calls. I think the chance of performance regression
is small here.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 16 ++++++
aio-posix.c | 163 +++++++++++++++++++++++++++++++++++++++++++---------
aio-win32.c | 10 +++-
async.c | 13 ++++-
trace-events | 4 ++
5 files changed, 177 insertions(+), 29 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index 9690b71..bd5b619 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -131,6 +131,12 @@ struct AioContext {
int external_disable_cnt;
+ /* Number of AioHandlers without .io_poll() */
+ int poll_disable_cnt;
+
+ /* Maximum polling time in nanoseconds */
+ int64_t poll_max_ns;
+
/* epoll(7) state used when built with CONFIG_EPOLL */
int epollfd;
bool epoll_enabled;
@@ -481,4 +487,14 @@ static inline bool aio_context_in_iothread(AioContext *ctx)
*/
void aio_context_setup(AioContext *ctx);
+/**
+ * aio_context_set_poll_params:
+ * @ctx: the aio context
+ * @max_ns: how long to busy poll for, in nanoseconds
+ *
+ * Poll mode can be disabled by setting poll_max_ns to 0.
+ */
+void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
+ Error **errp);
+
#endif
diff --git a/aio-posix.c b/aio-posix.c
index feab8e8..c6adddb 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -18,6 +18,8 @@
#include "block/block.h"
#include "qemu/queue.h"
#include "qemu/sockets.h"
+#include "qemu/cutils.h"
+#include "trace.h"
#ifdef CONFIG_EPOLL_CREATE1
#include <sys/epoll.h>
#endif
@@ -27,6 +29,7 @@ struct AioHandler
GPollFD pfd;
IOHandler *io_read;
IOHandler *io_write;
+ AioPollFn *io_poll;
int deleted;
void *opaque;
bool is_external;
@@ -210,7 +213,7 @@ void aio_set_fd_handler(AioContext *ctx,
node = find_aio_handler(ctx, fd);
/* Are we deleting the fd handler? */
- if (!io_read && !io_write) {
+ if (!io_read && !io_write && !io_poll) {
if (node == NULL) {
return;
}
@@ -229,6 +232,10 @@ void aio_set_fd_handler(AioContext *ctx,
QLIST_REMOVE(node, node);
deleted = true;
}
+
+ if (!node->io_poll) {
+ ctx->poll_disable_cnt--;
+ }
} else {
if (node == NULL) {
/* Alloc and insert if it's not already there */
@@ -238,10 +245,16 @@ void aio_set_fd_handler(AioContext *ctx,
g_source_add_poll(&ctx->source, &node->pfd);
is_new = true;
+
+ ctx->poll_disable_cnt += !io_poll;
+ } else {
+ ctx->poll_disable_cnt += !io_poll - !node->io_poll;
}
+
/* Update handler with latest information */
node->io_read = io_read;
node->io_write = io_write;
+ node->io_poll = io_poll;
node->opaque = opaque;
node->is_external = is_external;
@@ -251,6 +264,7 @@ void aio_set_fd_handler(AioContext *ctx,
aio_epoll_update(ctx, node, is_new);
aio_notify(ctx);
+
if (deleted) {
g_free(node);
}
@@ -408,10 +422,83 @@ static void add_pollfd(AioHandler *node)
npfd++;
}
+/* run_poll_handlers:
+ * @ctx: the AioContext
+ * @max_ns: maximum time to poll for, in nanoseconds
+ *
+ * Polls for a given time.
+ *
+ * Note that ctx->notify_me must be non-zero so this function can detect
+ * aio_notify().
+ *
+ * Note that the caller must have incremented ctx->walking_handlers.
+ *
+ * Returns: true if progress was made, false otherwise
+ */
+static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
+{
+ bool progress = false;
+ int64_t end_time;
+
+ assert(ctx->notify_me);
+ assert(ctx->walking_handlers > 0);
+ assert(ctx->poll_disable_cnt == 0);
+
+ trace_run_poll_handlers_begin(ctx, max_ns);
+
+ end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns;
+
+ do {
+ AioHandler *node;
+
+ QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+ if (!node->deleted && node->io_poll &&
+ node->io_poll(node->opaque)) {
+ progress = true;
+ }
+
+ /* Caller handles freeing deleted nodes. Don't do it here. */
+ }
+ } while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time);
+
+ trace_run_poll_handlers_end(ctx, progress);
+
+ return progress;
+}
+
+/* try_poll_mode:
+ * @ctx: the AioContext
+ * @blocking: polling is only attempted when blocking is true
+ *
+ * If blocking is true then ctx->notify_me must be non-zero so this function
+ * can detect aio_notify().
+ *
+ * Note that the caller must have incremented ctx->walking_handlers.
+ *
+ * Returns: true if progress was made, false otherwise
+ */
+static bool try_poll_mode(AioContext *ctx, bool blocking)
+{
+ if (blocking && ctx->poll_max_ns && ctx->poll_disable_cnt == 0) {
+ /* See qemu_soonest_timeout() uint64_t hack */
+ int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
+ (uint64_t)ctx->poll_max_ns);
+
+ if (max_ns) {
+ if (run_poll_handlers(ctx, max_ns)) {
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
bool aio_poll(AioContext *ctx, bool blocking)
{
AioHandler *node;
- int i, ret;
+ int i;
+ int ret = 0;
bool progress;
int64_t timeout;
@@ -431,42 +518,47 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers++;
- assert(npfd == 0);
+ if (try_poll_mode(ctx, blocking)) {
+ progress = true;
+ } else {
+ assert(npfd == 0);
- /* fill pollfds */
+ /* fill pollfds */
- if (!aio_epoll_enabled(ctx)) {
- QLIST_FOREACH(node, &ctx->aio_handlers, node) {
- if (!node->deleted && node->pfd.events
- && aio_node_check(ctx, node->is_external)) {
- add_pollfd(node);
+ if (!aio_epoll_enabled(ctx)) {
+ QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+ if (!node->deleted && node->pfd.events
+ && aio_node_check(ctx, node->is_external)) {
+ add_pollfd(node);
+ }
}
}
- }
- timeout = blocking ? aio_compute_timeout(ctx) : 0;
+ timeout = blocking ? aio_compute_timeout(ctx) : 0;
- /* wait until next event */
- if (timeout) {
- aio_context_release(ctx);
- }
- if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
- AioHandler epoll_handler;
+ /* wait until next event */
+ if (timeout) {
+ aio_context_release(ctx);
+ }
+ if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
+ AioHandler epoll_handler;
- epoll_handler.pfd.fd = ctx->epollfd;
- epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
- npfd = 0;
- add_pollfd(&epoll_handler);
- ret = aio_epoll(ctx, pollfds, npfd, timeout);
- } else {
- ret = qemu_poll_ns(pollfds, npfd, timeout);
+ epoll_handler.pfd.fd = ctx->epollfd;
+ epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
+ npfd = 0;
+ add_pollfd(&epoll_handler);
+ ret = aio_epoll(ctx, pollfds, npfd, timeout);
+ } else {
+ ret = qemu_poll_ns(pollfds, npfd, timeout);
+ }
+ if (timeout) {
+ aio_context_acquire(ctx);
+ }
}
+
if (blocking) {
atomic_sub(&ctx->notify_me, 2);
}
- if (timeout) {
- aio_context_acquire(ctx);
- }
aio_notify_accept(ctx);
@@ -492,6 +584,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
void aio_context_setup(AioContext *ctx)
{
+ /* TODO remove this in final patch submission */
+ if (getenv("QEMU_AIO_POLL_MAX_NS")) {
+ fprintf(stderr, "The QEMU_AIO_POLL_MAX_NS environment variable has "
+ "been replaced with -object iothread,poll-max-ns=NUM\n");
+ exit(1);
+ }
+
#ifdef CONFIG_EPOLL_CREATE1
assert(!ctx->epollfd);
ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
@@ -503,3 +602,13 @@ void aio_context_setup(AioContext *ctx)
}
#endif
}
+
+void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
+{
+ /* No thread synchronization here, it doesn't matter if an incorrect poll
+ * timeout is used once.
+ */
+ ctx->poll_max_ns = max_ns;
+
+ aio_notify(ctx);
+}
diff --git a/aio-win32.c b/aio-win32.c
index 3ef8ea4..0a6e91b 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -20,6 +20,7 @@
#include "block/block.h"
#include "qemu/queue.h"
#include "qemu/sockets.h"
+#include "qapi/error.h"
struct AioHandler {
EventNotifier *e;
@@ -38,6 +39,7 @@ void aio_set_fd_handler(AioContext *ctx,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
+ AioPollFn *io_poll,
void *opaque)
{
/* fd is a SOCKET in our case */
@@ -103,7 +105,8 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
bool is_external,
- EventNotifierHandler *io_notify)
+ EventNotifierHandler *io_notify,
+ AioPollFn *io_poll)
{
AioHandler *node;
@@ -376,3 +379,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
void aio_context_setup(AioContext *ctx)
{
}
+
+void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
+{
+ error_setg(errp, "AioContext polling is not implemented on Windows");
+}
diff --git a/async.c b/async.c
index 678845f..29abf40 100644
--- a/async.c
+++ b/async.c
@@ -349,6 +349,15 @@ static void event_notifier_dummy_cb(EventNotifier *e)
{
}
+/* Returns true if aio_notify() was called (e.g. a BH was scheduled) */
+static bool event_notifier_poll(void *opaque)
+{
+ EventNotifier *e = opaque;
+ AioContext *ctx = container_of(e, AioContext, notifier);
+
+ return atomic_read(&ctx->notified);
+}
+
AioContext *aio_context_new(Error **errp)
{
int ret;
@@ -367,7 +376,7 @@ AioContext *aio_context_new(Error **errp)
false,
(EventNotifierHandler *)
event_notifier_dummy_cb,
- NULL);
+ event_notifier_poll);
#ifdef CONFIG_LINUX_AIO
ctx->linux_aio = NULL;
#endif
@@ -376,6 +385,8 @@ AioContext *aio_context_new(Error **errp)
qemu_rec_mutex_init(&ctx->lock);
timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
+ ctx->poll_max_ns = 0;
+
return ctx;
fail:
g_source_destroy(&ctx->source);
diff --git a/trace-events b/trace-events
index f74e1d3..7fe3a1b 100644
--- a/trace-events
+++ b/trace-events
@@ -25,6 +25,10 @@
#
# The <format-string> should be a sprintf()-compatible format string.
+# aio-posix.c
+run_poll_handlers_begin(void *ctx, int64_t max_ns) "ctx %p max_ns %"PRId64
+run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
+
# thread-pool.c
thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
thread_pool_complete(void *pool, void *req, void *opaque, int ret) "pool %p req %p opaque %p ret %d"
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread
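The `MIN((uint64_t)aio_compute_timeout(ctx), (uint64_t)ctx->poll_max_ns)` expression in try_poll_mode() above relies on the qemu_soonest_timeout() trick: a blocking timeout of -1 ("wait forever") wraps to UINT64_MAX when cast, so MIN() always selects the finite budget. A minimal illustration (MIN is spelled out here; QEMU gets it from osdep.h):

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Clamp a ppoll()-style timeout (-1 == block forever) against a poll
 * budget.  Casting -1 to uint64_t yields UINT64_MAX, so MIN() picks
 * the finite budget; any non-negative timeout compares normally. */
static int64_t clamp_poll_ns(int64_t timeout_ns, int64_t poll_max_ns)
{
    return (int64_t)MIN((uint64_t)timeout_ns, (uint64_t)poll_max_ns);
}
```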
* [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (2 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 03/13] aio: add polling mode to AioContext Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-05 0:54 ` Fam Zheng
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 05/13] linux-aio: poll ring for completions Stefan Hajnoczi
` (12 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
Add an AioContext poll handler to detect new virtqueue buffers without
waiting for a guest->host notification.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/virtio/virtio.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6f8ca25..3870411 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2047,13 +2047,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
}
}
+static bool virtio_queue_host_notifier_aio_poll(void *opaque)
+{
+ EventNotifier *n = opaque;
+ VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+
+ if (virtio_queue_empty(vq)) {
+ return false;
+ }
+
+ virtio_queue_notify_aio_vq(vq);
+ return true;
+}
+
void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
VirtIOHandleOutput handle_output)
{
if (handle_output) {
vq->handle_aio_output = handle_output;
aio_set_event_notifier(ctx, &vq->host_notifier, true,
- virtio_queue_host_notifier_aio_read, NULL);
+ virtio_queue_host_notifier_aio_read,
+ virtio_queue_host_notifier_aio_poll);
} else {
aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
/* Test and clear notifier before after disabling event,
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 05/13] linux-aio: poll ring for completions
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (3 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 06/13] iothread: add polling parameters Stefan Hajnoczi
` (11 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The Linux AIO userspace ABI includes a ring that is shared with the
kernel. This allows userspace programs to process completions without
system calls.
Add an AioContext poll handler to check for completions in the ring.
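The idea behind io_getevents_peek() can be sketched against the kernel's mmap()ed completion ring. The struct below mirrors the aio_ring ABI header; the index arithmetic is a simplified illustration under that assumption, not the exact QEMU implementation:

```c
#include <assert.h>

/* Header of the ring mmap()ed by io_setup(2); io_events follow it. */
struct aio_ring {
    unsigned id;               /* kernel internal index */
    unsigned nr;               /* number of io_event slots */
    unsigned head;             /* consumer index, advanced by userspace */
    unsigned tail;             /* producer index, advanced by the kernel */
    unsigned magic;
    unsigned compat_features;
    unsigned incompat_features;
    unsigned header_length;
};

/* Count completions available without a system call.  A poll handler
 * only needs "is there anything?"; no events are consumed here. */
static unsigned ring_events_available(const struct aio_ring *ring)
{
    unsigned head = ring->head, tail = ring->tail;

    /* Indices stay in [0, nr); tail == head means the ring is empty. */
    return tail >= head ? tail - head : ring->nr - head + tail;
}
```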
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/linux-aio.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 69c4ed5..03ab741 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -255,6 +255,20 @@ static void qemu_laio_completion_cb(EventNotifier *e)
}
}
+static bool qemu_laio_poll_cb(void *opaque)
+{
+ EventNotifier *e = opaque;
+ LinuxAioState *s = container_of(e, LinuxAioState, e);
+ struct io_event *events;
+
+ if (!io_getevents_peek(s->ctx, &events)) {
+ return false;
+ }
+
+ qemu_laio_process_completions_and_submit(s);
+ return true;
+}
+
static void laio_cancel(BlockAIOCB *blockacb)
{
struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@@ -448,7 +462,8 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
s->aio_context = new_context;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
aio_set_event_notifier(new_context, &s->e, false,
- qemu_laio_completion_cb, NULL);
+ qemu_laio_completion_cb,
+ qemu_laio_poll_cb);
}
LinuxAioState *laio_init(void)
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 06/13] iothread: add polling parameters
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (4 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 05/13] linux-aio: poll ring for completions Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 07/13] virtio-blk: suppress virtqueue kick during processing Stefan Hajnoczi
` (10 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
Poll mode can be configured with -object iothread,poll-max-ns=NUM.
Polling is disabled with a value of 0 nanoseconds.
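For reference, a polling IOThread attached to a virtio-blk device might be configured as follows (the drive path and device names are illustrative):

```shell
qemu-system-x86_64 \
    -object iothread,id=iothread0,poll-max-ns=32768 \
    -drive if=none,id=drive0,file=test.img,format=raw,aio=native,cache=none \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0
```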
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/sysemu/iothread.h | 3 +++
iothread.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+)
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index 68ac2de..314e163 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -28,6 +28,9 @@ typedef struct {
QemuCond init_done_cond; /* is thread initialization done? */
bool stopping;
int thread_id;
+
+ /* AioContext poll parameters */
+ int64_t poll_max_ns;
} IOThread;
#define IOTHREAD(obj) \
diff --git a/iothread.c b/iothread.c
index bd70344..8dfd10d 100644
--- a/iothread.c
+++ b/iothread.c
@@ -98,6 +98,15 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
return;
}
+ aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns,
+ &local_error);
+ if (local_error) {
+ error_propagate(errp, local_error);
+ aio_context_unref(iothread->ctx);
+ iothread->ctx = NULL;
+ return;
+ }
+
qemu_mutex_init(&iothread->init_done_lock);
qemu_cond_init(&iothread->init_done_cond);
@@ -120,10 +129,51 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
qemu_mutex_unlock(&iothread->init_done_lock);
}
+static void iothread_get_poll_max_ns(Object *obj, Visitor *v,
+ const char *name, void *opaque, Error **errp)
+{
+ IOThread *iothread = IOTHREAD(obj);
+
+ visit_type_int64(v, name, &iothread->poll_max_ns, errp);
+}
+
+static void iothread_set_poll_max_ns(Object *obj, Visitor *v,
+ const char *name, void *opaque, Error **errp)
+{
+ IOThread *iothread = IOTHREAD(obj);
+ Error *local_err = NULL;
+ int64_t value;
+
+ visit_type_int64(v, name, &value, &local_err);
+ if (local_err) {
+ goto out;
+ }
+
+ if (value < 0) {
+ error_setg(&local_err, "poll_max_ns value must be in range "
+ "[0, %"PRId64"]", INT64_MAX);
+ goto out;
+ }
+
+ iothread->poll_max_ns = value;
+
+ if (iothread->ctx) {
+ aio_context_set_poll_params(iothread->ctx, value, &local_err);
+ }
+
+out:
+ error_propagate(errp, local_err);
+}
+
static void iothread_class_init(ObjectClass *klass, void *class_data)
{
UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
ucc->complete = iothread_complete;
+
+ object_class_property_add(klass, "poll-max-ns", "int",
+ iothread_get_poll_max_ns,
+ iothread_set_poll_max_ns,
+ NULL, NULL, &error_abort);
}
static const TypeInfo iothread_info = {
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 07/13] virtio-blk: suppress virtqueue kick during processing
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (5 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 06/13] iothread: add polling parameters Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 08/13] virtio-scsi: " Stefan Hajnoczi
` (9 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The guest does not need to kick the virtqueue while we are processing
it. This reduces the number of vmexits during periods of heavy I/O.
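The disable/drain/re-enable pattern used here must re-check the queue after notifications are turned back on, because the guest may add a buffer in the window before the kick is re-armed. A stripped-down model of the control flow, with a trivial in-memory queue standing in for the virtqueue:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a virtqueue: a counter of pending requests plus a
 * notification-enable flag.  Only the control flow matters here. */
struct toyq {
    int pending;
    bool notify_enabled;
};

static void toyq_set_notification(struct toyq *q, bool on)
{
    q->notify_enabled = on;
}

/* Drain the queue with notifications suppressed, then re-enable and
 * re-check.  The outer loop closes the race where a request arrives
 * between the last pop and re-enabling notifications. */
static int toyq_handle(struct toyq *q)
{
    int handled = 0;

    do {
        toyq_set_notification(q, false);

        while (q->pending > 0) {
            q->pending--;
            handled++;
        }

        toyq_set_notification(q, true);
    } while (q->pending > 0);

    return handled;
}
```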
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/block/virtio-blk.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 0c5fd27..50bb0cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -588,13 +588,19 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
blk_io_plug(s->blk);
- while ((req = virtio_blk_get_request(s, vq))) {
- if (virtio_blk_handle_request(req, &mrb)) {
- virtqueue_detach_element(req->vq, &req->elem, 0);
- virtio_blk_free_request(req);
- break;
+ do {
+ virtio_queue_set_notification(vq, 0);
+
+ while ((req = virtio_blk_get_request(s, vq))) {
+ if (virtio_blk_handle_request(req, &mrb)) {
+ virtqueue_detach_element(req->vq, &req->elem, 0);
+ virtio_blk_free_request(req);
+ break;
+ }
}
- }
+
+ virtio_queue_set_notification(vq, 1);
+ } while (!virtio_queue_empty(vq));
if (mrb.num_reqs) {
virtio_blk_submit_multireq(s->blk, &mrb);
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 08/13] virtio-scsi: suppress virtqueue kick during processing
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (6 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 07/13] virtio-blk: suppress virtqueue kick during processing Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 09/13] virtio: turn vq->notification into a nested counter Stefan Hajnoczi
` (8 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The guest does not need to kick the virtqueue while we are processing
it. This reduces the number of vmexits during periods of heavy I/O.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/scsi/virtio-scsi.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 10fd687..24a6272 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -578,26 +578,32 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
{
VirtIOSCSIReq *req, *next;
- int ret;
+ int ret = 0;
QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
- while ((req = virtio_scsi_pop_req(s, vq))) {
- ret = virtio_scsi_handle_cmd_req_prepare(s, req);
- if (!ret) {
- QTAILQ_INSERT_TAIL(&reqs, req, next);
- } else if (ret == -EINVAL) {
- /* The device is broken and shouldn't process any request */
- while (!QTAILQ_EMPTY(&reqs)) {
- req = QTAILQ_FIRST(&reqs);
- QTAILQ_REMOVE(&reqs, req, next);
- blk_io_unplug(req->sreq->dev->conf.blk);
- scsi_req_unref(req->sreq);
- virtqueue_detach_element(req->vq, &req->elem, 0);
- virtio_scsi_free_req(req);
+ do {
+ virtio_queue_set_notification(vq, 0);
+
+ while ((req = virtio_scsi_pop_req(s, vq))) {
+ ret = virtio_scsi_handle_cmd_req_prepare(s, req);
+ if (!ret) {
+ QTAILQ_INSERT_TAIL(&reqs, req, next);
+ } else if (ret == -EINVAL) {
+ /* The device is broken and shouldn't process any request */
+ while (!QTAILQ_EMPTY(&reqs)) {
+ req = QTAILQ_FIRST(&reqs);
+ QTAILQ_REMOVE(&reqs, req, next);
+ blk_io_unplug(req->sreq->dev->conf.blk);
+ scsi_req_unref(req->sreq);
+ virtqueue_detach_element(req->vq, &req->elem, 0);
+ virtio_scsi_free_req(req);
+ }
}
}
- }
+
+ virtio_queue_set_notification(vq, 1);
+ } while (ret != -EINVAL && !virtio_queue_empty(vq));
QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
virtio_scsi_handle_cmd_req_submit(s, req);
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 09/13] virtio: turn vq->notification into a nested counter
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (7 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 08/13] virtio-scsi: " Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 10/13] aio: add .io_poll_begin/end() callbacks Stefan Hajnoczi
` (7 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
Polling should disable virtqueue notifications but that requires nested
virtio_queue_set_notification() calls. Turn vq->notification into a
counter so it is possible to do nesting.
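The counter semantics can be stated compactly: notifications are effectively on only when the disable count is zero, and every disable must be balanced by an enable. A sketch of the invariant (not the actual vring accessors):

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vq {
    unsigned int notification_disabled; /* nesting depth of disables */
};

/* Enable decrements the nesting depth, disable increments it; enabling
 * with no outstanding disable is a caller bug. */
static void toy_vq_set_notification(struct toy_vq *vq, bool enable)
{
    if (enable) {
        assert(vq->notification_disabled > 0);
        vq->notification_disabled--;
    } else {
        vq->notification_disabled++;
    }
}

static bool toy_vq_notification_enabled(const struct toy_vq *vq)
{
    return vq->notification_disabled == 0;
}
```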
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/virtio/virtio.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3870411..79b5c36 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -87,8 +87,8 @@ struct VirtQueue
/* Last used index value we have signalled on */
bool signalled_used_valid;
- /* Notification enabled? */
- bool notification;
+ /* Nested host->guest notification disabled counter */
+ unsigned int notification_disabled;
uint16_t queue_index;
@@ -201,7 +201,7 @@ static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
{
hwaddr pa;
- if (!vq->notification) {
+ if (vq->notification_disabled) {
return;
}
pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]);
@@ -210,7 +210,13 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
void virtio_queue_set_notification(VirtQueue *vq, int enable)
{
- vq->notification = enable;
+ if (enable) {
+ assert(vq->notification_disabled > 0);
+ vq->notification_disabled--;
+ } else {
+ vq->notification_disabled++;
+ }
+
if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
vring_set_avail_event(vq, vring_avail_idx(vq));
} else if (enable) {
@@ -959,7 +965,7 @@ void virtio_reset(void *opaque)
virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
vdev->vq[i].signalled_used = 0;
vdev->vq[i].signalled_used_valid = false;
- vdev->vq[i].notification = true;
+ vdev->vq[i].notification_disabled = 0;
vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
vdev->vq[i].inuse = 0;
}
@@ -1770,7 +1776,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
vdev->vq[i].vring.desc = qemu_get_be64(f);
qemu_get_be16s(f, &vdev->vq[i].last_avail_idx);
vdev->vq[i].signalled_used_valid = false;
- vdev->vq[i].notification = true;
+ vdev->vq[i].notification_disabled = 0;
if (vdev->vq[i].vring.desc) {
/* XXX virtio-1 devices */
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 10/13] aio: add .io_poll_begin/end() callbacks
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (8 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 09/13] virtio: turn vq->notification into a nested counter Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 11/13] virtio: disable virtqueue notifications during polling Stefan Hajnoczi
` (6 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
The begin and end callbacks can be used to prepare for the polling loop
and clean up when polling stops. Note that they may only be called once
for multiple aio_poll() calls if polling continues to succeed. Once
polling fails the end callback is invoked before aio_poll() resumes file
descriptor monitoring.
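The once-per-transition behavior described above is what poll_set_started() provides: repeated calls with the same state are no-ops, so a handler sees exactly one begin before any number of successful polls and one end when polling stops. Modelled with counters:

```c
#include <assert.h>
#include <stdbool.h>

struct poll_state {
    bool started;
    int begins;
    int ends;
};

/* Invoke begin/end callbacks only on state transitions; calling with
 * the current state again does nothing. */
static void toy_poll_set_started(struct poll_state *s, bool started)
{
    if (started == s->started) {
        return;
    }
    s->started = started;
    if (started) {
        s->begins++;
    } else {
        s->ends++;
    }
}
```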
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 20 ++++++++++
aio-posix.c | 105 ++++++++++++++++++++++++++++++++++++++++++++--------
aio-win32.c | 15 ++++++++
3 files changed, 125 insertions(+), 15 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index bd5b619..cc3272b 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -137,6 +137,9 @@ struct AioContext {
/* Maximum polling time in nanoseconds */
int64_t poll_max_ns;
+ /* Are we in polling mode or monitoring file descriptors? */
+ bool poll_started;
+
/* epoll(7) state used when built with CONFIG_EPOLL */
int epollfd;
bool epoll_enabled;
@@ -339,6 +342,14 @@ void aio_set_fd_handler(AioContext *ctx,
AioPollFn *io_poll,
void *opaque);
+/* Set polling begin/end callbacks for a file descriptor that has already been
+ * registered with aio_set_fd_handler. Do nothing if the file descriptor is
+ * not registered.
+ */
+void aio_set_fd_poll(AioContext *ctx, int fd,
+ IOHandler *io_poll_begin,
+ IOHandler *io_poll_end);
+
/* Register an event notifier and associated callbacks. Behaves very similarly
* to event_notifier_set_handler. Unlike event_notifier_set_handler, these callbacks
* will be invoked when using aio_poll().
@@ -352,6 +363,15 @@ void aio_set_event_notifier(AioContext *ctx,
EventNotifierHandler *io_read,
AioPollFn *io_poll);
+/* Set polling begin/end callbacks for an event notifier that has already been
+ * registered with aio_set_event_notifier. Do nothing if the event notifier is
+ * not registered.
+ */
+void aio_set_event_notifier_poll(AioContext *ctx,
+ EventNotifier *notifier,
+ EventNotifierHandler *io_poll_begin,
+ EventNotifierHandler *io_poll_end);
+
/* Return a GSource that lets the main loop poll the file descriptors attached
* to this AioContext.
*/
diff --git a/aio-posix.c b/aio-posix.c
index c6adddb..5216d82 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -30,6 +30,8 @@ struct AioHandler
IOHandler *io_read;
IOHandler *io_write;
AioPollFn *io_poll;
+ IOHandler *io_poll_begin;
+ IOHandler *io_poll_end;
int deleted;
void *opaque;
bool is_external;
@@ -270,6 +272,20 @@ void aio_set_fd_handler(AioContext *ctx,
}
}
+void aio_set_fd_poll(AioContext *ctx, int fd,
+ IOHandler *io_poll_begin,
+ IOHandler *io_poll_end)
+{
+ AioHandler *node = find_aio_handler(ctx, fd);
+
+ if (!node) {
+ return;
+ }
+
+ node->io_poll_begin = io_poll_begin;
+ node->io_poll_end = io_poll_end;
+}
+
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
bool is_external,
@@ -280,8 +296,53 @@ void aio_set_event_notifier(AioContext *ctx,
(IOHandler *)io_read, NULL, io_poll, notifier);
}
+void aio_set_event_notifier_poll(AioContext *ctx,
+ EventNotifier *notifier,
+ EventNotifierHandler *io_poll_begin,
+ EventNotifierHandler *io_poll_end)
+{
+ aio_set_fd_poll(ctx, event_notifier_get_fd(notifier),
+ (IOHandler *)io_poll_begin,
+ (IOHandler *)io_poll_end);
+}
+
+static void poll_set_started(AioContext *ctx, bool started)
+{
+ AioHandler *node;
+
+ if (started == ctx->poll_started) {
+ return;
+ }
+
+ ctx->poll_started = started;
+
+ ctx->walking_handlers++;
+ QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+ IOHandler *fn;
+
+ if (node->deleted) {
+ continue;
+ }
+
+ if (started) {
+ fn = node->io_poll_begin;
+ } else {
+ fn = node->io_poll_end;
+ }
+
+ if (fn) {
+ fn(node->opaque);
+ }
+ }
+ ctx->walking_handlers--;
+}
+
+
bool aio_prepare(AioContext *ctx)
{
+ /* Poll mode cannot be used with glib's event loop, disable it. */
+ poll_set_started(ctx, false);
+
return false;
}
@@ -422,6 +483,23 @@ static void add_pollfd(AioHandler *node)
npfd++;
}
+static bool run_poll_handlers_once(AioContext *ctx)
+{
+ bool progress = false;
+ AioHandler *node;
+
+ QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+ if (!node->deleted && node->io_poll &&
+ node->io_poll(node->opaque)) {
+ progress = true;
+ }
+
+ /* Caller handles freeing deleted nodes. Don't do it here. */
+ }
+
+ return progress;
+}
+
/* run_poll_handlers:
* @ctx: the AioContext
* @max_ns: maximum time to poll for, in nanoseconds
@@ -437,7 +515,7 @@ static void add_pollfd(AioHandler *node)
*/
static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
{
- bool progress = false;
+ bool progress;
int64_t end_time;
assert(ctx->notify_me);
@@ -449,16 +527,7 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns;
do {
- AioHandler *node;
-
- QLIST_FOREACH(node, &ctx->aio_handlers, node) {
- if (!node->deleted && node->io_poll &&
- node->io_poll(node->opaque)) {
- progress = true;
- }
-
- /* Caller handles freeing deleted nodes. Don't do it here. */
- }
+ progress = run_poll_handlers_once(ctx);
} while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time);
trace_run_poll_handlers_end(ctx, progress);
@@ -468,10 +537,9 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
/* try_poll_mode:
* @ctx: the AioContext
- * @blocking: polling is only attempted when blocking is true
+ * @blocking: busy polling is only attempted when blocking is true
*
- * If blocking is true then ctx->notify_me must be non-zero so this function
- * can detect aio_notify().
+ * ctx->notify_me must be non-zero so this function can detect aio_notify().
*
* Note that the caller must have incremented ctx->walking_handlers.
*
@@ -485,13 +553,20 @@ static bool try_poll_mode(AioContext *ctx, bool blocking)
(uint64_t)ctx->poll_max_ns);
if (max_ns) {
+ poll_set_started(ctx, true);
+
if (run_poll_handlers(ctx, max_ns)) {
return true;
}
}
}
- return false;
+ poll_set_started(ctx, false);
+
+ /* Even if we don't run busy polling, try polling once in case it can make
+ * progress and the caller will be able to avoid ppoll(2)/epoll_wait(2).
+ */
+ return run_poll_handlers_once(ctx);
}
bool aio_poll(AioContext *ctx, bool blocking)
diff --git a/aio-win32.c b/aio-win32.c
index 0a6e91b..d0e40a8 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -102,6 +102,13 @@ void aio_set_fd_handler(AioContext *ctx,
aio_notify(ctx);
}
+void aio_set_fd_poll(AioContext *ctx, int fd,
+ IOHandler *io_poll_begin,
+ IOHandler *io_poll_end)
+{
+ /* Not implemented */
+}
+
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
bool is_external,
@@ -153,6 +160,14 @@ void aio_set_event_notifier(AioContext *ctx,
aio_notify(ctx);
}
+void aio_set_event_notifier_poll(AioContext *ctx,
+ EventNotifier *notifier,
+ EventNotifierHandler *io_poll_begin,
+ EventNotifierHandler *io_poll_end)
+{
+ /* Not implemented */
+}
+
bool aio_prepare(AioContext *ctx)
{
static struct timeval tv0;
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 11/13] virtio: disable virtqueue notifications during polling
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (9 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 10/13] aio: add .io_poll_begin/end() callbacks Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time Stefan Hajnoczi
` (5 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
This is a performance optimization to eliminate vmexits during polling.
It also avoids spurious ioeventfd processing after polling ends.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/virtio/virtio.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 79b5c36..d40711a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2053,6 +2053,13 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
}
}
+static void virtio_queue_host_notifier_aio_poll_begin(EventNotifier *n)
+{
+ VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+
+ virtio_queue_set_notification(vq, 0);
+}
+
static bool virtio_queue_host_notifier_aio_poll(void *opaque)
{
EventNotifier *n = opaque;
@@ -2066,6 +2073,14 @@ static bool virtio_queue_host_notifier_aio_poll(void *opaque)
return true;
}
+static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
+{
+ VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+
+ /* Caller polls once more after this to catch requests that race with us */
+ virtio_queue_set_notification(vq, 1);
+}
+
void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
VirtIOHandleOutput handle_output)
{
@@ -2074,6 +2089,9 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
aio_set_event_notifier(ctx, &vq->host_notifier, true,
virtio_queue_host_notifier_aio_read,
virtio_queue_host_notifier_aio_poll);
+ aio_set_event_notifier_poll(ctx, &vq->host_notifier,
+ virtio_queue_host_notifier_aio_poll_begin,
+ virtio_queue_host_notifier_aio_poll_end);
} else {
aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
/* Test and clear notifier before after disabling event,
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (10 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 11/13] virtio: disable virtqueue notifications during polling Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-05 20:06 ` Christian Borntraeger
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 13/13] iothread: add poll-grow and poll-shrink parameters Stefan Hajnoczi
` (4 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
This patch is based on the algorithm for the kvm.ko halt_poll_ns
parameter in Linux. The initial polling time is zero.
If the event loop is woken up within the maximum polling time it means
polling could be effective, so grow polling time.
If the event loop is woken up beyond the maximum polling time it means
polling is not effective, so shrink polling time.
If the event loop makes progress within the current polling time then
the sweet spot has been reached.
This algorithm adjusts the polling time so it can adapt to variations in
workloads. The goal is to reach the sweet spot while also recognizing
when polling would hurt more than help.
Two new trace events, poll_grow and poll_shrink, are added for observing
polling time adjustment.
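Based only on the description above, one adjustment step can be modelled as follows. The grow/shrink factors and the initial non-zero value are illustrative placeholders, not QEMU's defaults (the next patch makes them configurable via poll-grow/poll-shrink):

```c
#include <assert.h>
#include <stdint.h>

/* One self-tuning step, run after an event loop iteration.  block_ns is
 * how long the loop was actually blocked before waking up.  Constants
 * are hypothetical, chosen only to demonstrate the algorithm. */
static int64_t tune_poll_ns(int64_t poll_ns, int64_t poll_max_ns,
                            int64_t block_ns)
{
    const int64_t grow = 2, shrink = 2, initial_ns = 4000;

    if (block_ns <= poll_ns) {
        /* Sweet spot: polling already covers the wakeup latency. */
        return poll_ns;
    }
    if (block_ns > poll_max_ns) {
        /* Woken far beyond the budget: polling is not helping. */
        return poll_ns / shrink;
    }
    /* Woken within the maximum: longer polling could have caught it. */
    poll_ns = poll_ns ? poll_ns * grow : initial_ns;
    return poll_ns < poll_max_ns ? poll_ns : poll_max_ns;
}
```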
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 10 +++++++--
aio-posix.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++----
aio-win32.c | 3 ++-
async.c | 3 +++
iothread.c | 4 ++--
trace-events | 2 ++
6 files changed, 71 insertions(+), 9 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index cc3272b..e4a4912 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -134,8 +134,11 @@ struct AioContext {
/* Number of AioHandlers without .io_poll() */
int poll_disable_cnt;
- /* Maximum polling time in nanoseconds */
- int64_t poll_max_ns;
+ /* Polling mode parameters */
+ int64_t poll_ns; /* current polling time in nanoseconds */
+ int64_t poll_max_ns; /* maximum polling time in nanoseconds */
+ int64_t poll_grow; /* polling time growth factor */
+ int64_t poll_shrink; /* polling time shrink factor */
/* Are we in polling mode or monitoring file descriptors? */
bool poll_started;
@@ -511,10 +514,13 @@ void aio_context_setup(AioContext *ctx);
* aio_context_set_poll_params:
* @ctx: the aio context
* @max_ns: how long to busy poll for, in nanoseconds
+ * @grow: polling time growth factor
+ * @shrink: polling time shrink factor
*
* Poll mode can be disabled by setting poll_max_ns to 0.
*/
void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
+ int64_t grow, int64_t shrink,
Error **errp);
#endif
diff --git a/aio-posix.c b/aio-posix.c
index 5216d82..1585571 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -550,7 +550,7 @@ static bool try_poll_mode(AioContext *ctx, bool blocking)
if (blocking && ctx->poll_max_ns && ctx->poll_disable_cnt == 0) {
/* See qemu_soonest_timeout() uint64_t hack */
int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
- (uint64_t)ctx->poll_max_ns);
+ (uint64_t)ctx->poll_ns);
if (max_ns) {
poll_set_started(ctx, true);
@@ -576,6 +576,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
int ret = 0;
bool progress;
int64_t timeout;
+ int64_t start = 0;
aio_context_acquire(ctx);
progress = false;
@@ -593,6 +594,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers++;
+ if (ctx->poll_max_ns) {
+ start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+ }
+
if (try_poll_mode(ctx, blocking)) {
progress = true;
} else {
@@ -635,6 +640,47 @@ bool aio_poll(AioContext *ctx, bool blocking)
atomic_sub(&ctx->notify_me, 2);
}
+ /* Adjust polling time */
+ if (ctx->poll_max_ns) {
+ int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
+
+ if (block_ns <= ctx->poll_ns) {
+ /* This is the sweet spot, no adjustment needed */
+ } else if (block_ns > ctx->poll_max_ns) {
+ /* We'd have to poll for too long, poll less */
+ int64_t old = ctx->poll_ns;
+
+ if (ctx->poll_shrink) {
+ ctx->poll_ns /= ctx->poll_shrink;
+ } else {
+ ctx->poll_ns = 0;
+ }
+
+ trace_poll_shrink(ctx, old, ctx->poll_ns);
+ } else if (ctx->poll_ns < ctx->poll_max_ns &&
+ block_ns < ctx->poll_max_ns) {
+ /* There is room to grow, poll longer */
+ int64_t old = ctx->poll_ns;
+ int64_t grow = ctx->poll_grow;
+
+ if (grow == 0) {
+ grow = 2;
+ }
+
+ if (ctx->poll_ns) {
+ ctx->poll_ns *= grow;
+ } else {
+ ctx->poll_ns = 4000; /* start polling at 4 microseconds */
+ }
+
+ if (ctx->poll_ns > ctx->poll_max_ns) {
+ ctx->poll_ns = ctx->poll_max_ns;
+ }
+
+ trace_poll_grow(ctx, old, ctx->poll_ns);
+ }
+ }
+
aio_notify_accept(ctx);
/* if we have any readable fds, dispatch event */
@@ -678,12 +724,16 @@ void aio_context_setup(AioContext *ctx)
#endif
}
-void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
+void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
+ int64_t grow, int64_t shrink, Error **errp)
{
- /* No thread synchronization here, it doesn't matter if an incorrect poll
- * timeout is used once.
+ /* No thread synchronization here, it doesn't matter if an incorrect value
+ * is used once.
*/
ctx->poll_max_ns = max_ns;
+ ctx->poll_ns = 0;
+ ctx->poll_grow = grow;
+ ctx->poll_shrink = shrink;
aio_notify(ctx);
}
diff --git a/aio-win32.c b/aio-win32.c
index d0e40a8..d19dc42 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -395,7 +395,8 @@ void aio_context_setup(AioContext *ctx)
{
}
-void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
+void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
+ int64_t grow, int64_t shrink, Error **errp)
{
error_setg(errp, "AioContext polling is not implemented on Windows");
}
diff --git a/async.c b/async.c
index 29abf40..2960171 100644
--- a/async.c
+++ b/async.c
@@ -385,7 +385,10 @@ AioContext *aio_context_new(Error **errp)
qemu_rec_mutex_init(&ctx->lock);
timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
+ ctx->poll_ns = 0;
ctx->poll_max_ns = 0;
+ ctx->poll_grow = 0;
+ ctx->poll_shrink = 0;
return ctx;
fail:
diff --git a/iothread.c b/iothread.c
index 8dfd10d..28598b5 100644
--- a/iothread.c
+++ b/iothread.c
@@ -98,7 +98,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
return;
}
- aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns,
+ aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns, 0, 0,
&local_error);
if (local_error) {
error_propagate(errp, local_error);
@@ -158,7 +158,7 @@ static void iothread_set_poll_max_ns(Object *obj, Visitor *v,
iothread->poll_max_ns = value;
if (iothread->ctx) {
- aio_context_set_poll_params(iothread->ctx, value, &local_err);
+ aio_context_set_poll_params(iothread->ctx, value, 0, 0, &local_err);
}
out:
diff --git a/trace-events b/trace-events
index 7fe3a1b..1181486 100644
--- a/trace-events
+++ b/trace-events
@@ -28,6 +28,8 @@
# aio-posix.c
run_poll_handlers_begin(void *ctx, int64_t max_ns) "ctx %p max_ns %"PRId64
run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
+poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
+poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
# thread-pool.c
thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
--
2.9.3
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [Qemu-devel] [PATCH v4 13/13] iothread: add poll-grow and poll-shrink parameters
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (11 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time Stefan Hajnoczi
@ 2016-12-01 19:26 ` Stefan Hajnoczi
2016-12-02 8:27 ` [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Paolo Bonzini
` (3 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-01 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: borntraeger, Paolo Bonzini, Karl Rister, Fam Zheng,
Stefan Hajnoczi
These parameters control the poll time self-tuning algorithm. They are
optional and will default to sane values if omitted.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/sysemu/iothread.h | 2 ++
iothread.c | 56 +++++++++++++++++++++++++++++++++++++----------
2 files changed, 47 insertions(+), 11 deletions(-)
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index 314e163..e6da1a4 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -31,6 +31,8 @@ typedef struct {
/* AioContext poll parameters */
int64_t poll_max_ns;
+ int64_t poll_grow;
+ int64_t poll_shrink;
} IOThread;
#define IOTHREAD(obj) \
diff --git a/iothread.c b/iothread.c
index 28598b5..7bedde8 100644
--- a/iothread.c
+++ b/iothread.c
@@ -98,7 +98,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
return;
}
- aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns, 0, 0,
+ aio_context_set_poll_params(iothread->ctx,
+ iothread->poll_max_ns,
+ iothread->poll_grow,
+ iothread->poll_shrink,
&local_error);
if (local_error) {
error_propagate(errp, local_error);
@@ -129,18 +132,37 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
qemu_mutex_unlock(&iothread->init_done_lock);
}
-static void iothread_get_poll_max_ns(Object *obj, Visitor *v,
+typedef struct {
+ const char *name;
+ ptrdiff_t offset; /* field's byte offset in IOThread struct */
+} PollParamInfo;
+
+static PollParamInfo poll_max_ns_info = {
+ "poll-max-ns", offsetof(IOThread, poll_max_ns),
+};
+static PollParamInfo poll_grow_info = {
+ "poll-grow", offsetof(IOThread, poll_grow),
+};
+static PollParamInfo poll_shrink_info = {
+ "poll-shrink", offsetof(IOThread, poll_shrink),
+};
+
+static void iothread_get_poll_param(Object *obj, Visitor *v,
const char *name, void *opaque, Error **errp)
{
IOThread *iothread = IOTHREAD(obj);
+ PollParamInfo *info = opaque;
+ int64_t *field = (void *)iothread + info->offset;
- visit_type_int64(v, name, &iothread->poll_max_ns, errp);
+ visit_type_int64(v, name, field, errp);
}
-static void iothread_set_poll_max_ns(Object *obj, Visitor *v,
+static void iothread_set_poll_param(Object *obj, Visitor *v,
const char *name, void *opaque, Error **errp)
{
IOThread *iothread = IOTHREAD(obj);
+ PollParamInfo *info = opaque;
+ int64_t *field = (void *)iothread + info->offset;
Error *local_err = NULL;
int64_t value;
@@ -150,15 +172,19 @@ static void iothread_set_poll_max_ns(Object *obj, Visitor *v,
}
if (value < 0) {
- error_setg(&local_err, "poll_max_ns value must be in range "
- "[0, %"PRId64"]", INT64_MAX);
+ error_setg(&local_err, "%s value must be in range [0, %"PRId64"]",
+ info->name, INT64_MAX);
goto out;
}
- iothread->poll_max_ns = value;
+ *field = value;
if (iothread->ctx) {
- aio_context_set_poll_params(iothread->ctx, value, 0, 0, &local_err);
+ aio_context_set_poll_params(iothread->ctx,
+ iothread->poll_max_ns,
+ iothread->poll_grow,
+ iothread->poll_shrink,
+ &local_err);
}
out:
@@ -171,9 +197,17 @@ static void iothread_class_init(ObjectClass *klass, void *class_data)
ucc->complete = iothread_complete;
object_class_property_add(klass, "poll-max-ns", "int",
- iothread_get_poll_max_ns,
- iothread_set_poll_max_ns,
- NULL, NULL, &error_abort);
+ iothread_get_poll_param,
+ iothread_set_poll_param,
+ NULL, &poll_max_ns_info, &error_abort);
+ object_class_property_add(klass, "poll-grow", "int",
+ iothread_get_poll_param,
+ iothread_set_poll_param,
+ NULL, &poll_grow_info, &error_abort);
+ object_class_property_add(klass, "poll-shrink", "int",
+ iothread_get_poll_param,
+ iothread_set_poll_param,
+ NULL, &poll_shrink_info, &error_abort);
}
static const TypeInfo iothread_info = {
--
2.9.3
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (12 preceding siblings ...)
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 13/13] iothread: add poll-grow and poll-shrink parameters Stefan Hajnoczi
@ 2016-12-02 8:27 ` Paolo Bonzini
2016-12-02 11:11 ` Stefan Hajnoczi
` (2 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-12-02 8:27 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, borntraeger, Karl Rister, Fam Zheng
> v4:
> * Added poll time self-tuning algorithm [Christian and Paolo]
> * Try a single iteration of polling to avoid non-blocking
> ppoll(2)/epoll_wait(2) [Paolo]
> * Reordered patches to make performance analysis easier - see below
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> v3:
> * Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
> * Disable guest->host virtqueue notification during polling [Christian]
> * Rebased on top of my virtio-blk/scsi virtqueue notification disable
> patches
>
> v2:
> * Uninitialized node->deleted gone [Fam]
> * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
> created a weird step pattern [Fam]
> * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
> (actually I think Paolo had more in mind but this is the first step)
> * Only poll when all event loop resources support it [Paolo]
> * Added run_poll_handlers_begin/end trace events for perf analysis
> * Sorry, Christian, no virtqueue kick suppression yet
>
> Recent performance investigation work done by Karl Rister shows that the
> guest->host notification takes around 20 us. This is more than the
> "overhead"
> of QEMU itself (e.g. block layer).
>
> One way to avoid the costly exit is to use polling instead of notification.
> The main drawback of polling is that it consumes CPU resources. In order to
> benefit performance the host must have extra CPU cycles available on physical
> CPUs that aren't used by the guest.
>
> This is an experimental AioContext polling implementation. It adds a polling
> callback into the event loop. Polling functions are implemented for
> virtio-blk
> virtqueue guest->host kick and Linux AIO completion.
>
> The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
> to poll before entering the usual blocking poll(2) syscall. Try setting this
> parameter to the time from old request completion to new virtqueue kick. By
> default no polling is done so you must set this parameter to get busy
> polling.
>
> Patch series overview:
>
> Here are convenient points in the patch series at which to do performance
> benchmarking to see how various pieces interact.
>
> 1. Basic -iothread poll-max-ns=NUM parameter without self-tuning:
>
> aio: add flag to skip fds to aio_dispatch()
> aio: add AioPollFn and io_poll() interface
> aio: add polling mode to AioContext
> virtio: poll virtqueues for new buffers
> linux-aio: poll ring for completions
> iothread: add polling parameters
>
> 2. My "[PATCH 0/3] virtio: disable notifications in blk and scsi" series is
> needed as a base. This is unrelated to polling but we'll also disable
> notifications during polling later:
>
> virtio-blk: suppress virtqueue kick during processing
> virtio-scsi: suppress virtqueue kick during processing
>
> 3. Disable virtqueue notifications (reduce vmexits) during polling:
>
> virtio: turn vq->notification into a nested counter
> aio: add .io_poll_begin/end() callbacks
> virtio: disable virtqueue notifications during polling
>
> 4. Self-tuning, adjusts polling time depending on wait times:
>
> aio: self-tune polling time
> iothread: add poll-grow and poll-shrink parameters
>
> Stefan Hajnoczi (13):
> aio: add flag to skip fds to aio_dispatch()
> aio: add AioPollFn and io_poll() interface
> aio: add polling mode to AioContext
> virtio: poll virtqueues for new buffers
> linux-aio: poll ring for completions
> iothread: add polling parameters
> virtio-blk: suppress virtqueue kick during processing
> virtio-scsi: suppress virtqueue kick during processing
> virtio: turn vq->notification into a nested counter
> aio: add .io_poll_begin/end() callbacks
> virtio: disable virtqueue notifications during polling
> aio: self-tune polling time
> iothread: add poll-grow and poll-shrink parameters
>
> include/block/aio.h | 53 +++++++-
> include/sysemu/iothread.h | 5 +
> aio-posix.c | 308
> +++++++++++++++++++++++++++++++++++++++-----
> aio-win32.c | 32 ++++-
> async.c | 21 ++-
> block/curl.c | 8 +-
> block/iscsi.c | 3 +-
> block/linux-aio.c | 19 ++-
> block/nbd-client.c | 8 +-
> block/nfs.c | 7 +-
> block/sheepdog.c | 26 ++--
> block/ssh.c | 4 +-
> block/win32-aio.c | 4 +-
> hw/block/virtio-blk.c | 18 ++-
> hw/scsi/virtio-scsi.c | 36 +++---
> hw/virtio/virtio.c | 54 ++++++--
> iohandler.c | 2 +-
> iothread.c | 84 ++++++++++++
> nbd/server.c | 9 +-
> stubs/set-fd-handler.c | 1 +
> tests/test-aio.c | 4 +-
> util/event_notifier-posix.c | 2 +-
> trace-events | 6 +
> 23 files changed, 604 insertions(+), 110 deletions(-)
>
> --
> 2.9.3
>
>
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (13 preceding siblings ...)
2016-12-02 8:27 ` [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Paolo Bonzini
@ 2016-12-02 11:11 ` Stefan Hajnoczi
2016-12-05 10:25 ` Fam Zheng
2016-12-05 11:19 ` Christian Borntraeger
16 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-02 11:11 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, borntraeger, Karl Rister, Fam Zheng, Paolo Bonzini
On Thu, Dec 01, 2016 at 07:26:39PM +0000, Stefan Hajnoczi wrote:
> v4:
> * Added poll time self-tuning algorithm [Christian and Paolo]
> * Try a single iteration of polling to avoid non-blocking ppoll(2)/epoll_wait(2) [Paolo]
> * Reordered patches to make performance analysis easier - see below
>
> v3:
> * Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
> * Disable guest->host virtqueue notification during polling [Christian]
> * Rebased on top of my virtio-blk/scsi virtqueue notification disable patches
Karl Rister found a performance regression between v2 and v3.
There was a bug in the poll_disable_cnt code in aio_set_fd_handler()
which meant that polling was disabled unnecessarily. This has been
fixed in v4 and I suspect it was the cause for the regression.
Stefan
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers Stefan Hajnoczi
@ 2016-12-05 0:54 ` Fam Zheng
2016-12-05 10:14 ` Stefan Hajnoczi
0 siblings, 1 reply; 32+ messages in thread
From: Fam Zheng @ 2016-12-05 0:54 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, borntraeger, Paolo Bonzini, Karl Rister
On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> Add an AioContext poll handler to detect new virtqueue buffers without
> waiting for a guest->host notification.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> hw/virtio/virtio.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 6f8ca25..3870411 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2047,13 +2047,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
> }
> }
>
> +static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> +{
> + EventNotifier *n = opaque;
> + VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> +
> + if (virtio_queue_empty(vq)) {
I notice this call path gets very hot in the profile:
#0 address_space_translate_internal (d=0x7f853c22e970, addr=2140788738,
xlat=xlat@entry=0x7f8549db59e8, plen=plen@entry=0x7f8549db5a68,
resolve_subpage=resolve_subpage@entry=true) at /root/qemu-nvme/exec.c:419
#1 0x00007f854d3e8449 in address_space_translate (as=<optimized out>, addr=2140788738,
xlat=xlat@entry=0x7f8549db5a70, plen=plen@entry=0x7f8549db5a68,
is_write=is_write@entry=false) at /root/qemu-nvme/exec.c:462
#2 0x00007f854d3ee2fd in address_space_lduw_internal (endian=DEVICE_LITTLE_ENDIAN,
result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
at /root/qemu-nvme/exec.c:3299
#3 address_space_lduw_le (result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
at /root/qemu-nvme/exec.c:3351
#4 lduw_le_phys (as=<optimized out>, addr=<optimized out>) at /root/qemu-nvme/exec.c:3369
#5 0x00007f854d475978 in virtio_lduw_phys (vdev=<optimized out>, pa=<optimized out>)
at /root/qemu-nvme/include/hw/virtio/virtio-access.h:46
#6 vring_avail_idx (vq=<optimized out>, vq=<optimized out>)
at /root/qemu-nvme/hw/virtio/virtio.c:143
#7 virtio_queue_empty (vq=vq@entry=0x7f854fd0c9e0) at /root/qemu-nvme/hw/virtio/virtio.c:246
#8 0x00007f854d475d20 in virtio_queue_empty (vq=0x7f854fd0c9e0)
at /root/qemu-nvme/hw/virtio/virtio.c:2074
#9 virtio_queue_host_notifier_aio_poll (opaque=0x7f854fd0ca48)
at /root/qemu-nvme/hw/virtio/virtio.c:2068
#10 0x00007f854d69116e in run_poll_handlers_once (ctx=<optimized out>) at aio-posix.c:493
#11 0x00007f854d691b08 in run_poll_handlers (max_ns=<optimized out>, ctx=0x7f854e908850)
at aio-posix.c:530
#12 try_poll_mode (blocking=true, ctx=0x7f854e908850) at aio-posix.c:558
#13 aio_poll (ctx=0x7f854e908850, blocking=blocking@entry=true) at aio-posix.c:601
#14 0x00007f854d4ff12e in iothread_run (opaque=0x7f854e9082f0) at iothread.c:53
#15 0x00007f854c249dc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f854acd973d in clone () from /lib64/libc.so.6
Is it too costly to do an address_space_translate per poll?
Fam
> + return false;
> + }
> +
> + virtio_queue_notify_aio_vq(vq);
> + return true;
> +}
> +
> void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
> VirtIOHandleOutput handle_output)
> {
> if (handle_output) {
> vq->handle_aio_output = handle_output;
> aio_set_event_notifier(ctx, &vq->host_notifier, true,
> - virtio_queue_host_notifier_aio_read, NULL);
> + virtio_queue_host_notifier_aio_read,
> + virtio_queue_host_notifier_aio_poll);
> } else {
> aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
> /* Test and clear notifier before after disabling event,
> --
> 2.9.3
>
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 0:54 ` Fam Zheng
@ 2016-12-05 10:14 ` Stefan Hajnoczi
2016-12-05 10:59 ` Paolo Bonzini
2016-12-05 11:12 ` Christian Borntraeger
0 siblings, 2 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-05 10:14 UTC (permalink / raw)
To: Fam Zheng
Cc: Stefan Hajnoczi, borntraeger, Karl Rister, qemu-devel,
Paolo Bonzini
On Mon, Dec 05, 2016 at 08:54:59AM +0800, Fam Zheng wrote:
> On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> > Add an AioContext poll handler to detect new virtqueue buffers without
> > waiting for a guest->host notification.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > hw/virtio/virtio.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 6f8ca25..3870411 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2047,13 +2047,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
> > }
> > }
> >
> > +static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> > +{
> > + EventNotifier *n = opaque;
> > + VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> > +
> > + if (virtio_queue_empty(vq)) {
>
> I notice this call path gets very hot in the profile:
>
> #0 address_space_translate_internal (d=0x7f853c22e970, addr=2140788738,
> xlat=xlat@entry=0x7f8549db59e8, plen=plen@entry=0x7f8549db5a68,
> resolve_subpage=resolve_subpage@entry=true) at /root/qemu-nvme/exec.c:419
> #1 0x00007f854d3e8449 in address_space_translate (as=<optimized out>, addr=2140788738,
> xlat=xlat@entry=0x7f8549db5a70, plen=plen@entry=0x7f8549db5a68,
> is_write=is_write@entry=false) at /root/qemu-nvme/exec.c:462
> #2 0x00007f854d3ee2fd in address_space_lduw_internal (endian=DEVICE_LITTLE_ENDIAN,
> result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
> at /root/qemu-nvme/exec.c:3299
> #3 address_space_lduw_le (result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
> at /root/qemu-nvme/exec.c:3351
> #4 lduw_le_phys (as=<optimized out>, addr=<optimized out>) at /root/qemu-nvme/exec.c:3369
> #5 0x00007f854d475978 in virtio_lduw_phys (vdev=<optimized out>, pa=<optimized out>)
> at /root/qemu-nvme/include/hw/virtio/virtio-access.h:46
> #6 vring_avail_idx (vq=<optimized out>, vq=<optimized out>)
> at /root/qemu-nvme/hw/virtio/virtio.c:143
> #7 virtio_queue_empty (vq=vq@entry=0x7f854fd0c9e0) at /root/qemu-nvme/hw/virtio/virtio.c:246
> #8 0x00007f854d475d20 in virtio_queue_empty (vq=0x7f854fd0c9e0)
> at /root/qemu-nvme/hw/virtio/virtio.c:2074
> #9 virtio_queue_host_notifier_aio_poll (opaque=0x7f854fd0ca48)
> at /root/qemu-nvme/hw/virtio/virtio.c:2068
> #10 0x00007f854d69116e in run_poll_handlers_once (ctx=<optimized out>) at aio-posix.c:493
> #11 0x00007f854d691b08 in run_poll_handlers (max_ns=<optimized out>, ctx=0x7f854e908850)
> at aio-posix.c:530
> #12 try_poll_mode (blocking=true, ctx=0x7f854e908850) at aio-posix.c:558
> #13 aio_poll (ctx=0x7f854e908850, blocking=blocking@entry=true) at aio-posix.c:601
> #14 0x00007f854d4ff12e in iothread_run (opaque=0x7f854e9082f0) at iothread.c:53
> #15 0x00007f854c249dc5 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007f854acd973d in clone () from /lib64/libc.so.6
>
> Is it too costly to do an address_space_translate per poll?
Good insight.
If we map guest RAM we should consider mapping the vring in
hw/virtio/virtio.c. That way, more than just polling benefits.
The map/unmap API was not designed for repeated memory accesses. In the
bounce buffer case you have a stale copy - repeated reads will not
update it. Vrings should be in guest RAM so we don't really need to
support bounce buffering but it's still a hack to use the map/unmap API.
Paolo: Thoughts about the memory API?
Stefan
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (14 preceding siblings ...)
2016-12-02 11:11 ` Stefan Hajnoczi
@ 2016-12-05 10:25 ` Fam Zheng
2016-12-05 14:56 ` Stefan Hajnoczi
2016-12-05 11:19 ` Christian Borntraeger
16 siblings, 1 reply; 32+ messages in thread
From: Fam Zheng @ 2016-12-05 10:25 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, borntraeger, Paolo Bonzini, Karl Rister
On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> One way to avoid the costly exit is to use polling instead of notification.
Testing with fio on virtio-blk backed by an NVMe device, I can see a significant
performance improvement with this series:
poll-max-ns    iodepth=1    iodepth=32
--------------------------------------
0               24806.94     151430.36
1               24435.02     155285.59
100             24702.41     149937.2
1000            24457.08     152936.3
10000           24605.05     156158.12
13000           24412.95     154908.73
16000           30654.24     185731.58
18000           36365.69     185538.78
20000           35640.48     188640.61
30000           37411.05     184198.39
60000           35230.66     190427.42
So this definitely helps synchronous I/O with queue depth = 1. Great!
I have a small concern with high queue depth, though. The more frequently we
check the virtqueue, the less likely requests can be batched, and the more
submissions (both from guest to QEMU and from QEMU to host) are needed to
achieve the same bandwidth, because we'd do less plug/unplug. This could be a
problem under heavy workload. We may want to consider the driver's transient
state when it is appending requests to the virtqueue. For example, virtio-blk
driver in Linux updates avail idx after adding each request. If QEMU looks at
the index in the middle, it will miss the opportunities to plug/unplug and merge
requests. On the other hand, though the virtio-blk driver doesn't have an I/O scheduler,
it does have some batch submission semantics passed down by blk-mq (see the
"notify" condition in drivers/block/virtio_blk.c:virtio_queue_rq()). So I'm
wondering whether it makes sense to wait for the whole batch of requests to be
added to the virtqueue before processing it? This can be done by changing the
driver to only update "avail" index after all requests are added to the queue,
or even adding a flag in the virtio ring descriptor to suppress busy polling.
> The main drawback of polling is that it consumes CPU resources. In order to
> benefit performance the host must have extra CPU cycles available on physical
> CPUs that aren't used by the guest.
>
> This is an experimental AioContext polling implementation. It adds a polling
> callback into the event loop. Polling functions are implemented for virtio-blk
> virtqueue guest->host kick and Linux AIO completion.
>
> The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
> to poll before entering the usual blocking poll(2) syscall. Try setting this
> parameter to the time from old request completion to new virtqueue kick. By
> default no polling is done so you must set this parameter to get busy polling.
Given the self-tuning, should we document the best practice for setting this
value? Is it okay for the user, or even QEMU, to use a relatively large value by
default and expect it to tune itself sensibly?
Fam
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 10:14 ` Stefan Hajnoczi
@ 2016-12-05 10:59 ` Paolo Bonzini
2016-12-05 14:46 ` Stefan Hajnoczi
2016-12-05 11:12 ` Christian Borntraeger
1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-12-05 10:59 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Fam Zheng, Stefan Hajnoczi, borntraeger, Karl Rister, qemu-devel,
v.maffione
----- Original Message -----
> From: "Stefan Hajnoczi" <stefanha@gmail.com>
> To: "Fam Zheng" <famz@redhat.com>
> Cc: "Stefan Hajnoczi" <stefanha@redhat.com>, borntraeger@de.ibm.com, "Karl Rister" <krister@redhat.com>,
> qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>
> Sent: Monday, December 5, 2016 11:14:52 AM
> Subject: Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
>
> On Mon, Dec 05, 2016 at 08:54:59AM +0800, Fam Zheng wrote:
> > On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> > > Add an AioContext poll handler to detect new virtqueue buffers without
> > > waiting for a guest->host notification.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > > hw/virtio/virtio.c | 16 +++++++++++++++-
> > > 1 file changed, 15 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > index 6f8ca25..3870411 100644
> > > --- a/hw/virtio/virtio.c
> > > +++ b/hw/virtio/virtio.c
> > > @@ -2047,13 +2047,27 @@ static void
> > > virtio_queue_host_notifier_aio_read(EventNotifier *n)
> > > }
> > > }
> > >
> > > +static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> > > +{
> > > + EventNotifier *n = opaque;
> > > + VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> > > +
> > > + if (virtio_queue_empty(vq)) {
> >
> > I notice this call path gets very hot in the profile:
> >
> > #0 address_space_translate_internal (d=0x7f853c22e970, addr=2140788738,
> > xlat=xlat@entry=0x7f8549db59e8, plen=plen@entry=0x7f8549db5a68,
> > resolve_subpage=resolve_subpage@entry=true) at
> > /root/qemu-nvme/exec.c:419
> > #1 0x00007f854d3e8449 in address_space_translate (as=<optimized out>,
> > addr=2140788738,
> > xlat=xlat@entry=0x7f8549db5a70, plen=plen@entry=0x7f8549db5a68,
> > is_write=is_write@entry=false) at /root/qemu-nvme/exec.c:462
> > #2 0x00007f854d3ee2fd in address_space_lduw_internal
> > (endian=DEVICE_LITTLE_ENDIAN,
> > result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
> > at /root/qemu-nvme/exec.c:3299
> > #3 address_space_lduw_le (result=0x0, attrs=..., addr=<optimized out>,
> > as=<optimized out>)
> > at /root/qemu-nvme/exec.c:3351
> > #4 lduw_le_phys (as=<optimized out>, addr=<optimized out>) at
> > /root/qemu-nvme/exec.c:3369
> > #5 0x00007f854d475978 in virtio_lduw_phys (vdev=<optimized out>,
> > pa=<optimized out>)
> > at /root/qemu-nvme/include/hw/virtio/virtio-access.h:46
> > #6 vring_avail_idx (vq=<optimized out>, vq=<optimized out>)
> > at /root/qemu-nvme/hw/virtio/virtio.c:143
> > #7 virtio_queue_empty (vq=vq@entry=0x7f854fd0c9e0) at
> > /root/qemu-nvme/hw/virtio/virtio.c:246
> > #8 0x00007f854d475d20 in virtio_queue_empty (vq=0x7f854fd0c9e0)
> > at /root/qemu-nvme/hw/virtio/virtio.c:2074
> > #9 virtio_queue_host_notifier_aio_poll (opaque=0x7f854fd0ca48)
> > at /root/qemu-nvme/hw/virtio/virtio.c:2068
> > #10 0x00007f854d69116e in run_poll_handlers_once (ctx=<optimized out>) at
> > aio-posix.c:493
> > #11 0x00007f854d691b08 in run_poll_handlers (max_ns=<optimized out>,
> > ctx=0x7f854e908850)
> > at aio-posix.c:530
> > #12 try_poll_mode (blocking=true, ctx=0x7f854e908850) at aio-posix.c:558
> > #13 aio_poll (ctx=0x7f854e908850, blocking=blocking@entry=true) at
> > aio-posix.c:601
> > #14 0x00007f854d4ff12e in iothread_run (opaque=0x7f854e9082f0) at
> > iothread.c:53
> > #15 0x00007f854c249dc5 in start_thread () from /lib64/libpthread.so.0
> > #16 0x00007f854acd973d in clone () from /lib64/libc.so.6
> >
> > Is it too costly to do an address_space_translate per poll?
>
> Good insight.
>
> If we map guest RAM we should consider mapping the vring in
> hw/virtio/virtio.c. That way not just polling benefits.
>
> The map/unmap API was not designed for repeated memory accesses. In the
> bounce buffer case you have a stale copy - repeated reads will not
> update it. Vrings should be in guest RAM so we don't really need to
> support bounce buffering but it's still a hack to use the map/unmap API.
>
> Paolo: Thoughts about the memory API?
Yeah, I had discussed this recently with Fam, and prototyped (read: compiled
the code) an API like this:
int64_t address_space_cache_init(MemoryRegionCache *cache,
                                 AddressSpace *as,
                                 hwaddr addr,
                                 hwaddr len,
                                 bool is_write);
/* Only for is_write == true. */
void address_space_cache_invalidate(MemoryRegionCache *cache,
                                    hwaddr addr,
                                    hwaddr access_len);
uint32_t address_space_lduw_cached(MemoryRegionCache *cache, hwaddr addr,
                                   MemTxAttrs attrs, MemTxResult *result);
...
It's basically the same API as usual, but with address_space_translate
reduced to
    *xlat = addr - cache->xlat;
    *plen = MIN(*plen, cache->len - *xlat);
    return cache->mr;
and likewise for other hot spots of the ld/st API. Originally I thought
of two ways we could use it:
1) update the cache with a MemoryListener
2) hoist the cache out of the hot loops, but still initialize it
once per iteration.
This was before Fam tested your patches, and the polling case makes
(1) the only practical way.
(2) of course would have been less code, but if the idea is used by other
devices (Vincenzo and the netmap gang wanted something similar in e1000,
for example) perhaps we could just add a method to PCIDeviceClass and
VirtioDeviceClass.
Separate from this, and unrelated to polling, we can and should use
the map/unmap API for descriptors, both direct and indirect. This is
simple and I hope to test it some time this week.
Paolo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 10:14 ` Stefan Hajnoczi
2016-12-05 10:59 ` Paolo Bonzini
@ 2016-12-05 11:12 ` Christian Borntraeger
1 sibling, 0 replies; 32+ messages in thread
From: Christian Borntraeger @ 2016-12-05 11:12 UTC (permalink / raw)
To: Stefan Hajnoczi, Fam Zheng
Cc: Stefan Hajnoczi, Karl Rister, qemu-devel, Paolo Bonzini
On 12/05/2016 11:14 AM, Stefan Hajnoczi wrote:
> On Mon, Dec 05, 2016 at 08:54:59AM +0800, Fam Zheng wrote:
>> On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
>>> Add an AioContext poll handler to detect new virtqueue buffers without
>>> waiting for a guest->host notification.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>> hw/virtio/virtio.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>> index 6f8ca25..3870411 100644
>>> --- a/hw/virtio/virtio.c
>>> +++ b/hw/virtio/virtio.c
>>> @@ -2047,13 +2047,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
>>> }
>>> }
>>>
>>> +static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>>> +{
>>> + EventNotifier *n = opaque;
>>> + VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>>> +
>>> + if (virtio_queue_empty(vq)) {
>>
>> I notice this call path gets very hot in the profile:
>>
>> #0 address_space_translate_internal (d=0x7f853c22e970, addr=2140788738,
>> xlat=xlat@entry=0x7f8549db59e8, plen=plen@entry=0x7f8549db5a68,
>> resolve_subpage=resolve_subpage@entry=true) at /root/qemu-nvme/exec.c:419
>> #1 0x00007f854d3e8449 in address_space_translate (as=<optimized out>, addr=2140788738,
>> xlat=xlat@entry=0x7f8549db5a70, plen=plen@entry=0x7f8549db5a68,
>> is_write=is_write@entry=false) at /root/qemu-nvme/exec.c:462
>> #2 0x00007f854d3ee2fd in address_space_lduw_internal (endian=DEVICE_LITTLE_ENDIAN,
>> result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
>> at /root/qemu-nvme/exec.c:3299
>> #3 address_space_lduw_le (result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
>> at /root/qemu-nvme/exec.c:3351
>> #4 lduw_le_phys (as=<optimized out>, addr=<optimized out>) at /root/qemu-nvme/exec.c:3369
>> #5 0x00007f854d475978 in virtio_lduw_phys (vdev=<optimized out>, pa=<optimized out>)
>> at /root/qemu-nvme/include/hw/virtio/virtio-access.h:46
>> #6 vring_avail_idx (vq=<optimized out>, vq=<optimized out>)
>> at /root/qemu-nvme/hw/virtio/virtio.c:143
>> #7 virtio_queue_empty (vq=vq@entry=0x7f854fd0c9e0) at /root/qemu-nvme/hw/virtio/virtio.c:246
>> #8 0x00007f854d475d20 in virtio_queue_empty (vq=0x7f854fd0c9e0)
>> at /root/qemu-nvme/hw/virtio/virtio.c:2074
>> #9 virtio_queue_host_notifier_aio_poll (opaque=0x7f854fd0ca48)
>> at /root/qemu-nvme/hw/virtio/virtio.c:2068
>> #10 0x00007f854d69116e in run_poll_handlers_once (ctx=<optimized out>) at aio-posix.c:493
>> #11 0x00007f854d691b08 in run_poll_handlers (max_ns=<optimized out>, ctx=0x7f854e908850)
>> at aio-posix.c:530
>> #12 try_poll_mode (blocking=true, ctx=0x7f854e908850) at aio-posix.c:558
>> #13 aio_poll (ctx=0x7f854e908850, blocking=blocking@entry=true) at aio-posix.c:601
>> #14 0x00007f854d4ff12e in iothread_run (opaque=0x7f854e9082f0) at iothread.c:53
>> #15 0x00007f854c249dc5 in start_thread () from /lib64/libpthread.so.0
>> #16 0x00007f854acd973d in clone () from /lib64/libc.so.6
>>
>> Is it too costly to do an address_space_translate per poll?
>
> Good insight.
>
> If we map guest RAM we should consider mapping the vring in
> hw/virtio/virtio.c. That way not just polling benefits.
>
> The map/unmap API was not designed for repeated memory accesses. In the
> bounce buffer case you have a stale copy - repeated reads will not
> update it. Vrings should be in guest RAM so we don't really need to
> support bounce buffering but it's still a hack to use the map/unmap API.
>
> Paolo: Thoughts about the memory API?
Yes, I see the same on s390. This path is the main CPU eater in the host,
even for non-polling as soon as you have fast enough disks. If we could
"abuse" polling to actually lower this overhead that would be interesting.
On the other hand, reducing the overhead of the memory API would be even
better.
Christian
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
` (15 preceding siblings ...)
2016-12-05 10:25 ` Fam Zheng
@ 2016-12-05 11:19 ` Christian Borntraeger
16 siblings, 0 replies; 32+ messages in thread
From: Christian Borntraeger @ 2016-12-05 11:19 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: Karl Rister, Fam Zheng, Paolo Bonzini
On 12/01/2016 08:26 PM, Stefan Hajnoczi wrote:
> v4:
> * Added poll time self-tuning algorithm [Christian and Paolo]
> * Try a single iteration of polling to avoid non-blocking ppoll(2)/epoll_wait(2) [Paolo]
> * Reordered patches to make performance analysis easier - see below
This looks quite promising. I will give this to our performance folks to get
some more controlled measurements.
>
> v3:
> * Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
> * Disable guest->host virtqueue notification during polling [Christian]
> * Rebased on top of my virtio-blk/scsi virtqueue notification disable patches
>
> v2:
> * Uninitialized node->deleted gone [Fam]
> * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
> created a weird step pattern [Fam]
> * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
> (actually I think Paolo had more in mind but this is the first step)
> * Only poll when all event loop resources support it [Paolo]
> * Added run_poll_handlers_begin/end trace events for perf analysis
> * Sorry, Christian, no virtqueue kick suppression yet
>
> Recent performance investigation work done by Karl Rister shows that the
> guest->host notification takes around 20 us. This is more than the "overhead"
> of QEMU itself (e.g. block layer).
>
> One way to avoid the costly exit is to use polling instead of notification.
> The main drawback of polling is that it consumes CPU resources. In order to
> benefit performance the host must have extra CPU cycles available on physical
> CPUs that aren't used by the guest.
>
> This is an experimental AioContext polling implementation. It adds a polling
> callback into the event loop. Polling functions are implemented for virtio-blk
> virtqueue guest->host kick and Linux AIO completion.
>
> The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
> to poll before entering the usual blocking poll(2) syscall. Try setting this
> parameter to the time from old request completion to new virtqueue kick. By
> default no polling is done so you must set this parameter to get busy polling.
>
> Patch series overview:
>
> Here are convenient points in the patch series at which to do performance
> benchmarking to see how various pieces interact.
>
> 1. Basic -iothread poll-max-ns=NUM parameter without self-tuning:
>
> aio: add flag to skip fds to aio_dispatch()
> aio: add AioPollFn and io_poll() interface
> aio: add polling mode to AioContext
> virtio: poll virtqueues for new buffers
> linux-aio: poll ring for completions
> iothread: add polling parameters
>
> 2. My "[PATCH 0/3] virtio: disable notifications in blk and scsi" series is
> needed as a base. This is unrelated to polling but we'll also disable
> notifications during polling later:
>
> virtio-blk: suppress virtqueue kick during processing
> virtio-scsi: suppress virtqueue kick during processing
>
> 3. Disable virtqueue notifications (reduce vmexits) during polling:
>
> virtio: turn vq->notification into a nested counter
> aio: add .io_poll_begin/end() callbacks
> virtio: disable virtqueue notifications during polling
>
> 4. Self-tuning, adjusts polling time depending on wait times:
>
> aio: self-tune polling time
> iothread: add poll-grow and poll-shrink parameters
>
> Stefan Hajnoczi (13):
> aio: add flag to skip fds to aio_dispatch()
> aio: add AioPollFn and io_poll() interface
> aio: add polling mode to AioContext
> virtio: poll virtqueues for new buffers
> linux-aio: poll ring for completions
> iothread: add polling parameters
> virtio-blk: suppress virtqueue kick during processing
> virtio-scsi: suppress virtqueue kick during processing
> virtio: turn vq->notification into a nested counter
> aio: add .io_poll_begin/end() callbacks
> virtio: disable virtqueue notifications during polling
> aio: self-tune polling time
> iothread: add poll-grow and poll-shrink parameters
>
> include/block/aio.h | 53 +++++++-
> include/sysemu/iothread.h | 5 +
> aio-posix.c | 308 +++++++++++++++++++++++++++++++++++++++-----
> aio-win32.c | 32 ++++-
> async.c | 21 ++-
> block/curl.c | 8 +-
> block/iscsi.c | 3 +-
> block/linux-aio.c | 19 ++-
> block/nbd-client.c | 8 +-
> block/nfs.c | 7 +-
> block/sheepdog.c | 26 ++--
> block/ssh.c | 4 +-
> block/win32-aio.c | 4 +-
> hw/block/virtio-blk.c | 18 ++-
> hw/scsi/virtio-scsi.c | 36 +++---
> hw/virtio/virtio.c | 54 ++++++--
> iohandler.c | 2 +-
> iothread.c | 84 ++++++++++++
> nbd/server.c | 9 +-
> stubs/set-fd-handler.c | 1 +
> tests/test-aio.c | 4 +-
> util/event_notifier-posix.c | 2 +-
> trace-events | 6 +
> 23 files changed, 604 insertions(+), 110 deletions(-)
>
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 10:59 ` Paolo Bonzini
@ 2016-12-05 14:46 ` Stefan Hajnoczi
2016-12-05 15:22 ` Paolo Bonzini
0 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-05 14:46 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, Fam Zheng, borntraeger, Karl Rister, qemu-devel,
v.maffione
On Mon, Dec 05, 2016 at 05:59:55AM -0500, Paolo Bonzini wrote:
>
>
> ----- Original Message -----
> > From: "Stefan Hajnoczi" <stefanha@gmail.com>
> > To: "Fam Zheng" <famz@redhat.com>
> > Cc: "Stefan Hajnoczi" <stefanha@redhat.com>, borntraeger@de.ibm.com, "Karl Rister" <krister@redhat.com>,
> > qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>
> > Sent: Monday, December 5, 2016 11:14:52 AM
> > Subject: Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
> >
> > On Mon, Dec 05, 2016 at 08:54:59AM +0800, Fam Zheng wrote:
> > > On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> > > > Add an AioContext poll handler to detect new virtqueue buffers without
> > > > waiting for a guest->host notification.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > > hw/virtio/virtio.c | 16 +++++++++++++++-
> > > > 1 file changed, 15 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > > > index 6f8ca25..3870411 100644
> > > > --- a/hw/virtio/virtio.c
> > > > +++ b/hw/virtio/virtio.c
> > > > @@ -2047,13 +2047,27 @@ static void
> > > > virtio_queue_host_notifier_aio_read(EventNotifier *n)
> > > > }
> > > > }
> > > >
> > > > +static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> > > > +{
> > > > + EventNotifier *n = opaque;
> > > > + VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> > > > +
> > > > + if (virtio_queue_empty(vq)) {
> > >
> > > I notice this call path gets very hot in the profile:
> > >
> > > #0 address_space_translate_internal (d=0x7f853c22e970, addr=2140788738,
> > > xlat=xlat@entry=0x7f8549db59e8, plen=plen@entry=0x7f8549db5a68,
> > > resolve_subpage=resolve_subpage@entry=true) at
> > > /root/qemu-nvme/exec.c:419
> > > #1 0x00007f854d3e8449 in address_space_translate (as=<optimized out>,
> > > addr=2140788738,
> > > xlat=xlat@entry=0x7f8549db5a70, plen=plen@entry=0x7f8549db5a68,
> > > is_write=is_write@entry=false) at /root/qemu-nvme/exec.c:462
> > > #2 0x00007f854d3ee2fd in address_space_lduw_internal
> > > (endian=DEVICE_LITTLE_ENDIAN,
> > > result=0x0, attrs=..., addr=<optimized out>, as=<optimized out>)
> > > at /root/qemu-nvme/exec.c:3299
> > > #3 address_space_lduw_le (result=0x0, attrs=..., addr=<optimized out>,
> > > as=<optimized out>)
> > > at /root/qemu-nvme/exec.c:3351
> > > #4 lduw_le_phys (as=<optimized out>, addr=<optimized out>) at
> > > /root/qemu-nvme/exec.c:3369
> > > #5 0x00007f854d475978 in virtio_lduw_phys (vdev=<optimized out>,
> > > pa=<optimized out>)
> > > at /root/qemu-nvme/include/hw/virtio/virtio-access.h:46
> > > #6 vring_avail_idx (vq=<optimized out>, vq=<optimized out>)
> > > at /root/qemu-nvme/hw/virtio/virtio.c:143
> > > #7 virtio_queue_empty (vq=vq@entry=0x7f854fd0c9e0) at
> > > /root/qemu-nvme/hw/virtio/virtio.c:246
> > > #8 0x00007f854d475d20 in virtio_queue_empty (vq=0x7f854fd0c9e0)
> > > at /root/qemu-nvme/hw/virtio/virtio.c:2074
> > > #9 virtio_queue_host_notifier_aio_poll (opaque=0x7f854fd0ca48)
> > > at /root/qemu-nvme/hw/virtio/virtio.c:2068
> > > #10 0x00007f854d69116e in run_poll_handlers_once (ctx=<optimized out>) at
> > > aio-posix.c:493
> > > #11 0x00007f854d691b08 in run_poll_handlers (max_ns=<optimized out>,
> > > ctx=0x7f854e908850)
> > > at aio-posix.c:530
> > > #12 try_poll_mode (blocking=true, ctx=0x7f854e908850) at aio-posix.c:558
> > > #13 aio_poll (ctx=0x7f854e908850, blocking=blocking@entry=true) at
> > > aio-posix.c:601
> > > #14 0x00007f854d4ff12e in iothread_run (opaque=0x7f854e9082f0) at
> > > iothread.c:53
> > > #15 0x00007f854c249dc5 in start_thread () from /lib64/libpthread.so.0
> > > #16 0x00007f854acd973d in clone () from /lib64/libc.so.6
> > >
> > > Is it too costly to do an address_space_translate per poll?
> >
> > Good insight.
> >
> > If we map guest RAM we should consider mapping the vring in
> > hw/virtio/virtio.c. That way not just polling benefits.
> >
> > The map/unmap API was not designed for repeated memory accesses. In the
> > bounce buffer case you have a stale copy - repeated reads will not
> > update it. Vrings should be in guest RAM so we don't really need to
> > support bounce buffering but it's still a hack to use the map/unmap API.
> >
> > Paolo: Thoughts about the memory API?
>
> Yeah, I had discussed this recently with Fam, and prototyped (read: compiled
> the code) an API like this:
>
> int64_t address_space_cache_init(MemoryRegionCache *cache,
>                                  AddressSpace *as,
>                                  hwaddr addr,
>                                  hwaddr len,
>                                  bool is_write);
> /* Only for is_write == true. */
> void address_space_cache_invalidate(MemoryRegionCache *cache,
>                                     hwaddr addr,
>                                     hwaddr access_len);
> uint32_t address_space_lduw_cached(MemoryRegionCache *cache, hwaddr addr,
>                                    MemTxAttrs attrs, MemTxResult *result);
> ...
>
> It's basically the same API as usual, but with address_space_translate
> reduced to
>
>     *xlat = addr - cache->xlat;
>     *plen = MIN(*plen, cache->len - *xlat);
>     return cache->mr;
>
> and likewise for other hot spots of the ld/st API. Originally I thought
> of two ways we could use it:
>
> 1) update the cache with a MemoryListener
>
> 2) hoist the cache out of the hot loops, but still initialize it
> once per iteration.
>
> This was before Fam tested your patches, and the polling case makes
> (1) the only practical way.
>
> (2) of course would have been less code, but if the idea is used by other
> devices (Vincenzo and the netmap gang wanted something similar in e1000,
> for example) perhaps we could just add a method to PCIDeviceClass and
> VirtioDeviceClass.
>
> Separate from this, and unrelated to polling, we can and should use
> the map/unmap API for descriptors, both direct and indirect. This is
> simple and I hope to test it some time this week.
That sounds promising.
Another cache invalidation approach is to give MemoryRegions generation
numbers so that the cached address translation can be refreshed if the
MemoryRegion has changed.
Anyway, I'm glad the polling series on its own is already showing good
performance results. I'd like to merge it at the beginning of QEMU 2.9
so we can work on further improvements that build on top of it.
Stefan
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-05 10:25 ` Fam Zheng
@ 2016-12-05 14:56 ` Stefan Hajnoczi
2016-12-05 16:40 ` Karl Rister
0 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-05 14:56 UTC (permalink / raw)
To: Fam Zheng; +Cc: qemu-devel, borntraeger, Paolo Bonzini, Karl Rister
On Mon, Dec 05, 2016 at 06:25:49PM +0800, Fam Zheng wrote:
> On Thu, 12/01 19:26, Stefan Hajnoczi wrote:
> > One way to avoid the costly exit is to use polling instead of notification.
>
> Testing with fio on virtio-blk backed by NVMe device, I can see significant
> performance improvement with this series:
>
> poll-max-ns    iodepth=1    iodepth=32
> ----------------------------------------
>           0     24806.94     151430.36
>           1     24435.02     155285.59
>         100     24702.41     149937.20
>        1000     24457.08     152936.30
>       10000     24605.05     156158.12
>       13000     24412.95     154908.73
>       16000     30654.24     185731.58
>       18000     36365.69     185538.78
>       20000     35640.48     188640.61
>       30000     37411.05     184198.39
>       60000     35230.66     190427.42
>
> So this definitely helps synchronous I/O with queue depth = 1. Great!
Nice. Even with iodepth=32 the improvement is significant.
> I have a small concern with high queue depth, though. The more frequently we check
> the virtqueue, the less likely the requests can be batched, and the more
> submissions (both from guest to QEMU and from QEMU to host) are needed to
> achieve the same bandwidth, because we'd do less plug/unplug. This could be a
> problem under heavy workload. We may want to consider the driver's transient
> state when it is appending requests to the virtqueue. For example, virtio-blk
> driver in Linux updates avail idx after adding each request. If QEMU looks at
> the index in the middle, it will miss the opportunities to plug/unplug and merge
> requests. On the other hand, though virtio-blk driver doesn't have IO scheduler,
> it does have some batch submission semantics passed down by blk-mq (see the
> "notify" condition in drivers/block/virtio_blk.c:virtio_queue_rq()). So I'm
> wondering whether it makes sense to wait for the whole batch of requests to be
> added to the virtqueue before processing it? This can be done by changing the
> driver to only update "avail" index after all requests are added to the queue,
> or even adding a flag in the virtio ring descriptor to suppress busy polling.
Only benchmarking can tell. It would be interesting to extract the
vq->vring.avail->idx update from virtqueue_add(). I don't think a new
flag is necessary.
> > The main drawback of polling is that it consumes CPU resources. In order to
> > benefit performance the host must have extra CPU cycles available on physical
> > CPUs that aren't used by the guest.
> >
> > This is an experimental AioContext polling implementation. It adds a polling
> > callback into the event loop. Polling functions are implemented for virtio-blk
> > virtqueue guest->host kick and Linux AIO completion.
> >
> > The -object iothread,poll-max-ns=NUM parameter sets the number of nanoseconds
> > to poll before entering the usual blocking poll(2) syscall. Try setting this
> > parameter to the time from old request completion to new virtqueue kick. By
> > default no polling is done so you must set this parameter to get busy polling.
>
> Given the self-tuning, should we document the best practice in setting the
> value? Is it okay for user or even QEMU to use a relatively large value by
> default and expect it to tune itself sensibly?
Karl: do you have time to run a bigger suite of benchmarks to identify a
reasonable default poll-max-ns value? Both aio=native and aio=threads
are important.
If there is a sweet spot that improves performance without pathological
cases then we could even enable polling by default in QEMU.
Otherwise we'd just document the recommended best polling duration as a
starting point for users.
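For reference, with this series applied, busy polling is enabled per iothread on the command line roughly like this (image path and IDs are illustrative):

```shell
qemu-system-x86_64 \
    -object iothread,id=iothread0,poll-max-ns=32768 \
    -drive if=none,id=drive0,file=disk.img,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0
```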
Stefan
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 14:46 ` Stefan Hajnoczi
@ 2016-12-05 15:22 ` Paolo Bonzini
2016-12-06 9:28 ` Stefan Hajnoczi
0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-12-05 15:22 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, Fam Zheng, borntraeger, Karl Rister, qemu-devel,
v.maffione
> Anyway, I'm glad the polling series on its own is already showing good
> performance results. I'd like to merge it at the beginning of QEMU 2.9
> so we can work on further improvements that build on top of it.
Yes, it would be great if you can stage the infrastructure bits, so that
I can rebase my lockcnt patches on top.
Paolo
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-05 14:56 ` Stefan Hajnoczi
@ 2016-12-05 16:40 ` Karl Rister
2016-12-06 9:27 ` Stefan Hajnoczi
0 siblings, 1 reply; 32+ messages in thread
From: Karl Rister @ 2016-12-05 16:40 UTC (permalink / raw)
To: Stefan Hajnoczi, Fam Zheng; +Cc: qemu-devel, borntraeger, Paolo Bonzini
On 12/05/2016 08:56 AM, Stefan Hajnoczi wrote:
<snip>
> Karl: do you have time to run a bigger suite of benchmarks to identify a
> reasonable default poll-max-ns value? Both aio=native and aio=threads
> are important.
>
> If there is a sweet spot that improves performance without pathological
> cases then we could even enable polling by default in QEMU.
>
> Otherwise we'd just document the recommended best polling duration as a
> starting point for users.
>
I have collected a baseline on the latest patches and am currently
collecting poll-max-ns=16384. I can certainly throw in a few more
scenarios. Do we want to stick with powers of 2 or some other strategy?
--
Karl Rister <krister@redhat.com>
* Re: [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time Stefan Hajnoczi
@ 2016-12-05 20:06 ` Christian Borntraeger
2016-12-06 9:20 ` Stefan Hajnoczi
0 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2016-12-05 20:06 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: Karl Rister, Fam Zheng, Paolo Bonzini
On 12/01/2016 08:26 PM, Stefan Hajnoczi wrote:
> This patch is based on the algorithm for the kvm.ko halt_poll_ns
> parameter in Linux. The initial polling time is zero.
>
> If the event loop is woken up within the maximum polling time it means
> polling could be effective, so grow polling time.
>
> If the event loop is woken up beyond the maximum polling time it means
> polling is not effective, so shrink polling time.
>
> If the event loop makes progress within the current polling time then
> the sweet spot has been reached.
>
> This algorithm adjusts the polling time so it can adapt to variations in
> workloads. The goal is to reach the sweet spot while also recognizing
> when polling would hurt more than help.
>
> Two new trace events, poll_grow and poll_shrink, are added for observing
> polling time adjustment.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Not sure why, but I have 4 host ramdisks with the same iothread as guest
virtio-blk. Running fio in the guest on one of these disks will poll; as
soon as I have 2 disks in fio I almost always see shrinks (so polling
stays at 0) and almost no grows.
> ---
>  include/block/aio.h |  10 +++++++--
>  aio-posix.c         |  58 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  aio-win32.c         |   3 ++-
>  async.c             |   3 +++
>  iothread.c          |   4 ++--
>  trace-events        |   2 ++
>  6 files changed, 71 insertions(+), 9 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index cc3272b..e4a4912 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -134,8 +134,11 @@ struct AioContext {
>      /* Number of AioHandlers without .io_poll() */
>      int poll_disable_cnt;
>  
> -    /* Maximum polling time in nanoseconds */
> -    int64_t poll_max_ns;
> +    /* Polling mode parameters */
> +    int64_t poll_ns;        /* current polling time in nanoseconds */
> +    int64_t poll_max_ns;    /* maximum polling time in nanoseconds */
> +    int64_t poll_grow;      /* polling time growth factor */
> +    int64_t poll_shrink;    /* polling time shrink factor */
>  
>      /* Are we in polling mode or monitoring file descriptors? */
>      bool poll_started;
> @@ -511,10 +514,13 @@ void aio_context_setup(AioContext *ctx);
>   * aio_context_set_poll_params:
>   * @ctx: the aio context
>   * @max_ns: how long to busy poll for, in nanoseconds
> + * @grow: polling time growth factor
> + * @shrink: polling time shrink factor
>   *
>   * Poll mode can be disabled by setting poll_max_ns to 0.
>   */
>  void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
> +                                 int64_t grow, int64_t shrink,
>                                   Error **errp);
>  
>  #endif
> diff --git a/aio-posix.c b/aio-posix.c
> index 5216d82..1585571 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -550,7 +550,7 @@ static bool try_poll_mode(AioContext *ctx, bool blocking)
>      if (blocking && ctx->poll_max_ns && ctx->poll_disable_cnt == 0) {
>          /* See qemu_soonest_timeout() uint64_t hack */
>          int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
> -                             (uint64_t)ctx->poll_max_ns);
> +                             (uint64_t)ctx->poll_ns);
>  
>          if (max_ns) {
>              poll_set_started(ctx, true);
> @@ -576,6 +576,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      int ret = 0;
>      bool progress;
>      int64_t timeout;
> +    int64_t start = 0;
>  
>      aio_context_acquire(ctx);
>      progress = false;
> @@ -593,6 +594,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>  
>      ctx->walking_handlers++;
>  
> +    if (ctx->poll_max_ns) {
> +        start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    }
> +
>      if (try_poll_mode(ctx, blocking)) {
>          progress = true;
>      } else {
> @@ -635,6 +640,47 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          atomic_sub(&ctx->notify_me, 2);
>      }
>  
> +    /* Adjust polling time */
> +    if (ctx->poll_max_ns) {
> +        int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
> +
> +        if (block_ns <= ctx->poll_ns) {
> +            /* This is the sweet spot, no adjustment needed */
> +        } else if (block_ns > ctx->poll_max_ns) {
> +            /* We'd have to poll for too long, poll less */
> +            int64_t old = ctx->poll_ns;
> +
> +            if (ctx->poll_shrink) {
> +                ctx->poll_ns /= ctx->poll_shrink;
> +            } else {
> +                ctx->poll_ns = 0;
> +            }
> +
> +            trace_poll_shrink(ctx, old, ctx->poll_ns);
> +        } else if (ctx->poll_ns < ctx->poll_max_ns &&
> +                   block_ns < ctx->poll_max_ns) {
> +            /* There is room to grow, poll longer */
> +            int64_t old = ctx->poll_ns;
> +            int64_t grow = ctx->poll_grow;
> +
> +            if (grow == 0) {
> +                grow = 2;
> +            }
> +
> +            if (ctx->poll_ns) {
> +                ctx->poll_ns *= grow;
> +            } else {
> +                ctx->poll_ns = 4000; /* start polling at 4 microseconds */
> +            }
> +
> +            if (ctx->poll_ns > ctx->poll_max_ns) {
> +                ctx->poll_ns = ctx->poll_max_ns;
> +            }
> +
> +            trace_poll_grow(ctx, old, ctx->poll_ns);
> +        }
> +    }
> +
>      aio_notify_accept(ctx);
>  
>      /* if we have any readable fds, dispatch event */
> @@ -678,12 +724,16 @@ void aio_context_setup(AioContext *ctx)
>  #endif
>  }
>  
> -void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
> +void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
> +                                 int64_t grow, int64_t shrink, Error **errp)
>  {
> -    /* No thread synchronization here, it doesn't matter if an incorrect poll
> -     * timeout is used once.
> +    /* No thread synchronization here, it doesn't matter if an incorrect value
> +     * is used once.
>       */
>      ctx->poll_max_ns = max_ns;
> +    ctx->poll_ns = 0;
> +    ctx->poll_grow = grow;
> +    ctx->poll_shrink = shrink;
>  
>      aio_notify(ctx);
>  }
> diff --git a/aio-win32.c b/aio-win32.c
> index d0e40a8..d19dc42 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -395,7 +395,8 @@ void aio_context_setup(AioContext *ctx)
>  {
>  }
>  
> -void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns, Error **errp)
> +void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
> +                                 int64_t grow, int64_t shrink, Error **errp)
>  {
>      error_setg(errp, "AioContext polling is not implemented on Windows");
>  }
> diff --git a/async.c b/async.c
> index 29abf40..2960171 100644
> --- a/async.c
> +++ b/async.c
> @@ -385,7 +385,10 @@ AioContext *aio_context_new(Error **errp)
>      qemu_rec_mutex_init(&ctx->lock);
>      timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
>  
> +    ctx->poll_ns = 0;
>      ctx->poll_max_ns = 0;
> +    ctx->poll_grow = 0;
> +    ctx->poll_shrink = 0;
>  
>      return ctx;
>  fail:
> diff --git a/iothread.c b/iothread.c
> index 8dfd10d..28598b5 100644
> --- a/iothread.c
> +++ b/iothread.c
> @@ -98,7 +98,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>          return;
>      }
>  
> -    aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns,
> +    aio_context_set_poll_params(iothread->ctx, iothread->poll_max_ns, 0, 0,
>                                  &local_error);
>      if (local_error) {
>          error_propagate(errp, local_error);
> @@ -158,7 +158,7 @@ static void iothread_set_poll_max_ns(Object *obj, Visitor *v,
>      iothread->poll_max_ns = value;
>  
>      if (iothread->ctx) {
> -        aio_context_set_poll_params(iothread->ctx, value, &local_err);
> +        aio_context_set_poll_params(iothread->ctx, value, 0, 0, &local_err);
>      }
>  
>  out:
> diff --git a/trace-events b/trace-events
> index 7fe3a1b..1181486 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -28,6 +28,8 @@
>  # aio-posix.c
>  run_poll_handlers_begin(void *ctx, int64_t max_ns) "ctx %p max_ns %"PRId64
>  run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
> +poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
> +poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
>  
>  # thread-pool.c
>  thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
> 
* Re: [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time
2016-12-05 20:06 ` Christian Borntraeger
@ 2016-12-06 9:20 ` Stefan Hajnoczi
2016-12-06 10:12 ` Christian Borntraeger
0 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-06 9:20 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, Fam Zheng,
Karl Rister
On Mon, Dec 05, 2016 at 09:06:17PM +0100, Christian Borntraeger wrote:
> On 12/01/2016 08:26 PM, Stefan Hajnoczi wrote:
> > This patch is based on the algorithm for the kvm.ko halt_poll_ns
> > parameter in Linux. The initial polling time is zero.
> >
> > If the event loop is woken up within the maximum polling time it means
> > polling could be effective, so grow polling time.
> >
> > If the event loop is woken up beyond the maximum polling time it means
> > polling is not effective, so shrink polling time.
> >
> > If the event loop makes progress within the current polling time then
> > the sweet spot has been reached.
> >
> > This algorithm adjusts the polling time so it can adapt to variations in
> > workloads. The goal is to reach the sweet spot while also recognizing
> > when polling would hurt more than help.
> >
> > Two new trace events, poll_grow and poll_shrink, are added for observing
> > polling time adjustment.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> Not sure why, but I have 4 host ramdisks with the same iothread as guest
> virtio-blk. Running fio in the guest on one of these disks will poll, but as
> soon as I have 2 disks in fio I almost always see shrinks (so polling
> stays at 0) and almost no grows.
Shrinking occurs when polling + ppoll(2) time exceeds poll-max-ns.
What is the value of poll-max-ns and how long is run_poll_handlers_end -
run_poll_handlers_begin?
I wonder if polling both disks takes longer than poll-max-ns once you
have two disks. The "polling" activity includes processing the I/O
requests, so I imagine the time extends significantly as more disks have
I/O requests ready for processing.
Maybe the block_ns timing calculation should exclude processing time to
avoid false shrinking?
It also strikes me that there's a blind spot to the self-tuning
algorithm: imagine virtqueue kick via ppoll(2) + ioeventfd takes N
nanoseconds. Detecting new virtqueue buffers via polling takes M
nanoseconds. When M <= poll-max-ns < N the algorithm decides there is
no point in polling but it would actually be faster to poll. The reason
is that the algorithm only looks at block_ns, which is N, not M.
This seems difficult to tackle because the algorithm has no way of
predicting M unless it randomly tries to poll longer.
Stefan
* Re: [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode
2016-12-05 16:40 ` Karl Rister
@ 2016-12-06 9:27 ` Stefan Hajnoczi
0 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-06 9:27 UTC (permalink / raw)
To: Karl Rister
Cc: Stefan Hajnoczi, Fam Zheng, borntraeger, qemu-devel,
Paolo Bonzini
On Mon, Dec 05, 2016 at 10:40:49AM -0600, Karl Rister wrote:
> On 12/05/2016 08:56 AM, Stefan Hajnoczi wrote:
> <snip>
>
> > Karl: do you have time to run a bigger suite of benchmarks to identify a
> > reasonable default poll-max-ns value? Both aio=native and aio=threads
> > are important.
> >
> > If there is a sweet spot that improves performance without pathological
> > cases then we could even enable polling by default in QEMU.
> >
> > Otherwise we'd just document the recommended best polling duration as a
> > starting point for users.
> >
>
> I have collected a baseline on the latest patches and am currently
> collecting poll-max-ns=16384. I can certainly throw in a few more
> scenarios. Do we want to stick with powers of 2 or some other strategy?
Excellent, thanks! The algorithm self-tunes by doubling the poll time, so
there's not much advantage to looking at non-pow-2 values.
Thanks,
Stefan
* Re: [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers
2016-12-05 15:22 ` Paolo Bonzini
@ 2016-12-06 9:28 ` Stefan Hajnoczi
0 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-06 9:28 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, Fam Zheng, borntraeger, Karl Rister, qemu-devel,
v maffione
On Mon, Dec 05, 2016 at 10:22:12AM -0500, Paolo Bonzini wrote:
> > Anyway, I'm glad the polling series on its own is already showing good
> > performance results. I'd like to merge it at the beginning of QEMU 2.9
> > so we can work on further improvements that build on top of it.
>
> Yes, it would be great if you can stage the infrastructure bits, so that
> I can rebase my lockcnt patches on top.
I'll apply it to block-next as soon as Christian and Karl are happy with
the performance results.
Stefan
* Re: [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time
2016-12-06 9:20 ` Stefan Hajnoczi
@ 2016-12-06 10:12 ` Christian Borntraeger
2016-12-06 17:22 ` Stefan Hajnoczi
0 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2016-12-06 10:12 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, Fam Zheng,
Karl Rister
On 12/06/2016 10:20 AM, Stefan Hajnoczi wrote:
> On Mon, Dec 05, 2016 at 09:06:17PM +0100, Christian Borntraeger wrote:
>> On 12/01/2016 08:26 PM, Stefan Hajnoczi wrote:
>>> This patch is based on the algorithm for the kvm.ko halt_poll_ns
>>> parameter in Linux. The initial polling time is zero.
>>>
>>> If the event loop is woken up within the maximum polling time it means
>>> polling could be effective, so grow polling time.
>>>
>>> If the event loop is woken up beyond the maximum polling time it means
>>> polling is not effective, so shrink polling time.
>>>
>>> If the event loop makes progress within the current polling time then
>>> the sweet spot has been reached.
>>>
>>> This algorithm adjusts the polling time so it can adapt to variations in
>>> workloads. The goal is to reach the sweet spot while also recognizing
>>> when polling would hurt more than help.
>>>
>>> Two new trace events, poll_grow and poll_shrink, are added for observing
>>> polling time adjustment.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> Not sure why, but I have 4 host ramdisks with the same iothread as guest
>> virtio-blk. Running fio in the guest on one of these disks will poll, but as
>> soon as I have 2 disks in fio I almost always see shrinks (so polling
>> stays at 0) and almost no grows.
>
> Shrinking occurs when polling + ppoll(2) time exceeds poll-max-ns.
>
> What is the value of poll-max-ns
I used 50000ns as the poll value. When using 500000ns it is polling again.
> and how long is run_poll_handlers_end - run_poll_handlers_begin?
Too long. I looked again and realized that I used cache=none without
io=native. After adding io=native things are better. Even with 4 disks
polling still happens. So it seems that the mileage will vary depending
on the settings.
Christian
* Re: [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time
2016-12-06 10:12 ` Christian Borntraeger
@ 2016-12-06 17:22 ` Stefan Hajnoczi
0 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-12-06 17:22 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, Fam Zheng,
Karl Rister
On Tue, Dec 06, 2016 at 11:12:45AM +0100, Christian Borntraeger wrote:
> On 12/06/2016 10:20 AM, Stefan Hajnoczi wrote:
> > On Mon, Dec 05, 2016 at 09:06:17PM +0100, Christian Borntraeger wrote:
> >> On 12/01/2016 08:26 PM, Stefan Hajnoczi wrote:
> >>> This patch is based on the algorithm for the kvm.ko halt_poll_ns
> >>> parameter in Linux. The initial polling time is zero.
> >>>
> >>> If the event loop is woken up within the maximum polling time it means
> >>> polling could be effective, so grow polling time.
> >>>
> >>> If the event loop is woken up beyond the maximum polling time it means
> >>> polling is not effective, so shrink polling time.
> >>>
> >>> If the event loop makes progress within the current polling time then
> >>> the sweet spot has been reached.
> >>>
> >>> This algorithm adjusts the polling time so it can adapt to variations in
> >>> workloads. The goal is to reach the sweet spot while also recognizing
> >>> when polling would hurt more than help.
> >>>
> >>> Two new trace events, poll_grow and poll_shrink, are added for observing
> >>> polling time adjustment.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >>
> >> Not sure why, but I have 4 host ramdisks with the same iothread as guest
> >> virtio-blk. Running fio in the guest on one of these disks will poll, but as
> >> soon as I have 2 disks in fio I almost always see shrinks (so polling
> >> stays at 0) and almost no grows.
> >
> > Shrinking occurs when polling + ppoll(2) time exceeds poll-max-ns.
> >
> > What is the value of poll-max-ns
>
> I used 50000ns as the poll value. When using 500000ns it is polling again.
>
> > and how long is run_poll_handlers_end - run_poll_handlers_begin?
>
> Too long. I looked again and realized that I used cache=none without
> io=native. After adding io=native things are better. Even with 4 disks
> polling still happens. So it seems that the mileage will vary depending
> on the settings.
Okay, it could be the things I mentioned. The self-tuning can be too
conservative and poll less than we'd hope for. On the other hand it's
hard to predict the actual ideal polling amount so I guess being
conservative is alright. Maybe the algorithm can be refined later.
Stefan
end of thread, other threads: [~2016-12-06 17:23 UTC | newest]
Thread overview: 32+ messages
2016-12-01 19:26 [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 01/13] aio: add flag to skip fds to aio_dispatch() Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 02/13] aio: add AioPollFn and io_poll() interface Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 03/13] aio: add polling mode to AioContext Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 04/13] virtio: poll virtqueues for new buffers Stefan Hajnoczi
2016-12-05 0:54 ` Fam Zheng
2016-12-05 10:14 ` Stefan Hajnoczi
2016-12-05 10:59 ` Paolo Bonzini
2016-12-05 14:46 ` Stefan Hajnoczi
2016-12-05 15:22 ` Paolo Bonzini
2016-12-06 9:28 ` Stefan Hajnoczi
2016-12-05 11:12 ` Christian Borntraeger
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 05/13] linux-aio: poll ring for completions Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 06/13] iothread: add polling parameters Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 07/13] virtio-blk: suppress virtqueue kick during processing Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 08/13] virtio-scsi: " Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 09/13] virtio: turn vq->notification into a nested counter Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 10/13] aio: add .io_poll_begin/end() callbacks Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 11/13] virtio: disable virtqueue notifications during polling Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 12/13] aio: self-tune polling time Stefan Hajnoczi
2016-12-05 20:06 ` Christian Borntraeger
2016-12-06 9:20 ` Stefan Hajnoczi
2016-12-06 10:12 ` Christian Borntraeger
2016-12-06 17:22 ` Stefan Hajnoczi
2016-12-01 19:26 ` [Qemu-devel] [PATCH v4 13/13] iothread: add poll-grow and poll-shrink parameters Stefan Hajnoczi
2016-12-02 8:27 ` [Qemu-devel] [PATCH v4 00/13] aio: experimental virtio-blk polling mode Paolo Bonzini
2016-12-02 11:11 ` Stefan Hajnoczi
2016-12-05 10:25 ` Fam Zheng
2016-12-05 14:56 ` Stefan Hajnoczi
2016-12-05 16:40 ` Karl Rister
2016-12-06 9:27 ` Stefan Hajnoczi
2016-12-05 11:19 ` Christian Borntraeger