* [RFC 1/3] aio-posix: run aio_set_fd_handler() in target AioContext
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
@ 2023-12-13 21:15 ` Stefan Hajnoczi
2023-12-13 21:15 ` [RFC 2/3] aio: use counter instead of ctx->list_lock Stefan Hajnoczi
` (5 subsequent siblings)
6 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-13 21:15 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Hanna Reitz, Fam Zheng,
Fiona Ebner, Stefan Hajnoczi
TODO
- What about Windows?
Most of the event loop code runs in the AioContext's home thread. The
exceptions are aio_notify(), aio_bh_schedule_oneshot(), aio_set_fd_handler(),
etc. Amongst them, aio_set_fd_handler() is the most complicated because
the aio_handlers list must be kept thread-safe while also coping with
nested aio_poll() calls.
This patch eliminates the multi-threading concerns by moving the actual
work into a BH in the AioContext's home thread.
This change is required to call the AioHandler's io_poll_end() callback
from the AioContext's home thread in a later patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
util/aio-posix.c | 106 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 99 insertions(+), 7 deletions(-)
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 7f2c99729d..c5e944f30b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -97,13 +97,14 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
return true;
}
-void aio_set_fd_handler(AioContext *ctx,
- int fd,
- IOHandler *io_read,
- IOHandler *io_write,
- AioPollFn *io_poll,
- IOHandler *io_poll_ready,
- void *opaque)
+/* Perform aio_set_fd_handler() in this thread's AioContext */
+static void aio_set_fd_handler_local(AioContext *ctx,
+ int fd,
+ IOHandler *io_read,
+ IOHandler *io_write,
+ AioPollFn *io_poll,
+ IOHandler *io_poll_ready,
+ void *opaque)
{
AioHandler *node;
AioHandler *new_node = NULL;
@@ -178,6 +179,97 @@ void aio_set_fd_handler(AioContext *ctx,
}
}
+typedef struct {
+ AioContext *ctx;
+ int fd;
+ IOHandler *io_read;
+ IOHandler *io_write;
+ AioPollFn *io_poll;
+ IOHandler *io_poll_ready;
+ void *opaque;
+ QemuEvent done;
+} AioSetFdHandlerRemote;
+
+static void aio_set_fd_handler_remote_bh(void *opaque)
+{
+ AioSetFdHandlerRemote *data = opaque;
+
+ aio_set_fd_handler_local(data->ctx, data->fd, data->io_read,
+ data->io_write, data->io_poll,
+ data->io_poll_ready, data->opaque);
+ qemu_event_set(&data->done);
+}
+
+/* Perform aio_set_fd_handler() in another thread's AioContext */
+static void aio_set_fd_handler_remote(AioContext *ctx,
+ int fd,
+ IOHandler *io_read,
+ IOHandler *io_write,
+ AioPollFn *io_poll,
+ IOHandler *io_poll_ready,
+ void *opaque)
+{
+ AioSetFdHandlerRemote data = {
+ .ctx = ctx,
+ .fd = fd,
+ .io_read = io_read,
+ .io_write = io_write,
+ .io_poll = io_poll,
+ .io_poll_ready = io_poll_ready,
+ .opaque = opaque,
+ };
+
+ /*
+ * Arbitrary threads waiting for each other can deadlock, so only allow
+ * cross-thread aio_set_fd_handler() when the BQL is held.
+ */
+ assert(qemu_in_main_thread());
+
+ qemu_event_init(&data.done, false);
+
+ aio_bh_schedule_oneshot(ctx, aio_set_fd_handler_remote_bh, &data);
+
+ /*
+ * The BQL is not dropped when run from the main loop thread so the
+ * assumption is that this wait is fast.
+ */
+ qemu_event_wait(&data.done);
+
+ qemu_event_destroy(&data.done);
+}
+
+void aio_set_fd_handler(AioContext *ctx,
+ int fd,
+ IOHandler *io_read,
+ IOHandler *io_write,
+ AioPollFn *io_poll,
+ IOHandler *io_poll_ready,
+ void *opaque)
+{
+ /*
+ * Special case for ctx->notifier: it's not possible to rely on
+ * in_aio_context_home_thread() or iohandler_get_aio_context() below when
+ * aio_context_new() calls aio_set_fd_handler() on ctx->notifier.
+ * qemu_set_current_aio_context() and iohandler_ctx haven't been set up yet
+ * at this point. Treat ctx as local when dealing with ctx->notifier.
+ */
+ bool is_ctx_notifier = fd == event_notifier_get_fd(&ctx->notifier);
+
+ /*
+ * iohandler_ctx is special in that it runs in the main thread, but that
+ * thread's context is qemu_aio_context.
+ */
+ if (is_ctx_notifier ||
+ in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
+ qemu_get_aio_context() : ctx)) {
+ aio_set_fd_handler_local(ctx, fd, io_read, io_write, io_poll,
+ io_poll_ready, opaque);
+ } else {
+ aio_set_fd_handler_remote(ctx, fd, io_read, io_write, io_poll,
+ io_poll_ready, opaque);
+ }
+}
+
static void aio_set_fd_poll(AioContext *ctx, int fd,
IOHandler *io_poll_begin,
IOHandler *io_poll_end)
--
2.43.0
* [RFC 2/3] aio: use counter instead of ctx->list_lock
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
2023-12-13 21:15 ` [RFC 1/3] aio-posix: run aio_set_fd_handler() in target AioContext Stefan Hajnoczi
@ 2023-12-13 21:15 ` Stefan Hajnoczi
2023-12-13 21:15 ` [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
` (4 subsequent siblings)
6 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-13 21:15 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Hanna Reitz, Fam Zheng,
Fiona Ebner, Stefan Hajnoczi
TODO: further simplifications may be possible, like using non-_RCU() macros for the aio_handlers QLIST
Now that aio_set_fd_handler() uses a BH to schedule itself in remote
AioContexts, it is no longer necessary to worry about multi-threading.
Replace the ctx->list_lock locked counter with a plain uint32_t counter
called ctx->walking_handlers.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/aio.h | 22 ++++++----------------
util/aio-posix.c | 44 ++++++++++++++++----------------------------
util/async.c | 2 --
util/fdmon-epoll.c | 6 +-----
4 files changed, 23 insertions(+), 51 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index af05512a7d..569e704411 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -71,8 +71,6 @@ typedef struct {
* removed
*
* Add/remove/modify a monitored file descriptor.
- *
- * Called with ctx->list_lock acquired.
*/
void (*update)(AioContext *ctx, AioHandler *old_node, AioHandler *new_node);
@@ -84,7 +82,7 @@ typedef struct {
*
* Wait for file descriptors to become ready and place them on ready_list.
*
- * Called with ctx->list_lock incremented but not locked.
+ * Called with ctx->walking_handlers incremented.
*
* Returns: number of ready file descriptors.
*/
@@ -136,10 +134,10 @@ struct AioContext {
*/
BdrvGraphRWlock *bdrv_graph;
- /* The list of registered AIO handlers. Protected by ctx->list_lock. */
+ /* The list of registered AIO handlers. */
AioHandlerList aio_handlers;
- /* The list of AIO handlers to be deleted. Protected by ctx->list_lock. */
+ /* The list of AIO handlers to be deleted. */
AioHandlerList deleted_aio_handlers;
/* Used to avoid unnecessary event_notifier_set calls in aio_notify;
@@ -171,11 +169,8 @@ struct AioContext {
*/
uint32_t notify_me;
- /* A lock to protect between QEMUBH and AioHandler adders and deleter,
- * and to ensure that no callbacks are removed while we're walking and
- * dispatching them.
- */
- QemuLockCnt list_lock;
+ /* Re-entrancy counter for iterating over aio_handlers */
+ uint32_t walking_handlers;
/* Bottom Halves pending aio_bh_poll() processing */
BHList bh_list;
@@ -236,12 +231,7 @@ struct AioContext {
/* AIO engine parameters */
int64_t aio_max_batch; /* maximum number of requests in a batch */
- /*
- * List of handlers participating in userspace polling. Protected by
- * ctx->list_lock. Iterated and modified mostly by the event loop thread
- * from aio_poll() with ctx->list_lock incremented. aio_set_fd_handler()
- * only touches the list to delete nodes if ctx->list_lock's count is zero.
- */
+ /* List of handlers participating in userspace polling. */
AioHandlerList poll_aio_handlers;
/* Are we in polling mode or monitoring file descriptors? */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index c5e944f30b..86e232e9d3 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -84,7 +84,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
}
/* If a read is in progress, just mark the node as deleted */
- if (qemu_lockcnt_count(&ctx->list_lock)) {
+ if (ctx->walking_handlers > 0) {
QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
return false;
}
@@ -116,14 +116,11 @@ static void aio_set_fd_handler_local(AioContext *ctx,
io_poll = NULL; /* polling only makes sense if there is a handler */
}
- qemu_lockcnt_lock(&ctx->list_lock);
-
node = find_aio_handler(ctx, fd);
/* Are we deleting the fd handler? */
if (!io_read && !io_write && !io_poll) {
if (node == NULL) {
- qemu_lockcnt_unlock(&ctx->list_lock);
return;
}
/* Clean events in order to unregister fd from the ctx epoll. */
@@ -171,7 +168,6 @@ static void aio_set_fd_handler_local(AioContext *ctx,
if (node) {
deleted = aio_remove_fd_handler(ctx, node);
}
- qemu_lockcnt_unlock(&ctx->list_lock);
aio_notify(ctx);
if (deleted) {
@@ -317,7 +313,7 @@ static bool poll_set_started(AioContext *ctx, AioHandlerList *ready_list,
ctx->poll_started = started;
- qemu_lockcnt_inc(&ctx->list_lock);
+ ctx->walking_handlers++;
QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
IOHandler *fn;
@@ -341,7 +337,7 @@ static bool poll_set_started(AioContext *ctx, AioHandlerList *ready_list,
progress = true;
}
}
- qemu_lockcnt_dec(&ctx->list_lock);
+ ctx->walking_handlers--;
return progress;
}
@@ -363,12 +359,7 @@ bool aio_pending(AioContext *ctx)
AioHandler *node;
bool result = false;
- /*
- * We have to walk very carefully in case aio_set_fd_handler is
- * called while we're walking.
- */
- qemu_lockcnt_inc(&ctx->list_lock);
-
+ ctx->walking_handlers++;
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
int revents;
@@ -383,19 +374,17 @@ bool aio_pending(AioContext *ctx)
break;
}
}
- qemu_lockcnt_dec(&ctx->list_lock);
+ ctx->walking_handlers--;
return result;
}
+/* Caller must not have ctx->walking_handlers incremented */
static void aio_free_deleted_handlers(AioContext *ctx)
{
AioHandler *node;
- if (QLIST_EMPTY_RCU(&ctx->deleted_aio_handlers)) {
- return;
- }
- if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
+ if (ctx->walking_handlers > 0) {
return; /* we are nested, let the parent do the freeing */
}
@@ -405,8 +394,6 @@ static void aio_free_deleted_handlers(AioContext *ctx)
QLIST_SAFE_REMOVE(node, node_poll);
g_free(node);
}
-
- qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
}
static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
@@ -511,11 +498,12 @@ static bool aio_dispatch_handlers(AioContext *ctx)
void aio_dispatch(AioContext *ctx)
{
- qemu_lockcnt_inc(&ctx->list_lock);
+ ctx->walking_handlers++;
aio_bh_poll(ctx);
aio_dispatch_handlers(ctx);
+ ctx->walking_handlers--;
+
aio_free_deleted_handlers(ctx);
- qemu_lockcnt_dec(&ctx->list_lock);
timerlistgroup_run_timers(&ctx->tlg);
}
@@ -607,7 +595,7 @@ static bool remove_idle_poll_handlers(AioContext *ctx,
*
* Polls for a given time.
*
- * Note that the caller must have incremented ctx->list_lock.
+ * Note that the caller must have incremented ctx->walking_handlers.
*
* Returns: true if progress was made, false otherwise
*/
@@ -617,7 +605,7 @@ static bool run_poll_handlers(AioContext *ctx, AioHandlerList *ready_list,
bool progress;
int64_t start_time, elapsed_time;
- assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
+ assert(ctx->walking_handlers > 0);
trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
@@ -663,7 +651,7 @@ static bool run_poll_handlers(AioContext *ctx, AioHandlerList *ready_list,
* @timeout: timeout for blocking wait, computed by the caller and updated if
* polling succeeds.
*
- * Note that the caller must have incremented ctx->list_lock.
+ * Note that the caller must have incremented ctx->walking_handlers.
*
* Returns: true if progress was made, false otherwise
*/
@@ -711,7 +699,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
assert(in_aio_context_home_thread(ctx == iohandler_get_aio_context() ?
qemu_get_aio_context() : ctx));
- qemu_lockcnt_inc(&ctx->list_lock);
+ ctx->walking_handlers++;
if (ctx->poll_max_ns) {
start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -814,9 +802,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
progress |= aio_bh_poll(ctx);
progress |= aio_dispatch_ready_handlers(ctx, &ready_list);
- aio_free_deleted_handlers(ctx);
+ ctx->walking_handlers--;
- qemu_lockcnt_dec(&ctx->list_lock);
+ aio_free_deleted_handlers(ctx);
progress |= timerlistgroup_run_timers(&ctx->tlg);
diff --git a/util/async.c b/util/async.c
index 460529057c..19dd9efce1 100644
--- a/util/async.c
+++ b/util/async.c
@@ -412,7 +412,6 @@ aio_ctx_finalize(GSource *source)
aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL, NULL);
event_notifier_cleanup(&ctx->notifier);
qemu_rec_mutex_destroy(&ctx->lock);
- qemu_lockcnt_destroy(&ctx->list_lock);
timerlistgroup_deinit(&ctx->tlg);
unregister_aiocontext(ctx);
aio_context_destroy(ctx);
@@ -585,7 +584,6 @@ AioContext *aio_context_new(Error **errp)
goto fail;
}
g_source_set_can_recurse(&ctx->source, true);
- qemu_lockcnt_init(&ctx->list_lock);
ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx);
QSLIST_INIT(&ctx->scheduled_coroutines);
diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index c6413cb18f..ec7c8effc9 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -132,15 +132,11 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
return false;
}
- /* The list must not change while we add fds to epoll */
- if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
+ if (ctx->walking_handlers > 1) {
return false;
}
ok = fdmon_epoll_try_enable(ctx);
-
- qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
-
if (!ok) {
fdmon_epoll_disable(ctx);
}
--
2.43.0
* [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
2023-12-13 21:15 ` [RFC 1/3] aio-posix: run aio_set_fd_handler() in target AioContext Stefan Hajnoczi
2023-12-13 21:15 ` [RFC 2/3] aio: use counter instead of ctx->list_lock Stefan Hajnoczi
@ 2023-12-13 21:15 ` Stefan Hajnoczi
2023-12-13 21:52 ` Paolo Bonzini
2023-12-13 21:52 ` [RFC 0/3] " Stefan Hajnoczi
` (3 subsequent siblings)
6 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-13 21:15 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Hanna Reitz, Fam Zheng,
Fiona Ebner, Stefan Hajnoczi
Hanna Czenczek found the root cause for --device
virtio-scsi-pci,iothread= hangs across hotplug/unplug. When AioContext
polling has started on the virtqueue host notifier in the IOThread and
the main loop thread calls aio_set_fd_handler(), then the io_poll_end()
callback is not invoked on the virtqueue host notifier. The virtqueue is
left with notifications disabled and never detects guest activity once the
notifier is attached again (because virtqueue kicks are not enabled and AioContext
only starts polling after the fd becomes active for the first time). The
result is that the virtio-scsi device stops responding to its virtqueue.
Previous patches made aio_set_fd_handler() run in the AioContext's home
thread so it's now possible to call ->io_poll_end() to solve this bug.
Buglink: https://issues.redhat.com/browse/RHEL-3934
Cc: Hanna Czenczek <hreitz@redhat.com>
Cc: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
util/aio-posix.c | 53 ++++++++++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 86e232e9d3..2245a9659b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -64,8 +64,11 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
return NULL;
}
-static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
+static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
{
+ bool poll_ready;
+ bool delete_node = false;
+
/* If the GSource is in the process of being destroyed then
* g_source_remove_poll() causes an assertion failure. Skip
* removal in that case, because glib cleans up its state during
@@ -76,25 +79,40 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
}
node->pfd.revents = 0;
+ poll_ready = node->poll_ready;
node->poll_ready = false;
/* If the fd monitor has already marked it deleted, leave it alone */
- if (QLIST_IS_INSERTED(node, node_deleted)) {
- return false;
+ if (!QLIST_IS_INSERTED(node, node_deleted)) {
+ /* If a read is in progress, just mark the node as deleted */
+ if (ctx->walking_handlers > 0) {
+ QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+ } else {
+ /* Otherwise, delete it for real. We can't just mark it as
+ * deleted because deleted nodes are only cleaned up while
+ * no one is walking the handlers list.
+ */
+ QLIST_SAFE_REMOVE(node, node_poll);
+ QLIST_REMOVE(node, node);
+ delete_node = true;
+ }
}
- /* If a read is in progress, just mark the node as deleted */
- if (ctx->walking_handlers > 0) {
- QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
- return false;
+ /* If polling was started on the node then end it now */
+ if (ctx->poll_started && node->io_poll_end) {
+ node->io_poll_end(node->opaque);
+
+ /* Poll one last time in case ->io_poll_end() raced with the event */
+ if (node->io_poll(node->opaque)) {
+ poll_ready = true;
+ }
+ }
+ if (poll_ready) {
+ node->io_poll_ready(node->opaque);
+ }
+ if (delete_node) {
+ g_free(node);
}
- /* Otherwise, delete it for real. We can't just mark it as
- * deleted because deleted nodes are only cleaned up while
- * no one is walking the handlers list.
- */
- QLIST_SAFE_REMOVE(node, node_poll);
- QLIST_REMOVE(node, node);
- return true;
}
/* Perform aio_set_fd_handler() in this thread's AioContext */
@@ -109,7 +127,6 @@ static void aio_set_fd_handler_local(AioContext *ctx,
AioHandler *node;
AioHandler *new_node = NULL;
bool is_new = false;
- bool deleted = false;
int poll_disable_change;
if (io_poll && !io_poll_ready) {
@@ -166,13 +183,9 @@ static void aio_set_fd_handler_local(AioContext *ctx,
ctx->fdmon_ops->update(ctx, node, new_node);
if (node) {
- deleted = aio_remove_fd_handler(ctx, node);
+ aio_remove_fd_handler(ctx, node);
}
aio_notify(ctx);
-
- if (deleted) {
- g_free(node);
- }
}
typedef struct {
--
2.43.0
* Re: [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 ` [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
@ 2023-12-13 21:52 ` Paolo Bonzini
2023-12-14 20:12 ` Stefan Hajnoczi
0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2023-12-13 21:52 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz, Fam Zheng,
Fiona Ebner
On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> - /* If a read is in progress, just mark the node as deleted */
> - if (ctx->walking_handlers > 0) {
> - QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
> - return false;
> + /* If polling was started on the node then end it now */
> + if (ctx->poll_started && node->io_poll_end) {
> + node->io_poll_end(node->opaque);
> +
> + /* Poll one last time in case ->io_poll_end() raced with the event */
> + if (node->io_poll(node->opaque)) {
> + poll_ready = true;
> + }
Do you need this at all? If the caller is removing the handlers, they
should have put themselves in a state where they don't care about the
file descriptor becoming readable.
You still have to be careful because aio_bh_schedule_oneshot() can
trigger remote nested event loops (which can cause deadlocks and, I am
especially afraid, can take some time and invalidate the expectation
that you don't need to drop the BQL). But it does simplify this patch
quite a bit.
Paolo
* Re: [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:52 ` Paolo Bonzini
@ 2023-12-14 20:12 ` Stefan Hajnoczi
2023-12-14 20:39 ` Paolo Bonzini
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-14 20:12 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz, Fam Zheng,
Fiona Ebner
On Wed, Dec 13, 2023 at 10:52:58PM +0100, Paolo Bonzini wrote:
> On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > - /* If a read is in progress, just mark the node as deleted */
> > - if (ctx->walking_handlers > 0) {
> > - QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
> > - return false;
> > + /* If polling was started on the node then end it now */
> > + if (ctx->poll_started && node->io_poll_end) {
> > + node->io_poll_end(node->opaque);
> > +
> > + /* Poll one last time in case ->io_poll_end() raced with the event */
> > + if (node->io_poll(node->opaque)) {
> > + poll_ready = true;
> > + }
>
> Do you need this at all? If the caller is removing the handlers, they
> should have put themselves in a state where they don't care about the
> file descriptor becoming readable.
The caller no longer wishes to monitor the fd. This may be temporary
though. The fd must stay readable so that the caller can monitor it
again in the future and handle pending events.
->io_poll_begin() and ->io_poll_end() are used to bypass the fd while in
polling mode (e.g. disabling virtqueue kicks). This is a performance
optimization that avoids wasting cycles servicing fds when we're polling
anyway.
When the caller does:
1. aio_set_fd_handler(ctx, fd, NULL, ...) // stop monitoring
2. aio_set_fd_handler(ctx, fd, read_fd, ...) // start monitoring
Then read_fd() should be called if the fd became readable between 1 and 2.
Since the fd may be bypassed until ->io_poll_end() returns, we must poll
one last time to check if an event snuck in right at the end without
making the fd readable. If polling detected an event, then we must do
something. We cannot drop the event.
(An alternative is to poll once before monitoring the fd again. That way
pending events are detected even if the fd wasn't readable. That is
currently not the way aio-posix.c works though.)
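(To illustrate, a minimal sketch of that sequence; the function and its
parameters are hypothetical, not an existing API:)

static void pause_and_resume_monitoring(AioContext *ctx, int fd,
                                        IOHandler *read_fd,
                                        AioPollFn *io_poll,
                                        IOHandler *io_poll_ready,
                                        void *opaque)
{
    aio_set_fd_handler(ctx, fd, NULL, NULL, NULL, NULL, NULL);     /* 1 */
    /* an event may arrive here while the fd is still being bypassed */
    aio_set_fd_handler(ctx, fd, read_fd, NULL, io_poll, io_poll_ready,
                       opaque);                                    /* 2 */
    /* read_fd()/io_poll_ready() must still run for that event */
}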
Stefan
* Re: [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-14 20:12 ` Stefan Hajnoczi
@ 2023-12-14 20:39 ` Paolo Bonzini
2023-12-18 14:27 ` Stefan Hajnoczi
0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2023-12-14 20:39 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, open list:Block layer core, Hanna Reitz,
Fam Zheng, Fiona Ebner
Il gio 14 dic 2023, 21:12 Stefan Hajnoczi <stefanha@redhat.com> ha scritto:
> Since the fd may be bypassed until ->io_poll_end() returns, we must poll
> one last time to check if an event snuck in right at the end without
> making the fd readable. If polling detected an event, then we must do
> something. We cannot drop the event
I agree that in general we cannot. I wonder however if, in the (already
racy) case of a concurrent aio_set_fd_handler(ctx, fd, NULL, ...), you
really need to call poll_ready here.
>
> (An alternative is to poll once before monitoring the fd again. That way
> pending events are detected even if the fd wasn't readable. That is
> currently not the way aio-posix.c works though.)
>
Yes, that would be a change. If I understood correctly Hanna's suggestions
in the issue, she also mentioned doing a manual virtqueue notification
before monitoring restarts. So basically my idea boils down to implementing
that, and then cleaning up everything on top.
Paolo
> Stefan
>
* Re: [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-14 20:39 ` Paolo Bonzini
@ 2023-12-18 14:27 ` Stefan Hajnoczi
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-18 14:27 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Kevin Wolf, open list:Block layer core, Hanna Reitz,
Fam Zheng, Fiona Ebner
On Thu, Dec 14, 2023 at 09:39:02PM +0100, Paolo Bonzini wrote:
> Il gio 14 dic 2023, 21:12 Stefan Hajnoczi <stefanha@redhat.com> ha scritto:
>
> > Since the fd may be bypassed until ->io_poll_end() returns, we must poll
> > one last time to check if an event snuck in right at the end without
> > making the fd readable. If polling detected an event, then we must do
> > something. We cannot drop the event
>
>
> I agree that in general we cannot. I wonder however if, in the (already
> racy) case of a concurrent aio_set_fd_handler(ctx, fd, NULL, ...), you
> really need to call poll_ready here.
It doesn't need to happen here but it needs to be recorded so that the
handler (either poll_ready or the fd read handler) runs later. In the
case of eventfds it's easy to write(fd, ...) so that activity will be
picked up again when file descriptor monitoring resumes.
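(As a sketch, outside of QEMU that boils down to something like the
following; within QEMU, event_notifier_set() wraps this:)

#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Make the eventfd readable so the read handler fires once monitoring of
 * the fd resumes. */
static void rearm_eventfd(int efd)
{
    uint64_t value = 1;
    ssize_t ret;

    do {
        ret = write(efd, &value, sizeof(value));
    } while (ret < 0 && errno == EINTR);
}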
> >
> > (An alternative is to poll once before monitoring the fd again. That way
> > pending events are detected even if the fd wasn't readable. That is
> > currently not the way aio-posix.c works though.)
> >
>
> Yes, that would be a change. If I understood correctly Hanna's suggestions
> in the issue, she also mentioned doing a manual virtqueue notification
> before monitoring restarts. So basically my idea boils down to implementing
> that, and then cleaning up everything on top.
Okay.
Stefan
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
` (2 preceding siblings ...)
2023-12-13 21:15 ` [RFC 3/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
@ 2023-12-13 21:52 ` Stefan Hajnoczi
2023-12-13 23:10 ` Paolo Bonzini
` (2 subsequent siblings)
6 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-13 21:52 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, qemu-block, pbonzini, Hanna Reitz,
Fam Zheng, Fiona Ebner
Based-on: 20231205182011.1976568-1-stefanha@redhat.com
(https://lore.kernel.org/qemu-devel/20231205182011.1976568-1-stefanha@redhat.com/)
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
` (3 preceding siblings ...)
2023-12-13 21:52 ` [RFC 0/3] " Stefan Hajnoczi
@ 2023-12-13 23:10 ` Paolo Bonzini
2023-12-14 19:52 ` Stefan Hajnoczi
2023-12-14 13:38 ` Fiona Ebner
2024-01-02 15:24 ` Hanna Czenczek
6 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2023-12-13 23:10 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz, Fam Zheng,
Fiona Ebner
On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Alternatives welcome! (A cleaner version of this approach might be to forbid
> cross-thread aio_set_fd_handler() calls and to refactor all
> aio_set_fd_handler() callers so they come from the AioContext's home thread.
> I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs
> should be thread-safe.)
I think that's pretty hard because aio_set_fd_handler() is a pretty
important part of the handoff from one AioContext to another and also
of drained_begin()/end(), and both of these things run in the main
thread.
Regarding how to solve this issue, there is a lot of
"underdocumenting" of the locking policy in aio-posix.c, and indeed it
makes running aio_set_fd_handler() in the target AioContext tempting;
but it is also scary to rely on the iothread being able to react
quickly. I'm also worried that we're changing the logic just because
we don't understand the old one, but then we add technical debt.
So, as a first step, I would take inspiration from the block layer
locking work, and add assertions to functions like poll_set_started()
or find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked?
Are we in the iothread? And likewise, for each list, does insertion
happen from the iothread or with the list_lock taken (and possibly
elevated)? Does removal happen from the iothread or with list_lock
zero+taken?
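(As a sketch of the kind of assertion meant here; which predicate applies
to which function is exactly what would have to be worked out:)

    /* e.g. at the top of poll_set_started(), which runs in the event loop
     * thread with the handler list pinned by the caller: */
    assert(in_aio_context_home_thread(ctx));
    assert(qemu_lockcnt_count(&ctx->list_lock) > 0);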
After this step, we should have a clearer idea of the possible states
of the node (based on the lists, the state is a subset of
{poll_started, deleted, ready}) and draw a nice graph of the
transitions. We should also understand if any calls to
QLIST_IS_INSERTED() have correctness issues.
Good news, I don't think any memory barriers are needed here. One
thing that we already do correctly is that, once a node is deleted, we
try to skip work; see for example poll_set_started(). This also
provides a good place to do cleanup work for deleted nodes, including
calling poll_end(): aio_free_deleted_handlers(), because it runs with
list_lock zero and taken, just like the tail of
aio_remove_fd_handler(). It's the safest possible place to do cleanup
and to take a lock. Therefore we have:
- a fast path in the iothread that runs without any concurrence with
stuff happening in the main thread
- a slow path in the iothread that runs with list_lock zero and taken.
The slow path shares logic with the main thread, meaning that
aio_free_deleted_handlers() and aio_remove_fd_handler() should share
some functions called by both.
If the code is organized this way, any wrong bits should jump out more
easily. For example, these two lines in aio_remove_fd_handler() are
clearly misplaced
node->pfd.revents = 0;
node->poll_ready = false;
because they run in the main thread but they touch iothread data! They
should be after qemu_lockcnt_count() is checked to be zero.
Regarding the call to io_poll_ready(), I would hope that it is
unnecessary; in other words, that after drained_end() the virtqueue
notification would be raised. Yes, virtio_queue_set_notification is
edge triggered rather than level triggered, so it would be necessary
to add a check with virtio_queue_host_notifier_aio_poll() and
virtio_queue_host_notifier_aio_poll_ready() in
virtio_queue_aio_attach_host_notifier, but that does not seem too bad
because virtio is the only user of the io_poll_begin and io_poll_end
callbacks. It would have to be documented though.
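(A sketch of that idea, assuming the existing static poll helpers in
hw/virtio/virtio.c keep their names and signatures:)

void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
    aio_set_event_notifier(ctx, &vq->host_notifier,
                           virtio_queue_host_notifier_read,
                           virtio_queue_host_notifier_aio_poll,
                           virtio_queue_host_notifier_aio_poll_ready);
    aio_set_event_notifier_poll(ctx, &vq->host_notifier,
                                virtio_queue_host_notifier_aio_poll_begin,
                                virtio_queue_host_notifier_aio_poll_end);

    /* The notifier is edge-triggered: explicitly check for requests that
     * arrived while it was detached. */
    if (virtio_queue_host_notifier_aio_poll(&vq->host_notifier)) {
        virtio_queue_host_notifier_aio_poll_ready(&vq->host_notifier);
    }
}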
Paolo
Paolo
>
> Stefan Hajnoczi (3):
> aio-posix: run aio_set_fd_handler() in target AioContext
> aio: use counter instead of ctx->list_lock
> aio-posix: call ->poll_end() when removing AioHandler
>
> include/block/aio.h | 22 ++---
> util/aio-posix.c | 197 ++++++++++++++++++++++++++++++++------------
> util/async.c | 2 -
> util/fdmon-epoll.c | 6 +-
> 4 files changed, 152 insertions(+), 75 deletions(-)
>
> --
> 2.43.0
>
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 23:10 ` Paolo Bonzini
@ 2023-12-14 19:52 ` Stefan Hajnoczi
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-14 19:52 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz, Fam Zheng,
Fiona Ebner
On Thu, Dec 14, 2023 at 12:10:32AM +0100, Paolo Bonzini wrote:
> On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Alternatives welcome! (A cleaner version of this approach might be to forbid
> > cross-thread aio_set_fd_handler() calls and to refactor all
> > aio_set_fd_handler() callers so they come from the AioContext's home thread.
> > I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs
> > should be thread-safe.)
>
> I think that's pretty hard because aio_set_fd_handler() is a pretty
> important part of the handoff from one AioContext to another and also
> of drained_begin()/end(), and both of these things run in the main
> thread.
>
> Regarding how to solve this issue, there is a lot of
> "underdocumenting" of the locking policy in aio-posix.c, and indeed it
> makes running aio_set_fd_handler() in the target AioContext tempting;
> but it is also scary to rely on the iothread being able to react
> quickly. I'm also worried that we're changing the logic just because
> we don't understand the old one, but then we add technical debt.
>
> So, as a first step, I would take inspiration from the block layer
> locking work, and add assertions to functions like poll_set_started()
> or find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked?
> Are we in the iothread? And likewise, for each list, does insertion
> happen from the iothread or with the list_lock taken (and possibly
> elevated)? Does removal happen from the iothread or with list_lock
> zero+taken?
>
> After this step, we should have a clearer idea of the possible states
> of the node (based on the lists, the state is a subset of
> {poll_started, deleted, ready}) and draw a nice graph of the
> transitions. We should also understand if any calls to
> QLIST_IS_INSERTED() have correctness issues.
>
> Good news, I don't think any memory barriers are needed here. One
> thing that we already do correctly is that, once a node is deleted, we
> try to skip work; see for example poll_set_started(). This also
> provides a good place to do cleanup work for deleted nodes, including
> calling poll_end(): aio_free_deleted_handlers(), because it runs with
> list_lock zero and taken, just like the tail of
> aio_remove_fd_handler(). It's the safest possible place to do cleanup
> and to take a lock. Therefore we have:
>
> - a fast path in the iothread that runs without any concurrence with
> stuff happening in the main thread
>
> - a slow path in the iothread that runs with list_lock zero and taken.
> The slow path shares logic with the main thread, meaning that
> aio_free_deleted_handlers() and aio_remove_fd_handler() should share
> some functions called by both.
>
> If the code is organized this way, any wrong bits should jump out more
> easily. For example, these two lines in aio_remove_fd_handler() are
> clearly misplaced
>
> node->pfd.revents = 0;
> node->poll_ready = false;
>
> because they run in the main thread but they touch iothread data! They
> should be after qemu_lockcnt_count() is checked to be zero.
>
> Regarding the call to io_poll_ready(), I would hope that it is
> unnecessary; in other words, that after drained_end() the virtqueue
> notification would be raised. Yes, virtio_queue_set_notification is
> edge triggered rather than level triggered, so it would be necessary
> to add a check with virtio_queue_host_notifier_aio_poll() and
> virtio_queue_host_notifier_aio_poll_ready() in
> virtio_queue_aio_attach_host_notifier, but that does not seem too bad
> because virtio is the only user of the io_poll_begin and io_poll_end
> callbacks. It would have to be documented though.
I think Hanna had the same idea: document that ->io_poll_end() isn't
called by aio_set_fd_handler() and shift the responsibility onto the
caller to get back into a state where notifications are enabled before
they add the fd with aio_set_fd_handler() again.
In a little more detail, the caller needs to do the following before
adding the fd back with aio_set_fd_handler() again:
1. Call ->io_poll_end().
2. Poll one more time in case an event slipped in and write to the
eventfd so the fd is immediately readable or call ->io_poll_ready().
I think this is more or less what you described above.
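(As a sketch, the caller-side steps could be wrapped up like this;
prepare_to_reattach() is a hypothetical helper, not an existing API:)

/* io_poll_end/io_poll/opaque are whatever the caller previously registered
 * with aio_set_fd_handler(). */
static void prepare_to_reattach(EventNotifier *notifier,
                                IOHandler *io_poll_end,
                                AioPollFn *io_poll,
                                void *opaque)
{
    if (io_poll_end) {
        /* 1. Leave poll mode, e.g. re-enable virtqueue kicks */
        io_poll_end(opaque);
    }
    if (io_poll && io_poll(opaque)) {
        /* 2. An event slipped in: make the fd readable so the handler runs
         *    once monitoring resumes */
        event_notifier_set(notifier);
    }
}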
I don't like pushing this responsibility onto the caller, but adding a
synchronization point in aio_set_fd_handler() is problematic, so let's
give it a try. I'll try that approach and send a v2.
Stefan
>
> Paolo
>
>
> Paolo
>
> >
> > Stefan Hajnoczi (3):
> > aio-posix: run aio_set_fd_handler() in target AioContext
> > aio: use counter instead of ctx->list_lock
> > aio-posix: call ->poll_end() when removing AioHandler
> >
> > include/block/aio.h | 22 ++---
> > util/aio-posix.c | 197 ++++++++++++++++++++++++++++++++------------
> > util/async.c | 2 -
> > util/fdmon-epoll.c | 6 +-
> > 4 files changed, 152 insertions(+), 75 deletions(-)
> >
> > --
> > 2.43.0
> >
>
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
` (4 preceding siblings ...)
2023-12-13 23:10 ` Paolo Bonzini
@ 2023-12-14 13:38 ` Fiona Ebner
2023-12-14 19:53 ` Stefan Hajnoczi
2024-01-02 15:24 ` Hanna Czenczek
6 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-12-14 13:38 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Hanna Reitz, Fam Zheng
Am 13.12.23 um 22:15 schrieb Stefan Hajnoczi:
> But there you have it. Please let me know what you think and try your
> reproducers to see if this fixes the missing io_poll_end() issue. Thanks!
>
Thanks to you! I applied the RFC (and the series it depends on) on top
of 8.2.0-rc4 and this fixes my reproducer which drains VirtIO SCSI or
VirtIO block devices in a loop. Also didn't encounter any other issues
while playing around a bit with backup and mirror jobs.
The changes look fine to me, but this issue is also the first time I
came in close contact with this code, so that unfortunately does not say
much.
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-14 13:38 ` Fiona Ebner
@ 2023-12-14 19:53 ` Stefan Hajnoczi
2023-12-18 12:41 ` Fiona Ebner
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-14 19:53 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-devel, Kevin Wolf, qemu-block, pbonzini, Hanna Reitz,
Fam Zheng
On Thu, Dec 14, 2023 at 02:38:47PM +0100, Fiona Ebner wrote:
> Am 13.12.23 um 22:15 schrieb Stefan Hajnoczi:
> > But there you have it. Please let me know what you think and try your
> > reproducers to see if this fixes the missing io_poll_end() issue. Thanks!
> >
>
> Thanks to you! I applied the RFC (and the series it depends on) on top
> of 8.2.0-rc4 and this fixes my reproducer which drains VirtIO SCSI or
> VirtIO block devices in a loop. Also didn't encounter any other issues
> while playing around a bit with backup and mirror jobs.
>
> The changes look fine to me, but this issue is also the first time I
> came in close contact with this code, so that unfortunately does not say
> much.
Great.
I will still try the other approach that Hanna and Paolo have suggested.
It seems more palatable. I will send a v2.
Stefan
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-14 19:53 ` Stefan Hajnoczi
@ 2023-12-18 12:41 ` Fiona Ebner
2023-12-18 14:25 ` Stefan Hajnoczi
2023-12-18 14:49 ` Paolo Bonzini
0 siblings, 2 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-12-18 12:41 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, qemu-block, pbonzini, Hanna Reitz,
Fam Zheng
Am 14.12.23 um 20:53 schrieb Stefan Hajnoczi:
>
> I will still try the other approach that Hanna and Paolo have suggested.
> It seems more palatable. I will send a v2.
>
FYI, what I already tried downstream (for VirtIO SCSI):
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 9c751bf296..a6449b04d0 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,6 +1166,8 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
> for (uint32_t i = 0; i < total_queues; i++) {
> VirtQueue *vq = virtio_get_queue(vdev, i);
> + virtio_queue_set_notification(vq, 1);
> + virtio_queue_notify(vdev, i);
> virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> }
> }
But this introduces an issue where e.g. a 'backup' QMP command would put
the iothread into a bad state. After the command, whenever the guest
issues IO, the thread will temporarily spike to using 100% CPU. Using
QMP stop+cont is a way to make it go back to normal.
I think it's because of nested drains, because when additionally
checking that the drain count is zero and only executing the loop then,
that issue doesn't seem to manifest, i.e.:
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 9c751bf296..d22c586b38 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
> return;
> }
>
> - for (uint32_t i = 0; i < total_queues; i++) {
> - VirtQueue *vq = virtio_get_queue(vdev, i);
> - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + if (s->bus.drain_count == 0) {
> + for (uint32_t i = 0; i < total_queues; i++) {
> + VirtQueue *vq = virtio_get_queue(vdev, i);
> + virtio_queue_set_notification(vq, 1);
> + virtio_queue_notify(vdev, i);
> + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + }
> }
> }
>
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-18 12:41 ` Fiona Ebner
@ 2023-12-18 14:25 ` Stefan Hajnoczi
2023-12-18 14:49 ` Paolo Bonzini
1 sibling, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2023-12-18 14:25 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-devel, Kevin Wolf, qemu-block, pbonzini, Hanna Reitz,
Fam Zheng
On Mon, Dec 18, 2023 at 01:41:38PM +0100, Fiona Ebner wrote:
> Am 14.12.23 um 20:53 schrieb Stefan Hajnoczi:
> >
> > I will still try the other approach that Hanna and Paolo have suggested.
> > It seems more palatable. I will send a v2.
> >
>
> FYI, what I already tried downstream (for VirtIO SCSI):
>
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 9c751bf296..a6449b04d0 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -1166,6 +1166,8 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
> >
> > for (uint32_t i = 0; i < total_queues; i++) {
> > VirtQueue *vq = virtio_get_queue(vdev, i);
> > + virtio_queue_set_notification(vq, 1);
> > + virtio_queue_notify(vdev, i);
> > virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > }
> > }
>
> But this introduces an issue where e.g. a 'backup' QMP command would put
> the iothread into a bad state. After the command, whenever the guest
> issues IO, the thread will temporarily spike to using 100% CPU. Using
> QMP stop+cont is a way to make it go back to normal.
>
> I think it's because of nested drains, because when additionally
> checking that the drain count is zero and only executing the loop then,
> that issue doesn't seem to manifest, i.e.:
Thanks for letting me know about the issue. I'll keep an eye out for it
when playing with the code.
Stefan
>
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 9c751bf296..d22c586b38 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
> > return;
> > }
> >
> > - for (uint32_t i = 0; i < total_queues; i++) {
> > - VirtQueue *vq = virtio_get_queue(vdev, i);
> > - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + if (s->bus.drain_count == 0) {
> > + for (uint32_t i = 0; i < total_queues; i++) {
> > + VirtQueue *vq = virtio_get_queue(vdev, i);
> > + virtio_queue_set_notification(vq, 1);
> > + virtio_queue_notify(vdev, i);
> > + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + }
> > }
> > }
> >
>
> Best Regards,
> Fiona
>
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-18 12:41 ` Fiona Ebner
2023-12-18 14:25 ` Stefan Hajnoczi
@ 2023-12-18 14:49 ` Paolo Bonzini
2023-12-19 8:40 ` Fiona Ebner
1 sibling, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2023-12-18 14:49 UTC (permalink / raw)
To: Fiona Ebner
Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz,
Fam Zheng
On Mon, Dec 18, 2023 at 1:41 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
> I think it's because of nested drains, because when additionally
> checking that the drain count is zero and only executing the loop then,
> that issue doesn't seem to manifest
But isn't virtio_scsi_drained_end only run if bus->drain_count == 0?
if (bus->drain_count-- == 1) {
trace_scsi_bus_drained_end(bus, sdev);
if (bus->info->drained_end) {
bus->info->drained_end(bus);
}
}
Paolo
>
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 9c751bf296..d22c586b38 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
> > return;
> > }
> >
> > - for (uint32_t i = 0; i < total_queues; i++) {
> > - VirtQueue *vq = virtio_get_queue(vdev, i);
> > - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + if (s->bus.drain_count == 0) {
> > + for (uint32_t i = 0; i < total_queues; i++) {
> > + VirtQueue *vq = virtio_get_queue(vdev, i);
> > + virtio_queue_set_notification(vq, 1);
> > + virtio_queue_notify(vdev, i);
> > + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> > + }
> > }
> > }
> >
>
> Best Regards,
> Fiona
>
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-18 14:49 ` Paolo Bonzini
@ 2023-12-19 8:40 ` Fiona Ebner
0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-12-19 8:40 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Hanna Reitz,
Fam Zheng
Am 18.12.23 um 15:49 schrieb Paolo Bonzini:
> On Mon, Dec 18, 2023 at 1:41 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>> I think it's because of nested drains, because when additionally
>> checking that the drain count is zero and only executing the loop then,
>> that issue doesn't seem to manifest
>
> But isn't virtio_scsi_drained_end only run if bus->drain_count == 0?
>
> if (bus->drain_count-- == 1) {
> trace_scsi_bus_drained_end(bus, sdev);
> if (bus->info->drained_end) {
> bus->info->drained_end(bus);
> }
> }
>
You're right. Sorry, I must've messed up my testing yesterday :(
Sometimes the CPU spikes are very short-lived. Now I see the same issue
with both variants.
Unfortunately, I haven't been able to figure out why it happens yet.
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2023-12-13 21:15 [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler Stefan Hajnoczi
` (5 preceding siblings ...)
2023-12-14 13:38 ` Fiona Ebner
@ 2024-01-02 15:24 ` Hanna Czenczek
2024-01-02 15:53 ` Paolo Bonzini
` (2 more replies)
6 siblings, 3 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-02 15:24 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Fam Zheng, Fiona Ebner
On 13.12.23 22:15, Stefan Hajnoczi wrote:
> Hanna and Fiona encountered a bug in aio_set_fd_handler(): there is no matching
> io_poll_end() call upon removing an AioHandler when io_poll_begin() was
> previously called. The missing io_poll_end() call leaves virtqueue
> notifications disabled and the virtqueue's ioeventfd will never become
> readable anymore.
>
> The details of how virtio-scsi devices using IOThreads can hang after
> hotplug/unplug are covered here:
> https://issues.redhat.com/browse/RHEL-3934
>
> Hanna is currently away over the December holidays. I'm sending these RFC
> patches in the meantime. They demonstrate running aio_set_fd_handler() in the
> AioContext home thread and adding the missing io_poll_end() call.
I agree with Paolo that if the handlers are removed, the caller probably
isn’t interested in notifications: In our specific case, we remove the
handlers because the device is to be drained, so we won’t poll the
virtqueue anyway. Therefore, if io_poll_end() is to be called to
complete the start-end pair, it shouldn’t be done when the handlers are
removed, but instead when they are reinstated.
I believe that’s quite infeasible to do in generic code: We’d have to
basically remember that we haven’t called a specific io_poll_end
callback yet, and so once it is reinstated while we’re not in a polling
phase, we would need to call it then. That in itself is already hairy,
but in addition, the callback consists of a function pointer and an
opaque pointer, and it seems impossible to reliably establish identity
between two opaque pointers to know whether a newly instated io_poll_end
callback is the same one as one that had been removed before (pointer
identity may work, but also may not).
That’s why I think the responsibility should fall on the caller. I
believe both virtio_queue_aio_attach_host_notifier*() functions should
enable vq notifications before instating the handler – if we’re in
polling mode, io_poll_begin() will then immediately be called and
disable vq notifications again. Having them enabled briefly shouldn’t
hurt anything but performance (which I don’t think is terrible in the
drain case). For completeness’ sake, we may even have
virtio_queue_aio_detach_host_notifier() disable vq notifications,
because otherwise it’s unknown whether notifications are enabled or
disabled after removing the notifiers. That isn’t terrible, but I think
(A) if any of the two, we want them to be disabled, because we won’t
check for requests anyway, and (B) this gives us a clearer state
machine, where removing the notifiers will always leave vq notifications
disabled, so when they are reinstated (i.e. when calling
virtio_queue_aio_attach_host_notifier*()), it’s clear that we must poll
once to check for new requests.
I’ve attached the preliminary patch that I didn’t get to send (or test
much) last year. Not sure if it has the same CPU-usage-spike issue
Fiona was seeing, the only functional difference is that I notify the vq
after attaching the notifiers instead of before.
Hanna
[-- Attachment #2: 0001-Keep-notifications-disabled-during-drain.patch --]
[-- Type: text/x-patch, Size: 7354 bytes --]
From 451aae74fc19a6ea5cd6381247cd9202571651e8 Mon Sep 17 00:00:00 2001
From: Hanna Czenczek <hreitz@redhat.com>
Date: Wed, 6 Dec 2023 18:24:55 +0100
Subject: [PATCH] Keep notifications disabled during drain
Preliminary patch with a preliminary description: During drain, we do
not care about virtqueue notifications, which is why we remove the
handlers on it. When removing those handlers, whether vq notifications
are enabled or not depends on whether we were in polling mode or not; if
not, they are enabled (by default); if so, they have been disabled by
the io_poll_start callback.
Ideally, we would rather have the vq notifications be disabled, because
we will not process requests during a drained section anyway.
Therefore, have virtio_queue_aio_detach_host_notifier() explicitly
disable them, and virtio_queue_aio_attach_host_notifier*() re-enable
them (if we are in a polling section, attaching them will invoke the
io_poll_start callback, which will re-disable them).
Because we will miss virtqueue updates in the drained section, we also
need to poll the virtqueue once in the drained_end functions after
attaching the notifiers.
---
include/block/aio.h | 7 ++++++-
include/hw/virtio/virtio.h | 1 +
hw/block/virtio-blk.c | 10 ++++++++++
hw/scsi/virtio-scsi.c | 10 ++++++++++
hw/virtio/virtio.c | 30 +++++++++++++++++++++++++++++-
5 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index f08b358077..795a375ff2 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -497,9 +497,14 @@ void aio_set_event_notifier(AioContext *ctx,
AioPollFn *io_poll,
EventNotifierHandler *io_poll_ready);
-/* Set polling begin/end callbacks for an event notifier that has already been
+/*
+ * Set polling begin/end callbacks for an event notifier that has already been
* registered with aio_set_event_notifier. Do nothing if the event notifier is
* not registered.
+ *
+ * Note that if the io_poll_end() callback (or the entire notifier) is removed
+ * during polling, it will not be called, so an io_poll_begin() is not
+ * necessarily always followed by an io_poll_end().
*/
void aio_set_event_notifier_poll(AioContext *ctx,
EventNotifier *notifier,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..64e66bea2d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
void virtio_init_region_cache(VirtIODevice *vdev, int n);
void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_notify_vq(VirtQueue *vq);
uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a1f8e15522..68dad8cf48 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1538,6 +1538,16 @@ static void virtio_blk_drained_end(void *opaque)
for (uint16_t i = 0; i < s->conf.num_queues; i++) {
VirtQueue *vq = virtio_get_queue(vdev, i);
virtio_queue_aio_attach_host_notifier(vq, ctx);
+
+ /*
+ * We will have ignored notifications about new requests from the guest
+ * during the drain, so "kick" the virt queue to process those requests
+ * now.
+ * Our .handle_output callback (virtio_blk_handle_output() ->
+ * virtio_blk_handle_vq()) acquires the AioContext, so this should be
+ * safe to call from any context.
+ */
+ virtio_queue_notify_vq(vq);
}
}
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 9c751bf296..9234bea7e8 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1167,6 +1167,16 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
for (uint32_t i = 0; i < total_queues; i++) {
VirtQueue *vq = virtio_get_queue(vdev, i);
virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+
+ /*
+ * We will have ignored notifications about new requests from the guest
+ * during the drain, so "kick" the virt queue to process those requests
+ * now.
+ * Our .handle_output callbacks (virtio_scsi_handle_{ctrl,cmd,vq}) use
+ * virtio_scsi_acquire(), so this should be safe to call from any
+ * context.
+ */
+ virtio_queue_notify_vq(vq);
}
}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e5105571cf..f515115069 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,7 +2255,7 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
}
}
-static void virtio_queue_notify_vq(VirtQueue *vq)
+void virtio_queue_notify_vq(VirtQueue *vq)
{
if (vq->vring.desc && vq->handle_output) {
VirtIODevice *vdev = vq->vdev;
@@ -3556,6 +3556,17 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
+ /*
+ * virtio_queue_aio_detach_host_notifier() leaves notifications disabled.
+ * Re-enable them. (And if detach has not been used before, notifications
+ * being enabled is still the default state while a notifier is attached;
+ * see virtio_queue_host_notifier_aio_poll_end(), which will always leave
+ * notifications enabled once the polling section is left.)
+ */
+ if (!virtio_queue_get_notification(vq)) {
+ virtio_queue_set_notification(vq, 1);
+ }
+
aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
@@ -3573,6 +3584,10 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
*/
void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
{
+ if (!virtio_queue_get_notification(vq)) {
+ virtio_queue_set_notification(vq, 1);
+ }
+
aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
NULL, NULL);
@@ -3581,6 +3596,19 @@ void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ct
void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL);
+
+ /*
+ * aio_set_event_notifier_poll() does not guarantee whether io_poll_end()
+ * will run after io_poll_begin(), so by removing the notifier, we do not
+ * know whether virtio_queue_host_notifier_aio_poll_end() has run after a
+ * previous virtio_queue_host_notifier_aio_poll_begin(), i.e. whether
+ * notifications are enabled or disabled. We just removed the notifier, so
+ * we may as well disable notifications. The attach_host_notifier functions
+ * will re-enable them.
+ */
+ if (virtio_queue_get_notification(vq)) {
+ virtio_queue_set_notification(vq, 0);
+ }
}
void virtio_queue_host_notifier_read(EventNotifier *n)
--
2.43.0
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-02 15:24 ` Hanna Czenczek
@ 2024-01-02 15:53 ` Paolo Bonzini
2024-01-02 16:55 ` Hanna Czenczek
2024-01-03 11:40 ` Fiona Ebner
2024-01-23 16:28 ` Hanna Czenczek
2 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2024-01-02 15:53 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Fam Zheng,
Fiona Ebner
On Tue, Jan 2, 2024 at 4:24 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> I’ve attached the preliminary patch that I didn’t get to send (or test
> much) last year. Not sure if it has the same CPU-usage-spike issue
> Fiona was seeing, the only functional difference is that I notify the vq
> after attaching the notifiers instead of before.
I think the patch makes sense and cleaning up the logic of aio_poll
(which is one of those functions that grew and grew without much
clarity into who did what) can be done on top.
Just one small thing, the virtio_queue_notify_vq() call is required
because the virtqueue interrupt and eventfd are edge-triggered rather
than level-triggered; so perhaps it should be placed in the
function(s) that establish the handlers,
virtio_queue_aio_attach_host_notifier() and
virtio_queue_aio_attach_host_notifier_no_poll()? Neither
virtio_blk_drained_end() nor virtio_scsi_drained_end() are
particularly special, and the comment applies just as well:
/*
* We will have ignored notifications about new requests from the guest
* while handlers were not attached, so "kick" the virt queue to process
* those requests now.
*/
Paolo
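A minimal sketch of that placement, assuming the existing notifier setup
inside virtio_queue_aio_attach_host_notifier() stays exactly as it is (elided
below), could look roughly like this; the same kick would go at the end of
virtio_queue_aio_attach_host_notifier_no_poll():

void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{
    /*
     * ... existing aio_set_event_notifier() / aio_set_event_notifier_poll()
     * calls remain unchanged ...
     */

    /*
     * We will have ignored notifications about new requests from the guest
     * while handlers were not attached, so "kick" the virt queue to process
     * those requests now.
     */
    virtio_queue_notify_vq(vq);
}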
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-02 15:53 ` Paolo Bonzini
@ 2024-01-02 16:55 ` Hanna Czenczek
0 siblings, 0 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-02 16:55 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, qemu-block, Fam Zheng,
Fiona Ebner
On 02.01.24 16:53, Paolo Bonzini wrote:
> On Tue, Jan 2, 2024 at 4:24 PM Hanna Czenczek<hreitz@redhat.com> wrote:
>> I’ve attached the preliminary patch that I didn’t get to send (or test
>> much) last year. Not sure if it has the same CPU-usage-spike issue
>> Fiona was seeing, the only functional difference is that I notify the vq
>> after attaching the notifiers instead of before.
> I think the patch makes sense and cleaning up the logic of aio_poll
> (which is one of those functions that grew and grew without much
> clarity into who did what) can be done on top.
>
> Just one small thing, the virtio_queue_notify_vq() call is required
> because the virtqueue interrupt and eventfd are edge-triggered rather
> than level-triggered; so perhaps it should be placed in the
> function(s) that establish the handlers,
> virtio_queue_aio_attach_host_notifier() and
> virtio_queue_aio_attach_host_notifier_no_poll()? Neither
> virtio_blk_drained_end() nor virtio_scsi_drained_end() are
> particularly special, and the comment applies just as well:
>
> /*
> * We will have ignored notifications about new requests from the guest
> * while handlers were not attached, so "kick" the virt queue to process
> * those requests now.
> */
I wasn’t entirely sure whether we want to call notify_vq() if we have
instated the handlers for the first time (in which case someone ought to
look for existing unprocessed requests anyway), so I decided to limit it
to drained_end.
But considering that it must be safe to call notify_vq() right after
instating the handlers (virtio_queue_host_notifier_read() may then be
called after all), we might as well do so every time, yes.
Hanna
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-02 15:24 ` Hanna Czenczek
2024-01-02 15:53 ` Paolo Bonzini
@ 2024-01-03 11:40 ` Fiona Ebner
2024-01-03 13:35 ` Paolo Bonzini
2024-01-23 16:28 ` Hanna Czenczek
2 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2024-01-03 11:40 UTC (permalink / raw)
To: Hanna Czenczek, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Fam Zheng
Am 02.01.24 um 16:24 schrieb Hanna Czenczek:
>
> I’ve attached the preliminary patch that I didn’t get to send (or test
> much) last year. Not sure if it has the same CPU-usage-spike issue
> Fiona was seeing, the only functional difference is that I notify the vq
> after attaching the notifiers instead of before.
>
Applied the patch on top of c12887e1b0 ("block-coroutine-wrapper: use
qemu_get_current_aio_context()") because it conflicts with b6948ab01d
("virtio-blk: add iothread-vq-mapping parameter").
I'm happy to report that I cannot reproduce the CPU-usage-spike issue
with the patch, but I did run into an assertion failure when trying to
verify that it fixes my original stuck-guest-IO issue. See below for the
backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire the device’s context, so this should be directly callable from any context.
I guess this is not true anymore now that the AioContext locking was
removed?
Back to the CPU-usage-spike issue: I experimented around and it doesn't
seem to matter whether I notify the virt queue before or after attaching
the notifiers. But there's another functional difference. My patch
called virtio_queue_notify() which contains this block:
> if (vq->host_notifier_enabled) {
> event_notifier_set(&vq->host_notifier);
> } else if (vq->handle_output) {
> vq->handle_output(vdev, vq);
In my testing, the first branch was taken, calling event_notifier_set().
Hanna's patch uses virtio_queue_notify_vq() and there,
vq->handle_output() will be called. That seems to be the relevant
difference regarding the CPU-usage-spike issue.
Best Regards,
Fiona
[0]:
> #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1 0x00007ffff60e3d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2 0x00007ffff6094f32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3 0x00007ffff607f472 in __GI_abort () at ./stdlib/abort.c:79
> #4 0x00007ffff607f395 in __assert_fail_base (fmt=0x7ffff61f3a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x555556246bf8 "ctx == qemu_get_current_aio_context()",
> file=file@entry=0x555556246baf "../system/dma-helpers.c", line=line@entry=123,
> function=function@entry=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb") at ./assert/assert.c:92
> #5 0x00007ffff608de32 in __GI___assert_fail (assertion=0x555556246bf8 "ctx == qemu_get_current_aio_context()",
> file=0x555556246baf "../system/dma-helpers.c", line=123, function=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb")
> at ./assert/assert.c:101
> #6 0x0000555555b83425 in dma_blk_cb (opaque=0x55555804f150, ret=0) at ../system/dma-helpers.c:123
> #7 0x0000555555b839ec in dma_blk_io (ctx=0x555557404310, sg=0x5555588ca6f8, offset=70905856, align=512,
> io_func=0x555555a94a87 <scsi_dma_readv>, io_func_opaque=0x55555817ea00, cb=0x555555a8d99f <scsi_dma_complete>, opaque=0x55555817ea00,
> dir=DMA_DIRECTION_FROM_DEVICE) at ../system/dma-helpers.c:236
> #8 0x0000555555a8de9a in scsi_do_read (r=0x55555817ea00, ret=0) at ../hw/scsi/scsi-disk.c:431
> #9 0x0000555555a8e249 in scsi_read_data (req=0x55555817ea00) at ../hw/scsi/scsi-disk.c:501
> #10 0x0000555555a897e3 in scsi_req_continue (req=0x55555817ea00) at ../hw/scsi/scsi-bus.c:1478
> #11 0x0000555555d8270e in virtio_scsi_handle_cmd_req_submit (s=0x555558669af0, req=0x5555588ca6b0) at ../hw/scsi/virtio-scsi.c:828
> #12 0x0000555555d82937 in virtio_scsi_handle_cmd_vq (s=0x555558669af0, vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:870
> #13 0x0000555555d829a9 in virtio_scsi_handle_cmd (vdev=0x555558669af0, vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:883
> #14 0x0000555555db3784 in virtio_queue_notify_vq (vq=0x555558672550) at ../hw/virtio/virtio.c:2268
> #15 0x0000555555d8346a in virtio_scsi_drained_end (bus=0x555558669d88) at ../hw/scsi/virtio-scsi.c:1179
> #16 0x0000555555a8a549 in scsi_device_drained_end (sdev=0x555558105000) at ../hw/scsi/scsi-bus.c:1774
> #17 0x0000555555a931db in scsi_disk_drained_end (opaque=0x555558105000) at ../hw/scsi/scsi-disk.c:2369
> #18 0x0000555555ee439c in blk_root_drained_end (child=0x5555574065d0) at ../block/block-backend.c:2829
> #19 0x0000555555ef0ac3 in bdrv_parent_drained_end_single (c=0x5555574065d0) at ../block/io.c:74
> #20 0x0000555555ef0b02 in bdrv_parent_drained_end (bs=0x555557409f80, ignore=0x0) at ../block/io.c:89
> #21 0x0000555555ef1b1b in bdrv_do_drained_end (bs=0x555557409f80, parent=0x0) at ../block/io.c:421
> #22 0x0000555555ef1b5a in bdrv_drained_end (bs=0x555557409f80) at ../block/io.c:428
> #23 0x0000555555efcf64 in mirror_exit_common (job=0x5555588b8220) at ../block/mirror.c:798
> #24 0x0000555555efcfde in mirror_abort (job=0x5555588b8220) at ../block/mirror.c:814
> #25 0x0000555555ec53ea in job_abort (job=0x5555588b8220) at ../job.c:825
> #26 0x0000555555ec54d5 in job_finalize_single_locked (job=0x5555588b8220) at ../job.c:855
> #27 0x0000555555ec57cb in job_completed_txn_abort_locked (job=0x5555588b8220) at ../job.c:958
> #28 0x0000555555ec5c20 in job_completed_locked (job=0x5555588b8220) at ../job.c:1065
> #29 0x0000555555ec5cd5 in job_exit (opaque=0x5555588b8220) at ../job.c:1088
> #30 0x000055555608342e in aio_bh_call (bh=0x7fffe400dfd0) at ../util/async.c:169
> #31 0x0000555556083549 in aio_bh_poll (ctx=0x55555718ade0) at ../util/async.c:216
> #32 0x0000555556065203 in aio_dispatch (ctx=0x55555718ade0) at ../util/aio-posix.c:423
> #33 0x0000555556083988 in aio_ctx_dispatch (source=0x55555718ade0, callback=0x0, user_data=0x0) at ../util/async.c:358
> #34 0x00007ffff753e7a9 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #35 0x00005555560850ae in glib_pollfds_poll () at ../util/main-loop.c:290
> #36 0x000055555608512b in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313
> #37 0x0000555556085239 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> #38 0x0000555555b8d501 in qemu_main_loop () at ../system/runstate.c:782
> #39 0x0000555555e55587 in qemu_default_main () at ../system/main.c:37
> #40 0x0000555555e555c2 in main (argc=68, argv=0x7fffffffd8b8) at ../system/main.c:48
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-03 11:40 ` Fiona Ebner
@ 2024-01-03 13:35 ` Paolo Bonzini
2024-01-05 13:43 ` Fiona Ebner
0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2024-01-03 13:35 UTC (permalink / raw)
To: Fiona Ebner, Hanna Czenczek, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
On 1/3/24 12:40, Fiona Ebner wrote:
> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
> with the patch, but I did run into an assertion failure when trying to
> verify that it fixes my original stuck-guest-IO issue. See below for the
> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>
>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>> acquire the device’s context, so this should be directly callable from
>> any context.
>
> I guess this is not true anymore now that the AioContext locking was
> removed?
Good point and, in fact, even before it was much safer to use
virtio_queue_notify() instead. Not only does it use the event notifier
handler, but it also calls it in the right thread/AioContext just by
doing event_notifier_set().
The call to virtio_queue_set_notification(..., 1) is safe; not sure
about the call to virtio_queue_set_notification(..., 0) though. I'd
rather have that executed synchronously in the AioContext using
aio_wait_bh_oneshot(). This is consistent with the pattern used by
virtio_scsi_dataplane_stop() and virtio_blk_data_plane_stop().
Paolo
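A minimal sketch of that pattern (the helper name is illustrative, not from
an actual patch; ctx is assumed to be the AioContext that polls the queue's
host notifier):

/* Runs as a BH in the AioContext that polls vq's host notifier */
static void virtio_queue_disable_notification_bh(void *opaque)
{
    VirtQueue *vq = opaque;

    virtio_queue_set_notification(vq, 0);
}

/* Main loop thread, BQL held: wait for the BH to finish */
aio_wait_bh_oneshot(ctx, virtio_queue_disable_notification_bh, vq);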
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-03 13:35 ` Paolo Bonzini
@ 2024-01-05 13:43 ` Fiona Ebner
2024-01-05 14:30 ` Fiona Ebner
0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2024-01-05 13:43 UTC (permalink / raw)
To: Paolo Bonzini, Hanna Czenczek, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
> On 1/3/24 12:40, Fiona Ebner wrote:
>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>> with the patch, but I did run into an assertion failure when trying to
>> verify that it fixes my original stuck-guest-IO issue. See below for the
>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>>
>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>> acquire the device’s context, so this should be directly callable from
>>> any context.
>>
>> I guess this is not true anymore now that the AioContext locking was
>> removed?
>
> Good point and, in fact, even before it was much safer to use
> virtio_queue_notify() instead. Not only does it use the event notifier
> handler, but it also calls it in the right thread/AioContext just by
> doing event_notifier_set().
>
But with virtio_queue_notify() using the event notifier, the
CPU-usage-spike issue is present:
>> Back to the CPU-usage-spike issue: I experimented around and it doesn't
>> seem to matter whether I notify the virt queue before or after attaching
>> the notifiers. But there's another functional difference. My patch
>> called virtio_queue_notify() which contains this block:
>>
>>> if (vq->host_notifier_enabled) {
>>> event_notifier_set(&vq->host_notifier);
>>> } else if (vq->handle_output) {
>>> vq->handle_output(vdev, vq);
>>
>> In my testing, the first branch was taken, calling event_notifier_set().
>> Hanna's patch uses virtio_queue_notify_vq() and there,
>> vq->handle_output() will be called. That seems to be the relevant
>> difference regarding the CPU-usage-spike issue.
I should mention that this is with a VirtIO SCSI disk. I also attempted
to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
didn't manage yet.
What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
the queues (but only one) will always show as nonempty. And then,
run_poll_handlers_once() will always detect progress which explains the
CPU usage.
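For reference, that poll callback amounts to an emptiness check on the ring
(paraphrased from hw/virtio/virtio.c, so approximate): it reports work to do
whenever there are available buffers the device has not consumed yet, which
ultimately comes down to the shadow_avail_idx/last_avail_idx comparison in
virtio_queue_empty().

static bool virtio_queue_host_notifier_aio_poll(void *opaque)
{
    EventNotifier *n = opaque;
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

    /* A non-empty ring counts as progress for the poll loop */
    return vq->vring.desc && !virtio_queue_empty(vq);
}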
The following shows
1. vq address
2. number of times vq was passed to virtio_queue_host_notifier_aio_poll()
3. number of times the result of virtio_queue_host_notifier_aio_poll()
was true for the vq
> 0x555fd93f9c80 17162000 0
> 0x555fd93f9e48 17162000 6
> 0x555fd93f9ee0 17162000 0
> 0x555fd93f9d18 17162000 17162000
> 0x555fd93f9db0 17162000 0
> 0x555fd93f9f78 17162000 0
And for the problematic one, the reason it is seen as nonempty is:
> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
Those values stay like this while the call counts above increase.
So something going wrong with the indices when the event notifier is set
from QEMU side (in the main thread)?
The guest is Debian 12 with a 6.1 kernel.
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-05 13:43 ` Fiona Ebner
@ 2024-01-05 14:30 ` Fiona Ebner
2024-01-22 17:41 ` Hanna Czenczek
0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2024-01-05 14:30 UTC (permalink / raw)
To: Paolo Bonzini, Hanna Czenczek, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
Am 05.01.24 um 14:43 schrieb Fiona Ebner:
> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>> On 1/3/24 12:40, Fiona Ebner wrote:
>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>>> with the patch, but I did run into an assertion failure when trying to
>>> verify that it fixes my original stuck-guest-IO issue. See below for the
>>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>>>
>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>>> acquire the device’s context, so this should be directly callable from
>>>> any context.
>>>
>>> I guess this is not true anymore now that the AioContext locking was
>>> removed?
>>
>> Good point and, in fact, even before it was much safer to use
>> virtio_queue_notify() instead. Not only does it use the event notifier
>> handler, but it also calls it in the right thread/AioContext just by
>> doing event_notifier_set().
>>
>
> But with virtio_queue_notify() using the event notifier, the
> CPU-usage-spike issue is present:
>
>>> Back to the CPU-usage-spike issue: I experimented around and it doesn't
>>> seem to matter whether I notify the virt queue before or after attaching
>>> the notifiers. But there's another functional difference. My patch
>>> called virtio_queue_notify() which contains this block:
>>>
>>>> if (vq->host_notifier_enabled) {
>>>> event_notifier_set(&vq->host_notifier);
>>>> } else if (vq->handle_output) {
>>>> vq->handle_output(vdev, vq);
>>>
>>> In my testing, the first branch was taken, calling event_notifier_set().
>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>> vq->handle_output() will be called. That seems to be the relevant
>>> difference regarding the CPU-usage-spike issue.
>
> I should mention that this is with a VirtIO SCSI disk. I also attempted
> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
> didn't manage yet.
>
> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
> the queues (but only one) will always show as nonempty. And then,
> run_poll_handlers_once() will always detect progress which explains the
> CPU usage.
>
> The following shows
> 1. vq address
> 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll()
> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
> was true for the vq
>
>> 0x555fd93f9c80 17162000 0
>> 0x555fd93f9e48 17162000 6
>> 0x555fd93f9ee0 17162000 0
>> 0x555fd93f9d18 17162000 17162000
>> 0x555fd93f9db0 17162000 0
>> 0x555fd93f9f78 17162000 0
>
> And for the problematic one, the reason it is seen as nonempty is:
>
>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
>
vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
s->events_dropped is false in my testing, so
virtio_scsi_handle_event_vq() doesn't do anything.
> Those values stay like this while the call counts above increase.
>
> So something going wrong with the indices when the event notifier is set
> from QEMU side (in the main thread)?
>
> The guest is Debian 12 with a 6.1 kernel.
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-05 14:30 ` Fiona Ebner
@ 2024-01-22 17:41 ` Hanna Czenczek
2024-01-22 17:52 ` Hanna Czenczek
0 siblings, 1 reply; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-22 17:41 UTC (permalink / raw)
To: Fiona Ebner, Paolo Bonzini, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
On 05.01.24 15:30, Fiona Ebner wrote:
> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>>> On 1/3/24 12:40, Fiona Ebner wrote:
>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>>>> with the patch, but I did run into an assertion failure when trying to
>>>> verify that it fixes my original stuck-guest-IO issue. See below for the
>>>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>>>>
>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>>>> acquire the device’s context, so this should be directly callable from
>>>>> any context.
>>>> I guess this is not true anymore now that the AioContext locking was
>>>> removed?
>>> Good point and, in fact, even before it was much safer to use
>>> virtio_queue_notify() instead. Not only does it use the event notifier
>>> handler, but it also calls it in the right thread/AioContext just by
>>> doing event_notifier_set().
>>>
>> But with virtio_queue_notify() using the event notifier, the
>> CPU-usage-spike issue is present:
>>
>>>> Back to the CPU-usage-spike issue: I experimented around and it doesn't
>>>> seem to matter whether I notify the virt queue before or after attaching
>>>> the notifiers. But there's another functional difference. My patch
>>>> called virtio_queue_notify() which contains this block:
>>>>
>>>>> if (vq->host_notifier_enabled) {
>>>>> event_notifier_set(&vq->host_notifier);
>>>>> } else if (vq->handle_output) {
>>>>> vq->handle_output(vdev, vq);
>>>> In my testing, the first branch was taken, calling event_notifier_set().
>>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>>> vq->handle_output() will be called. That seems to be the relevant
>>>> difference regarding the CPU-usage-spike issue.
>> I should mention that this is with a VirtIO SCSI disk. I also attempted
>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
>> didn't manage yet.
>>
>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
>> the queues (but only one) will always show as nonempty. And then,
>> run_poll_handlers_once() will always detect progress which explains the
>> CPU usage.
>>
>> The following shows
>> 1. vq address
>> 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll()
>> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
>> was true for the vq
>>
>>> 0x555fd93f9c80 17162000 0
>>> 0x555fd93f9e48 17162000 6
>>> 0x555fd93f9ee0 17162000 0
>>> 0x555fd93f9d18 17162000 17162000
>>> 0x555fd93f9db0 17162000 0
>>> 0x555fd93f9f78 17162000 0
>> And for the problematic one, the reason it is seen as nonempty is:
>>
>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
> s->events_dropped is false in my testing, so
> virtio_scsi_handle_event_vq() doesn't do anything.
>
>> Those values stay like this while the call counts above increase.
>>
>> So something going wrong with the indices when the event notifier is set
>> from QEMU side (in the main thread)?
>>
>> The guest is Debian 12 with a 6.1 kernel.
So, trying to figure out a new RFC version:
About the stack trace you, Fiona, posted: As far as I understand, that
happens because virtio_blk_drained_end() calling
virtio_queue_notify_vq() wasn’t safe after all, and instead we need to
use virtio_queue_notify(). Right?
However, you say using virtio_queue_notify() instead causes busy loops
of doing nothing in virtio-scsi (what you describe above). I mean,
better than a crash, but, you know. :)
I don’t have any prior knowledge about the event handling,
unfortunately. The fact that 8 buffers are available but we don’t use
any sounds OK to me; as far as I understand, we only use those buffers
if we have any events to push into them, so as long as we don’t, we
won’t. Question is, should we not have its poll handler return false if
we don’t have any events (i.e. events_dropped is false)? Would that
solve it?
Hanna
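A purely hypothetical sketch of that idea (virtio_scsi_event_vq_poll() does
not exist in QEMU; it stands in for an event-vq-specific poll callback, and
the device is assumed to be passed as the opaque pointer):

/* Hypothetical poll callback for the event vq only */
static bool virtio_scsi_event_vq_poll(void *opaque)
{
    VirtIOSCSI *s = opaque;

    /*
     * Available buffers alone are not progress for the event vq; only
     * report work to do while there are dropped events left to deliver.
     */
    return s->events_dropped;
}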
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-22 17:41 ` Hanna Czenczek
@ 2024-01-22 17:52 ` Hanna Czenczek
2024-01-23 11:12 ` Fiona Ebner
2024-01-23 11:15 ` Hanna Czenczek
0 siblings, 2 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-22 17:52 UTC (permalink / raw)
To: Fiona Ebner, Paolo Bonzini, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
On 22.01.24 18:41, Hanna Czenczek wrote:
> On 05.01.24 15:30, Fiona Ebner wrote:
>> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
>>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>>>> On 1/3/24 12:40, Fiona Ebner wrote:
>>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>>>>> with the patch, but I did run into an assertion failure when
>>>>> trying to
>>>>> verify that it fixes my original stuck-guest-IO issue. See below
>>>>> for the
>>>>> backtrace [0]. Hanna wrote in
>>>>> https://issues.redhat.com/browse/RHEL-3934
>>>>>
>>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call,
>>>>>> because
>>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>>>>> acquire the device’s context, so this should be directly callable
>>>>>> from
>>>>>> any context.
>>>>> I guess this is not true anymore now that the AioContext locking was
>>>>> removed?
>>>> Good point and, in fact, even before it was much safer to use
>>>> virtio_queue_notify() instead. Not only does it use the event
>>>> notifier
>>>> handler, but it also calls it in the right thread/AioContext just by
>>>> doing event_notifier_set().
>>>>
>>> But with virtio_queue_notify() using the event notifier, the
>>> CPU-usage-spike issue is present:
>>>
>>>>> Back to the CPU-usage-spike issue: I experimented around and it
>>>>> doesn't
>>>>> seem to matter whether I notify the virt queue before or after
>>>>> attaching
>>>>> the notifiers. But there's another functional difference. My patch
>>>>> called virtio_queue_notify() which contains this block:
>>>>>
>>>>>> if (vq->host_notifier_enabled) {
>>>>>> event_notifier_set(&vq->host_notifier);
>>>>>> } else if (vq->handle_output) {
>>>>>> vq->handle_output(vdev, vq);
>>>>> In my testing, the first branch was taken, calling
>>>>> event_notifier_set().
>>>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>>>> vq->handle_output() will be called. That seems to be the relevant
>>>>> difference regarding the CPU-usage-spike issue.
>>> I should mention that this is with a VirtIO SCSI disk. I also attempted
>>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
>>> didn't manage yet.
>>>
>>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
>>> the queues (but only one) will always show as nonempty. And then,
>>> run_poll_handlers_once() will always detect progress which explains the
>>> CPU usage.
>>>
>>> The following shows
>>> 1. vq address
>>> 2. number of times vq was passed to
>>> virtio_queue_host_notifier_aio_poll()
>>> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
>>> was true for the vq
>>>
>>>> 0x555fd93f9c80 17162000 0
>>>> 0x555fd93f9e48 17162000 6
>>>> 0x555fd93f9ee0 17162000 0
>>>> 0x555fd93f9d18 17162000 17162000
>>>> 0x555fd93f9db0 17162000 0
>>>> 0x555fd93f9f78 17162000 0
>>> And for the problematic one, the reason it is seen as nonempty is:
>>>
>>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
>> s->events_dropped is false in my testing, so
>> virtio_scsi_handle_event_vq() doesn't do anything.
>>
>>> Those values stay like this while the call counts above increase.
>>>
>>> So something going wrong with the indices when the event notifier is
>>> set
>>> from QEMU side (in the main thread)?
>>>
>>> The guest is Debian 12 with a 6.1 kernel.
>
> So, trying to figure out a new RFC version:
>
> About the stack trace you, Fiona, posted: As far as I understand,
> that happens because virtio_blk_drained_end() calling
> virtio_queue_notify_vq() wasn’t safe after all, and instead we need to
> use virtio_queue_notify(). Right?
>
> However, you say using virtio_queue_notify() instead causes busy loops
> of doing nothing in virtio-scsi (what you describe above). I mean,
> better than a crash, but, you know. :)
>
> I don’t have any prior knowledge about the event handling,
> unfortunately. The fact that 8 buffers are available but we don’t use
> any sounds OK to me; as far as I understand, we only use those buffers
> if we have any events to push into them, so as long as we don’t, we
> won’t. Question is, should we not have its poll handler return false
> if we don’t have any events (i.e. events_dropped is false)? Would
> that solve it?
Or actually, maybe we could just skip the virtio_queue_notify() call for
the event vq? I.e. have it be `if (vq !=
VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`? I
wouldn’t like that very much, (a) because this would make it slightly
cumbersome to put that into virtio_queue_aio_attach_host_notifier*(),
and (b) in case that does fix it, I do kind of feel like the real
problem is that we use virtio_queue_host_notifier_aio_poll() for the
event vq, which tells the polling code to poll whenever the vq is
non-empty, but we (AFAIU) expect the event vq to be non-empty all the time.
Hanna
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-22 17:52 ` Hanna Czenczek
@ 2024-01-23 11:12 ` Fiona Ebner
2024-01-23 11:25 ` Hanna Czenczek
2024-01-23 11:15 ` Hanna Czenczek
1 sibling, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2024-01-23 11:12 UTC (permalink / raw)
To: Hanna Czenczek, Paolo Bonzini, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
Am 22.01.24 um 18:52 schrieb Hanna Czenczek:
> On 22.01.24 18:41, Hanna Czenczek wrote:
>> On 05.01.24 15:30, Fiona Ebner wrote:
>>> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
>>>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>>>>> On 1/3/24 12:40, Fiona Ebner wrote:
>>>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>>>>>> with the patch, but I did run into an assertion failure when
>>>>>> trying to
>>>>>> verify that it fixes my original stuck-guest-IO issue. See below
>>>>>> for the
>>>>>> backtrace [0]. Hanna wrote in
>>>>>> https://issues.redhat.com/browse/RHEL-3934
>>>>>>
>>>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call,
>>>>>>> because
>>>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>>>>>> acquire the device’s context, so this should be directly callable
>>>>>>> from
>>>>>>> any context.
>>>>>> I guess this is not true anymore now that the AioContext locking was
>>>>>> removed?
>>>>> Good point and, in fact, even before it was much safer to use
>>>>> virtio_queue_notify() instead. Not only does it use the event
>>>>> notifier
>>>>> handler, but it also calls it in the right thread/AioContext just by
>>>>> doing event_notifier_set().
>>>>>
>>>> But with virtio_queue_notify() using the event notifier, the
>>>> CPU-usage-spike issue is present:
>>>>
>>>>>> Back to the CPU-usage-spike issue: I experimented around and it
>>>>>> doesn't
>>>>>> seem to matter whether I notify the virt queue before or after
>>>>>> attaching
>>>>>> the notifiers. But there's another functional difference. My patch
>>>>>> called virtio_queue_notify() which contains this block:
>>>>>>
>>>>>>> if (vq->host_notifier_enabled) {
>>>>>>> event_notifier_set(&vq->host_notifier);
>>>>>>> } else if (vq->handle_output) {
>>>>>>> vq->handle_output(vdev, vq);
>>>>>> In my testing, the first branch was taken, calling
>>>>>> event_notifier_set().
>>>>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>>>>> vq->handle_output() will be called. That seems to be the relevant
>>>>>> difference regarding the CPU-usage-spike issue.
>>>> I should mention that this is with a VirtIO SCSI disk. I also attempted
>>>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
>>>> didn't manage yet.
>>>>
>>>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
>>>> the queues (but only one) will always show as nonempty. And then,
>>>> run_poll_handlers_once() will always detect progress which explains the
>>>> CPU usage.
>>>>
>>>> The following shows
>>>> 1. vq address
>>>> 2. number of times vq was passed to
>>>> virtio_queue_host_notifier_aio_poll()
>>>> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
>>>> was true for the vq
>>>>
>>>>> 0x555fd93f9c80 17162000 0
>>>>> 0x555fd93f9e48 17162000 6
>>>>> 0x555fd93f9ee0 17162000 0
>>>>> 0x555fd93f9d18 17162000 17162000
>>>>> 0x555fd93f9db0 17162000 0
>>>>> 0x555fd93f9f78 17162000 0
>>>> And for the problematic one, the reason it is seen as nonempty is:
>>>>
>>>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
>>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
>>> s->events_dropped is false in my testing, so
>>> virtio_scsi_handle_event_vq() doesn't do anything.
>>>
>>>> Those values stay like this while the call counts above increase.
>>>>
>>>> So something going wrong with the indices when the event notifier is
>>>> set
>>>> from QEMU side (in the main thread)?
>>>>
>>>> The guest is Debian 12 with a 6.1 kernel.
>>
>> So, trying to figure out a new RFC version:
>>
>> About the stack trace you, Fiona, posted: As far as I understand,
>> that happens because virtio_blk_drained_end() calling
>> virtio_queue_notify_vq() wasn’t safe after all, and instead we need to
>> use virtio_queue_notify(). Right?
>>
AFAICT, yes. In particular, after 4f36b13847 ("scsi: remove AioContext
locking"), the AioContext is not acquired by
virtio_scsi_handle_{cmd,ctrl,event} anymore.
>> However, you say using virtio_queue_notify() instead causes busy loops
>> of doing nothing in virtio-scsi (what you describe above). I mean,
>> better than a crash, but, you know. :)
Yes, that happens for me when using virtio_queue_notify(), i.e.
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 690aceec45..8cdf04ac2d 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,7 +1166,11 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
> for (uint32_t i = 0; i < total_queues; i++) {
> VirtQueue *vq = virtio_get_queue(vdev, i);
> + if (!virtio_queue_get_notification(vq)) {
> + virtio_queue_set_notification(vq, true);
> + }
> virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + virtio_queue_notify(vdev, i);
> }
> }
>
>>
>> I don’t have any prior knowledge about the event handling,
>> unfortunately. The fact that 8 buffers are available but we don’t use
>> any sounds OK to me; as far as I understand, we only use those buffers
>> if we have any events to push into them, so as long as we don’t, we
>> won’t. Question is, should we not have its poll handler return false
>> if we don’t have any events (i.e. events_dropped is false)? Would
>> that solve it?
>
> Or actually, maybe we could just skip the virtio_queue_notify() call for
> the event vq? I.e. have it be `if (vq !=
> VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?
That seems to avoid the CPU-usage-spike issue :)
> I wouldn’t like that very much, (a) because this would make it slightly
> cumbersome to put that into virtio_queue_aio_attach_host_notifier*(),
> and (b) in case that does fix it, I do kind of feel like the real
> problem is that we use virtio_queue_host_notifier_aio_poll() for the
> event vq, which tells the polling code to poll whenever the vq is
> non-empty, but we (AFAIU) expect the event vq to be non-empty all the time.
>
AFAIU, (at least in my testing) it's only non-empty after it was
notified via virtio_scsi_drained_end() once. But it's hard to tell,
because it seems that the poll callback is only called after the first
drain. I noticed poll_set_started() is not called, because
ctx->fdmon_ops->need_wait(ctx) was true, i.e. ctx->poll_disable_cnt was
positive (I'm using fdmon_poll). I then found that this is because the
notifier for the event vq is attached with
> virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
in virtio_scsi_dataplane_start(). But in virtio_scsi_drained_end() it is
attached with virtio_queue_aio_attach_host_notifier() instead of the
_no_poll() variant. So that might be the actual issue here?
From a quick test, I cannot see the CPU-usage-spike issue with the
following either:
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 690aceec45..ba1ab8e410 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,7 +1166,15 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
> for (uint32_t i = 0; i < total_queues; i++) {
> VirtQueue *vq = virtio_get_queue(vdev, i);
> - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + if (!virtio_queue_get_notification(vq)) {
> + virtio_queue_set_notification(vq, true);
> + }
> + if (vq == VIRTIO_SCSI_COMMON(s)->event_vq) {
> + virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
> + } else {
> + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> + }
> + virtio_queue_notify(vdev, i);
> }
> }
Best Regards,
Fiona
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-23 11:12 ` Fiona Ebner
@ 2024-01-23 11:25 ` Hanna Czenczek
0 siblings, 0 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-23 11:25 UTC (permalink / raw)
To: Fiona Ebner, Paolo Bonzini, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
On 23.01.24 12:12, Fiona Ebner wrote:
[...]
> I noticed poll_set_started() is not called, because
> ctx->fdmon_ops->need_wait(ctx) was true, i.e. ctx->poll_disable_cnt was
> positive (I'm using fdmon_poll). I then found this is because of the
> notifier for the event vq, being attached with
>
>> virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
> in virtio_scsi_dataplane_start(). But in virtio_scsi_drained_end() it is
> attached with virtio_queue_aio_attach_host_notifier() instead of the
> _no_poll() variant. So that might be the actual issue here?
>
> From a quick test, I cannot see the CPU-usage-spike issue with the
> following either:
>
>> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>> index 690aceec45..ba1ab8e410 100644
>> --- a/hw/scsi/virtio-scsi.c
>> +++ b/hw/scsi/virtio-scsi.c
>> @@ -1166,7 +1166,15 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>>
>> for (uint32_t i = 0; i < total_queues; i++) {
>> VirtQueue *vq = virtio_get_queue(vdev, i);
>> - virtio_queue_aio_attach_host_notifier(vq, s->ctx);
>> + if (!virtio_queue_get_notification(vq)) {
>> + virtio_queue_set_notification(vq, true);
>> + }
>> + if (vq == VIRTIO_SCSI_COMMON(s)->event_vq) {
>> + virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
>> + } else {
>> + virtio_queue_aio_attach_host_notifier(vq, s->ctx);
>> + }
>> + virtio_queue_notify(vdev, i);
>> }
>> }
Perfect, so we agree on trying it that way. :)
Hanna
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-22 17:52 ` Hanna Czenczek
2024-01-23 11:12 ` Fiona Ebner
@ 2024-01-23 11:15 ` Hanna Czenczek
1 sibling, 0 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-23 11:15 UTC (permalink / raw)
To: Fiona Ebner, Paolo Bonzini, Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, Fam Zheng
On 22.01.24 18:52, Hanna Czenczek wrote:
> On 22.01.24 18:41, Hanna Czenczek wrote:
>> On 05.01.24 15:30, Fiona Ebner wrote:
>>> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
>>>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>>>>> On 1/3/24 12:40, Fiona Ebner wrote:
>>>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike
>>>>>> issue
>>>>>> with the patch, but I did run into an assertion failure when
>>>>>> trying to
>>>>>> verify that it fixes my original stuck-guest-IO issue. See below
>>>>>> for the
>>>>>> backtrace [0]. Hanna wrote in
>>>>>> https://issues.redhat.com/browse/RHEL-3934
>>>>>>
>>>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call,
>>>>>>> because
>>>>>>> both virtio-scsi’s and virtio-blk’s .handle_output()
>>>>>>> implementations
>>>>>>> acquire the device’s context, so this should be directly
>>>>>>> callable from
>>>>>>> any context.
>>>>>> I guess this is not true anymore now that the AioContext locking was
>>>>>> removed?
>>>>> Good point and, in fact, even before it was much safer to use
>>>>> virtio_queue_notify() instead. Not only does it use the event
>>>>> notifier
>>>>> handler, but it also calls it in the right thread/AioContext just by
>>>>> doing event_notifier_set().
>>>>>
>>>> But with virtio_queue_notify() using the event notifier, the
>>>> CPU-usage-spike issue is present:
>>>>
>>>>>> Back to the CPU-usage-spike issue: I experimented around and it
>>>>>> doesn't
>>>>>> seem to matter whether I notify the virt queue before or after
>>>>>> attaching
>>>>>> the notifiers. But there's another functional difference. My patch
>>>>>> called virtio_queue_notify() which contains this block:
>>>>>>
>>>>>>> if (vq->host_notifier_enabled) {
>>>>>>> event_notifier_set(&vq->host_notifier);
>>>>>>> } else if (vq->handle_output) {
>>>>>>> vq->handle_output(vdev, vq);
>>>>>> In my testing, the first branch was taken, calling
>>>>>> event_notifier_set().
>>>>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>>>>> vq->handle_output() will be called. That seems to be the relevant
>>>>>> difference regarding the CPU-usage-spike issue.
>>>> I should mention that this is with a VirtIO SCSI disk. I also
>>>> attempted
>>>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
>>>> didn't manage yet.
>>>>
>>>> What I noticed is that in virtio_queue_host_notifier_aio_poll(),
>>>> one of
>>>> the queues (but only one) will always show as nonempty. And then,
>>>> run_poll_handlers_once() will always detect progress which explains
>>>> the
>>>> CPU usage.
>>>>
>>>> The following shows
>>>> 1. vq address
>>>> 2. number of times vq was passed to
>>>> virtio_queue_host_notifier_aio_poll()
>>>> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
>>>> was true for the vq
>>>>
>>>>> 0x555fd93f9c80 17162000 0
>>>>> 0x555fd93f9e48 17162000 6
>>>>> 0x555fd93f9ee0 17162000 0
>>>>> 0x555fd93f9d18 17162000 17162000
>>>>> 0x555fd93f9db0 17162000 0
>>>>> 0x555fd93f9f78 17162000 0
>>>> And for the problematic one, the reason it is seen as nonempty is:
>>>>
>>>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
>>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
>>> s->events_dropped is false in my testing, so
>>> virtio_scsi_handle_event_vq() doesn't do anything.
>>>
>>>> Those values stay like this while the call counts above increase.
>>>>
>>>> So something going wrong with the indices when the event notifier
>>>> is set
>>>> from QEMU side (in the main thread)?
>>>>
>>>> The guest is Debian 12 with a 6.1 kernel.
>>
>> So, trying to figure out a new RFC version:
>>
>> About the stack trace you, Fiona, posted: As far as I understand,
>> that happens because virtio_blk_drained_end() calling
>> virtio_queue_notify_vq() wasn’t safe after all, and instead we need
>> to use virtio_queue_notify(). Right?
>>
>> However, you say using virtio_queue_notify() instead causes busy
>> loops of doing nothing in virtio-scsi (what you describe above). I
>> mean, better than a crash, but, you know. :)
>>
>> I don’t have any prior knowledge about the event handling,
>> unfortunately. The fact that 8 buffers are available but we don’t
>> use any sounds OK to me; as far as I understand, we only use those
>> buffers if we have any events to push into them, so as long as we
>> don’t, we won’t. Question is, should we not have its poll handler
>> return false if we don’t have any events (i.e. events_dropped is
>> false)? Would that solve it?
>
> Or actually, maybe we could just skip the virtio_queue_notify() call
> for the event vq? I.e. have it be `if (vq !=
> VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?
> I wouldn’t like that very much, (a) because this would make it
> slightly cumbersome to put that into
> virtio_queue_aio_attach_host_notifier*(), and (b) in case that does
> fix it, I do kind of feel like the real problem is that we use
> virtio_queue_host_notifier_aio_poll() for the event vq, which tells
> the polling code to poll whenever the vq is non-empty, but we (AFAIU)
> expect the event vq to be non-empty all the time.
Turns out there was commit 38738f7dbbda90fbc161757b7f4be35b52205552
(“virtio-scsi: don't waste CPU polling the event virtqueue”) by Stefan,
which specifically intended to not use
virtio_queue_host_notifier_aio_poll() for the event vq. I think the
problem is that virtio_scsi_drained_end() should have taken care to use
virtio_queue_aio_attach_host_notifier_no_poll() for the event vq.
If we do that, I think running virtio_queue_notify() on the event vq too
should be reasonable. We still want to check whether there are new
buffers available in case we have events_dropped. I don’t know whether
it’s really necessary, but without virtio_queue_host_notifier_aio_poll()
installed, it shouldn’t hurt.
Hanna
* Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
2024-01-02 15:24 ` Hanna Czenczek
2024-01-02 15:53 ` Paolo Bonzini
2024-01-03 11:40 ` Fiona Ebner
@ 2024-01-23 16:28 ` Hanna Czenczek
2 siblings, 0 replies; 30+ messages in thread
From: Hanna Czenczek @ 2024-01-23 16:28 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel
Cc: Kevin Wolf, qemu-block, pbonzini, Fam Zheng, Fiona Ebner
On 02.01.24 16:24, Hanna Czenczek wrote:
[...]
> I’ve attached the preliminary patch that I didn’t get to send (or test
> much) last year. Not sure if it has the same CPU-usage-spike issue
> Fiona was seeing, the only functional difference is that I notify the
> vq after attaching the notifiers instead of before.
It’ll take me some more time to send a patch mail to that effect,
because now there’s an assertion failure on hotunplug, which bisects
back to eaad0fe26050c227dc5dad63205835bac4912a51 (“scsi: only access
SCSIDevice->requests from one thread”):
{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
Assertion `ctx == blk->ctx' failed.
(gdb) bt
#0 0x00007f32a668d83c in () at /usr/lib/libc.so.6
#1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6
#2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6
#3 0x00007f32a66253dc in () at /usr/lib/libc.so.6
#4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6
#5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2429
#6 blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2417
#7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh
(opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
#8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
../util/async.c:218
#9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
blocking=blocking@entry=true) at ../util/aio-posix.c:722
#10 0x0000556e6b4564b6 in iothread_run
(opaque=opaque@entry=0x556e6d89d920) at ../iothread.c:63
#11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
../util/qemu-thread-posix.c:541
#12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6
#13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6