From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org, peter.maydell@linaro.org
Subject: [PULL 4/6] monitor: Fix deadlock in monitor_cleanup
Date: Tue, 31 Mar 2026 17:03:50 +0200
Message-ID: <20260331150352.256332-5-kwolf@redhat.com>
In-Reply-To: <20260331150352.256332-1-kwolf@redhat.com>
From: hongmianquan <hongmianquan@bytedance.com>
During qemu_cleanup, if a non-coroutine QMP command (e.g.,
query-commands) is concurrently received and processed by the
mon_iothread, it can lead to a deadlock in monitor_cleanup.
The root cause is a race condition between the main thread's shutdown
sequence and the coroutine's dispatching mechanism. When handling a
non-coroutine QMP command, qmp_dispatcher_co schedules the actual
command execution as a bottom half (BH) in iohandler_ctx and then
yields. While the coroutine is suspended, qmp_dispatcher_co_busy
remains true.
Subsequently, the main thread, in monitor_cleanup(), sets
qmp_dispatcher_co_shutdown and calls qmp_dispatcher_co_wake(). Since
qmp_dispatcher_co_busy is already true, aio_co_wake() is skipped. The
main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, where it
executes the scheduled BH (do_qmp_dispatch_bh) via
aio_poll(iohandler_ctx, false). The BH attempts to wake up the
coroutine, and aio_co_wake() schedules a new wake-up BH in
iohandler_ctx. The main thread then blocks indefinitely in
aio_poll(qemu_aio_context, true) while the coroutine's wake-up BH is
starved in iohandler_ctx. qmp_dispatcher_co never reaches termination,
resulting in a deadlock.
The execution sequence is illustrated below:
  IO Thread              Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
      |                               |                                      |
      |-- query-commands              |                                      |
      |-- qmp_dispatcher_co_wake()    |                                      |
      |   (sets busy = true)          |                                      |
      |                               |   <-- Wakes up in iohandler_ctx -->  |
      |                               |                                      |-- qmp_dispatch()
      |                               |                                      |-- Schedules BH (do_qmp_dispatch_bh)
      |                               |                                      |-- qemu_coroutine_yield()
      |                               |                                      [State: Suspended, busy=true]
      |                      [ quit triggered ]                              |
      |                               |-- monitor_cleanup()
      |                               |-- qmp_dispatcher_co_shutdown = true
      |                               |-- qmp_dispatcher_co_wake()
      |                               |   -> Checks busy flag. It's TRUE!
      |                               |   -> Skips aio_co_wake().
      |                               |
      |                               |-- AIO_WAIT_WHILE_UNLOCKED:
      |                               |   |-- aio_poll(iohandler_ctx, false)
      |                               |   |   -> Executes do_qmp_dispatch_bh
      |                               |   |   -> Schedules 'co_schedule_bh' in iohandler_ctx
      |                               |   |
      |                               |   |-- aio_poll(qemu_aio_context, true)
      |                               |   |   -> Blocks indefinitely! (Deadlock)
      |                               |
      |                               X (Main thread sleeping)               X (Waiting for next iohandler_ctx poll)
To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh()
to break the main loop out of its blocking poll, allowing it to evaluate
the loop condition and poll iohandler_ctx.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
Message-ID: <20260327131024.51947-1-hongmianquan@bytedance.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
qapi/qmp-dispatch.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 9bb1e6a9f4a..e3897d51977 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -128,6 +128,16 @@ static void do_qmp_dispatch_bh(void *opaque)
     data->cmd->fn(data->args, data->ret, data->errp);
     monitor_set_cur(qemu_coroutine_self(), NULL);
     aio_co_wake(data->co);
+
+    /*
+     * If the QMP dispatcher coroutine is waiting to be scheduled
+     * in iohandler_ctx, we must kick the main loop. This ensures
+     * that AIO_WAIT_WHILE_UNLOCKED() in monitor_cleanup() doesn't
+     * block indefinitely waiting for an event in qemu_aio_context,
+     * but actually gets the chance to poll iohandler_ctx and resume
+     * the coroutine.
+     */
+    aio_wait_kick();
 }

 /*
--
2.53.0