* [PATCH v2] monitor: Fix deadlock in monitor_cleanup
From: hongmianquan @ 2026-03-27 13:10 UTC
To: qemu-devel; +Cc: kwolf, armbru, michael.roth, wubo.bob, hongmianquan
During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
Subsequently, the main thread, in monitor_cleanup(), sets qmp_dispatcher_co_shutdown and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, aio_co_wake() is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED() loop, where it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false). The BH calls aio_co_wake() to wake up the coroutine, which schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true) while the coroutine's wake-up BH is starved in iohandler_ctx. As a result, qmp_dispatcher_co never terminates and monitor_cleanup() deadlocks.
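The skipped wake-up can be reproduced in a toy model. The sketch below is not QEMU code: it assumes the wake helper does a test-and-set on the busy flag, and dispatcher_busy, wakeups_issued, toy_dispatcher_wake() and toy_replay_race() are hypothetical stand-ins for qmp_dispatcher_co_busy, the aio_co_wake() call, qmp_dispatcher_co_wake() and the sequence of events described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-ins; these are not the QEMU definitions. */
static atomic_bool dispatcher_busy;  /* models qmp_dispatcher_co_busy */
static int wakeups_issued;           /* counts modelled aio_co_wake() calls */

/*
 * Models the wake request: only the caller that flips the flag from
 * false to true issues a wake-up; every later caller is skipped.
 */
static void toy_dispatcher_wake(void)
{
    bool was_busy = atomic_exchange(&dispatcher_busy, true);

    if (!was_busy) {
        wakeups_issued++;
    }
    /* The flag is always set once a wake request has been made. */
    assert(atomic_load(&dispatcher_busy));
}

/* Replays the race and returns how many wake-ups were issued. */
static int toy_replay_race(void)
{
    atomic_store(&dispatcher_busy, false);
    wakeups_issued = 0;

    /* Monitor thread: a new command arrives. */
    toy_dispatcher_wake();

    /*
     * The coroutine has yielded while the command BH is pending,
     * so the flag is still set when cleanup starts.
     */

    /* Main thread: the shutdown flag was set, now request a wake-up. */
    toy_dispatcher_wake();

    return wakeups_issued;
}
```

In this model toy_replay_race() reports a single wake-up: the second request, which stands for the one made during cleanup, finds the flag already set and does nothing.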
The execution sequence is illustrated below:
IO Thread                     Main Thread (qemu_aio_context)          qmp_dispatcher_co (iohandler_ctx)
|                             |                                       |
|-- query-commands            |                                       |
|-- qmp_dispatcher_co_wake()  |                                       |
|   (sets busy = true)        |                                       |
|                             |  <-- Wakes up in iohandler_ctx -->    |
|                             |                                       |-- qmp_dispatch()
|                             |                                       |-- Schedules BH (do_qmp_dispatch_bh)
|                             |                                       |-- qemu_coroutine_yield()
|                             |                                       [State: Suspended, busy=true]
|                             [ quit triggered ]                      |
|                             |-- monitor_cleanup()
|                             |-- qmp_dispatcher_co_shutdown = true
|                             |-- qmp_dispatcher_co_wake()
|                             |    -> Checks busy flag. It's TRUE!
|                             |    -> Skips aio_co_wake().
|                             |
|                             |-- AIO_WAIT_WHILE_UNLOCKED:
|                             |    |-- aio_poll(iohandler_ctx, false)
|                             |    |    -> Executes do_qmp_dispatch_bh
|                             |    |    -> Schedules 'co_schedule_bh' in iohandler_ctx
|                             |    |
|                             |    |-- aio_poll(qemu_aio_context, true)
|                             |    |    -> Blocks indefinitely! (Deadlock)
|                             |
|                             X (Main thread sleeping)                X (Waiting for next iohandler_ctx poll)
To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
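The effect of the kick on the wait loop can be shown with a single-threaded toy model. This is only a sketch under stated assumptions, not the QEMU implementation: ToyState and the toy_* functions are hypothetical, the blocking poll is modelled as a function that returns false when it would sleep forever, and a wake-up BH is assumed to run only on the poll after the one that queued it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; every name here is a hypothetical stand-in, not a QEMU API. */
typedef struct {
    bool dispatch_bh_pending; /* models do_qmp_dispatch_bh queued in iohandler_ctx */
    bool wake_bh_pending;     /* models the coroutine wake-up BH in iohandler_ctx */
    bool kick_pending;        /* models an event that wakes the main context */
    bool co_terminated;       /* models qmp_dispatcher_co having exited */
} ToyState;

/* Models aio_poll(iohandler_ctx, false): run the BH that was pending on entry. */
static void toy_poll_iohandler(ToyState *s, bool kick_in_bh)
{
    if (s->wake_bh_pending) {
        /* The coroutine resumes, sees the shutdown flag and exits. */
        s->wake_bh_pending = false;
        s->co_terminated = true;
    } else if (s->dispatch_bh_pending) {
        /* The command BH runs and queues the wake-up BH. */
        s->dispatch_bh_pending = false;
        s->wake_bh_pending = true;
        if (kick_in_bh) {
            s->kick_pending = true; /* models the added aio_wait_kick() */
        }
    }
}

/* Models aio_poll(qemu_aio_context, true): false means it would sleep forever. */
static bool toy_poll_main_blocking(ToyState *s)
{
    /* The loop must never block once the coroutine has terminated. */
    assert(!s->co_terminated);
    if (!s->kick_pending) {
        return false;
    }
    s->kick_pending = false;
    return true;
}

/* Models the wait loop in cleanup: true if it finishes, false if it hangs. */
static bool toy_cleanup_completes(bool kick_in_bh)
{
    ToyState s = { .dispatch_bh_pending = true };

    for (;;) {
        toy_poll_iohandler(&s, kick_in_bh);
        if (s.co_terminated) {
            return true;
        }
        if (!toy_poll_main_blocking(&s)) {
            return false;
        }
    }
}
```

In this model toy_cleanup_completes(false) reports a hang, because nothing wakes the blocking poll after the wake-up BH is queued, while toy_cleanup_completes(true) finishes on the second pass through the loop.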
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
---
qapi/qmp-dispatch.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 9bb1e6a9f4..e3897d5197 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -128,6 +128,16 @@ static void do_qmp_dispatch_bh(void *opaque)
     data->cmd->fn(data->args, data->ret, data->errp);
     monitor_set_cur(qemu_coroutine_self(), NULL);
     aio_co_wake(data->co);
+
+    /*
+     * If the QMP dispatcher coroutine is waiting to be scheduled
+     * in iohandler_ctx, we must kick the main loop. This ensures
+     * that AIO_WAIT_WHILE_UNLOCKED() in monitor_cleanup() doesn't
+     * block indefinitely waiting for an event in qemu_aio_context,
+     * but actually gets the chance to poll iohandler_ctx and resume
+     * the coroutine.
+     */
+    aio_wait_kick();
 }

 /*
--
2.32.1 (Apple Git-133)
* Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
From: Markus Armbruster @ 2026-03-31 13:24 UTC
To: hongmianquan; +Cc: qemu-devel, kwolf, armbru, michael.roth, wubo.bob
"hongmianquan" <hongmianquan@bytedance.com> writes:
> During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
>
> The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
> Subsequently, the main thread, in monitor_cleanup(), sets qmp_dispatcher_co_shutdown and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, aio_co_wake() is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED() loop, where it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false). The BH calls aio_co_wake() to wake up the coroutine, which schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true) while the coroutine's wake-up BH is starved in iohandler_ctx. As a result, qmp_dispatcher_co never terminates and monitor_cleanup() deadlocks.
>
> To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
>
> Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
> Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
Please line-wrap your paragraphs at 70 columns or so. The maintainer
accepting the patch may do that for you, to save you a respin.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
* Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
From: Kevin Wolf @ 2026-03-31 13:36 UTC
To: Markus Armbruster; +Cc: hongmianquan, qemu-devel, michael.roth, wubo.bob
On 31.03.2026 at 15:24, Markus Armbruster wrote:
> Please line-wrap your paragraphs at 70 columns or so. The maintainer
> accepting the patch may do that for you, to save you a respin.
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
Thanks, applied to the block branch.
Kevin
* Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
From: Michael Tokarev @ 2026-03-31 21:38 UTC
To: hongmianquan, qemu-devel
Cc: kwolf, armbru, michael.roth, wubo.bob, qemu-stable
On 27.03.2026 16:10, hongmianquan wrote:
> During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
>
> To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
Shouldn't this one too be picked up for the stable series?
Sounds like a good candidate to me.
Thanks,
/mjt
* Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
From: Kevin Wolf @ 2026-04-01 8:32 UTC
To: Michael Tokarev
Cc: hongmianquan, qemu-devel, armbru, michael.roth, wubo.bob,
qemu-stable
On 31.03.2026 at 23:38, Michael Tokarev wrote:
> On 27.03.2026 16:10, hongmianquan wrote:
> > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
>
> Shouldn't this one too be picked up for the stable series?
> Sounds like a good candidate to me.
Yes, makes sense, please pick it up. I also agree with the SCSI and IDE
patch you picked up.
Kevin