qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context
@ 2024-06-05 12:08 Fiona Ebner
  2024-06-05 13:59 ` Fabiano Rosas
  2024-06-06 18:36 ` Stefan Hajnoczi
  0 siblings, 2 replies; 9+ messages in thread
From: Fiona Ebner @ 2024-06-05 12:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, stefanha, pbonzini

The fact that the snapshot_save_job_bh() is scheduled in the main
loop's qemu_aio_context AioContext means that it might get executed
during a vCPU thread's aio_poll(). But saving of the VM state cannot
happen while the guest or devices are active and can lead to assertion
failures. See issue #2111 for two examples. Avoid the problem by
scheduling the snapshot_save_job_bh() in the iohandler AioContext,
which is not polled by vCPU threads.

Solves Issue #2111.

This change also solves the following issue:

Since commit effd60c878 ("monitor: only run coroutine commands in
qemu_aio_context"), the 'snapshot-save' QMP call would not respond
right after starting the job anymore, but only after the job finished,
which can take a long time. The reason is, because after commit
effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
coroutine cannot be entered immediately anymore, but needs to be
scheduled to the main loop's qemu_aio_context AioContext. But
snapshot_save_job_bh() was scheduled first to the same AioContext and
thus gets executed first.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

While initial smoke testing seems fine, I'm not familiar enough with
this to rule out any pitfalls with the approach. Any reason why
scheduling to the iohandler AioContext could be wrong here?

Should the same be done for the snapshot_load_job_bh and
snapshot_delete_job_bh to keep it consistent?

 migration/savevm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index c621f2359b..0086b76ab0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3459,7 +3459,7 @@ static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
     SnapshotJob *s = container_of(job, SnapshotJob, common);
     s->errp = errp;
     s->co = qemu_coroutine_self();
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
+    aio_bh_schedule_oneshot(iohandler_get_aio_context(),
                             snapshot_save_job_bh, job);
     qemu_coroutine_yield();
     return s->ret ? 0 : -1;
-- 
2.39.2




^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-06-18 20:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-05 12:08 [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context Fiona Ebner
2024-06-05 13:59 ` Fabiano Rosas
2024-06-06 18:36 ` Stefan Hajnoczi
2024-06-11 12:08   ` Fiona Ebner
2024-06-11 14:04     ` Stefan Hajnoczi
2024-06-12  9:20       ` Fiona Ebner
2024-06-12 15:34         ` Stefan Hajnoczi
2024-06-14  9:29           ` Fiona Ebner
2024-06-18 20:34             ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).