From: Stefan Hajnoczi <stefanha@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-devel@nongnu.org, peterx@redhat.com, farosas@suse.de,
pbonzini@redhat.com
Subject: Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context
Date: Tue, 11 Jun 2024 10:04:49 -0400 [thread overview]
Message-ID: <20240611140449.GA366375@fedora.redhat.com> (raw)
In-Reply-To: <6d64f07d-1638-44dc-848b-b307c0ebd0ad@proxmox.com>
[-- Attachment #1: Type: text/plain, Size: 2735 bytes --]
On Tue, Jun 11, 2024 at 02:08:49PM +0200, Fiona Ebner wrote:
> Am 06.06.24 um 20:36 schrieb Stefan Hajnoczi:
> > On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
> >> The fact that the snapshot_save_job_bh() is scheduled in the main
> >> loop's qemu_aio_context AioContext means that it might get executed
> >> during a vCPU thread's aio_poll(). But saving of the VM state cannot
> >> happen while the guest or devices are active and can lead to assertion
> >> failures. See issue #2111 for two examples. Avoid the problem by
> >> scheduling the snapshot_save_job_bh() in the iohandler AioContext,
> >> which is not polled by vCPU threads.
> >>
> >> Solves Issue #2111.
> >>
> >> This change also solves the following issue:
> >>
> >> Since commit effd60c878 ("monitor: only run coroutine commands in
> >> qemu_aio_context"), the 'snapshot-save' QMP call would not respond
> >> right after starting the job anymore, but only after the job finished,
> >> which can take a long time. The reason is, because after commit
> >> effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
> >> When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
> >> coroutine cannot be entered immediately anymore, but needs to be
> >> scheduled to the main loop's qemu_aio_context AioContext. But
> >> snapshot_save_job_bh() was scheduled first to the same AioContext and
> >> thus gets executed first.
> >>
> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
> >> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> >> ---
> >>
> >> While initial smoke testing seems fine, I'm not familiar enough with
> >> this to rule out any pitfalls with the approach. Any reason why
> >> scheduling to the iohandler AioContext could be wrong here?
> >
> > If something waits for a BlockJob to finish using aio_poll() from
> > qemu_aio_context then a deadlock is possible since the iohandler_ctx
> > won't get a chance to execute. The only suspicious code path I found was
> > job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
> > sure whether it triggers this scenario. Please check that code path.
> >
>
> Sorry, I don't understand. Isn't executing the scheduled BH the only
> additional progress that the iohandler_ctx needs to make compared to
> before the patch? How exactly would that cause issues when waiting for a
> BlockJob?
>
> Or do you mean something waiting for the SnapshotJob from
> qemu_aio_context before snapshot_save_job_bh had the chance to run?
Yes, exactly. job_finish_sync_locked() will hang since iohandler_ctx has
no chance to execute. But I haven't audited the code to understand
whether this can happen.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2024-06-11 14:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-05 12:08 [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context Fiona Ebner
2024-06-05 13:59 ` Fabiano Rosas
2024-06-06 18:36 ` Stefan Hajnoczi
2024-06-11 12:08 ` Fiona Ebner
2024-06-11 14:04 ` Stefan Hajnoczi [this message]
2024-06-12 9:20 ` Fiona Ebner
2024-06-12 15:34 ` Stefan Hajnoczi
2024-06-14 9:29 ` Fiona Ebner
2024-06-18 20:34 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240611140449.GA366375@fedora.redhat.com \
--to=stefanha@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=farosas@suse.de \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.