Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-devel@nongnu.org, peterx@redhat.com, farosas@suse.de,
	pbonzini@redhat.com
Subject: Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context
Date: Thu, 6 Jun 2024 14:36:38 -0400	[thread overview]
Message-ID: <20240606183638.GC198201@fedora.redhat.com> (raw)
In-Reply-To: <20240605120848.358654-1-f.ebner@proxmox.com>

[-- Attachment #1: Type: text/plain, Size: 3399 bytes --]

On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
> The fact that the snapshot_save_job_bh() is scheduled in the main
> loop's qemu_aio_context AioContext means that it might get executed
> during a vCPU thread's aio_poll(). But saving of the VM state cannot
> happen while the guest or devices are active and can lead to assertion
> failures. See issue #2111 for two examples. Avoid the problem by
> scheduling the snapshot_save_job_bh() in the iohandler AioContext,
> which is not polled by vCPU threads.
> 
> Solves Issue #2111.
> 
> This change also solves the following issue:
> 
> Since commit effd60c878 ("monitor: only run coroutine commands in
> qemu_aio_context"), the 'snapshot-save' QMP call would not respond
> right after starting the job anymore, but only after the job finished,
> which can take a long time. The reason is, because after commit
> effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
> When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
> coroutine cannot be entered immediately anymore, but needs to be
> scheduled to the main loop's qemu_aio_context AioContext. But
> snapshot_save_job_bh() was scheduled first to the same AioContext and
> thus gets executed first.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> While initial smoke testing seems fine, I'm not familiar enough with
> this to rule out any pitfalls with the approach. Any reason why
> scheduling to the iohandler AioContext could be wrong here?

If something waits for a BlockJob to finish using aio_poll() from
qemu_aio_context then a deadlock is possible since the iohandler_ctx
won't get a chance to execute. The only suspicious code path I found was
job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
sure whether it triggers this scenario. Please check that code path.

> Should the same be done for the snapshot_load_job_bh and
> snapshot_delete_job_bh to keep it consistent?

In the long term it would be cleaner to move away from synchronous APIs
that rely on nested event loops. They have been a source of bugs for
years.

If vm_stop() and perhaps other operations in save_snapshot() were
asynchronous, then it would be safe to run the operation in
qemu_aio_context without using iohandler_ctx. vm_stop() wouldn't invoke
its callback until vCPUs were quiesced and outside device emulation
code.

I think this patch is fine as a one-line bug fix, but we should be
careful about falling back on this trick because it makes the codebase
harder to understand and more fragile.

> 
>  migration/savevm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c621f2359b..0086b76ab0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -3459,7 +3459,7 @@ static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
>      SnapshotJob *s = container_of(job, SnapshotJob, common);
>      s->errp = errp;
>      s->co = qemu_coroutine_self();
> -    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +    aio_bh_schedule_oneshot(iohandler_get_aio_context(),
>                              snapshot_save_job_bh, job);
>      qemu_coroutine_yield();
>      return s->ret ? 0 : -1;
> -- 
> 2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

next prev parent reply	other threads:[~2024-06-07  1:51 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05 12:08 [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context Fiona Ebner
2024-06-05 13:59 ` Fabiano Rosas
2024-06-06 18:36 ` Stefan Hajnoczi [this message]
2024-06-11 12:08   ` Fiona Ebner
2024-06-11 14:04     ` Stefan Hajnoczi
2024-06-12  9:20       ` Fiona Ebner
2024-06-12 15:34         ` Stefan Hajnoczi
2024-06-14  9:29           ` Fiona Ebner
2024-06-18 20:34             ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240606183638.GC198201@fedora.redhat.com \
    --to=stefanha@redhat.com \
    --cc=f.ebner@proxmox.com \
    --cc=farosas@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.