From: Peter Xu <peterx@redhat.com>
To: Pranav Tyagi <prtyagi@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Juraj Marcin <jmarcin@redhat.com>,
Prasad Pandit <ppandit@redhat.com>
Subject: Re: [PATCH] migration: Fix blocking in POSTCOPY_DEVICE during package load
Date: Tue, 21 Apr 2026 17:25:16 -0400 [thread overview]
Message-ID: <aefrPBN7k94Jj2dh@x1.local> (raw)
In-Reply-To: <20260421052227.8278-1-prtyagi@redhat.com>
On Tue, Apr 21, 2026 at 10:52:27AM +0530, Pranav Tyagi wrote:
> The package_loaded event is not set in case MIG_RP_MSG_PONG does not
> arrive on the source from the destination in the return path thread. The
> migration thread would then be blocked waiting for package_loaded event
> indefinitely in POSTCOPY_DEVICE state. Where as, in such a condition the
> source VM can safely resume as the destination has not yet started. The
> pong message can get lost in case of a network failure or destination
> crash before sending the pong.
>
> This patch uses the error detected in case of network failure or
> destination crash to set the package_loaded event in the out path of the
> return path thread. This will kick the migration thread out from
> a condition of indefinitely waiting for the package_loaded event. The
> migration thread then fails early and breaks from the migration loop to
> resume the VM on the source side.
>
> Fixes: 7b842fe354c6 ("migration: Introduce POSTCOPY_DEVICE state")
> Signed-off-by: Pranav Tyagi <prtyagi@redhat.com>
Ah I see.. thanks for figuring this out.
> ---
> migration/migration.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 5c9aaa6e58..1656c1203c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2386,6 +2386,15 @@ out:
> if (err) {
> migrate_error_propagate(ms, err);
> trace_source_return_path_thread_bad_end();
> + if (ms->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {
> + /*
> + * Kick the migration thread if it gets stuck in
> + * POSTCOPY_DEVICE state waiting for
> + * postcopy_package_loaded_event. The event will never be
> + * set as MIG_RP_MSG_PONG from the destination is lost.
> + */
> + qemu_event_set(&ms->postcopy_package_loaded_event);
> + }
This makes sense. Said that, we have another similar case right below for
the postcopy recover path:
if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
/*
* this will be extremely unlikely: that we got yet another network
* issue during recovering of the 1st network failure.. during this
* period the main migration thread can be waiting on rp_sem for
* this thread to sync with the other side.
*
* When this happens, explicitly kick the migration thread out of
* RECOVER stage and back to PAUSED, so the admin can try
* everything again.
*/
migration_rp_kick(ms);
}
It achieves some similar goal, where the migration thread is waiting for
some event from return path thread, and when the channel is broken we want
to kick it out.
I'm thinking if we can reuse that, instead of kicking multiple events.
Say, we can do migration_rp_kick() for both POSTCOPY_RECOVER and
POSTCOPY_DEVICE states in the rp thread. Meanwhile at below..
> }
>
> if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> @@ -3232,6 +3241,17 @@ static MigIterateState migration_iteration_run(MigrationState *s)
> * package before actually completing.
> */
> qemu_event_wait(&s->postcopy_package_loaded_event);
> + /*
> + * Check for errors in case the migration thread was stuck in
> + * POSTCOPY_DEVICE state waiting for the
> + * postcopy_package_loaded_event which was never set.
> + * If so, fail now and break out of the iteration.
> + */
> + if (migrate_has_error(s)) {
> + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
> + MIGRATION_STATUS_FAILING);
> + return MIG_ITERATE_BREAK;
> + }
... instead of waiting on postcopy_package_loaded_event, we can do:
while (!s->postcopy_package_loaded) {
if (migration_rp_wait(s)) {
/* Error happened */
migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
MIGRATION_STATUS_FAILING);
return MIG_ITERATE_BREAK;
}
}
/* Acknowledgement received from dest */
migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
PS: I think event also works in this case, so likely either one shared sem
or event should work; it's just that it was a sem before, and rp_sem has a
name that is generic enough to apply directly to the
postcopy_package_loaded_event use case.
Thanks,
> migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
> MIGRATION_STATUS_POSTCOPY_ACTIVE);
> }
> --
> 2.53.0
>
--
Peter Xu
next prev parent reply other threads:[~2026-04-21 21:26 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 5:22 [PATCH] migration: Fix blocking in POSTCOPY_DEVICE during package load Pranav Tyagi
2026-04-21 21:25 ` Peter Xu [this message]
2026-04-22 8:49 ` Pranav Tyagi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aefrPBN7k94Jj2dh@x1.local \
--to=peterx@redhat.com \
--cc=farosas@suse.de \
--cc=jmarcin@redhat.com \
--cc=ppandit@redhat.com \
--cc=prtyagi@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.