From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: Thomas Huth <thuth@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Laurent Vivier <lvivier@redhat.com>,
Eric Blake <eblake@redhat.com>,
Prasad Pandit <ppandit@redhat.com>,
peterx@redhat.com, Jiri Denemark <jdenemar@redhat.com>,
Bandan Das <bdas@redhat.com>
Subject: Re: [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase
Date: Mon, 17 Jun 2024 16:45:04 -0300 [thread overview]
Message-ID: <87iky7bbgf.fsf@suse.de> (raw)
In-Reply-To: <20240617181534.1425179-6-peterx@redhat.com>
Peter Xu <peterx@redhat.com> writes:
> This patch adds a migration state on src called "postcopy-recover-setup".
> The new state will describe the intermediate step starting from when the
> src QEMU received a postcopy recovery request, until the migration channels
> are properly established, but before the recovery process take place.
>
> The request came from Libvirt where Libvirt currently rely on the migration
> state events to detect migration state changes. That works for most of the
> migration process but except postcopy recovery failures at the beginning.
>
> Currently postcopy recovery only has two major states:
>
> - postcopy-paused: this is the state that both sides of QEMU will be in
> for a long time as long as the migration channel was interrupted.
>
> - postcopy-recover: this is the state where both sides of QEMU handshake
> with each other, preparing for a continuation of postcopy which used to
> be interrupted.
>
> The issue here is when the recovery port is invalid, the src QEMU will take
> the URI/channels, noticing the ports are not valid, and it'll silently keep
> in the postcopy-paused state, with no event sent to Libvirt. In this case,
> the only thing Libvirt can do is to poll the migration status with a proper
> interval, however that's less optimal.
>
> Considering that this is the only case where Libvirt won't get a
> notification from QEMU on such events, let's add postcopy-recover-setup
> state to mimic what we have with the "setup" state of a newly initialized
> migration, describing the phase of connection establishment.
>
> With that, postcopy recovery will have two paths to go now, and either path
> will guarantee an event generated. Now the events will look like this
> during a recovery process on src QEMU:
>
> - Initially when the recovery is initiated on src, QEMU will go from
> "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have
> this event.
>
> - Depending on whether the channel re-establishment is succeeded:
>
> - In succeeded case, src QEMU will move from "postcopy-recover-setup"
> to "postcopy-recover". Old QEMUs also have this event.
>
> - In failure case, src QEMU will move from "postcopy-recover-setup" to
> "postcopy-paused" again. Old QEMUs don't have this event.
>
> This guarantees that Libvirt will always receive a notification for
> recovery process properly.
>
> One thing to mention is, such new status is only needed on src QEMU not
> both. On dest QEMU, the state machine doesn't change. Hence the events
> don't change either. It's done like so because dest QEMU may not have an
> explicit point of setup start. E.g., it can happen that when dest QEMUs
> doesn't use migrate-recover command to use a new URI/channel, but the old
> URI/channels can be reused in recovery, in which case the old ports simply
> can work again after the network routes are fixed up.
>
> Add a new helper postcopy_is_paused() detecting whether postcopy is still
> paused, taking RECOVER_SETUP into account too. When using it on both
> src/dst, a slight change is done altogether to always wait for the
> semaphore before checking the status, because for both sides a sem_post()
> will be required for a recovery.
>
> Cc: Jiri Denemark <jdenemar@redhat.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Prasad Pandit <ppandit@redhat.com>
> Buglink: https://issues.redhat.com/browse/RHEL-38485
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
next prev parent reply other threads:[~2024-06-17 19:45 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
2024-06-19 1:05 ` Zhijian Li (Fujitsu) via
2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
2024-06-17 19:38 ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
2024-06-17 19:41 ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
2024-06-17 19:45 ` Fabiano Rosas [this message]
2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
2024-06-17 19:47 ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
2024-06-17 19:49 ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
2024-06-17 19:51 ` Fabiano Rosas
2024-06-17 21:23 ` Peter Xu
2024-06-19 20:39 ` Peter Xu
2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
2024-06-17 19:53 ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Peter Xu
2024-06-17 20:07 ` Fabiano Rosas
2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 20:12 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87iky7bbgf.fsf@suse.de \
--to=farosas@suse.de \
--cc=armbru@redhat.com \
--cc=bdas@redhat.com \
--cc=eblake@redhat.com \
--cc=jdenemar@redhat.com \
--cc=lvivier@redhat.com \
--cc=peterx@redhat.com \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).