From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: Thomas Huth <thuth@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	Prasad Pandit <ppandit@redhat.com>,
	peterx@redhat.com, Jiri Denemark <jdenemar@redhat.com>,
	Bandan Das <bdas@redhat.com>
Subject: Re: [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase
Date: Mon, 17 Jun 2024 16:45:04 -0300
Message-ID: <87iky7bbgf.fsf@suse.de>
In-Reply-To: <20240617181534.1425179-6-peterx@redhat.com>

Peter Xu <peterx@redhat.com> writes:

> This patch adds a migration state on src called "postcopy-recover-setup".
> The new state describes the intermediate step that starts when the src
> QEMU receives a postcopy recovery request and lasts until the migration
> channels are properly established, before the recovery process takes place.
>
> The request came from Libvirt, which currently relies on the migration
> state events to detect migration state changes.  That works for most of the
> migration process, except for postcopy recovery failures at the beginning.
>
> Currently postcopy recovery only has two major states:
>
>   - postcopy-paused: this is the state that both sides of QEMU will stay
>     in for as long as the migration channel is interrupted.
>
>   - postcopy-recover: this is the state where both sides of QEMU handshake
>     with each other, preparing to continue the postcopy that was
>     interrupted.
>
> The issue here is that when the recovery port is invalid, the src QEMU will
> take the URI/channels, notice that the ports are not valid, and silently
> stay in the postcopy-paused state, with no event sent to Libvirt.  In this
> case, the only thing Libvirt can do is to poll the migration status at a
> proper interval, which is suboptimal.
>
> Considering that this is the only case where Libvirt won't get a
> notification from QEMU on such events, let's add postcopy-recover-setup
> state to mimic what we have with the "setup" state of a newly initialized
> migration, describing the phase of connection establishment.
>
> With that, postcopy recovery now has two paths to take, and either path
> is guaranteed to generate an event.  The events will look like this
> during a recovery process on src QEMU:
>
>   - Initially when the recovery is initiated on src, QEMU will go from
>     "postcopy-paused" -> "postcopy-recover-setup".  Old QEMUs don't have
>     this event.
>
>   - Depending on whether the channel re-establishment succeeds:
>
>     - In the success case, src QEMU will move from "postcopy-recover-setup"
>       to "postcopy-recover".  Old QEMUs also have this event.
>
>     - In the failure case, src QEMU will move from "postcopy-recover-setup"
>       to "postcopy-paused" again.  Old QEMUs don't have this event.
>
> This guarantees that Libvirt will always receive a notification for the
> recovery process.
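
The event sequences listed above can be modelled roughly as follows. This
is only an illustrative Python sketch, not QEMU or Libvirt code: the
helper names (src_recovery_statuses, classify_recovery, as_events) are
hypothetical, and the listener assumes each event arrives as a QMP
MIGRATION event whose data carries a "status" string.

```python
import json

PAUSED = "postcopy-paused"
RECOVER_SETUP = "postcopy-recover-setup"
RECOVER = "postcopy-recover"


def src_recovery_statuses(channel_ok):
    """Statuses the src QEMU moves through after a recovery request."""
    # Setup is always entered first, so an event is always generated.
    statuses = [RECOVER_SETUP]
    # Success continues to the handshake; failure falls back to paused.
    statuses.append(RECOVER if channel_ok else PAUSED)
    return statuses


def as_events(statuses):
    """Wrap statuses as JSON lines shaped like QMP MIGRATION events."""
    return [json.dumps({"event": "MIGRATION", "data": {"status": s}})
            for s in statuses]


def classify_recovery(event_lines):
    """Hypothetical listener: decide the recovery outcome from events."""
    in_setup = False
    for line in event_lines:
        msg = json.loads(line)
        if msg.get("event") != "MIGRATION":
            continue
        status = msg["data"]["status"]
        if status == RECOVER_SETUP:
            in_setup = True
        elif status == RECOVER:
            # Also reached with an old QEMU, which skips the setup event.
            return "recovering"
        elif status == PAUSED and in_setup:
            return "setup-failed"
    # No usable event seen: the caller would have to fall back to polling.
    return "unknown"
```

With an old QEMU the failure path produces no events at all, so the
listener returns "unknown" and polling remains the only option.
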
>
> One thing to mention is that the new status is only needed on src QEMU,
> not on both sides.  On dest QEMU, the state machine doesn't change, hence
> the events don't change either.  It's done this way because dest QEMU may
> not have an explicit point of setup start.  E.g., it can happen that dest
> QEMU doesn't use the migrate-recover command to set a new URI/channel, but
> the old URI/channels are reused in recovery, in which case the old ports
> simply work again after the network routes are fixed up.
>
> Add a new helper postcopy_is_paused() that detects whether postcopy is
> still paused, taking RECOVER_SETUP into account too.  While switching both
> src/dst to use it, also make a slight change to always wait for the
> semaphore before checking the status, because on both sides a sem_post()
> is required for a recovery.
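
The helper and the wait-then-check ordering described above might look
roughly like this. It is a Python sketch for illustration only, not the
patch itself: the Migration class, the pause_sem field and
wait_while_paused() are hypothetical names, and threading.Semaphore
stands in for the semaphore that sem_post() signals.

```python
import threading

POSTCOPY_PAUSED = "postcopy-paused"
POSTCOPY_RECOVER_SETUP = "postcopy-recover-setup"
POSTCOPY_RECOVER = "postcopy-recover"


def postcopy_is_paused(status):
    """Treat the recover-setup phase as still paused."""
    return status in (POSTCOPY_PAUSED, POSTCOPY_RECOVER_SETUP)


class Migration:
    """Hypothetical stand-in for the migration state object."""

    def __init__(self):
        self.status = POSTCOPY_PAUSED
        self.pause_sem = threading.Semaphore(0)


def wait_while_paused(mig):
    """Always wait for the semaphore first, then check the status."""
    while True:
        # Blocks until the recovery path posts the semaphore.
        mig.pause_sem.acquire()
        if not postcopy_is_paused(mig.status):
            return mig.status
```

Waiting before checking means a wakeup is consumed on every pass, which
matches the statement that both sides need a sem_post() to recover.
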
>
> Cc: Jiri Denemark <jdenemar@redhat.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Prasad Pandit <ppandit@redhat.com>
> Buglink: https://issues.redhat.com/browse/RHEL-38485
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

