From: Peter Xu <peterx@redhat.com>
To: Juraj Marcin <jmarcin@redhat.com>
Cc: qemu-devel@nongnu.org, Jiri Denemark <jdenemar@redhat.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>,
Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH 4/4] migration: Introduce POSTCOPY_DEVICE state
Date: Fri, 19 Sep 2025 12:58:08 -0400 [thread overview]
Message-ID: <aM2LoGDh5WsVnEi8@x1.local> (raw)
In-Reply-To: <20250915115918.3520735-5-jmarcin@redhat.com>
On Mon, Sep 15, 2025 at 01:59:15PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Currently, when postcopy starts, the source VM starts switchover and
> sends a package containing the state of all non-postcopiable devices.
> When the destination loads this package, the switchover is complete and
> the destination VM starts. However, if the device state load fails or
> the destination side crashes, the source side is already in
> POSTCOPY_ACTIVE state and cannot be recovered, even when it has the most
> up-to-date machine state as the destination has not yet started.
>
> This patch introduces a new POSTCOPY_DEVICE state which is active
> while the destination machine is loading the device state, is not yet
> running, and the source side can be resumed in case of a migration
> failure.
>
> To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
> side uses a PONG message that is a response to a PING message processed
> just before the POSTCOPY_RUN command that starts the destination VM.
> Thus, this change does not require any changes on the destination side
> and is effective even with older destination versions.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
> migration/migration.c | 23 ++++++++++++++++++-----
> migration/savevm.h | 2 ++
> migration/trace-events | 1 +
> qapi/migration.json | 8 ++++++--
> tests/qtest/migration/precopy-tests.c | 3 ++-
> 5 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 7222e3de13..e63a7487be 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1223,6 +1223,7 @@ bool migration_is_running(void)
>
> switch (s->state) {
> case MIGRATION_STATUS_ACTIVE:
> + case MIGRATION_STATUS_POSTCOPY_DEVICE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_PAUSED:
> case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
> @@ -1244,6 +1245,7 @@ static bool migration_is_active(void)
> MigrationState *s = current_migration;
>
> return (s->state == MIGRATION_STATUS_ACTIVE ||
> + s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> }
>
> @@ -1366,6 +1368,7 @@ static void fill_source_migration_info(MigrationInfo *info)
> break;
> case MIGRATION_STATUS_ACTIVE:
> case MIGRATION_STATUS_CANCELLING:
> + case MIGRATION_STATUS_POSTCOPY_DEVICE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_PRE_SWITCHOVER:
> case MIGRATION_STATUS_DEVICE:
> @@ -1419,6 +1422,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
> case MIGRATION_STATUS_CANCELLING:
> case MIGRATION_STATUS_CANCELLED:
> case MIGRATION_STATUS_ACTIVE:
> + case MIGRATION_STATUS_POSTCOPY_DEVICE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_PAUSED:
> case MIGRATION_STATUS_POSTCOPY_RECOVER:
> @@ -1719,6 +1723,7 @@ bool migration_in_postcopy(void)
> MigrationState *s = migrate_get_current();
>
> switch (s->state) {
> + case MIGRATION_STATUS_POSTCOPY_DEVICE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_PAUSED:
> case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
> @@ -2564,6 +2569,11 @@ static void *source_return_path_thread(void *opaque)
> tmp32 = ldl_be_p(buf);
> trace_source_return_path_thread_pong(tmp32);
> qemu_sem_post(&ms->rp_state.rp_pong_acks);
> + if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
> + trace_source_return_path_thread_dst_started();
> + migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
> + MIGRATION_STATUS_POSTCOPY_ACTIVE);
Could this race with the migration thread modifying the state concurrently?
To avoid it, we could have a bool, set it here once, and in the iterations
do something like:
diff --git a/migration/migration.c b/migration/migration.c
index 10c216d25d..55230e10ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3449,6 +3449,16 @@ static MigIterateState migration_iteration_run(MigrationState *s)
trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
if (in_postcopy) {
+ if (s->postcopy_package_loaded) {
+ assert(s->state == MIGRATION_STATUS_POSTCOPY_DEVICE);
+ migrate_set_state(s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ /*
+ * Since postcopy cannot be re-initiated, this flag will only
+ * be set at most once for QEMU's whole lifecyce.
+ */
+ s->postcopy_package_loaded = false;
+ }
/*
* Iterate in postcopy until all pending data flushed. Note that
* postcopy completion doesn't rely on can_switchover, because when
> + }
> break;
>
> case MIG_RP_MSG_REQ_PAGES:
> @@ -2814,6 +2824,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> if (migrate_postcopy_ram()) {
> qemu_savevm_send_ping(fb, 3);
> }
> + qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
Nitpick: some comment would be nice here describing this ping, especially
when it becomes functional. The name does provide some info, but not
extremely clear what PACKAGED_LOADED mean if in a PONG yet.
>
> qemu_savevm_send_postcopy_run(fb);
>
> @@ -2871,7 +2882,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>
> /* Now, switchover looks all fine, switching to postcopy-active */
> migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
> - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> + MIGRATION_STATUS_POSTCOPY_DEVICE);
>
> bql_unlock();
>
> @@ -3035,7 +3046,8 @@ static void migration_completion(MigrationState *s)
>
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> ret = migration_completion_precopy(s);
> - } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> + } else if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
> + s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
Exactly. We need to be prepared that src sending too fast so when device
loading on dest we finished.
> migration_completion_postcopy(s);
> } else {
> ret = -1;
> @@ -3311,8 +3323,8 @@ static MigThrError migration_detect_error(MigrationState *s)
> return postcopy_pause(s);
> } else {
> /*
> - * For precopy (or postcopy with error outside IO), we fail
> - * with no time.
> + * For precopy (or postcopy with error outside IO, or before dest
> + * starts), we fail with no time.
> */
> migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
> trace_migration_thread_file_err();
> @@ -3447,7 +3459,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
> {
> uint64_t must_precopy, can_postcopy, pending_size;
> Error *local_err = NULL;
> - bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
> + bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
> + s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> bool can_switchover = migration_can_switchover(s);
> bool complete_ready;
>
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 2d5e9c7166..c4de0325eb 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -29,6 +29,8 @@
> #define QEMU_VM_COMMAND 0x08
> #define QEMU_VM_SECTION_FOOTER 0x7e
>
> +#define QEMU_VM_PING_PACKAGED_LOADED 0x42
Only curious how you picked it. :)
It's fine, it can also be 5, as we know exactly what we used to use (1-4).
> +
> bool qemu_savevm_state_blocked(Error **errp);
> void qemu_savevm_non_migratable_list(strList **reasons);
> int qemu_savevm_state_prepare(Error **errp);
> diff --git a/migration/trace-events b/migration/trace-events
> index 706db97def..007b5c407e 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
> source_return_path_thread_shut(uint32_t val) "0x%x"
> source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> source_return_path_thread_switchover_acked(void) ""
> +source_return_path_thread_dst_started(void) ""
> migration_thread_low_pending(uint64_t pending) "%" PRIu64
> migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2387c21e9c..89a20d858d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -142,6 +142,10 @@
> # @postcopy-active: like active, but now in postcopy mode.
> # (since 2.5)
> #
> +# @postcopy-device: like postcopy-active, but the destination is still
> +# loading device state and is not running yet. If migration fails
> +# during this state, the source side will resume. (since 10.2)
Is it worthwhile to mention we could jump from postcopy-device to
completed? I also wonder if it happens if libvirt would get confused.
Worth checking with Jiri.
Thanks,
> +#
> # @postcopy-paused: during postcopy but paused. (since 3.0)
> #
> # @postcopy-recover-setup: setup phase for a postcopy recovery
> @@ -173,8 +177,8 @@
> ##
> { 'enum': 'MigrationStatus',
> 'data': [ 'none', 'setup', 'cancelling', 'cancelled',
> - 'active', 'postcopy-active', 'postcopy-paused',
> - 'postcopy-recover-setup',
> + 'active', 'postcopy-device', 'postcopy-active',
> + 'postcopy-paused', 'postcopy-recover-setup',
> 'postcopy-recover', 'completed', 'failed', 'colo',
> 'pre-switchover', 'device', 'wait-unplug' ] }
> ##
> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index bb38292550..57ca623de5 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
> }
>
> /* ensure new status don't go unnoticed */
> - assert(MIGRATION_STATUS__MAX == 15);
> + assert(MIGRATION_STATUS__MAX == 16);
>
> for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
> switch (i) {
> case MIGRATION_STATUS_DEVICE: /* happens too fast */
> case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
> case MIGRATION_STATUS_COLO: /* no support in tests */
> + case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
> case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
> case MIGRATION_STATUS_POSTCOPY_PAUSED:
> case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
> --
> 2.51.0
>
--
Peter Xu
next prev parent reply other threads:[~2025-09-19 16:59 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-15 11:59 [PATCH 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-09-15 11:59 ` [PATCH 1/4] migration: Do not try to start VM if disk activation fails Juraj Marcin
2025-09-19 16:12 ` Fabiano Rosas
2025-09-15 11:59 ` [PATCH 2/4] migration: Accept MigrationStatus in migration_has_failed() Juraj Marcin
2025-09-19 14:57 ` Peter Xu
2025-09-22 11:26 ` Juraj Marcin
2025-09-15 11:59 ` [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish() Juraj Marcin
2025-09-19 15:53 ` Peter Xu
2025-09-19 16:46 ` Fabiano Rosas
2025-09-22 12:58 ` Juraj Marcin
2025-09-22 15:51 ` Peter Xu
2025-09-22 17:40 ` Fabiano Rosas
2025-09-22 17:48 ` Peter Xu
2025-09-23 14:58 ` Juraj Marcin
2025-09-23 16:17 ` Peter Xu
2025-09-15 11:59 ` [PATCH 4/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-09-19 16:58 ` Peter Xu [this message]
2025-09-19 17:50 ` Peter Xu
2025-09-22 13:34 ` Juraj Marcin
2025-09-22 16:16 ` Peter Xu
2025-09-23 14:23 ` Juraj Marcin
2025-09-25 11:54 ` Jiří Denemark
2025-09-25 18:22 ` Peter Xu
2025-09-30 7:53 ` Jiří Denemark
2025-09-30 20:04 ` Peter Xu
2025-10-01 8:43 ` Jiří Denemark
2025-10-01 11:05 ` Dr. David Alan Gilbert
2025-10-01 14:26 ` Jiří Denemark
2025-10-01 15:53 ` Dr. David Alan Gilbert
2025-10-01 15:10 ` Daniel P. Berrangé
2025-10-02 12:17 ` Jiří Denemark
2025-10-02 13:12 ` Dr. David Alan Gilbert
2025-10-01 10:09 ` Juraj Marcin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aM2LoGDh5WsVnEi8@x1.local \
--to=peterx@redhat.com \
--cc=dave@treblig.org \
--cc=farosas@suse.de \
--cc=jdenemar@redhat.com \
--cc=jmarcin@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.