* [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state
@ 2025-11-03 18:32 Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED Juraj Marcin
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
This series introduces a new POSTCOPY_DEVICE state that is active (on
both the source and destination sides) while the destination loads the
device state. Before this series, if the destination machine failed
during the device load, the source side would stay stuck in
POSTCOPY_ACTIVE with no way to recover. With this series, if the
migration fails while in the POSTCOPY_DEVICE state, the source side can
safely resume, as the destination has not started running yet.
RFC: https://lore.kernel.org/all/20250807114922.1013286-1-jmarcin@redhat.com/
V1: https://lore.kernel.org/all/20250915115918.3520735-1-jmarcin@redhat.com/
V2: https://lore.kernel.org/all/20251027154115.4138677-1-jmarcin@redhat.com/
V3: https://lore.kernel.org/all/20251030214915.1411860-1-jmarcin@redhat.com/
V4 changes:
- fixes for failing qemu-iotest 194
- flush channel after sending CMD_PACKAGED data (new patch 1)
- POSTCOPY_DEVICE state is now used only if return-path is open
Juraj Marcin (7):
migration: Flush migration channel after sending data of CMD_PACKAGED
migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
migration: Introduce postcopy incoming setup and cleanup functions
migration: Refactor all incoming cleanup into
migration_incoming_destroy()
migration: Respect exit-on-error when migration fails before resuming
migration: Make postcopy listen thread joinable
migration: Introduce POSTCOPY_DEVICE state
Peter Xu (1):
migration: Do not try to start VM if disk activation fails
migration/migration.c | 123 +++++++++++++-------
migration/migration.h | 4 +
migration/postcopy-ram.c | 161 ++++++++++++++++++++++++++
migration/postcopy-ram.h | 3 +
migration/savevm.c | 138 ++--------------------
migration/savevm.h | 2 +
migration/trace-events | 3 +-
qapi/migration.json | 10 +-
tests/qemu-iotests/194 | 2 +-
tests/qtest/migration/precopy-tests.c | 3 +-
10 files changed, 275 insertions(+), 174 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 19:04 ` Peter Xu
2025-11-03 18:32 ` [PATCH v4 2/8] migration: Do not try to start VM if disk activation fails Juraj Marcin
` (7 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
If the length of the data sent after CMD_PACKAGED is just right, and
there is not much data to send afterwards, it is possible that part of
the CMD_PACKAGED payload is left behind in the send buffer. This causes
the destination side to hang while it tries to load the whole package
and initiate postcopy.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/savevm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index 232cae090b..fa017378db 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1142,6 +1142,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
qemu_put_buffer(f, buf, len);
+ qemu_fflush(f);
return 0;
}
--
2.51.0
* [PATCH v4 2/8] migration: Do not try to start VM if disk activation fails
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 3/8] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Peter Xu <peterx@redhat.com>
If a rare split brain happens (e.g. dest QEMU somehow started running
and took the shared drive locks), src QEMU may not be able to activate
the drives anymore. In this case, src QEMU shouldn't start the VM, or it
might crash the block layer later.
Meanwhile, src QEMU cannot try to continue either, even if dest QEMU
releases the drive locks (e.g. via QMP "stop"): as long as dest QEMU has
started running, dest QEMU's RAM is the only version consistent with the
current state of the shared storage.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5e74993b46..6e647c7c4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3526,6 +3526,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
static void migration_iteration_finish(MigrationState *s)
{
+ Error *local_err = NULL;
+
bql_lock();
/*
@@ -3549,11 +3551,28 @@ static void migration_iteration_finish(MigrationState *s)
case MIGRATION_STATUS_FAILED:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_CANCELLING:
- /*
- * Re-activate the block drives if they're inactivated. Note, COLO
- * shouldn't use block_active at all, so it should be no-op there.
- */
- migration_block_activate(NULL);
+ if (!migration_block_activate(&local_err)) {
+ /*
+ * Re-activate the block drives if they're inactivated.
+ *
+ * If it fails (e.g. in case of a split brain, where dest QEMU
+ * might have taken some of the drive locks and running!), do
+ * not start VM, instead wait for mgmt to decide the next step.
+ *
+ * If dest already started, it means dest QEMU should contain
+ * all the data it needs and it properly owns all the drive
+ * locks. Then even if src QEMU got a FAILED in migration, it
+ * normally should mean we should treat the migration as
+ * COMPLETED.
+ *
+ * NOTE: it's not safe anymore to start VM on src now even if
+ * dest would release the drive locks. It's because as long as
+ * dest started running then only dest QEMU's RAM is consistent
+ * with the shared storage.
+ */
+ error_free(local_err);
+ break;
+ }
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
vm_start();
--
2.51.0
* [PATCH v4 3/8] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 2/8] migration: Do not try to start VM if disk activation fails Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 4/8] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
This patch addresses a TODO about moving postcopy_ram_listen_thread()
into postcopy-ram.c.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/postcopy-ram.c | 105 ++++++++++++++++++++++++++++++++++++++
migration/postcopy-ram.h | 2 +
migration/savevm.c | 107 ---------------------------------------
3 files changed, 107 insertions(+), 107 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5471efb4f0..880b11f154 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2077,3 +2077,108 @@ bool postcopy_is_paused(MigrationStatus status)
return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
}
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ */
+void *postcopy_ram_listen_thread(void *opaque)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ QEMUFile *f = mis->from_src_file;
+ int load_res;
+ MigrationState *migr = migrate_get_current();
+ Error *local_err = NULL;
+
+ object_ref(OBJECT(migr));
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ qemu_event_set(&mis->thread_sync_event);
+ trace_postcopy_ram_listen_thread_start();
+
+ rcu_register_thread();
+ /*
+ * Because we're a thread and not a coroutine we can't yield
+ * in qemu_file, and thus we must be blocking now.
+ */
+ qemu_file_set_blocking(f, true, &error_fatal);
+
+ /* TODO: sanity check that only postcopiable data will be loaded here */
+ load_res = qemu_loadvm_state_main(f, mis, &local_err);
+
+ /*
+ * This is tricky, but, mis->from_src_file can change after it
+ * returns, when postcopy recovery happened. In the future, we may
+ * want a wrapper for the QEMUFile handle.
+ */
+ f = mis->from_src_file;
+
+ /* And non-blocking again so we don't block in any cleanup */
+ qemu_file_set_blocking(f, false, &error_fatal);
+
+ trace_postcopy_ram_listen_thread_exit();
+ if (load_res < 0) {
+ qemu_file_set_error(f, load_res);
+ dirty_bitmap_mig_cancel_incoming();
+ if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+ !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+ {
+ error_report("%s: loadvm failed during postcopy: %d: %s. All states "
+ "are migrated except dirty bitmaps. Some dirty "
+ "bitmaps may be lost, and present migrated dirty "
+ "bitmaps are correctly migrated and valid.",
+ __func__, load_res, error_get_pretty(local_err));
+ g_clear_pointer(&local_err, error_free);
+ load_res = 0; /* prevent further exit() */
+ } else {
+ error_prepend(&local_err,
+ "loadvm failed during postcopy: %d: ", load_res);
+ migrate_set_error(migr, local_err);
+ g_clear_pointer(&local_err, error_report_err);
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ }
+ }
+ if (load_res >= 0) {
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
+ }
+ postcopy_ram_incoming_cleanup(mis);
+
+ if (load_res < 0) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * depending how far we got it might be possible at this point
+ * to leave the guest running and fire MCEs for pages that never
+ * arrived as a desperate recovery step.
+ */
+ rcu_unregister_thread();
+ exit(EXIT_FAILURE);
+ }
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_COMPLETED);
+ /*
+ * If everything has worked fine, then the main thread has waited
+ * for us to start, and we're the last use of the mis.
+ * (If something broke then qemu will have to exit anyway since it's
+ * got a bad migration state).
+ */
+ bql_lock();
+ migration_incoming_state_destroy();
+ bql_unlock();
+
+ rcu_unregister_thread();
+ mis->have_listen_thread = false;
+ postcopy_state_set(POSTCOPY_INCOMING_END);
+
+ object_unref(OBJECT(migr));
+
+ return NULL;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ca19433b24..3e26db3e6b 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,4 +199,6 @@ bool postcopy_is_paused(MigrationStatus status);
void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
RAMBlock *rb);
+void *postcopy_ram_listen_thread(void *opaque);
+
#endif
diff --git a/migration/savevm.c b/migration/savevm.c
index fa017378db..2f7ed0db64 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2088,113 +2088,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
return 0;
}
-/*
- * Triggered by a postcopy_listen command; this thread takes over reading
- * the input stream, leaving the main thread free to carry on loading the rest
- * of the device state (from RAM).
- * (TODO:This could do with being in a postcopy file - but there again it's
- * just another input loop, not that postcopy specific)
- */
-static void *postcopy_ram_listen_thread(void *opaque)
-{
- MigrationIncomingState *mis = migration_incoming_get_current();
- QEMUFile *f = mis->from_src_file;
- int load_res;
- MigrationState *migr = migrate_get_current();
- Error *local_err = NULL;
-
- object_ref(OBJECT(migr));
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
- qemu_event_set(&mis->thread_sync_event);
- trace_postcopy_ram_listen_thread_start();
-
- rcu_register_thread();
- /*
- * Because we're a thread and not a coroutine we can't yield
- * in qemu_file, and thus we must be blocking now.
- */
- qemu_file_set_blocking(f, true, &error_fatal);
-
- /* TODO: sanity check that only postcopiable data will be loaded here */
- load_res = qemu_loadvm_state_main(f, mis, &local_err);
-
- /*
- * This is tricky, but, mis->from_src_file can change after it
- * returns, when postcopy recovery happened. In the future, we may
- * want a wrapper for the QEMUFile handle.
- */
- f = mis->from_src_file;
-
- /* And non-blocking again so we don't block in any cleanup */
- qemu_file_set_blocking(f, false, &error_fatal);
-
- trace_postcopy_ram_listen_thread_exit();
- if (load_res < 0) {
- qemu_file_set_error(f, load_res);
- dirty_bitmap_mig_cancel_incoming();
- if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
- !migrate_postcopy_ram() && migrate_dirty_bitmaps())
- {
- error_report("%s: loadvm failed during postcopy: %d: %s. All states "
- "are migrated except dirty bitmaps. Some dirty "
- "bitmaps may be lost, and present migrated dirty "
- "bitmaps are correctly migrated and valid.",
- __func__, load_res, error_get_pretty(local_err));
- g_clear_pointer(&local_err, error_free);
- load_res = 0; /* prevent further exit() */
- } else {
- error_prepend(&local_err,
- "loadvm failed during postcopy: %d: ", load_res);
- migrate_set_error(migr, local_err);
- g_clear_pointer(&local_err, error_report_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
- }
- }
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_ram_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
-
- rcu_unregister_thread();
- mis->have_listen_thread = false;
- postcopy_state_set(POSTCOPY_INCOMING_END);
-
- object_unref(OBJECT(migr));
-
- return NULL;
-}
-
/* After this message we must be able to immediately receive postcopy data */
static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
Error **errp)
--
2.51.0
* [PATCH v4 4/8] migration: Introduce postcopy incoming setup and cleanup functions
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (2 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 3/8] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 5/8] migration: Refactor all incoming cleanup into migration_incoming_destroy() Juraj Marcin
` (4 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
After moving postcopy_ram_listen_thread() into postcopy-ram.c, this
patch introduces a pair of functions, postcopy_incoming_setup() and
postcopy_incoming_cleanup(). These functions encapsulate the setup and
cleanup of all incoming postcopy resources: postcopy-ram and the
postcopy listen thread.
Furthermore, this patch also renames postcopy_ram_listen_thread() to
postcopy_listen_thread(), as this thread handles not only postcopy-ram
but also dirty-bitmaps, and in the future it could handle other
postcopiable devices.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 +-
migration/postcopy-ram.c | 44 ++++++++++++++++++++++++++++++++++++++--
migration/postcopy-ram.h | 3 ++-
migration/savevm.c | 25 ++---------------------
4 files changed, 47 insertions(+), 27 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6e647c7c4a..9a367f717e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque)
* but managed to complete within the precopy period, we can use
* the normal exit.
*/
- postcopy_ram_incoming_cleanup(mis);
+ postcopy_incoming_cleanup(mis);
} else if (ret >= 0) {
/*
* Postcopy was started, cleanup should happen at the end of the
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 880b11f154..b47c955763 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2083,7 +2083,7 @@ bool postcopy_is_paused(MigrationStatus status)
* the input stream, leaving the main thread free to carry on loading the rest
* of the device state (from RAM).
*/
-void *postcopy_ram_listen_thread(void *opaque)
+static void *postcopy_listen_thread(void *opaque)
{
MigrationIncomingState *mis = migration_incoming_get_current();
QEMUFile *f = mis->from_src_file;
@@ -2149,7 +2149,7 @@ void *postcopy_ram_listen_thread(void *opaque)
*/
qemu_event_wait(&mis->main_thread_load_event);
}
- postcopy_ram_incoming_cleanup(mis);
+ postcopy_incoming_cleanup(mis);
if (load_res < 0) {
/*
@@ -2182,3 +2182,43 @@ void *postcopy_ram_listen_thread(void *opaque)
return NULL;
}
+
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
+{
+ /*
+ * Sensitise RAM - can now generate requests for blocks that don't exist
+ * However, at this point the CPU shouldn't be running, and the IO
+ * shouldn't be doing anything yet so don't actually expect requests
+ */
+ if (migrate_postcopy_ram()) {
+ if (postcopy_ram_incoming_setup(mis)) {
+ postcopy_ram_incoming_cleanup(mis);
+ error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
+ return -1;
+ }
+ }
+
+ trace_loadvm_postcopy_handle_listen("after uffd");
+
+ if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
+ return -1;
+ }
+
+ mis->have_listen_thread = true;
+ postcopy_thread_create(mis, &mis->listen_thread,
+ MIGRATION_THREAD_DST_LISTEN,
+ postcopy_listen_thread, QEMU_THREAD_DETACHED);
+
+ return 0;
+}
+
+int postcopy_incoming_cleanup(MigrationIncomingState *mis)
+{
+ int rc = 0;
+
+ if (migrate_postcopy_ram()) {
+ rc = postcopy_ram_incoming_cleanup(mis);
+ }
+
+ return rc;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 3e26db3e6b..a080dd65a7 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,6 +199,7 @@ bool postcopy_is_paused(MigrationStatus status);
void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
RAMBlock *rb);
-void *postcopy_ram_listen_thread(void *opaque);
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
+int postcopy_incoming_cleanup(MigrationIncomingState *mis);
#endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 2f7ed0db64..01b5a8bfff 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2113,32 +2113,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
trace_loadvm_postcopy_handle_listen("after discard");
- /*
- * Sensitise RAM - can now generate requests for blocks that don't exist
- * However, at this point the CPU shouldn't be running, and the IO
- * shouldn't be doing anything yet so don't actually expect requests
- */
- if (migrate_postcopy_ram()) {
- if (postcopy_ram_incoming_setup(mis)) {
- postcopy_ram_incoming_cleanup(mis);
- error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
- return -1;
- }
- }
+ int rc = postcopy_incoming_setup(mis, errp);
- trace_loadvm_postcopy_handle_listen("after uffd");
-
- if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
- return -1;
- }
-
- mis->have_listen_thread = true;
- postcopy_thread_create(mis, &mis->listen_thread,
- MIGRATION_THREAD_DST_LISTEN,
- postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
trace_loadvm_postcopy_handle_listen("return");
- return 0;
+ return rc;
}
static void loadvm_postcopy_handle_run_bh(void *opaque)
--
2.51.0
* [PATCH v4 5/8] migration: Refactor all incoming cleanup into migration_incoming_destroy()
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (3 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 4/8] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 6/8] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
` (3 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
Currently, two functions are responsible for the cleanup of the incoming
migration state: with successful precopy it's the incoming migration
coroutine, and with successful postcopy it's the postcopy listen thread.
However, if postcopy fails during the device load, both functions will
try to do the cleanup.
This patch refactors all cleanup that needs to be done on the incoming
side into a common function and defines a clear boundary for who is
responsible for the cleanup: the incoming migration coroutine is
responsible for calling the cleanup function, unless the listen thread
has been started, in which case the postcopy listen thread runs the
incoming migration cleanup in its BH.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Fixes: 9535435795 ("migration: push Error **errp into qemu_loadvm_state()")
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 44 +++++++++-------------------
migration/migration.h | 1 +
migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
migration/trace-events | 2 +-
4 files changed, 49 insertions(+), 61 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 9a367f717e..637be71bfe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
void migration_incoming_state_destroy(void)
{
- struct MigrationIncomingState *mis = migration_incoming_get_current();
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ PostcopyState ps = postcopy_state_get();
multifd_recv_cleanup();
+ if (ps != POSTCOPY_INCOMING_NONE) {
+ postcopy_incoming_cleanup(mis);
+ }
+
/*
* RAM state cleanup needs to happen after multifd cleanup, because
* multifd threads can use some of its states (receivedmap).
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
{
MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
- PostcopyState ps;
int ret;
Error *local_err = NULL;
@@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
- ps = postcopy_state_get();
- trace_process_incoming_migration_co_end(ret, ps);
- if (ps != POSTCOPY_INCOMING_NONE) {
- if (ps == POSTCOPY_INCOMING_ADVISE) {
- /*
- * Where a migration had postcopy enabled (and thus went to advise)
- * but managed to complete within the precopy period, we can use
- * the normal exit.
- */
- postcopy_incoming_cleanup(mis);
- } else if (ret >= 0) {
- /*
- * Postcopy was started, cleanup should happen at the end of the
- * postcopy thread.
- */
- trace_process_incoming_migration_co_postcopy_end_main();
- goto out;
- }
- /* Else if something went wrong then just fall out of the normal exit */
+ trace_process_incoming_migration_co_end(ret);
+ if (mis->have_listen_thread) {
+ /*
+ * Postcopy was started, cleanup should happen at the end of the
+ * postcopy listen thread.
+ */
+ trace_process_incoming_migration_co_postcopy_end_main();
+ goto out;
}
if (ret < 0) {
@@ -933,15 +926,6 @@ fail:
}
exit(EXIT_FAILURE);
- } else {
- /*
- * Report the error here in case that QEMU abruptly exits
- * when postcopy is enabled.
- */
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
}
out:
/* Pairs with the refcount taken in qmp_migrate_incoming() */
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..4a37f7202c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -254,6 +254,7 @@ struct MigrationIncomingState {
MigrationIncomingState *migration_incoming_get_current(void);
void migration_incoming_state_destroy(void);
void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
+void migration_incoming_qemu_exit(void);
/*
* Functions to work with blocktime context
*/
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b47c955763..48cbb46c27 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
}
+static void postcopy_listen_thread_bh(void *opaque)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+
+ migration_incoming_state_destroy();
+
+ if (mis->state == MIGRATION_STATUS_FAILED) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * we only could have gotten here if something failed before
+ * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
+ * postcopy migration would pause inside qemu_loadvm_state_main().
+ * Failing dirty-bitmaps won't fail the whole migration.
+ */
+ exit(1);
+ }
+}
+
/*
* Triggered by a postcopy_listen command; this thread takes over reading
* the input stream, leaving the main thread free to carry on loading the rest
@@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
"bitmaps are correctly migrated and valid.",
__func__, load_res, error_get_pretty(local_err));
g_clear_pointer(&local_err, error_free);
- load_res = 0; /* prevent further exit() */
} else {
+ /*
+ * Something went fatally wrong and we have a bad state, QEMU will
+ * exit depending on if postcopy-exit-on-error is true, but the
+ * migration cannot be recovered.
+ */
error_prepend(&local_err,
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
g_clear_pointer(&local_err, error_report_err);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_FAILED);
+ goto out;
}
}
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
+out:
rcu_unregister_thread();
mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
+ migration_bh_schedule(postcopy_listen_thread_bh, NULL);
+
object_unref(OBJECT(migr));
return NULL;
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..772636f3ac 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_end(int ret) "ret=%d"
process_incoming_migration_co_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
migration_precopy_complete(void) ""
--
2.51.0
* [PATCH v4 6/8] migration: Respect exit-on-error when migration fails before resuming
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (4 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 5/8] migration: Refactor all incoming cleanup into migration_incoming_destroy() Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 7/8] migration: Make postcopy listen thread joinable Juraj Marcin
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
When exit-on-error was added to migration, it wasn't added to postcopy.
Even though postcopy migration usually pauses rather than fails, in
cases where it does fail unrecoverably before the destination side has
started, exit-on-error allows management to query the error.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/postcopy-ram.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 48cbb46c27..91431f02a4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2080,11 +2080,16 @@ bool postcopy_is_paused(MigrationStatus status)
static void postcopy_listen_thread_bh(void *opaque)
{
+ MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
migration_incoming_state_destroy();
- if (mis->state == MIGRATION_STATUS_FAILED) {
+ if (mis->state == MIGRATION_STATUS_FAILED && mis->exit_on_error) {
+ WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+ error_report_err(s->error);
+ s->error = NULL;
+ }
/*
* If something went wrong then we have a bad state so exit;
* we only could have gotten here if something failed before
--
2.51.0
* [PATCH v4 7/8] migration: Make postcopy listen thread joinable
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (5 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 6/8] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 18:32 ` [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-11-03 21:07 ` [PATCH v4 0/8] " Peter Xu
8 siblings, 0 replies; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
This patch makes the listen thread joinable instead of detached, and
joins it alongside the other postcopy threads.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/postcopy-ram.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 91431f02a4..8405cce7b4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2181,7 +2181,6 @@ static void *postcopy_listen_thread(void *opaque)
out:
rcu_unregister_thread();
- mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
migration_bh_schedule(postcopy_listen_thread_bh, NULL);
@@ -2215,7 +2214,7 @@ int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
mis->have_listen_thread = true;
postcopy_thread_create(mis, &mis->listen_thread,
MIGRATION_THREAD_DST_LISTEN,
- postcopy_listen_thread, QEMU_THREAD_DETACHED);
+ postcopy_listen_thread, QEMU_THREAD_JOINABLE);
return 0;
}
@@ -2224,6 +2223,11 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
{
int rc = 0;
+ if (mis->have_listen_thread) {
+ qemu_thread_join(&mis->listen_thread);
+ mis->have_listen_thread = false;
+ }
+
if (migrate_postcopy_ram()) {
rc = postcopy_ram_incoming_cleanup(mis);
}
--
2.51.0
* [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (6 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 7/8] migration: Make postcopy listen thread joinable Juraj Marcin
@ 2025-11-03 18:32 ` Juraj Marcin
2025-11-03 19:09 ` Peter Xu
2025-11-03 21:07 ` [PATCH v4 0/8] " Peter Xu
8 siblings, 1 reply; 12+ messages in thread
From: Juraj Marcin @ 2025-11-03 18:32 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Dr. David Alan Gilbert, Fabiano Rosas, Peter Xu,
Jiri Denemark
From: Juraj Marcin <jmarcin@redhat.com>
Currently, when postcopy starts, the source VM starts switchover and
sends a package containing the state of all non-postcopiable devices.
When the destination loads this package, the switchover is complete and
the destination VM starts. However, if the device state load fails or
the destination side crashes, the source side is already in the
POSTCOPY_ACTIVE state and cannot be recovered, even though it still has
the most up-to-date machine state, as the destination has not yet
started.
This patch introduces a new POSTCOPY_DEVICE state, which is active while
the destination machine is loading the device state and is not yet
running; while in this state, the source side can still be resumed in
case of a migration failure. A return path is required for this state to
function; otherwise, it is skipped in favor of POSTCOPY_ACTIVE.
To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
side uses a PONG message that is a response to a PING message processed
just before the POSTCOPY_RUN command that starts the destination VM.
Thus, this feature is effective even if the destination side does not
yet support this new state.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration.c | 50 ++++++++++++++++++++++++---
migration/migration.h | 3 ++
migration/postcopy-ram.c | 10 ++++--
migration/savevm.c | 5 +++
migration/savevm.h | 2 ++
migration/trace-events | 1 +
qapi/migration.json | 10 ++++--
tests/qemu-iotests/194 | 2 +-
tests/qtest/migration/precopy-tests.c | 3 +-
9 files changed, 75 insertions(+), 11 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 637be71bfe..c2daab6bdd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1206,6 +1206,7 @@ bool migration_is_running(void)
switch (s->state) {
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1227,6 +1228,7 @@ static bool migration_is_active(void)
MigrationState *s = current_migration;
return (s->state == MIGRATION_STATUS_ACTIVE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
}
@@ -1349,6 +1351,7 @@ static void fill_source_migration_info(MigrationInfo *info)
break;
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_CANCELLING:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_PRE_SWITCHOVER:
case MIGRATION_STATUS_DEVICE:
@@ -1402,6 +1405,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
case MIGRATION_STATUS_CANCELLING:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER:
@@ -1732,6 +1736,7 @@ bool migration_in_postcopy(void)
MigrationState *s = migrate_get_current();
switch (s->state) {
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1833,6 +1838,9 @@ int migrate_init(MigrationState *s, Error **errp)
memset(&mig_stats, 0, sizeof(mig_stats));
migration_reset_vfio_bytes_transferred();
+ s->postcopy_package_loaded = false;
+ qemu_event_reset(&s->postcopy_package_loaded_event);
+
return 0;
}
@@ -2568,6 +2576,11 @@ static void *source_return_path_thread(void *opaque)
tmp32 = ldl_be_p(buf);
trace_source_return_path_thread_pong(tmp32);
qemu_sem_post(&ms->rp_state.rp_pong_acks);
+ if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
+ trace_source_return_path_thread_postcopy_package_loaded();
+ ms->postcopy_package_loaded = true;
+ qemu_event_set(&ms->postcopy_package_loaded_event);
+ }
break;
case MIG_RP_MSG_REQ_PAGES:
@@ -2813,6 +2826,15 @@ static int postcopy_start(MigrationState *ms, Error **errp)
if (migrate_postcopy_ram()) {
qemu_savevm_send_ping(fb, 3);
}
+ if (ms->rp_state.rp_thread_created) {
+ /*
+ * This ping will tell us that all non-postcopiable device state has been
+ * successfully loaded and the destination is about to start. When the
+ * response is received, it will trigger the transition from the
+ * POSTCOPY_DEVICE to the POSTCOPY_ACTIVE state.
+ */
+ qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
+ }
qemu_savevm_send_postcopy_run(fb);
@@ -2868,8 +2890,13 @@ static int postcopy_start(MigrationState *ms, Error **errp)
*/
migration_rate_set(migrate_max_postcopy_bandwidth());
- /* Now, switchover looks all fine, switching to postcopy-active */
+ /*
+ * Now, switchover looks all fine, switching to POSTCOPY_DEVICE, or
+ * directly to POSTCOPY_ACTIVE if there is no return path.
+ */
migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
+ ms->rp_state.rp_thread_created ?
+ MIGRATION_STATUS_POSTCOPY_DEVICE :
MIGRATION_STATUS_POSTCOPY_ACTIVE);
bql_unlock();
@@ -3311,8 +3338,8 @@ static MigThrError migration_detect_error(MigrationState *s)
return postcopy_pause(s);
} else {
/*
- * For precopy (or postcopy with error outside IO), we fail
- * with no time.
+ * For precopy (or postcopy with error outside IO, or before dest
+ * starts), we fail with no time.
*/
migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
trace_migration_thread_file_err();
@@ -3447,7 +3474,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
{
uint64_t must_precopy, can_postcopy, pending_size;
Error *local_err = NULL;
- bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
+ bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
bool can_switchover = migration_can_switchover(s);
bool complete_ready;
@@ -3463,6 +3491,18 @@ static MigIterateState migration_iteration_run(MigrationState *s)
* POSTCOPY_ACTIVE it means switchover already happened.
*/
complete_ready = !pending_size;
+ if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&
+ (s->postcopy_package_loaded || complete_ready)) {
+ /*
+ * If package has been loaded, the event is set and we will
+ * immediately transition to POSTCOPY_ACTIVE. If we are ready for
+ * completion, we need to wait for destination to load the postcopy
+ * package before actually completing.
+ */
+ qemu_event_wait(&s->postcopy_package_loaded_event);
+ migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ }
} else {
/*
* Exact pending reporting is only needed for precopy. Taking RAM
@@ -4117,6 +4157,7 @@ static void migration_instance_finalize(Object *obj)
qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
error_free(ms->error);
+ qemu_event_destroy(&ms->postcopy_package_loaded_event);
}
static void migration_instance_init(Object *obj)
@@ -4138,6 +4179,7 @@ static void migration_instance_init(Object *obj)
qemu_sem_init(&ms->wait_unplug_sem, 0);
qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
qemu_mutex_init(&ms->qemu_file_lock);
+ qemu_event_init(&ms->postcopy_package_loaded_event, 0);
}
/*
diff --git a/migration/migration.h b/migration/migration.h
index 4a37f7202c..213b33fe6e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -510,6 +510,9 @@ struct MigrationState {
/* Is this a rdma migration */
bool rdma_migration;
+ bool postcopy_package_loaded;
+ QemuEvent postcopy_package_loaded_event;
+
GSource *hup_source;
};
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8405cce7b4..3f98dcb6fd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2117,7 +2117,8 @@ static void *postcopy_listen_thread(void *opaque)
object_ref(OBJECT(migr));
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ mis->to_src_file ? MIGRATION_STATUS_POSTCOPY_DEVICE :
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
qemu_event_set(&mis->thread_sync_event);
trace_postcopy_ram_listen_thread_start();
@@ -2164,8 +2165,7 @@ static void *postcopy_listen_thread(void *opaque)
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
g_clear_pointer(&local_err, error_report_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
+ migrate_set_state(&mis->state, mis->state, MIGRATION_STATUS_FAILED);
goto out;
}
}
@@ -2176,6 +2176,10 @@ static void *postcopy_listen_thread(void *opaque)
*/
qemu_event_wait(&mis->main_thread_load_event);
+ /*
+ * Device load in the main thread has finished, we should be in
+ * POSTCOPY_ACTIVE now.
+ */
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
diff --git a/migration/savevm.c b/migration/savevm.c
index 01b5a8bfff..62cc2ce25c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2170,6 +2170,11 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis, Error **errp)
return -1;
}
+ /* We might be already in POSTCOPY_ACTIVE if there is no return path */
+ if (mis->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ }
postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..125a2507b7 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,8 @@
#define QEMU_VM_COMMAND 0x08
#define QEMU_VM_SECTION_FOOTER 0x7e
+#define QEMU_VM_PING_PACKAGED_LOADED 0x42
+
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
int qemu_savevm_state_prepare(Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index 772636f3ac..bf11b62b17 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
source_return_path_thread_shut(uint32_t val) "0x%x"
source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
+source_return_path_thread_postcopy_package_loaded(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
process_incoming_migration_co_end(int ret) "ret=%d"
diff --git a/qapi/migration.json b/qapi/migration.json
index c7a6737cc1..93f71de3fe 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,12 @@
# @postcopy-active: like active, but now in postcopy mode.
# (since 2.5)
#
+# @postcopy-device: like postcopy-active, but the destination is still
+# loading device state and is not running yet. If migration fails
+# during this state, the source side will resume. If there is no
+# return-path from the destination to the source, this state is skipped.
+# (since 10.2)
+#
# @postcopy-paused: during postcopy but paused. (since 3.0)
#
# @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +179,8 @@
##
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
- 'active', 'postcopy-active', 'postcopy-paused',
- 'postcopy-recover-setup',
+ 'active', 'postcopy-device', 'postcopy-active',
+ 'postcopy-paused', 'postcopy-recover-setup',
'postcopy-recover', 'completed', 'failed', 'colo',
'pre-switchover', 'device', 'wait-unplug' ] }
##
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index e114c0b269..806624394d 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -76,7 +76,7 @@ with iotests.FilePath('source.img') as source_img_path, \
while True:
event1 = source_vm.event_wait('MIGRATION')
- if event1['data']['status'] == 'postcopy-active':
+ if event1['data']['status'] in ('postcopy-device', 'postcopy-active'):
# This event is racy, it depends do we really do postcopy or bitmap
# was migrated during downtime (and no data to migrate in postcopy
# phase). So, don't log it.
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index bb38292550..57ca623de5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
}
/* ensure new status don't go unnoticed */
- assert(MIGRATION_STATUS__MAX == 15);
+ assert(MIGRATION_STATUS__MAX == 16);
for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
switch (i) {
case MIGRATION_STATUS_DEVICE: /* happens too fast */
case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
case MIGRATION_STATUS_COLO: /* no support in tests */
+ case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
--
2.51.0
* Re: [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED
2025-11-03 18:32 ` [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED Juraj Marcin
@ 2025-11-03 19:04 ` Peter Xu
0 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2025-11-03 19:04 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Fabiano Rosas, Jiri Denemark
On Mon, Nov 03, 2025 at 07:32:50PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> If the length of the data sent after CMD_PACKAGED is just right, and
> there is not much data to send afterward, it is possible part of the
> CMD_PACKAGED payload will get left behind in the sending buffer. This
> causes the destination side to hang while it tries to load the whole
> package and initiate postcopy.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state
2025-11-03 18:32 ` [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-11-03 19:09 ` Peter Xu
0 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2025-11-03 19:09 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Fabiano Rosas, Jiri Denemark
On Mon, Nov 03, 2025 at 07:32:57PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Currently, when postcopy starts, the source VM starts switchover and
> sends a package containing the state of all non-postcopiable devices.
> When the destination loads this package, the switchover is complete and
> the destination VM starts. However, if the device state load fails or
> the destination side crashes, the source side is already in the
> POSTCOPY_ACTIVE state and cannot be recovered, even though it still has
> the most up-to-date machine state, as the destination has not yet
> started.
>
> This patch introduces a new POSTCOPY_DEVICE state, which is active while
> the destination machine is loading the device state and is not yet
> running; while in this state, the source side can still be resumed in
> case of a migration failure. A return path is required for this state to
> function; otherwise, it is skipped in favor of POSTCOPY_ACTIVE.
>
> To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
> side uses a PONG message that is a response to a PING message processed
> just before the POSTCOPY_RUN command that starts the destination VM.
> Thus, this feature is effective even if the destination side does not
> yet support this new state.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state
2025-11-03 18:32 [PATCH v4 0/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (7 preceding siblings ...)
2025-11-03 18:32 ` [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-11-03 21:07 ` Peter Xu
8 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2025-11-03 21:07 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Fabiano Rosas, Jiri Denemark
On Mon, Nov 03, 2025 at 07:32:49PM +0100, Juraj Marcin wrote:
> This series introduces a new POSTCOPY_DEVICE state that is active (on
> both the source and destination sides) while the destination loads the
> device state. Before this series, if the destination machine failed
> during the device load, the source side would stay stuck in
> POSTCOPY_ACTIVE with no way of recovery. With this series, if the
> migration fails while in the POSTCOPY_DEVICE state, the source side can
> safely resume, as the destination has not started yet.
>
> RFC: https://lore.kernel.org/all/20250807114922.1013286-1-jmarcin@redhat.com/
>
> V1: https://lore.kernel.org/all/20250915115918.3520735-1-jmarcin@redhat.com/
>
> V2: https://lore.kernel.org/all/20251027154115.4138677-1-jmarcin@redhat.com/
>
> V3: https://lore.kernel.org/all/20251030214915.1411860-1-jmarcin@redhat.com/
>
> V4 changes:
>
> - fixes for failing qemu-iotest 194
> - flush channel after sending CMD_PACKAGED data (new patch 1)
> - POSTCOPY_DEVICE state is now used only if return-path is open
Queued, thanks Juraj.
--
Peter Xu