* [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state
@ 2025-10-30 21:49 Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 1/7] migration: Do not try to start VM if disk activation fails Juraj Marcin
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
This series introduces a new POSTCOPY_DEVICE state that is active (on both
the source and the destination side) while the destination loads the device
state. Before this series, if the destination machine failed during the
device load, the source side would stay stuck in POSTCOPY_ACTIVE with no
way of recovery. With this series, if the migration fails while in the
POSTCOPY_DEVICE state, the source side can safely resume, as the destination
has not started yet.
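For illustration only, here is a minimal sketch (not part of this series; the
helper name is made up) of the decision the new state enables on the source
side:

    /*
     * Hypothetical helper, only illustrating the semantics: if the
     * migration failed while still in POSTCOPY_DEVICE, the destination
     * never ran, so the source VM may safely be resumed.
     */
    static bool source_can_resume_after_failure(MigrationStatus failed_in)
    {
        switch (failed_in) {
        case MIGRATION_STATUS_POSTCOPY_DEVICE:
            /* destination was still loading device state, never ran */
            return true;
        case MIGRATION_STATUS_POSTCOPY_ACTIVE:
            /* destination started; only its RAM matches shared storage */
            return false;
        default:
            return false;
        }
    }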
RFC: https://lore.kernel.org/all/20250807114922.1013286-1-jmarcin@redhat.com/
V1: https://lore.kernel.org/all/20250915115918.3520735-1-jmarcin@redhat.com/
V2: https://lore.kernel.org/all/20251027154115.4138677-1-jmarcin@redhat.com/
V3 changes:
- rebased on top of https://gitlab.com/peterx/qemu/-/commits/staging
Patch 2:
- split into two separate patches
Patches 4, 5, 6 (were Patch 3):
- return to previous migration_incoming_state_destroy() that will not
exit on error
- use existing exit-on-error option also for postcopy, in separate patch
- moved conversion of the postcopy listen thread to a joinable thread
into a separate patch
Patch 7:
- added reset of postcopy_package_loaded_event
Juraj Marcin (6):
migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
migration: Introduce postcopy incoming setup and cleanup functions
migration: Refactor all incoming cleanup into
migration_incoming_destroy()
migration: Respect exit-on-error when migration fails before resuming
migration: Make postcopy listen thread joinable
migration: Introduce POSTCOPY_DEVICE state
Peter Xu (1):
migration: Do not try to start VM if disk activation fails
migration/migration.c | 116 ++++++++++++-------
migration/migration.h | 4 +
migration/postcopy-ram.c | 160 ++++++++++++++++++++++++++
migration/postcopy-ram.h | 3 +
migration/savevm.c | 134 +--------------------
migration/savevm.h | 2 +
migration/trace-events | 3 +-
qapi/migration.json | 8 +-
tests/qtest/migration/precopy-tests.c | 3 +-
9 files changed, 260 insertions(+), 173 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/7] migration: Do not try to start VM if disk activation fails
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Peter Xu <peterx@redhat.com>
If a rare split brain happens (e.g. dest QEMU somehow started running,
taking the shared drive locks), src QEMU may not be able to activate the
drives anymore. In this case, src QEMU shouldn't start the VM, or it might
crash the block layer later.
Meanwhile, src QEMU cannot try to continue either, even if dest QEMU can
release the drive locks (e.g. via QMP "stop"): as long as dest QEMU has
started running, its RAM is the only version that is consistent with the
current state of the shared storage.
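Condensed, the intended flow on the source side is roughly (a sketch of what
the diff below implements):

    if (!migration_block_activate(&local_err)) {
        /* suspected split brain: keep the VM stopped, let mgmt decide */
        error_free(local_err);
    } else if (runstate_is_live(s->vm_old_state)) {
        /* drives re-activated fine, safe to restart the source VM */
        vm_start();
    }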
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5e74993b46..6e647c7c4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3526,6 +3526,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
static void migration_iteration_finish(MigrationState *s)
{
+ Error *local_err = NULL;
+
bql_lock();
/*
@@ -3549,11 +3551,28 @@ static void migration_iteration_finish(MigrationState *s)
case MIGRATION_STATUS_FAILED:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_CANCELLING:
- /*
- * Re-activate the block drives if they're inactivated. Note, COLO
- * shouldn't use block_active at all, so it should be no-op there.
- */
- migration_block_activate(NULL);
+ if (!migration_block_activate(&local_err)) {
+ /*
+ * Re-activate the block drives if they're inactivated.
+ *
+ * If it fails (e.g. in case of a split brain, where dest QEMU
+ * might have taken some of the drive locks and is running!), do
+ * not start the VM; instead wait for mgmt to decide the next step.
+ *
+ * If dest already started, it means dest QEMU should contain
+ * all the data it needs and it properly owns all the drive
+ * locks. Then even if src QEMU got a FAILED in migration, it
+ * normally should mean we should treat the migration as
+ * COMPLETED.
+ *
+ * NOTE: it's not safe anymore to start VM on src now even if
+ * dest would release the drive locks. It's because as long as
+ * dest started running then only dest QEMU's RAM is consistent
+ * with the shared storage.
+ */
+ error_free(local_err);
+ break;
+ }
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
vm_start();
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 1/7] migration: Do not try to start VM if disk activation fails Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 22:28 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
This patch addresses a TODO about moving postcopy_ram_listen_thread() to
the postcopy file.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/postcopy-ram.c | 107 +++++++++++++++++++++++++++++++++++++++
migration/postcopy-ram.h | 2 +
migration/savevm.c | 107 ---------------------------------------
3 files changed, 109 insertions(+), 107 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5471efb4f0..36d5415554 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2077,3 +2077,110 @@ bool postcopy_is_paused(MigrationStatus status)
return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
}
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO:This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+void *postcopy_ram_listen_thread(void *opaque)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ QEMUFile *f = mis->from_src_file;
+ int load_res;
+ MigrationState *migr = migrate_get_current();
+ Error *local_err = NULL;
+
+ object_ref(OBJECT(migr));
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ qemu_event_set(&mis->thread_sync_event);
+ trace_postcopy_ram_listen_thread_start();
+
+ rcu_register_thread();
+ /*
+ * Because we're a thread and not a coroutine we can't yield
+ * in qemu_file, and thus we must be blocking now.
+ */
+ qemu_file_set_blocking(f, true, &error_fatal);
+
+ /* TODO: sanity check that only postcopiable data will be loaded here */
+ load_res = qemu_loadvm_state_main(f, mis, &local_err);
+
+ /*
+ * This is tricky, but, mis->from_src_file can change after it
+ * returns, when postcopy recovery happened. In the future, we may
+ * want a wrapper for the QEMUFile handle.
+ */
+ f = mis->from_src_file;
+
+ /* And non-blocking again so we don't block in any cleanup */
+ qemu_file_set_blocking(f, false, &error_fatal);
+
+ trace_postcopy_ram_listen_thread_exit();
+ if (load_res < 0) {
+ qemu_file_set_error(f, load_res);
+ dirty_bitmap_mig_cancel_incoming();
+ if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+ !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+ {
+ error_report("%s: loadvm failed during postcopy: %d: %s. All states "
+ "are migrated except dirty bitmaps. Some dirty "
+ "bitmaps may be lost, and present migrated dirty "
+ "bitmaps are correctly migrated and valid.",
+ __func__, load_res, error_get_pretty(local_err));
+ g_clear_pointer(&local_err, error_free);
+ load_res = 0; /* prevent further exit() */
+ } else {
+ error_prepend(&local_err,
+ "loadvm failed during postcopy: %d: ", load_res);
+ migrate_set_error(migr, local_err);
+ g_clear_pointer(&local_err, error_report_err);
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ }
+ }
+ if (load_res >= 0) {
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
+ }
+ postcopy_ram_incoming_cleanup(mis);
+
+ if (load_res < 0) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * depending how far we got it might be possible at this point
+ * to leave the guest running and fire MCEs for pages that never
+ * arrived as a desperate recovery step.
+ */
+ rcu_unregister_thread();
+ exit(EXIT_FAILURE);
+ }
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_COMPLETED);
+ /*
+ * If everything has worked fine, then the main thread has waited
+ * for us to start, and we're the last use of the mis.
+ * (If something broke then qemu will have to exit anyway since it's
+ * got a bad migration state).
+ */
+ bql_lock();
+ migration_incoming_state_destroy();
+ bql_unlock();
+
+ rcu_unregister_thread();
+ mis->have_listen_thread = false;
+ postcopy_state_set(POSTCOPY_INCOMING_END);
+
+ object_unref(OBJECT(migr));
+
+ return NULL;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ca19433b24..3e26db3e6b 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,4 +199,6 @@ bool postcopy_is_paused(MigrationStatus status);
void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
RAMBlock *rb);
+void *postcopy_ram_listen_thread(void *opaque);
+
#endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 232cae090b..97fdd08c08 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2087,113 +2087,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
return 0;
}
-/*
- * Triggered by a postcopy_listen command; this thread takes over reading
- * the input stream, leaving the main thread free to carry on loading the rest
- * of the device state (from RAM).
- * (TODO:This could do with being in a postcopy file - but there again it's
- * just another input loop, not that postcopy specific)
- */
-static void *postcopy_ram_listen_thread(void *opaque)
-{
- MigrationIncomingState *mis = migration_incoming_get_current();
- QEMUFile *f = mis->from_src_file;
- int load_res;
- MigrationState *migr = migrate_get_current();
- Error *local_err = NULL;
-
- object_ref(OBJECT(migr));
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
- qemu_event_set(&mis->thread_sync_event);
- trace_postcopy_ram_listen_thread_start();
-
- rcu_register_thread();
- /*
- * Because we're a thread and not a coroutine we can't yield
- * in qemu_file, and thus we must be blocking now.
- */
- qemu_file_set_blocking(f, true, &error_fatal);
-
- /* TODO: sanity check that only postcopiable data will be loaded here */
- load_res = qemu_loadvm_state_main(f, mis, &local_err);
-
- /*
- * This is tricky, but, mis->from_src_file can change after it
- * returns, when postcopy recovery happened. In the future, we may
- * want a wrapper for the QEMUFile handle.
- */
- f = mis->from_src_file;
-
- /* And non-blocking again so we don't block in any cleanup */
- qemu_file_set_blocking(f, false, &error_fatal);
-
- trace_postcopy_ram_listen_thread_exit();
- if (load_res < 0) {
- qemu_file_set_error(f, load_res);
- dirty_bitmap_mig_cancel_incoming();
- if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
- !migrate_postcopy_ram() && migrate_dirty_bitmaps())
- {
- error_report("%s: loadvm failed during postcopy: %d: %s. All states "
- "are migrated except dirty bitmaps. Some dirty "
- "bitmaps may be lost, and present migrated dirty "
- "bitmaps are correctly migrated and valid.",
- __func__, load_res, error_get_pretty(local_err));
- g_clear_pointer(&local_err, error_free);
- load_res = 0; /* prevent further exit() */
- } else {
- error_prepend(&local_err,
- "loadvm failed during postcopy: %d: ", load_res);
- migrate_set_error(migr, local_err);
- g_clear_pointer(&local_err, error_report_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
- }
- }
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_ram_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
-
- rcu_unregister_thread();
- mis->have_listen_thread = false;
- postcopy_state_set(POSTCOPY_INCOMING_END);
-
- object_unref(OBJECT(migr));
-
- return NULL;
-}
-
/* After this message we must be able to immediately receive postcopy data */
static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
Error **errp)
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 1/7] migration: Do not try to start VM if disk activation fails Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 22:35 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 4/7] migration: Refactor all incoming cleanup into migration_incoming_destroy() Juraj Marcin
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
After moving postcopy_ram_listen_thread() to the postcopy file, this patch
introduces a pair of functions, postcopy_incoming_setup() and
postcopy_incoming_cleanup(). These functions encapsulate the setup and
cleanup of all incoming postcopy resources: postcopy-ram and the postcopy
listen thread.
Furthermore, this patch also renames postcopy_ram_listen_thread() to
postcopy_listen_thread(), as this thread handles not only postcopy-ram
but also dirty-bitmaps, and in the future it could handle other
postcopiable devices.
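The intended call pattern, condensed (the real call sites are in the diff
below):

    /* in the POSTCOPY_LISTEN handler */
    if (postcopy_incoming_setup(mis, errp)) {
        return -1;      /* errp has been set by the setup function */
    }
    ...
    /* later, on the incoming teardown path */
    postcopy_incoming_cleanup(mis);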
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration.c | 2 +-
migration/postcopy-ram.c | 46 ++++++++++++++++++++++++++++++++++++----
migration/postcopy-ram.h | 3 ++-
migration/savevm.c | 25 ++--------------------
4 files changed, 47 insertions(+), 29 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6e647c7c4a..9a367f717e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque)
* but managed to complete within the precopy period, we can use
* the normal exit.
*/
- postcopy_ram_incoming_cleanup(mis);
+ postcopy_incoming_cleanup(mis);
} else if (ret >= 0) {
/*
* Postcopy was started, cleanup should happen at the end of the
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 36d5415554..b47c955763 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2082,10 +2082,8 @@ bool postcopy_is_paused(MigrationStatus status)
* Triggered by a postcopy_listen command; this thread takes over reading
* the input stream, leaving the main thread free to carry on loading the rest
* of the device state (from RAM).
- * (TODO:This could do with being in a postcopy file - but there again it's
- * just another input loop, not that postcopy specific)
*/
-void *postcopy_ram_listen_thread(void *opaque)
+static void *postcopy_listen_thread(void *opaque)
{
MigrationIncomingState *mis = migration_incoming_get_current();
QEMUFile *f = mis->from_src_file;
@@ -2151,7 +2149,7 @@ void *postcopy_ram_listen_thread(void *opaque)
*/
qemu_event_wait(&mis->main_thread_load_event);
}
- postcopy_ram_incoming_cleanup(mis);
+ postcopy_incoming_cleanup(mis);
if (load_res < 0) {
/*
@@ -2184,3 +2182,43 @@ void *postcopy_ram_listen_thread(void *opaque)
return NULL;
}
+
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
+{
+ /*
+ * Sensitise RAM - can now generate requests for blocks that don't exist
+ * However, at this point the CPU shouldn't be running, and the IO
+ * shouldn't be doing anything yet so don't actually expect requests
+ */
+ if (migrate_postcopy_ram()) {
+ if (postcopy_ram_incoming_setup(mis)) {
+ postcopy_ram_incoming_cleanup(mis);
+ error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
+ return -1;
+ }
+ }
+
+ trace_loadvm_postcopy_handle_listen("after uffd");
+
+ if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
+ return -1;
+ }
+
+ mis->have_listen_thread = true;
+ postcopy_thread_create(mis, &mis->listen_thread,
+ MIGRATION_THREAD_DST_LISTEN,
+ postcopy_listen_thread, QEMU_THREAD_DETACHED);
+
+ return 0;
+}
+
+int postcopy_incoming_cleanup(MigrationIncomingState *mis)
+{
+ int rc = 0;
+
+ if (migrate_postcopy_ram()) {
+ rc = postcopy_ram_incoming_cleanup(mis);
+ }
+
+ return rc;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 3e26db3e6b..a080dd65a7 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,6 +199,7 @@ bool postcopy_is_paused(MigrationStatus status);
void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
RAMBlock *rb);
-void *postcopy_ram_listen_thread(void *opaque);
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
+int postcopy_incoming_cleanup(MigrationIncomingState *mis);
#endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 97fdd08c08..6ae3f740b5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2112,32 +2112,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
trace_loadvm_postcopy_handle_listen("after discard");
- /*
- * Sensitise RAM - can now generate requests for blocks that don't exist
- * However, at this point the CPU shouldn't be running, and the IO
- * shouldn't be doing anything yet so don't actually expect requests
- */
- if (migrate_postcopy_ram()) {
- if (postcopy_ram_incoming_setup(mis)) {
- postcopy_ram_incoming_cleanup(mis);
- error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
- return -1;
- }
- }
+ int rc = postcopy_incoming_setup(mis, errp);
- trace_loadvm_postcopy_handle_listen("after uffd");
-
- if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
- return -1;
- }
-
- mis->have_listen_thread = true;
- postcopy_thread_create(mis, &mis->listen_thread,
- MIGRATION_THREAD_DST_LISTEN,
- postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
trace_loadvm_postcopy_handle_listen("return");
- return 0;
+ return rc;
}
static void loadvm_postcopy_handle_run_bh(void *opaque)
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 4/7] migration: Refactor all incoming cleanup into migration_incoming_destroy()
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (2 preceding siblings ...)
2025-10-30 21:49 ` [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
Currently, there are two functions that are responsible for calling the
cleanup of the incoming migration state. With successful precopy, it's
the incoming migration coroutine, and with successful postcopy it's the
postcopy listen thread. However, if postcopy fails during the device
load, both functions will try to do the cleanup.
This patch refactors all cleanup that needs to be done on the incoming
side into a common function and defines a clear boundary for who is
responsible for the cleanup. The incoming migration coroutine is
responsible for calling the cleanup function, unless the listen thread
has been started, in which case the postcopy listen thread runs the
incoming migration cleanup in its BH.
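The resulting ownership rule, as a condensed sketch of
process_incoming_migration_co() after this change:

    /* incoming migration coroutine, after qemu_loadvm_state() */
    if (mis->have_listen_thread) {
        /* the postcopy listen thread (via its BH) owns the cleanup now */
        goto out;
    }
    /* otherwise the coroutine itself is responsible */
    migration_incoming_state_destroy();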
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration.c | 44 +++++++++-------------------
migration/migration.h | 1 +
migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
migration/trace-events | 2 +-
4 files changed, 49 insertions(+), 61 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 9a367f717e..637be71bfe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
void migration_incoming_state_destroy(void)
{
- struct MigrationIncomingState *mis = migration_incoming_get_current();
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ PostcopyState ps = postcopy_state_get();
multifd_recv_cleanup();
+ if (ps != POSTCOPY_INCOMING_NONE) {
+ postcopy_incoming_cleanup(mis);
+ }
+
/*
* RAM state cleanup needs to happen after multifd cleanup, because
* multifd threads can use some of its states (receivedmap).
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
{
MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
- PostcopyState ps;
int ret;
Error *local_err = NULL;
@@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
- ps = postcopy_state_get();
- trace_process_incoming_migration_co_end(ret, ps);
- if (ps != POSTCOPY_INCOMING_NONE) {
- if (ps == POSTCOPY_INCOMING_ADVISE) {
- /*
- * Where a migration had postcopy enabled (and thus went to advise)
- * but managed to complete within the precopy period, we can use
- * the normal exit.
- */
- postcopy_incoming_cleanup(mis);
- } else if (ret >= 0) {
- /*
- * Postcopy was started, cleanup should happen at the end of the
- * postcopy thread.
- */
- trace_process_incoming_migration_co_postcopy_end_main();
- goto out;
- }
- /* Else if something went wrong then just fall out of the normal exit */
+ trace_process_incoming_migration_co_end(ret);
+ if (mis->have_listen_thread) {
+ /*
+ * Postcopy was started, cleanup should happen at the end of the
+ * postcopy listen thread.
+ */
+ trace_process_incoming_migration_co_postcopy_end_main();
+ goto out;
}
if (ret < 0) {
@@ -933,15 +926,6 @@ fail:
}
exit(EXIT_FAILURE);
- } else {
- /*
- * Report the error here in case that QEMU abruptly exits
- * when postcopy is enabled.
- */
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
}
out:
/* Pairs with the refcount taken in qmp_migrate_incoming() */
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..4a37f7202c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -254,6 +254,7 @@ struct MigrationIncomingState {
MigrationIncomingState *migration_incoming_get_current(void);
void migration_incoming_state_destroy(void);
void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
+void migration_incoming_qemu_exit(void);
/*
* Functions to work with blocktime context
*/
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b47c955763..48cbb46c27 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
}
+static void postcopy_listen_thread_bh(void *opaque)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+
+ migration_incoming_state_destroy();
+
+ if (mis->state == MIGRATION_STATUS_FAILED) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * we only could have gotten here if something failed before
+ * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
+ * postcopy migration would pause inside qemu_loadvm_state_main().
+ * Failing dirty-bitmaps won't fail the whole migration.
+ */
+ exit(1);
+ }
+}
+
/*
* Triggered by a postcopy_listen command; this thread takes over reading
* the input stream, leaving the main thread free to carry on loading the rest
@@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
"bitmaps are correctly migrated and valid.",
__func__, load_res, error_get_pretty(local_err));
g_clear_pointer(&local_err, error_free);
- load_res = 0; /* prevent further exit() */
} else {
+ /*
+ * Something went fatally wrong and we have a bad state, QEMU will
+ * exit depending on if postcopy-exit-on-error is true, but the
+ * migration cannot be recovered.
+ */
error_prepend(&local_err,
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
g_clear_pointer(&local_err, error_report_err);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_FAILED);
+ goto out;
}
}
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
+out:
rcu_unregister_thread();
mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
+ migration_bh_schedule(postcopy_listen_thread_bh, NULL);
+
object_unref(OBJECT(migr));
return NULL;
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..772636f3ac 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_end(int ret) "ret=%d"
process_incoming_migration_co_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
migration_precopy_complete(void) ""
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (3 preceding siblings ...)
2025-10-30 21:49 ` [PATCH v3 4/7] migration: Refactor all incoming cleanup into migration_incoming_destroy() Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 6/7] migration: Make postcopy listen thread joinable Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 7/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
6 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
When exit-on-error was added to migration, it wasn't applied to postcopy.
Even though postcopy migration will usually pause rather than fail, in
cases where it does fail unrecoverably before the destination side has
started, respecting exit-on-error allows management to query the error.
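Condensed, the behaviour of the listen thread's BH becomes (a sketch of the
diff below):

    /* after migration_incoming_state_destroy() */
    if (mis->state == MIGRATION_STATUS_FAILED && mis->exit_on_error) {
        error_report_err(s->error);     /* default: report the error ... */
        exit(1);                        /* ... and exit, as before */
    }
    /* with exit-on-error=false, QEMU keeps running and management can
     * query the stored error instead */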
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/postcopy-ram.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 48cbb46c27..91431f02a4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2080,11 +2080,16 @@ bool postcopy_is_paused(MigrationStatus status)
static void postcopy_listen_thread_bh(void *opaque)
{
+ MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
migration_incoming_state_destroy();
- if (mis->state == MIGRATION_STATUS_FAILED) {
+ if (mis->state == MIGRATION_STATUS_FAILED && mis->exit_on_error) {
+ WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+ error_report_err(s->error);
+ s->error = NULL;
+ }
/*
* If something went wrong then we have a bad state so exit;
* we only could have gotten here if something failed before
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 6/7] migration: Make postcopy listen thread joinable
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (4 preceding siblings ...)
2025-10-30 21:49 ` [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 7/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
6 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
This patch makes the listen thread joinable instead of detached, and joins
it alongside the other postcopy threads.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/postcopy-ram.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 91431f02a4..8405cce7b4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2181,7 +2181,6 @@ static void *postcopy_listen_thread(void *opaque)
out:
rcu_unregister_thread();
- mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
migration_bh_schedule(postcopy_listen_thread_bh, NULL);
@@ -2215,7 +2214,7 @@ int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
mis->have_listen_thread = true;
postcopy_thread_create(mis, &mis->listen_thread,
MIGRATION_THREAD_DST_LISTEN,
- postcopy_listen_thread, QEMU_THREAD_DETACHED);
+ postcopy_listen_thread, QEMU_THREAD_JOINABLE);
return 0;
}
@@ -2224,6 +2223,11 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
{
int rc = 0;
+ if (mis->have_listen_thread) {
+ qemu_thread_join(&mis->listen_thread);
+ mis->have_listen_thread = false;
+ }
+
if (migrate_postcopy_ram()) {
rc = postcopy_ram_incoming_cleanup(mis);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 7/7] migration: Introduce POSTCOPY_DEVICE state
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (5 preceding siblings ...)
2025-10-30 21:49 ` [PATCH v3 6/7] migration: Make postcopy listen thread joinable Juraj Marcin
@ 2025-10-30 21:49 ` Juraj Marcin
6 siblings, 0 replies; 16+ messages in thread
From: Juraj Marcin @ 2025-10-30 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
Currently, when postcopy starts, the source VM starts the switchover and
sends a package containing the state of all non-postcopiable devices.
Once the destination has loaded this package, the switchover is complete
and the destination VM starts. However, if the device state load fails or
the destination side crashes, the source side is already in the
POSTCOPY_ACTIVE state and cannot be recovered, even though it still has
the most up-to-date machine state because the destination has not yet
started.
This patch introduces a new POSTCOPY_DEVICE state, which is active while
the destination machine is loading the device state and is not yet
running; while in this state, the source side can be resumed in case of a
migration failure.
To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source side
uses a PONG message that is the response to a PING message processed just
before the POSTCOPY_RUN command that starts the destination VM. Thus,
this feature is effective even if the destination side does not yet
support the new state.
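Condensed, the handshake looks like this (illustrative only; the real
changes are in the diff below):

    /* source, postcopy_start(): queued into the packaged device state,
     * right before the command that starts the destination VM */
    qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
    qemu_savevm_send_postcopy_run(fb);
    /* the source then enters POSTCOPY_DEVICE instead of POSTCOPY_ACTIVE */

    /* destination: loading the package answers the PING with a PONG on the
     * return path just before POSTCOPY_RUN starts the VM; when the source's
     * return-path thread sees that PONG, it moves POSTCOPY_DEVICE ->
     * POSTCOPY_ACTIVE (or completes, if it was already waiting to finish) */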
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 43 ++++++++++++++++++++++++---
migration/migration.h | 3 ++
migration/postcopy-ram.c | 9 ++++--
migration/savevm.c | 2 ++
migration/savevm.h | 2 ++
migration/trace-events | 1 +
qapi/migration.json | 8 +++--
tests/qtest/migration/precopy-tests.c | 3 +-
8 files changed, 61 insertions(+), 10 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 637be71bfe..76223cb940 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1206,6 +1206,7 @@ bool migration_is_running(void)
switch (s->state) {
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1227,6 +1228,7 @@ static bool migration_is_active(void)
MigrationState *s = current_migration;
return (s->state == MIGRATION_STATUS_ACTIVE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
}
@@ -1349,6 +1351,7 @@ static void fill_source_migration_info(MigrationInfo *info)
break;
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_CANCELLING:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_PRE_SWITCHOVER:
case MIGRATION_STATUS_DEVICE:
@@ -1402,6 +1405,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
case MIGRATION_STATUS_CANCELLING:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER:
@@ -1732,6 +1736,7 @@ bool migration_in_postcopy(void)
MigrationState *s = migrate_get_current();
switch (s->state) {
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1833,6 +1838,9 @@ int migrate_init(MigrationState *s, Error **errp)
memset(&mig_stats, 0, sizeof(mig_stats));
migration_reset_vfio_bytes_transferred();
+ s->postcopy_package_loaded = false;
+ qemu_event_reset(&s->postcopy_package_loaded_event);
+
return 0;
}
@@ -2568,6 +2576,11 @@ static void *source_return_path_thread(void *opaque)
tmp32 = ldl_be_p(buf);
trace_source_return_path_thread_pong(tmp32);
qemu_sem_post(&ms->rp_state.rp_pong_acks);
+ if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
+ trace_source_return_path_thread_postcopy_package_loaded();
+ ms->postcopy_package_loaded = true;
+ qemu_event_set(&ms->postcopy_package_loaded_event);
+ }
break;
case MIG_RP_MSG_REQ_PAGES:
@@ -2813,6 +2826,13 @@ static int postcopy_start(MigrationState *ms, Error **errp)
if (migrate_postcopy_ram()) {
qemu_savevm_send_ping(fb, 3);
}
+ /*
+ * This ping will tell us that all non-postcopiable device state has been
+ * successfully loaded and the destination is about to start. When the
+ * response is received, it will trigger the transition from
+ * POSTCOPY_DEVICE to POSTCOPY_ACTIVE state.
+ */
+ qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
qemu_savevm_send_postcopy_run(fb);
@@ -2870,7 +2890,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
/* Now, switchover looks all fine, switching to postcopy-active */
migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ MIGRATION_STATUS_POSTCOPY_DEVICE);
bql_unlock();
@@ -3311,8 +3331,8 @@ static MigThrError migration_detect_error(MigrationState *s)
return postcopy_pause(s);
} else {
/*
- * For precopy (or postcopy with error outside IO), we fail
- * with no time.
+ * For precopy (or postcopy with error outside IO, or before dest
+ * starts), we fail with no time.
*/
migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
trace_migration_thread_file_err();
@@ -3447,7 +3467,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
{
uint64_t must_precopy, can_postcopy, pending_size;
Error *local_err = NULL;
- bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
+ bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
bool can_switchover = migration_can_switchover(s);
bool complete_ready;
@@ -3463,6 +3484,18 @@ static MigIterateState migration_iteration_run(MigrationState *s)
* POSTCOPY_ACTIVE it means switchover already happened.
*/
complete_ready = !pending_size;
+ if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&
+ (s->postcopy_package_loaded || complete_ready)) {
+ /*
+ * If the package has been loaded, the event is set and we will
+ * immediately transition to POSTCOPY_ACTIVE. If we are ready for
+ * completion, we need to wait for the destination to load the postcopy
+ * package before actually completing.
+ */
+ qemu_event_wait(&s->postcopy_package_loaded_event);
+ migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ }
} else {
/*
* Exact pending reporting is only needed for precopy. Taking RAM
@@ -4117,6 +4150,7 @@ static void migration_instance_finalize(Object *obj)
qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
error_free(ms->error);
+ qemu_event_destroy(&ms->postcopy_package_loaded_event);
}
static void migration_instance_init(Object *obj)
@@ -4138,6 +4172,7 @@ static void migration_instance_init(Object *obj)
qemu_sem_init(&ms->wait_unplug_sem, 0);
qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
qemu_mutex_init(&ms->qemu_file_lock);
+ qemu_event_init(&ms->postcopy_package_loaded_event, 0);
}
/*
diff --git a/migration/migration.h b/migration/migration.h
index 4a37f7202c..213b33fe6e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -510,6 +510,9 @@ struct MigrationState {
/* Is this a rdma migration */
bool rdma_migration;
+ bool postcopy_package_loaded;
+ QemuEvent postcopy_package_loaded_event;
+
GSource *hup_source;
};
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8405cce7b4..c1e3b110b6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2117,7 +2117,7 @@ static void *postcopy_listen_thread(void *opaque)
object_ref(OBJECT(migr));
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ MIGRATION_STATUS_POSTCOPY_DEVICE);
qemu_event_set(&mis->thread_sync_event);
trace_postcopy_ram_listen_thread_start();
@@ -2164,8 +2164,7 @@ static void *postcopy_listen_thread(void *opaque)
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
g_clear_pointer(&local_err, error_report_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
+ migrate_set_state(&mis->state, mis->state, MIGRATION_STATUS_FAILED);
goto out;
}
}
@@ -2176,6 +2175,10 @@ static void *postcopy_listen_thread(void *opaque)
*/
qemu_event_wait(&mis->main_thread_load_event);
+ /*
+ * Device load in the main thread has finished, we should be in
+ * POSTCOPY_ACTIVE now.
+ */
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
diff --git a/migration/savevm.c b/migration/savevm.c
index 6ae3f740b5..ff01eb1a56 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2169,6 +2169,8 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis, Error **errp)
return -1;
}
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..125a2507b7 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,8 @@
#define QEMU_VM_COMMAND 0x08
#define QEMU_VM_SECTION_FOOTER 0x7e
+#define QEMU_VM_PING_PACKAGED_LOADED 0x42
+
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
int qemu_savevm_state_prepare(Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index 772636f3ac..bf11b62b17 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
source_return_path_thread_shut(uint32_t val) "0x%x"
source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
+source_return_path_thread_postcopy_package_loaded(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
process_incoming_migration_co_end(int ret) "ret=%d"
diff --git a/qapi/migration.json b/qapi/migration.json
index c7a6737cc1..b63de1bd36 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,10 @@
# @postcopy-active: like active, but now in postcopy mode.
# (since 2.5)
#
+# @postcopy-device: like postcopy-active, but the destination is still
+# loading device state and is not running yet. If migration fails
+# during this state, the source side will resume. (since 10.2)
+#
# @postcopy-paused: during postcopy but paused. (since 3.0)
#
# @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +177,8 @@
##
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
- 'active', 'postcopy-active', 'postcopy-paused',
- 'postcopy-recover-setup',
+ 'active', 'postcopy-device', 'postcopy-active',
+ 'postcopy-paused', 'postcopy-recover-setup',
'postcopy-recover', 'completed', 'failed', 'colo',
'pre-switchover', 'device', 'wait-unplug' ] }
##
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index bb38292550..57ca623de5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
}
/* ensure new status don't go unnoticed */
- assert(MIGRATION_STATUS__MAX == 15);
+ assert(MIGRATION_STATUS__MAX == 16);
for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
switch (i) {
case MIGRATION_STATUS_DEVICE: /* happens too fast */
case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
case MIGRATION_STATUS_COLO: /* no support in tests */
+ case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
--
2.51.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-10-30 21:49 ` [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
@ 2025-10-30 22:28 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-10-30 22:28 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Thu, Oct 30, 2025 at 10:49:06PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> This patch addresses a TODO about moving postcopy_ram_listen_thread() to
> postcopy file.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
> migration/postcopy-ram.c | 107 +++++++++++++++++++++++++++++++++++++++
> migration/postcopy-ram.h | 2 +
> migration/savevm.c | 107 ---------------------------------------
> 3 files changed, 109 insertions(+), 107 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 5471efb4f0..36d5415554 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2077,3 +2077,110 @@ bool postcopy_is_paused(MigrationStatus status)
> return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
> status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> }
> +
> +/*
> + * Triggered by a postcopy_listen command; this thread takes over reading
> + * the input stream, leaving the main thread free to carry on loading the rest
> + * of the device state (from RAM).
> + * (TODO:This could do with being in a postcopy file - but there again it's
> + * just another input loop, not that postcopy specific)
I suppose touching the comment while moving (as long as explicitly
mentioned in the commit message) would be fine, when the comment is exactly
about "we should move it". :) Not a big deal, thanks for the split.
Reviewed-by: Peter Xu <peterx@redhat.com>
> + */
> +void *postcopy_ram_listen_thread(void *opaque)
> +{
> + MigrationIncomingState *mis = migration_incoming_get_current();
> + QEMUFile *f = mis->from_src_file;
> + int load_res;
> + MigrationState *migr = migrate_get_current();
> + Error *local_err = NULL;
> +
> + object_ref(OBJECT(migr));
> +
> + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> + qemu_event_set(&mis->thread_sync_event);
> + trace_postcopy_ram_listen_thread_start();
> +
> + rcu_register_thread();
> + /*
> + * Because we're a thread and not a coroutine we can't yield
> + * in qemu_file, and thus we must be blocking now.
> + */
> + qemu_file_set_blocking(f, true, &error_fatal);
> +
> + /* TODO: sanity check that only postcopiable data will be loaded here */
> + load_res = qemu_loadvm_state_main(f, mis, &local_err);
> +
> + /*
> + * This is tricky, but, mis->from_src_file can change after it
> + * returns, when postcopy recovery happened. In the future, we may
> + * want a wrapper for the QEMUFile handle.
> + */
> + f = mis->from_src_file;
> +
> + /* And non-blocking again so we don't block in any cleanup */
> + qemu_file_set_blocking(f, false, &error_fatal);
> +
> + trace_postcopy_ram_listen_thread_exit();
> + if (load_res < 0) {
> + qemu_file_set_error(f, load_res);
> + dirty_bitmap_mig_cancel_incoming();
> + if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> + !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> + {
> + error_report("%s: loadvm failed during postcopy: %d: %s. All states "
> + "are migrated except dirty bitmaps. Some dirty "
> + "bitmaps may be lost, and present migrated dirty "
> + "bitmaps are correctly migrated and valid.",
> + __func__, load_res, error_get_pretty(local_err));
> + g_clear_pointer(&local_err, error_free);
> + load_res = 0; /* prevent further exit() */
> + } else {
> + error_prepend(&local_err,
> + "loadvm failed during postcopy: %d: ", load_res);
> + migrate_set_error(migr, local_err);
> + g_clear_pointer(&local_err, error_report_err);
> + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> + MIGRATION_STATUS_FAILED);
> + }
> + }
> + if (load_res >= 0) {
> + /*
> + * This looks good, but it's possible that the device loading in the
> + * main thread hasn't finished yet, and so we might not be in 'RUN'
> + * state yet; wait for the end of the main thread.
> + */
> + qemu_event_wait(&mis->main_thread_load_event);
> + }
> + postcopy_ram_incoming_cleanup(mis);
> +
> + if (load_res < 0) {
> + /*
> + * If something went wrong then we have a bad state so exit;
> + * depending how far we got it might be possible at this point
> + * to leave the guest running and fire MCEs for pages that never
> + * arrived as a desperate recovery step.
> + */
> + rcu_unregister_thread();
> + exit(EXIT_FAILURE);
> + }
> +
> + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> + MIGRATION_STATUS_COMPLETED);
> + /*
> + * If everything has worked fine, then the main thread has waited
> + * for us to start, and we're the last use of the mis.
> + * (If something broke then qemu will have to exit anyway since it's
> + * got a bad migration state).
> + */
> + bql_lock();
> + migration_incoming_state_destroy();
> + bql_unlock();
> +
> + rcu_unregister_thread();
> + mis->have_listen_thread = false;
> + postcopy_state_set(POSTCOPY_INCOMING_END);
> +
> + object_unref(OBJECT(migr));
> +
> + return NULL;
> +}
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index ca19433b24..3e26db3e6b 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -199,4 +199,6 @@ bool postcopy_is_paused(MigrationStatus status);
> void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
> RAMBlock *rb);
>
> +void *postcopy_ram_listen_thread(void *opaque);
> +
> #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 232cae090b..97fdd08c08 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2087,113 +2087,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> return 0;
> }
>
> -/*
> - * Triggered by a postcopy_listen command; this thread takes over reading
> - * the input stream, leaving the main thread free to carry on loading the rest
> - * of the device state (from RAM).
> - * (TODO:This could do with being in a postcopy file - but there again it's
> - * just another input loop, not that postcopy specific)
> - */
> -static void *postcopy_ram_listen_thread(void *opaque)
> -{
> - MigrationIncomingState *mis = migration_incoming_get_current();
> - QEMUFile *f = mis->from_src_file;
> - int load_res;
> - MigrationState *migr = migrate_get_current();
> - Error *local_err = NULL;
> -
> - object_ref(OBJECT(migr));
> -
> - migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> - qemu_event_set(&mis->thread_sync_event);
> - trace_postcopy_ram_listen_thread_start();
> -
> - rcu_register_thread();
> - /*
> - * Because we're a thread and not a coroutine we can't yield
> - * in qemu_file, and thus we must be blocking now.
> - */
> - qemu_file_set_blocking(f, true, &error_fatal);
> -
> - /* TODO: sanity check that only postcopiable data will be loaded here */
> - load_res = qemu_loadvm_state_main(f, mis, &local_err);
> -
> - /*
> - * This is tricky, but, mis->from_src_file can change after it
> - * returns, when postcopy recovery happened. In the future, we may
> - * want a wrapper for the QEMUFile handle.
> - */
> - f = mis->from_src_file;
> -
> - /* And non-blocking again so we don't block in any cleanup */
> - qemu_file_set_blocking(f, false, &error_fatal);
> -
> - trace_postcopy_ram_listen_thread_exit();
> - if (load_res < 0) {
> - qemu_file_set_error(f, load_res);
> - dirty_bitmap_mig_cancel_incoming();
> - if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> - !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> - {
> - error_report("%s: loadvm failed during postcopy: %d: %s. All states "
> - "are migrated except dirty bitmaps. Some dirty "
> - "bitmaps may be lost, and present migrated dirty "
> - "bitmaps are correctly migrated and valid.",
> - __func__, load_res, error_get_pretty(local_err));
> - g_clear_pointer(&local_err, error_free);
> - load_res = 0; /* prevent further exit() */
> - } else {
> - error_prepend(&local_err,
> - "loadvm failed during postcopy: %d: ", load_res);
> - migrate_set_error(migr, local_err);
> - g_clear_pointer(&local_err, error_report_err);
> - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> - MIGRATION_STATUS_FAILED);
> - }
> - }
> - if (load_res >= 0) {
> - /*
> - * This looks good, but it's possible that the device loading in the
> - * main thread hasn't finished yet, and so we might not be in 'RUN'
> - * state yet; wait for the end of the main thread.
> - */
> - qemu_event_wait(&mis->main_thread_load_event);
> - }
> - postcopy_ram_incoming_cleanup(mis);
> -
> - if (load_res < 0) {
> - /*
> - * If something went wrong then we have a bad state so exit;
> - * depending how far we got it might be possible at this point
> - * to leave the guest running and fire MCEs for pages that never
> - * arrived as a desperate recovery step.
> - */
> - rcu_unregister_thread();
> - exit(EXIT_FAILURE);
> - }
> -
> - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> - MIGRATION_STATUS_COMPLETED);
> - /*
> - * If everything has worked fine, then the main thread has waited
> - * for us to start, and we're the last use of the mis.
> - * (If something broke then qemu will have to exit anyway since it's
> - * got a bad migration state).
> - */
> - bql_lock();
> - migration_incoming_state_destroy();
> - bql_unlock();
> -
> - rcu_unregister_thread();
> - mis->have_listen_thread = false;
> - postcopy_state_set(POSTCOPY_INCOMING_END);
> -
> - object_unref(OBJECT(migr));
> -
> - return NULL;
> -}
> -
> /* After this message we must be able to immediately receive postcopy data */
> static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
> Error **errp)
> --
> 2.51.0
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions
2025-10-30 21:49 ` [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
@ 2025-10-30 22:35 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-10-30 22:35 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Thu, Oct 30, 2025 at 10:49:07PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> After moving postcopy_ram_listen_thread() to postcopy file, this patch
> introduces a pair of functions, postcopy_incoming_setup() and
> postcopy_incoming_cleanup(). These functions encapsulate setup and
> cleanup of all incoming postcopy resources, postcopy-ram and postcopy
> listen thread.
>
> Furthermore, this patch also renames the postcopy_ram_listen_thread to
> postcopy_listen_thread, as this thread handles not only postcopy-ram,
> but also dirty-bitmaps and in the future it could handle other
> postcopiable devices.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()
2025-10-30 21:49 ` [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy() Juraj Marcin
@ 2025-10-30 22:49 ` Peter Xu
2025-10-31 11:03 ` Juraj Marcin
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-10-30 22:49 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Currently, there are two functions that are responsible for calling the
> cleanup of the incoming migration state. With successful precopy, it's
> the incoming migration coroutine, and with successful postcopy it's the
> postcopy listen thread. However, if postcopy fails during the device
> load, both functions will try to do the cleanup.
>
> This patch refactors all cleanup that needs to be done on the incoming
> side into a common function and defines a clear boundary, who is
> responsible for the cleanup. The incoming migration coroutine is
> responsible for calling the cleanup function, unless the listen thread
> has been started, in which case the postcopy listen thread runs the
> incoming migration cleanup in its BH.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
> migration/migration.c | 44 +++++++++-------------------
> migration/migration.h | 1 +
> migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
> migration/trace-events | 2 +-
> 4 files changed, 49 insertions(+), 61 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9a367f717e..637be71bfe 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
>
> void migration_incoming_state_destroy(void)
> {
> - struct MigrationIncomingState *mis = migration_incoming_get_current();
> + MigrationIncomingState *mis = migration_incoming_get_current();
> + PostcopyState ps = postcopy_state_get();
>
> multifd_recv_cleanup();
>
> + if (ps != POSTCOPY_INCOMING_NONE) {
> + postcopy_incoming_cleanup(mis);
> + }
> +
> /*
> * RAM state cleanup needs to happen after multifd cleanup, because
> * multifd threads can use some of its states (receivedmap).
> @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
> {
> MigrationState *s = migrate_get_current();
> MigrationIncomingState *mis = migration_incoming_get_current();
> - PostcopyState ps;
> int ret;
> Error *local_err = NULL;
>
> @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
>
> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>
> - ps = postcopy_state_get();
> - trace_process_incoming_migration_co_end(ret, ps);
> - if (ps != POSTCOPY_INCOMING_NONE) {
> - if (ps == POSTCOPY_INCOMING_ADVISE) {
> - /*
> - * Where a migration had postcopy enabled (and thus went to advise)
> - * but managed to complete within the precopy period, we can use
> - * the normal exit.
> - */
> - postcopy_incoming_cleanup(mis);
> - } else if (ret >= 0) {
> - /*
> - * Postcopy was started, cleanup should happen at the end of the
> - * postcopy thread.
> - */
> - trace_process_incoming_migration_co_postcopy_end_main();
> - goto out;
> - }
> - /* Else if something went wrong then just fall out of the normal exit */
> + trace_process_incoming_migration_co_end(ret);
> + if (mis->have_listen_thread) {
> + /*
> + * Postcopy was started, cleanup should happen at the end of the
> + * postcopy listen thread.
> + */
> + trace_process_incoming_migration_co_postcopy_end_main();
> + goto out;
> }
>
> if (ret < 0) {
> @@ -933,15 +926,6 @@ fail:
> }
>
> exit(EXIT_FAILURE);
> - } else {
> - /*
> - * Report the error here in case that QEMU abruptly exits
> - * when postcopy is enabled.
> - */
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(s->error);
> - s->error = NULL;
> - }
The patch looks all good by itself. Here's a pure question: is the old code
wrong? If user sets exit_on_error=false, then this path seems to be
releasing the error object, then query-migrate will see nothing?
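For reference, here is the path I mean, paraphrased with comments (not new
code); error_report_err() frees the Error it is handed:

    WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
        /* prints the message and frees the Error object */
        error_report_err(s->error);
        /* after this, query-migrate has no error-desc left to return */
        s->error = NULL;
    }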
> }
> out:
> /* Pairs with the refcount taken in qmp_migrate_incoming() */
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..4a37f7202c 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -254,6 +254,7 @@ struct MigrationIncomingState {
> MigrationIncomingState *migration_incoming_get_current(void);
> void migration_incoming_state_destroy(void);
> void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> +void migration_incoming_qemu_exit(void);
> /*
> * Functions to work with blocktime context
> */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index b47c955763..48cbb46c27 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
> status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> }
>
> +static void postcopy_listen_thread_bh(void *opaque)
> +{
> + MigrationIncomingState *mis = migration_incoming_get_current();
> +
> + migration_incoming_state_destroy();
> +
> + if (mis->state == MIGRATION_STATUS_FAILED) {
> + /*
> + * If something went wrong then we have a bad state so exit;
> + * we only could have gotten here if something failed before
> + * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> + * postcopy migration would pause inside qemu_loadvm_state_main().
> + * Failing dirty-bitmaps won't fail the whole migration.
> + */
> + exit(1);
> + }
> +}
> +
> /*
> * Triggered by a postcopy_listen command; this thread takes over reading
> * the input stream, leaving the main thread free to carry on loading the rest
> @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
> "bitmaps are correctly migrated and valid.",
> __func__, load_res, error_get_pretty(local_err));
> g_clear_pointer(&local_err, error_free);
> - load_res = 0; /* prevent further exit() */
> } else {
> + /*
> + * Something went fatally wrong and we have a bad state, QEMU will
> + * exit depending on if postcopy-exit-on-error is true, but the
> + * migration cannot be recovered.
> + */
> error_prepend(&local_err,
> "loadvm failed during postcopy: %d: ", load_res);
> migrate_set_error(migr, local_err);
> g_clear_pointer(&local_err, error_report_err);
> migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> MIGRATION_STATUS_FAILED);
> + goto out;
> }
> }
> - if (load_res >= 0) {
> - /*
> - * This looks good, but it's possible that the device loading in the
> - * main thread hasn't finished yet, and so we might not be in 'RUN'
> - * state yet; wait for the end of the main thread.
> - */
> - qemu_event_wait(&mis->main_thread_load_event);
> - }
> - postcopy_incoming_cleanup(mis);
> -
> - if (load_res < 0) {
> - /*
> - * If something went wrong then we have a bad state so exit;
> - * depending how far we got it might be possible at this point
> - * to leave the guest running and fire MCEs for pages that never
> - * arrived as a desperate recovery step.
> - */
> - rcu_unregister_thread();
> - exit(EXIT_FAILURE);
> - }
> + /*
> + * This looks good, but it's possible that the device loading in the
> + * main thread hasn't finished yet, and so we might not be in 'RUN'
> + * state yet; wait for the end of the main thread.
> + */
> + qemu_event_wait(&mis->main_thread_load_event);
>
> migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> MIGRATION_STATUS_COMPLETED);
> - /*
> - * If everything has worked fine, then the main thread has waited
> - * for us to start, and we're the last use of the mis.
> - * (If something broke then qemu will have to exit anyway since it's
> - * got a bad migration state).
> - */
> - bql_lock();
> - migration_incoming_state_destroy();
> - bql_unlock();
>
> +out:
> rcu_unregister_thread();
> mis->have_listen_thread = false;
> postcopy_state_set(POSTCOPY_INCOMING_END);
>
> + migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> +
> object_unref(OBJECT(migr));
>
> return NULL;
> diff --git a/migration/trace-events b/migration/trace-events
> index e8edd1fbba..772636f3ac 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> source_return_path_thread_switchover_acked(void) ""
> migration_thread_low_pending(uint64_t pending) "%" PRIu64
> migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> +process_incoming_migration_co_end(int ret) "ret=%d"
> process_incoming_migration_co_postcopy_end_main(void) ""
> postcopy_preempt_enabled(bool value) "%d"
> migration_precopy_complete(void) ""
> --
> 2.51.0
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming
2025-10-30 21:49 ` [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
@ 2025-10-30 22:49 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-10-30 22:49 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Thu, Oct 30, 2025 at 10:49:09PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> When exit-on-error was added to migration, it wasn't added to postcopy.
> Even though postcopy migration will usually pause rather than fail, in
> cases where it does fail unrecoverably before the destination side has
> been started, exit-on-error allows management to query the error.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
> migration/postcopy-ram.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 48cbb46c27..91431f02a4 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2080,11 +2080,16 @@ bool postcopy_is_paused(MigrationStatus status)
>
> static void postcopy_listen_thread_bh(void *opaque)
> {
> + MigrationState *s = migrate_get_current();
> MigrationIncomingState *mis = migration_incoming_get_current();
>
> migration_incoming_state_destroy();
>
> - if (mis->state == MIGRATION_STATUS_FAILED) {
> + if (mis->state == MIGRATION_STATUS_FAILED && mis->exit_on_error) {
> + WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> + error_report_err(s->error);
> + s->error = NULL;
> + }
> /*
> * If something went wrong then we have a bad state so exit;
> * we only could have gotten here if something failed before
> --
> 2.51.0
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 6/7] migration: Make postcopy listen thread joinable
2025-10-30 21:49 ` [PATCH v3 6/7] migration: Make postcopy listen thread joinable Juraj Marcin
@ 2025-10-30 22:49 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-10-30 22:49 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Thu, Oct 30, 2025 at 10:49:10PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> This patch makes the listen thread joinable instead of detached, and joins
> it alongside other postcopy threads.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()
2025-10-30 22:49 ` Peter Xu
@ 2025-10-31 11:03 ` Juraj Marcin
2025-10-31 16:29 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Juraj Marcin @ 2025-10-31 11:03 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On 2025-10-30 18:49, Peter Xu wrote:
> On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> > From: Juraj Marcin <jmarcin@redhat.com>
> >
> > Currently, there are two functions that are responsible for calling the
> > cleanup of the incoming migration state. With successful precopy, it's
> > the incoming migration coroutine, and with successful postcopy it's the
> > postcopy listen thread. However, if postcopy fails during the device
> > load, both functions will try to do the cleanup.
> >
> > This patch refactors all cleanup that needs to be done on the incoming
> > side into a common function and defines a clear boundary of who is
> > responsible for the cleanup. The incoming migration coroutine is
> > responsible for calling the cleanup function, unless the listen thread
> > has been started, in which case the postcopy listen thread runs the
> > incoming migration cleanup in its BH.
> >
> > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> > ---
> > migration/migration.c | 44 +++++++++-------------------
> > migration/migration.h | 1 +
> > migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
> > migration/trace-events | 2 +-
> > 4 files changed, 49 insertions(+), 61 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 9a367f717e..637be71bfe 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> >
> > void migration_incoming_state_destroy(void)
> > {
> > - struct MigrationIncomingState *mis = migration_incoming_get_current();
> > + MigrationIncomingState *mis = migration_incoming_get_current();
> > + PostcopyState ps = postcopy_state_get();
> >
> > multifd_recv_cleanup();
> >
> > + if (ps != POSTCOPY_INCOMING_NONE) {
> > + postcopy_incoming_cleanup(mis);
> > + }
> > +
> > /*
> > * RAM state cleanup needs to happen after multifd cleanup, because
> > * multifd threads can use some of its states (receivedmap).
> > @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
> > {
> > MigrationState *s = migrate_get_current();
> > MigrationIncomingState *mis = migration_incoming_get_current();
> > - PostcopyState ps;
> > int ret;
> > Error *local_err = NULL;
> >
> > @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
> >
> > trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> >
> > - ps = postcopy_state_get();
> > - trace_process_incoming_migration_co_end(ret, ps);
> > - if (ps != POSTCOPY_INCOMING_NONE) {
> > - if (ps == POSTCOPY_INCOMING_ADVISE) {
> > - /*
> > - * Where a migration had postcopy enabled (and thus went to advise)
> > - * but managed to complete within the precopy period, we can use
> > - * the normal exit.
> > - */
> > - postcopy_incoming_cleanup(mis);
> > - } else if (ret >= 0) {
> > - /*
> > - * Postcopy was started, cleanup should happen at the end of the
> > - * postcopy thread.
> > - */
> > - trace_process_incoming_migration_co_postcopy_end_main();
> > - goto out;
> > - }
> > - /* Else if something went wrong then just fall out of the normal exit */
> > + trace_process_incoming_migration_co_end(ret);
> > + if (mis->have_listen_thread) {
> > + /*
> > + * Postcopy was started, cleanup should happen at the end of the
> > + * postcopy listen thread.
> > + */
> > + trace_process_incoming_migration_co_postcopy_end_main();
> > + goto out;
> > }
> >
> > if (ret < 0) {
> > @@ -933,15 +926,6 @@ fail:
> > }
> >
> > exit(EXIT_FAILURE);
> > - } else {
> > - /*
> > - * Report the error here in case that QEMU abruptly exits
> > - * when postcopy is enabled.
> > - */
> > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> > - error_report_err(s->error);
> > - s->error = NULL;
> > - }
>
> The patch looks all good by itself. Here's a pure question: is the old code
> wrong? If user sets exit_on_error=false, then this path seems to be
> releasing the error object, then query-migrate will see nothing?
I have tested this, and it is indeed what happens with the old code.
There is no "error-desc" in the response of "query-migrate" command,
just "status": "failed".
>
> > }
> > out:
> > /* Pairs with the refcount taken in qmp_migrate_incoming() */
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 01329bf824..4a37f7202c 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -254,6 +254,7 @@ struct MigrationIncomingState {
> > MigrationIncomingState *migration_incoming_get_current(void);
> > void migration_incoming_state_destroy(void);
> > void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> > +void migration_incoming_qemu_exit(void);
> > /*
> > * Functions to work with blocktime context
> > */
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index b47c955763..48cbb46c27 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
> > status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> > }
> >
> > +static void postcopy_listen_thread_bh(void *opaque)
> > +{
> > + MigrationIncomingState *mis = migration_incoming_get_current();
> > +
> > + migration_incoming_state_destroy();
> > +
> > + if (mis->state == MIGRATION_STATUS_FAILED) {
> > + /*
> > + * If something went wrong then we have a bad state so exit;
> > + * we only could have gotten here if something failed before
> > + * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> > + * postcopy migration would pause inside qemu_loadvm_state_main().
> > + * Failing dirty-bitmaps won't fail the whole migration.
> > + */
> > + exit(1);
> > + }
> > +}
> > +
> > /*
> > * Triggered by a postcopy_listen command; this thread takes over reading
> > * the input stream, leaving the main thread free to carry on loading the rest
> > @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
> > "bitmaps are correctly migrated and valid.",
> > __func__, load_res, error_get_pretty(local_err));
> > g_clear_pointer(&local_err, error_free);
> > - load_res = 0; /* prevent further exit() */
> > } else {
> > + /*
> > + * Something went fatally wrong and we have a bad state, QEMU will
> > + * exit depending on if postcopy-exit-on-error is true, but the
> > + * migration cannot be recovered.
> > + */
> > error_prepend(&local_err,
> > "loadvm failed during postcopy: %d: ", load_res);
> > migrate_set_error(migr, local_err);
> > g_clear_pointer(&local_err, error_report_err);
> > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > MIGRATION_STATUS_FAILED);
> > + goto out;
> > }
> > }
> > - if (load_res >= 0) {
> > - /*
> > - * This looks good, but it's possible that the device loading in the
> > - * main thread hasn't finished yet, and so we might not be in 'RUN'
> > - * state yet; wait for the end of the main thread.
> > - */
> > - qemu_event_wait(&mis->main_thread_load_event);
> > - }
> > - postcopy_incoming_cleanup(mis);
> > -
> > - if (load_res < 0) {
> > - /*
> > - * If something went wrong then we have a bad state so exit;
> > - * depending how far we got it might be possible at this point
> > - * to leave the guest running and fire MCEs for pages that never
> > - * arrived as a desperate recovery step.
> > - */
> > - rcu_unregister_thread();
> > - exit(EXIT_FAILURE);
> > - }
> > + /*
> > + * This looks good, but it's possible that the device loading in the
> > + * main thread hasn't finished yet, and so we might not be in 'RUN'
> > + * state yet; wait for the end of the main thread.
> > + */
> > + qemu_event_wait(&mis->main_thread_load_event);
> >
> > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > MIGRATION_STATUS_COMPLETED);
> > - /*
> > - * If everything has worked fine, then the main thread has waited
> > - * for us to start, and we're the last use of the mis.
> > - * (If something broke then qemu will have to exit anyway since it's
> > - * got a bad migration state).
> > - */
> > - bql_lock();
> > - migration_incoming_state_destroy();
> > - bql_unlock();
> >
> > +out:
> > rcu_unregister_thread();
> > mis->have_listen_thread = false;
> > postcopy_state_set(POSTCOPY_INCOMING_END);
> >
> > + migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> > +
> > object_unref(OBJECT(migr));
> >
> > return NULL;
> > diff --git a/migration/trace-events b/migration/trace-events
> > index e8edd1fbba..772636f3ac 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> > source_return_path_thread_switchover_acked(void) ""
> > migration_thread_low_pending(uint64_t pending) "%" PRIu64
> > migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> > -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > +process_incoming_migration_co_end(int ret) "ret=%d"
> > process_incoming_migration_co_postcopy_end_main(void) ""
> > postcopy_preempt_enabled(bool value) "%d"
> > migration_precopy_complete(void) ""
> > --
> > 2.51.0
> >
>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()
2025-10-31 11:03 ` Juraj Marcin
@ 2025-10-31 16:29 ` Peter Xu
2025-10-31 17:01 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-10-31 16:29 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas,
Arun Menon
On Fri, Oct 31, 2025 at 12:03:57PM +0100, Juraj Marcin wrote:
> On 2025-10-30 18:49, Peter Xu wrote:
> > On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> > > From: Juraj Marcin <jmarcin@redhat.com>
> > >
> > > Currently, there are two functions that are responsible for calling the
> > > cleanup of the incoming migration state. With successful precopy, it's
> > > the incoming migration coroutine, and with successful postcopy it's the
> > > postcopy listen thread. However, if postcopy fails during the device
> > > load, both functions will try to do the cleanup.
> > >
> > > This patch refactors all cleanup that needs to be done on the incoming
> > > side into a common function and defines a clear boundary of who is
> > > responsible for the cleanup. The incoming migration coroutine is
> > > responsible for calling the cleanup function, unless the listen thread
> > > has been started, in which case the postcopy listen thread runs the
> > > incoming migration cleanup in its BH.
> > >
> > > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> > > ---
> > > migration/migration.c | 44 +++++++++-------------------
> > > migration/migration.h | 1 +
> > > migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
> > > migration/trace-events | 2 +-
> > > 4 files changed, 49 insertions(+), 61 deletions(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 9a367f717e..637be71bfe 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> > >
> > > void migration_incoming_state_destroy(void)
> > > {
> > > - struct MigrationIncomingState *mis = migration_incoming_get_current();
> > > + MigrationIncomingState *mis = migration_incoming_get_current();
> > > + PostcopyState ps = postcopy_state_get();
> > >
> > > multifd_recv_cleanup();
> > >
> > > + if (ps != POSTCOPY_INCOMING_NONE) {
> > > + postcopy_incoming_cleanup(mis);
> > > + }
> > > +
> > > /*
> > > * RAM state cleanup needs to happen after multifd cleanup, because
> > > * multifd threads can use some of its states (receivedmap).
> > > @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
> > > {
> > > MigrationState *s = migrate_get_current();
> > > MigrationIncomingState *mis = migration_incoming_get_current();
> > > - PostcopyState ps;
> > > int ret;
> > > Error *local_err = NULL;
> > >
> > > @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
> > >
> > > trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> > >
> > > - ps = postcopy_state_get();
> > > - trace_process_incoming_migration_co_end(ret, ps);
> > > - if (ps != POSTCOPY_INCOMING_NONE) {
> > > - if (ps == POSTCOPY_INCOMING_ADVISE) {
> > > - /*
> > > - * Where a migration had postcopy enabled (and thus went to advise)
> > > - * but managed to complete within the precopy period, we can use
> > > - * the normal exit.
> > > - */
> > > - postcopy_incoming_cleanup(mis);
> > > - } else if (ret >= 0) {
> > > - /*
> > > - * Postcopy was started, cleanup should happen at the end of the
> > > - * postcopy thread.
> > > - */
> > > - trace_process_incoming_migration_co_postcopy_end_main();
> > > - goto out;
> > > - }
> > > - /* Else if something went wrong then just fall out of the normal exit */
> > > + trace_process_incoming_migration_co_end(ret);
> > > + if (mis->have_listen_thread) {
> > > + /*
> > > + * Postcopy was started, cleanup should happen at the end of the
> > > + * postcopy listen thread.
> > > + */
> > > + trace_process_incoming_migration_co_postcopy_end_main();
> > > + goto out;
> > > }
> > >
> > > if (ret < 0) {
> > > @@ -933,15 +926,6 @@ fail:
> > > }
> > >
> > > exit(EXIT_FAILURE);
> > > - } else {
> > > - /*
> > > - * Report the error here in case that QEMU abruptly exits
> > > - * when postcopy is enabled.
> > > - */
> > > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> > > - error_report_err(s->error);
> > > - s->error = NULL;
> > > - }
> >
> > The patch looks all good by itself. Here's a pure question: is the old code
> > wrong? If user sets exit_on_error=false, then this path seems to be
> > releasing the error object, then query-migrate will see nothing?
>
> I have tested this, and it is indeed what happens with the old code.
> There is no "error-desc" in the response of "query-migrate" command,
> just "status": "failed".
Looks like it was a regression.
In the meantime, I feel like it's indeed fine if we do not dump the error
when exit-on-error=false, like what you did here.
So I'll attach this:
Fixes: 9535435795 ("migration: push Error **errp into qemu_loadvm_state()")
And as the current patch looks all correct to me:
Reviewed-by: Peter Xu <peterx@redhat.com>
I've queued the series, thanks!
I'll post a small patch shortly to test the error-desc for
exit-on-error=false.
+Arun
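Roughly what I have in mind, at the QMP level (illustrative only, assuming
the destination was started with -incoming defer; presumably the real test
will go through the migration qtest helpers):

    -> { "execute": "migrate-incoming",
         "arguments": { "uri": "tcp:0:4444", "exit-on-error": false } }
    <- { "return": {} }
    ... force a failure during device load ...
    -> { "execute": "query-migrate" }
    <- { "return": { "status": "failed", "error-desc": "..." } }

i.e. with exit-on-error=false the destination should stay alive and keep the
error queryable.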
>
> >
> > > }
> > > out:
> > > /* Pairs with the refcount taken in qmp_migrate_incoming() */
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 01329bf824..4a37f7202c 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -254,6 +254,7 @@ struct MigrationIncomingState {
> > > MigrationIncomingState *migration_incoming_get_current(void);
> > > void migration_incoming_state_destroy(void);
> > > void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> > > +void migration_incoming_qemu_exit(void);
> > > /*
> > > * Functions to work with blocktime context
> > > */
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index b47c955763..48cbb46c27 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
> > > status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> > > }
> > >
> > > +static void postcopy_listen_thread_bh(void *opaque)
> > > +{
> > > + MigrationIncomingState *mis = migration_incoming_get_current();
> > > +
> > > + migration_incoming_state_destroy();
> > > +
> > > + if (mis->state == MIGRATION_STATUS_FAILED) {
> > > + /*
> > > + * If something went wrong then we have a bad state so exit;
> > > + * we only could have gotten here if something failed before
> > > + * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> > > + * postcopy migration would pause inside qemu_loadvm_state_main().
> > > + * Failing dirty-bitmaps won't fail the whole migration.
> > > + */
> > > + exit(1);
> > > + }
> > > +}
> > > +
> > > /*
> > > * Triggered by a postcopy_listen command; this thread takes over reading
> > > * the input stream, leaving the main thread free to carry on loading the rest
> > > @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
> > > "bitmaps are correctly migrated and valid.",
> > > __func__, load_res, error_get_pretty(local_err));
> > > g_clear_pointer(&local_err, error_free);
> > > - load_res = 0; /* prevent further exit() */
> > > } else {
> > > + /*
> > > + * Something went fatally wrong and we have a bad state, QEMU will
> > > + * exit depending on if postcopy-exit-on-error is true, but the
> > > + * migration cannot be recovered.
> > > + */
> > > error_prepend(&local_err,
> > > "loadvm failed during postcopy: %d: ", load_res);
> > > migrate_set_error(migr, local_err);
> > > g_clear_pointer(&local_err, error_report_err);
> > > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > MIGRATION_STATUS_FAILED);
> > > + goto out;
> > > }
> > > }
> > > - if (load_res >= 0) {
> > > - /*
> > > - * This looks good, but it's possible that the device loading in the
> > > - * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > - * state yet; wait for the end of the main thread.
> > > - */
> > > - qemu_event_wait(&mis->main_thread_load_event);
> > > - }
> > > - postcopy_incoming_cleanup(mis);
> > > -
> > > - if (load_res < 0) {
> > > - /*
> > > - * If something went wrong then we have a bad state so exit;
> > > - * depending how far we got it might be possible at this point
> > > - * to leave the guest running and fire MCEs for pages that never
> > > - * arrived as a desperate recovery step.
> > > - */
> > > - rcu_unregister_thread();
> > > - exit(EXIT_FAILURE);
> > > - }
> > > + /*
> > > + * This looks good, but it's possible that the device loading in the
> > > + * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > + * state yet; wait for the end of the main thread.
> > > + */
> > > + qemu_event_wait(&mis->main_thread_load_event);
> > >
> > > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > MIGRATION_STATUS_COMPLETED);
> > > - /*
> > > - * If everything has worked fine, then the main thread has waited
> > > - * for us to start, and we're the last use of the mis.
> > > - * (If something broke then qemu will have to exit anyway since it's
> > > - * got a bad migration state).
> > > - */
> > > - bql_lock();
> > > - migration_incoming_state_destroy();
> > > - bql_unlock();
> > >
> > > +out:
> > > rcu_unregister_thread();
> > > mis->have_listen_thread = false;
> > > postcopy_state_set(POSTCOPY_INCOMING_END);
> > >
> > > + migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> > > +
> > > object_unref(OBJECT(migr));
> > >
> > > return NULL;
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index e8edd1fbba..772636f3ac 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> > > source_return_path_thread_switchover_acked(void) ""
> > > migration_thread_low_pending(uint64_t pending) "%" PRIu64
> > > migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> > > -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > > +process_incoming_migration_co_end(int ret) "ret=%d"
> > > process_incoming_migration_co_postcopy_end_main(void) ""
> > > postcopy_preempt_enabled(bool value) "%d"
> > > migration_precopy_complete(void) ""
> > > --
> > > 2.51.0
> > >
> >
> > --
> > Peter Xu
> >
>
--
Peter Xu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()
2025-10-31 16:29 ` Peter Xu
@ 2025-10-31 17:01 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-10-31 17:01 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas,
Arun Menon
On Fri, Oct 31, 2025 at 12:29:28PM -0400, Peter Xu wrote:
> I've queued the series, thanks!
Ahh, this series seems to break iotests 194:
https://gitlab.com/peterx/qemu/-/jobs/11916678476
Juraj, please have a look. If the fix is small, feel free to send a fixup
on top of the patch.
IIUC, we at least need the change below, but it may not be enough:
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index e114c0b269..f6b34814a2 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -76,7 +76,7 @@ with iotests.FilePath('source.img') as source_img_path, \
while True:
event1 = source_vm.event_wait('MIGRATION')
- if event1['data']['status'] == 'postcopy-active':
+ if event1['data']['status'] in ('postcopy-active', 'postcopy-device'):
# This event is racy, it depends do we really do postcopy or bitmap
# was migrated during downtime (and no data to migrate in postcopy
# phase). So, don't log it.
--
Peter Xu
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread [~2025-10-31 17:03 UTC | newest]
Thread overview: 16+ messages
2025-10-30 21:49 [PATCH v3 0/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 1/7] migration: Do not try to start VM if disk activation fails Juraj Marcin
2025-10-30 21:49 ` [PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
2025-10-30 22:28 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions Juraj Marcin
2025-10-30 22:35 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy() Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-31 11:03 ` Juraj Marcin
2025-10-31 16:29 ` Peter Xu
2025-10-31 17:01 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 6/7] migration: Make postcopy listen thread joinable Juraj Marcin
2025-10-30 22:49 ` Peter Xu
2025-10-30 21:49 ` [PATCH v3 7/7] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin