* [PATCH v2 1/4] migration: Do not try to start VM if disk activation fails
2025-10-27 15:41 [PATCH v2 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-10-27 15:41 ` Juraj Marcin
2025-10-27 15:41 ` [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Juraj Marcin @ 2025-10-27 15:41 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Peter Xu <peterx@redhat.com>
If a rare split brain happens (e.g. dest QEMU started running somehow,
taking shared drive locks), src QEMU may not be able to activate the
drives anymore. In this case, src QEMU shouldn't start the VM or it might
crash the block layer later with something like:
Meanwhile, src QEMU cannot try to continue either, even if dest QEMU could
release the drive locks (e.g. via QMP "stop"): as long as dest QEMU has
started running, dest QEMU's RAM is the only version that is consistent
with the current state of the shared storage.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index a63b46bbef..e9acd0f63b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3551,6 +3551,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
static void migration_iteration_finish(MigrationState *s)
{
+ Error *local_err = NULL;
+
bql_lock();
/*
@@ -3574,11 +3576,28 @@ static void migration_iteration_finish(MigrationState *s)
case MIGRATION_STATUS_FAILED:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_CANCELLING:
- /*
- * Re-activate the block drives if they're inactivated. Note, COLO
- * shouldn't use block_active at all, so it should be no-op there.
- */
- migration_block_activate(NULL);
+ if (!migration_block_activate(&local_err)) {
+ /*
+ * Re-activate the block drives if they're inactivated.
+ *
+ * If it fails (e.g. in case of a split brain, where dest QEMU
+ * might have taken some of the drive locks and running!), do
+ * not start VM, instead wait for mgmt to decide the next step.
+ *
+ * If dest already started, it means dest QEMU should contain
+ * all the data it needs and it properly owns all the drive
+ * locks. Then even if src QEMU got a FAILED in migration, it
+ * normally should mean we should treat the migration as
+ * COMPLETED.
+ *
+ * NOTE: it's not safe anymore to start VM on src now even if
+ * dest would release the drive locks. It's because as long as
+ * dest started running then only dest QEMU's RAM is consistent
+ * with the shared storage.
+ */
+ error_free(local_err);
+ break;
+ }
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
vm_start();
--
2.51.0
* [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-10-27 15:41 [PATCH v2 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-10-27 15:41 ` [PATCH v2 1/4] migration: Do not try to start VM if disk activation fails Juraj Marcin
@ 2025-10-27 15:41 ` Juraj Marcin
2025-10-29 19:53 ` Peter Xu
2025-10-27 15:41 ` [PATCH v2 3/4] migration: Refactor all incoming cleanup into migration_incoming_cleanup() Juraj Marcin
2025-10-27 15:41 ` [PATCH v2 4/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
3 siblings, 1 reply; 9+ messages in thread
From: Juraj Marcin @ 2025-10-27 15:41 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
This patch addresses a TODO about moving postcopy_ram_listen_thread() to
the postcopy file. Furthermore, it adds a pair of functions,
postcopy_incoming_setup() and postcopy_incoming_cleanup(), which set up
and clean up the postcopy_ram_incoming state and the listen thread.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration.c | 2 +-
migration/postcopy-ram.c | 144 +++++++++++++++++++++++++++++++++++++++
migration/postcopy-ram.h | 3 +
migration/savevm.c | 131 +----------------------------------
4 files changed, 150 insertions(+), 130 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index e9acd0f63b..8827884a15 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque)
* but managed to complete within the precopy period, we can use
* the normal exit.
*/
- postcopy_ram_incoming_cleanup(mis);
+ postcopy_incoming_cleanup(mis);
} else if (ret >= 0) {
/*
* Postcopy was started, cleanup should happen at the end of the
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5471efb4f0..dbbb2dfb78 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2077,3 +2077,147 @@ bool postcopy_is_paused(MigrationStatus status)
return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
}
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ */
+static void *postcopy_listen_thread(void *opaque)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ QEMUFile *f = mis->from_src_file;
+ int load_res;
+ MigrationState *migr = migrate_get_current();
+ Error *local_err = NULL;
+
+ object_ref(OBJECT(migr));
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ qemu_event_set(&mis->thread_sync_event);
+ trace_postcopy_ram_listen_thread_start();
+
+ rcu_register_thread();
+ /*
+ * Because we're a thread and not a coroutine we can't yield
+ * in qemu_file, and thus we must be blocking now.
+ */
+ qemu_file_set_blocking(f, true, &error_fatal);
+
+ /* TODO: sanity check that only postcopiable data will be loaded here */
+ load_res = qemu_loadvm_state_main(f, mis, &local_err);
+
+ /*
+ * This is tricky, but, mis->from_src_file can change after it
+ * returns, when postcopy recovery happened. In the future, we may
+ * want a wrapper for the QEMUFile handle.
+ */
+ f = mis->from_src_file;
+
+ /* And non-blocking again so we don't block in any cleanup */
+ qemu_file_set_blocking(f, false, &error_fatal);
+
+ trace_postcopy_ram_listen_thread_exit();
+ if (load_res < 0) {
+ qemu_file_set_error(f, load_res);
+ dirty_bitmap_mig_cancel_incoming();
+ if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+ !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+ {
+ error_report("%s: loadvm failed during postcopy: %d. All states "
+ "are migrated except dirty bitmaps. Some dirty "
+ "bitmaps may be lost, and present migrated dirty "
+ "bitmaps are correctly migrated and valid.",
+ __func__, load_res);
+ load_res = 0; /* prevent further exit() */
+ } else {
+ error_prepend(&local_err,
+ "loadvm failed during postcopy: %d: ", load_res);
+ migrate_set_error(migr, local_err);
+ error_report_err(local_err);
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ }
+ }
+ if (load_res >= 0) {
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
+ }
+ postcopy_incoming_cleanup(mis);
+
+ if (load_res < 0) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * depending how far we got it might be possible at this point
+ * to leave the guest running and fire MCEs for pages that never
+ * arrived as a desperate recovery step.
+ */
+ rcu_unregister_thread();
+ exit(EXIT_FAILURE);
+ }
+
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_COMPLETED);
+ /*
+ * If everything has worked fine, then the main thread has waited
+ * for us to start, and we're the last use of the mis.
+ * (If something broke then qemu will have to exit anyway since it's
+ * got a bad migration state).
+ */
+ bql_lock();
+ migration_incoming_state_destroy();
+ bql_unlock();
+
+ rcu_unregister_thread();
+ mis->have_listen_thread = false;
+ postcopy_state_set(POSTCOPY_INCOMING_END);
+
+ object_unref(OBJECT(migr));
+
+ return NULL;
+}
+
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
+{
+ /*
+ * Sensitise RAM - can now generate requests for blocks that don't exist
+ * However, at this point the CPU shouldn't be running, and the IO
+ * shouldn't be doing anything yet so don't actually expect requests
+ */
+ if (migrate_postcopy_ram()) {
+ if (postcopy_ram_incoming_setup(mis)) {
+ postcopy_ram_incoming_cleanup(mis);
+ error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
+ return -1;
+ }
+ }
+
+ trace_loadvm_postcopy_handle_listen("after uffd");
+
+ if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
+ return -1;
+ }
+
+ mis->have_listen_thread = true;
+ postcopy_thread_create(mis, &mis->listen_thread,
+ MIGRATION_THREAD_DST_LISTEN,
+ postcopy_listen_thread, QEMU_THREAD_DETACHED);
+
+ return 0;
+}
+
+int postcopy_incoming_cleanup(MigrationIncomingState *mis)
+{
+ int rc = 0;
+
+ if (migrate_postcopy_ram()) {
+ rc = postcopy_ram_incoming_cleanup(mis);
+ }
+
+ return rc;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ca19433b24..a080dd65a7 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,4 +199,7 @@ bool postcopy_is_paused(MigrationStatus status);
void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
RAMBlock *rb);
+int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
+int postcopy_incoming_cleanup(MigrationIncomingState *mis);
+
#endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b35ec4dd0..96a2699ca7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2087,112 +2087,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
return 0;
}
-/*
- * Triggered by a postcopy_listen command; this thread takes over reading
- * the input stream, leaving the main thread free to carry on loading the rest
- * of the device state (from RAM).
- * (TODO:This could do with being in a postcopy file - but there again it's
- * just another input loop, not that postcopy specific)
- */
-static void *postcopy_ram_listen_thread(void *opaque)
-{
- MigrationIncomingState *mis = migration_incoming_get_current();
- QEMUFile *f = mis->from_src_file;
- int load_res;
- MigrationState *migr = migrate_get_current();
- Error *local_err = NULL;
-
- object_ref(OBJECT(migr));
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
- qemu_event_set(&mis->thread_sync_event);
- trace_postcopy_ram_listen_thread_start();
-
- rcu_register_thread();
- /*
- * Because we're a thread and not a coroutine we can't yield
- * in qemu_file, and thus we must be blocking now.
- */
- qemu_file_set_blocking(f, true, &error_fatal);
-
- /* TODO: sanity check that only postcopiable data will be loaded here */
- load_res = qemu_loadvm_state_main(f, mis, &local_err);
-
- /*
- * This is tricky, but, mis->from_src_file can change after it
- * returns, when postcopy recovery happened. In the future, we may
- * want a wrapper for the QEMUFile handle.
- */
- f = mis->from_src_file;
-
- /* And non-blocking again so we don't block in any cleanup */
- qemu_file_set_blocking(f, false, &error_fatal);
-
- trace_postcopy_ram_listen_thread_exit();
- if (load_res < 0) {
- qemu_file_set_error(f, load_res);
- dirty_bitmap_mig_cancel_incoming();
- if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
- !migrate_postcopy_ram() && migrate_dirty_bitmaps())
- {
- error_report("%s: loadvm failed during postcopy: %d. All states "
- "are migrated except dirty bitmaps. Some dirty "
- "bitmaps may be lost, and present migrated dirty "
- "bitmaps are correctly migrated and valid.",
- __func__, load_res);
- load_res = 0; /* prevent further exit() */
- } else {
- error_prepend(&local_err,
- "loadvm failed during postcopy: %d: ", load_res);
- migrate_set_error(migr, local_err);
- error_report_err(local_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_FAILED);
- }
- }
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_ram_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
-
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
-
- rcu_unregister_thread();
- mis->have_listen_thread = false;
- postcopy_state_set(POSTCOPY_INCOMING_END);
-
- object_unref(OBJECT(migr));
-
- return NULL;
-}
-
/* After this message we must be able to immediately receive postcopy data */
static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
Error **errp)
@@ -2218,32 +2112,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
trace_loadvm_postcopy_handle_listen("after discard");
- /*
- * Sensitise RAM - can now generate requests for blocks that don't exist
- * However, at this point the CPU shouldn't be running, and the IO
- * shouldn't be doing anything yet so don't actually expect requests
- */
- if (migrate_postcopy_ram()) {
- if (postcopy_ram_incoming_setup(mis)) {
- postcopy_ram_incoming_cleanup(mis);
- error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
- return -1;
- }
- }
+ int rc = postcopy_incoming_setup(mis, errp);
- trace_loadvm_postcopy_handle_listen("after uffd");
-
- if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
- return -1;
- }
-
- mis->have_listen_thread = true;
- postcopy_thread_create(mis, &mis->listen_thread,
- MIGRATION_THREAD_DST_LISTEN,
- postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
trace_loadvm_postcopy_handle_listen("return");
- return 0;
+ return rc;
}
static void loadvm_postcopy_handle_run_bh(void *opaque)
--
2.51.0
* Re: [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-10-27 15:41 ` [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
@ 2025-10-29 19:53 ` Peter Xu
2025-10-30 13:08 ` Juraj Marcin
0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2025-10-29 19:53 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Mon, Oct 27, 2025 at 04:41:09PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> This patch addresses a TODO about moving postcopy_ram_listen_thread() to
> postcopy file. Furthermore, this patch adds a pair of functions,
> postcopy_incoming_setup() and postcopy_incoming_cleanup(), which sets up
> and cleans the postcopy_ram_incoming state and the listen thread.
It would be great to separate pure code movement from functional changes.
Meanwhile, this patch won't apply cleanly on top of the staging branch that
I keep. It would be great if you could rebase this series onto that branch
when you repost:
https://gitlab.com/peterx/qemu/-/commits/staging
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
> migration/migration.c | 2 +-
> migration/postcopy-ram.c | 144 +++++++++++++++++++++++++++++++++++++++
> migration/postcopy-ram.h | 3 +
> migration/savevm.c | 131 +----------------------------------
> 4 files changed, 150 insertions(+), 130 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index e9acd0f63b..8827884a15 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque)
> * but managed to complete within the precopy period, we can use
> * the normal exit.
> */
> - postcopy_ram_incoming_cleanup(mis);
> + postcopy_incoming_cleanup(mis);
> } else if (ret >= 0) {
> /*
> * Postcopy was started, cleanup should happen at the end of the
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 5471efb4f0..dbbb2dfb78 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2077,3 +2077,147 @@ bool postcopy_is_paused(MigrationStatus status)
> return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
> status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> }
> +
> +/*
> + * Triggered by a postcopy_listen command; this thread takes over reading
> + * the input stream, leaving the main thread free to carry on loading the rest
> + * of the device state (from RAM).
> + */
> +static void *postcopy_listen_thread(void *opaque)
> +{
> + MigrationIncomingState *mis = migration_incoming_get_current();
> + QEMUFile *f = mis->from_src_file;
> + int load_res;
> + MigrationState *migr = migrate_get_current();
> + Error *local_err = NULL;
> +
> + object_ref(OBJECT(migr));
> +
> + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> + qemu_event_set(&mis->thread_sync_event);
> + trace_postcopy_ram_listen_thread_start();
> +
> + rcu_register_thread();
> + /*
> + * Because we're a thread and not a coroutine we can't yield
> + * in qemu_file, and thus we must be blocking now.
> + */
> + qemu_file_set_blocking(f, true, &error_fatal);
> +
> + /* TODO: sanity check that only postcopiable data will be loaded here */
> + load_res = qemu_loadvm_state_main(f, mis, &local_err);
> +
> + /*
> + * This is tricky, but, mis->from_src_file can change after it
> + * returns, when postcopy recovery happened. In the future, we may
> + * want a wrapper for the QEMUFile handle.
> + */
> + f = mis->from_src_file;
> +
> + /* And non-blocking again so we don't block in any cleanup */
> + qemu_file_set_blocking(f, false, &error_fatal);
> +
> + trace_postcopy_ram_listen_thread_exit();
> + if (load_res < 0) {
> + qemu_file_set_error(f, load_res);
> + dirty_bitmap_mig_cancel_incoming();
> + if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> + !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> + {
> + error_report("%s: loadvm failed during postcopy: %d. All states "
> + "are migrated except dirty bitmaps. Some dirty "
> + "bitmaps may be lost, and present migrated dirty "
> + "bitmaps are correctly migrated and valid.",
> + __func__, load_res);
> + load_res = 0; /* prevent further exit() */
> + } else {
> + error_prepend(&local_err,
> + "loadvm failed during postcopy: %d: ", load_res);
> + migrate_set_error(migr, local_err);
> + error_report_err(local_err);
> + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> + MIGRATION_STATUS_FAILED);
> + }
> + }
> + if (load_res >= 0) {
> + /*
> + * This looks good, but it's possible that the device loading in the
> + * main thread hasn't finished yet, and so we might not be in 'RUN'
> + * state yet; wait for the end of the main thread.
> + */
> + qemu_event_wait(&mis->main_thread_load_event);
> + }
> + postcopy_incoming_cleanup(mis);
Here I did notice that this replaced the old
postcopy_ram_incoming_cleanup(). I'm just curious: is it necessary to check
postcopy-ram=on once more?
The two callers of postcopy_incoming_cleanup() should always have
postcopy-ram enabled, right?
> +
> + if (load_res < 0) {
> + /*
> + * If something went wrong then we have a bad state so exit;
> + * depending how far we got it might be possible at this point
> + * to leave the guest running and fire MCEs for pages that never
> + * arrived as a desperate recovery step.
> + */
> + rcu_unregister_thread();
> + exit(EXIT_FAILURE);
> + }
> +
> + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> + MIGRATION_STATUS_COMPLETED);
> + /*
> + * If everything has worked fine, then the main thread has waited
> + * for us to start, and we're the last use of the mis.
> + * (If something broke then qemu will have to exit anyway since it's
> + * got a bad migration state).
> + */
> + bql_lock();
> + migration_incoming_state_destroy();
> + bql_unlock();
> +
> + rcu_unregister_thread();
> + mis->have_listen_thread = false;
> + postcopy_state_set(POSTCOPY_INCOMING_END);
> +
> + object_unref(OBJECT(migr));
> +
> + return NULL;
> +}
> +
> +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
> +{
> + /*
> + * Sensitise RAM - can now generate requests for blocks that don't exist
> + * However, at this point the CPU shouldn't be running, and the IO
> + * shouldn't be doing anything yet so don't actually expect requests
> + */
> + if (migrate_postcopy_ram()) {
> + if (postcopy_ram_incoming_setup(mis)) {
> + postcopy_ram_incoming_cleanup(mis);
> + error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
> + return -1;
> + }
> + }
> +
> + trace_loadvm_postcopy_handle_listen("after uffd");
> +
> + if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
> + return -1;
> + }
> +
> + mis->have_listen_thread = true;
> + postcopy_thread_create(mis, &mis->listen_thread,
> + MIGRATION_THREAD_DST_LISTEN,
> + postcopy_listen_thread, QEMU_THREAD_DETACHED);
> +
> + return 0;
> +}
> +
> +int postcopy_incoming_cleanup(MigrationIncomingState *mis)
> +{
> + int rc = 0;
> +
> + if (migrate_postcopy_ram()) {
> + rc = postcopy_ram_incoming_cleanup(mis);
> + }
> +
> + return rc;
> +}
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index ca19433b24..a080dd65a7 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -199,4 +199,7 @@ bool postcopy_is_paused(MigrationStatus status);
> void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
> RAMBlock *rb);
>
> +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
> +int postcopy_incoming_cleanup(MigrationIncomingState *mis);
> +
> #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7b35ec4dd0..96a2699ca7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2087,112 +2087,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> return 0;
> }
>
> -/*
> - * Triggered by a postcopy_listen command; this thread takes over reading
> - * the input stream, leaving the main thread free to carry on loading the rest
> - * of the device state (from RAM).
> - * (TODO:This could do with being in a postcopy file - but there again it's
> - * just another input loop, not that postcopy specific)
> - */
> -static void *postcopy_ram_listen_thread(void *opaque)
> -{
> - MigrationIncomingState *mis = migration_incoming_get_current();
> - QEMUFile *f = mis->from_src_file;
> - int load_res;
> - MigrationState *migr = migrate_get_current();
> - Error *local_err = NULL;
> -
> - object_ref(OBJECT(migr));
> -
> - migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> - qemu_event_set(&mis->thread_sync_event);
> - trace_postcopy_ram_listen_thread_start();
> -
> - rcu_register_thread();
> - /*
> - * Because we're a thread and not a coroutine we can't yield
> - * in qemu_file, and thus we must be blocking now.
> - */
> - qemu_file_set_blocking(f, true, &error_fatal);
> -
> - /* TODO: sanity check that only postcopiable data will be loaded here */
> - load_res = qemu_loadvm_state_main(f, mis, &local_err);
> -
> - /*
> - * This is tricky, but, mis->from_src_file can change after it
> - * returns, when postcopy recovery happened. In the future, we may
> - * want a wrapper for the QEMUFile handle.
> - */
> - f = mis->from_src_file;
> -
> - /* And non-blocking again so we don't block in any cleanup */
> - qemu_file_set_blocking(f, false, &error_fatal);
> -
> - trace_postcopy_ram_listen_thread_exit();
> - if (load_res < 0) {
> - qemu_file_set_error(f, load_res);
> - dirty_bitmap_mig_cancel_incoming();
> - if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> - !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> - {
> - error_report("%s: loadvm failed during postcopy: %d. All states "
> - "are migrated except dirty bitmaps. Some dirty "
> - "bitmaps may be lost, and present migrated dirty "
> - "bitmaps are correctly migrated and valid.",
> - __func__, load_res);
> - load_res = 0; /* prevent further exit() */
> - } else {
> - error_prepend(&local_err,
> - "loadvm failed during postcopy: %d: ", load_res);
> - migrate_set_error(migr, local_err);
> - error_report_err(local_err);
> - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> - MIGRATION_STATUS_FAILED);
> - }
> - }
> - if (load_res >= 0) {
> - /*
> - * This looks good, but it's possible that the device loading in the
> - * main thread hasn't finished yet, and so we might not be in 'RUN'
> - * state yet; wait for the end of the main thread.
> - */
> - qemu_event_wait(&mis->main_thread_load_event);
> - }
> - postcopy_ram_incoming_cleanup(mis);
> -
> - if (load_res < 0) {
> - /*
> - * If something went wrong then we have a bad state so exit;
> - * depending how far we got it might be possible at this point
> - * to leave the guest running and fire MCEs for pages that never
> - * arrived as a desperate recovery step.
> - */
> - rcu_unregister_thread();
> - exit(EXIT_FAILURE);
> - }
> -
> - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> - MIGRATION_STATUS_COMPLETED);
> - /*
> - * If everything has worked fine, then the main thread has waited
> - * for us to start, and we're the last use of the mis.
> - * (If something broke then qemu will have to exit anyway since it's
> - * got a bad migration state).
> - */
> - bql_lock();
> - migration_incoming_state_destroy();
> - bql_unlock();
> -
> - rcu_unregister_thread();
> - mis->have_listen_thread = false;
> - postcopy_state_set(POSTCOPY_INCOMING_END);
> -
> - object_unref(OBJECT(migr));
> -
> - return NULL;
> -}
> -
> /* After this message we must be able to immediately receive postcopy data */
> static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
> Error **errp)
> @@ -2218,32 +2112,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
>
> trace_loadvm_postcopy_handle_listen("after discard");
>
> - /*
> - * Sensitise RAM - can now generate requests for blocks that don't exist
> - * However, at this point the CPU shouldn't be running, and the IO
> - * shouldn't be doing anything yet so don't actually expect requests
> - */
> - if (migrate_postcopy_ram()) {
> - if (postcopy_ram_incoming_setup(mis)) {
> - postcopy_ram_incoming_cleanup(mis);
> - error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
> - return -1;
> - }
> - }
> + int rc = postcopy_incoming_setup(mis, errp);
>
> - trace_loadvm_postcopy_handle_listen("after uffd");
> -
> - if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
> - return -1;
> - }
> -
> - mis->have_listen_thread = true;
> - postcopy_thread_create(mis, &mis->listen_thread,
> - MIGRATION_THREAD_DST_LISTEN,
> - postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
> trace_loadvm_postcopy_handle_listen("return");
>
> - return 0;
> + return rc;
> }
>
> static void loadvm_postcopy_handle_run_bh(void *opaque)
> --
> 2.51.0
>
--
Peter Xu
* Re: [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
2025-10-29 19:53 ` Peter Xu
@ 2025-10-30 13:08 ` Juraj Marcin
0 siblings, 0 replies; 9+ messages in thread
From: Juraj Marcin @ 2025-10-30 13:08 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On 2025-10-29 15:53, Peter Xu wrote:
> On Mon, Oct 27, 2025 at 04:41:09PM +0100, Juraj Marcin wrote:
> > From: Juraj Marcin <jmarcin@redhat.com>
> >
> > This patch addresses a TODO about moving postcopy_ram_listen_thread() to
> > postcopy file. Furthermore, this patch adds a pair of functions,
> > postcopy_incoming_setup() and postcopy_incoming_cleanup(), which sets up
> > and cleans the postcopy_ram_incoming state and the listen thread.
>
> It would be great to separate code movements and changes.
I wanted to avoid exposing the postcopy listen thread function in a header
file, hence the postcopy_incoming_setup() function; adding
postcopy_incoming_cleanup() alongside it then seemed natural to me.
However, I could split it like this:
1. Move postcopy_ram_listen_thread() to postcopy-ram.c and add a simple
wrapper for postcopy_thread_create() (something like
postcopy_ram_listen_thread_create).
2. Rename postcopy_ram_listen_thread_create to postcopy_incoming_setup,
move the rest of loadvm_postcopy_handle_listen that this patch moves, and
lastly introduce postcopy_incoming_cleanup().
>
> Meanwhile, this patch won't apply cleanly on top of the staging branch that
> I kept.. it'll be great if you could rebase this series to the branch when
> repost:
>
> https://gitlab.com/peterx/qemu/-/commits/staging
I'll rebase.
>
> >
> > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> > ---
> > migration/migration.c | 2 +-
> > migration/postcopy-ram.c | 144 +++++++++++++++++++++++++++++++++++++++
> > migration/postcopy-ram.h | 3 +
> > migration/savevm.c | 131 +----------------------------------
> > 4 files changed, 150 insertions(+), 130 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index e9acd0f63b..8827884a15 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque)
> > * but managed to complete within the precopy period, we can use
> > * the normal exit.
> > */
> > - postcopy_ram_incoming_cleanup(mis);
> > + postcopy_incoming_cleanup(mis);
> > } else if (ret >= 0) {
> > /*
> > * Postcopy was started, cleanup should happen at the end of the
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 5471efb4f0..dbbb2dfb78 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -2077,3 +2077,147 @@ bool postcopy_is_paused(MigrationStatus status)
> > return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
> > status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> > }
> > +
> > +/*
> > + * Triggered by a postcopy_listen command; this thread takes over reading
> > + * the input stream, leaving the main thread free to carry on loading the rest
> > + * of the device state (from RAM).
> > + */
> > +static void *postcopy_listen_thread(void *opaque)
> > +{
> > + MigrationIncomingState *mis = migration_incoming_get_current();
> > + QEMUFile *f = mis->from_src_file;
> > + int load_res;
> > + MigrationState *migr = migrate_get_current();
> > + Error *local_err = NULL;
> > +
> > + object_ref(OBJECT(migr));
> > +
> > + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > + qemu_event_set(&mis->thread_sync_event);
> > + trace_postcopy_ram_listen_thread_start();
> > +
> > + rcu_register_thread();
> > + /*
> > + * Because we're a thread and not a coroutine we can't yield
> > + * in qemu_file, and thus we must be blocking now.
> > + */
> > + qemu_file_set_blocking(f, true, &error_fatal);
> > +
> > + /* TODO: sanity check that only postcopiable data will be loaded here */
> > + load_res = qemu_loadvm_state_main(f, mis, &local_err);
> > +
> > + /*
> > + * This is tricky, but, mis->from_src_file can change after it
> > + * returns, when postcopy recovery happened. In the future, we may
> > + * want a wrapper for the QEMUFile handle.
> > + */
> > + f = mis->from_src_file;
> > +
> > + /* And non-blocking again so we don't block in any cleanup */
> > + qemu_file_set_blocking(f, false, &error_fatal);
> > +
> > + trace_postcopy_ram_listen_thread_exit();
> > + if (load_res < 0) {
> > + qemu_file_set_error(f, load_res);
> > + dirty_bitmap_mig_cancel_incoming();
> > + if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> > + !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> > + {
> > + error_report("%s: loadvm failed during postcopy: %d. All states "
> > + "are migrated except dirty bitmaps. Some dirty "
> > + "bitmaps may be lost, and present migrated dirty "
> > + "bitmaps are correctly migrated and valid.",
> > + __func__, load_res);
> > + load_res = 0; /* prevent further exit() */
> > + } else {
> > + error_prepend(&local_err,
> > + "loadvm failed during postcopy: %d: ", load_res);
> > + migrate_set_error(migr, local_err);
> > + error_report_err(local_err);
> > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > + MIGRATION_STATUS_FAILED);
> > + }
> > + }
> > + if (load_res >= 0) {
> > + /*
> > + * This looks good, but it's possible that the device loading in the
> > + * main thread hasn't finished yet, and so we might not be in 'RUN'
> > + * state yet; wait for the end of the main thread.
> > + */
> > + qemu_event_wait(&mis->main_thread_load_event);
> > + }
> > + postcopy_incoming_cleanup(mis);
>
> Here I did notice that this replaced the old
> postcopy_ram_incoming_cleanup(). I'm just curious: is it needed to check
> postcopy-ram=on once more?
>
> The two callers of postcopy_incoming_cleanup() should always have
> postcopy-ram enabled, right?
They actually don't. The postcopy listen thread can also be used by
dirty-bitmaps with the postcopy-ram capability disabled, which is why I
also renamed it to postcopy_listen_thread instead of
postcopy_ram_listen_thread.
Currently, postcopy_ram_incoming_cleanup() does nothing if postcopy-ram
is not enabled, as it checks each thread individually; however, I think
it's better to check for postcopy-ram beforehand. On the other hand, the
postcopy-ram capability is already always checked before calling
postcopy_ram_incoming_setup().
>
> > +
> > + if (load_res < 0) {
> > + /*
> > + * If something went wrong then we have a bad state so exit;
> > + * depending how far we got it might be possible at this point
> > + * to leave the guest running and fire MCEs for pages that never
> > + * arrived as a desperate recovery step.
> > + */
> > + rcu_unregister_thread();
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > + MIGRATION_STATUS_COMPLETED);
> > + /*
> > + * If everything has worked fine, then the main thread has waited
> > + * for us to start, and we're the last use of the mis.
> > + * (If something broke then qemu will have to exit anyway since it's
> > + * got a bad migration state).
> > + */
> > + bql_lock();
> > + migration_incoming_state_destroy();
> > + bql_unlock();
> > +
> > + rcu_unregister_thread();
> > + mis->have_listen_thread = false;
> > + postcopy_state_set(POSTCOPY_INCOMING_END);
> > +
> > + object_unref(OBJECT(migr));
> > +
> > + return NULL;
> > +}
> > +
> > +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
> > +{
> > + /*
> > + * Sensitise RAM - can now generate requests for blocks that don't exist
> > + * However, at this point the CPU shouldn't be running, and the IO
> > + * shouldn't be doing anything yet so don't actually expect requests
> > + */
> > + if (migrate_postcopy_ram()) {
> > + if (postcopy_ram_incoming_setup(mis)) {
> > + postcopy_ram_incoming_cleanup(mis);
> > + error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
> > + return -1;
> > + }
> > + }
> > +
> > + trace_loadvm_postcopy_handle_listen("after uffd");
> > +
> > + if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
> > + return -1;
> > + }
> > +
> > + mis->have_listen_thread = true;
> > + postcopy_thread_create(mis, &mis->listen_thread,
> > + MIGRATION_THREAD_DST_LISTEN,
> > + postcopy_listen_thread, QEMU_THREAD_DETACHED);
> > +
> > + return 0;
> > +}
> > +
> > +int postcopy_incoming_cleanup(MigrationIncomingState *mis)
> > +{
> > + int rc = 0;
> > +
> > + if (migrate_postcopy_ram()) {
> > + rc = postcopy_ram_incoming_cleanup(mis);
> > + }
> > +
> > + return rc;
> > +}
> > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> > index ca19433b24..a080dd65a7 100644
> > --- a/migration/postcopy-ram.h
> > +++ b/migration/postcopy-ram.h
> > @@ -199,4 +199,7 @@ bool postcopy_is_paused(MigrationStatus status);
> > void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
> > RAMBlock *rb);
> >
> > +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
> > +int postcopy_incoming_cleanup(MigrationIncomingState *mis);
> > +
> > #endif
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 7b35ec4dd0..96a2699ca7 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2087,112 +2087,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > return 0;
> > }
> >
> > -/*
> > - * Triggered by a postcopy_listen command; this thread takes over reading
> > - * the input stream, leaving the main thread free to carry on loading the rest
> > - * of the device state (from RAM).
> > - * (TODO:This could do with being in a postcopy file - but there again it's
> > - * just another input loop, not that postcopy specific)
> > - */
> > -static void *postcopy_ram_listen_thread(void *opaque)
> > -{
> > - MigrationIncomingState *mis = migration_incoming_get_current();
> > - QEMUFile *f = mis->from_src_file;
> > - int load_res;
> > - MigrationState *migr = migrate_get_current();
> > - Error *local_err = NULL;
> > -
> > - object_ref(OBJECT(migr));
> > -
> > - migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> > - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > - qemu_event_set(&mis->thread_sync_event);
> > - trace_postcopy_ram_listen_thread_start();
> > -
> > - rcu_register_thread();
> > - /*
> > - * Because we're a thread and not a coroutine we can't yield
> > - * in qemu_file, and thus we must be blocking now.
> > - */
> > - qemu_file_set_blocking(f, true, &error_fatal);
> > -
> > - /* TODO: sanity check that only postcopiable data will be loaded here */
> > - load_res = qemu_loadvm_state_main(f, mis, &local_err);
> > -
> > - /*
> > - * This is tricky, but, mis->from_src_file can change after it
> > - * returns, when postcopy recovery happened. In the future, we may
> > - * want a wrapper for the QEMUFile handle.
> > - */
> > - f = mis->from_src_file;
> > -
> > - /* And non-blocking again so we don't block in any cleanup */
> > - qemu_file_set_blocking(f, false, &error_fatal);
> > -
> > - trace_postcopy_ram_listen_thread_exit();
> > - if (load_res < 0) {
> > - qemu_file_set_error(f, load_res);
> > - dirty_bitmap_mig_cancel_incoming();
> > - if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> > - !migrate_postcopy_ram() && migrate_dirty_bitmaps())
> > - {
> > - error_report("%s: loadvm failed during postcopy: %d. All states "
> > - "are migrated except dirty bitmaps. Some dirty "
> > - "bitmaps may be lost, and present migrated dirty "
> > - "bitmaps are correctly migrated and valid.",
> > - __func__, load_res);
> > - load_res = 0; /* prevent further exit() */
> > - } else {
> > - error_prepend(&local_err,
> > - "loadvm failed during postcopy: %d: ", load_res);
> > - migrate_set_error(migr, local_err);
> > - error_report_err(local_err);
> > - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > - MIGRATION_STATUS_FAILED);
> > - }
> > - }
> > - if (load_res >= 0) {
> > - /*
> > - * This looks good, but it's possible that the device loading in the
> > - * main thread hasn't finished yet, and so we might not be in 'RUN'
> > - * state yet; wait for the end of the main thread.
> > - */
> > - qemu_event_wait(&mis->main_thread_load_event);
> > - }
> > - postcopy_ram_incoming_cleanup(mis);
> > -
> > - if (load_res < 0) {
> > - /*
> > - * If something went wrong then we have a bad state so exit;
> > - * depending how far we got it might be possible at this point
> > - * to leave the guest running and fire MCEs for pages that never
> > - * arrived as a desperate recovery step.
> > - */
> > - rcu_unregister_thread();
> > - exit(EXIT_FAILURE);
> > - }
> > -
> > - migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > - MIGRATION_STATUS_COMPLETED);
> > - /*
> > - * If everything has worked fine, then the main thread has waited
> > - * for us to start, and we're the last use of the mis.
> > - * (If something broke then qemu will have to exit anyway since it's
> > - * got a bad migration state).
> > - */
> > - bql_lock();
> > - migration_incoming_state_destroy();
> > - bql_unlock();
> > -
> > - rcu_unregister_thread();
> > - mis->have_listen_thread = false;
> > - postcopy_state_set(POSTCOPY_INCOMING_END);
> > -
> > - object_unref(OBJECT(migr));
> > -
> > - return NULL;
> > -}
> > -
> > /* After this message we must be able to immediately receive postcopy data */
> > static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
> > Error **errp)
> > @@ -2218,32 +2112,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
> >
> > trace_loadvm_postcopy_handle_listen("after discard");
> >
> > - /*
> > - * Sensitise RAM - can now generate requests for blocks that don't exist
> > - * However, at this point the CPU shouldn't be running, and the IO
> > - * shouldn't be doing anything yet so don't actually expect requests
> > - */
> > - if (migrate_postcopy_ram()) {
> > - if (postcopy_ram_incoming_setup(mis)) {
> > - postcopy_ram_incoming_cleanup(mis);
> > - error_setg(errp, "Failed to setup incoming postcopy RAM blocks");
> > - return -1;
> > - }
> > - }
> > + int rc = postcopy_incoming_setup(mis, errp);
> >
> > - trace_loadvm_postcopy_handle_listen("after uffd");
> > -
> > - if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) {
> > - return -1;
> > - }
> > -
> > - mis->have_listen_thread = true;
> > - postcopy_thread_create(mis, &mis->listen_thread,
> > - MIGRATION_THREAD_DST_LISTEN,
> > - postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
> > trace_loadvm_postcopy_handle_listen("return");
> >
> > - return 0;
> > + return rc;
> > }
> >
> > static void loadvm_postcopy_handle_run_bh(void *opaque)
> > --
> > 2.51.0
> >
>
> --
> Peter Xu
>
* [PATCH v2 3/4] migration: Refactor all incoming cleanup into migration_incoming_cleanup()
2025-10-27 15:41 [PATCH v2 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
2025-10-27 15:41 ` [PATCH v2 1/4] migration: Do not try to start VM if disk activation fails Juraj Marcin
2025-10-27 15:41 ` [PATCH v2 2/4] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Juraj Marcin
@ 2025-10-27 15:41 ` Juraj Marcin
2025-10-29 22:02 ` Peter Xu
2025-10-27 15:41 ` [PATCH v2 4/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
3 siblings, 1 reply; 9+ messages in thread
From: Juraj Marcin @ 2025-10-27 15:41 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
Currently, there are two functions that are responsible for cleanup of
the incoming migration state. With successful precopy, it's the process
incoming migration coroutine and with successful postcopy it's the
postcopy listen thread. However, if postcopy fails during the device
load, both functions will try to do the cleanup. Moreover, when the
exit-on-error parameter was added, it was applied only to precopy.
This patch introduces exit-on-error to postcopy and refactors incoming
migration cleanup into a common function while defining a clear boundary
of who is responsible for the cleanup. The process incoming migration
coroutine is responsible for running the cleanup function, unless the
listen thread has been started (the postcopy state is at least in
LISTENING phase), in which case the postcopy listen thread runs the
incoming migration cleanup in a BH, which also joins the listen thread.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration-hmp-cmds.c | 2 +-
migration/migration.c | 77 +++++++++++++++++-----------------
migration/migration.h | 4 +-
migration/postcopy-ram.c | 53 ++++++++++-------------
migration/savevm.c | 4 +-
qapi/migration.json | 9 +++-
system/vl.c | 3 +-
7 files changed, 75 insertions(+), 77 deletions(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 847d18faaa..c572c1fb28 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -521,7 +521,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
}
QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
- qmp_migrate_incoming(NULL, true, caps, true, false, &err);
+ qmp_migrate_incoming(NULL, true, caps, true, false, true, false, &err);
qapi_free_MigrationChannelList(caps);
end:
diff --git a/migration/migration.c b/migration/migration.c
index 8827884a15..2cc6327c39 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -328,6 +328,7 @@ void migration_object_init(void)
current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+ current_incoming->postcopy_exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
migration_object_check(current_migration, &error_fatal);
@@ -436,12 +437,18 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
}
}
-void migration_incoming_state_destroy(void)
+void migration_incoming_cleanup(void)
{
- struct MigrationIncomingState *mis = migration_incoming_get_current();
+ MigrationState *ms = migrate_get_current();
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ PostcopyState ps = postcopy_state_get();
multifd_recv_cleanup();
+ if (ps != POSTCOPY_INCOMING_NONE) {
+ postcopy_incoming_cleanup(mis);
+ }
+
/*
* RAM state cleanup needs to happen after multifd cleanup, because
* multifd threads can use some of its states (receivedmap).
@@ -493,6 +500,22 @@ void migration_incoming_state_destroy(void)
cpr_set_incoming_mode(MIG_MODE_NONE);
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
+
+ if (mis->state == MIGRATION_STATUS_FAILED &&
+ ((ps < POSTCOPY_INCOMING_LISTENING && mis->exit_on_error) ||
+ (ps >= POSTCOPY_INCOMING_LISTENING && mis->postcopy_exit_on_error))) {
+ WITH_QEMU_LOCK_GUARD(&ms->error_mutex) {
+ error_report_err(ms->error);
+ ms->error = NULL;
+ }
+
+ exit(EXIT_FAILURE);
+ }
+}
+
+void migration_incoming_cleanup_bh(void *opaque)
+{
+ migration_incoming_cleanup();
}
static void migrate_generate_event(MigrationStatus new_state)
@@ -858,7 +881,7 @@ static void process_incoming_migration_bh(void *opaque)
*/
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COMPLETED);
- migration_incoming_state_destroy();
+ migration_incoming_cleanup();
}
static void coroutine_fn
@@ -885,23 +908,13 @@ process_incoming_migration_co(void *opaque)
ps = postcopy_state_get();
trace_process_incoming_migration_co_end(ret, ps);
- if (ps != POSTCOPY_INCOMING_NONE) {
- if (ps == POSTCOPY_INCOMING_ADVISE) {
- /*
- * Where a migration had postcopy enabled (and thus went to advise)
- * but managed to complete within the precopy period, we can use
- * the normal exit.
- */
- postcopy_incoming_cleanup(mis);
- } else if (ret >= 0) {
- /*
- * Postcopy was started, cleanup should happen at the end of the
- * postcopy thread.
- */
- trace_process_incoming_migration_co_postcopy_end_main();
- goto out;
- }
- /* Else if something went wrong then just fall out of the normal exit */
+ if (ps >= POSTCOPY_INCOMING_LISTENING) {
+ /*
+ * Postcopy was started, cleanup should happen at the end of the
+ * postcopy listen thread.
+ */
+ trace_process_incoming_migration_co_postcopy_end_main();
+ goto out;
}
if (ret < 0) {
@@ -924,25 +937,7 @@ fail:
migrate_set_error(s, local_err);
error_free(local_err);
- migration_incoming_state_destroy();
-
- if (mis->exit_on_error) {
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
-
- exit(EXIT_FAILURE);
- } else {
- /*
- * Report the error here in case that QEMU abruptly exits
- * when postcopy is enabled.
- */
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
- }
+ migration_incoming_cleanup();
out:
/* Pairs with the refcount taken in qmp_migrate_incoming() */
migrate_incoming_unref_outgoing_state();
@@ -1968,6 +1963,8 @@ void migrate_del_blocker(Error **reasonp)
void qmp_migrate_incoming(const char *uri, bool has_channels,
MigrationChannelList *channels,
bool has_exit_on_error, bool exit_on_error,
+ bool has_postcopy_exit_on_error,
+ bool postcopy_exit_on_error,
Error **errp)
{
Error *local_err = NULL;
@@ -1989,6 +1986,8 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
mis->exit_on_error =
has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+ mis->postcopy_exit_on_error = has_postcopy_exit_on_error ?
+ postcopy_exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..9d9e95ae90 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -249,10 +249,12 @@ struct MigrationIncomingState {
/* Do exit on incoming migration failure */
bool exit_on_error;
+ bool postcopy_exit_on_error;
};
MigrationIncomingState *migration_incoming_get_current(void);
-void migration_incoming_state_destroy(void);
+void migration_incoming_cleanup(void);
+void migration_incoming_cleanup_bh(void *opaque);
void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
/*
* Functions to work with blocktime context
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dbbb2dfb78..0375366ed0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2130,53 +2130,37 @@ static void *postcopy_listen_thread(void *opaque)
"bitmaps may be lost, and present migrated dirty "
"bitmaps are correctly migrated and valid.",
__func__, load_res);
- load_res = 0; /* prevent further exit() */
} else {
+ /*
+ * Something went fatally wrong and we have a bad state, QEMU will
+ * exit depending on if postcopy-exit-on-error is true, but the
+ * migration cannot be recovered.
+ */
error_prepend(&local_err,
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
error_report_err(local_err);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_FAILED);
+ goto out;
}
}
- if (load_res >= 0) {
- /*
- * This looks good, but it's possible that the device loading in the
- * main thread hasn't finished yet, and so we might not be in 'RUN'
- * state yet; wait for the end of the main thread.
- */
- qemu_event_wait(&mis->main_thread_load_event);
- }
- postcopy_incoming_cleanup(mis);
-
- if (load_res < 0) {
- /*
- * If something went wrong then we have a bad state so exit;
- * depending how far we got it might be possible at this point
- * to leave the guest running and fire MCEs for pages that never
- * arrived as a desperate recovery step.
- */
- rcu_unregister_thread();
- exit(EXIT_FAILURE);
- }
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
- /*
- * If everything has worked fine, then the main thread has waited
- * for us to start, and we're the last use of the mis.
- * (If something broke then qemu will have to exit anyway since it's
- * got a bad migration state).
- */
- bql_lock();
- migration_incoming_state_destroy();
- bql_unlock();
+out:
rcu_unregister_thread();
- mis->have_listen_thread = false;
postcopy_state_set(POSTCOPY_INCOMING_END);
+ migration_bh_schedule(migration_incoming_cleanup_bh, NULL);
+
object_unref(OBJECT(migr));
return NULL;
@@ -2206,7 +2190,7 @@ int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
mis->have_listen_thread = true;
postcopy_thread_create(mis, &mis->listen_thread,
MIGRATION_THREAD_DST_LISTEN,
- postcopy_listen_thread, QEMU_THREAD_DETACHED);
+ postcopy_listen_thread, QEMU_THREAD_JOINABLE);
return 0;
}
@@ -2215,6 +2199,11 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
{
int rc = 0;
+ if (mis->have_listen_thread) {
+ qemu_thread_join(&mis->listen_thread);
+ mis->have_listen_thread = false;
+ }
+
if (migrate_postcopy_ram()) {
rc = postcopy_ram_incoming_cleanup(mis);
}
diff --git a/migration/savevm.c b/migration/savevm.c
index 96a2699ca7..27fe815731 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3368,7 +3368,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
if (ret < 0) {
error_prepend(errp, "loading Xen device state failed: ");
}
- migration_incoming_state_destroy();
+ migration_incoming_cleanup();
}
bool load_snapshot(const char *name, const char *vmstate,
@@ -3438,7 +3438,7 @@ bool load_snapshot(const char *name, const char *vmstate,
goto err_drain;
}
ret = qemu_loadvm_state(f, errp);
- migration_incoming_state_destroy();
+ migration_incoming_cleanup();
bdrv_drain_all_end();
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..da5b3a4d8c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1817,6 +1817,12 @@
# event, and error details could be retrieved with `query-migrate`.
# (since 9.1)
#
+# @postcopy-exit-on-error: same as exit-on-error, but used during postcopy
+# migration. Default true. When set to true, QEMU only exits if a fatal
+# unrecoverable error happens during postcopy, in which case QEMU would
+# exit even before the introduction of this option.
+# (since 10.2)
+#
# Since: 2.3
#
# .. admonition:: Notes
@@ -1870,7 +1876,8 @@
{ 'command': 'migrate-incoming',
'data': {'*uri': 'str',
'*channels': [ 'MigrationChannel' ],
- '*exit-on-error': 'bool' } }
+ '*exit-on-error': 'bool',
+ '*postcopy-exit-on-error': 'bool' } }
##
# @xen-save-devices-state:
diff --git a/system/vl.c b/system/vl.c
index 17bbc092c8..1308d92901 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2824,7 +2824,8 @@ void qmp_x_exit_preconfig(Error **errp)
g_new0(MigrationChannelList, 1);
channels->value = incoming_channels[MIGRATION_CHANNEL_TYPE_MAIN];
- qmp_migrate_incoming(NULL, true, channels, true, true, &local_err);
+ qmp_migrate_incoming(NULL, true, channels, true, true, true, true,
+ &local_err);
if (local_err) {
error_reportf_err(local_err, "-incoming %s: ", incoming);
exit(1);
--
2.51.0
* Re: [PATCH v2 3/4] migration: Refactor all incoming cleanup into migration_incoming_cleanup()
2025-10-27 15:41 ` [PATCH v2 3/4] migration: Refactor all incoming cleanup into migration_incoming_cleanup() Juraj Marcin
@ 2025-10-29 22:02 ` Peter Xu
0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2025-10-29 22:02 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Mon, Oct 27, 2025 at 04:41:10PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Currently, there are two functions that are responsible for cleanup of
> the incoming migration state. With successful precopy, it's the process
> incoming migration coroutine and with successful postcopy it's the
> postcopy listen thread. However, if postcopy fails during the device
> load, both functions will try to do the cleanup. Moreover, when the
> exit-on-error parameter was added, it was applied only to precopy.
>
> This patch introduces exit-on-error to postcopy and refactors incoming
> migration cleanup into a common function while defining a clear boundary
> of who is responsible for the cleanup. The process incoming migration
> coroutine is responsible for running the cleanup function, unless the
> listen thread has been started (the postcopy state is at least in
> LISTENING phase), in which case the postcopy listen thread runs the
> incoming migration cleanup in a BH, which also joins the listen thread.
>
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
> migration/migration-hmp-cmds.c | 2 +-
> migration/migration.c | 77 +++++++++++++++++-----------------
> migration/migration.h | 4 +-
> migration/postcopy-ram.c | 53 ++++++++++-------------
> migration/savevm.c | 4 +-
> qapi/migration.json | 9 +++-
> system/vl.c | 3 +-
> 7 files changed, 75 insertions(+), 77 deletions(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 847d18faaa..c572c1fb28 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -521,7 +521,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
> }
> QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
>
> - qmp_migrate_incoming(NULL, true, caps, true, false, &err);
> + qmp_migrate_incoming(NULL, true, caps, true, false, true, false, &err);
> qapi_free_MigrationChannelList(caps);
>
> end:
> diff --git a/migration/migration.c b/migration/migration.c
> index 8827884a15..2cc6327c39 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -328,6 +328,7 @@ void migration_object_init(void)
> current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>
> current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> + current_incoming->postcopy_exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
Postcopy should by default mostly ignore the concept of exit-on-error,
because postcopy cannot fail once it reaches ACTIVE.
This new flag is then only about the device state transfer, which is a
trivially small window. The thing is, we were broken anyway for this small
window before this series if it could fail... So personally I still prefer
reusing exit-on-error for this very small window, and I keep thinking that
a new knob is overkill.
We used to get complaints about having too many knobs for migration. I
scanned them at one point, and most of them were still useful. However,
IMHO we should still avoid introducing new configurables unless required.
>
> migration_object_check(current_migration, &error_fatal);
>
> @@ -436,12 +437,18 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> }
> }
>
> -void migration_incoming_state_destroy(void)
> +void migration_incoming_cleanup(void)
> {
> - struct MigrationIncomingState *mis = migration_incoming_get_current();
> + MigrationState *ms = migrate_get_current();
> + MigrationIncomingState *mis = migration_incoming_get_current();
> + PostcopyState ps = postcopy_state_get();
>
> multifd_recv_cleanup();
>
> + if (ps != POSTCOPY_INCOMING_NONE) {
> + postcopy_incoming_cleanup(mis);
> + }
> +
> /*
> * RAM state cleanup needs to happen after multifd cleanup, because
> * multifd threads can use some of its states (receivedmap).
> @@ -493,6 +500,22 @@ void migration_incoming_state_destroy(void)
>
> cpr_set_incoming_mode(MIG_MODE_NONE);
> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> +
> + if (mis->state == MIGRATION_STATUS_FAILED &&
> + ((ps < POSTCOPY_INCOMING_LISTENING && mis->exit_on_error) ||
> + (ps >= POSTCOPY_INCOMING_LISTENING && mis->postcopy_exit_on_error))) {
> + WITH_QEMU_LOCK_GUARD(&ms->error_mutex) {
> + error_report_err(ms->error);
> + ms->error = NULL;
> + }
> +
> + exit(EXIT_FAILURE);
> + }
> +}
> +
> +void migration_incoming_cleanup_bh(void *opaque)
> +{
> + migration_incoming_cleanup();
> }
>
> static void migrate_generate_event(MigrationStatus new_state)
> @@ -858,7 +881,7 @@ static void process_incoming_migration_bh(void *opaque)
> */
> migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> MIGRATION_STATUS_COMPLETED);
> - migration_incoming_state_destroy();
> + migration_incoming_cleanup();
> }
>
> static void coroutine_fn
> @@ -885,23 +908,13 @@ process_incoming_migration_co(void *opaque)
>
> ps = postcopy_state_get();
> trace_process_incoming_migration_co_end(ret, ps);
> - if (ps != POSTCOPY_INCOMING_NONE) {
> - if (ps == POSTCOPY_INCOMING_ADVISE) {
> - /*
> - * Where a migration had postcopy enabled (and thus went to advise)
> - * but managed to complete within the precopy period, we can use
> - * the normal exit.
> - */
> - postcopy_incoming_cleanup(mis);
> - } else if (ret >= 0) {
> - /*
> - * Postcopy was started, cleanup should happen at the end of the
> - * postcopy thread.
> - */
> - trace_process_incoming_migration_co_postcopy_end_main();
> - goto out;
> - }
> - /* Else if something went wrong then just fall out of the normal exit */
> + if (ps >= POSTCOPY_INCOMING_LISTENING) {
> + /*
> + * Postcopy was started, cleanup should happen at the end of the
> + * postcopy listen thread.
> + */
I think this is correct. However, since we've removed most of the ps checks
(which is definitely good.. this path was very hard to read before this
series), here IMHO we could also directly detect the existence of the
thread:
if (mis->have_listen_thread)
...
That'll make it crystal clear that cleanup is skipped only when the thread
was actually created, independent of how the POSTCOPY status relates to the
"thread creation" operation itself; e.g. if there were a failure path after
setting LISTEN but before creating the thread, we could otherwise have issues.
A side benefit is that we can also drop the postcopy_state_get() call above.
> + trace_process_incoming_migration_co_postcopy_end_main();
> + goto out;
> }
>
> if (ret < 0) {
> @@ -924,25 +937,7 @@ fail:
> migrate_set_error(s, local_err);
> error_free(local_err);
>
> - migration_incoming_state_destroy();
> -
> - if (mis->exit_on_error) {
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(s->error);
> - s->error = NULL;
> - }
> -
> - exit(EXIT_FAILURE);
> - } else {
> - /*
> - * Report the error here in case that QEMU abruptly exits
> - * when postcopy is enabled.
> - */
> - WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> - error_report_err(s->error);
> - s->error = NULL;
> - }
> - }
> + migration_incoming_cleanup();
> out:
> /* Pairs with the refcount taken in qmp_migrate_incoming() */
> migrate_incoming_unref_outgoing_state();
> @@ -1968,6 +1963,8 @@ void migrate_del_blocker(Error **reasonp)
> void qmp_migrate_incoming(const char *uri, bool has_channels,
> MigrationChannelList *channels,
> bool has_exit_on_error, bool exit_on_error,
> + bool has_postcopy_exit_on_error,
> + bool postcopy_exit_on_error,
> Error **errp)
> {
> Error *local_err = NULL;
> @@ -1989,6 +1986,8 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
>
> mis->exit_on_error =
> has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> + mis->postcopy_exit_on_error = has_postcopy_exit_on_error ?
> + postcopy_exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
>
> qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..9d9e95ae90 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -249,10 +249,12 @@ struct MigrationIncomingState {
>
> /* Do exit on incoming migration failure */
> bool exit_on_error;
> + bool postcopy_exit_on_error;
> };
>
> MigrationIncomingState *migration_incoming_get_current(void);
> -void migration_incoming_state_destroy(void);
> +void migration_incoming_cleanup(void);
> +void migration_incoming_cleanup_bh(void *opaque);
> void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> /*
> * Functions to work with blocktime context
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index dbbb2dfb78..0375366ed0 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2130,53 +2130,37 @@ static void *postcopy_listen_thread(void *opaque)
> "bitmaps may be lost, and present migrated dirty "
> "bitmaps are correctly migrated and valid.",
> __func__, load_res);
> - load_res = 0; /* prevent further exit() */
> } else {
> +        /*
> +         * Something went fatally wrong and we have a bad state; QEMU will
> +         * exit depending on whether postcopy-exit-on-error is true, but
> +         * the migration cannot be recovered.
> +         */
> error_prepend(&local_err,
> "loadvm failed during postcopy: %d: ", load_res);
> migrate_set_error(migr, local_err);
> error_report_err(local_err);
> migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> MIGRATION_STATUS_FAILED);
> + goto out;
> }
> }
> - if (load_res >= 0) {
> - /*
> - * This looks good, but it's possible that the device loading in the
> - * main thread hasn't finished yet, and so we might not be in 'RUN'
> - * state yet; wait for the end of the main thread.
> - */
> - qemu_event_wait(&mis->main_thread_load_event);
> - }
> - postcopy_incoming_cleanup(mis);
> -
> - if (load_res < 0) {
> - /*
> - * If something went wrong then we have a bad state so exit;
> - * depending how far we got it might be possible at this point
> - * to leave the guest running and fire MCEs for pages that never
> - * arrived as a desperate recovery step.
> - */
> - rcu_unregister_thread();
> - exit(EXIT_FAILURE);
> - }
> + /*
> + * This looks good, but it's possible that the device loading in the
> + * main thread hasn't finished yet, and so we might not be in 'RUN'
> + * state yet; wait for the end of the main thread.
> + */
> + qemu_event_wait(&mis->main_thread_load_event);
>
> migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> MIGRATION_STATUS_COMPLETED);
> - /*
> - * If everything has worked fine, then the main thread has waited
> - * for us to start, and we're the last use of the mis.
> - * (If something broke then qemu will have to exit anyway since it's
> - * got a bad migration state).
> - */
> - bql_lock();
> - migration_incoming_state_destroy();
> - bql_unlock();
>
> +out:
> rcu_unregister_thread();
> - mis->have_listen_thread = false;
> postcopy_state_set(POSTCOPY_INCOMING_END);
>
> + migration_bh_schedule(migration_incoming_cleanup_bh, NULL);
> +
> object_unref(OBJECT(migr));
>
> return NULL;
> @@ -2206,7 +2190,7 @@ int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp)
> mis->have_listen_thread = true;
> postcopy_thread_create(mis, &mis->listen_thread,
> MIGRATION_THREAD_DST_LISTEN,
> - postcopy_listen_thread, QEMU_THREAD_DETACHED);
> + postcopy_listen_thread, QEMU_THREAD_JOINABLE);
I feel like some of the changes in this patch can be split into a few more
patches.
>
> return 0;
> }
> @@ -2215,6 +2199,11 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
> {
> int rc = 0;
>
> + if (mis->have_listen_thread) {
> + qemu_thread_join(&mis->listen_thread);
> + mis->have_listen_thread = false;
> + }
> +
> if (migrate_postcopy_ram()) {
> rc = postcopy_ram_incoming_cleanup(mis);
> }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 96a2699ca7..27fe815731 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -3368,7 +3368,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
> if (ret < 0) {
> error_prepend(errp, "loading Xen device state failed: ");
> }
> - migration_incoming_state_destroy();
> + migration_incoming_cleanup();
> }
>
> bool load_snapshot(const char *name, const char *vmstate,
> @@ -3438,7 +3438,7 @@ bool load_snapshot(const char *name, const char *vmstate,
> goto err_drain;
> }
> ret = qemu_loadvm_state(f, errp);
> - migration_incoming_state_destroy();
> + migration_incoming_cleanup();
Do we need to touch these two places?  I wonder if there'll be side effects
like "exit-on-error" starting to affect either Xen device load or load
snapshot, where it used to apply only to live migrations.
>
> bdrv_drain_all_end();
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..da5b3a4d8c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1817,6 +1817,12 @@
> # event, and error details could be retrieved with `query-migrate`.
> # (since 9.1)
> #
> +# @postcopy-exit-on-error: same as @exit-on-error, but used during postcopy
> +# migration. Default true. When set to true, QEMU only exits if a fatal
> +# unrecoverable error happens during postcopy, in which case QEMU would
> +# exit even before the introduction of this option.
> +# (since 10.2)
> +#
> # Since: 2.3
> #
> # .. admonition:: Notes
> @@ -1870,7 +1876,8 @@
> { 'command': 'migrate-incoming',
> 'data': {'*uri': 'str',
> '*channels': [ 'MigrationChannel' ],
> - '*exit-on-error': 'bool' } }
> + '*exit-on-error': 'bool',
> + '*postcopy-exit-on-error': 'bool' } }
>
> ##
> # @xen-save-devices-state:
> diff --git a/system/vl.c b/system/vl.c
> index 17bbc092c8..1308d92901 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -2824,7 +2824,8 @@ void qmp_x_exit_preconfig(Error **errp)
> g_new0(MigrationChannelList, 1);
>
> channels->value = incoming_channels[MIGRATION_CHANNEL_TYPE_MAIN];
> - qmp_migrate_incoming(NULL, true, channels, true, true, &local_err);
> + qmp_migrate_incoming(NULL, true, channels, true, true, true, true,
> + &local_err);
> if (local_err) {
> error_reportf_err(local_err, "-incoming %s: ", incoming);
> exit(1);
> --
> 2.51.0
>
--
Peter Xu
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 4/4] migration: Introduce POSTCOPY_DEVICE state
2025-10-27 15:41 [PATCH v2 0/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
` (2 preceding siblings ...)
2025-10-27 15:41 ` [PATCH v2 3/4] migration: Refactor all incoming cleanup into migration_incoming_cleanup() Juraj Marcin
@ 2025-10-27 15:41 ` Juraj Marcin
2025-10-29 22:57 ` Peter Xu
3 siblings, 1 reply; 9+ messages in thread
From: Juraj Marcin @ 2025-10-27 15:41 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Peter Xu, Dr. David Alan Gilbert, Jiri Denemark,
Fabiano Rosas
From: Juraj Marcin <jmarcin@redhat.com>
Currently, when postcopy starts, the source VM starts switchover and
sends a package containing the state of all non-postcopiable devices.
When the destination loads this package, the switchover is complete and
the destination VM starts. However, if the device state load fails or
the destination side crashes, the source side is already in
POSTCOPY_ACTIVE state and cannot be recovered, even though it has the
most up-to-date machine state, as the destination has not yet started.
This patch introduces a new POSTCOPY_DEVICE state, which is active
while the destination machine is loading the device state and is not
yet running; in this state, the source side can still be resumed in
case of a migration failure.
To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
side uses a PONG message that is a response to a PING message processed
just before the POSTCOPY_RUN command that starts the destination VM.
Thus, this feature is effective even if the destination side does not
yet support this new state.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
migration/migration.c | 42 ++++++++++++++++++++++++---
migration/migration.h | 3 ++
migration/postcopy-ram.c | 8 +++--
migration/savevm.c | 2 ++
migration/savevm.h | 2 ++
migration/trace-events | 1 +
qapi/migration.json | 8 +++--
tests/qtest/migration/precopy-tests.c | 3 +-
8 files changed, 60 insertions(+), 9 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 2cc6327c39..38f1a888a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1217,6 +1217,7 @@ bool migration_is_running(void)
switch (s->state) {
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1238,6 +1239,7 @@ static bool migration_is_active(void)
MigrationState *s = current_migration;
return (s->state == MIGRATION_STATUS_ACTIVE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
}
@@ -1360,6 +1362,7 @@ static void fill_source_migration_info(MigrationInfo *info)
break;
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_CANCELLING:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_PRE_SWITCHOVER:
case MIGRATION_STATUS_DEVICE:
@@ -1413,6 +1416,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
case MIGRATION_STATUS_CANCELLING:
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER:
@@ -1752,6 +1756,7 @@ bool migration_in_postcopy(void)
MigrationState *s = migrate_get_current();
switch (s->state) {
+ case MIGRATION_STATUS_POSTCOPY_DEVICE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1853,6 +1858,8 @@ int migrate_init(MigrationState *s, Error **errp)
memset(&mig_stats, 0, sizeof(mig_stats));
migration_reset_vfio_bytes_transferred();
+ s->postcopy_package_loaded = false;
+
return 0;
}
@@ -2608,6 +2615,11 @@ static void *source_return_path_thread(void *opaque)
tmp32 = ldl_be_p(buf);
trace_source_return_path_thread_pong(tmp32);
qemu_sem_post(&ms->rp_state.rp_pong_acks);
+ if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
+ trace_source_return_path_thread_postcopy_package_loaded();
+ ms->postcopy_package_loaded = true;
+ qemu_event_set(&ms->postcopy_package_loaded_event);
+ }
break;
case MIG_RP_MSG_REQ_PAGES:
@@ -2853,6 +2865,13 @@ static int postcopy_start(MigrationState *ms, Error **errp)
if (migrate_postcopy_ram()) {
qemu_savevm_send_ping(fb, 3);
}
+ /*
+ * This ping will tell us that all non-postcopiable device state has been
+     * successfully loaded and the destination is about to start. When the
+     * response is received, it triggers the transition from POSTCOPY_DEVICE
+     * to POSTCOPY_ACTIVE.
+ */
+ qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
qemu_savevm_send_postcopy_run(fb);
@@ -2910,7 +2929,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
/* Now, switchover looks all fine, switching to postcopy-active */
migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ MIGRATION_STATUS_POSTCOPY_DEVICE);
bql_unlock();
@@ -3351,8 +3370,8 @@ static MigThrError migration_detect_error(MigrationState *s)
return postcopy_pause(s);
} else {
/*
- * For precopy (or postcopy with error outside IO), we fail
- * with no time.
+ * For precopy (or postcopy with error outside IO, or before dest
+ * starts), we fail with no time.
*/
migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
trace_migration_thread_file_err();
@@ -3487,7 +3506,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
{
uint64_t must_precopy, can_postcopy, pending_size;
Error *local_err = NULL;
- bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
+ bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
bool can_switchover = migration_can_switchover(s);
bool complete_ready;
@@ -3503,6 +3523,18 @@ static MigIterateState migration_iteration_run(MigrationState *s)
* POSTCOPY_ACTIVE it means switchover already happened.
*/
complete_ready = !pending_size;
+ if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&
+ (s->postcopy_package_loaded || complete_ready)) {
+ /*
+             * If the package has been loaded, the event is set and we will
+             * immediately transition to POSTCOPY_ACTIVE. If we are ready for
+             * completion, we need to wait for the destination to load the
+             * postcopy package before actually completing.
+ */
+ qemu_event_wait(&s->postcopy_package_loaded_event);
+ migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ }
} else {
/*
* Exact pending reporting is only needed for precopy. Taking RAM
@@ -4157,6 +4189,7 @@ static void migration_instance_finalize(Object *obj)
qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
error_free(ms->error);
+ qemu_event_destroy(&ms->postcopy_package_loaded_event);
}
static void migration_instance_init(Object *obj)
@@ -4178,6 +4211,7 @@ static void migration_instance_init(Object *obj)
qemu_sem_init(&ms->wait_unplug_sem, 0);
qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
qemu_mutex_init(&ms->qemu_file_lock);
+ qemu_event_init(&ms->postcopy_package_loaded_event, 0);
}
/*
diff --git a/migration/migration.h b/migration/migration.h
index 9d9e95ae90..050ded9e69 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -511,6 +511,9 @@ struct MigrationState {
/* Is this a rdma migration */
bool rdma_migration;
+ bool postcopy_package_loaded;
+ QemuEvent postcopy_package_loaded_event;
+
GSource *hup_source;
};
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0375366ed0..985b66d1a6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2094,7 +2094,7 @@ static void *postcopy_listen_thread(void *opaque)
object_ref(OBJECT(migr));
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_ACTIVE);
+ MIGRATION_STATUS_POSTCOPY_DEVICE);
qemu_event_set(&mis->thread_sync_event);
trace_postcopy_ram_listen_thread_start();
@@ -2140,7 +2140,7 @@ static void *postcopy_listen_thread(void *opaque)
"loadvm failed during postcopy: %d: ", load_res);
migrate_set_error(migr, local_err);
error_report_err(local_err);
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ migrate_set_state(&mis->state, mis->state,
MIGRATION_STATUS_FAILED);
goto out;
}
@@ -2152,6 +2152,10 @@ static void *postcopy_listen_thread(void *opaque)
*/
qemu_event_wait(&mis->main_thread_load_event);
+ /*
+ * Device load in the main thread has finished, we should be in
+ * POSTCOPY_ACTIVE now.
+ */
migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
MIGRATION_STATUS_COMPLETED);
diff --git a/migration/savevm.c b/migration/savevm.c
index 27fe815731..209c41c5b6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2169,6 +2169,8 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis, Error **errp)
return -1;
}
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..125a2507b7 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,8 @@
#define QEMU_VM_COMMAND 0x08
#define QEMU_VM_SECTION_FOOTER 0x7e
+#define QEMU_VM_PING_PACKAGED_LOADED 0x42
+
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
int qemu_savevm_state_prepare(Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..48518a4b2c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
source_return_path_thread_shut(uint32_t val) "0x%x"
source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
+source_return_path_thread_postcopy_package_loaded(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
diff --git a/qapi/migration.json b/qapi/migration.json
index da5b3a4d8c..c22285fd1c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,10 @@
# @postcopy-active: like active, but now in postcopy mode.
# (since 2.5)
#
+# @postcopy-device: like postcopy-active, but the destination is still
+# loading device state and is not running yet. If migration fails
+# during this state, the source side will resume. (since 10.2)
+#
# @postcopy-paused: during postcopy but paused. (since 3.0)
#
# @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +177,8 @@
##
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
- 'active', 'postcopy-active', 'postcopy-paused',
- 'postcopy-recover-setup',
+ 'active', 'postcopy-device', 'postcopy-active',
+ 'postcopy-paused', 'postcopy-recover-setup',
'postcopy-recover', 'completed', 'failed', 'colo',
'pre-switchover', 'device', 'wait-unplug' ] }
##
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index bb38292550..57ca623de5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
}
/* ensure new status don't go unnoticed */
- assert(MIGRATION_STATUS__MAX == 15);
+ assert(MIGRATION_STATUS__MAX == 16);
for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
switch (i) {
case MIGRATION_STATUS_DEVICE: /* happens too fast */
case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
case MIGRATION_STATUS_COLO: /* no support in tests */
+ case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
case MIGRATION_STATUS_POSTCOPY_PAUSED:
case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
--
2.51.0
* Re: [PATCH v2 4/4] migration: Introduce POSTCOPY_DEVICE state
2025-10-27 15:41 ` [PATCH v2 4/4] migration: Introduce POSTCOPY_DEVICE state Juraj Marcin
@ 2025-10-29 22:57 ` Peter Xu
0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2025-10-29 22:57 UTC (permalink / raw)
To: Juraj Marcin
Cc: qemu-devel, Dr. David Alan Gilbert, Jiri Denemark, Fabiano Rosas
On Mon, Oct 27, 2025 at 04:41:11PM +0100, Juraj Marcin wrote:
[...]
> @@ -1752,6 +1756,7 @@ bool migration_in_postcopy(void)
> MigrationState *s = migrate_get_current();
>
> switch (s->state) {
> + case MIGRATION_STATUS_POSTCOPY_DEVICE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_PAUSED:
> case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
> @@ -1853,6 +1858,8 @@ int migrate_init(MigrationState *s, Error **errp)
> memset(&mig_stats, 0, sizeof(mig_stats));
> migration_reset_vfio_bytes_transferred();
>
> + s->postcopy_package_loaded = false;
Better reset postcopy_package_loaded_event (in case of continuous
postcopy-device failures)?
Other than that, looks all good, thanks!
Reviewed-by: Peter Xu <peterx@redhat.com>
> +
> return 0;
> }
--
Peter Xu