From: Ben Chaney <bchaney@akamai.com>
To: qemu-devel@nongnu.org
Subject: [PATCH v3 1/8] migration: stop vm earlier for cpr
Date: Wed, 03 Dec 2025 13:43:22 -0500
Message-ID: <20251203-cpr-tap-v3-1-3cc89e9b19e4@akamai.com>
In-Reply-To: <20251203-cpr-tap-v3-0-3cc89e9b19e4@akamai.com>
From: Steve Sistare <steven.sistare@oracle.com>
Stop the VM earlier for CPR, before cpr_state_save(), which causes the new
QEMU to proceed and initialize devices. We must guarantee that devices are
stopped in the old QEMU, and that all source notifiers have been called,
before the devices are initialized in the new QEMU.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
migration/migration.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 48 insertions(+), 9 deletions(-)
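For readers following the ordering argument, the invariant this patch enforces can be illustrated with a minimal standalone sketch: the source must be stopped before the FD hand-off releases the target, and a failed hand-off must resume the source. Every name below (SketchState and the sketch_* functions) is hypothetical and is not a QEMU API; the real code paths are migration_stop_vm(), cpr_state_save() and vm_resume() as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the source side; not a QEMU structure. */
typedef struct {
    bool vm_running; /* guest CPUs and devices active on the source */
    bool fds_sent;   /* target has been released to initialize devices */
} SketchState;

/* Stand-in for stopping the VM: 0 on success, negative on error. */
static int sketch_stop_vm(SketchState *s)
{
    s->vm_running = false;
    return 0;
}

/* Stand-in for resuming the VM on an error path. */
static void sketch_resume_vm(SketchState *s)
{
    s->vm_running = true;
}

/* Stand-in for the FD hand-off; 'fail' simulates a transfer error. */
static bool sketch_send_fds(SketchState *s, bool fail)
{
    /* Invariant: the source is stopped before the target is released. */
    assert(!s->vm_running);
    if (fail) {
        return false;
    }
    s->fds_sent = true;
    return true;
}

/* Stop first, then hand off; resume the source if the hand-off fails. */
static bool sketch_start_cpr(SketchState *s, bool fail_transfer)
{
    if (sketch_stop_vm(s) < 0) {
        return false;
    }
    if (!sketch_send_fds(s, fail_transfer)) {
        sketch_resume_vm(s);
        return false;
    }
    return true;
}
```

Calling sketch_start_cpr() with fail_transfer set to false leaves the model stopped with the FDs sent; calling it with true leaves the model running again with nothing sent, which mirrors the rollback added at the "out:" label.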
diff --git a/migration/migration.c b/migration/migration.c
index c2daab6bdd..6d40697767 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1657,6 +1657,7 @@ void migration_cancel(void)
MIGRATION_STATUS_CANCELLED);
cpr_state_close();
migrate_hup_delete(s);
+ vm_resume(s->vm_old_state);
}
}
@@ -2216,6 +2217,7 @@ void qmp_migrate(const char *uri, bool has_channels,
MigrationAddress *addr = NULL;
MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
MigrationChannel *cpr_channel = NULL;
+ bool stopped = false;
/*
* Having preliminary checks for uri and channel
@@ -2268,6 +2270,46 @@ void qmp_migrate(const char *uri, bool has_channels,
return;
}
+ /*
+ * CPR-transfer ordering:
+ *
+ *   SOURCE                              TARGET
+ *   ------                              ------
+ *                                       cpr_state_load() blocks
+ *      |                                    |
+ *      | 1. migration_stop_vm()             |
+ *      |    VM stopped, devices quiesced    |
+ *      |                                    | Waiting for
+ *      | 2. notifiers (PRECOPY_SETUP)       | FDs from source
+ *      |    vhost_reset_owner() releases    |
+ *      |    device ownership                |
+ *      |                                    |
+ *      | 3. cpr_state_save() ---- FDs ----> |
+ *      |                                    |
+ *      v                                    v
+ *   postmigrate                         Device init begins
+ *                                       - cpr_find_fd()
+ *                                       - vhost_dev_init()
+ *                                       - VHOST_SET_OWNER
+ *
+ * Step 3 is the synchronization/cut-over point. Target proceeds immediately
+ * upon receiving FDs, so steps 1-2 must complete first. Otherwise:
+ * - Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
+ * - Race between source I/O and target device init
+ *
+ * We stop the VM early (before FD transfer) to prevent this race.
+ * Unlike regular migration, CPR-transfer passes memory via FD (memfd)
+ * rather than copying RAM, so stopping the VM early should add little downtime.
+ */
+ if (migrate_mode_is_cpr(s)) {
+ int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
+ error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+ goto out;
+ }
+ stopped = true;
+ }
+
if (!cpr_state_save(cpr_channel, &local_err)) {
goto out;
}
@@ -2294,6 +2336,9 @@ out:
if (local_err) {
migration_connect_set_error(s, local_err);
error_propagate(errp, local_err);
+ if (stopped) {
+ vm_resume(s->vm_old_state);
+ }
}
}
@@ -2339,6 +2384,9 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
}
migration_connect_set_error(s, local_err);
error_propagate(errp, local_err);
+ if (migrate_mode_is_cpr(s)) {
+ vm_resume(s->vm_old_state);
+ }
return;
}
}
@@ -4028,7 +4076,6 @@ void migration_connect(MigrationState *s, Error *error_in)
Error *local_err = NULL;
uint64_t rate_limit;
bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
- int ret;
/*
* If there's a previous error, free it and prepare for another one.
@@ -4099,14 +4146,6 @@ void migration_connect(MigrationState *s, Error *error_in)
return;
}
- if (migrate_mode_is_cpr(s)) {
- ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
- if (ret < 0) {
- error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
- goto fail;
- }
- }
-
/*
* Take a refcount to make sure the migration object won't get freed by
* the main thread already in migration_shutdown().
--
2.34.1
Thread overview: 10+ messages
2025-12-03 18:43 [PATCH v3 0/8] Live update: tap and vhost Ben Chaney
2025-12-03 18:43 ` Ben Chaney [this message]
2025-12-03 18:43 ` [PATCH v3 2/8] migration: cpr setup notifier Ben Chaney
2025-12-03 18:43 ` [PATCH v3 3/8] vhost: reset vhost devices for cpr Ben Chaney
2025-12-03 18:43 ` [PATCH v3 4/8] cpr: delete all fds Ben Chaney
2025-12-03 18:43 ` [PATCH v3 5/8] tap: common return label Ben Chaney
2025-12-03 18:43 ` [PATCH v3 6/8] tap: cpr support Ben Chaney
2025-12-03 18:43 ` [PATCH v3 7/8] tap: postload fix for cpr Ben Chaney
2025-12-03 18:43 ` [PATCH v3 8/8] tap: cpr fixes Ben Chaney