From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas <farosas@suse.de>,
David Hildenbrand <david@redhat.com>,
peterx@redhat.com, Paolo Bonzini <pbonzini@redhat.com>
Subject: [PULL 30/36] migration: Do not try to start VM if disk activation fails
Date: Mon, 3 Nov 2025 16:06:19 -0500
Message-ID: <20251103210625.3689448-31-peterx@redhat.com>
In-Reply-To: <20251103210625.3689448-1-peterx@redhat.com>
If a rare split brain happens (e.g. dest QEMU started running somehow,
taking shared drive locks), src QEMU may not be able to activate the
drives anymore. In this case, src QEMU shouldn't start the VM or it might
crash the block layer later with something like:

Meanwhile, src QEMU cannot try to continue either, even if dest QEMU can
release the drive locks (e.g. by QMP "stop"), because once dest QEMU has
started running, dest QEMU's RAM is the only version that is consistent
with the current state of the shared storage.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20251103183301.3840862-3-jmarcin@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
---
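For readers skimming the hunk below, the new control flow can be sketched
as a small standalone C model. This is only an illustration under stated
assumptions: every name in it (SrcOutcome, toy_block_activate,
toy_iteration_finish) is hypothetical and not part of QEMU, the drive lock
is reduced to a boolean, and the RUN_STATE_SHUTDOWN check from the real
code is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes for illustration only; not QEMU types. */
typedef enum {
    SRC_VM_RESUMED,     /* drives re-activated and VM started again on src */
    SRC_VM_NOT_RESUMED, /* drives re-activated, but VM was not live before */
    SRC_VM_BLOCKED,     /* activation failed; wait for mgmt to decide */
} SrcOutcome;

/*
 * Toy stand-in for drive re-activation: fails when dest already holds
 * the shared drive locks (the split-brain case from the commit message).
 */
static bool toy_block_activate(bool dest_holds_locks, const char **errmsg)
{
    if (dest_holds_locks) {
        *errmsg = "drive lock is held by another process";
        return false;
    }
    *errmsg = NULL;
    return true;
}

/* Toy model of the FAILED/CANCELLED/CANCELLING branch after this patch. */
static SrcOutcome toy_iteration_finish(bool dest_holds_locks, bool vm_was_live)
{
    const char *errmsg = NULL;

    if (!toy_block_activate(dest_holds_locks, &errmsg)) {
        /* Do not start the VM: src RAM may no longer match the storage. */
        fprintf(stderr, "src: not starting VM: %s\n", errmsg);
        return SRC_VM_BLOCKED;
    }
    if (vm_was_live) {
        /* This is where the real code would call vm_start(). */
        return SRC_VM_RESUMED;
    }
    return SRC_VM_NOT_RESUMED;
}
```

In this model, toy_iteration_finish(true, true) yields SRC_VM_BLOCKED,
while toy_iteration_finish(false, true) yields SRC_VM_RESUMED, which
mirrors the early "break" before the runstate check in the patch.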
migration/migration.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 5e74993b46..6e647c7c4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3526,6 +3526,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 
 static void migration_iteration_finish(MigrationState *s)
 {
+    Error *local_err = NULL;
+
     bql_lock();
 
     /*
@@ -3549,11 +3551,28 @@ static void migration_iteration_finish(MigrationState *s)
     case MIGRATION_STATUS_FAILED:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_CANCELLING:
-        /*
-         * Re-activate the block drives if they're inactivated. Note, COLO
-         * shouldn't use block_active at all, so it should be no-op there.
-         */
-        migration_block_activate(NULL);
+        if (!migration_block_activate(&local_err)) {
+            /*
+             * Re-activate the block drives if they're inactivated.
+             *
+             * If it fails (e.g. in case of a split brain, where dest QEMU
+             * might have taken some of the drive locks and running!), do
+             * not start VM, instead wait for mgmt to decide the next step.
+             *
+             * If dest already started, it means dest QEMU should contain
+             * all the data it needs and it properly owns all the drive
+             * locks. Then even if src QEMU got a FAILED in migration, it
+             * normally should mean we should treat the migration as
+             * COMPLETED.
+             *
+             * NOTE: it's not safe anymore to start VM on src now even if
+             * dest would release the drive locks. It's because as long as
+             * dest started running then only dest QEMU's RAM is consistent
+             * with the shared storage.
+             */
+            error_free(local_err);
+            break;
+        }
         if (runstate_is_live(s->vm_old_state)) {
             if (!runstate_check(RUN_STATE_SHUTDOWN)) {
                 vm_start();
--
2.50.1
Thread overview: 39+ messages
2025-11-03 21:05 [PULL 00/36] Staging patches Peter Xu
2025-11-03 21:05 ` [PULL 01/36] migration/savevm: Add a compatibility check for capabilities Peter Xu
2025-11-03 21:05 ` [PULL 02/36] MAINTAINERS: update cpr reviewers Peter Xu
2025-11-03 21:05 ` [PULL 03/36] migration/ram: fix docs of ram_handle_zero Peter Xu
2025-11-03 21:05 ` [PULL 04/36] migration: add FEATURE_SEEKABLE to QIOChannelBlock Peter Xu
2025-11-03 21:05 ` [PULL 05/36] migration: mapped-ram: handle zero pages Peter Xu
2025-11-03 21:05 ` [PULL 06/36] migration: Remove unused VMSTATE_UINTTL_EQUAL[_V]() macros Peter Xu
2025-11-03 21:05 ` [PULL 07/36] migration: Fix error leak in postcopy_ram_listen_thread() Peter Xu
2025-11-03 21:05 ` [PULL 08/36] migration/cpr: Fix coverity report in cpr_exec_persist_state() Peter Xu
2025-11-03 21:05 ` [PULL 09/36] migration/cpr: Fix UAF in cpr_exec_cb() when execvp() fails Peter Xu
2025-11-03 21:05 ` [PULL 10/36] migration/cpr: Avoid crashing QEMU when cpr-exec runs with no args Peter Xu
2025-11-03 21:06 ` [PULL 11/36] ram-block-attributes: fix interaction with hugetlb memory backends Peter Xu
2025-11-03 21:06 ` [PULL 12/36] ram-block-attributes: Unify the retrieval of the block size Peter Xu
2025-11-03 21:06 ` [PULL 13/36] migration/qmp: Update "resume" flag doc in "migrate" command Peter Xu
2025-11-05 12:27 ` Richard Henderson
2025-11-03 21:06 ` [PULL 14/36] migration/cpr: Document obscure usage of g_autofree when parse str Peter Xu
2025-11-03 21:06 ` [PULL 15/36] hostmem/shm: Allow shm memory backend serve as shared memory for coco-VMs Peter Xu
2025-11-03 21:06 ` [PULL 16/36] migration: Fix regression of passing error_fatal into vmstate_load_state() Peter Xu
2025-11-03 21:06 ` [PULL 17/36] migration: Don't free the reason after calling migrate_add_blocker Peter Xu
2025-11-03 21:06 ` [PULL 18/36] migration: Use unsigned instead of int for bit set of MigMode Peter Xu
2025-11-03 21:06 ` [PULL 19/36] migration: Use bitset of MigMode instead of variable arguments Peter Xu
2025-11-03 21:06 ` [PULL 20/36] migration: Put Error **errp parameter last Peter Xu
2025-11-03 21:06 ` [PULL 21/36] io: Add qio_channel_wait_cond() helper Peter Xu
2025-11-03 21:06 ` [PULL 22/36] migration: Properly wait on G_IO_IN when peeking messages Peter Xu
2025-11-03 21:06 ` [PULL 23/36] migration: vmstate_save_state_v(): fix error path Peter Xu
2025-11-03 21:06 ` [PULL 24/36] tmp_emulator: improve and fix use of errp Peter Xu
2025-11-03 21:06 ` [PULL 25/36] migration/vmstate: stop reporting error number for new _errp APIs Peter Xu
2025-11-03 21:06 ` [PULL 26/36] migration: vmsd errp handlers: return bool Peter Xu
2025-11-03 21:06 ` [PULL 27/36] scripts/vmstate-static-checker: Fix deprecation warnings with latest argparse Peter Xu
2025-11-03 21:06 ` [PULL 28/36] system/physmem: mark io_mem_unassigned lockless Peter Xu
2025-11-03 21:06 ` [PULL 29/36] migration: Flush migration channel after sending data of CMD_PACKAGED Peter Xu
2025-11-03 21:06 ` Peter Xu [this message]
2025-11-03 21:06 ` [PULL 31/36] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c Peter Xu
2025-11-03 21:06 ` [PULL 32/36] migration: Introduce postcopy incoming setup and cleanup functions Peter Xu
2025-11-03 21:06 ` [PULL 33/36] migration: Refactor all incoming cleanup info migration_incoming_destroy() Peter Xu
2025-11-03 21:06 ` [PULL 34/36] migration: Respect exit-on-error when migration fails before resuming Peter Xu
2025-11-03 21:06 ` [PULL 35/36] migration: Make postcopy listen thread joinable Peter Xu
2025-11-03 21:06 ` [PULL 36/36] migration: Introduce POSTCOPY_DEVICE state Peter Xu
2025-11-05 7:52 ` [PULL 00/36] Staging patches Richard Henderson