From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: Juan Quintela <quintela@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Eric Blake <eblake@redhat.com>,
Lukas Straub <lukasstraub2@web.de>
Subject: [PULL 15/20] migration: Handle block device inactivation failures better
Date: Thu, 20 Apr 2023 15:17:46 +0200 [thread overview]
Message-ID: <20230420131751.28534-16-quintela@redhat.com> (raw)
In-Reply-To: <20230420131751.28534-1-quintela@redhat.com>
From: Eric Blake <eblake@redhat.com>
Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable. The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail. When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):
(qemu) qemu-kvm: load of migration failed: Input/output error
at which point, our only hope for the guest is for the source to take
back control. With the current code base, the host outputs a message,
but then appears to resume:
(qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
(src qemu)info status
VM status: running
but a second migration attempt now asserts:
(src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
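For orientation: the assertion fires because inactivation walks every
block device, and the per-device helper refuses to inactivate a device
that is already inactive. A paraphrased sketch of that check (the shape
of the code, not the exact block.c implementation):

    /* Paraphrased from the assertion text above; the real body of the
     * function is elided. */
    int bdrv_inactivate_recurse(BlockDriverState *bs)
    {
        /* trips on any device left inactive by the earlier failure */
        assert(!(bs->open_flags & BDRV_O_INACTIVE));
        /* ... inactivate bs and its children ... */
        return 0;
    }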
Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion. It looks like the problem is as follows:
In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices. In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server. With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
disks by calling bdrv_activate_all(). Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.
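Pre-patch, the relevant shape of migration_completion() is roughly the
following (paraphrased sketch; error paths and surrounding context are
elided, see the diff below for the exact lines being changed):

    if (ret >= 0) {
        qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
        /* with 'inactivate' set, savevm ends up in bdrv_inactivate_all() */
        ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                 inactivate);
    }
    if (inactivate && ret >= 0) {
        /* only reached on success: a partial inactivation leaves the
         * flag unset, so nothing remembers the half-inactivated disks */
        s->block_inactive = true;
    }
    /* ... */
    if (ret < 0) {
        goto fail;   /* not fail_invalidate, so no bdrv_activate_all() */
    }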
Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive. Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again. Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont(). And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.
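For reference, that reclaim path is keyed off the flag; roughly
(paraphrased, not the exact migrate_fd_cancel() code):

    if (s->block_inactive) {
        Error *local_err = NULL;

        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        } else {
            s->block_inactive = false;
        }
    }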
Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.
See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration/migration.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index f311bb5f93..d630193272 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3447,13 +3447,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
+                s->block_inactive = inactivate;
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                          inactivate);
             }
-            if (inactivate && ret >= 0) {
-                s->block_inactive = true;
-            }
         }
         qemu_mutex_unlock_iothread();
@@ -3525,6 +3523,7 @@ fail_invalidate:
         bdrv_activate_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
+            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
--
2.39.2