From: Avihai Horon <avihaih@nvidia.com>
To: <qemu-devel@nongnu.org>
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>,
"Joao Martins" <joao.m.martins@oracle.com>,
Avihai Horon <avihaih@nvidia.com>
Subject: [PATCH] migration: Don't serialize migration while can't switchover
Date: Thu, 22 Feb 2024 17:56:27 +0200 [thread overview]
Message-ID: <20240222155627.14563-1-avihaih@nvidia.com> (raw)
Currently, migration code serializes device data sending during pre-copy
iterative phase. As noted in the code comment, this is done to prevent
faster changing device from sending its data over and over.
However, with switchover-ack capability enabled, this behavior can be
problematic and may prevent migration from converging. The problem lies
in the fact that an earlier device may never finish sending its data and
thus block other devices from sending theirs.
This bug was observed in several VFIO migration scenarios where some
workload on the VM prevented RAM from ever reaching a hard zero, not
allowing VFIO initial pre-copy data to be sent, and thus destination
could not ack switchover. Note that the same scenario, but without
switchover-ack, would converge.
Fix it by not serializing device data sending during pre-copy iterative
phase if switchover was not acked yet.
Fixes: 1b4adb10f898 ("migration: Implement switchover ack logic")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/savevm.h | 2 +-
migration/migration.c | 4 ++--
migration/savevm.c | 22 +++++++++++++++-------
3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/migration/savevm.h b/migration/savevm.h
index 74669733dd6..d4a368b522b 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -36,7 +36,7 @@ void qemu_savevm_state_setup(QEMUFile *f);
bool qemu_savevm_state_guest_unplug_pending(void);
int qemu_savevm_state_resume_prepare(MigrationState *s);
void qemu_savevm_state_header(QEMUFile *f);
-int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
+int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover);
void qemu_savevm_state_cleanup(void);
void qemu_savevm_state_complete_postcopy(QEMUFile *f);
int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cadb..d8bfe1fb1b9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3133,7 +3133,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
}
/* Just another iteration step */
- qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
+ qemu_savevm_state_iterate(s->to_dst_file, in_postcopy, can_switchover);
return MIG_ITERATE_RESUME;
}
@@ -3216,7 +3216,7 @@ static MigIterateState bg_migration_iteration_run(MigrationState *s)
{
int res;
- res = qemu_savevm_state_iterate(s->to_dst_file, false);
+ res = qemu_savevm_state_iterate(s->to_dst_file, false, true);
if (res > 0) {
bg_migration_completion(s);
return MIG_ITERATE_BREAK;
diff --git a/migration/savevm.c b/migration/savevm.c
index d612c8a9020..3a012796375 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
* 0 : We haven't finished, caller have to go again
* 1 : We have finished, we can go to complete phase
*/
-int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
+int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover)
{
SaveStateEntry *se;
int ret = 1;
@@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
"%d(%s): %d",
se->section_id, se->idstr, ret);
qemu_file_set_error(f, ret);
+ return ret;
}
- if (ret <= 0) {
- /* Do not proceed to the next vmstate before this one reported
- completion of the current stage. This serializes the migration
- and reduces the probability that a faster changing state is
- synchronized over and over again. */
+
+ if (ret == 0 && can_switchover) {
+ /*
+ * Do not proceed to the next vmstate before this one reported
+ * completion of the current stage. This serializes the migration
+ * and reduces the probability that a faster changing state is
+ * synchronized over and over again.
+ * Do it only if migration can switchover. If migration can't
+ * switchover yet, do proceed to let other devices send their data
+ * too, as this may be required for switchover to be acked and
+ * migration to converge.
+ */
break;
}
}
@@ -1724,7 +1732,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
qemu_savevm_state_setup(f);
while (qemu_file_get_error(f) == 0) {
- if (qemu_savevm_state_iterate(f, false) > 0) {
+ if (qemu_savevm_state_iterate(f, false, true) > 0) {
break;
}
}
--
2.26.3
next reply other threads:[~2024-02-22 16:02 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-22 15:56 Avihai Horon [this message]
2024-02-27 3:16 ` [PATCH] migration: Don't serialize migration while can't switchover Wang, Lei
2024-02-28 9:56 ` Avihai Horon
2024-02-27 7:41 ` Peter Xu
2024-02-27 10:44 ` Joao Martins
2024-02-28 0:00 ` Avihai Horon
2024-02-28 3:04 ` Peter Xu
2024-02-28 9:39 ` Avihai Horon
2024-02-28 10:17 ` Peter Xu
2024-02-28 10:27 ` Avihai Horon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240222155627.14563-1-avihaih@nvidia.com \
--to=avihaih@nvidia.com \
--cc=farosas@suse.de \
--cc=joao.m.martins@oracle.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).