* [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-01 19:16 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
@ 2026-03-01 19:16 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-01 19:16 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
Dr. David Alan Gilbert, Lukas Straub
Add a COLO migration test for COLO migration and failover.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
tests/qtest/migration/framework.h | 5 +
5 files changed, 211 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3875,6 +3875,7 @@ F: migration/colo*
F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
F: docs/COLO-FT.txt
COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
endif
endif
+migration_colo_files = []
+if get_option('replication').allowed()
+ migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
qtests = {
'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
'migration/migration-util.c') + dbus_vmstate1,
'erst-test': files('erst-test.c'),
'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
- 'migration-test': test_migration_files + migration_tls_files,
+ 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
'pxe-test': files('boot-sector.c'),
'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
migration_test_add_precopy(env);
migration_test_add_cpr(env);
migration_test_add_misc(env);
+ migration_test_add_colo(env);
ret = g_test_run();
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ QTestState *from, *to;
+ void *data_hook = NULL;
+
+ /*
+ * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+ * open the image read/write at the same time. Using read-only=on is not
+ * possible here, because ide-hd does not support read-only backing image.
+ *
+ * So use -snapshot, where each qemu instance creates its own writable
+ * snapshot internally while leaving the real image read-only.
+ */
+ args->start.opts_source = "-snapshot";
+ args->start.opts_target = "-snapshot";
+
+ /*
+ * COLO migration code logs many errors when the migration socket
+ * is shut down, these are expected so we hide them here.
+ */
+ args->start.hide_stderr = true;
+
+ /*
+ * Test with yank with out of band capability since that is how it is
+ * used in production.
+ */
+ args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+ if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+ return -1;
+ }
+
+ migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+ if (args->start_hook) {
+ data_hook = args->start_hook(from, to);
+ }
+
+ migrate_ensure_converge(from);
+ wait_for_serial("src_serial");
+
+ migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+ wait_for_migration_status(from, "colo", NULL);
+ wait_for_resume(to, get_dst());
+
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+
+ /* wait for 3 checkpoints */
+ for (int i = 0; i < 3; i++) {
+ qtest_qmp_eventwait(to, "RESUME");
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+ }
+
+ if (failover_during_checkpoint) {
+ qtest_qmp_eventwait(to, "STOP");
+ }
+ if (primary_failover) {
+ qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("src_serial");
+ } else {
+ qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("dest_serial");
+ }
+
+ if (args->end_hook) {
+ args->end_hook(from, to, data_hook);
+ }
+
+ migrate_end(from, to, !primary_failover);
+
+ return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "tcp:127.0.0.1:0";
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+ return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "defer";
+ args->start_hook = hook_start_multifd;
+ args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+ if (!env->full_set) {
+ return;
+ }
+
+ migration_test_add("/migration/colo/plain/primary_failover",
+ test_colo_plain_primary_failover);
+ migration_test_add("/migration/colo/plain/secondary_failover",
+ test_colo_plain_secondary_failover);
+
+ migration_test_add("/migration/colo/multifd/primary_failover",
+ test_colo_multifd_primary_failover);
+ migration_test_add("/migration/colo/multifd/secondary_failover",
+ test_colo_multifd_secondary_failover);
+
+ migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+ test_colo_plain_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+ test_colo_plain_secondary_failover_checkpoint);
+
+ migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+ test_colo_multifd_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+ test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
void migration_test_add_precopy(MigrationTestEnv *env);
void migration_test_add_cpr(MigrationTestEnv *env);
void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
#endif /* TEST_FRAMEWORK_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-01 21:37 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
@ 2026-03-01 21:37 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-01 21:37 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
Dr. David Alan Gilbert, Lukas Straub
Add a COLO migration test for COLO migration and failover.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
tests/qtest/migration/framework.h | 5 +
5 files changed, 211 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3875,6 +3875,7 @@ F: migration/colo*
F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
F: docs/COLO-FT.txt
COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
endif
endif
+migration_colo_files = []
+if get_option('replication').allowed()
+ migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
qtests = {
'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
'migration/migration-util.c') + dbus_vmstate1,
'erst-test': files('erst-test.c'),
'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
- 'migration-test': test_migration_files + migration_tls_files,
+ 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
'pxe-test': files('boot-sector.c'),
'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
migration_test_add_precopy(env);
migration_test_add_cpr(env);
migration_test_add_misc(env);
+ migration_test_add_colo(env);
ret = g_test_run();
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ QTestState *from, *to;
+ void *data_hook = NULL;
+
+ /*
+ * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+ * open the image read/write at the same time. Using read-only=on is not
+ * possible here, because ide-hd does not support read-only backing image.
+ *
+ * So use -snapshot, where each qemu instance creates its own writable
+ * snapshot internally while leaving the real image read-only.
+ */
+ args->start.opts_source = "-snapshot";
+ args->start.opts_target = "-snapshot";
+
+ /*
+ * COLO migration code logs many errors when the migration socket
+ * is shut down, these are expected so we hide them here.
+ */
+ args->start.hide_stderr = true;
+
+ /*
+ * Test with yank with out of band capability since that is how it is
+ * used in production.
+ */
+ args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+ if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+ return -1;
+ }
+
+ migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+ if (args->start_hook) {
+ data_hook = args->start_hook(from, to);
+ }
+
+ migrate_ensure_converge(from);
+ wait_for_serial("src_serial");
+
+ migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+ wait_for_migration_status(from, "colo", NULL);
+ wait_for_resume(to, get_dst());
+
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+
+ /* wait for 3 checkpoints */
+ for (int i = 0; i < 3; i++) {
+ qtest_qmp_eventwait(to, "RESUME");
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+ }
+
+ if (failover_during_checkpoint) {
+ qtest_qmp_eventwait(to, "STOP");
+ }
+ if (primary_failover) {
+ qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("src_serial");
+ } else {
+ qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("dest_serial");
+ }
+
+ if (args->end_hook) {
+ args->end_hook(from, to, data_hook);
+ }
+
+ migrate_end(from, to, !primary_failover);
+
+ return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "tcp:127.0.0.1:0";
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+ return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "defer";
+ args->start_hook = hook_start_multifd;
+ args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+ if (!env->full_set) {
+ return;
+ }
+
+ migration_test_add("/migration/colo/plain/primary_failover",
+ test_colo_plain_primary_failover);
+ migration_test_add("/migration/colo/plain/secondary_failover",
+ test_colo_plain_secondary_failover);
+
+ migration_test_add("/migration/colo/multifd/primary_failover",
+ test_colo_multifd_primary_failover);
+ migration_test_add("/migration/colo/multifd/secondary_failover",
+ test_colo_multifd_secondary_failover);
+
+ migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+ test_colo_plain_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+ test_colo_plain_secondary_failover_checkpoint);
+
+ migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+ test_colo_multifd_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+ test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
void migration_test_add_precopy(MigrationTestEnv *env);
void migration_test_add_cpr(MigrationTestEnv *env);
void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
#endif /* TEST_FRAMEWORK_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-02 10:13 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
@ 2026-03-02 10:13 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 10:13 UTC (permalink / raw)
To: qemu-devel
Add a COLO migration test for COLO migration and failover.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
tests/qtest/migration/framework.h | 5 +
5 files changed, 211 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3875,6 +3875,7 @@ F: migration/colo*
F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
F: docs/COLO-FT.txt
COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
endif
endif
+migration_colo_files = []
+if get_option('replication').allowed()
+ migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
qtests = {
'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
'migration/migration-util.c') + dbus_vmstate1,
'erst-test': files('erst-test.c'),
'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
- 'migration-test': test_migration_files + migration_tls_files,
+ 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
'pxe-test': files('boot-sector.c'),
'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
migration_test_add_precopy(env);
migration_test_add_cpr(env);
migration_test_add_misc(env);
+ migration_test_add_colo(env);
ret = g_test_run();
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ QTestState *from, *to;
+ void *data_hook = NULL;
+
+ /*
+ * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+ * open the image read/write at the same time. Using read-only=on is not
+ * possible here, because ide-hd does not support read-only backing image.
+ *
+ * So use -snapshot, where each qemu instance creates its own writable
+ * snapshot internally while leaving the real image read-only.
+ */
+ args->start.opts_source = "-snapshot";
+ args->start.opts_target = "-snapshot";
+
+ /*
+ * COLO migration code logs many errors when the migration socket
+ * is shut down, these are expected so we hide them here.
+ */
+ args->start.hide_stderr = true;
+
+ /*
+ * Test with yank with out of band capability since that is how it is
+ * used in production.
+ */
+ args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+ if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+ return -1;
+ }
+
+ migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+ if (args->start_hook) {
+ data_hook = args->start_hook(from, to);
+ }
+
+ migrate_ensure_converge(from);
+ wait_for_serial("src_serial");
+
+ migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+ wait_for_migration_status(from, "colo", NULL);
+ wait_for_resume(to, get_dst());
+
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+
+ /* wait for 3 checkpoints */
+ for (int i = 0; i < 3; i++) {
+ qtest_qmp_eventwait(to, "RESUME");
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+ }
+
+ if (failover_during_checkpoint) {
+ qtest_qmp_eventwait(to, "STOP");
+ }
+ if (primary_failover) {
+ qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("src_serial");
+ } else {
+ qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("dest_serial");
+ }
+
+ if (args->end_hook) {
+ args->end_hook(from, to, data_hook);
+ }
+
+ migrate_end(from, to, !primary_failover);
+
+ return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "tcp:127.0.0.1:0";
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+ return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "defer";
+ args->start_hook = hook_start_multifd;
+ args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+ if (!env->full_set) {
+ return;
+ }
+
+ migration_test_add("/migration/colo/plain/primary_failover",
+ test_colo_plain_primary_failover);
+ migration_test_add("/migration/colo/plain/secondary_failover",
+ test_colo_plain_secondary_failover);
+
+ migration_test_add("/migration/colo/multifd/primary_failover",
+ test_colo_multifd_primary_failover);
+ migration_test_add("/migration/colo/multifd/secondary_failover",
+ test_colo_multifd_secondary_failover);
+
+ migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+ test_colo_plain_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+ test_colo_plain_secondary_failover_checkpoint);
+
+ migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+ test_colo_multifd_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+ test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
void migration_test_add_precopy(MigrationTestEnv *env);
void migration_test_add_cpr(MigrationTestEnv *env);
void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
#endif /* TEST_FRAMEWORK_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-02 10:19 Lukas Straub
@ 2026-03-02 10:19 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 10:19 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Laurent Vivier,
Paolo Bonzini, Zhang Chen, Hailiang Zhang, Markus Armbruster,
Li Zhijian, Dr. David Alan Gilbert
Add a COLO migration test for COLO migration and failover.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
tests/qtest/migration/framework.h | 5 +
5 files changed, 211 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3875,6 +3875,7 @@ F: migration/colo*
F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
F: docs/COLO-FT.txt
COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
endif
endif
+migration_colo_files = []
+if get_option('replication').allowed()
+ migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
qtests = {
'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
'migration/migration-util.c') + dbus_vmstate1,
'erst-test': files('erst-test.c'),
'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
- 'migration-test': test_migration_files + migration_tls_files,
+ 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
'pxe-test': files('boot-sector.c'),
'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
migration_test_add_precopy(env);
migration_test_add_cpr(env);
migration_test_add_misc(env);
+ migration_test_add_colo(env);
ret = g_test_run();
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ QTestState *from, *to;
+ void *data_hook = NULL;
+
+ /*
+ * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+ * open the image read/write at the same time. Using read-only=on is not
+ * possible here, because ide-hd does not support read-only backing image.
+ *
+ * So use -snapshot, where each qemu instance creates its own writable
+ * snapshot internally while leaving the real image read-only.
+ */
+ args->start.opts_source = "-snapshot";
+ args->start.opts_target = "-snapshot";
+
+ /*
+ * COLO migration code logs many errors when the migration socket
+ * is shut down, these are expected so we hide them here.
+ */
+ args->start.hide_stderr = true;
+
+ /*
+ * Test with yank with out of band capability since that is how it is
+ * used in production.
+ */
+ args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+ if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+ return -1;
+ }
+
+ migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+ if (args->start_hook) {
+ data_hook = args->start_hook(from, to);
+ }
+
+ migrate_ensure_converge(from);
+ wait_for_serial("src_serial");
+
+ migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+ wait_for_migration_status(from, "colo", NULL);
+ wait_for_resume(to, get_dst());
+
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+
+ /* wait for 3 checkpoints */
+ for (int i = 0; i < 3; i++) {
+ qtest_qmp_eventwait(to, "RESUME");
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+ }
+
+ if (failover_during_checkpoint) {
+ qtest_qmp_eventwait(to, "STOP");
+ }
+ if (primary_failover) {
+ qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("src_serial");
+ } else {
+ qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("dest_serial");
+ }
+
+ if (args->end_hook) {
+ args->end_hook(from, to, data_hook);
+ }
+
+ migrate_end(from, to, !primary_failover);
+
+ return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "tcp:127.0.0.1:0";
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+ return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "defer";
+ args->start_hook = hook_start_multifd;
+ args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+ if (!env->full_set) {
+ return;
+ }
+
+ migration_test_add("/migration/colo/plain/primary_failover",
+ test_colo_plain_primary_failover);
+ migration_test_add("/migration/colo/plain/secondary_failover",
+ test_colo_plain_secondary_failover);
+
+ migration_test_add("/migration/colo/multifd/primary_failover",
+ test_colo_multifd_primary_failover);
+ migration_test_add("/migration/colo/multifd/secondary_failover",
+ test_colo_multifd_secondary_failover);
+
+ migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+ test_colo_plain_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+ test_colo_plain_secondary_failover_checkpoint);
+
+ migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+ test_colo_multifd_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+ test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
void migration_test_add_precopy(MigrationTestEnv *env);
void migration_test_add_cpr(MigrationTestEnv *env);
void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
#endif /* TEST_FRAMEWORK_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test
@ 2026-03-02 11:43 Lukas Straub
2026-03-02 11:43 ` [PATCH v11 01/21] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
` (21 more replies)
0 siblings, 22 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian, Juan Quintela
Screw b4, I'm back to git send-email again. Also screw my email provider black-box.
Hello everyone,
This has some cleanups for and adds multifd support and migration unit tests
for COLO migration.
Regards,
Lukas
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
Changes in v11:
- Use colo with return-path capability
- Link to v10: https://lore.kernel.org/qemu-devel/20260220-colo_unit_test_multifd-v10-0-bfe67d422ef1@web.de
Changes in v10:
- multifd: always kick the main thread
- always open the return path socket on source
- Link to v9: https://lore.kernel.org/qemu-devel/20260218-colo_unit_test_multifd-v9-0-d8dbdb0ca6f6@web.de
Changes in v9:
- Rebase onto master
- Fix two rare bugs discovered during sresstesting the colo unit test
- Link to v8: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v8-0-7f9e5f7d082b@web.de
Changes in v8:
- Fix peter's review comments
- Link to v7: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v7-0-23bd32f36828@web.de
Changes in v7:
- Fix peter's review comments
- Link to v6: https://lore.kernel.org/qemu-devel/20260206-colo_unit_test_multifd-v6-0-27779dda139d@web.de
Changes in v6:
- Fix the crash when running COLO with TCG accel.
- Link to v5: https://lore.kernel.org/qemu-devel/20260203-colo_unit_test_multifd-v5-0-57508b7389f6@web.de
Changes in v5:
- Remove unused inmports from multifd-colo.c
- Mention the checkpoint overhead of reset to the Q35 fix
- Link to v4: https://lore.kernel.org/qemu-devel/20260130-colo_unit_test_multifd-v4-0-7115ab6f0e77@web.de
Changes in v4:
- Add cleanup patches to remove migration_incoming_colo_enabled() and MIG_CMD_ENABLE_COLO
- Add more comments to the colo unit test
- Call colo_release_ram_cache() after multifd threads terminate
- Link to v3: https://lore.kernel.org/qemu-devel/20260125-colo_unit_test_multifd-v3-0-ae926ccd8eae@web.de
Changes in v3:
- Fix peter's review comments.
- Fix COLO with Q35 machine
- Link to v2: https://lore.kernel.org/qemu-devel/20260117-colo_unit_test_multifd-v2-0-ab521777fa51@web.de
Changes in v2:
- Fix review comments
- Hide stderr in colo migration test since the logged errors are expected
- Add benchmarking data for multifd
- Add myself as maintainer for COLO migration framework
- Link to v1: https://lore.kernel.org/qemu-devel/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
---
Lukas Straub (21):
MAINTAINERS: Add myself as maintainer for COLO migration framework
MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
colo: Setup ram cache in normal migration path
colo: Replace migration_incoming_colo_enabled() with migrate_colo()
colo: Remove ENABLE_COLO savevm command and mark it as deprecated
ram: Remove colo special-casing
multifd: Move ram state receive into multifd_ram_state_recv()
multifd: Add COLO support
Call colo_release_ram_cache() after multifd threads terminate
colo: Fix crash during device vmstate load
colo: Hold the BQL while sending ram state
colo: Do not hold the BQL while receiving ram state.
migration-test: Add COLO migration unit test
Convert colo main documentation to restructuredText
qemu-colo.rst: Miscellaneous changes
qemu-colo.rst: Add my copyright
qemu-colo.rst: Simplify the block replication setup
multifd: Fix hang if send thread errors during sync
colo: Use file lock in primary_vm_do_failover()
migration: Keep s->rp_state.from_dst_file open until migration ends
colo: Reuse the return path from migration on primary and secondary side
MAINTAINERS | 6 +-
docs/COLO-FT.txt | 334 ----------------------------------
docs/system/index.rst | 1 +
docs/system/qemu-colo.rst | 362 +++++++++++++++++++++++++++++++++++++
include/migration/colo.h | 3 -
migration/colo.c | 59 +++---
migration/meson.build | 2 +-
migration/migration.c | 80 ++++----
migration/multifd-colo.c | 44 +++++
migration/multifd-colo.h | 26 +++
migration/multifd-nocomp.c | 10 +-
migration/multifd.c | 26 ++-
migration/multifd.h | 5 +-
migration/options.c | 10 +-
migration/ram.c | 12 +-
migration/savevm.c | 37 +---
migration/savevm.h | 1 -
migration/trace-events | 1 -
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 199 ++++++++++++++++++++
tests/qtest/migration/framework.c | 13 ++
tests/qtest/migration/framework.h | 5 +
23 files changed, 777 insertions(+), 467 deletions(-)
---
base-commit: d8a9d97317d03190b34498741f98f22e2a9afe3e
change-id: 20251230-colo_unit_test_multifd-8bf58dcebd46
Best regards,
--
Lukas Straub <lukasstraub2@web.de>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v11 01/21] MAINTAINERS: Add myself as maintainer for COLO migration framework
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
` (20 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
I am ready to maintain it.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 606b16762cf2ba24d6bd8bb72421db1325bb1908..ff592384b366acc823f19a6596ea8f32446763b2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3870,6 +3870,7 @@ F: qapi/yank.json
COLO Framework
M: Hailiang Zhang <zhanghailiang@xfusion.com>
+M: Lukas Straub <lukasstraub2@web.de>
S: Maintained
F: migration/colo*
F: include/migration/colo.h
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-03-02 11:43 ` [PATCH v11 01/21] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-10 11:33 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 03/21] colo: Setup ram cache in normal migration path Lukas Straub
` (19 subsequent siblings)
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
His last email to the mailing list is from December 2021:
https://lore.kernel.org/qemu-devel/20211214075424.6920-1-zhanghailiang@xfusion.com/
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 -
1 file changed, 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index ff592384b366acc823f19a6596ea8f32446763b2..aa8f795783ba936ce2521e08aaca8de4a112ed33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3869,7 +3869,6 @@ F: include/qemu/yank.h
F: qapi/yank.json
COLO Framework
-M: Hailiang Zhang <zhanghailiang@xfusion.com>
M: Lukas Straub <lukasstraub2@web.de>
S: Maintained
F: migration/colo*
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 03/21] colo: Setup ram cache in normal migration path
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-03-02 11:43 ` [PATCH v11 01/21] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-03-02 11:43 ` [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 04/21] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
` (18 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Since
121ccedc2b migration: block incoming colo when capability is disabled
x-colo capability needs to be always enabled on the incoming side.
So migration_incoming_colo_enabled() and migrate_colo() are equivalent
with migrate_colo() being easier to reason about since it is always true
during the whole migration.
Use migrate_colo() to initialize the ram cache in the normal migration path.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/migration.c | 18 ++++++++++++++----
migration/savevm.c | 14 +-------------
2 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index a5b0465ed30cb812cb294ab901c7a37fe6157dc6..c2b9621190f8678ac6b32d6794d9bcb6ffa5e402 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -630,10 +630,6 @@ int migration_incoming_enable_colo(Error **errp)
return -EINVAL;
}
- if (ram_block_discard_disable(true)) {
- error_setg(errp, "COLO: cannot disable RAM discard");
- return -EBUSY;
- }
migration_colo_enabled = true;
return 0;
}
@@ -770,6 +766,20 @@ process_incoming_migration_co(void *opaque)
assert(mis->from_src_file);
+ if (migrate_colo()) {
+ if (ram_block_discard_disable(true)) {
+ error_setg(&local_err, "COLO: cannot disable RAM discard");
+ goto fail;
+ }
+
+ ret = colo_init_ram_cache(&local_err);
+ if (ret) {
+ error_prepend(&local_err, "failed to init colo RAM cache: %d: ",
+ ret);
+ goto fail;
+ }
+ }
+
mis->largest_page_size = qemu_ram_pagesize_largest();
postcopy_state_set(POSTCOPY_INCOMING_NONE);
migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/savevm.c b/migration/savevm.c
index 3a16c467b25b8d93b7d40bd0db751158e0278b4f..b88851cdb7974314b8481646a1dd19642887f210 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2427,19 +2427,7 @@ static int loadvm_process_enable_colo(MigrationIncomingState *mis,
Error **errp)
{
ERRP_GUARD();
- int ret;
-
- ret = migration_incoming_enable_colo(errp);
- if (ret < 0) {
- return ret;
- }
-
- ret = colo_init_ram_cache(errp);
- if (ret) {
- error_prepend(errp, "failed to init colo RAM cache: %d: ", ret);
- migration_incoming_disable_colo();
- }
- return ret;
+ return migration_incoming_enable_colo(errp);
}
static int loadvm_postcopy_handle_switchover_start(Error **errp)
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 04/21] colo: Replace migration_incoming_colo_enabled() with migrate_colo()
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (2 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 03/21] colo: Setup ram cache in normal migration path Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 05/21] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
` (17 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Since
121ccedc2b migration: block incoming colo when capability is disabled
x-colo capability needs to be always enabled on the incoming side.
So migration_incoming_colo_enabled() and migrate_colo() are equivalent
with migrate_colo() being easier to reason about since it is always true
during the whole migration.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
include/migration/colo.h | 1 -
migration/colo.c | 2 +-
migration/migration.c | 9 ++-------
migration/ram.c | 2 +-
4 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/include/migration/colo.h b/include/migration/colo.h
index d4fe422e4d335d3bef4f860f56400fcd73287a0e..2496a968cc1ce709f706c0efe57e4f765f163d3c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -27,7 +27,6 @@ bool migration_in_colo_state(void);
/* loadvm */
int migration_incoming_enable_colo(Error **errp);
void migration_incoming_disable_colo(void);
-bool migration_incoming_colo_enabled(void);
bool migration_incoming_in_colo_state(void);
COLOMode get_colo_mode(void);
diff --git a/migration/colo.c b/migration/colo.c
index f7a5bd3619a49a3b4a8306973de7a4411cc0df58..97a224c39c49ff2269f375db47112458cab0b4cb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
QemuThread th;
assert(bql_locked());
- assert(migration_incoming_colo_enabled());
+ assert(migrate_colo());
qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO,
colo_process_incoming_thread,
diff --git a/migration/migration.c b/migration/migration.c
index c2b9621190f8678ac6b32d6794d9bcb6ffa5e402..ef6aac53343f2217cd1aa37e493483703068d1ff 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -605,11 +605,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
}
static bool migration_colo_enabled;
-bool migration_incoming_colo_enabled(void)
-{
- return migration_colo_enabled;
-}
-
void migration_incoming_disable_colo(void)
{
ram_block_discard_disable(false);
@@ -739,7 +734,7 @@ static void process_incoming_migration_bh(void *opaque)
} else {
runstate_set(RUN_STATE_PAUSED);
}
- } else if (migration_incoming_colo_enabled()) {
+ } else if (migrate_colo()) {
migration_incoming_disable_colo();
vm_start();
} else {
@@ -807,7 +802,7 @@ process_incoming_migration_co(void *opaque)
goto fail;
}
- if (migration_incoming_colo_enabled()) {
+ if (migrate_colo()) {
/* yield until COLO exit */
colo_incoming_co();
}
diff --git a/migration/ram.c b/migration/ram.c
index fc7ece2c1a10f34aa5a91f58cbe42ea418d7c078..aebf77aa0b861e00516d6f1090aebefdd0d97e54 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4370,7 +4370,7 @@ static int ram_load_precopy(QEMUFile *f)
* speed of the migration, but it obviously reduce the downtime of
* back-up all SVM'S memory in COLO preparing stage.
*/
- if (migration_incoming_colo_enabled()) {
+ if (migrate_colo()) {
if (migration_incoming_in_colo_state()) {
/* In COLO stage, put all pages into cache temporarily */
host = colo_cache_from_block_offset(block, addr, true);
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 05/21] colo: Remove ENABLE_COLO savevm command and mark it as deprecated
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (3 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 04/21] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 06/21] ram: Remove colo special-casing Lukas Straub
` (16 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
No need for it anymore now that x-colo capability is required
on incoming side. There is also no need to send it for backwards
compatibility since we only support COLO with the same version on
both sides.
We mark the command code as deprecated and now error out if such
a unhandled command is encountered in loadvm_process_command().
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
include/migration/colo.h | 2 --
migration/migration.c | 31 -------------------------------
migration/savevm.c | 25 +++++--------------------
migration/savevm.h | 1 -
migration/trace-events | 1 -
5 files changed, 5 insertions(+), 55 deletions(-)
diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2496a968cc1ce709f706c0efe57e4f765f163d3c..8f94054a10760d0f2598f080643f45f9944cf051 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -25,8 +25,6 @@ void migrate_start_colo_process(MigrationState *s);
bool migration_in_colo_state(void);
/* loadvm */
-int migration_incoming_enable_colo(Error **errp);
-void migration_incoming_disable_colo(void);
bool migration_incoming_in_colo_state(void);
COLOMode get_colo_mode(void);
diff --git a/migration/migration.c b/migration/migration.c
index ef6aac53343f2217cd1aa37e493483703068d1ff..dba5d6ede579da42693d5270ede9660fb145238a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -604,31 +604,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
return migrate_send_rp_message_req_pages(mis, rb, start);
}
-static bool migration_colo_enabled;
-void migration_incoming_disable_colo(void)
-{
- ram_block_discard_disable(false);
- migration_colo_enabled = false;
-}
-
-int migration_incoming_enable_colo(Error **errp)
-{
-#ifndef CONFIG_REPLICATION
- error_setg(errp, "ENABLE_COLO command come in migration stream, but the "
- "replication module is not built in");
- return -ENOTSUP;
-#endif
-
- if (!migrate_colo()) {
- error_setg(errp, "ENABLE_COLO command come in migration stream"
- ", but x-colo capability is not set");
- return -EINVAL;
- }
-
- migration_colo_enabled = true;
- return 0;
-}
-
void migrate_add_address(SocketAddress *address)
{
MigrationIncomingState *mis = migration_incoming_get_current();
@@ -735,7 +710,6 @@ static void process_incoming_migration_bh(void *opaque)
runstate_set(RUN_STATE_PAUSED);
}
} else if (migrate_colo()) {
- migration_incoming_disable_colo();
vm_start();
} else {
runstate_set(global_state_get_runstate());
@@ -3534,11 +3508,6 @@ static void *migration_thread(void *opaque)
qemu_savevm_send_postcopy_advise(s->to_dst_file);
}
- if (migrate_colo()) {
- /* Notify migration destination that we enable COLO */
- qemu_savevm_send_colo_enable(s->to_dst_file);
- }
-
if (migrate_auto_converge()) {
/* Start RAMBlock dirty bitmap sync timer */
cpu_throttle_dirty_sync_timer(true);
diff --git a/migration/savevm.c b/migration/savevm.c
index b88851cdb7974314b8481646a1dd19642887f210..197c89e0e659b889409c3dc97518920ea9c1824f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -90,7 +90,7 @@ enum qemu_vm_cmd {
were previously sent during
precopy but are dirty. */
MIG_CMD_PACKAGED, /* Send a wrapped stream within this stream */
- MIG_CMD_ENABLE_COLO, /* Enable COLO */
+ MIG_CMD_DEPRECATED_0, /* Prior to 10.2, used as MIG_CMD_ENABLE_COLO */
MIG_CMD_POSTCOPY_RESUME, /* resume postcopy on dest */
MIG_CMD_RECV_BITMAP, /* Request for recved bitmap on dst */
MIG_CMD_SWITCHOVER_START, /* Switchover start notification */
@@ -1103,12 +1103,6 @@ static void qemu_savevm_command_send(QEMUFile *f,
qemu_fflush(f);
}
-void qemu_savevm_send_colo_enable(QEMUFile *f)
-{
- trace_savevm_send_colo_enable();
- qemu_savevm_command_send(f, MIG_CMD_ENABLE_COLO, 0, NULL);
-}
-
void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
{
uint32_t buf;
@@ -2423,13 +2417,6 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
return 0;
}
-static int loadvm_process_enable_colo(MigrationIncomingState *mis,
- Error **errp)
-{
- ERRP_GUARD();
- return migration_incoming_enable_colo(errp);
-}
-
static int loadvm_postcopy_handle_switchover_start(Error **errp)
{
SaveStateEntry *se;
@@ -2513,7 +2500,7 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
return ret;
}
}
- break;
+ return 0;
case MIG_CMD_PING:
tmp32 = qemu_get_be32(f);
@@ -2524,7 +2511,7 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
return -1;
}
migrate_send_rp_pong(mis, tmp32);
- break;
+ return 0;
case MIG_CMD_PACKAGED:
return loadvm_handle_cmd_packaged(mis, errp);
@@ -2548,14 +2535,12 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
case MIG_CMD_RECV_BITMAP:
return loadvm_handle_recv_bitmap(mis, len, errp);
- case MIG_CMD_ENABLE_COLO:
- return loadvm_process_enable_colo(mis, errp);
-
case MIG_CMD_SWITCHOVER_START:
return loadvm_postcopy_handle_switchover_start(errp);
}
- return 0;
+ error_setg(errp, "MIG_CMD 0x%x deprecated (len 0x%x)", cmd, len);
+ return -EINVAL;
}
/*
diff --git a/migration/savevm.h b/migration/savevm.h
index 2ba0881f3bd28dba10e52e16657a0b67b19bb00b..b3d1e8a13ca9a535a6990560ed2d64739ebe730e 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -65,7 +65,6 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
uint16_t len,
uint64_t *start_list,
uint64_t *length_list);
-void qemu_savevm_send_colo_enable(QEMUFile *f);
int qemu_save_device_state(QEMUFile *f, Error **errp);
int qemu_loadvm_state(QEMUFile *f, Error **errp);
void qemu_loadvm_state_cleanup(MigrationIncomingState *mis);
diff --git a/migration/trace-events b/migration/trace-events
index 90629f828f80b51500776ae2171724369e194573..60e5087e38beccb98588fdffec7deff9a7f92c88 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -37,7 +37,6 @@ savevm_send_ping(uint32_t val) "0x%x"
savevm_send_postcopy_listen(void) ""
savevm_send_postcopy_run(void) ""
savevm_send_postcopy_resume(void) ""
-savevm_send_colo_enable(void) ""
savevm_send_recv_bitmap(char *name) "%s"
savevm_send_switchover_start(void) ""
savevm_state_setup(void) ""
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 06/21] ram: Remove colo special-casing
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (4 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 05/21] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 07/21] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
` (15 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
We only enter colo state after the precopy migration is finished
so this if is always taken.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/ram.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index aebf77aa0b861e00516d6f1090aebefdd0d97e54..979751f61b30d6c4b878866b5011507e7c519176 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3116,12 +3116,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
RAMBlock *block;
int ret, max_hg_page_size;
- /* migration has already setup the bitmap, reuse it. */
- if (!migration_in_colo_state()) {
- if (ram_init_all(rsp, errp) != 0) {
- return -1;
- }
+ assert(!migration_in_colo_state());
+
+ if (ram_init_all(rsp, errp) != 0) {
+ return -1;
}
+
(*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f;
/*
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 07/21] multifd: Move ram state receive into multifd_ram_state_recv()
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (5 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 06/21] ram: Remove colo special-casing Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 08/21] multifd: Add COLO support Lukas Straub
` (14 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
This is in preparation for the next patch.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/multifd.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index ad6261688fdf98a5c7f4ee9fb80ba2901201a33e..332e6fc58053462419f3171f6c320ac37648ef7b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1253,6 +1253,15 @@ static int multifd_device_state_recv(MultiFDRecvParams *p, Error **errp)
return ret;
}
+static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
+{
+ int ret;
+
+ ret = multifd_recv_state->ops->recv(p, errp);
+
+ return ret;
+}
+
static void *multifd_recv_thread(void *opaque)
{
MigrationState *s = migrate_get_current();
@@ -1387,7 +1396,7 @@ static void *multifd_recv_thread(void *opaque)
assert(use_packets);
ret = multifd_device_state_recv(p, &local_err);
} else {
- ret = multifd_recv_state->ops->recv(p, &local_err);
+ ret = multifd_ram_state_recv(p, &local_err);
}
if (ret != 0) {
break;
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 08/21] multifd: Add COLO support
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (6 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 07/21] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 09/21] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
` (13 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian, Juan Quintela
Like in the normal ram_load() path, put the received pages into the
colo cache and mark the pages in the bitmap so that they will be
flushed to the guest later.
Multifd with COLO is useful to reduce the VM pause time during checkpointing
for latency sensitive workloads. In such workloads the worst-case latency
is especially important.
Also, this is already worth it for the precopy phase as it helps with
converging. Moreover, multifd migration is the preferred way to do migration
nowadays and this allows to use multifd compression with COLO.
Benchmark:
Cluster nodes
- Intel Xenon E5-2630 v3
- 48Gb RAM
- 10G Ethernet
Guest
- Windows Server 2016
- 6Gb RAM
- 4 cores
Workload
- Upload a file to the guest with SMB to simulate moderate
memory dirtying
- Measure the memory transfer time portion of each checkpoint
- 600ms COLO checkpoint interval
Results
Plain
idle mean: 4.50ms 99per: 10.33ms
load mean: 24.30ms 99per: 78.05ms
Multifd-4
idle mean: 6.48ms 99per: 10.41ms
load mean: 14.12ms 99per: 31.27ms
Evaluation
While multifd has slightly higher latency when the guest idles, it is
10ms faster under load and more importantly it's worst case latency is
less than 1/2 of plain under load as can be seen in the 99. Percentile.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
migration/meson.build | 2 +-
migration/multifd-colo.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
migration/multifd-colo.h | 26 ++++++++++++++++++++++++++
migration/multifd-nocomp.c | 10 +++++++++-
migration/multifd.c | 8 ++++++++
migration/multifd.h | 5 ++++-
7 files changed, 93 insertions(+), 3 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index aa8f795783ba936ce2521e08aaca8de4a112ed33..d2a1f4cc08223cb944b61e32a6d89e25bf82eacb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3872,6 +3872,7 @@ COLO Framework
M: Lukas Straub <lukasstraub2@web.de>
S: Maintained
F: migration/colo*
+F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
F: docs/COLO-FT.txt
diff --git a/migration/meson.build b/migration/meson.build
index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -39,7 +39,7 @@ system_ss.add(files(
), gnutls, zlib)
if get_option('replication').allowed()
- system_ss.add(files('colo-failover.c', 'colo.c'))
+ system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
else
system_ss.add(files('colo-stubs.c'))
endif
diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
new file mode 100644
index 0000000000000000000000000000000000000000..f160c6543414d3e157a444d613c96df4c5f0e602
--- /dev/null
+++ b/migration/multifd-colo.c
@@ -0,0 +1,44 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo implementation
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "multifd.h"
+#include "multifd-colo.h"
+#include "migration/colo.h"
+#include "system/ramblock.h"
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p)
+{
+ /*
+ * While we're still in precopy state (not yet in colo state), we copy
+ * received pages to both guest and cache. No need to set dirty bits,
+ * since guest and cache memory are in sync.
+ */
+ if (migration_incoming_in_colo_state()) {
+ colo_record_bitmap(p->block, p->normal, p->normal_num);
+ colo_record_bitmap(p->block, p->zero, p->zero_num);
+ }
+}
+
+void multifd_colo_process_recv(MultiFDRecvParams *p)
+{
+ if (!migration_incoming_in_colo_state()) {
+ for (int i = 0; i < p->normal_num; i++) {
+ void *guest = p->block->host + p->normal[i];
+ void *cache = p->host + p->normal[i];
+ memcpy(guest, cache, multifd_ram_page_size());
+ }
+ for (int i = 0; i < p->zero_num; i++) {
+ void *guest = p->block->host + p->zero[i];
+ memset(guest, 0, multifd_ram_page_size());
+ }
+ }
+}
diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
new file mode 100644
index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
--- /dev/null
+++ b/migration/multifd-colo.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo header
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
+#define QEMU_MIGRATION_MULTIFD_COLO_H
+
+#ifdef CONFIG_REPLICATION
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p);
+void multifd_colo_process_recv(MultiFDRecvParams *p);
+
+#else
+
+static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
+static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
+
+#endif
+#endif
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -16,6 +16,7 @@
#include "file.h"
#include "migration-stats.h"
#include "multifd.h"
+#include "multifd-colo.h"
#include "options.h"
#include "migration.h"
#include "qapi/error.h"
@@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
return -1;
}
- p->host = p->block->host;
for (i = 0; i < p->normal_num; i++) {
uint64_t offset = be64_to_cpu(packet->offset[i]);
@@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
p->zero[i] = offset;
}
+ if (migrate_colo()) {
+ multifd_colo_prepare_recv(p);
+ assert(p->block->colo_cache);
+ p->host = p->block->colo_cache;
+ } else {
+ p->host = p->block->host;
+ }
+
return 0;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -29,6 +29,7 @@
#include "qemu-file.h"
#include "trace.h"
#include "multifd.h"
+#include "multifd-colo.h"
#include "options.h"
#include "qemu/yank.h"
#include "io/channel-file.h"
@@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
int ret;
ret = multifd_recv_state->ops->recv(p, errp);
+ if (ret != 0) {
+ return ret;
+ }
+
+ if (migrate_colo()) {
+ multifd_colo_process_recv(p);
+ }
return ret;
}
diff --git a/migration/multifd.h b/migration/multifd.h
index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -279,7 +279,10 @@ typedef struct {
uint64_t packets_recved;
/* ramblock */
RAMBlock *block;
- /* ramblock host address */
+ /*
+ * Normally, it points to ramblock's host address. When COLO
+ * is enabled, it points to the mirror cache for the ramblock.
+ */
uint8_t *host;
/* buffers to recv */
struct iovec *iov;
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 09/21] Call colo_release_ram_cache() after multifd threads terminate
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (7 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 08/21] multifd: Add COLO support Lukas Straub
@ 2026-03-02 11:43 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 10/21] colo: Fix crash during device vmstate load Lukas Straub
` (12 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:43 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
The multifd threads still may access the colo cache, so release it
only after they terminate.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/colo.c | 3 ---
migration/migration.c | 3 +++
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/migration/colo.c b/migration/colo.c
index 97a224c39c49ff2269f375db47112458cab0b4cb..96102c9d0fb3b60e063c81ce07b730011bc4919f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -949,7 +949,4 @@ void coroutine_fn colo_incoming_co(void)
/* Wait checkpoint incoming thread exit before free resource */
qemu_thread_join(&th);
bql_lock();
-
- /* We hold the global BQL, so it is safe here */
- colo_release_ram_cache();
}
diff --git a/migration/migration.c b/migration/migration.c
index dba5d6ede579da42693d5270ede9660fb145238a..f36d42ef657bdf26d78ca642d77a9b76e1c0c174 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -454,6 +454,9 @@ void migration_incoming_state_destroy(void)
* BQL and retake unconditionally.
*/
assert(bql_locked());
+ if (migrate_colo()) {
+ colo_release_ram_cache();
+ }
qemu_loadvm_state_cleanup(mis);
if (mis->to_src_file) {
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 10/21] colo: Fix crash during device vmstate load
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (8 preceding siblings ...)
2026-03-02 11:43 ` [PATCH v11 09/21] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 11/21] colo: Hold the BQL while sending ram state Lukas Straub
` (11 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
With colo we load device vmstate during each checkpoint, on top of
a vm that was already running. Some devices expect a reset before
loading vmstate on such a previously running vm.
This fixes a crash when using COLO with Q35 machine.
The reset adds 10-20ms overhead to the checkpointing proces in my
testing.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/colo.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/migration/colo.c b/migration/colo.c
index 96102c9d0fb3b60e063c81ce07b730011bc4919f..dc7cfa81ef7db78e3ee372642de48567c5bc06eb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -729,6 +729,12 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
bql_lock();
vmstate_loading = true;
+ /*
+ * With colo we load device vmstate during each checkpoint, on top of
+ * a vm that was already running. Some devices expect a reset before
+ * loading vmstate on such a previously running vm.
+ */
+ qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
colo_flush_ram_cache();
ret = qemu_load_device_state(fb, errp);
if (ret < 0) {
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 11/21] colo: Hold the BQL while sending ram state
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (9 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 10/21] colo: Fix crash during device vmstate load Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 12/21] colo: Do not hold the BQL while receiving " Lukas Straub
` (10 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
qemu_savevm_state_complete_precopy() requires that BQL is held.
This fixes a crash when running with TCG accel.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/colo.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration/colo.c b/migration/colo.c
index dc7cfa81ef7db78e3ee372642de48567c5bc06eb..3297aa593cd9f87bf1013598464cc581a9d23531 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -455,9 +455,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
/* Note: device state is saved into buffer */
ret = qemu_save_device_state(fb, &local_err);
-
- bql_unlock();
if (ret < 0) {
+ bql_unlock();
goto out;
}
@@ -471,6 +470,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
*/
qemu_savevm_state_complete_precopy_iterable(s->to_dst_file, false);
qemu_savevm_state_end(s->to_dst_file);
+ bql_unlock();
/*
* We need the size of the VMstate data in Secondary side,
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 12/21] colo: Do not hold the BQL while receiving ram state.
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (10 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 11/21] colo: Hold the BQL while sending ram state Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 13/21] migration-test: Add COLO migration unit test Lukas Straub
` (9 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
We only receive ram into the colo cache here and don't touch anything
else, so the BQL is not needed here.
Move cpu_synchronize_all_states() downwards, before we apply the received
checkpoint. It turns out that qemu_system_reset() already calls it
for us.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/colo.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/migration/colo.c b/migration/colo.c
index 3297aa593cd9f87bf1013598464cc581a9d23531..ce02c71d8857d470be434bdf3a9cacad3baab0d5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -686,11 +686,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
return;
}
- bql_lock();
- cpu_synchronize_all_states();
ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
- bql_unlock();
-
if (ret < 0) {
return;
}
@@ -733,6 +729,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
* With colo we load device vmstate during each checkpoint, on top of
* a vm that was already running. Some devices expect a reset before
* loading vmstate on such a previously running vm.
+ *
+ * NOTE: qemu_system_reset() calls cpu_synchronize_all_states() for us
*/
qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
colo_flush_ram_cache();
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (11 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 12/21] colo: Do not hold the BQL while receiving " Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-10 14:47 ` Arun Menon
2026-03-02 11:45 ` [PATCH v11 14/21] Convert colo main documentation to restructuredText Lukas Straub
` (8 subsequent siblings)
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Add a COLO migration test for COLO migration and failover.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 1 +
tests/qtest/meson.build | 7 +-
tests/qtest/migration-test.c | 1 +
tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
tests/qtest/migration/framework.h | 5 +
5 files changed, 211 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3875,6 +3875,7 @@ F: migration/colo*
F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
F: docs/COLO-FT.txt
COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
endif
endif
+migration_colo_files = []
+if get_option('replication').allowed()
+ migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
qtests = {
'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
'migration/migration-util.c') + dbus_vmstate1,
'erst-test': files('erst-test.c'),
'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
- 'migration-test': test_migration_files + migration_tls_files,
+ 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
'pxe-test': files('boot-sector.c'),
'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
migration_test_add_precopy(env);
migration_test_add_cpr(env);
migration_test_add_misc(env);
+ migration_test_add_colo(env);
ret = g_test_run();
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ QTestState *from, *to;
+ void *data_hook = NULL;
+
+ /*
+ * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+ * open the image read/write at the same time. Using read-only=on is not
+ * possible here, because ide-hd does not support read-only backing image.
+ *
+ * So use -snapshot, where each qemu instance creates its own writable
+ * snapshot internally while leaving the real image read-only.
+ */
+ args->start.opts_source = "-snapshot";
+ args->start.opts_target = "-snapshot";
+
+ /*
+ * COLO migration code logs many errors when the migration socket
+ * is shut down, these are expected so we hide them here.
+ */
+ args->start.hide_stderr = true;
+
+ /*
+ * Test with yank with out of band capability since that is how it is
+ * used in production.
+ */
+ args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+ if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+ return -1;
+ }
+
+ migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+ if (args->start_hook) {
+ data_hook = args->start_hook(from, to);
+ }
+
+ migrate_ensure_converge(from);
+ wait_for_serial("src_serial");
+
+ migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+ wait_for_migration_status(from, "colo", NULL);
+ wait_for_resume(to, get_dst());
+
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+
+ /* wait for 3 checkpoints */
+ for (int i = 0; i < 3; i++) {
+ qtest_qmp_eventwait(to, "RESUME");
+ wait_for_serial("src_serial");
+ wait_for_serial("dest_serial");
+ }
+
+ if (failover_during_checkpoint) {
+ qtest_qmp_eventwait(to, "STOP");
+ }
+ if (primary_failover) {
+ qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("src_serial");
+ } else {
+ qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+ "'arguments': {'instances':"
+ "[{'type': 'migration'}]}}");
+ qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+ wait_for_serial("dest_serial");
+ }
+
+ if (args->end_hook) {
+ args->end_hook(from, to, data_hook);
+ }
+
+ migrate_end(from, to, !primary_failover);
+
+ return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "tcp:127.0.0.1:0";
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+ return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+ bool failover_during_checkpoint,
+ bool primary_failover)
+{
+ args->listen_uri = "defer";
+ args->start_hook = hook_start_multifd;
+ args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+ test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+ MigrateCommon *args)
+{
+ test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+ if (!env->full_set) {
+ return;
+ }
+
+ migration_test_add("/migration/colo/plain/primary_failover",
+ test_colo_plain_primary_failover);
+ migration_test_add("/migration/colo/plain/secondary_failover",
+ test_colo_plain_secondary_failover);
+
+ migration_test_add("/migration/colo/multifd/primary_failover",
+ test_colo_multifd_primary_failover);
+ migration_test_add("/migration/colo/multifd/secondary_failover",
+ test_colo_multifd_secondary_failover);
+
+ migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+ test_colo_plain_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+ test_colo_plain_secondary_failover_checkpoint);
+
+ migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+ test_colo_multifd_primary_failover_checkpoint);
+ migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+ test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
void migration_test_add_precopy(MigrationTestEnv *env);
void migration_test_add_cpr(MigrationTestEnv *env);
void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
#endif /* TEST_FRAMEWORK_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 14/21] Convert colo main documentation to restructuredText
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (12 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 13/21] migration-test: Add COLO migration unit test Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 15/21] qemu-colo.rst: Miscellaneous changes Lukas Straub
` (7 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
MAINTAINERS | 2 +-
docs/COLO-FT.txt | 334 ------------------------------------------
docs/system/index.rst | 1 +
docs/system/qemu-colo.rst | 360 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 362 insertions(+), 335 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 1b0ae10750036be00571b7104ad8426c071bb54c..32ab4b1faa29499ae9067088ae1739e52680cbb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3876,7 +3876,7 @@ F: migration/multifd-colo.*
F: include/migration/colo.h
F: include/migration/failover.h
F: tests/qtest/migration/colo-tests.c
-F: docs/COLO-FT.txt
+F: docs/system/qemu-colo.rst
COLO Proxy
M: Zhang Chen <zhangckid@gmail.com>
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
deleted file mode 100644
index 2283a09c080b8996f9767eeb415e8d4fbdc940af..0000000000000000000000000000000000000000
--- a/docs/COLO-FT.txt
+++ /dev/null
@@ -1,334 +0,0 @@
-COarse-grained LOck-stepping Virtual Machines for Non-stop Service
-----------------------------------------
-Copyright (c) 2016 Intel Corporation
-Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
-This document gives an overview of COLO's design and how to use it.
-
-== Background ==
-Virtual machine (VM) replication is a well known technique for providing
-application-agnostic software-implemented hardware fault tolerance,
-also known as "non-stop service".
-
-COLO (COarse-grained LOck-stepping) is a high availability solution.
-Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
-same request from client, and generate response in parallel too.
-If the response packets from PVM and SVM are identical, they are released
-immediately. Otherwise, a VM checkpoint (on demand) is conducted.
-
-== Architecture ==
-
-The architecture of COLO is shown in the diagram below.
-It consists of a pair of networked physical nodes:
-The primary node running the PVM, and the secondary node running the SVM
-to maintain a valid replica of the PVM.
-PVM and SVM execute in parallel and generate output of response packets for
-client requests according to the application semantics.
-
-The incoming packets from the client or external network are received by the
-primary node, and then forwarded to the secondary node, so that both the PVM
-and the SVM are stimulated with the same requests.
-
-COLO receives the outbound packets from both the PVM and SVM and compares them
-before allowing the output to be sent to clients.
-
-The SVM is qualified as a valid replica of the PVM, as long as it generates
-identical responses to all client requests. Once the differences in the outputs
-are detected between the PVM and SVM, COLO withholds transmission of the
-outbound packets until it has successfully synchronized the PVM state to the SVM.
-
- Primary Node Secondary Node
-+------------+ +-----------------------+ +------------------------+ +------------+
-| | | HeartBeat +<----->+ HeartBeat | | |
-| Primary VM | +-----------+-----------+ +-----------+------------+ |Secondary VM|
-| | | | | |
-| | +-----------|-----------+ +-----------|------------+ | |
-| | |QEMU +---v----+ | |QEMU +----v---+ | | |
-| | | |Failover| | | |Failover| | | |
-| | | +--------+ | | +--------+ | | |
-| | | +---------------+ | | +---------------+ | | |
-| | | | VM Checkpoint +-------------->+ VM Checkpoint | | | |
-| | | +---------------+ | | +---------------+ | | |
-|Requests<--------------------------\ /-----------------\ /--------------------->Requests|
-| | | ^ ^ | | | | | | |
-|Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
-| | | | | | | | | | | | | | | |
-| | | +-----------+ | | | | | | | | | | +----------+ | | |
-| | | | COLO disk | | | | | | | | | | | | COLO disk| | | |
-| | | | Manager +---------------------------->| Manager | | | |
-| | | ++----------+ v v | | | | | v v | +---------++ | | |
-| | | |+-----------+-+-+-++| | ++-+--+-+---------+ | | | |
-| | | || COLO Proxy || | | COLO Proxy | | | | |
-| | | || (compare packet || | |(adjust sequence | | | | |
-| | | ||and mirror packet)|| | | and ACK) | | | | |
-| | | |+------------+---+-+| | +-----------------+ | | | |
-+------------+ +-----------------------+ +------------------------+ +------------+
-+------------+ | | | | +------------+
-| VM Monitor | | | | | | VM Monitor |
-+------------+ | | | | +------------+
-+---------------------------------------+ +----------------------------------------+
-| Kernel | | | | | Kernel | |
-+---------------------------------------+ +----------------------------------------+
- | | | |
- +--------------v+ +---------v---+--+ +------------------+ +v-------------+
- | Storage | |External Network| | External Network | | Storage |
- +---------------+ +----------------+ +------------------+ +--------------+
-
-
-== Components introduction ==
-
-You can see there are several components in COLO's diagram of architecture.
-Their functions are described below.
-
-HeartBeat:
-Runs on both the primary and secondary nodes, to periodically check platform
-availability. When the primary node suffers a hardware fail-stop failure,
-the heartbeat stops responding, the secondary node will trigger a failover
-as soon as it determines the absence.
-
-COLO disk Manager:
-When primary VM writes data into image, the colo disk manager captures this data
-and sends it to secondary VM's which makes sure the context of secondary VM's
-image is consistent with the context of primary VM 's image.
-For more details, please refer to docs/block-replication.txt.
-
-Checkpoint/Failover Controller:
-Modifications of save/restore flow to realize continuous migration,
-to make sure the state of VM in Secondary side is always consistent with VM in
-Primary side.
-
-COLO Proxy:
-Delivers packets to Primary and Secondary, and then compare the responses from
-both side. Then decide whether to start a checkpoint according to some rules.
-Please refer to docs/colo-proxy.txt for more information.
-
-Note:
-HeartBeat has not been implemented yet, so you need to trigger failover process
-by using 'x-colo-lost-heartbeat' command.
-
-== COLO operation status ==
-
-+-----------------+
-| |
-| Start COLO |
-| |
-+--------+--------+
- |
- | Main qmp command:
- | migrate-set-capabilities with x-colo
- | migrate
- |
- v
-+--------+--------+
-| |
-| COLO running |
-| |
-+--------+--------+
- |
- | Main qmp command:
- | x-colo-lost-heartbeat
- | or
- | some error happened
- v
-+--------+--------+
-| | send qmp event:
-| COLO failover | COLO_EXIT
-| |
-+-----------------+
-
-COLO use the qmp command to switch and report operation status.
-The diagram just shows the main qmp command, you can get the detail
-in test procedure.
-
-== Test procedure ==
-Note: Here we are running both instances on the same host for testing,
-change the IP Addresses if you want to run it on two hosts. Initially
-127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
-
-== Startup qemu ==
-1. Primary:
-Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
-You don't need to change any IP's here, because 0.0.0.0 listens on any
-interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
-instance.
-
-# imagefolder="/mnt/vms/colo-test-primary"
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
- -device piix3-usb-uhci -device usb-tablet -name primary \
- -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
- -device rtl8139,id=e0,netdev=hn0 \
- -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
- -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
- -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
- -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
- -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
- -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
- -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
- -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
- -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
- -object iothread,id=iothread1 \
- -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
-outdev=compare_out0,iothread=iothread1 \
- -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
-
-2. Secondary:
-Note: Active and hidden images need to be created only once and the
-size should be the same as primary.qcow2. Again, you don't need to change
-any IP's here, except for the $primary_ip variable.
-
-# imagefolder="/mnt/vms/colo-test-secondary"
-# primary_ip=127.0.0.1
-
-# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
-
-# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
- -device piix3-usb-uhci -device usb-tablet -name secondary \
- -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
- -device rtl8139,id=e0,netdev=hn0 \
- -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
- -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
- -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
- -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
- -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
- -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
- -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
-top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
-file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
-file.backing.backing=parent0 \
- -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0=childs0 \
- -incoming tcp:0.0.0.0:9998
-
-
-3. On Secondary VM's QEMU monitor, issue command
-{"execute":"qmp_capabilities"}
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
-{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
-
-Note:
- a. The qmp command nbd-server-start and nbd-server-add must be run
- before running the qmp command migrate on primary QEMU
- b. Active disk, hidden disk and nbd target's length should be the
- same.
- c. It is better to put active disk and hidden disk in ramdisk. They
- will be merged into the parent disk on failover.
-
-4. On Primary VM's QEMU monitor, issue command:
-{"execute":"qmp_capabilities"}
-{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
-
- Note:
- a. There should be only one NBD Client for each primary disk.
- b. The qmp command line must be run after running qmp command line in
- secondary qemu.
-
-5. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
-You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
-to change the idle checkpoint period time
-
-6. Failover test
-You can kill one of the VMs and Failover on the surviving VM:
-
-If you killed the Secondary, then follow "Primary Failover". After that,
-if you want to resume the replication, follow "Primary resume replication"
-
-If you killed the Primary, then follow "Secondary Failover". After that,
-if you want to resume the replication, follow "Secondary resume replication"
-
-== Primary Failover ==
-The Secondary died, resume on the Primary
-
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
-{"execute": "object-del", "arguments":{ "id": "comp0" } }
-{"execute": "object-del", "arguments":{ "id": "iothread1" } }
-{"execute": "object-del", "arguments":{ "id": "m0" } }
-{"execute": "object-del", "arguments":{ "id": "redire0" } }
-{"execute": "object-del", "arguments":{ "id": "redire1" } }
-{"execute": "x-colo-lost-heartbeat" }
-
-== Secondary Failover ==
-The Primary died, resume on the Secondary and prepare to become the new Primary
-
-{"execute": "nbd-server-stop"}
-{"execute": "x-colo-lost-heartbeat"}
-
-{"execute": "object-del", "arguments":{ "id": "f2" } }
-{"execute": "object-del", "arguments":{ "id": "f1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red0" } }
-
-{"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
-
-== Primary resume replication ==
-Resume replication after new Secondary is up.
-
-Start the new Secondary (Steps 2 and 3 above), then on the Primary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
-
-Note:
-If this Primary previously was a Secondary, then we need to insert the
-filters before the filter-rewriter by using the
-""insert": "before", "position": "id=rew0"" Options. See below.
-
-== Secondary resume replication ==
-Become Primary and resume replication after new Secondary is up. Note
-that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
-
-Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
-then on the old Secondary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
-
-== TODO ==
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 427b020483104f6589878bbf255a367ae114c61b..6268c41aea9c74dc3e59d896b5ae082360bfbb1a 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -41,3 +41,4 @@ or Hypervisor.Framework.
igvm
vm-templating
sriov
+ qemu-colo
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
new file mode 100644
index 0000000000000000000000000000000000000000..4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2
--- /dev/null
+++ b/docs/system/qemu-colo.rst
@@ -0,0 +1,360 @@
+Qemu COLO Fault Tolerance
+=========================
+
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO's design and how to use it.
+
+Background
+----------
+Virtual machine (VM) replication is a well known technique for providing
+application-agnostic software-implemented hardware fault tolerance,
+also known as "non-stop service".
+
+COLO (COarse-grained LOck-stepping) is a high availability solution.
+Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
+same request from client, and generate response in parallel too.
+If the response packets from PVM and SVM are identical, they are released
+immediately. Otherwise, a VM checkpoint (on demand) is conducted.
+
+Architecture
+------------
+The architecture of COLO is shown in the diagram below.
+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that both the PVM
+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once the differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the SVM.
+
+Overview::
+
+ Primary Node Secondary Node
+ +------------+ +-----------------------+ +------------------------+ +------------+
+ | | | HeartBeat +<----->+ HeartBeat | | |
+ | Primary VM | +-----------+-----------+ +-----------+------------+ |Secondary VM|
+ | | | | | |
+ | | +-----------|-----------+ +-----------|------------+ | |
+ | | |QEMU +---v----+ | |QEMU +----v---+ | | |
+ | | | |Failover| | | |Failover| | | |
+ | | | +--------+ | | +--------+ | | |
+ | | | +---------------+ | | +---------------+ | | |
+ | | | | VM Checkpoint +-------------->+ VM Checkpoint | | | |
+ | | | +---------------+ | | +---------------+ | | |
+ |Requests<--------------------------\ /-----------------\ /--------------------->Requests|
+ | | | ^ ^ | | | | | | |
+ |Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
+ | | | | | | | | | | | | | | | |
+ | | | +-----------+ | | | | | | | | | | +----------+ | | |
+ | | | | COLO disk | | | | | | | | | | | | COLO disk| | | |
+ | | | | Manager +---------------------------->| Manager | | | |
+ | | | ++----------+ v v | | | | | v v | +---------++ | | |
+ | | | |+-----------+-+-+-++| | ++-+--+-+---------+ | | | |
+ | | | || COLO Proxy || | | COLO Proxy | | | | |
+ | | | || (compare packet || | |(adjust sequence | | | | |
+ | | | ||and mirror packet)|| | | and ACK) | | | | |
+ | | | |+------------+---+-+| | +-----------------+ | | | |
+ +------------+ +-----------------------+ +------------------------+ +------------+
+ +------------+ | | | | +------------+
+ | VM Monitor | | | | | | VM Monitor |
+ +------------+ | | | | +------------+
+ +---------------------------------------+ +----------------------------------------+
+ | Kernel | | | | | Kernel | |
+ +---------------------------------------+ +----------------------------------------+
+ | | | |
+ +--------------v+ +---------v---+--+ +------------------+ +v-------------+
+ | Storage | |External Network| | External Network | | Storage |
+ +---------------+ +----------------+ +------------------+ +--------------+
+
+Components introduction
+^^^^^^^^^^^^^^^^^^^^^^^
+You can see there are several components in COLO's diagram of architecture.
+Their functions are described below.
+
+HeartBeat
+~~~~~~~~~
+Runs on both the primary and secondary nodes, to periodically check platform
+availability. When the primary node suffers a hardware fail-stop failure,
+the heartbeat stops responding, the secondary node will trigger a failover
+as soon as it determines the absence.
+
+COLO disk Manager
+~~~~~~~~~~~~~~~~~
+When primary VM writes data into image, the colo disk manager captures this data
+and sends it to secondary VM's which makes sure the context of secondary VM's
+image is consistent with the context of primary VM 's image.
+For more details, please refer to docs/block-replication.txt.
+
+Checkpoint/Failover Controller
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Modifications of save/restore flow to realize continuous migration,
+to make sure the state of VM in Secondary side is always consistent with VM in
+Primary side.
+
+COLO Proxy
+~~~~~~~~~~
+Delivers packets to Primary and Secondary, and then compare the responses from
+both side. Then decide whether to start a checkpoint according to some rules.
+Please refer to docs/colo-proxy.txt for more information.
+
+Note:
+HeartBeat has not been implemented yet, so you need to trigger failover process
+by using 'x-colo-lost-heartbeat' command.
+
+COLO operation status
+^^^^^^^^^^^^^^^^^^^^^
+
+Overview::
+
+ +-----------------+
+ | |
+ | Start COLO |
+ | |
+ +--------+--------+
+ |
+ | Main qmp command:
+ | migrate-set-capabilities with x-colo
+ | migrate
+ |
+ v
+ +--------+--------+
+ | |
+ | COLO running |
+ | |
+ +--------+--------+
+ |
+ | Main qmp command:
+ | x-colo-lost-heartbeat
+ | or
+ | some error happened
+ v
+ +--------+--------+
+ | | send qmp event:
+ | COLO failover | COLO_EXIT
+ | |
+ +-----------------+
+
+
+COLO use the qmp command to switch and report operation status.
+The diagram just shows the main qmp command, you can get the detail
+in test procedure.
+
+Test procedure
+--------------
+Note: Here we are running both instances on the same host for testing,
+change the IP Addresses if you want to run it on two hosts. Initially
+``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
+
+Startup qemu
+^^^^^^^^^^^^
+**1. Primary**:
+Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+You don't need to change any IP's here, because ``0.0.0.0`` listens on any
+interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
+instance::
+
+ # imagefolder="/mnt/vms/colo-test-primary"
+
+ # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+ -device piix3-usb-uhci -device usb-tablet -name primary \
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+ -device rtl8139,id=e0,netdev=hn0 \
+ -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
+ -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
+ -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
+ -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
+ -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
+ -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
+ -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
+ -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
+ -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
+ -object iothread,id=iothread1 \
+ -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
+ outdev=compare_out0,iothread=iothread1 \
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+ children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
+
+
+**2. Secondary**:
+Note: Active and hidden images need to be created only once and the
+size should be the same as ``primary.qcow2``. Again, you don't need to change
+any IP's here, except for the ``$primary_ip`` variable::
+
+ # imagefolder="/mnt/vms/colo-test-secondary"
+ # primary_ip=127.0.0.1
+
+ # qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
+
+ # qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
+
+ # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+ -device piix3-usb-uhci -device usb-tablet -name secondary \
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+ -device rtl8139,id=e0,netdev=hn0 \
+ -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
+ -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
+ -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
+ -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
+ -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
+ -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
+ -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
+ top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
+ file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
+ file.backing.backing=parent0 \
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+ children.0=childs0 \
+ -incoming tcp:0.0.0.0:9998
+
+
+**3.** On Secondary VM's QEMU monitor, issue command::
+
+ {"execute":"qmp_capabilities"}
+ {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
+ {"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
+
+Note:
+ a. The qmp command ``nbd-server-start`` and ``nbd-server-add`` must be run
+ before running the qmp command migrate on primary QEMU
+ b. Active disk, hidden disk and nbd target's length should be the
+ same.
+ c. It is better to put active disk and hidden disk in ramdisk. They
+ will be merged into the parent disk on failover.
+
+**4.** On Primary VM's QEMU monitor, issue command::
+
+ {"execute":"qmp_capabilities"}
+ {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+ {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+ {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+ a. There should be only one NBD Client for each primary disk.
+ b. The qmp command line must be run after running qmp command line in
+ secondary qemu.
+
+**5.** After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
+You can issue command ``{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }``
+to change the idle checkpoint period time
+
+Failover test
+^^^^^^^^^^^^^
+You can kill one of the VMs and Failover on the surviving VM:
+
+If you killed the Secondary, then follow "Primary Failover".
+After that, if you want to resume the replication, follow "Primary resume replication"
+
+If you killed the Primary, then follow "Secondary Failover".
+After that, if you want to resume the replication, follow "Secondary resume replication"
+
+Primary Failover
+~~~~~~~~~~~~~~~~
+The Secondary died, resume on the Primary::
+
+ {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
+ {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+ {"execute": "object-del", "arguments":{ "id": "comp0" } }
+ {"execute": "object-del", "arguments":{ "id": "iothread1" } }
+ {"execute": "object-del", "arguments":{ "id": "m0" } }
+ {"execute": "object-del", "arguments":{ "id": "redire0" } }
+ {"execute": "object-del", "arguments":{ "id": "redire1" } }
+ {"execute": "x-colo-lost-heartbeat" }
+
+Secondary Failover
+~~~~~~~~~~~~~~~~~~
+The Primary died, resume on the Secondary and prepare to become the new Primary::
+
+ {"execute": "nbd-server-stop"}
+ {"execute": "x-colo-lost-heartbeat"}
+
+ {"execute": "object-del", "arguments":{ "id": "f2" } }
+ {"execute": "object-del", "arguments":{ "id": "f1" } }
+ {"execute": "chardev-remove", "arguments":{ "id": "red1" } }
+ {"execute": "chardev-remove", "arguments":{ "id": "red0" } }
+
+ {"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
+ {"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
+ {"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
+ {"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
+ {"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
+ {"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
+
+Primary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Resume replication after new Secondary is up.
+
+Start the new Secondary (Steps 2 and 3 above), then on the Primary::
+
+ {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+ {"execute": "stop"}
+ {"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
+
+ {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+ {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+ {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+If this Primary previously was a Secondary, then we need to insert the
+filters before the filter-rewriter by using the
+""insert": "before", "position": "id=rew0"" Options. See below.
+
+Secondary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Become Primary and resume replication after new Secondary is up. Note
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
+
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
+then on the old Secondary::
+
+ {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+ {"execute": "stop"}
+ {"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
+
+ {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
+ {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+ {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+ {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
+
+TODO
+----
+1. Support shared storage.
+2. Develop the heartbeat part.
+3. Reduce checkpoint VM’s downtime while doing checkpoint.
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 15/21] qemu-colo.rst: Miscellaneous changes
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (13 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 14/21] Convert colo main documentation to restructuredText Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 16/21] qemu-colo.rst: Add my copyright Lukas Straub
` (6 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
docs/system/qemu-colo.rst | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2..a70e61aa09391cda933031535fa982d27cf6654b 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -1,13 +1,6 @@
Qemu COLO Fault Tolerance
=========================
-| Copyright (c) 2016 Intel Corporation
-| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-| Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
This document gives an overview of COLO's design and how to use it.
Background
@@ -82,8 +75,8 @@ Overview::
| Storage | |External Network| | External Network | | Storage |
+---------------+ +----------------+ +------------------+ +--------------+
-Components introduction
-^^^^^^^^^^^^^^^^^^^^^^^
+Components
+^^^^^^^^^^
You can see there are several components in COLO's diagram of architecture.
Their functions are described below.
@@ -157,14 +150,21 @@ in test procedure.
Test procedure
--------------
-Note: Here we are running both instances on the same host for testing,
+
+Setup
+^^^^^
+
+Here we are running both instances on the same host for testing,
change the IP Addresses if you want to run it on two hosts. Initially
``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
+COLO uses double the guest ram size on the secondary side. The Qemu version
+should be the same on both hosts.
+
Startup qemu
^^^^^^^^^^^^
**1. Primary**:
-Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
You don't need to change any IP's here, because ``0.0.0.0`` listens on any
interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
instance::
@@ -192,7 +192,7 @@ instance::
**2. Secondary**:
-Note: Active and hidden images need to be created only once and the
+Active and hidden images need to be created only once and the
size should be the same as ``primary.qcow2``. Again, you don't need to change
any IP's here, except for the ``$primary_ip`` variable::
@@ -353,8 +353,9 @@ Wait until disk is synced, then::
{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
-TODO
-----
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 16/21] qemu-colo.rst: Add my copyright
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (14 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 15/21] qemu-colo.rst: Miscellaneous changes Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 17/21] qemu-colo.rst: Simplify the block replication setup Lukas Straub
` (5 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
I have so far contributed 61 commits to the colo project, waranting
the addition of my copyright to this file.
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
docs/system/qemu-colo.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index a70e61aa09391cda933031535fa982d27cf6654b..75abbd80298df79223cb8e70064a5dc83d70f4eb 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -356,6 +356,7 @@ Wait until disk is synced, then::
| Copyright (c) 2016 Intel Corporation
| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
| Copyright (c) 2016 Fujitsu, Corp.
+| Copyright (c) 2026 Lukas Straub <lukasstraub2@web.de>
This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 17/21] qemu-colo.rst: Simplify the block replication setup
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (15 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 16/21] qemu-colo.rst: Add my copyright Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 18/21] multifd: Fix hang if send thread errors during sync Lukas Straub
` (4 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
On the primary side we don't actually need the replication
block driver, since it only passes trough all IO.
So simplify the setup and also use 'blockdev-add' instead of
'human-monitor-command'.
This is how my clients use colo in production.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
docs/system/qemu-colo.rst | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 75abbd80298df79223cb8e70064a5dc83d70f4eb..f7d3b6439cf3401a58c412634239d1a43999a10e 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -240,8 +240,8 @@ Note:
**4.** On Primary VM's QEMU monitor, issue command::
{"execute":"qmp_capabilities"}
- {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
- {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+ {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+ {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "nbd0" } }
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
@@ -269,7 +269,7 @@ Primary Failover
The Secondary died, resume on the Primary::
{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
- {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+ {"execute": "blockdev-del", "arguments": {"node-name": "nbd0"} }
{"execute": "object-del", "arguments":{ "id": "comp0" } }
{"execute": "object-del", "arguments":{ "id": "iothread1" } }
{"execute": "object-del", "arguments":{ "id": "m0" } }
@@ -309,8 +309,8 @@ Wait until disk is synced, then::
{"execute": "stop"}
{"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
- {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
- {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+ {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+ {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
@@ -341,8 +341,8 @@ Wait until disk is synced, then::
{"execute": "stop"}
{"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
- {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
- {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+ {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.1", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+ {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 18/21] multifd: Fix hang if send thread errors during sync
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (16 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 17/21] qemu-colo.rst: Simplify the block replication setup Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover() Lukas Straub
` (3 subsequent siblings)
21 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
When a send thread encounters an error (as is the case with yank),
it sets multifd_send_state->exiting and the other threads exit too.
This races with multifd_send_sync_main() which now hangs at
qemu_sem_wait(&p->sem_sync) in multifd_send_sync_main() line 647
as it waits for threads that have exited.
Fix this by kicking the semaphores when exiting the send threads.
I encountered this hang when stress testing the colo unit test,
though I was unable to write a migration test to reliably hit this.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/multifd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 220ed8564960fdabc58e4baa069dd252c8ad293c..7762aab8e0702672d3730f27e9c9ee3b86500f0c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -772,9 +772,14 @@ out:
assert(local_err);
trace_multifd_send_error(p->id);
multifd_send_error_propagate(local_err);
- multifd_send_kick_main(p);
}
+ /*
+ * Always kick the main thread: The main thread might wait on this thread
+ * while another thread encounters an error and signals this thread to exit.
+ */
+ multifd_send_kick_main(p);
+
rcu_unregister_thread();
trace_multifd_send_thread_end(p->id, p->packets_sent);
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover()
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (17 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 18/21] multifd: Fix hang if send thread errors during sync Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 20:06 ` Peter Xu
2026-03-02 11:45 ` [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends Lukas Straub
` (2 subsequent siblings)
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Take the file lock since s->to_dst_file and s->rp_state.from_dst_file
may be changed in the migration thread.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/colo.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/migration/colo.c b/migration/colo.c
index ce02c71d8857d470be434bdf3a9cacad3baab0d5..dc47d03874039b686d2a4072ac4e6c77e4ff1f87 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
* The s->rp_state.from_dst_file and s->to_dst_file may use the
* same fd, but we still shutdown the fd for twice, it is harmless.
*/
- if (s->to_dst_file) {
- qemu_file_shutdown(s->to_dst_file);
- }
- if (s->rp_state.from_dst_file) {
- qemu_file_shutdown(s->rp_state.from_dst_file);
+ WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+ if (s->to_dst_file) {
+ qemu_file_shutdown(s->to_dst_file);
+ }
+ if (s->rp_state.from_dst_file) {
+ qemu_file_shutdown(s->rp_state.from_dst_file);
+ }
}
old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (18 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover() Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 20:07 ` Peter Xu
2026-03-02 11:45 ` [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side Lukas Straub
2026-03-06 12:53 ` [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
This will be used in the next patch to allow COLO to reuse the return path.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
migration/migration.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..74f5c314ff62333f6c9dd8bbd09afdfaf84a41c2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -97,7 +97,8 @@ static GSList *migration_blockers[MIG_MODE__MAX];
static bool migration_object_check(MigrationState *ms, Error **errp);
static bool migration_switchover_start(MigrationState *s, Error **errp);
-static bool close_return_path_on_source(MigrationState *s);
+static bool stop_return_path_thread_on_source(MigrationState *s);
+static void migration_release_dst_files(MigrationState *ms);
static void migration_completion_end(MigrationState *s);
static void migration_downtime_start(MigrationState *s)
@@ -1278,7 +1279,8 @@ static void migration_cleanup(MigrationState *s)
cpr_state_close();
cpr_transfer_source_destroy(s);
- close_return_path_on_source(s);
+ stop_return_path_thread_on_source(s);
+ migration_release_dst_files(s);
if (s->migration_thread_running) {
bql_unlock();
@@ -2209,6 +2211,7 @@ static void migration_release_dst_files(MigrationState *ms)
* locking needed because this qemufile should only be managed by
* return path thread.
*/
+ assert(!ms->rp_state.rp_thread_created);
if (ms->postcopy_qemufile_src) {
migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
qemu_file_shutdown(ms->postcopy_qemufile_src);
@@ -2216,7 +2219,9 @@ static void migration_release_dst_files(MigrationState *ms)
ms->postcopy_qemufile_src = NULL;
}
- qemu_fclose(file);
+ if (file) {
+ qemu_fclose(file);
+ }
}
/*
@@ -2402,7 +2407,7 @@ static void open_return_path_on_source(MigrationState *ms)
}
/* Return true if error detected, or false otherwise */
-static bool close_return_path_on_source(MigrationState *ms)
+static bool stop_return_path_thread_on_source(MigrationState *ms)
{
if (!ms->rp_state.rp_thread_created) {
return false;
@@ -2424,7 +2429,6 @@ static bool close_return_path_on_source(MigrationState *ms)
qemu_thread_join(&ms->rp_state.rp_thread);
ms->rp_state.rp_thread_created = false;
- migration_release_dst_files(ms);
trace_migration_return_path_end_after();
/* Return path will persist the error in MigrationState when quit */
@@ -2787,7 +2791,7 @@ static void migration_completion(MigrationState *s)
goto fail;
}
- if (close_return_path_on_source(s)) {
+ if (stop_return_path_thread_on_source(s)) {
goto fail;
}
@@ -2941,7 +2945,8 @@ static MigThrError postcopy_pause(MigrationState *s)
* path and just wait for the thread to finish. It will be
* re-created when we resume.
*/
- close_return_path_on_source(s);
+ stop_return_path_thread_on_source(s);
+ migration_release_dst_files(s);
/*
* Current channel is possibly broken. Release it. Note that this is
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (19 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends Lukas Straub
@ 2026-03-02 11:45 ` Lukas Straub
2026-03-02 20:11 ` Peter Xu
2026-03-06 12:53 ` [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-02 11:45 UTC (permalink / raw)
To: qemu-devel
Cc: Lukas Straub, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Use the return-path capability with colo and reuse the opened return path
file on both primary and secondary side.
This fixes a crash in colo where migration_cancel() races with colo closing
s->rp_state.from_dst_file.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
docs/system/qemu-colo.rst | 4 ++--
migration/colo.c | 26 +++++++++-----------------
migration/options.c | 10 +++++++++-
tests/qtest/migration/colo-tests.c | 1 +
tests/qtest/migration/framework.c | 13 +++++++++++++
5 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index f7d3b6439cf3401a58c412634239d1a43999a10e..9ec3eb79b006a24e0d3a360f1c8f49a68f4d8086 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -225,7 +225,7 @@ any IP's here, except for the ``$primary_ip`` variable::
**3.** On Secondary VM's QEMU monitor, issue command::
{"execute":"qmp_capabilities"}
- {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "return-path", "state": true }, {"capability": "x-colo", "state": true } ] } }
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
@@ -242,7 +242,7 @@ Note:
{"execute":"qmp_capabilities"}
{"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "nbd0" } }
- {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+ {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "return-path", "state": true }, {"capability": "x-colo", "state": true } ] } }
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
Note:
diff --git a/migration/colo.c b/migration/colo.c
index dc47d03874039b686d2a4072ac4e6c77e4ff1f87..2d36f933cf155c1084565162294a61d32e28fe86 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -539,6 +539,8 @@ static void colo_process_checkpoint(MigrationState *s)
Error *local_err = NULL;
int ret;
+ assert(s->rp_state.from_dst_file);
+ assert(!s->rp_state.rp_thread_created);
if (get_colo_mode() != COLO_MODE_PRIMARY) {
error_report("COLO mode must be COLO_MODE_PRIMARY");
return;
@@ -546,12 +548,6 @@ static void colo_process_checkpoint(MigrationState *s)
failover_init_state();
- s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
- if (!s->rp_state.from_dst_file) {
- error_report("Open QEMUFile from_dst_file failed");
- goto out;
- }
-
packets_compare_notifier.notify = colo_compare_notify_checkpoint;
colo_compare_register_notifier(&packets_compare_notifier);
@@ -636,16 +632,6 @@ out:
colo_compare_unregister_notifier(&packets_compare_notifier);
timer_free(s->colo_delay_timer);
qemu_event_destroy(&s->colo_checkpoint_event);
-
- /*
- * Must be called after failover BH is completed,
- * Or the failover BH may shutdown the wrong fd that
- * re-used by other threads after we release here.
- */
- if (s->rp_state.from_dst_file) {
- qemu_fclose(s->rp_state.from_dst_file);
- s->rp_state.from_dst_file = NULL;
- }
}
void migrate_start_colo_process(MigrationState *s)
@@ -838,6 +824,7 @@ static void *colo_process_incoming_thread(void *opaque)
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COLO);
+ assert(mis->to_src_file);
if (get_colo_mode() != COLO_MODE_SECONDARY) {
error_report("COLO mode must be COLO_MODE_SECONDARY");
return NULL;
@@ -854,7 +841,6 @@ static void *colo_process_incoming_thread(void *opaque)
failover_init_state();
- mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
/*
* Note: the communication between Primary side and Secondary side
* should be sequential, we set the fd to unblocked in migration incoming
@@ -866,6 +852,12 @@ static void *colo_process_incoming_thread(void *opaque)
goto out;
}
+ /*
+ * rp thread still running on primary side, shut it down to go into
+ * colo state.
+ */
+ migrate_send_rp_shut(mis, 0);
+
colo_incoming_start_dirty_log();
bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
diff --git a/migration/options.c b/migration/options.c
index 1ffe85a2d8ccc8de9cc04d3c12293538a367abf7..f33b2979290f30ecfd0ffc4654498f25750f6720 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -575,7 +575,15 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
ERRP_GUARD();
MigrationIncomingState *mis = migration_incoming_get_current();
-#ifndef CONFIG_REPLICATION
+#ifdef CONFIG_REPLICATION
+ if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
+ if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+ error_setg(errp, "Capability 'x-colo' requires capability "
+ "'return-path'");
+ return false;
+ }
+ }
+#else
if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
error_setg(errp, "QEMU compiled without replication module"
" can't enable COLO");
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
index 598a1d3821ed0a90318732702027cebad47352fd..366b9f7ac6d56a6e74a29d3e062124ad4234608e 100644
--- a/tests/qtest/migration/colo-tests.c
+++ b/tests/qtest/migration/colo-tests.c
@@ -46,6 +46,7 @@ static int test_colo_common(MigrateCommon *args,
* used in production.
*/
args->start.oob = true;
+ args->start.caps[MIGRATION_CAPABILITY_RETURN_PATH] = true;
args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 2a3efeb80740cbfe876f649f0af685c5839e00a2..0bfc241914462c023ceab2fab0386009dc5a4320 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -216,6 +216,19 @@ static void migrate_start_set_capabilities(QTestState *from, QTestState *to,
* MigrationCapability_lookup and MIGRATION_CAPABILITY_ constants
* are from qapi-types-migration.h.
*/
+
+ /*
+ * Enable return path first, since other features depend on it.
+ */
+ if (args->caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+ if (from) {
+ migrate_set_capability(from, "return-path", true);
+ }
+ if (to) {
+ migrate_set_capability(to, "return-path", true);
+ }
+ }
+
for (uint8_t i = 0; i < MIGRATION_CAPABILITY__MAX; i++) {
if (!args->caps[i]) {
continue;
--
2.39.5
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover()
2026-03-02 11:45 ` [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover() Lukas Straub
@ 2026-03-02 20:06 ` Peter Xu
0 siblings, 0 replies; 36+ messages in thread
From: Peter Xu @ 2026-03-02 20:06 UTC (permalink / raw)
To: Lukas Straub
Cc: qemu-devel, Fabiano Rosas, Zhang Chen, Hailiang Zhang, Li Zhijian
On Mon, Mar 02, 2026 at 12:45:34PM +0100, Lukas Straub wrote:
> Take the file lock since s->to_dst_file and s->rp_state.from_dst_file
> may be changed in the migration thread.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends
2026-03-02 11:45 ` [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends Lukas Straub
@ 2026-03-02 20:07 ` Peter Xu
0 siblings, 0 replies; 36+ messages in thread
From: Peter Xu @ 2026-03-02 20:07 UTC (permalink / raw)
To: Lukas Straub
Cc: qemu-devel, Fabiano Rosas, Zhang Chen, Hailiang Zhang, Li Zhijian
On Mon, Mar 02, 2026 at 12:45:35PM +0100, Lukas Straub wrote:
> This will be used in the next patch to allow COLO to reuse the return path.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side
2026-03-02 11:45 ` [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side Lukas Straub
@ 2026-03-02 20:11 ` Peter Xu
0 siblings, 0 replies; 36+ messages in thread
From: Peter Xu @ 2026-03-02 20:11 UTC (permalink / raw)
To: Lukas Straub
Cc: qemu-devel, Fabiano Rosas, Zhang Chen, Hailiang Zhang, Li Zhijian
On Mon, Mar 02, 2026 at 12:45:36PM +0100, Lukas Straub wrote:
> Use the return-path capability with colo and reuse the opened return path
> file on both primary and secondary side.
>
> This fixes a crash in colo where migration_cancel() races with colo closing
> s->rp_state.from_dst_file.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Yeah, I think this looks more consistent, thanks.
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
` (20 preceding siblings ...)
2026-03-02 11:45 ` [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side Lukas Straub
@ 2026-03-06 12:53 ` Lukas Straub
2026-03-06 13:07 ` Fabiano Rosas
21 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-06 12:53 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang, Li Zhijian,
Juan Quintela
[-- Attachment #1: Type: text/plain, Size: 5344 bytes --]
On Mon, 2 Mar 2026 12:43:30 +0100
Lukas Straub <lukasstraub2@web.de> wrote:
>
> Hello everyone,
> This has some cleanups for and adds multifd support and migration unit tests
> for COLO migration.
>
> Regards,
> Lukas
Hello Peter, Hello Fabiano,
Will you apply this?
Best regards,
Lukas Straub
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
> Changes in v11:
> - Use colo with return-path capability
> - Link to v10: https://lore.kernel.org/qemu-devel/20260220-colo_unit_test_multifd-v10-0-bfe67d422ef1@web.de
>
> Changes in v10:
> - multifd: always kick the main thread
> - always open the return path socket on source
> - Link to v9: https://lore.kernel.org/qemu-devel/20260218-colo_unit_test_multifd-v9-0-d8dbdb0ca6f6@web.de
>
> Changes in v9:
> - Rebase onto master
> - Fix two rare bugs discovered during sresstesting the colo unit test
> - Link to v8: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v8-0-7f9e5f7d082b@web.de
>
> Changes in v8:
> - Fix peter's review comments
> - Link to v7: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v7-0-23bd32f36828@web.de
>
> Changes in v7:
> - Fix peter's review comments
> - Link to v6: https://lore.kernel.org/qemu-devel/20260206-colo_unit_test_multifd-v6-0-27779dda139d@web.de
>
> Changes in v6:
> - Fix the crash when running COLO with TCG accel.
> - Link to v5: https://lore.kernel.org/qemu-devel/20260203-colo_unit_test_multifd-v5-0-57508b7389f6@web.de
>
> Changes in v5:
> - Remove unused inmports from multifd-colo.c
> - Mention the checkpoint overhead of reset to the Q35 fix
> - Link to v4: https://lore.kernel.org/qemu-devel/20260130-colo_unit_test_multifd-v4-0-7115ab6f0e77@web.de
>
> Changes in v4:
> - Add cleanup patches to remove migration_incoming_colo_enabled() and MIG_CMD_ENABLE_COLO
> - Add more comments to the colo unit test
> - Call colo_release_ram_cache() after multifd threads terminate
> - Link to v3: https://lore.kernel.org/qemu-devel/20260125-colo_unit_test_multifd-v3-0-ae926ccd8eae@web.de
>
> Changes in v3:
> - Fix peter's review comments.
> - Fix COLO with Q35 machine
> - Link to v2: https://lore.kernel.org/qemu-devel/20260117-colo_unit_test_multifd-v2-0-ab521777fa51@web.de
>
> Changes in v2:
> - Fix review comments
> - Hide stderr in colo migration test since the logged errors are expected
> - Add benchmarking data for multifd
> - Add myself as maintainer for COLO migration framework
> - Link to v1: https://lore.kernel.org/qemu-devel/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
>
> ---
> Lukas Straub (21):
> MAINTAINERS: Add myself as maintainer for COLO migration framework
> MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
> colo: Setup ram cache in normal migration path
> colo: Replace migration_incoming_colo_enabled() with migrate_colo()
> colo: Remove ENABLE_COLO savevm command and mark it as deprecated
> ram: Remove colo special-casing
> multifd: Move ram state receive into multifd_ram_state_recv()
> multifd: Add COLO support
> Call colo_release_ram_cache() after multifd threads terminate
> colo: Fix crash during device vmstate load
> colo: Hold the BQL while sending ram state
> colo: Do not hold the BQL while receiving ram state.
> migration-test: Add COLO migration unit test
> Convert colo main documentation to restructuredText
> qemu-colo.rst: Miscellaneous changes
> qemu-colo.rst: Add my copyright
> qemu-colo.rst: Simplify the block replication setup
> multifd: Fix hang if send thread errors during sync
> colo: Use file lock in primary_vm_do_failover()
> migration: Keep s->rp_state.from_dst_file open until migration ends
> colo: Reuse the return path from migration on primary and secondary side
>
> MAINTAINERS | 6 +-
> docs/COLO-FT.txt | 334 ----------------------------------
> docs/system/index.rst | 1 +
> docs/system/qemu-colo.rst | 362 +++++++++++++++++++++++++++++++++++++
> include/migration/colo.h | 3 -
> migration/colo.c | 59 +++---
> migration/meson.build | 2 +-
> migration/migration.c | 80 ++++----
> migration/multifd-colo.c | 44 +++++
> migration/multifd-colo.h | 26 +++
> migration/multifd-nocomp.c | 10 +-
> migration/multifd.c | 26 ++-
> migration/multifd.h | 5 +-
> migration/options.c | 10 +-
> migration/ram.c | 12 +-
> migration/savevm.c | 37 +---
> migration/savevm.h | 1 -
> migration/trace-events | 1 -
> tests/qtest/meson.build | 7 +-
> tests/qtest/migration-test.c | 1 +
> tests/qtest/migration/colo-tests.c | 199 ++++++++++++++++++++
> tests/qtest/migration/framework.c | 13 ++
> tests/qtest/migration/framework.h | 5 +
> 23 files changed, 777 insertions(+), 467 deletions(-)
> ---
> base-commit: d8a9d97317d03190b34498741f98f22e2a9afe3e
> change-id: 20251230-colo_unit_test_multifd-8bf58dcebd46
>
> Best regards,
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test
2026-03-06 12:53 ` [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
@ 2026-03-06 13:07 ` Fabiano Rosas
0 siblings, 0 replies; 36+ messages in thread
From: Fabiano Rosas @ 2026-03-06 13:07 UTC (permalink / raw)
To: Lukas Straub, qemu-devel
Cc: Peter Xu, Zhang Chen, Hailiang Zhang, Li Zhijian, Juan Quintela
Lukas Straub <lukasstraub2@web.de> writes:
> On Mon, 2 Mar 2026 12:43:30 +0100
> Lukas Straub <lukasstraub2@web.de> wrote:
>
>>
>> Hello everyone,
>> This has some cleanups for and adds multifd support and migration unit tests
>> for COLO migration.
>>
>> Regards,
>> Lukas
>
> Hello Peter, Hello Fabiano,
> Will you apply this?
>
Very likely yes. I'm starting the tests on migration-next, if all goes
well I'll send the pull request later today.
> Best regards,
> Lukas Straub
>
>>
>> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> ---
>> Changes in v11:
>> - Use colo with return-path capability
>> - Link to v10: https://lore.kernel.org/qemu-devel/20260220-colo_unit_test_multifd-v10-0-bfe67d422ef1@web.de
>>
>> Changes in v10:
>> - multifd: always kick the main thread
>> - always open the return path socket on source
>> - Link to v9: https://lore.kernel.org/qemu-devel/20260218-colo_unit_test_multifd-v9-0-d8dbdb0ca6f6@web.de
>>
>> Changes in v9:
>> - Rebase onto master
>> - Fix two rare bugs discovered during sresstesting the colo unit test
>> - Link to v8: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v8-0-7f9e5f7d082b@web.de
>>
>> Changes in v8:
>> - Fix peter's review comments
>> - Link to v7: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v7-0-23bd32f36828@web.de
>>
>> Changes in v7:
>> - Fix peter's review comments
>> - Link to v6: https://lore.kernel.org/qemu-devel/20260206-colo_unit_test_multifd-v6-0-27779dda139d@web.de
>>
>> Changes in v6:
>> - Fix the crash when running COLO with TCG accel.
>> - Link to v5: https://lore.kernel.org/qemu-devel/20260203-colo_unit_test_multifd-v5-0-57508b7389f6@web.de
>>
>> Changes in v5:
>> - Remove unused inmports from multifd-colo.c
>> - Mention the checkpoint overhead of reset to the Q35 fix
>> - Link to v4: https://lore.kernel.org/qemu-devel/20260130-colo_unit_test_multifd-v4-0-7115ab6f0e77@web.de
>>
>> Changes in v4:
>> - Add cleanup patches to remove migration_incoming_colo_enabled() and MIG_CMD_ENABLE_COLO
>> - Add more comments to the colo unit test
>> - Call colo_release_ram_cache() after multifd threads terminate
>> - Link to v3: https://lore.kernel.org/qemu-devel/20260125-colo_unit_test_multifd-v3-0-ae926ccd8eae@web.de
>>
>> Changes in v3:
>> - Fix peter's review comments.
>> - Fix COLO with Q35 machine
>> - Link to v2: https://lore.kernel.org/qemu-devel/20260117-colo_unit_test_multifd-v2-0-ab521777fa51@web.de
>>
>> Changes in v2:
>> - Fix review comments
>> - Hide stderr in colo migration test since the logged errors are expected
>> - Add benchmarking data for multifd
>> - Add myself as maintainer for COLO migration framework
>> - Link to v1: https://lore.kernel.org/qemu-devel/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
>>
>> ---
>> Lukas Straub (21):
>> MAINTAINERS: Add myself as maintainer for COLO migration framework
>> MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
>> colo: Setup ram cache in normal migration path
>> colo: Replace migration_incoming_colo_enabled() with migrate_colo()
>> colo: Remove ENABLE_COLO savevm command and mark it as deprecated
>> ram: Remove colo special-casing
>> multifd: Move ram state receive into multifd_ram_state_recv()
>> multifd: Add COLO support
>> Call colo_release_ram_cache() after multifd threads terminate
>> colo: Fix crash during device vmstate load
>> colo: Hold the BQL while sending ram state
>> colo: Do not hold the BQL while receiving ram state.
>> migration-test: Add COLO migration unit test
>> Convert colo main documentation to restructuredText
>> qemu-colo.rst: Miscellaneous changes
>> qemu-colo.rst: Add my copyright
>> qemu-colo.rst: Simplify the block replication setup
>> multifd: Fix hang if send thread errors during sync
>> colo: Use file lock in primary_vm_do_failover()
>> migration: Keep s->rp_state.from_dst_file open until migration ends
>> colo: Reuse the return path from migration on primary and secondary side
>>
>> MAINTAINERS | 6 +-
>> docs/COLO-FT.txt | 334 ----------------------------------
>> docs/system/index.rst | 1 +
>> docs/system/qemu-colo.rst | 362 +++++++++++++++++++++++++++++++++++++
>> include/migration/colo.h | 3 -
>> migration/colo.c | 59 +++---
>> migration/meson.build | 2 +-
>> migration/migration.c | 80 ++++----
>> migration/multifd-colo.c | 44 +++++
>> migration/multifd-colo.h | 26 +++
>> migration/multifd-nocomp.c | 10 +-
>> migration/multifd.c | 26 ++-
>> migration/multifd.h | 5 +-
>> migration/options.c | 10 +-
>> migration/ram.c | 12 +-
>> migration/savevm.c | 37 +---
>> migration/savevm.h | 1 -
>> migration/trace-events | 1 -
>> tests/qtest/meson.build | 7 +-
>> tests/qtest/migration-test.c | 1 +
>> tests/qtest/migration/colo-tests.c | 199 ++++++++++++++++++++
>> tests/qtest/migration/framework.c | 13 ++
>> tests/qtest/migration/framework.h | 5 +
>> 23 files changed, 777 insertions(+), 467 deletions(-)
>> ---
>> base-commit: d8a9d97317d03190b34498741f98f22e2a9afe3e
>> change-id: 20251230-colo_unit_test_multifd-8bf58dcebd46
>>
>> Best regards,
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
2026-03-02 11:43 ` [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
@ 2026-03-10 11:33 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-10 11:33 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang, Li Zhijian
[-- Attachment #1: Type: text/plain, Size: 1086 bytes --]
On Mon, 2 Mar 2026 12:43:32 +0100
Lukas Straub <lukasstraub2@web.de> wrote:
> His last email to the mailing list is from December 2021:
> https://lore.kernel.org/qemu-devel/20211214075424.6920-1-zhanghailiang@xfusion.com/
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Zhang Chen <zhangckid@gmail.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
> MAINTAINERS | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff592384b366acc823f19a6596ea8f32446763b2..aa8f795783ba936ce2521e08aaca8de4a112ed33 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3869,7 +3869,6 @@ F: include/qemu/yank.h
> F: qapi/yank.json
>
> COLO Framework
> -M: Hailiang Zhang <zhanghailiang@xfusion.com>
> M: Lukas Straub <lukasstraub2@web.de>
> S: Maintained
> F: migration/colo*
>
Thank you Hailiang for the work you put into Qemu and the COLO
project in particular, without you this project would not have been
possible. It always has been a pleasure working with you.
Best regards,
Lukas Straub
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-02 11:45 ` [PATCH v11 13/21] migration-test: Add COLO migration unit test Lukas Straub
@ 2026-03-10 14:47 ` Arun Menon
2026-03-10 15:29 ` Lukas Straub
0 siblings, 1 reply; 36+ messages in thread
From: Arun Menon @ 2026-03-10 14:47 UTC (permalink / raw)
To: Lukas Straub
Cc: qemu-devel, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
Hi Lukas,
On Mon, Mar 02, 2026 at 12:45:28PM +0100, Lukas Straub wrote:
> Add a COLO migration test for COLO migration and failover.
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Tested-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
> MAINTAINERS | 1 +
> tests/qtest/meson.build | 7 +-
> tests/qtest/migration-test.c | 1 +
> tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
> tests/qtest/migration/framework.h | 5 +
> 5 files changed, 211 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3875,6 +3875,7 @@ F: migration/colo*
> F: migration/multifd-colo.*
> F: include/migration/colo.h
> F: include/migration/failover.h
> +F: tests/qtest/migration/colo-tests.c
> F: docs/COLO-FT.txt
>
> COLO Proxy
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -374,6 +374,11 @@ if gnutls.found()
> endif
> endif
>
> +migration_colo_files = []
> +if get_option('replication').allowed()
> + migration_colo_files = [files('migration/colo-tests.c')]
> +endif
> +
> qtests = {
> 'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
> 'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
> @@ -385,7 +390,7 @@ qtests = {
> 'migration/migration-util.c') + dbus_vmstate1,
> 'erst-test': files('erst-test.c'),
> 'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
> - 'migration-test': test_migration_files + migration_tls_files,
> + 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
> 'pxe-test': files('boot-sector.c'),
> 'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
> 'pnv-xive2-nvpg_bar.c'),
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -55,6 +55,7 @@ int main(int argc, char **argv)
> migration_test_add_precopy(env);
> migration_test_add_cpr(env);
> migration_test_add_misc(env);
> + migration_test_add_colo(env);
>
> ret = g_test_run();
>
> diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
> --- /dev/null
> +++ b/tests/qtest/migration/colo-tests.c
> @@ -0,0 +1,198 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * QTest testcases for COLO migration
> + *
> + * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest.h"
> +#include "migration/framework.h"
> +#include "migration/migration-qmp.h"
> +#include "migration/migration-util.h"
> +#include "qemu/module.h"
> +
> +static int test_colo_common(MigrateCommon *args,
> + bool failover_during_checkpoint,
> + bool primary_failover)
> +{
> + QTestState *from, *to;
> + void *data_hook = NULL;
> +
> + /*
> + * For the COLO test, both VMs will run in parallel. Thus both VMs want to
> + * open the image read/write at the same time. Using read-only=on is not
> + * possible here, because ide-hd does not support read-only backing image.
> + *
> + * So use -snapshot, where each qemu instance creates its own writable
> + * snapshot internally while leaving the real image read-only.
> + */
> + args->start.opts_source = "-snapshot";
> + args->start.opts_target = "-snapshot";
> +
> + /*
> + * COLO migration code logs many errors when the migration socket
> + * is shut down, these are expected so we hide them here.
> + */
> + args->start.hide_stderr = true;
> +
> + /*
> + * Test with yank with out of band capability since that is how it is
> + * used in production.
> + */
> + args->start.oob = true;
> + args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
> +
> + if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
> + return -1;
> + }
> +
> + migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
> +
> + if (args->start_hook) {
> + data_hook = args->start_hook(from, to);
> + }
> +
> + migrate_ensure_converge(from);
> + wait_for_serial("src_serial");
> +
> + migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> +
> + wait_for_migration_status(from, "colo", NULL);
> + wait_for_resume(to, get_dst());
> +
> + wait_for_serial("src_serial");
> + wait_for_serial("dest_serial");
> +
> + /* wait for 3 checkpoints */
> + for (int i = 0; i < 3; i++) {
> + qtest_qmp_eventwait(to, "RESUME");
> + wait_for_serial("src_serial");
> + wait_for_serial("dest_serial");
> + }
> +
> + if (failover_during_checkpoint) {
> + qtest_qmp_eventwait(to, "STOP");
> + }
> + if (primary_failover) {
> + qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> + "'arguments': {'instances':"
> + "[{'type': 'migration'}]}}");
> + qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
> + wait_for_serial("src_serial");
> + } else {
> + qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> + "'arguments': {'instances':"
> + "[{'type': 'migration'}]}}");
> + qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
> + wait_for_serial("dest_serial");
> + }
> +
> + if (args->end_hook) {
> + args->end_hook(from, to, data_hook);
> + }
> +
> + migrate_end(from, to, !primary_failover);
> +
> + return 0;
> +}
> +
> +static void test_colo_plain_common(MigrateCommon *args,
> + bool failover_during_checkpoint,
> + bool primary_failover)
> +{
> + args->listen_uri = "tcp:127.0.0.1:0";
> + test_colo_common(args, failover_during_checkpoint, primary_failover);
> +}
> +
> +static void *hook_start_multifd(QTestState *from, QTestState *to)
> +{
> + return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
> +}
> +
> +static void test_colo_multifd_common(MigrateCommon *args,
> + bool failover_during_checkpoint,
> + bool primary_failover)
> +{
> + args->listen_uri = "defer";
> + args->start_hook = hook_start_multifd;
> + args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
> + test_colo_common(args, failover_during_checkpoint, primary_failover);
> +}
> +
> +static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
> +{
> + test_colo_plain_common(args, false, true);
> +}
> +
> +static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
> +{
> + test_colo_plain_common(args, false, false);
> +}
> +
> +static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
> +{
> + test_colo_multifd_common(args, false, true);
> +}
> +
> +static void test_colo_multifd_secondary_failover(char *name,
> + MigrateCommon *args)
> +{
> + test_colo_multifd_common(args, false, false);
> +}
> +
> +static void test_colo_plain_primary_failover_checkpoint(char *name,
> + MigrateCommon *args)
> +{
> + test_colo_plain_common(args, true, true);
> +}
> +
> +static void test_colo_plain_secondary_failover_checkpoint(char *name,
> + MigrateCommon *args)
> +{
> + test_colo_plain_common(args, true, false);
> +}
> +
> +static void test_colo_multifd_primary_failover_checkpoint(char *name,
> + MigrateCommon *args)
> +{
> + test_colo_multifd_common(args, true, true);
> +}
> +
> +static void test_colo_multifd_secondary_failover_checkpoint(char *name,
> + MigrateCommon *args)
> +{
> + test_colo_multifd_common(args, true, false);
> +}
> +
> +void migration_test_add_colo(MigrationTestEnv *env)
> +{
> + if (!env->full_set) {
> + return;
> + }
> +
> + migration_test_add("/migration/colo/plain/primary_failover",
> + test_colo_plain_primary_failover);
> + migration_test_add("/migration/colo/plain/secondary_failover",
> + test_colo_plain_secondary_failover);
> +
> + migration_test_add("/migration/colo/multifd/primary_failover",
> + test_colo_multifd_primary_failover);
> + migration_test_add("/migration/colo/multifd/secondary_failover",
> + test_colo_multifd_secondary_failover);
> +
> + migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
> + test_colo_plain_primary_failover_checkpoint);
> + migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
> + test_colo_plain_secondary_failover_checkpoint);
> +
> + migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
> + test_colo_multifd_primary_failover_checkpoint);
> + migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
> + test_colo_multifd_secondary_failover_checkpoint);
> +}
> diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
> index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
> --- a/tests/qtest/migration/framework.h
> +++ b/tests/qtest/migration/framework.h
> @@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
> void migration_test_add_precopy(MigrationTestEnv *env);
> void migration_test_add_cpr(MigrationTestEnv *env);
> void migration_test_add_misc(MigrationTestEnv *env);
> +#ifdef CONFIG_REPLICATION
> +void migration_test_add_colo(MigrationTestEnv *env);
> +#else
> +static inline void migration_test_add_colo(MigrationTestEnv *env) {};
> +#endif
>
> #endif /* TEST_FRAMEWORK_H */
>
> --
> 2.39.5
>
>
I was running the qtests locally, and I encountered a timeout error.
Command run: mkdir -p build ; cd build ; make check-qtest-x86_64;
Following is the output:
======
67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
>>> QTEST_QEMU_IMG=./qemu-img LD_LIBRARY_PATH=/home/arun/workdir/new/devel/upstream/qemu-priv/build/subprojects/slirp RUST_BACKTRACE=1 QTEST_QEMU_BINARY=./qemu-system-x86_64 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 G_TEST_DBUS_DAEMON=/home/arun/workdir/new/devel/upstream/qemu-priv/tests/dbus-vmstate-daemon.sh ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHON=/home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 MESON_TEST_ITERATION=1 MALLOC_PERTURB_=53 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap -k --full
stderr:
TAP parsing error: Too few tests run (expected 52, got 47)
Summary of Failures:
67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
Ok: 64
Fail: 0
Skipped: 2
Timeout: 1
======
It seems that the test runner is stuck waiting for some input.
Following is the stack trace
======
> ps afx
127267 pts/0 S+ 0:00 | | \_ make check-qtest-x86_64 -j8
128245 pts/0 S+ 0:01 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 /home/arun/workdir
128276 ? Ssl 0:07 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap
134107 ? Sl 0:20 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
134115 ? Sl 0:22 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
5610 pts/2 Ss 0:01 | \_ /usr/bin/bash
======
gstack 128276
Thread 2 (Thread 0x7fdd090716c0 (LWP 128279) "call_rcu"):
#0 0x00007fdd0921434d in syscall () from /lib64/libc.so.6
#1 0x0000557fd604563a in qemu_futex_wait (f=0x557fd60a0190 <rcu_call_ready_event>, val=4294967295) at /home/arun/workdir/new/devel/upstream/qemu-priv/include/qemu/futex.h:47
#2 0x0000557fd604584e in qemu_event_wait (ev=0x557fd60a0190 <rcu_call_ready_event>) at ../util/event.c:162
#3 0x0000557fd6045fde in call_rcu_thread (opaque=0x0) at ../util/rcu.c:304
#4 0x0000557fd600e8fb in qemu_thread_start (args=0x557fd6beec70) at ../util/qemu-thread-posix.c:414
#5 0x00007fdd09193464 in start_thread () from /lib64/libc.so.6
#6 0x00007fdd092165ac in __clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7fdd09073240 (LWP 128276) "migration-test"):
#0 0x00007fdd0919b982 in __syscall_cancel_arch () from /lib64/libc.so.6
#1 0x00007fdd0918fc3c in __internal_syscall_cancel () from /lib64/libc.so.6
#2 0x00007fdd091dfb62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#3 0x00007fdd091ebb37 in nanosleep () from /lib64/libc.so.6
#4 0x00007fdd0921613a in usleep () from /lib64/libc.so.6
#5 0x0000557fd5fd99cd in wait_for_serial (side=0x557fd6065f08 "dest_serial") at ../tests/qtest/migration/framework.c:82
#6 0x0000557fd5fe5865 in test_colo_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:66
#7 0x0000557fd5fe5a0f in test_colo_plain_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:106
#8 0x0000557fd5fe5ad7 in test_colo_plain_primary_failover (name=0x557fd6bfd050 "/migration/colo/plain/primary_failover", args=0x557fd6bfdf50) at ../tests/qtest/migration/colo-tests.c:126
#9 0x0000557fd5fdeff8 in migration_test_wrapper (data=0x557fd6bfd320) at ../tests/qtest/migration/migration-util.c:258
#10 0x00007fdd0947bf3e in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#11 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#12 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#13 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#14 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#15 0x00007fdd0947c46a in g_test_run_suite () from /lib64/libglib-2.0.so.0
#16 0x00007fdd0947c500 in g_test_run () from /lib64/libglib-2.0.so.0
#17 0x0000557fd5fd9490 in main (argc=1, argv=0x7fff33b51908) at ../tests/qtest/migration-test.c:60
Is there something that I am missing? Can you please look into this?
Regards,
Arun Menon
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-10 14:47 ` Arun Menon
@ 2026-03-10 15:29 ` Lukas Straub
2026-03-10 17:42 ` Arun Menon
0 siblings, 1 reply; 36+ messages in thread
From: Lukas Straub @ 2026-03-10 15:29 UTC (permalink / raw)
To: Arun Menon
Cc: qemu-devel, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
[-- Attachment #1: Type: text/plain, Size: 17867 bytes --]
On Tue, 10 Mar 2026 20:17:57 +0530
Arun Menon <armenon@redhat.com> wrote:
> Hi Lukas,
>
> On Mon, Mar 02, 2026 at 12:45:28PM +0100, Lukas Straub wrote:
> > Add a COLO migration test for COLO migration and failover.
> >
> > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > Tested-by: Fabiano Rosas <farosas@suse.de>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> > MAINTAINERS | 1 +
> > tests/qtest/meson.build | 7 +-
> > tests/qtest/migration-test.c | 1 +
> > tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
> > tests/qtest/migration/framework.h | 5 +
> > 5 files changed, 211 insertions(+), 1 deletion(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3875,6 +3875,7 @@ F: migration/colo*
> > F: migration/multifd-colo.*
> > F: include/migration/colo.h
> > F: include/migration/failover.h
> > +F: tests/qtest/migration/colo-tests.c
> > F: docs/COLO-FT.txt
> >
> > COLO Proxy
> > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
> > --- a/tests/qtest/meson.build
> > +++ b/tests/qtest/meson.build
> > @@ -374,6 +374,11 @@ if gnutls.found()
> > endif
> > endif
> >
> > +migration_colo_files = []
> > +if get_option('replication').allowed()
> > + migration_colo_files = [files('migration/colo-tests.c')]
> > +endif
> > +
> > qtests = {
> > 'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
> > 'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
> > @@ -385,7 +390,7 @@ qtests = {
> > 'migration/migration-util.c') + dbus_vmstate1,
> > 'erst-test': files('erst-test.c'),
> > 'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
> > - 'migration-test': test_migration_files + migration_tls_files,
> > + 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
> > 'pxe-test': files('boot-sector.c'),
> > 'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
> > 'pnv-xive2-nvpg_bar.c'),
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -55,6 +55,7 @@ int main(int argc, char **argv)
> > migration_test_add_precopy(env);
> > migration_test_add_cpr(env);
> > migration_test_add_misc(env);
> > + migration_test_add_colo(env);
> >
> > ret = g_test_run();
> >
> > diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
> > --- /dev/null
> > +++ b/tests/qtest/migration/colo-tests.c
> > @@ -0,0 +1,198 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * QTest testcases for COLO migration
> > + *
> > + * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "libqtest.h"
> > +#include "migration/framework.h"
> > +#include "migration/migration-qmp.h"
> > +#include "migration/migration-util.h"
> > +#include "qemu/module.h"
> > +
> > +static int test_colo_common(MigrateCommon *args,
> > + bool failover_during_checkpoint,
> > + bool primary_failover)
> > +{
> > + QTestState *from, *to;
> > + void *data_hook = NULL;
> > +
> > + /*
> > + * For the COLO test, both VMs will run in parallel. Thus both VMs want to
> > + * open the image read/write at the same time. Using read-only=on is not
> > + * possible here, because ide-hd does not support read-only backing image.
> > + *
> > + * So use -snapshot, where each qemu instance creates its own writable
> > + * snapshot internally while leaving the real image read-only.
> > + */
> > + args->start.opts_source = "-snapshot";
> > + args->start.opts_target = "-snapshot";
> > +
> > + /*
> > + * COLO migration code logs many errors when the migration socket
> > + * is shut down, these are expected so we hide them here.
> > + */
> > + args->start.hide_stderr = true;
> > +
> > + /*
> > + * Test with yank with out of band capability since that is how it is
> > + * used in production.
> > + */
> > + args->start.oob = true;
> > + args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
> > +
> > + if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
> > + return -1;
> > + }
> > +
> > + migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
> > +
> > + if (args->start_hook) {
> > + data_hook = args->start_hook(from, to);
> > + }
> > +
> > + migrate_ensure_converge(from);
> > + wait_for_serial("src_serial");
> > +
> > + migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> > +
> > + wait_for_migration_status(from, "colo", NULL);
> > + wait_for_resume(to, get_dst());
> > +
> > + wait_for_serial("src_serial");
> > + wait_for_serial("dest_serial");
> > +
> > + /* wait for 3 checkpoints */
> > + for (int i = 0; i < 3; i++) {
> > + qtest_qmp_eventwait(to, "RESUME");
> > + wait_for_serial("src_serial");
> > + wait_for_serial("dest_serial");
> > + }
> > +
> > + if (failover_during_checkpoint) {
> > + qtest_qmp_eventwait(to, "STOP");
> > + }
> > + if (primary_failover) {
> > + qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> > + "'arguments': {'instances':"
> > + "[{'type': 'migration'}]}}");
> > + qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
> > + wait_for_serial("src_serial");
> > + } else {
> > + qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> > + "'arguments': {'instances':"
> > + "[{'type': 'migration'}]}}");
> > + qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
> > + wait_for_serial("dest_serial");
> > + }
> > +
> > + if (args->end_hook) {
> > + args->end_hook(from, to, data_hook);
> > + }
> > +
> > + migrate_end(from, to, !primary_failover);
> > +
> > + return 0;
> > +}
> > +
> > +static void test_colo_plain_common(MigrateCommon *args,
> > + bool failover_during_checkpoint,
> > + bool primary_failover)
> > +{
> > + args->listen_uri = "tcp:127.0.0.1:0";
> > + test_colo_common(args, failover_during_checkpoint, primary_failover);
> > +}
> > +
> > +static void *hook_start_multifd(QTestState *from, QTestState *to)
> > +{
> > + return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
> > +}
> > +
> > +static void test_colo_multifd_common(MigrateCommon *args,
> > + bool failover_during_checkpoint,
> > + bool primary_failover)
> > +{
> > + args->listen_uri = "defer";
> > + args->start_hook = hook_start_multifd;
> > + args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
> > + test_colo_common(args, failover_during_checkpoint, primary_failover);
> > +}
> > +
> > +static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
> > +{
> > + test_colo_plain_common(args, false, true);
> > +}
> > +
> > +static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
> > +{
> > + test_colo_plain_common(args, false, false);
> > +}
> > +
> > +static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
> > +{
> > + test_colo_multifd_common(args, false, true);
> > +}
> > +
> > +static void test_colo_multifd_secondary_failover(char *name,
> > + MigrateCommon *args)
> > +{
> > + test_colo_multifd_common(args, false, false);
> > +}
> > +
> > +static void test_colo_plain_primary_failover_checkpoint(char *name,
> > + MigrateCommon *args)
> > +{
> > + test_colo_plain_common(args, true, true);
> > +}
> > +
> > +static void test_colo_plain_secondary_failover_checkpoint(char *name,
> > + MigrateCommon *args)
> > +{
> > + test_colo_plain_common(args, true, false);
> > +}
> > +
> > +static void test_colo_multifd_primary_failover_checkpoint(char *name,
> > + MigrateCommon *args)
> > +{
> > + test_colo_multifd_common(args, true, true);
> > +}
> > +
> > +static void test_colo_multifd_secondary_failover_checkpoint(char *name,
> > + MigrateCommon *args)
> > +{
> > + test_colo_multifd_common(args, true, false);
> > +}
> > +
> > +void migration_test_add_colo(MigrationTestEnv *env)
> > +{
> > + if (!env->full_set) {
> > + return;
> > + }
> > +
> > + migration_test_add("/migration/colo/plain/primary_failover",
> > + test_colo_plain_primary_failover);
> > + migration_test_add("/migration/colo/plain/secondary_failover",
> > + test_colo_plain_secondary_failover);
> > +
> > + migration_test_add("/migration/colo/multifd/primary_failover",
> > + test_colo_multifd_primary_failover);
> > + migration_test_add("/migration/colo/multifd/secondary_failover",
> > + test_colo_multifd_secondary_failover);
> > +
> > + migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
> > + test_colo_plain_primary_failover_checkpoint);
> > + migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
> > + test_colo_plain_secondary_failover_checkpoint);
> > +
> > + migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
> > + test_colo_multifd_primary_failover_checkpoint);
> > + migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
> > + test_colo_multifd_secondary_failover_checkpoint);
> > +}
> > diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
> > index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
> > --- a/tests/qtest/migration/framework.h
> > +++ b/tests/qtest/migration/framework.h
> > @@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
> > void migration_test_add_precopy(MigrationTestEnv *env);
> > void migration_test_add_cpr(MigrationTestEnv *env);
> > void migration_test_add_misc(MigrationTestEnv *env);
> > +#ifdef CONFIG_REPLICATION
> > +void migration_test_add_colo(MigrationTestEnv *env);
> > +#else
> > +static inline void migration_test_add_colo(MigrationTestEnv *env) {};
> > +#endif
> >
> > #endif /* TEST_FRAMEWORK_H */
> >
> > --
> > 2.39.5
> >
> >
>
> I was running the qtests locally, and I encountered a timeout error.
>
> Command run: mkdir -p build ; cd build ; make check-qtest-x86_64;
>
> Following is the output:
> ======
> 67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
> >>> QTEST_QEMU_IMG=./qemu-img LD_LIBRARY_PATH=/home/arun/workdir/new/devel/upstream/qemu-priv/build/subprojects/slirp RUST_BACKTRACE=1 QTEST_QEMU_BINARY=./qemu-system-x86_64 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 G_TEST_DBUS_DAEMON=/home/arun/workdir/new/devel/upstream/qemu-priv/tests/dbus-vmstate-daemon.sh ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHON=/home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 MESON_TEST_ITERATION=1 MALLOC_PERTURB_=53 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap -k --full
> stderr:
>
> TAP parsing error: Too few tests run (expected 52, got 47)
>
> Summary of Failures:
> 67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
> Ok: 64
> Fail: 0
> Skipped: 2
> Timeout: 1
> ======
>
> It seems that the test runner is stuck waiting for some input.
> Following is the stack trace
> ======
> > ps afx
> 127267 pts/0 S+ 0:00 | | \_ make check-qtest-x86_64 -j8
> 128245 pts/0 S+ 0:01 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 /home/arun/workdir
> 128276 ? Ssl 0:07 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap
> 134107 ? Sl 0:20 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
> 134115 ? Sl 0:22 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
> 5610 pts/2 Ss 0:01 | \_ /usr/bin/bash
>
> ======
> gstack 128276
> Thread 2 (Thread 0x7fdd090716c0 (LWP 128279) "call_rcu"):
> #0 0x00007fdd0921434d in syscall () from /lib64/libc.so.6
> #1 0x0000557fd604563a in qemu_futex_wait (f=0x557fd60a0190 <rcu_call_ready_event>, val=4294967295) at /home/arun/workdir/new/devel/upstream/qemu-priv/include/qemu/futex.h:47
> #2 0x0000557fd604584e in qemu_event_wait (ev=0x557fd60a0190 <rcu_call_ready_event>) at ../util/event.c:162
> #3 0x0000557fd6045fde in call_rcu_thread (opaque=0x0) at ../util/rcu.c:304
> #4 0x0000557fd600e8fb in qemu_thread_start (args=0x557fd6beec70) at ../util/qemu-thread-posix.c:414
> #5 0x00007fdd09193464 in start_thread () from /lib64/libc.so.6
> #6 0x00007fdd092165ac in __clone3 () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x7fdd09073240 (LWP 128276) "migration-test"):
> #0 0x00007fdd0919b982 in __syscall_cancel_arch () from /lib64/libc.so.6
> #1 0x00007fdd0918fc3c in __internal_syscall_cancel () from /lib64/libc.so.6
> #2 0x00007fdd091dfb62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
> #3 0x00007fdd091ebb37 in nanosleep () from /lib64/libc.so.6
> #4 0x00007fdd0921613a in usleep () from /lib64/libc.so.6
> #5 0x0000557fd5fd99cd in wait_for_serial (side=0x557fd6065f08 "dest_serial") at ../tests/qtest/migration/framework.c:82
> #6 0x0000557fd5fe5865 in test_colo_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:66
60 migrate_qmp(from, to, args->connect_uri, NULL, "{}");
61
62 wait_for_migration_status(from, "colo", NULL);
63 wait_for_resume(to, get_dst());
64
65 wait_for_serial("src_serial");
66 wait_for_serial("dest_serial");
Interesting, so the secondary guest is stuck/crahsed after entering
colo state despite having resumed.
It works fine here on master. And before the merge I have looped the
colo tests for a whole day on my machine without any failures.
How often does this happen? What is the commit you are on, host, ASAN,
MSAN, UBSAN, configure options? With kvm or without?
Can you try the following and show me the log:
(cd build && QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_LOG=- tests/qtest/migration-test --full -p /x86_64/migration/colo/plain)
> #7 0x0000557fd5fe5a0f in test_colo_plain_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:106
> #8 0x0000557fd5fe5ad7 in test_colo_plain_primary_failover (name=0x557fd6bfd050 "/migration/colo/plain/primary_failover", args=0x557fd6bfdf50) at ../tests/qtest/migration/colo-tests.c:126
> #9 0x0000557fd5fdeff8 in migration_test_wrapper (data=0x557fd6bfd320) at ../tests/qtest/migration/migration-util.c:258
> #10 0x00007fdd0947bf3e in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> #11 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> #12 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> #13 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> #14 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> #15 0x00007fdd0947c46a in g_test_run_suite () from /lib64/libglib-2.0.so.0
> #16 0x00007fdd0947c500 in g_test_run () from /lib64/libglib-2.0.so.0
> #17 0x0000557fd5fd9490 in main (argc=1, argv=0x7fff33b51908) at ../tests/qtest/migration-test.c:60
>
>
> Is there something that I am missing? Can you please look into this?
>
>
> Regards,
> Arun Menon
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-10 15:29 ` Lukas Straub
@ 2026-03-10 17:42 ` Arun Menon
2026-03-19 22:40 ` Lukas Straub
0 siblings, 1 reply; 36+ messages in thread
From: Arun Menon @ 2026-03-10 17:42 UTC (permalink / raw)
To: Lukas Straub
Cc: qemu-devel, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
[-- Attachment #1: Type: text/plain, Size: 18921 bytes --]
Hi Lukas,
On Tue, Mar 10, 2026 at 04:29:54PM +0100, Lukas Straub wrote:
> On Tue, 10 Mar 2026 20:17:57 +0530
> Arun Menon <armenon@redhat.com> wrote:
>
> > Hi Lukas,
> >
> > On Mon, Mar 02, 2026 at 12:45:28PM +0100, Lukas Straub wrote:
> > > Add a COLO migration test for COLO migration and failover.
> > >
> > > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > > Tested-by: Fabiano Rosas <farosas@suse.de>
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > ---
> > > MAINTAINERS | 1 +
> > > tests/qtest/meson.build | 7 +-
> > > tests/qtest/migration-test.c | 1 +
> > > tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
> > > tests/qtest/migration/framework.h | 5 +
> > > 5 files changed, 211 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index d2a1f4cc08223cb944b61e32a6d89e25bf82eacb..1b0ae10750036be00571b7104ad8426c071bb54c 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -3875,6 +3875,7 @@ F: migration/colo*
> > > F: migration/multifd-colo.*
> > > F: include/migration/colo.h
> > > F: include/migration/failover.h
> > > +F: tests/qtest/migration/colo-tests.c
> > > F: docs/COLO-FT.txt
> > >
> > > COLO Proxy
> > > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > > index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
> > > --- a/tests/qtest/meson.build
> > > +++ b/tests/qtest/meson.build
> > > @@ -374,6 +374,11 @@ if gnutls.found()
> > > endif
> > > endif
> > >
> > > +migration_colo_files = []
> > > +if get_option('replication').allowed()
> > > + migration_colo_files = [files('migration/colo-tests.c')]
> > > +endif
> > > +
> > > qtests = {
> > > 'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
> > > 'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
> > > @@ -385,7 +390,7 @@ qtests = {
> > > 'migration/migration-util.c') + dbus_vmstate1,
> > > 'erst-test': files('erst-test.c'),
> > > 'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
> > > - 'migration-test': test_migration_files + migration_tls_files,
> > > + 'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
> > > 'pxe-test': files('boot-sector.c'),
> > > 'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
> > > 'pnv-xive2-nvpg_bar.c'),
> > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > > index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
> > > --- a/tests/qtest/migration-test.c
> > > +++ b/tests/qtest/migration-test.c
> > > @@ -55,6 +55,7 @@ int main(int argc, char **argv)
> > > migration_test_add_precopy(env);
> > > migration_test_add_cpr(env);
> > > migration_test_add_misc(env);
> > > + migration_test_add_colo(env);
> > >
> > > ret = g_test_run();
> > >
> > > diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
> > > --- /dev/null
> > > +++ b/tests/qtest/migration/colo-tests.c
> > > @@ -0,0 +1,198 @@
> > > +/*
> > > + * SPDX-License-Identifier: GPL-2.0-or-later
> > > + *
> > > + * QTest testcases for COLO migration
> > > + *
> > > + * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "libqtest.h"
> > > +#include "migration/framework.h"
> > > +#include "migration/migration-qmp.h"
> > > +#include "migration/migration-util.h"
> > > +#include "qemu/module.h"
> > > +
> > > +static int test_colo_common(MigrateCommon *args,
> > > + bool failover_during_checkpoint,
> > > + bool primary_failover)
> > > +{
> > > + QTestState *from, *to;
> > > + void *data_hook = NULL;
> > > +
> > > + /*
> > > + * For the COLO test, both VMs will run in parallel. Thus both VMs want to
> > > + * open the image read/write at the same time. Using read-only=on is not
> > > + * possible here, because ide-hd does not support read-only backing image.
> > > + *
> > > + * So use -snapshot, where each qemu instance creates its own writable
> > > + * snapshot internally while leaving the real image read-only.
> > > + */
> > > + args->start.opts_source = "-snapshot";
> > > + args->start.opts_target = "-snapshot";
> > > +
> > > + /*
> > > + * COLO migration code logs many errors when the migration socket
> > > + * is shut down, these are expected so we hide them here.
> > > + */
> > > + args->start.hide_stderr = true;
> > > +
> > > + /*
> > > + * Test with yank with out of band capability since that is how it is
> > > + * used in production.
> > > + */
> > > + args->start.oob = true;
> > > + args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
> > > +
> > > + if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
> > > + return -1;
> > > + }
> > > +
> > > + migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
> > > +
> > > + if (args->start_hook) {
> > > + data_hook = args->start_hook(from, to);
> > > + }
> > > +
> > > + migrate_ensure_converge(from);
> > > + wait_for_serial("src_serial");
> > > +
> > > + migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> > > +
> > > + wait_for_migration_status(from, "colo", NULL);
> > > + wait_for_resume(to, get_dst());
> > > +
> > > + wait_for_serial("src_serial");
> > > + wait_for_serial("dest_serial");
> > > +
> > > + /* wait for 3 checkpoints */
> > > + for (int i = 0; i < 3; i++) {
> > > + qtest_qmp_eventwait(to, "RESUME");
> > > + wait_for_serial("src_serial");
> > > + wait_for_serial("dest_serial");
> > > + }
> > > +
> > > + if (failover_during_checkpoint) {
> > > + qtest_qmp_eventwait(to, "STOP");
> > > + }
> > > + if (primary_failover) {
> > > + qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> > > + "'arguments': {'instances':"
> > > + "[{'type': 'migration'}]}}");
> > > + qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
> > > + wait_for_serial("src_serial");
> > > + } else {
> > > + qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> > > + "'arguments': {'instances':"
> > > + "[{'type': 'migration'}]}}");
> > > + qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
> > > + wait_for_serial("dest_serial");
> > > + }
> > > +
> > > + if (args->end_hook) {
> > > + args->end_hook(from, to, data_hook);
> > > + }
> > > +
> > > + migrate_end(from, to, !primary_failover);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static void test_colo_plain_common(MigrateCommon *args,
> > > + bool failover_during_checkpoint,
> > > + bool primary_failover)
> > > +{
> > > + args->listen_uri = "tcp:127.0.0.1:0";
> > > + test_colo_common(args, failover_during_checkpoint, primary_failover);
> > > +}
> > > +
> > > +static void *hook_start_multifd(QTestState *from, QTestState *to)
> > > +{
> > > + return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
> > > +}
> > > +
> > > +static void test_colo_multifd_common(MigrateCommon *args,
> > > + bool failover_during_checkpoint,
> > > + bool primary_failover)
> > > +{
> > > + args->listen_uri = "defer";
> > > + args->start_hook = hook_start_multifd;
> > > + args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
> > > + test_colo_common(args, failover_during_checkpoint, primary_failover);
> > > +}
> > > +
> > > +static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
> > > +{
> > > + test_colo_plain_common(args, false, true);
> > > +}
> > > +
> > > +static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
> > > +{
> > > + test_colo_plain_common(args, false, false);
> > > +}
> > > +
> > > +static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
> > > +{
> > > + test_colo_multifd_common(args, false, true);
> > > +}
> > > +
> > > +static void test_colo_multifd_secondary_failover(char *name,
> > > + MigrateCommon *args)
> > > +{
> > > + test_colo_multifd_common(args, false, false);
> > > +}
> > > +
> > > +static void test_colo_plain_primary_failover_checkpoint(char *name,
> > > + MigrateCommon *args)
> > > +{
> > > + test_colo_plain_common(args, true, true);
> > > +}
> > > +
> > > +static void test_colo_plain_secondary_failover_checkpoint(char *name,
> > > + MigrateCommon *args)
> > > +{
> > > + test_colo_plain_common(args, true, false);
> > > +}
> > > +
> > > +static void test_colo_multifd_primary_failover_checkpoint(char *name,
> > > + MigrateCommon *args)
> > > +{
> > > + test_colo_multifd_common(args, true, true);
> > > +}
> > > +
> > > +static void test_colo_multifd_secondary_failover_checkpoint(char *name,
> > > + MigrateCommon *args)
> > > +{
> > > + test_colo_multifd_common(args, true, false);
> > > +}
> > > +
> > > +void migration_test_add_colo(MigrationTestEnv *env)
> > > +{
> > > + if (!env->full_set) {
> > > + return;
> > > + }
> > > +
> > > + migration_test_add("/migration/colo/plain/primary_failover",
> > > + test_colo_plain_primary_failover);
> > > + migration_test_add("/migration/colo/plain/secondary_failover",
> > > + test_colo_plain_secondary_failover);
> > > +
> > > + migration_test_add("/migration/colo/multifd/primary_failover",
> > > + test_colo_multifd_primary_failover);
> > > + migration_test_add("/migration/colo/multifd/secondary_failover",
> > > + test_colo_multifd_secondary_failover);
> > > +
> > > + migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
> > > + test_colo_plain_primary_failover_checkpoint);
> > > + migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
> > > + test_colo_plain_secondary_failover_checkpoint);
> > > +
> > > + migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
> > > + test_colo_multifd_primary_failover_checkpoint);
> > > + migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
> > > + test_colo_multifd_secondary_failover_checkpoint);
> > > +}
> > > diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
> > > index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
> > > --- a/tests/qtest/migration/framework.h
> > > +++ b/tests/qtest/migration/framework.h
> > > @@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
> > > void migration_test_add_precopy(MigrationTestEnv *env);
> > > void migration_test_add_cpr(MigrationTestEnv *env);
> > > void migration_test_add_misc(MigrationTestEnv *env);
> > > +#ifdef CONFIG_REPLICATION
> > > +void migration_test_add_colo(MigrationTestEnv *env);
> > > +#else
> > > +static inline void migration_test_add_colo(MigrationTestEnv *env) {};
> > > +#endif
> > >
> > > #endif /* TEST_FRAMEWORK_H */
> > >
> > > --
> > > 2.39.5
> > >
> > >
> >
> > I was running the qtests locally, and I encountered a timeout error.
> >
> > Command run: mkdir -p build ; cd build ; make check-qtest-x86_64;
> >
> > Following is the output:
> > ======
> > 67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
> > >>> QTEST_QEMU_IMG=./qemu-img LD_LIBRARY_PATH=/home/arun/workdir/new/devel/upstream/qemu-priv/build/subprojects/slirp RUST_BACKTRACE=1 QTEST_QEMU_BINARY=./qemu-system-x86_64 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 G_TEST_DBUS_DAEMON=/home/arun/workdir/new/devel/upstream/qemu-priv/tests/dbus-vmstate-daemon.sh ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 PYTHON=/home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 MESON_TEST_ITERATION=1 MALLOC_PERTURB_=53 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap -k --full
> > stderr:
> >
> > TAP parsing error: Too few tests run (expected 52, got 47)
> >
> > Summary of Failures:
> > 67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
> > Ok: 64
> > Fail: 0
> > Skipped: 2
> > Timeout: 1
> > ======
> >
> > It seems that the test runner is stuck waiting for some input.
> > Following is the stack trace
> > ======
> > > ps afx
> > 127267 pts/0 S+ 0:00 | | \_ make check-qtest-x86_64 -j8
> > 128245 pts/0 S+ 0:01 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/pyvenv/bin/python3 /home/arun/workdir
> > 128276 ? Ssl 0:07 | | \_ /home/arun/workdir/new/devel/upstream/qemu-priv/build/tests/qtest/migration-test --tap
> > 134107 ? Sl 0:20 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
> > 134115 ? Sl 0:22 | | \_ ./qemu-system-x86_64 -qtest unix:/tmp/qtest-128276.sock -qtest-log /dev/null -chard
> > 5610 pts/2 Ss 0:01 | \_ /usr/bin/bash
> >
> > ======
> > gstack 128276
> > Thread 2 (Thread 0x7fdd090716c0 (LWP 128279) "call_rcu"):
> > #0 0x00007fdd0921434d in syscall () from /lib64/libc.so.6
> > #1 0x0000557fd604563a in qemu_futex_wait (f=0x557fd60a0190 <rcu_call_ready_event>, val=4294967295) at /home/arun/workdir/new/devel/upstream/qemu-priv/include/qemu/futex.h:47
> > #2 0x0000557fd604584e in qemu_event_wait (ev=0x557fd60a0190 <rcu_call_ready_event>) at ../util/event.c:162
> > #3 0x0000557fd6045fde in call_rcu_thread (opaque=0x0) at ../util/rcu.c:304
> > #4 0x0000557fd600e8fb in qemu_thread_start (args=0x557fd6beec70) at ../util/qemu-thread-posix.c:414
> > #5 0x00007fdd09193464 in start_thread () from /lib64/libc.so.6
> > #6 0x00007fdd092165ac in __clone3 () from /lib64/libc.so.6
> >
> > Thread 1 (Thread 0x7fdd09073240 (LWP 128276) "migration-test"):
> > #0 0x00007fdd0919b982 in __syscall_cancel_arch () from /lib64/libc.so.6
> > #1 0x00007fdd0918fc3c in __internal_syscall_cancel () from /lib64/libc.so.6
> > #2 0x00007fdd091dfb62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
> > #3 0x00007fdd091ebb37 in nanosleep () from /lib64/libc.so.6
> > #4 0x00007fdd0921613a in usleep () from /lib64/libc.so.6
> > #5 0x0000557fd5fd99cd in wait_for_serial (side=0x557fd6065f08 "dest_serial") at ../tests/qtest/migration/framework.c:82
> > #6 0x0000557fd5fe5865 in test_colo_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:66
>
> 60 migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> 61
> 62 wait_for_migration_status(from, "colo", NULL);
> 63 wait_for_resume(to, get_dst());
> 64
> 65 wait_for_serial("src_serial");
> 66 wait_for_serial("dest_serial");
>
> Interesting, so the secondary guest is stuck/crahsed after entering
> colo state despite having resumed.
>
> It works fine here on master. And before the merge I have looped the
> colo tests for a whole day on my machine without any failures.
>
> How often does this happen? What is the commit you are on, host, ASAN,
> MSAN, UBSAN, configure options? With kvm or without?
I have started the run on a fedora-43 VM. I am on commit:
de61484ec39f418e5c0d4603017695f9ffb8fe24 master branch.
The configuration command I used was:
../configure --target-list=x86_64-softmmu --enable-debug --enable-trace-backends=log --enable-slirp
Targets and accelerators
KVM support : YES
>
> Can you try the following and show me the log:
> (cd build && QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_LOG=- tests/qtest/migration-test --full -p /x86_64/migration/colo/plain)
>
I have attached the log file : qemu.log to this email. It is stuck at
the last line :
{"timestamp": {"seconds": 1773164370, "microseconds": 609791}, "event": "RESUME"}
It should be some configuration miss from my end.
Thank you for looking into it.
> > #7 0x0000557fd5fe5a0f in test_colo_plain_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:106
> > #8 0x0000557fd5fe5ad7 in test_colo_plain_primary_failover (name=0x557fd6bfd050 "/migration/colo/plain/primary_failover", args=0x557fd6bfdf50) at ../tests/qtest/migration/colo-tests.c:126
> > #9 0x0000557fd5fdeff8 in migration_test_wrapper (data=0x557fd6bfd320) at ../tests/qtest/migration/migration-util.c:258
> > #10 0x00007fdd0947bf3e in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> > #11 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> > #12 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> > #13 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> > #14 0x00007fdd0947beb3 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
> > #15 0x00007fdd0947c46a in g_test_run_suite () from /lib64/libglib-2.0.so.0
> > #16 0x00007fdd0947c500 in g_test_run () from /lib64/libglib-2.0.so.0
> > #17 0x0000557fd5fd9490 in main (argc=1, argv=0x7fff33b51908) at ../tests/qtest/migration-test.c:60
> >
> >
> > Is there something that I am missing? Can you please look into this?
> >
> >
> > Regards,
> > Arun Menon
> >
>
Regards,
Arun Menon
[-- Attachment #2: qemu.log --]
[-- Type: text/plain, Size: 24210 bytes --]
arun@fedora:~/workdir/new/devel/upstream/qemu-priv/build$ QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_LOG=- tests/qtest/migration-test --full -p /x86_64/migration/colo/plain
TAP version 14
# random seed: R02S867993a6b1d4e9cff5bed399f88887cc
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-44479.sock -qtest-log /dev/fd/2 -chardev socket,path=/tmp/qtest-44479.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -run-with exit-with-parent=on -machine none -accel qtest
[I 0.000000] OPENED
[R +0.000934] endianness
[S +0.002373] OK little
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.0.0-7472-gde61484ec3"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "qom-list-types", "arguments": {"abstract": false, "implements": "accel"}}
{"return": [{"name": "qtest-accel", "parent": "accel"}, {"name": "tcg-accel", "parent": "accel"}, {"name": "mshv-accel", "parent": "accel"}, {"name": "nitro-accel", "parent": "accel"}, {"name": "kvm-accel", "parent": "accel"}]}
[I +0.016144] CLOSED
# Skipping test: userfaultfd not available
[I 0.000000] OPENED
[R +0.001125] endianness
[S +0.001197] OK little
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.0.0-7472-gde61484ec3"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-machines"}
{"return": [{"hotpluggable-cpus": true, "name": "pc-q35-5.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-10.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-6.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-5.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-9.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-7.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-8.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 1024, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": false, "name": "none", "numa-mem-supported": false, "acpi": false, "cpu-max": 1, "deprecated": false, "default-ram-id": "ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-11.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram", "alias": "q35"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-8.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-6.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": false, "name": "nitro", "numa-mem-supported": false, "acpi": false, "cpu-max": 4096, "deprecated": false, "default-ram-id": "ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-9.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-10.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-7.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": false, "name": "x-remote", "numa-mem-supported": false, "acpi": false, "cpu-max": 1, "deprecated": false}, {"hotpluggable-cpus": true, "name": "pc-i440fx-10.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-5.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-6.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-5.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "isapc", "numa-mem-supported": false, "default-cpu-type": "486-x86_64-cpu", "acpi": true, "cpu-max": 1, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-9.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-10.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-7.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-8.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-8.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-6.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-9.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-11.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "is-default": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram", "alias": "pc"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-7.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-10.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-9.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-7.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-6.0", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": false, "name": "microvm", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": false, "default-ram-id": "microvm.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-8.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 1024, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-8.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-6.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 288, "deprecated": true, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-9.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-q35-10.1", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 4096, "deprecated": false, "default-ram-id": "pc.ram"}, {"hotpluggable-cpus": true, "name": "pc-i440fx-7.2", "numa-mem-supported": false, "default-cpu-type": "qemu64-x86_64-cpu", "acpi": true, "cpu-max": 255, "deprecated": true, "default-ram-id": "pc.ram"}]}
[I +0.175507] CLOSED
# Start of x86_64 tests
# Start of migration tests
# Start of colo tests
# Start of plain tests
# Running /x86_64/migration/colo/plain/primary_failover
# Using machine type: pc-q35-11.0
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-44479.sock -qtest-log /dev/fd/2 -chardev socket,path=/tmp/qtest-44479.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -run-with exit-with-parent=on -accel kvm -accel tcg -machine pc-q35-11.0, -name source,debug-threads=on -machine memory-backend=mig.mem -object memory-backend-ram,id=mig.mem,size=150M,share=off -serial file:/tmp/migration-test-WV15L3/src_serial -drive if=none,id=d0,file=/tmp/migration-test-WV15L3/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1 -snapshot -accel qtest
[I 0.000001] OPENED
[R +0.059029] endianness
[S +0.059071] OK little
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.0.0-7472-gde61484ec3"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}
{"return": {}}
# starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-44479.sock -qtest-log /dev/fd/2 -chardev socket,path=/tmp/qtest-44479.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -run-with exit-with-parent=on -accel kvm -accel tcg -machine pc-q35-11.0, -name target,debug-threads=on -machine memory-backend=mig.mem -object memory-backend-ram,id=mig.mem,size=150M,share=off -serial file:/tmp/migration-test-WV15L3/dest_serial -incoming tcp:127.0.0.1:0 -drive if=none,id=d0,file=/tmp/migration-test-WV15L3/bootsect,format=raw -device ide-hd,drive=d0,secs=1,cyls=1,heads=1 -snapshot -accel qtest
[I 0.000001] OPENED
[R +0.045500] endianness
[S +0.045546] OK little
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.0.0-7472-gde61484ec3"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "return-path"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "return-path"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "x-colo"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "x-colo"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "return-path"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "return-path"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "events"}]}}
{"return": {}}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "events"}]}}
{"return": {}}
{"execute": "migrate-set-parameters", "arguments": {"x-checkpoint-delay": 300}}
{"return": {}}
{"execute": "query-migrate-parameters"}
{"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "direct-io": false, "avail-switchover-bandwidth": 0, "zero-page-detection": "multifd", "multifd-qatzip-level": 1, "multifd-channels": 2, "mode": "normal", "multifd-zstd-level": 1, "announce-initial": 50, "downtime-limit": 300, "tls-authz": "", "cpr-exec-command": [], "vcpu-dirty-limit": 1, "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "x-vcpu-dirty-limit-period": 1000, "multifd-zlib-level": 1, "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", "throttle-trigger-threshold": 50, "max-bandwidth": 134217728, "x-checkpoint-delay": 300, "cpu-throttle-increment": 10}}
{"execute": "migrate-set-parameters", "arguments": {"max-bandwidth": 1000000000}}
{"return": {}}
{"execute": "query-migrate-parameters"}
{"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "direct-io": false, "avail-switchover-bandwidth": 0, "zero-page-detection": "multifd", "multifd-qatzip-level": 1, "multifd-channels": 2, "mode": "normal", "multifd-zstd-level": 1, "announce-initial": 50, "downtime-limit": 300, "tls-authz": "", "cpr-exec-command": [], "vcpu-dirty-limit": 1, "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "x-vcpu-dirty-limit-period": 1000, "multifd-zlib-level": 1, "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", "throttle-trigger-threshold": 50, "max-bandwidth": 1000000000, "x-checkpoint-delay": 300, "cpu-throttle-increment": 10}}
{"execute": "migrate-set-parameters", "arguments": {"downtime-limit": 30000}}
{"return": {}}
{"execute": "query-migrate-parameters"}
{"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "direct-io": false, "avail-switchover-bandwidth": 0, "zero-page-detection": "multifd", "multifd-qatzip-level": 1, "multifd-channels": 2, "mode": "normal", "multifd-zstd-level": 1, "announce-initial": 50, "downtime-limit": 30000, "tls-authz": "", "cpr-exec-command": [], "vcpu-dirty-limit": 1, "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "x-vcpu-dirty-limit-period": 1000, "multifd-zlib-level": 1, "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", "throttle-trigger-threshold": 50, "max-bandwidth": 1000000000, "x-checkpoint-delay": 300, "cpu-throttle-increment": 10}}
{"execute": "query-migrate"}
{"return": {"status": "setup", "socket-address": [{"port": "45981", "ipv4": true, "host": "127.0.0.1", "type": "inet"}]}}
{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.1:45981"}}
{"timestamp": {"seconds": 1773164234, "microseconds": 337705}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"execute": "query-migrate"}
{"timestamp": {"seconds": 1773164234, "microseconds": 342267}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1773164234, "microseconds": 342442}, "event": "MIGRATION", "data": {"status": "active"}}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 5, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 174923776, "postcopy-bytes": 0, "mbps": 0, "transferred": 295, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 0, "duplicate": 0, "dirty-pages-rate": 0, "normal-bytes": 0, "normal": 0}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 21, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 169865216, "postcopy-bytes": 0, "mbps": 0, "transferred": 4073380, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 4200307, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 4190208, "normal": 1023}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 28, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 166457344, "postcopy-bytes": 0, "mbps": 0, "transferred": 7487908, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 7614835, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 7598080, "normal": 1855}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 37, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 162131968, "postcopy-bytes": 0, "mbps": 0, "transferred": 11821732, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 11948659, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 11923456, "normal": 2911}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 48, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 158724096, "postcopy-bytes": 0, "mbps": 0, "transferred": 15236260, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 15363187, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 15331328, "normal": 3743}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 58, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 155840512, "postcopy-bytes": 0, "mbps": 0, "transferred": 18125476, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 18252403, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 18214912, "normal": 4447}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 72, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 151588864, "postcopy-bytes": 0, "mbps": 0, "transferred": 22385446, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 22512363, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 22466560, "normal": 5485}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 81, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 148049920, "postcopy-bytes": 0, "mbps": 0, "transferred": 25931302, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 26058219, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 26005504, "normal": 6349}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 94, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 143593472, "postcopy-bytes": 0, "mbps": 0, "transferred": 30396454, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 30523371, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 30461952, "normal": 7437}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "active", "setup-time": 3, "total-time": 103, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 139530240, "postcopy-bytes": 0, "mbps": 0, "transferred": 34467622, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 34594539, "duplicate": 211, "dirty-pages-rate": 0, "normal-bytes": 34525184, "normal": 8429}}}
{"execute": "query-migrate"}
{"timestamp": {"seconds": 1773164234, "microseconds": 447993}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1773164234, "microseconds": 448402}, "event": "STOP"}
{"timestamp": {"seconds": 1773164234, "microseconds": 448539}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1773164234, "microseconds": 448700}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"return": {"expected-downtime": 30000, "status": "device", "setup-time": 3, "total-time": 385, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 83981, "downtime-bytes": 75268133, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 2693.8943119266055, "transferred": 111972466, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 36703995, "duplicate": 17152, "dirty-pages-rate": 0, "normal-bytes": 111599616, "normal": 27246}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "device", "setup-time": 3, "total-time": 398, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 83981, "downtime-bytes": 75268133, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 2693.8943119266055, "transferred": 112195530, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 36703995, "duplicate": 17152, "dirty-pages-rate": 0, "normal-bytes": 111599616, "normal": 27246}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "device", "setup-time": 3, "total-time": 406, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 83981, "downtime-bytes": 75268133, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 2693.8943119266055, "transferred": 112195530, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 36703995, "duplicate": 17152, "dirty-pages-rate": 0, "normal-bytes": 111599616, "normal": 27246}}}
{"execute": "query-migrate"}
{"return": {"expected-downtime": 30000, "status": "device", "setup-time": 3, "total-time": 415, "ram": {"total": 174923776, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 83981, "downtime-bytes": 75268133, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 2693.8943119266055, "transferred": 112195530, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 36703995, "duplicate": 17152, "dirty-pages-rate": 0, "normal-bytes": 111599616, "normal": 27246}}}
{"execute": "query-migrate"}
{"timestamp": {"seconds": 1773164234, "microseconds": 759236}, "event": "MIGRATION", "data": {"status": "colo"}}
{"return": {"status": "colo"}}
{"timestamp": {"seconds": 1773164234, "microseconds": 340283}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1773164234, "microseconds": 755141}, "event": "MIGRATION", "data": {"status": "colo"}}
{"timestamp": {"seconds": 1773164234, "microseconds": 760176}, "event": "RESUME"}
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v11 13/21] migration-test: Add COLO migration unit test
2026-03-10 17:42 ` Arun Menon
@ 2026-03-19 22:40 ` Lukas Straub
0 siblings, 0 replies; 36+ messages in thread
From: Lukas Straub @ 2026-03-19 22:40 UTC (permalink / raw)
To: Arun Menon
Cc: qemu-devel, Peter Xu, Fabiano Rosas, Zhang Chen, Hailiang Zhang,
Li Zhijian
[-- Attachment #1: Type: text/plain, Size: 6100 bytes --]
On Tue, 10 Mar 2026 23:12:25 +0530
Arun Menon <armenon@redhat.com> wrote:
> Hi Lukas,
>
> On Tue, Mar 10, 2026 at 04:29:54PM +0100, Lukas Straub wrote:
> > On Tue, 10 Mar 2026 20:17:57 +0530
> > Arun Menon <armenon@redhat.com> wrote:
> >
> > > Hi Lukas,
> > >
> > > On Mon, Mar 02, 2026 at 12:45:28PM +0100, Lukas Straub wrote:
> > > > Add a COLO migration test for COLO migration and failover.
> > > >
> > > > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > > > Tested-by: Fabiano Rosas <farosas@suse.de>
> > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > > [...]
> > > >
> > >
> > > I was running the qtests locally, and I encountered a timeout error.
> > >
> > > Command run: mkdir -p build ; cd build ; make check-qtest-x86_64;
> > >
> > > Following is the output:
> > > ======
> > > [...]
> > >
> > > Summary of Failures:
> > > 67/67 qtest+qtest-x86_64 - qemu:qtest-x86_64/migration-test TIMEOUT 480.05s killed by signal 15 SIGTERM
> > > Ok: 64
> > > Fail: 0
> > > Skipped: 2
> > > Timeout: 1
> > > ======
> > >
> > > It seems that the test runner is stuck waiting for some input.
> > > Following is the stack trace
> > > [...]
> > > ======
> > > gstack 128276
> > > Thread 2 (Thread 0x7fdd090716c0 (LWP 128279) "call_rcu"):
> > > #0 0x00007fdd0921434d in syscall () from /lib64/libc.so.6
> > > #1 0x0000557fd604563a in qemu_futex_wait (f=0x557fd60a0190 <rcu_call_ready_event>, val=4294967295) at /home/arun/workdir/new/devel/upstream/qemu-priv/include/qemu/futex.h:47
> > > #2 0x0000557fd604584e in qemu_event_wait (ev=0x557fd60a0190 <rcu_call_ready_event>) at ../util/event.c:162
> > > #3 0x0000557fd6045fde in call_rcu_thread (opaque=0x0) at ../util/rcu.c:304
> > > #4 0x0000557fd600e8fb in qemu_thread_start (args=0x557fd6beec70) at ../util/qemu-thread-posix.c:414
> > > #5 0x00007fdd09193464 in start_thread () from /lib64/libc.so.6
> > > #6 0x00007fdd092165ac in __clone3 () from /lib64/libc.so.6
> > >
> > > Thread 1 (Thread 0x7fdd09073240 (LWP 128276) "migration-test"):
> > > #0 0x00007fdd0919b982 in __syscall_cancel_arch () from /lib64/libc.so.6
> > > #1 0x00007fdd0918fc3c in __internal_syscall_cancel () from /lib64/libc.so.6
> > > #2 0x00007fdd091dfb62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
> > > #3 0x00007fdd091ebb37 in nanosleep () from /lib64/libc.so.6
> > > #4 0x00007fdd0921613a in usleep () from /lib64/libc.so.6
> > > #5 0x0000557fd5fd99cd in wait_for_serial (side=0x557fd6065f08 "dest_serial") at ../tests/qtest/migration/framework.c:82
> > > #6 0x0000557fd5fe5865 in test_colo_common (args=0x557fd6bfdf50, failover_during_checkpoint=false, primary_failover=true) at ../tests/qtest/migration/colo-tests.c:66
> >
> > 60 migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> > 61
> > 62 wait_for_migration_status(from, "colo", NULL);
> > 63 wait_for_resume(to, get_dst());
> > 64
> > 65 wait_for_serial("src_serial");
> > 66 wait_for_serial("dest_serial");
> >
> > Interesting, so the secondary guest is stuck/crahsed after entering
> > colo state despite having resumed.
> >
> > It works fine here on master. And before the merge I have looped the
> > colo tests for a whole day on my machine without any failures.
> >
> > How often does this happen? What is the commit you are on, host, ASAN,
> > MSAN, UBSAN, configure options? With kvm or without?
>
> I have started the run on a fedora-43 VM. I am on commit:
> de61484ec39f418e5c0d4603017695f9ffb8fe24 master branch.
Okay, so the problem is that nested virtualization slows down kvm page
faults (e.g. dirtying the dirty bitmap) so much that it takes longer
for the guest to dirty the memory than the interval between two colo
checkpoints at which point the dirty bitmap is cleared again. And to
make matters worse the secondary qemu process is starved by the primary
qemu for some reason.
So the following happens:
1. After a checkpoint primary dirties all it's memory pretty quickly
and afterwards the dirty mem + serial out loops quickly. So it is very
likely the guest is in the middle of dirtying the memory at any given
time.
2. The colo checkpointing is actually working flawlessly it only seems
stuck because the migraion-test is stuck and not echoing the qmp events.
The secondary receives the primary checkpoint (whose guest is in the
middle of dirtying memory) and both sides resume. The secondary is
starved and spends a lot of time in the dirty mem phase.
3. Another checkpoint happens, before the secondary guest is finished
dirtying memory and has a chance sending the serial out signal. Goto 1.
Can you try increasing the checkpoint interval the following and see
what works for you:
migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
In my tests, raising it to 1000 worked fine. Though for a doubly nested VM
I had to raise it to 5000. E.g. It takes up to 5 seconds to dirty 100Mb
memory in a doubly nested VM.
Regards,
Lukas Straub
>
> The configuration command I used was:
> ../configure --target-list=x86_64-softmmu --enable-debug --enable-trace-backends=log --enable-slirp
>
> Targets and accelerators
> KVM support : YES
>
> >
> > Can you try the following and show me the log:
> > (cd build && QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_LOG=- tests/qtest/migration-test --full -p /x86_64/migration/colo/plain)
> >
>
> I have attached the log file : qemu.log to this email. It is stuck at
> the last line :
> {"timestamp": {"seconds": 1773164370, "microseconds": 609791}, "event": "RESUME"}
>
> It should be some configuration miss from my end.
> Thank you for looking into it.
>
> [...]
> > >
> > >
> > > Is there something that I am missing? Can you please look into this?
> > >
> > >
> > > Regards,
> > > Arun Menon
> > >
> >
>
> Regards,
> Arun Menon
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2026-03-19 22:41 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02 11:43 [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-03-02 11:43 ` [PATCH v11 01/21] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-03-02 11:43 ` [PATCH v11 02/21] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
2026-03-10 11:33 ` Lukas Straub
2026-03-02 11:43 ` [PATCH v11 03/21] colo: Setup ram cache in normal migration path Lukas Straub
2026-03-02 11:43 ` [PATCH v11 04/21] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
2026-03-02 11:43 ` [PATCH v11 05/21] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
2026-03-02 11:43 ` [PATCH v11 06/21] ram: Remove colo special-casing Lukas Straub
2026-03-02 11:43 ` [PATCH v11 07/21] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
2026-03-02 11:43 ` [PATCH v11 08/21] multifd: Add COLO support Lukas Straub
2026-03-02 11:43 ` [PATCH v11 09/21] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
2026-03-02 11:45 ` [PATCH v11 10/21] colo: Fix crash during device vmstate load Lukas Straub
2026-03-02 11:45 ` [PATCH v11 11/21] colo: Hold the BQL while sending ram state Lukas Straub
2026-03-02 11:45 ` [PATCH v11 12/21] colo: Do not hold the BQL while receiving " Lukas Straub
2026-03-02 11:45 ` [PATCH v11 13/21] migration-test: Add COLO migration unit test Lukas Straub
2026-03-10 14:47 ` Arun Menon
2026-03-10 15:29 ` Lukas Straub
2026-03-10 17:42 ` Arun Menon
2026-03-19 22:40 ` Lukas Straub
2026-03-02 11:45 ` [PATCH v11 14/21] Convert colo main documentation to restructuredText Lukas Straub
2026-03-02 11:45 ` [PATCH v11 15/21] qemu-colo.rst: Miscellaneous changes Lukas Straub
2026-03-02 11:45 ` [PATCH v11 16/21] qemu-colo.rst: Add my copyright Lukas Straub
2026-03-02 11:45 ` [PATCH v11 17/21] qemu-colo.rst: Simplify the block replication setup Lukas Straub
2026-03-02 11:45 ` [PATCH v11 18/21] multifd: Fix hang if send thread errors during sync Lukas Straub
2026-03-02 11:45 ` [PATCH v11 19/21] colo: Use file lock in primary_vm_do_failover() Lukas Straub
2026-03-02 20:06 ` Peter Xu
2026-03-02 11:45 ` [PATCH v11 20/21] migration: Keep s->rp_state.from_dst_file open until migration ends Lukas Straub
2026-03-02 20:07 ` Peter Xu
2026-03-02 11:45 ` [PATCH v11 21/21] colo: Reuse the return path from migration on primary and secondary side Lukas Straub
2026-03-02 20:11 ` Peter Xu
2026-03-06 12:53 ` [PATCH v11 00/21] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-03-06 13:07 ` Fabiano Rosas
-- strict thread matches above, loose matches on Subject: below --
2026-03-02 10:19 Lukas Straub
2026-03-02 10:19 ` [PATCH v11 13/21] migration-test: Add " Lukas Straub
2026-03-02 10:13 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
2026-03-02 10:13 ` [PATCH v11 13/21] migration-test: Add " Lukas Straub
2026-03-01 21:37 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
2026-03-01 21:37 ` [PATCH v11 13/21] migration-test: Add " Lukas Straub
2026-03-01 19:16 [PATCH v11 00/21] migration: Add COLO multifd support and " Lukas Straub
2026-03-01 19:16 ` [PATCH v11 13/21] migration-test: Add " Lukas Straub
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox