[PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test
@ 2026-01-25 20:40 Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub, Juan Quintela

Hello everyone,
This adds COLO multifd support and migration unit tests for COLO migration
and failover.

Regards,
Lukas

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
Changes in v3:
- Fix peter's review comments.
- Fix COLO with Q35 machine
- Link to v2: https://lore.kernel.org/qemu-devel/20260117-colo_unit_test_multifd-v2-0-ab521777fa51@web.de

Changes in v2:
- Fix review comments
- Hide stderr in colo migration test since the logged errors are expected
- Add benchmarking data for multifd
- Add myself as maintainer for COLO migration framework
- Link to v1: https://lore.kernel.org/qemu-devel/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de

---
Lukas Straub (10):
      MAINTAINERS: Add myself as maintainer for COLO migration framework
      MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
      Move ram state receive into multifd_ram_state_recv()
      multifd: Add COLO support
      colo: Fix crash during device vmstate load
      migration-test: Add COLO migration unit test
      Convert colo main documentation to restructuredText
      qemu-colo.rst: Miscellaneous changes
      qemu-colo.rst: Add my copyright
      qemu-colo.rst: Simplify the block replication setup

 MAINTAINERS                        |   6 +-
 docs/COLO-FT.txt                   | 334 ----------------------------------
 docs/system/index.rst              |   1 +
 docs/system/qemu-colo.rst          | 362 +++++++++++++++++++++++++++++++++++++
 migration/colo.c                   |   1 +
 migration/meson.build              |   2 +-
 migration/multifd-colo.c           |  50 +++++
 migration/multifd-colo.h           |  26 +++
 migration/multifd-nocomp.c         |  10 +-
 migration/multifd.c                |  19 +-
 migration/multifd.h                |   5 +-
 tests/qtest/meson.build            |   7 +-
 tests/qtest/migration-test.c       |   1 +
 tests/qtest/migration/colo-tests.c | 199 ++++++++++++++++++++
 tests/qtest/migration/framework.h  |   5 +
 15 files changed, 687 insertions(+), 341 deletions(-)
---
base-commit: fea2d7a784fc3627a8aa72875f51fe7634b04b81
change-id: 20251230-colo_unit_test_multifd-8bf58dcebd46

Best regards,
-- 
Lukas Straub <lukasstraub2@web.de>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 02/10] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

I am ready to maintain it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e23354235dc70a6224dd00ce92b0a049fbc8edfc..689d79b82d39ec8c2bb15dacb7928df5649756cd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3851,6 +3851,7 @@ F: qapi/yank.json
 
 COLO Framework
 M: Hailiang Zhang <zhanghailiang@xfusion.com>
+M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 02/10] MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv() Lukas Straub
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

His last email to the mailing list is from December 2021:
https://lore.kernel.org/qemu-devel/20211214075424.6920-1-zhanghailiang@xfusion.com/

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 689d79b82d39ec8c2bb15dacb7928df5649756cd..1e9bdd87c3a2f84f3abfc56986cd793976810fdd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3850,7 +3850,6 @@ F: include/qemu/yank.h
 F: qapi/yank.json
 
 COLO Framework
-M: Hailiang Zhang <zhanghailiang@xfusion.com>
 M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv()
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 02/10] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-26 12:51   ` Fabiano Rosas
  2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

This is in preparation for the next patch.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/multifd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index ad6261688fdf98a5c7f4ee9fb80ba2901201a33e..332e6fc58053462419f3171f6c320ac37648ef7b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1253,6 +1253,15 @@ static int multifd_device_state_recv(MultiFDRecvParams *p, Error **errp)
     return ret;
 }
 
+static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
+{
+    int ret;
+
+    ret = multifd_recv_state->ops->recv(p, errp);
+
+    return ret;
+}
+
 static void *multifd_recv_thread(void *opaque)
 {
     MigrationState *s = migrate_get_current();
@@ -1387,7 +1396,7 @@ static void *multifd_recv_thread(void *opaque)
                 assert(use_packets);
                 ret = multifd_device_state_recv(p, &local_err);
             } else {
-                ret = multifd_recv_state->ops->recv(p, &local_err);
+                ret = multifd_ram_state_recv(p, &local_err);
             }
             if (ret != 0) {
                 break;

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 04/10] multifd: Add COLO support
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (2 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv() Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-26 10:36   ` Zhang Chen
  2026-01-26 14:33   ` Fabiano Rosas
  2026-01-25 20:40 ` [PATCH v3 05/10] colo: Fix crash during device vmstate load Lukas Straub
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub, Juan Quintela

Like in the normal ram_load() path, put the received pages into the
colo cache and mark the pages in the bitmap so that they will be
flushed to the guest later.

Multifd with COLO is useful to reduce the VM pause time during checkpointing
for latency sensitive workloads. In such workloads the worst-case latency
is especially important.

Also, this is already worth it for the precopy phase as it helps with
converging. Moreover, multifd migration is the preferred way to do migration
nowadays and this allows to use multifd compression with COLO.

Benchmark:
Cluster nodes
 - Intel Xenon E5-2630 v3
 - 48Gb RAM
 - 10G Ethernet
Guest
 - Windows Server 2016
 - 6Gb RAM
 - 4 cores
Workload
 - Upload a file to the guest with SMB to simulate moderate
   memory dirtying
 - Measure the memory transfer time portion of each checkpoint
 - 600ms COLO checkpoint interval

Results
Plain
 idle mean: 4.50ms 99per: 10.33ms
 load mean: 24.30ms 99per: 78.05ms
Multifd-4
 idle mean: 6.48ms 99per: 10.41ms
 load mean: 14.12ms 99per: 31.27ms

Evaluation
While multifd has slightly higher latency when the guest idles, it is
10ms faster under load and more importantly it's worst case latency is
less than 1/2 of plain under load as can be seen in the 99. Percentile.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS                |  1 +
 migration/meson.build      |  2 +-
 migration/multifd-colo.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 migration/multifd-colo.h   | 26 ++++++++++++++++++++++++
 migration/multifd-nocomp.c | 10 +++++++++-
 migration/multifd.c        |  8 ++++++++
 migration/multifd.h        |  5 ++++-
 7 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e9bdd87c3a2f84f3abfc56986cd793976810fdd..883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3853,6 +3853,7 @@ COLO Framework
 M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*
+F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
 F: docs/COLO-FT.txt
diff --git a/migration/meson.build b/migration/meson.build
index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -39,7 +39,7 @@ system_ss.add(files(
 ), gnutls, zlib)
 
 if get_option('replication').allowed()
-  system_ss.add(files('colo-failover.c', 'colo.c'))
+  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
 else
   system_ss.add(files('colo-stubs.c'))
 endif
diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
new file mode 100644
index 0000000000000000000000000000000000000000..c47f5044663969e0c9af56da5ec34902d635810a
--- /dev/null
+++ b/migration/multifd-colo.c
@@ -0,0 +1,50 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo implementation
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/target_page.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "ram.h"
+#include "multifd.h"
+#include "options.h"
+#include "io/channel-socket.h"
+#include "migration/colo.h"
+#include "multifd-colo.h"
+#include "system/ramblock.h"
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p)
+{
+    /*
+     * While we're still in precopy state (not yet in colo state), we copy
+     * received pages to both guest and cache. No need to set dirty bits,
+     * since guest and cache memory are in sync.
+     */
+    if (migration_incoming_in_colo_state()) {
+        colo_record_bitmap(p->block, p->normal, p->normal_num);
+        colo_record_bitmap(p->block, p->zero, p->zero_num);
+    }
+}
+
+void multifd_colo_process_recv(MultiFDRecvParams *p)
+{
+    if (!migration_incoming_in_colo_state()) {
+        for (int i = 0; i < p->normal_num; i++) {
+            void *guest = p->block->host + p->normal[i];
+            void *cache = p->host + p->normal[i];
+            memcpy(guest, cache, multifd_ram_page_size());
+        }
+        for (int i = 0; i < p->zero_num; i++) {
+            void *guest = p->block->host + p->zero[i];
+            memset(guest, 0, multifd_ram_page_size());
+        }
+    }
+}
diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
new file mode 100644
index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
--- /dev/null
+++ b/migration/multifd-colo.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo header
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
+#define QEMU_MIGRATION_MULTIFD_COLO_H
+
+#ifdef CONFIG_REPLICATION
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p);
+void multifd_colo_process_recv(MultiFDRecvParams *p);
+
+#else
+
+static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
+static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
+
+#endif
+#endif
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -16,6 +16,7 @@
 #include "file.h"
 #include "migration-stats.h"
 #include "multifd.h"
+#include "multifd-colo.h"
 #include "options.h"
 #include "migration.h"
 #include "qapi/error.h"
@@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
         return -1;
     }
 
-    p->host = p->block->host;
     for (i = 0; i < p->normal_num; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
@@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->zero[i] = offset;
     }
 
+    if (migrate_colo()) {
+        multifd_colo_prepare_recv(p);
+        assert(p->block->colo_cache);
+        p->host = p->block->colo_cache;
+    } else {
+        p->host = p->block->host;
+    }
+
     return 0;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -29,6 +29,7 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "multifd.h"
+#include "multifd-colo.h"
 #include "options.h"
 #include "qemu/yank.h"
 #include "io/channel-file.h"
@@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
     int ret;
 
     ret = multifd_recv_state->ops->recv(p, errp);
+    if (ret != 0) {
+        return ret;
+    }
+
+    if (migrate_colo()) {
+        multifd_colo_process_recv(p);
+    }
 
     return ret;
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -279,7 +279,10 @@ typedef struct {
     uint64_t packets_recved;
     /* ramblock */
     RAMBlock *block;
-    /* ramblock host address */
+    /*
+     * Normally, it points to ramblock's host address.  When COLO
+     * is enabled, it points to the mirror cache for the ramblock.
+     */
     uint8_t *host;
     /* buffers to recv */
     struct iovec *iov;

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 05/10] colo: Fix crash during device vmstate load
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (3 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-27 20:38   ` Peter Xu
  2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

With colo we load device vmstate during each checkpoint, on top of
a vm that was already running. Some devices expect a reset before
loading vmstate on such a previously running vm.

This fixes a crash when using COLO with Q35 machine.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/colo.c b/migration/colo.c
index db783f6fa77500386d923dd97e522883027e71d8..627b3706687036554eda3909b4194116a7640493 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -727,6 +727,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 
     bql_lock();
     vmstate_loading = true;
+    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
     colo_flush_ram_cache();
     ret = qemu_load_device_state(fb, errp);
     if (ret < 0) {

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (4 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 05/10] colo: Fix crash during device vmstate load Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-26 14:40   ` Fabiano Rosas
                     ` (2 more replies)
  2026-01-25 20:40 ` [PATCH v3 07/10] Convert colo main documentation to restructuredText Lukas Straub
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Add a COLO migration test for COLO migration and failover.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS                        |   1 +
 tests/qtest/meson.build            |   7 +-
 tests/qtest/migration-test.c       |   1 +
 tests/qtest/migration/colo-tests.c | 199 +++++++++++++++++++++++++++++++++++++
 tests/qtest/migration/framework.h  |   5 +
 5 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1..2a8b9b2d051883c1b7adce9c1afec80d16a317f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3856,6 +3856,7 @@ F: migration/colo*
 F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
 F: docs/COLO-FT.txt
 
 COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index dfb83650c643d884daad53a66034ab7aa8c45509..624f7744ec9bd81c8823075b966bc95f7750a667 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -371,6 +371,11 @@ if gnutls.found()
   endif
 endif
 
+migration_colo_files = []
+if get_option('replication').allowed()
+  migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
 qtests = {
   'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
   'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -382,7 +387,7 @@ qtests = {
                              'migration/migration-util.c') + dbus_vmstate1,
   'erst-test': files('erst-test.c'),
   'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
-  'migration-test': test_migration_files + migration_tls_files,
+  'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
   'pxe-test': files('boot-sector.c'),
   'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
                           'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
     migration_test_add_precopy(env);
     migration_test_add_cpr(env);
     migration_test_add_misc(env);
+    migration_test_add_colo(env);
 
     ret = g_test_run();
 
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..0586970e206f01ed6e7aa3429321aefc1de7be37
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,199 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+                            bool failover_during_checkpoint,
+                            bool primary_failover)
+{
+    QTestState *from, *to;
+    void *data_hook = NULL;
+
+    /*
+     * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+     * open the image read/write at the same time. Using read-only=on is not
+     * possible here, because ide-hd does not support read-only backing image.
+     *
+     * So use -snapshot, where each qemu instance creates its own writable
+     * snapshot internally while leaving the real image read-only.
+     */
+    args->start.opts_source = "-snapshot";
+    args->start.opts_target = "-snapshot";
+
+    /*
+     * COLO migration code logs many errors when the migration socket
+     * is shut down, these are expected so we hide them here.
+     */
+    args->start.hide_stderr = true;
+
+    args->start.oob = true;
+    args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+    if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+        return -1;
+    }
+
+    migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+    if (args->start_hook) {
+        data_hook = args->start_hook(from, to);
+    }
+
+    migrate_ensure_converge(from);
+    wait_for_serial("src_serial");
+
+    migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+    wait_for_migration_status(from, "colo", NULL);
+    wait_for_resume(to, get_dst());
+
+    wait_for_serial("src_serial");
+    wait_for_serial("dest_serial");
+
+    /* wait for 3 checkpoints */
+    for (int i = 0; i < 3; i++) {
+        qtest_qmp_eventwait(to, "RESUME");
+        wait_for_serial("src_serial");
+        wait_for_serial("dest_serial");
+    }
+
+    if (failover_during_checkpoint) {
+        qtest_qmp_eventwait(to, "STOP");
+    }
+    if (primary_failover) {
+        qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+                                            "'arguments': {'instances':"
+                                                "[{'type': 'migration'}]}}");
+        qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+        wait_for_serial("src_serial");
+    } else {
+        qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+                                        "'arguments': {'instances':"
+                                            "[{'type': 'migration'}]}}");
+        qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+        wait_for_serial("dest_serial");
+    }
+
+    if (args->end_hook) {
+        args->end_hook(from, to, data_hook);
+    }
+
+    migrate_end(from, to, !primary_failover);
+
+    return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+                                   bool failover_during_checkpoint,
+                                   bool primary_failover)
+{
+    args->listen_uri = "tcp:127.0.0.1:0";
+    test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+    return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+                                     bool failover_during_checkpoint,
+                                     bool primary_failover)
+{
+    args->listen_uri = "defer";
+    args->start_hook = hook_start_multifd;
+    args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+    test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+                                                 MigrateCommon *args)
+{
+    test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+                                                        MigrateCommon *args)
+{
+    test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+                                                          MigrateCommon *args)
+{
+    test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+                                                          MigrateCommon *args)
+{
+    test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+                                                            MigrateCommon *args)
+{
+    test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+    if (!env->has_kvm) {
+        g_test_skip("COLO requires KVM accelerator");
+        return;
+    }
+
+    if (!env->full_set) {
+        return;
+    }
+
+    migration_test_add("/migration/colo/plain/primary_failover",
+                       test_colo_plain_primary_failover);
+    migration_test_add("/migration/colo/plain/secondary_failover",
+                       test_colo_plain_secondary_failover);
+
+    migration_test_add("/migration/colo/multifd/primary_failover",
+                       test_colo_multifd_primary_failover);
+    migration_test_add("/migration/colo/multifd/secondary_failover",
+                       test_colo_multifd_secondary_failover);
+
+    migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+                       test_colo_plain_primary_failover_checkpoint);
+    migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+                       test_colo_plain_secondary_failover_checkpoint);
+
+    migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+                       test_colo_multifd_primary_failover_checkpoint);
+    migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+                       test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
 void migration_test_add_precopy(MigrationTestEnv *env);
 void migration_test_add_cpr(MigrationTestEnv *env);
 void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
 
 #endif /* TEST_FRAMEWORK_H */

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 07/10] Convert colo main documentation to restructuredText
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (5 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes Lukas Straub
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS               |   2 +-
 docs/COLO-FT.txt          | 334 ------------------------------------------
 docs/system/index.rst     |   1 +
 docs/system/qemu-colo.rst | 360 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+), 335 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2a8b9b2d051883c1b7adce9c1afec80d16a317f8..7d396183cef0f5e2064e016cf479765b97820b71 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3857,7 +3857,7 @@ F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
 F: tests/qtest/migration/colo-tests.c
-F: docs/COLO-FT.txt
+F: docs/system/qemu-colo.rst
 
 COLO Proxy
 M: Zhang Chen <zhangckid@gmail.com>
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
deleted file mode 100644
index 2283a09c080b8996f9767eeb415e8d4fbdc940af..0000000000000000000000000000000000000000
--- a/docs/COLO-FT.txt
+++ /dev/null
@@ -1,334 +0,0 @@
-COarse-grained LOck-stepping Virtual Machines for Non-stop Service
-----------------------------------------
-Copyright (c) 2016 Intel Corporation
-Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
-This document gives an overview of COLO's design and how to use it.
-
-== Background ==
-Virtual machine (VM) replication is a well known technique for providing
-application-agnostic software-implemented hardware fault tolerance,
-also known as "non-stop service".
-
-COLO (COarse-grained LOck-stepping) is a high availability solution.
-Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
-same request from client, and generate response in parallel too.
-If the response packets from PVM and SVM are identical, they are released
-immediately. Otherwise, a VM checkpoint (on demand) is conducted.
-
-== Architecture ==
-
-The architecture of COLO is shown in the diagram below.
-It consists of a pair of networked physical nodes:
-The primary node running the PVM, and the secondary node running the SVM
-to maintain a valid replica of the PVM.
-PVM and SVM execute in parallel and generate output of response packets for
-client requests according to the application semantics.
-
-The incoming packets from the client or external network are received by the
-primary node, and then forwarded to the secondary node, so that both the PVM
-and the SVM are stimulated with the same requests.
-
-COLO receives the outbound packets from both the PVM and SVM and compares them
-before allowing the output to be sent to clients.
-
-The SVM is qualified as a valid replica of the PVM, as long as it generates
-identical responses to all client requests. Once the differences in the outputs
-are detected between the PVM and SVM, COLO withholds transmission of the
-outbound packets until it has successfully synchronized the PVM state to the SVM.
-
-  Primary Node                                                            Secondary Node
-+------------+  +-----------------------+       +------------------------+  +------------+
-|            |  |       HeartBeat       +<----->+       HeartBeat        |  |            |
-| Primary VM |  +-----------+-----------+       +-----------+------------+  |Secondary VM|
-|            |              |                               |               |            |
-|            |  +-----------|-----------+       +-----------|------------+  |            |
-|            |  |QEMU   +---v----+      |       |QEMU  +----v---+        |  |            |
-|            |  |       |Failover|      |       |      |Failover|        |  |            |
-|            |  |       +--------+      |       |      +--------+        |  |            |
-|            |  |   +---------------+   |       |   +---------------+    |  |            |
-|            |  |   | VM Checkpoint +-------------->+ VM Checkpoint |    |  |            |
-|            |  |   +---------------+   |       |   +---------------+    |  |            |
-|Requests<--------------------------\ /-----------------\ /--------------------->Requests|
-|            |  |                   ^ ^ |       |       | |              |  |            |
-|Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
-|            |  |               | | | | |       |  | |  | |              |  |            |
-|            |  | +-----------+ | | | | |       |  | |  | | +----------+ |  |            |
-|            |  | | COLO disk | | | | | |       |  | |  | | | COLO disk| |  |            |
-|            |  | |   Manager +---------------------------->| Manager  | |  |            |
-|            |  | ++----------+ v v | | |       |  | v  v | +---------++ |  |            |
-|            |  |  |+-----------+-+-+-++|       | ++-+--+-+---------+ |  |  |            |
-|            |  |  ||   COLO Proxy     ||       | |   COLO Proxy    | |  |  |            |
-|            |  |  || (compare packet  ||       | |(adjust sequence | |  |  |            |
-|            |  |  ||and mirror packet)||       | |    and ACK)     | |  |  |            |
-|            |  |  |+------------+---+-+|       | +-----------------+ |  |  |            |
-+------------+  +-----------------------+       +------------------------+  +------------+
-+------------+     |             |   |                                |     +------------+
-| VM Monitor |     |             |   |                                |     | VM Monitor |
-+------------+     |             |   |                                |     +------------+
-+---------------------------------------+       +----------------------------------------+
-|   Kernel         |             |   |  |       |   Kernel            |                  |
-+---------------------------------------+       +----------------------------------------+
-                   |             |   |                                |
-    +--------------v+  +---------v---+--+       +------------------+ +v-------------+
-    |   Storage     |  |External Network|       | External Network | |   Storage    |
-    +---------------+  +----------------+       +------------------+ +--------------+
-
-
-== Components introduction ==
-
-You can see there are several components in COLO's diagram of architecture.
-Their functions are described below.
-
-HeartBeat:
-Runs on both the primary and secondary nodes, to periodically check platform
-availability. When the primary node suffers a hardware fail-stop failure,
-the heartbeat stops responding, the secondary node will trigger a failover
-as soon as it determines the absence.
-
-COLO disk Manager:
-When primary VM writes data into image, the colo disk manager captures this data
-and sends it to secondary VM's which makes sure the context of secondary VM's
-image is consistent with the context of primary VM 's image.
-For more details, please refer to docs/block-replication.txt.
-
-Checkpoint/Failover Controller:
-Modifications of save/restore flow to realize continuous migration,
-to make sure the state of VM in Secondary side is always consistent with VM in
-Primary side.
-
-COLO Proxy:
-Delivers packets to Primary and Secondary, and then compare the responses from
-both side. Then decide whether to start a checkpoint according to some rules.
-Please refer to docs/colo-proxy.txt for more information.
-
-Note:
-HeartBeat has not been implemented yet, so you need to trigger failover process
-by using 'x-colo-lost-heartbeat' command.
-
-== COLO operation status ==
-
-+-----------------+
-|                 |
-|    Start COLO   |
-|                 |
-+--------+--------+
-         |
-         |  Main qmp command:
-         |  migrate-set-capabilities with x-colo
-         |  migrate
-         |
-         v
-+--------+--------+
-|                 |
-|  COLO running   |
-|                 |
-+--------+--------+
-         |
-         |  Main qmp command:
-         |  x-colo-lost-heartbeat
-         |  or
-         |  some error happened
-         v
-+--------+--------+
-|                 |  send qmp event:
-|  COLO failover  |  COLO_EXIT
-|                 |
-+-----------------+
-
-COLO use the qmp command to switch and report operation status.
-The diagram just shows the main qmp command, you can get the detail
-in test procedure.
-
-== Test procedure ==
-Note: Here we are running both instances on the same host for testing,
-change the IP Addresses if you want to run it on two hosts. Initially
-127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
-
-== Startup qemu ==
-1. Primary:
-Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
-You don't need to change any IP's here, because 0.0.0.0 listens on any
-interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
-instance.
-
-# imagefolder="/mnt/vms/colo-test-primary"
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
-   -device piix3-usb-uhci -device usb-tablet -name primary \
-   -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-   -device rtl8139,id=e0,netdev=hn0 \
-   -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
-   -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
-   -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
-   -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
-   -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
-   -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
-   -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
-   -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
-   -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
-   -object iothread,id=iothread1 \
-   -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
-outdev=compare_out0,iothread=iothread1 \
-   -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
-
-2. Secondary:
-Note: Active and hidden images need to be created only once and the
-size should be the same as primary.qcow2. Again, you don't need to change
-any IP's here, except for the $primary_ip variable.
-
-# imagefolder="/mnt/vms/colo-test-secondary"
-# primary_ip=127.0.0.1
-
-# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
-
-# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
-   -device piix3-usb-uhci -device usb-tablet -name secondary \
-   -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-   -device rtl8139,id=e0,netdev=hn0 \
-   -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
-   -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
-   -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
-   -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
-   -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
-   -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
-   -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
-top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
-file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
-file.backing.backing=parent0 \
-   -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0=childs0 \
-   -incoming tcp:0.0.0.0:9998
-
-
-3. On Secondary VM's QEMU monitor, issue command
-{"execute":"qmp_capabilities"}
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
-{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
-
-Note:
-  a. The qmp command nbd-server-start and nbd-server-add must be run
-     before running the qmp command migrate on primary QEMU
-  b. Active disk, hidden disk and nbd target's length should be the
-     same.
-  c. It is better to put active disk and hidden disk in ramdisk. They
-     will be merged into the parent disk on failover.
-
-4. On Primary VM's QEMU monitor, issue command:
-{"execute":"qmp_capabilities"}
-{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
-
-  Note:
-  a. There should be only one NBD Client for each primary disk.
-  b. The qmp command line must be run after running qmp command line in
-     secondary qemu.
-
-5. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
-You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
-to change the idle checkpoint period time
-
-6. Failover test
-You can kill one of the VMs and Failover on the surviving VM:
-
-If you killed the Secondary, then follow "Primary Failover". After that,
-if you want to resume the replication, follow "Primary resume replication"
-
-If you killed the Primary, then follow "Secondary Failover". After that,
-if you want to resume the replication, follow "Secondary resume replication"
-
-== Primary Failover ==
-The Secondary died, resume on the Primary
-
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
-{"execute": "object-del", "arguments":{ "id": "comp0" } }
-{"execute": "object-del", "arguments":{ "id": "iothread1" } }
-{"execute": "object-del", "arguments":{ "id": "m0" } }
-{"execute": "object-del", "arguments":{ "id": "redire0" } }
-{"execute": "object-del", "arguments":{ "id": "redire1" } }
-{"execute": "x-colo-lost-heartbeat" }
-
-== Secondary Failover ==
-The Primary died, resume on the Secondary and prepare to become the new Primary
-
-{"execute": "nbd-server-stop"}
-{"execute": "x-colo-lost-heartbeat"}
-
-{"execute": "object-del", "arguments":{ "id": "f2" } }
-{"execute": "object-del", "arguments":{ "id": "f1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red0" } }
-
-{"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
-
-== Primary resume replication ==
-Resume replication after new Secondary is up.
-
-Start the new Secondary (Steps 2 and 3 above), then on the Primary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
-
-Note:
-If this Primary previously was a Secondary, then we need to insert the
-filters before the filter-rewriter by using the
-""insert": "before", "position": "id=rew0"" Options. See below.
-
-== Secondary resume replication ==
-Become Primary and resume replication after new Secondary is up. Note
-that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
-
-Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
-then on the old Secondary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
-
-== TODO ==
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 427b020483104f6589878bbf255a367ae114c61b..6268c41aea9c74dc3e59d896b5ae082360bfbb1a 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -41,3 +41,4 @@ or Hypervisor.Framework.
    igvm
    vm-templating
    sriov
+   qemu-colo
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
new file mode 100644
index 0000000000000000000000000000000000000000..4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2
--- /dev/null
+++ b/docs/system/qemu-colo.rst
@@ -0,0 +1,360 @@
+Qemu COLO Fault Tolerance
+=========================
+
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO's design and how to use it.
+
+Background
+----------
+Virtual machine (VM) replication is a well known technique for providing
+application-agnostic software-implemented hardware fault tolerance,
+also known as "non-stop service".
+
+COLO (COarse-grained LOck-stepping) is a high availability solution.
+Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
+same request from client, and generate response in parallel too.
+If the response packets from PVM and SVM are identical, they are released
+immediately. Otherwise, a VM checkpoint (on demand) is conducted.
+
+Architecture
+------------
+The architecture of COLO is shown in the diagram below.
+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that both the PVM
+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once the differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the SVM.
+
+Overview::
+
+      Primary Node                                                            Secondary Node
+    +------------+  +-----------------------+       +------------------------+  +------------+
+    |            |  |       HeartBeat       +<----->+       HeartBeat        |  |            |
+    | Primary VM |  +-----------+-----------+       +-----------+------------+  |Secondary VM|
+    |            |              |                               |               |            |
+    |            |  +-----------|-----------+       +-----------|------------+  |            |
+    |            |  |QEMU   +---v----+      |       |QEMU  +----v---+        |  |            |
+    |            |  |       |Failover|      |       |      |Failover|        |  |            |
+    |            |  |       +--------+      |       |      +--------+        |  |            |
+    |            |  |   +---------------+   |       |   +---------------+    |  |            |
+    |            |  |   | VM Checkpoint +-------------->+ VM Checkpoint |    |  |            |
+    |            |  |   +---------------+   |       |   +---------------+    |  |            |
+    |Requests<--------------------------\ /-----------------\ /--------------------->Requests|
+    |            |  |                   ^ ^ |       |       | |              |  |            |
+    |Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
+    |            |  |               | | | | |       |  | |  | |              |  |            |
+    |            |  | +-----------+ | | | | |       |  | |  | | +----------+ |  |            |
+    |            |  | | COLO disk | | | | | |       |  | |  | | | COLO disk| |  |            |
+    |            |  | |   Manager +---------------------------->| Manager  | |  |            |
+    |            |  | ++----------+ v v | | |       |  | v  v | +---------++ |  |            |
+    |            |  |  |+-----------+-+-+-++|       | ++-+--+-+---------+ |  |  |            |
+    |            |  |  ||   COLO Proxy     ||       | |   COLO Proxy    | |  |  |            |
+    |            |  |  || (compare packet  ||       | |(adjust sequence | |  |  |            |
+    |            |  |  ||and mirror packet)||       | |    and ACK)     | |  |  |            |
+    |            |  |  |+------------+---+-+|       | +-----------------+ |  |  |            |
+    +------------+  +-----------------------+       +------------------------+  +------------+
+    +------------+     |             |   |                                |     +------------+
+    | VM Monitor |     |             |   |                                |     | VM Monitor |
+    +------------+     |             |   |                                |     +------------+
+    +---------------------------------------+       +----------------------------------------+
+    |   Kernel         |             |   |  |       |   Kernel            |                  |
+    +---------------------------------------+       +----------------------------------------+
+                       |             |   |                                |
+        +--------------v+  +---------v---+--+       +------------------+ +v-------------+
+        |   Storage     |  |External Network|       | External Network | |   Storage    |
+        +---------------+  +----------------+       +------------------+ +--------------+
+
+Components introduction
+^^^^^^^^^^^^^^^^^^^^^^^
+You can see there are several components in COLO's diagram of architecture.
+Their functions are described below.
+
+HeartBeat
+~~~~~~~~~
+Runs on both the primary and secondary nodes, to periodically check platform
+availability. When the primary node suffers a hardware fail-stop failure,
+the heartbeat stops responding, the secondary node will trigger a failover
+as soon as it determines the absence.
+
+COLO disk Manager
+~~~~~~~~~~~~~~~~~
+When primary VM writes data into image, the colo disk manager captures this data
+and sends it to secondary VM's which makes sure the context of secondary VM's
+image is consistent with the context of primary VM 's image.
+For more details, please refer to docs/block-replication.txt.
+
+Checkpoint/Failover Controller
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Modifications of save/restore flow to realize continuous migration,
+to make sure the state of VM in Secondary side is always consistent with VM in
+Primary side.
+
+COLO Proxy
+~~~~~~~~~~
+Delivers packets to Primary and Secondary, and then compare the responses from
+both side. Then decide whether to start a checkpoint according to some rules.
+Please refer to docs/colo-proxy.txt for more information.
+
+Note:
+HeartBeat has not been implemented yet, so you need to trigger failover process
+by using 'x-colo-lost-heartbeat' command.
+
+COLO operation status
+^^^^^^^^^^^^^^^^^^^^^
+
+Overview::
+
+    +-----------------+
+    |                 |
+    |    Start COLO   |
+    |                 |
+    +--------+--------+
+             |
+             |  Main qmp command:
+             |  migrate-set-capabilities with x-colo
+             |  migrate
+             |
+             v
+    +--------+--------+
+    |                 |
+    |  COLO running   |
+    |                 |
+    +--------+--------+
+             |
+             |  Main qmp command:
+             |  x-colo-lost-heartbeat
+             |  or
+             |  some error happened
+             v
+    +--------+--------+
+    |                 |  send qmp event:
+    |  COLO failover  |  COLO_EXIT
+    |                 |
+    +-----------------+
+
+
+COLO use the qmp command to switch and report operation status.
+The diagram just shows the main qmp command, you can get the detail
+in test procedure.
+
+Test procedure
+--------------
+Note: Here we are running both instances on the same host for testing,
+change the IP Addresses if you want to run it on two hosts. Initially
+``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
+
+Startup qemu
+^^^^^^^^^^^^
+**1. Primary**:
+Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+You don't need to change any IP's here, because ``0.0.0.0`` listens on any
+interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
+instance::
+
+    # imagefolder="/mnt/vms/colo-test-primary"
+
+    # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+       -device piix3-usb-uhci -device usb-tablet -name primary \
+       -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+       -device rtl8139,id=e0,netdev=hn0 \
+       -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
+       -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
+       -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
+       -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
+       -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
+       -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
+       -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
+       -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
+       -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
+       -object iothread,id=iothread1 \
+       -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
+    outdev=compare_out0,iothread=iothread1 \
+       -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+    children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
+
+
+**2. Secondary**:
+Note: Active and hidden images need to be created only once and the
+size should be the same as ``primary.qcow2``. Again, you don't need to change
+any IP's here, except for the ``$primary_ip`` variable::
+
+    # imagefolder="/mnt/vms/colo-test-secondary"
+    # primary_ip=127.0.0.1
+
+    # qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
+
+    # qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
+
+    # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+       -device piix3-usb-uhci -device usb-tablet -name secondary \
+       -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+       -device rtl8139,id=e0,netdev=hn0 \
+       -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
+       -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
+       -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
+       -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
+       -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
+       -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
+       -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
+    top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
+    file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
+    file.backing.backing=parent0 \
+       -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+    children.0=childs0 \
+       -incoming tcp:0.0.0.0:9998
+
+
+**3.** On Secondary VM's QEMU monitor, issue command::
+
+    {"execute":"qmp_capabilities"}
+    {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
+    {"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
+
+Note:
+  a. The qmp command ``nbd-server-start`` and ``nbd-server-add`` must be run
+     before running the qmp command migrate on primary QEMU
+  b. Active disk, hidden disk and nbd target's length should be the
+     same.
+  c. It is better to put active disk and hidden disk in ramdisk. They
+     will be merged into the parent disk on failover.
+
+**4.** On Primary VM's QEMU monitor, issue command::
+
+    {"execute":"qmp_capabilities"}
+    {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+  a. There should be only one NBD Client for each primary disk.
+  b. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+**5.** After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
+You can issue command ``{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }``
+to change the idle checkpoint period time
+
+Failover test
+^^^^^^^^^^^^^
+You can kill one of the VMs and Failover on the surviving VM:
+
+If you killed the Secondary, then follow "Primary Failover".
+After that, if you want to resume the replication, follow "Primary resume replication"
+
+If you killed the Primary, then follow "Secondary Failover".
+After that, if you want to resume the replication, follow "Secondary resume replication"
+
+Primary Failover
+~~~~~~~~~~~~~~~~
+The Secondary died, resume on the Primary::
+
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+    {"execute": "object-del", "arguments":{ "id": "comp0" } }
+    {"execute": "object-del", "arguments":{ "id": "iothread1" } }
+    {"execute": "object-del", "arguments":{ "id": "m0" } }
+    {"execute": "object-del", "arguments":{ "id": "redire0" } }
+    {"execute": "object-del", "arguments":{ "id": "redire1" } }
+    {"execute": "x-colo-lost-heartbeat" }
+
+Secondary Failover
+~~~~~~~~~~~~~~~~~~
+The Primary died, resume on the Secondary and prepare to become the new Primary::
+
+    {"execute": "nbd-server-stop"}
+    {"execute": "x-colo-lost-heartbeat"}
+
+    {"execute": "object-del", "arguments":{ "id": "f2" } }
+    {"execute": "object-del", "arguments":{ "id": "f1" } }
+    {"execute": "chardev-remove", "arguments":{ "id": "red1" } }
+    {"execute": "chardev-remove", "arguments":{ "id": "red0" } }
+
+    {"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
+
+Primary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Resume replication after new Secondary is up.
+
+Start the new Secondary (Steps 2 and 3 above), then on the Primary::
+
+    {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+    {"execute": "stop"}
+    {"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
+
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+    {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+If this Primary previously was a Secondary, then we need to insert the
+filters before the filter-rewriter by using the
+""insert": "before", "position": "id=rew0"" Options. See below.
+
+Secondary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Become Primary and resume replication after new Secondary is up. Note
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
+
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
+then on the old Secondary::
+
+    {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+    {"execute": "stop"}
+    {"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
+
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+    {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
+
+TODO
+----
+1. Support shared storage.
+2. Develop the heartbeat part.
+3. Reduce checkpoint VM’s downtime while doing checkpoint.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (6 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 07/10] Convert colo main documentation to restructuredText Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-26 10:21   ` Zhang Chen
  2026-01-25 20:40 ` [PATCH v3 09/10] qemu-colo.rst: Add my copyright Lukas Straub
  2026-01-25 20:40 ` [PATCH v3 10/10] qemu-colo.rst: Simplify the block replication setup Lukas Straub
  9 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2..a70e61aa09391cda933031535fa982d27cf6654b 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -1,13 +1,6 @@
 Qemu COLO Fault Tolerance
 =========================
 
-| Copyright (c) 2016 Intel Corporation
-| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-| Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
 This document gives an overview of COLO's design and how to use it.
 
 Background
@@ -82,8 +75,8 @@ Overview::
         |   Storage     |  |External Network|       | External Network | |   Storage    |
         +---------------+  +----------------+       +------------------+ +--------------+
 
-Components introduction
-^^^^^^^^^^^^^^^^^^^^^^^
+Components
+^^^^^^^^^^
 You can see there are several components in COLO's diagram of architecture.
 Their functions are described below.
 
@@ -157,14 +150,21 @@ in test procedure.
 
 Test procedure
 --------------
-Note: Here we are running both instances on the same host for testing,
+
+Setup
+^^^^^
+
+Here we are running both instances on the same host for testing,
 change the IP Addresses if you want to run it on two hosts. Initially
 ``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
 
+COLO uses double the guest ram size on the secondary side. The Qemu version
+should be the same on both hosts.
+
 Startup qemu
 ^^^^^^^^^^^^
 **1. Primary**:
-Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
 You don't need to change any IP's here, because ``0.0.0.0`` listens on any
 interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
 instance::
@@ -192,7 +192,7 @@ instance::
 
 
 **2. Secondary**:
-Note: Active and hidden images need to be created only once and the
+Active and hidden images need to be created only once and the
 size should be the same as ``primary.qcow2``. Again, you don't need to change
 any IP's here, except for the ``$primary_ip`` variable::
 
@@ -353,8 +353,9 @@ Wait until disk is synced, then::
     {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
     {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
 
-TODO
-----
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 09/10] qemu-colo.rst: Add my copyright
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (7 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  2026-01-26 10:23   ` Zhang Chen
  2026-01-25 20:40 ` [PATCH v3 10/10] qemu-colo.rst: Simplify the block replication setup Lukas Straub
  9 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

I have so far contributed 61 commits to the colo project, waranting
the addition of my copyright to this file.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index a70e61aa09391cda933031535fa982d27cf6654b..75abbd80298df79223cb8e70064a5dc83d70f4eb 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -356,6 +356,7 @@ Wait until disk is synced, then::
 | Copyright (c) 2016 Intel Corporation
 | Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
 | Copyright (c) 2016 Fujitsu, Corp.
+| Copyright (c) 2026 Lukas Straub <lukasstraub2@web.de>
 
 This work is licensed under the terms of the GNU GPL, version 2 or later.
 See the COPYING file in the top-level directory.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 10/10] qemu-colo.rst: Simplify the block replication setup
  2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (8 preceding siblings ...)
  2026-01-25 20:40 ` [PATCH v3 09/10] qemu-colo.rst: Add my copyright Lukas Straub
@ 2026-01-25 20:40 ` Lukas Straub
  9 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-25 20:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

On the primary side we don't actually need the replication
block driver, since it only passes trough all IO.
So simplify the setup and also use 'blockdev-add' instead of
'human-monitor-command'.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 75abbd80298df79223cb8e70064a5dc83d70f4eb..f7d3b6439cf3401a58c412634239d1a43999a10e 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -240,8 +240,8 @@ Note:
 **4.** On Primary VM's QEMU monitor, issue command::
 
     {"execute":"qmp_capabilities"}
-    {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "nbd0" } }
     {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
     {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
 
@@ -269,7 +269,7 @@ Primary Failover
 The Secondary died, resume on the Primary::
 
     {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+    {"execute": "blockdev-del", "arguments": {"node-name": "nbd0"} }
     {"execute": "object-del", "arguments":{ "id": "comp0" } }
     {"execute": "object-del", "arguments":{ "id": "iothread1" } }
     {"execute": "object-del", "arguments":{ "id": "m0" } }
@@ -309,8 +309,8 @@ Wait until disk is synced, then::
     {"execute": "stop"}
     {"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
 
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
 
     {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
     {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
@@ -341,8 +341,8 @@ Wait until disk is synced, then::
     {"execute": "stop"}
     {"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
 
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.1", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
 
     {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
     {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes
  2026-01-25 20:40 ` [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes Lukas Straub
@ 2026-01-26 10:21   ` Zhang Chen
  2026-01-26 10:56     ` Lukas Straub
  0 siblings, 1 reply; 37+ messages in thread
From: Zhang Chen @ 2026-01-26 10:21 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Mon, Jan 26, 2026 at 4:40 AM Lukas Straub <lukasstraub2@web.de> wrote:
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

It seems this patch doesn't offer any major changes and merging it
with the previous patch(7/10) would be more appropriate.

Thanks

Chen

> ---
>  docs/system/qemu-colo.rst | 35 ++++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
> index 4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2..a70e61aa09391cda933031535fa982d27cf6654b 100644
> --- a/docs/system/qemu-colo.rst
> +++ b/docs/system/qemu-colo.rst
> @@ -1,13 +1,6 @@
>  Qemu COLO Fault Tolerance
>  =========================
>
> -| Copyright (c) 2016 Intel Corporation
> -| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> -| Copyright (c) 2016 Fujitsu, Corp.
> -
> -This work is licensed under the terms of the GNU GPL, version 2 or later.
> -See the COPYING file in the top-level directory.
> -
>  This document gives an overview of COLO's design and how to use it.
>
>  Background
> @@ -82,8 +75,8 @@ Overview::
>          |   Storage     |  |External Network|       | External Network | |   Storage    |
>          +---------------+  +----------------+       +------------------+ +--------------+
>
> -Components introduction
> -^^^^^^^^^^^^^^^^^^^^^^^
> +Components
> +^^^^^^^^^^
>  You can see there are several components in COLO's diagram of architecture.
>  Their functions are described below.
>
> @@ -157,14 +150,21 @@ in test procedure.
>
>  Test procedure
>  --------------
> -Note: Here we are running both instances on the same host for testing,
> +
> +Setup
> +^^^^^
> +
> +Here we are running both instances on the same host for testing,
>  change the IP Addresses if you want to run it on two hosts. Initially
>  ``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
>
> +COLO uses double the guest ram size on the secondary side. The Qemu version
> +should be the same on both hosts.
> +
>  Startup qemu
>  ^^^^^^^^^^^^
>  **1. Primary**:
> -Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
> +Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
>  You don't need to change any IP's here, because ``0.0.0.0`` listens on any
>  interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
>  instance::
> @@ -192,7 +192,7 @@ instance::
>
>
>  **2. Secondary**:
> -Note: Active and hidden images need to be created only once and the
> +Active and hidden images need to be created only once and the
>  size should be the same as ``primary.qcow2``. Again, you don't need to change
>  any IP's here, except for the ``$primary_ip`` variable::
>
> @@ -353,8 +353,9 @@ Wait until disk is synced, then::
>      {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
>      {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
>
> -TODO
> -----
> -1. Support shared storage.
> -2. Develop the heartbeat part.
> -3. Reduce checkpoint VM’s downtime while doing checkpoint.
> +| Copyright (c) 2016 Intel Corporation
> +| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +| Copyright (c) 2016 Fujitsu, Corp.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
>
> --
> 2.39.5
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 09/10] qemu-colo.rst: Add my copyright
  2026-01-25 20:40 ` [PATCH v3 09/10] qemu-colo.rst: Add my copyright Lukas Straub
@ 2026-01-26 10:23   ` Zhang Chen
  0 siblings, 0 replies; 37+ messages in thread
From: Zhang Chen @ 2026-01-26 10:23 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Mon, Jan 26, 2026 at 4:41 AM Lukas Straub <lukasstraub2@web.de> wrote:
>
> I have so far contributed 61 commits to the colo project, waranting
> the addition of my copyright to this file.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Zhang Chen <zhangckid@gmail.com>

Thanks
Chen

> ---
>  docs/system/qemu-colo.rst | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
> index a70e61aa09391cda933031535fa982d27cf6654b..75abbd80298df79223cb8e70064a5dc83d70f4eb 100644
> --- a/docs/system/qemu-colo.rst
> +++ b/docs/system/qemu-colo.rst
> @@ -356,6 +356,7 @@ Wait until disk is synced, then::
>  | Copyright (c) 2016 Intel Corporation
>  | Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>  | Copyright (c) 2016 Fujitsu, Corp.
> +| Copyright (c) 2026 Lukas Straub <lukasstraub2@web.de>
>
>  This work is licensed under the terms of the GNU GPL, version 2 or later.
>  See the COPYING file in the top-level directory.
>
> --
> 2.39.5
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
@ 2026-01-26 10:36   ` Zhang Chen
  2026-01-26 11:13     ` Lukas Straub
  2026-01-26 14:33   ` Fabiano Rosas
  1 sibling, 1 reply; 37+ messages in thread
From: Zhang Chen @ 2026-01-26 10:36 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

On Mon, Jan 26, 2026 at 4:40 AM Lukas Straub <lukasstraub2@web.de> wrote:
>
> Like in the normal ram_load() path, put the received pages into the
> colo cache and mark the pages in the bitmap so that they will be
> flushed to the guest later.
>
> Multifd with COLO is useful to reduce the VM pause time during checkpointing
> for latency sensitive workloads. In such workloads the worst-case latency
> is especially important.
>
> Also, this is already worth it for the precopy phase as it helps with
> converging. Moreover, multifd migration is the preferred way to do migration
> nowadays and this allows to use multifd compression with COLO.
>
> Benchmark:
> Cluster nodes
>  - Intel Xenon E5-2630 v3
>  - 48Gb RAM
>  - 10G Ethernet
> Guest
>  - Windows Server 2016
>  - 6Gb RAM
>  - 4 cores
> Workload
>  - Upload a file to the guest with SMB to simulate moderate
>    memory dirtying
>  - Measure the memory transfer time portion of each checkpoint
>  - 600ms COLO checkpoint interval
>
> Results
> Plain
>  idle mean: 4.50ms 99per: 10.33ms
>  load mean: 24.30ms 99per: 78.05ms
> Multifd-4
>  idle mean: 6.48ms 99per: 10.41ms
>  load mean: 14.12ms 99per: 31.27ms
>
> Evaluation
> While multifd has slightly higher latency when the guest idles, it is
> 10ms faster under load and more importantly it's worst case latency is
> less than 1/2 of plain under load as can be seen in the 99. Percentile.
>

Why the multifd get higher latency when the guest idles?  The status same
with normal live migration? Where is the time spent? The Sorry, I
don't know this background yet.

Thanks
Chen

> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  MAINTAINERS                |  1 +
>  migration/meson.build      |  2 +-
>  migration/multifd-colo.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>  migration/multifd-colo.h   | 26 ++++++++++++++++++++++++
>  migration/multifd-nocomp.c | 10 +++++++++-
>  migration/multifd.c        |  8 ++++++++
>  migration/multifd.h        |  5 ++++-
>  7 files changed, 99 insertions(+), 3 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1e9bdd87c3a2f84f3abfc56986cd793976810fdd..883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3853,6 +3853,7 @@ COLO Framework
>  M: Lukas Straub <lukasstraub2@web.de>
>  S: Maintained
>  F: migration/colo*
> +F: migration/multifd-colo.*
>  F: include/migration/colo.h
>  F: include/migration/failover.h
>  F: docs/COLO-FT.txt
> diff --git a/migration/meson.build b/migration/meson.build
> index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -39,7 +39,7 @@ system_ss.add(files(
>  ), gnutls, zlib)
>
>  if get_option('replication').allowed()
> -  system_ss.add(files('colo-failover.c', 'colo.c'))
> +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
>  else
>    system_ss.add(files('colo-stubs.c'))
>  endif
> diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c47f5044663969e0c9af56da5ec34902d635810a
> --- /dev/null
> +++ b/migration/multifd-colo.c
> @@ -0,0 +1,50 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo implementation
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "exec/target_page.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "ram.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "io/channel-socket.h"
> +#include "migration/colo.h"
> +#include "multifd-colo.h"
> +#include "system/ramblock.h"
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> +{
> +    /*
> +     * While we're still in precopy state (not yet in colo state), we copy
> +     * received pages to both guest and cache. No need to set dirty bits,
> +     * since guest and cache memory are in sync.
> +     */
> +    if (migration_incoming_in_colo_state()) {
> +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> +        colo_record_bitmap(p->block, p->zero, p->zero_num);
> +    }
> +}
> +
> +void multifd_colo_process_recv(MultiFDRecvParams *p)
> +{
> +    if (!migration_incoming_in_colo_state()) {
> +        for (int i = 0; i < p->normal_num; i++) {
> +            void *guest = p->block->host + p->normal[i];
> +            void *cache = p->host + p->normal[i];
> +            memcpy(guest, cache, multifd_ram_page_size());
> +        }
> +        for (int i = 0; i < p->zero_num; i++) {
> +            void *guest = p->block->host + p->zero[i];
> +            memset(guest, 0, multifd_ram_page_size());
> +        }
> +    }
> +}
> diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> --- /dev/null
> +++ b/migration/multifd-colo.h
> @@ -0,0 +1,26 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo header
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> +#define QEMU_MIGRATION_MULTIFD_COLO_H
> +
> +#ifdef CONFIG_REPLICATION
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> +void multifd_colo_process_recv(MultiFDRecvParams *p);
> +
> +#else
> +
> +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> +
> +#endif
> +#endif
> diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> --- a/migration/multifd-nocomp.c
> +++ b/migration/multifd-nocomp.c
> @@ -16,6 +16,7 @@
>  #include "file.h"
>  #include "migration-stats.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "options.h"
>  #include "migration.h"
>  #include "qapi/error.h"
> @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          return -1;
>      }
>
> -    p->host = p->block->host;
>      for (i = 0; i < p->normal_num; i++) {
>          uint64_t offset = be64_to_cpu(packet->offset[i]);
>
> @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          p->zero[i] = offset;
>      }
>
> +    if (migrate_colo()) {
> +        multifd_colo_prepare_recv(p);
> +        assert(p->block->colo_cache);
> +        p->host = p->block->colo_cache;
> +    } else {
> +        p->host = p->block->host;
> +    }
> +
>      return 0;
>  }
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -29,6 +29,7 @@
>  #include "qemu-file.h"
>  #include "trace.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "options.h"
>  #include "qemu/yank.h"
>  #include "io/channel-file.h"
> @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>      int ret;
>
>      ret = multifd_recv_state->ops->recv(p, errp);
> +    if (ret != 0) {
> +        return ret;
> +    }
> +
> +    if (migrate_colo()) {
> +        multifd_colo_process_recv(p);
> +    }
>
>      return ret;
>  }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -279,7 +279,10 @@ typedef struct {
>      uint64_t packets_recved;
>      /* ramblock */
>      RAMBlock *block;
> -    /* ramblock host address */
> +    /*
> +     * Normally, it points to ramblock's host address.  When COLO
> +     * is enabled, it points to the mirror cache for the ramblock.
> +     */
>      uint8_t *host;
>      /* buffers to recv */
>      struct iovec *iov;
>
> --
> 2.39.5
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes
  2026-01-26 10:21   ` Zhang Chen
@ 2026-01-26 10:56     ` Lukas Straub
  0 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-26 10:56 UTC (permalink / raw)
  To: Zhang Chen
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 4154 bytes --]

On Mon, 26 Jan 2026 18:21:10 +0800
Zhang Chen <zhangckid@gmail.com> wrote:

> On Mon, Jan 26, 2026 at 4:40 AM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>  
> 
> It seems this patch doesn't offer any major changes and merging it
> with the previous patch(7/10) would be more appropriate.

It's better to have it separately to see what is changed exactly. The
previous patch only does 1:1 conversion from text to rst without any
other changes.

If merged with the previous patch it becomes harder to see that I moved
and added a few lines here, which are changes that where not in the
original text file.

Regards,
Lukas Straub

> 
> Thanks
> 
> Chen
> 
> > ---
> >  docs/system/qemu-colo.rst | 35 ++++++++++++++++++-----------------
> >  1 file changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
> > index 4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2..a70e61aa09391cda933031535fa982d27cf6654b 100644
> > --- a/docs/system/qemu-colo.rst
> > +++ b/docs/system/qemu-colo.rst
> > @@ -1,13 +1,6 @@
> >  Qemu COLO Fault Tolerance
> >  =========================
> >
> > -| Copyright (c) 2016 Intel Corporation
> > -| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > -| Copyright (c) 2016 Fujitsu, Corp.
> > -
> > -This work is licensed under the terms of the GNU GPL, version 2 or later.
> > -See the COPYING file in the top-level directory.
> > -
> >  This document gives an overview of COLO's design and how to use it.
> >
> >  Background
> > @@ -82,8 +75,8 @@ Overview::
> >          |   Storage     |  |External Network|       | External Network | |   Storage    |
> >          +---------------+  +----------------+       +------------------+ +--------------+
> >
> > -Components introduction
> > -^^^^^^^^^^^^^^^^^^^^^^^
> > +Components
> > +^^^^^^^^^^
> >  You can see there are several components in COLO's diagram of architecture.
> >  Their functions are described below.
> >
> > @@ -157,14 +150,21 @@ in test procedure.
> >
> >  Test procedure
> >  --------------
> > -Note: Here we are running both instances on the same host for testing,
> > +
> > +Setup
> > +^^^^^
> > +
> > +Here we are running both instances on the same host for testing,
> >  change the IP Addresses if you want to run it on two hosts. Initially
> >  ``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
> >
> > +COLO uses double the guest ram size on the secondary side. The Qemu version
> > +should be the same on both hosts.
> > +
> >  Startup qemu
> >  ^^^^^^^^^^^^
> >  **1. Primary**:
> > -Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
> > +Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
> >  You don't need to change any IP's here, because ``0.0.0.0`` listens on any
> >  interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
> >  instance::
> > @@ -192,7 +192,7 @@ instance::
> >
> >
> >  **2. Secondary**:
> > -Note: Active and hidden images need to be created only once and the
> > +Active and hidden images need to be created only once and the
> >  size should be the same as ``primary.qcow2``. Again, you don't need to change
> >  any IP's here, except for the ``$primary_ip`` variable::
> >
> > @@ -353,8 +353,9 @@ Wait until disk is synced, then::
> >      {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
> >      {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
> >
> > -TODO
> > -----
> > -1. Support shared storage.
> > -2. Develop the heartbeat part.
> > -3. Reduce checkpoint VM’s downtime while doing checkpoint.
> > +| Copyright (c) 2016 Intel Corporation
> > +| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > +| Copyright (c) 2016 Fujitsu, Corp.
> > +
> > +This work is licensed under the terms of the GNU GPL, version 2 or later.
> > +See the COPYING file in the top-level directory.
> >
> > --
> > 2.39.5
> >  


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-26 10:36   ` Zhang Chen
@ 2026-01-26 11:13     ` Lukas Straub
  0 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-01-26 11:13 UTC (permalink / raw)
  To: Zhang Chen
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 9704 bytes --]

On Mon, 26 Jan 2026 18:36:39 +0800
Zhang Chen <zhangckid@gmail.com> wrote:

> On Mon, Jan 26, 2026 at 4:40 AM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > Like in the normal ram_load() path, put the received pages into the
> > colo cache and mark the pages in the bitmap so that they will be
> > flushed to the guest later.
> >
> > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > for latency sensitive workloads. In such workloads the worst-case latency
> > is especially important.
> >
> > Also, this is already worth it for the precopy phase as it helps with
> > converging. Moreover, multifd migration is the preferred way to do migration
> > nowadays and this allows to use multifd compression with COLO.
> >
> > Benchmark:
> > Cluster nodes
> >  - Intel Xenon E5-2630 v3
> >  - 48Gb RAM
> >  - 10G Ethernet
> > Guest
> >  - Windows Server 2016
> >  - 6Gb RAM
> >  - 4 cores
> > Workload
> >  - Upload a file to the guest with SMB to simulate moderate
> >    memory dirtying
> >  - Measure the memory transfer time portion of each checkpoint
> >  - 600ms COLO checkpoint interval
> >
> > Results
> > Plain
> >  idle mean: 4.50ms 99per: 10.33ms
> >  load mean: 24.30ms 99per: 78.05ms
> > Multifd-4
> >  idle mean: 6.48ms 99per: 10.41ms
> >  load mean: 14.12ms 99per: 31.27ms
> >
> > Evaluation
> > While multifd has slightly higher latency when the guest idles, it is
> > 10ms faster under load and more importantly it's worst case latency is
> > less than 1/2 of plain under load as can be seen in the 99. Percentile.
> >  
> 
> Why the multifd get higher latency when the guest idles?  The status same
> with normal live migration? Where is the time spent? The Sorry, I
> don't know this background yet.

Not sure, it could be more overhead due to coordinating the multifd
threads.

But it also can be explained from the sample variation. Here I also
calculate the standard deviation of the sample.
60% of samples are within +- one stddev.

plain idle: mean 4.50 99per 10.33 stddev 1.80
plain load: mean 24.30 99per 78.05 stddev 13.65
multifd-4 idle: mean 6.48 99per 10.41 stddev 2.53
multifd-4 load: mean 14.12 99per 31.27 stddev 7.48

So, I don't think its a significant difference.

> 
> Thanks
> Chen
> 
> > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  MAINTAINERS                |  1 +
> >  migration/meson.build      |  2 +-
> >  migration/multifd-colo.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/multifd-colo.h   | 26 ++++++++++++++++++++++++
> >  migration/multifd-nocomp.c | 10 +++++++++-
> >  migration/multifd.c        |  8 ++++++++
> >  migration/multifd.h        |  5 ++++-
> >  7 files changed, 99 insertions(+), 3 deletions(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 1e9bdd87c3a2f84f3abfc56986cd793976810fdd..883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3853,6 +3853,7 @@ COLO Framework
> >  M: Lukas Straub <lukasstraub2@web.de>
> >  S: Maintained
> >  F: migration/colo*
> > +F: migration/multifd-colo.*
> >  F: include/migration/colo.h
> >  F: include/migration/failover.h
> >  F: docs/COLO-FT.txt
> > diff --git a/migration/meson.build b/migration/meson.build
> > index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -39,7 +39,7 @@ system_ss.add(files(
> >  ), gnutls, zlib)
> >
> >  if get_option('replication').allowed()
> > -  system_ss.add(files('colo-failover.c', 'colo.c'))
> > +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
> >  else
> >    system_ss.add(files('colo-stubs.c'))
> >  endif
> > diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..c47f5044663969e0c9af56da5ec34902d635810a
> > --- /dev/null
> > +++ b/migration/multifd-colo.c
> > @@ -0,0 +1,50 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo implementation
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "exec/target_page.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "ram.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "io/channel-socket.h"
> > +#include "migration/colo.h"
> > +#include "multifd-colo.h"
> > +#include "system/ramblock.h"
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> > +{
> > +    /*
> > +     * While we're still in precopy state (not yet in colo state), we copy
> > +     * received pages to both guest and cache. No need to set dirty bits,
> > +     * since guest and cache memory are in sync.
> > +     */
> > +    if (migration_incoming_in_colo_state()) {
> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
> > +    }
> > +}
> > +
> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
> > +{
> > +    if (!migration_incoming_in_colo_state()) {
> > +        for (int i = 0; i < p->normal_num; i++) {
> > +            void *guest = p->block->host + p->normal[i];
> > +            void *cache = p->host + p->normal[i];
> > +            memcpy(guest, cache, multifd_ram_page_size());
> > +        }
> > +        for (int i = 0; i < p->zero_num; i++) {
> > +            void *guest = p->block->host + p->zero[i];
> > +            memset(guest, 0, multifd_ram_page_size());
> > +        }
> > +    }
> > +}
> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> > --- /dev/null
> > +++ b/migration/multifd-colo.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo header
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
> > +
> > +#ifdef CONFIG_REPLICATION
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
> > +
> > +#else
> > +
> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> > +
> > +#endif
> > +#endif
> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> > --- a/migration/multifd-nocomp.c
> > +++ b/migration/multifd-nocomp.c
> > @@ -16,6 +16,7 @@
> >  #include "file.h"
> >  #include "migration-stats.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "options.h"
> >  #include "migration.h"
> >  #include "qapi/error.h"
> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          return -1;
> >      }
> >
> > -    p->host = p->block->host;
> >      for (i = 0; i < p->normal_num; i++) {
> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >
> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          p->zero[i] = offset;
> >      }
> >
> > +    if (migrate_colo()) {
> > +        multifd_colo_prepare_recv(p);
> > +        assert(p->block->colo_cache);
> > +        p->host = p->block->colo_cache;
> > +    } else {
> > +        p->host = p->block->host;
> > +    }
> > +
> >      return 0;
> >  }
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -29,6 +29,7 @@
> >  #include "qemu-file.h"
> >  #include "trace.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "options.h"
> >  #include "qemu/yank.h"
> >  #include "io/channel-file.h"
> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> >      int ret;
> >
> >      ret = multifd_recv_state->ops->recv(p, errp);
> > +    if (ret != 0) {
> > +        return ret;
> > +    }
> > +
> > +    if (migrate_colo()) {
> > +        multifd_colo_process_recv(p);
> > +    }
> >
> >      return ret;
> >  }
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -279,7 +279,10 @@ typedef struct {
> >      uint64_t packets_recved;
> >      /* ramblock */
> >      RAMBlock *block;
> > -    /* ramblock host address */
> > +    /*
> > +     * Normally, it points to ramblock's host address.  When COLO
> > +     * is enabled, it points to the mirror cache for the ramblock.
> > +     */
> >      uint8_t *host;
> >      /* buffers to recv */
> >      struct iovec *iov;
> >
> > --
> > 2.39.5
> >  


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv()
  2026-01-25 20:40 ` [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv() Lukas Straub
@ 2026-01-26 12:51   ` Fabiano Rosas
  0 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-26 12:51 UTC (permalink / raw)
  To: Lukas Straub, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Lukas Straub <lukasstraub2@web.de> writes:

> This is in preparation for the next patch.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

> ---
>  migration/multifd.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ad6261688fdf98a5c7f4ee9fb80ba2901201a33e..332e6fc58053462419f3171f6c320ac37648ef7b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1253,6 +1253,15 @@ static int multifd_device_state_recv(MultiFDRecvParams *p, Error **errp)
>      return ret;
>  }
>  
> +static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> +{
> +    int ret;
> +
> +    ret = multifd_recv_state->ops->recv(p, errp);
> +
> +    return ret;
> +}
> +
>  static void *multifd_recv_thread(void *opaque)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -1387,7 +1396,7 @@ static void *multifd_recv_thread(void *opaque)
>                  assert(use_packets);
>                  ret = multifd_device_state_recv(p, &local_err);
>              } else {
> -                ret = multifd_recv_state->ops->recv(p, &local_err);
> +                ret = multifd_ram_state_recv(p, &local_err);
>              }
>              if (ret != 0) {
>                  break;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
  2026-01-26 10:36   ` Zhang Chen
@ 2026-01-26 14:33   ` Fabiano Rosas
  2026-01-26 19:33     ` Lukas Straub
  1 sibling, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-26 14:33 UTC (permalink / raw)
  To: Lukas Straub, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub, Juan Quintela

Lukas Straub <lukasstraub2@web.de> writes:

> Like in the normal ram_load() path, put the received pages into the
> colo cache and mark the pages in the bitmap so that they will be
> flushed to the guest later.
>



> Multifd with COLO is useful to reduce the VM pause time during checkpointing
> for latency sensitive workloads. In such workloads the worst-case latency
> is especially important.
>
> Also, this is already worth it for the precopy phase as it helps with
> converging. Moreover, multifd migration is the preferred way to do migration
> nowadays and this allows to use multifd compression with COLO.
>
> Benchmark:
> Cluster nodes
>  - Intel Xenon E5-2630 v3
>  - 48Gb RAM
>  - 10G Ethernet
> Guest
>  - Windows Server 2016
>  - 6Gb RAM
>  - 4 cores
> Workload
>  - Upload a file to the guest with SMB to simulate moderate
>    memory dirtying
>  - Measure the memory transfer time portion of each checkpoint
>  - 600ms COLO checkpoint interval
>
> Results
> Plain
>  idle mean: 4.50ms 99per: 10.33ms
>  load mean: 24.30ms 99per: 78.05ms
> Multifd-4
>  idle mean: 6.48ms 99per: 10.41ms
>  load mean: 14.12ms 99per: 31.27ms
>
> Evaluation
> While multifd has slightly higher latency when the guest idles, it is
> 10ms faster under load and more importantly it's worst case latency is
> less than 1/2 of plain under load as can be seen in the 99. Percentile.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  MAINTAINERS                |  1 +
>  migration/meson.build      |  2 +-
>  migration/multifd-colo.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>  migration/multifd-colo.h   | 26 ++++++++++++++++++++++++
>  migration/multifd-nocomp.c | 10 +++++++++-
>  migration/multifd.c        |  8 ++++++++
>  migration/multifd.h        |  5 ++++-
>  7 files changed, 99 insertions(+), 3 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1e9bdd87c3a2f84f3abfc56986cd793976810fdd..883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3853,6 +3853,7 @@ COLO Framework
>  M: Lukas Straub <lukasstraub2@web.de>
>  S: Maintained
>  F: migration/colo*
> +F: migration/multifd-colo.*
>  F: include/migration/colo.h
>  F: include/migration/failover.h
>  F: docs/COLO-FT.txt
> diff --git a/migration/meson.build b/migration/meson.build
> index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -39,7 +39,7 @@ system_ss.add(files(
>  ), gnutls, zlib)
>  
>  if get_option('replication').allowed()
> -  system_ss.add(files('colo-failover.c', 'colo.c'))
> +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
>  else
>    system_ss.add(files('colo-stubs.c'))
>  endif
> diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c47f5044663969e0c9af56da5ec34902d635810a
> --- /dev/null
> +++ b/migration/multifd-colo.c
> @@ -0,0 +1,50 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo implementation
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "exec/target_page.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "ram.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "io/channel-socket.h"
> +#include "migration/colo.h"
> +#include "multifd-colo.h"
> +#include "system/ramblock.h"
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> +{
> +    /*
> +     * While we're still in precopy state (not yet in colo state), we copy
> +     * received pages to both guest and cache. No need to set dirty bits,
> +     * since guest and cache memory are in sync.
> +     */
> +    if (migration_incoming_in_colo_state()) {

What's the relationship between migration_incoming_in_colo_state() and
migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
migration_incoming_colo_enabled affect multifd as well?

The multifd recv threads will be running until after
process_incoming_migration_bh(), which is when
migration_incoming_disable_colo() runs.

Also, is the colo_cache guaranteed to be around until multifd threads
exit?

> +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> +        colo_record_bitmap(p->block, p->zero, p->zero_num);
> +    }
> +}
> +
> +void multifd_colo_process_recv(MultiFDRecvParams *p)
> +{
> +    if (!migration_incoming_in_colo_state()) {
> +        for (int i = 0; i < p->normal_num; i++) {
> +            void *guest = p->block->host + p->normal[i];
> +            void *cache = p->host + p->normal[i];
> +            memcpy(guest, cache, multifd_ram_page_size());
> +        }

I see some differences between what ram.c does and what multifd will do
after this patch regarding which flags are checked and order of copies
(code below):

ram.c:

  - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
  Reads from stream into colo_cache.
  
  - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
  Reads from stream into guest and then memcpy into colo_cache.

  - !migration_incoming_colo_enabled
  Reads from stream into guest.

multifd.c:

  - migrate_colo:
  Reads from stream into colo_cache.
  
  - !migration_incoming_in_colo_state:
  memcpy from colo_cache into guest.

  - !migration_incoming_colo_enabled
  ???

The resulting state should be the same, but I wonder if we want to i) use
the same checks in multifd and ii) when not in colo state, copy first
into guest (using readv) and later memcpy into the colo_cache.

---
ram.c:

host = host_from_ram_block_offset(block, addr);
if (migration_incoming_colo_enabled()) {
    if (migration_incoming_in_colo_state()) {
        host = colo_cache_from_block_offset(block, addr, true);
    } else {
        host_bak = colo_cache_from_block_offset(block, addr, false);
    }
}
qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
if (host_bak) {
    memcpy(host_bak, host, TARGET_PAGE_SIZE);
}

multifd:

if (migrate_colo()) {
    p->host = p->block->colo_cache;
}

for (int i = 0; i < p->normal_num; i++) {
    p->iov[i].iov_base = p->host + p->normal[i];
}
return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);

if (!migration_incoming_in_colo_state()) {
    for (int i = 0; i < p->normal_num; i++) {
        void *guest = p->block->host + p->normal[i];
        void *cache = p->host + p->normal[i];
        memcpy(guest, cache, multifd_ram_page_size());
    }
}
---

> +        for (int i = 0; i < p->zero_num; i++) {
> +            void *guest = p->block->host + p->zero[i];
> +            memset(guest, 0, multifd_ram_page_size());
> +        }

At multifd_nocomp_recv, there will be a call to
multifd_recv_zero_page_process(), which by that point will have p->host
== p->block->colo_cache, so it looks like that function will do some
zero page processing in the colo_cache, setting the rb->receivedmap for
pages in the colo_cache and potentially also doing a memcpy. Is this
intended?

I'm thinking that maybe it would overall be better to hook colo directly
in to multifd_nocomp_recv:

static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
{
    uint32_t flags;

    if (migrate_mapped_ram()) {
        return multifd_file_recv_data(p, errp);
    }

    flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;

    if (flags != MULTIFD_FLAG_NOCOMP) {
        error_setg(errp, "multifd %u: flags received %x flags expected %x",
                   p->id, flags, MULTIFD_FLAG_NOCOMP);
        return -1;
    }

+    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
+        p->host = p->block->colo_cache;
+    } // or else{}, depending on how deal with zero pages in the cache

    multifd_recv_zero_page_process(p);

    if (!p->normal_num) {
        return 0;
    }

    for (int i = 0; i < p->normal_num; i++) {
        p->iov[i].iov_base = p->host + p->normal[i];
        p->iov[i].iov_len = multifd_ram_page_size();
        ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
    }
+    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
+    if (ret != 0) {
+        return ret;
+    }
+
+    if (migration_incoming_colo_enabled()) {
+        multifd_colo_process_recv();
+    }

    return ret;
}


> +    }
> +}
> diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> --- /dev/null
> +++ b/migration/multifd-colo.h
> @@ -0,0 +1,26 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo header
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> +#define QEMU_MIGRATION_MULTIFD_COLO_H
> +
> +#ifdef CONFIG_REPLICATION
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> +void multifd_colo_process_recv(MultiFDRecvParams *p);
> +
> +#else
> +
> +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> +
> +#endif
> +#endif
> diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> --- a/migration/multifd-nocomp.c
> +++ b/migration/multifd-nocomp.c
> @@ -16,6 +16,7 @@
>  #include "file.h"
>  #include "migration-stats.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "options.h"
>  #include "migration.h"
>  #include "qapi/error.h"
> @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          return -1;
>      }
>  
> -    p->host = p->block->host;
>      for (i = 0; i < p->normal_num; i++) {
>          uint64_t offset = be64_to_cpu(packet->offset[i]);
>  
> @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          p->zero[i] = offset;
>      }
>  
> +    if (migrate_colo()) {
> +        multifd_colo_prepare_recv(p);
> +        assert(p->block->colo_cache);
> +        p->host = p->block->colo_cache;

Can't you just use p->block->colo_cache later? I don't see why p->host
needs to be set beforehand even in the non-colo case.

> +    } else {
> +        p->host = p->block->host;
> +    }
> +
>      return 0;
>  }
>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -29,6 +29,7 @@
>  #include "qemu-file.h"
>  #include "trace.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "options.h"
>  #include "qemu/yank.h"
>  #include "io/channel-file.h"
> @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>      int ret;
>  
>      ret = multifd_recv_state->ops->recv(p, errp);
> +    if (ret != 0) {
> +        return ret;
> +    }
> +
> +    if (migrate_colo()) {
> +        multifd_colo_process_recv(p);
> +    }

Either put all of colo hooks in multifd.c or multifd-nocomp.c. I think
the latter is more appropriate as we have mapped_ram already in
there. Let's drop patch 3 and put this in multifd_nocomp_recv().

>  
>      return ret;
>  }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -279,7 +279,10 @@ typedef struct {
>      uint64_t packets_recved;
>      /* ramblock */
>      RAMBlock *block;
> -    /* ramblock host address */
> +    /*
> +     * Normally, it points to ramblock's host address.  When COLO
> +     * is enabled, it points to the mirror cache for the ramblock.
> +     */
>      uint8_t *host;
>      /* buffers to recv */
>      struct iovec *iov;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
@ 2026-01-26 14:40   ` Fabiano Rosas
  2026-01-27 20:49   ` Peter Xu
  2026-01-28 12:32   ` Fabiano Rosas
  2 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-26 14:40 UTC (permalink / raw)
  To: Lukas Straub, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Lukas Straub <lukasstraub2@web.de> writes:

> Add a COLO migration test for COLO migration and failover.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

Looks ok at first sight, I'll later to some stress testing which usually
picks up subtle issues.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-26 14:33   ` Fabiano Rosas
@ 2026-01-26 19:33     ` Lukas Straub
  2026-01-26 21:37       ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-26 19:33 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 15154 bytes --]

On Mon, 26 Jan 2026 11:33:13 -0300
Fabiano Rosas <farosas@suse.de> wrote:

> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > Like in the normal ram_load() path, put the received pages into the
> > colo cache and mark the pages in the bitmap so that they will be
> > flushed to the guest later.
> >  
> 
> 
> 
> > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > for latency sensitive workloads. In such workloads the worst-case latency
> > is especially important.
> >
> > Also, this is already worth it for the precopy phase as it helps with
> > converging. Moreover, multifd migration is the preferred way to do migration
> > nowadays and this allows to use multifd compression with COLO.
> >
> > Benchmark:
> > Cluster nodes
> >  - Intel Xenon E5-2630 v3
> >  - 48Gb RAM
> >  - 10G Ethernet
> > Guest
> >  - Windows Server 2016
> >  - 6Gb RAM
> >  - 4 cores
> > Workload
> >  - Upload a file to the guest with SMB to simulate moderate
> >    memory dirtying
> >  - Measure the memory transfer time portion of each checkpoint
> >  - 600ms COLO checkpoint interval
> >
> > Results
> > Plain
> >  idle mean: 4.50ms 99per: 10.33ms
> >  load mean: 24.30ms 99per: 78.05ms
> > Multifd-4
> >  idle mean: 6.48ms 99per: 10.41ms
> >  load mean: 14.12ms 99per: 31.27ms
> >
> > Evaluation
> > While multifd has slightly higher latency when the guest idles, it is
> > 10ms faster under load and more importantly it's worst case latency is
> > less than 1/2 of plain under load as can be seen in the 99. Percentile.
> >
> > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  MAINTAINERS                |  1 +
> >  migration/meson.build      |  2 +-
> >  migration/multifd-colo.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/multifd-colo.h   | 26 ++++++++++++++++++++++++
> >  migration/multifd-nocomp.c | 10 +++++++++-
> >  migration/multifd.c        |  8 ++++++++
> >  migration/multifd.h        |  5 ++++-
> >  7 files changed, 99 insertions(+), 3 deletions(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 1e9bdd87c3a2f84f3abfc56986cd793976810fdd..883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3853,6 +3853,7 @@ COLO Framework
> >  M: Lukas Straub <lukasstraub2@web.de>
> >  S: Maintained
> >  F: migration/colo*
> > +F: migration/multifd-colo.*
> >  F: include/migration/colo.h
> >  F: include/migration/failover.h
> >  F: docs/COLO-FT.txt
> > diff --git a/migration/meson.build b/migration/meson.build
> > index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -39,7 +39,7 @@ system_ss.add(files(
> >  ), gnutls, zlib)
> >  
> >  if get_option('replication').allowed()
> > -  system_ss.add(files('colo-failover.c', 'colo.c'))
> > +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
> >  else
> >    system_ss.add(files('colo-stubs.c'))
> >  endif
> > diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..c47f5044663969e0c9af56da5ec34902d635810a
> > --- /dev/null
> > +++ b/migration/multifd-colo.c
> > @@ -0,0 +1,50 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo implementation
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "exec/target_page.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "ram.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "io/channel-socket.h"
> > +#include "migration/colo.h"
> > +#include "multifd-colo.h"
> > +#include "system/ramblock.h"
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> > +{
> > +    /*
> > +     * While we're still in precopy state (not yet in colo state), we copy
> > +     * received pages to both guest and cache. No need to set dirty bits,
> > +     * since guest and cache memory are in sync.
> > +     */
> > +    if (migration_incoming_in_colo_state()) {  
> 
> What's the relationship between migration_incoming_in_colo_state() and
> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
> migration_incoming_colo_enabled affect multifd as well?

So first off migration_incoming_colo_enabled() and migrate_colo()
are equivalent in practice since
121ccedc2b migration: block incoming colo when capability is disabled

(I have some cleanup patches lying around, but that will be for later)

For colo, we do normal precopy migration and at the end we go into colo
state and then
migration_incoming_in_colo_state() will be true.

Here we check migrate_colo() outside of these functions as Peter
requested that in a previous version.

> 
> The multifd recv threads will be running until after
> process_incoming_migration_bh(), which is when
> migration_incoming_disable_colo() runs.

That is not an issue as we use migrate_colo() here.

> 
> Also, is the colo_cache guaranteed to be around until multifd threads
> exit?

This is an issue. I will fix it in the next version.

>
> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
> > +    }
> > +}
> > +
> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
> > +{
> > +    if (!migration_incoming_in_colo_state()) {
> > +        for (int i = 0; i < p->normal_num; i++) {
> > +            void *guest = p->block->host + p->normal[i];
> > +            void *cache = p->host + p->normal[i];
> > +            memcpy(guest, cache, multifd_ram_page_size());
> > +        }  
> 
> I see some differences between what ram.c does and what multifd will do
> after this patch regarding which flags are checked and order of copies
> (code below):
> 
> ram.c:
> 
>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
>   Reads from stream into colo_cache.
>   
>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
>   Reads from stream into guest and then memcpy into colo_cache.
> 
>   - !migration_incoming_colo_enabled
>   Reads from stream into guest.
> 
> multifd.c:
> 
>   - migrate_colo:
>   Reads from stream into colo_cache.
>   
>   - !migration_incoming_in_colo_state:
>   memcpy from colo_cache into guest.
> 
>   - !migration_incoming_colo_enabled
>   ???
> 
> The resulting state should be the same, but I wonder if we want to i) use
> the same checks in multifd

migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
not using it here. migrate_colo() is much easier to reason about.

> and ii) when not in colo state, copy first
> into guest (using readv) and later memcpy into the colo_cache.

I think it is easier the way it is now.

> 
> ---
> ram.c:
> 
> host = host_from_ram_block_offset(block, addr);
> if (migration_incoming_colo_enabled()) {
>     if (migration_incoming_in_colo_state()) {
>         host = colo_cache_from_block_offset(block, addr, true);
>     } else {
>         host_bak = colo_cache_from_block_offset(block, addr, false);
>     }
> }
> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> if (host_bak) {
>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
> }
> 
> multifd:
> 
> if (migrate_colo()) {
>     p->host = p->block->colo_cache;
> }
> 
> for (int i = 0; i < p->normal_num; i++) {
>     p->iov[i].iov_base = p->host + p->normal[i];
> }
> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> 
> if (!migration_incoming_in_colo_state()) {
>     for (int i = 0; i < p->normal_num; i++) {
>         void *guest = p->block->host + p->normal[i];
>         void *cache = p->host + p->normal[i];
>         memcpy(guest, cache, multifd_ram_page_size());
>     }
> }
> ---
> 
> > +        for (int i = 0; i < p->zero_num; i++) {
> > +            void *guest = p->block->host + p->zero[i];
> > +            memset(guest, 0, multifd_ram_page_size());
> > +        }  
> 
> At multifd_nocomp_recv, there will be a call to
> multifd_recv_zero_page_process(), which by that point will have p->host
> == p->block->colo_cache, so it looks like that function will do some
> zero page processing in the colo_cache, setting the rb->receivedmap for
> pages in the colo_cache and potentially also doing a memcpy. Is this
> intended?

rb->receivedmap is only for postcopy, right? So it doesn't apply with
colo.

> 
> I'm thinking that maybe it would overall be better to hook colo directly
> in to multifd_nocomp_recv:

But then it will only work for nocomp, right? It feels like the wrong
level of abstraction to me.

> 
> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
> {
>     uint32_t flags;
> 
>     if (migrate_mapped_ram()) {
>         return multifd_file_recv_data(p, errp);
>     }
> 
>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> 
>     if (flags != MULTIFD_FLAG_NOCOMP) {
>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
>         return -1;
>     }
> 
> +    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
> +        p->host = p->block->colo_cache;
> +    } // or else{}, depending on how deal with zero pages in the cache
> 
>     multifd_recv_zero_page_process(p);
> 
>     if (!p->normal_num) {
>         return 0;
>     }
> 
>     for (int i = 0; i < p->normal_num; i++) {
>         p->iov[i].iov_base = p->host + p->normal[i];
>         p->iov[i].iov_len = multifd_ram_page_size();
>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
>     }
> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> +    if (ret != 0) {
> +        return ret;
> +    }
> +
> +    if (migration_incoming_colo_enabled()) {
> +        multifd_colo_process_recv();
> +    }
> 
>     return ret;
> }
> 
> 
> > +    }
> > +}
> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> > --- /dev/null
> > +++ b/migration/multifd-colo.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo header
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
> > +
> > +#ifdef CONFIG_REPLICATION
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
> > +
> > +#else
> > +
> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> > +
> > +#endif
> > +#endif
> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> > --- a/migration/multifd-nocomp.c
> > +++ b/migration/multifd-nocomp.c
> > @@ -16,6 +16,7 @@
> >  #include "file.h"
> >  #include "migration-stats.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "options.h"
> >  #include "migration.h"
> >  #include "qapi/error.h"
> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          return -1;
> >      }
> >  
> > -    p->host = p->block->host;
> >      for (i = 0; i < p->normal_num; i++) {
> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >  
> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          p->zero[i] = offset;
> >      }
> >  
> > +    if (migrate_colo()) {
> > +        multifd_colo_prepare_recv(p);
> > +        assert(p->block->colo_cache);
> > +        p->host = p->block->colo_cache;  
> 
> Can't you just use p->block->colo_cache later? I don't see why p->host
> needs to be set beforehand even in the non-colo case.

We should not touch the guest ram directly while in colo state, since
the incoming guest is running and we either want to receive and apply a
whole checkpoint with all ram into colo cache and all device state,
or if anything goes wrong during checkpointing, keep the currently
running guest on the incoming side in pristine state.

I have written more about colo migration here:

https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
https://lore.kernel.org/qemu-devel/aXE1i9xJ81EWokYz@x1.local/

> 
> > +    } else {
> > +        p->host = p->block->host;
> > +    }
> > +
> >      return 0;
> >  }
> >  
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -29,6 +29,7 @@
> >  #include "qemu-file.h"
> >  #include "trace.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "options.h"
> >  #include "qemu/yank.h"
> >  #include "io/channel-file.h"
> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> >      int ret;
> >  
> >      ret = multifd_recv_state->ops->recv(p, errp);
> > +    if (ret != 0) {
> > +        return ret;
> > +    }
> > +
> > +    if (migrate_colo()) {
> > +        multifd_colo_process_recv(p);
> > +    }  
> 
> Either put all of colo hooks in multifd.c or multifd-nocomp.c. I think
> the latter is more appropriate as we have mapped_ram already in
> there. Let's drop patch 3 and put this in multifd_nocomp_recv().

Again, it also should work with compression and multifd_nocomp_recv()
is for nocomp only.

> 
> >  
> >      return ret;
> >  }
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -279,7 +279,10 @@ typedef struct {
> >      uint64_t packets_recved;
> >      /* ramblock */
> >      RAMBlock *block;
> > -    /* ramblock host address */
> > +    /*
> > +     * Normally, it points to ramblock's host address.  When COLO
> > +     * is enabled, it points to the mirror cache for the ramblock.
> > +     */
> >      uint8_t *host;
> >      /* buffers to recv */
> >      struct iovec *iov;  


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-26 19:33     ` Lukas Straub
@ 2026-01-26 21:37       ` Fabiano Rosas
  2026-01-27 20:36         ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-26 21:37 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

Lukas Straub <lukasstraub2@web.de> writes:

>> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
>> > +{
>> > +    /*
>> > +     * While we're still in precopy state (not yet in colo state), we copy
>> > +     * received pages to both guest and cache. No need to set dirty bits,
>> > +     * since guest and cache memory are in sync.
>> > +     */
>> > +    if (migration_incoming_in_colo_state()) {  
>> 
>> What's the relationship between migration_incoming_in_colo_state() and
>> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
>> migration_incoming_colo_enabled affect multifd as well?
>
> So first off migration_incoming_colo_enabled() and migrate_colo()
> are equivalent in practice since
> 121ccedc2b migration: block incoming colo when capability is disabled
>
> (I have some cleanup patches lying around, but that will be for later)
>

Ok, I think those are important because when having multifd and
non-multifd code for the same feature, it's useful to be able to compare
the two. So some degree of uniformity would be nice.

> For colo, we do normal precopy migration and at the end we go into colo
> state and then
> migration_incoming_in_colo_state() will be true.
>
> Here we check migrate_colo() outside of these functions as Peter
> requested that in a previous version.
>
>> 
>> The multifd recv threads will be running until after
>> process_incoming_migration_bh(), which is when
>> migration_incoming_disable_colo() runs.
>
> That is not an issue as we use migrate_colo() here.
>
>> 
>> Also, is the colo_cache guaranteed to be around until multifd threads
>> exit?
>
> This is an issue. I will fix it in the next version.
>
>>
>> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
>> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
>> > +    }
>> > +}
>> > +
>> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
>> > +{
>> > +    if (!migration_incoming_in_colo_state()) {
>> > +        for (int i = 0; i < p->normal_num; i++) {
>> > +            void *guest = p->block->host + p->normal[i];
>> > +            void *cache = p->host + p->normal[i];
>> > +            memcpy(guest, cache, multifd_ram_page_size());
>> > +        }  
>> 
>> I see some differences between what ram.c does and what multifd will do
>> after this patch regarding which flags are checked and order of copies
>> (code below):
>> 
>> ram.c:
>> 
>>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
>>   Reads from stream into colo_cache.
>>   
>>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
>>   Reads from stream into guest and then memcpy into colo_cache.
>> 
>>   - !migration_incoming_colo_enabled
>>   Reads from stream into guest.
>> 
>> multifd.c:
>> 
>>   - migrate_colo:
>>   Reads from stream into colo_cache.
>>   
>>   - !migration_incoming_in_colo_state:
>>   memcpy from colo_cache into guest.
>> 
>>   - !migration_incoming_colo_enabled
>>   ???
>> 
>> The resulting state should be the same, but I wonder if we want to i) use
>> the same checks in multifd
>
> migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
> not using it here. migrate_colo() is much easier to reason about.
>
>> and ii) when not in colo state, copy first
>> into guest (using readv) and later memcpy into the colo_cache.
>
> I think it is easier the way it is now.
>
>> 
>> ---
>> ram.c:
>> 
>> host = host_from_ram_block_offset(block, addr);
>> if (migration_incoming_colo_enabled()) {
>>     if (migration_incoming_in_colo_state()) {
>>         host = colo_cache_from_block_offset(block, addr, true);
>>     } else {
>>         host_bak = colo_cache_from_block_offset(block, addr, false);
>>     }
>> }
>> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>> if (host_bak) {
>>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
>> }
>> 
>> multifd:
>> 
>> if (migrate_colo()) {
>>     p->host = p->block->colo_cache;
>> }
>> 
>> for (int i = 0; i < p->normal_num; i++) {
>>     p->iov[i].iov_base = p->host + p->normal[i];
>> }
>> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> 
>> if (!migration_incoming_in_colo_state()) {
>>     for (int i = 0; i < p->normal_num; i++) {
>>         void *guest = p->block->host + p->normal[i];
>>         void *cache = p->host + p->normal[i];
>>         memcpy(guest, cache, multifd_ram_page_size());
>>     }
>> }
>> ---
>> 
>> > +        for (int i = 0; i < p->zero_num; i++) {
>> > +            void *guest = p->block->host + p->zero[i];
>> > +            memset(guest, 0, multifd_ram_page_size());
>> > +        }  
>> 
>> At multifd_nocomp_recv, there will be a call to
>> multifd_recv_zero_page_process(), which by that point will have p->host
>> == p->block->colo_cache, so it looks like that function will do some
>> zero page processing in the colo_cache, setting the rb->receivedmap for
>> pages in the colo_cache and potentially also doing a memcpy. Is this
>> intended?
>
> rb->receivedmap is only for postcopy, right? So it doesn't apply with
> colo.
>

It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
page causing multiple page faults"). So it seems we might be doing extra
work on top of the colo_cache.

>> 
>> I'm thinking that maybe it would overall be better to hook colo directly
>> in to multifd_nocomp_recv:
>
> But then it will only work for nocomp, right? It feels like the wrong
> level of abstraction to me.
>

Ah, nocomp != ram indeed.

>> 
>> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
>> {
>>     uint32_t flags;
>> 
>>     if (migrate_mapped_ram()) {
>>         return multifd_file_recv_data(p, errp);
>>     }
>> 
>>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> 
>>     if (flags != MULTIFD_FLAG_NOCOMP) {
>>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
>>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
>>         return -1;
>>     }
>> 
>> +    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
>> +        p->host = p->block->colo_cache;
>> +    } // or else{}, depending on how deal with zero pages in the cache
>> 
>>     multifd_recv_zero_page_process(p);
>> 
>>     if (!p->normal_num) {
>>         return 0;
>>     }
>> 
>>     for (int i = 0; i < p->normal_num; i++) {
>>         p->iov[i].iov_base = p->host + p->normal[i];
>>         p->iov[i].iov_len = multifd_ram_page_size();
>>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
>>     }
>> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> +    if (ret != 0) {
>> +        return ret;
>> +    }
>> +
>> +    if (migration_incoming_colo_enabled()) {
>> +        multifd_colo_process_recv();
>> +    }
>> 
>>     return ret;
>> }
>> 
>> 
>> > +    }
>> > +}
>> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
>> > new file mode 100644
>> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
>> > --- /dev/null
>> > +++ b/migration/multifd-colo.h
>> > @@ -0,0 +1,26 @@
>> > +/*
>> > + * SPDX-License-Identifier: GPL-2.0-or-later
>> > + *
>> > + * multifd colo header
>> > + *
>> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> > + * See the COPYING file in the top-level directory.
>> > + */
>> > +
>> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
>> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
>> > +
>> > +#ifdef CONFIG_REPLICATION
>> > +
>> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
>> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
>> > +
>> > +#else
>> > +
>> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
>> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
>> > +
>> > +#endif
>> > +#endif
>> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
>> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
>> > --- a/migration/multifd-nocomp.c
>> > +++ b/migration/multifd-nocomp.c
>> > @@ -16,6 +16,7 @@
>> >  #include "file.h"
>> >  #include "migration-stats.h"
>> >  #include "multifd.h"
>> > +#include "multifd-colo.h"
>> >  #include "options.h"
>> >  #include "migration.h"
>> >  #include "qapi/error.h"
>> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >          return -1;
>> >      }
>> >  
>> > -    p->host = p->block->host;
>> >      for (i = 0; i < p->normal_num; i++) {
>> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
>> >  
>> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >          p->zero[i] = offset;
>> >      }
>> >  
>> > +    if (migrate_colo()) {
>> > +        multifd_colo_prepare_recv(p);
>> > +        assert(p->block->colo_cache);
>> > +        p->host = p->block->colo_cache;  
>> 
>> Can't you just use p->block->colo_cache later? I don't see why p->host
>> needs to be set beforehand even in the non-colo case.
>
> We should not touch the guest ram directly while in colo state, since
> the incoming guest is running and we either want to receive and apply a
> whole checkpoint with all ram into colo cache and all device state,
> or if anything goes wrong during checkpointing, keep the currently
> running guest on the incoming side in pristine state.
>

I was asking about setting p->host at this specific point. I don't think
any of this fits the unfill function. However, I see those were
suggested by Peter so let's not go back and forth.

> I have written more about colo migration here:
>
> https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
> https://lore.kernel.org/qemu-devel/aXE1i9xJ81EWokYz@x1.local/
>
>> 
>> > +    } else {
>> > +        p->host = p->block->host;
>> > +    }
>> > +
>> >      return 0;
>> >  }
>> >  
>> > diff --git a/migration/multifd.c b/migration/multifd.c
>> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
>> > --- a/migration/multifd.c
>> > +++ b/migration/multifd.c
>> > @@ -29,6 +29,7 @@
>> >  #include "qemu-file.h"
>> >  #include "trace.h"
>> >  #include "multifd.h"
>> > +#include "multifd-colo.h"
>> >  #include "options.h"
>> >  #include "qemu/yank.h"
>> >  #include "io/channel-file.h"
>> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>> >      int ret;
>> >  
>> >      ret = multifd_recv_state->ops->recv(p, errp);
>> > +    if (ret != 0) {
>> > +        return ret;
>> > +    }
>> > +
>> > +    if (migrate_colo()) {
>> > +        multifd_colo_process_recv(p);
>> > +    }  
>> 
>> Either put all of colo hooks in multifd.c or multifd-nocomp.c. I think
>> the latter is more appropriate as we have mapped_ram already in
>> there. Let's drop patch 3 and put this in multifd_nocomp_recv().
>
> Again, it also should work with compression and multifd_nocomp_recv()
> is for nocomp only.
>
>> 
>> >  
>> >      return ret;
>> >  }
>> > diff --git a/migration/multifd.h b/migration/multifd.h
>> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
>> > --- a/migration/multifd.h
>> > +++ b/migration/multifd.h
>> > @@ -279,7 +279,10 @@ typedef struct {
>> >      uint64_t packets_recved;
>> >      /* ramblock */
>> >      RAMBlock *block;
>> > -    /* ramblock host address */
>> > +    /*
>> > +     * Normally, it points to ramblock's host address.  When COLO
>> > +     * is enabled, it points to the mirror cache for the ramblock.
>> > +     */
>> >      uint8_t *host;
>> >      /* buffers to recv */
>> >      struct iovec *iov;  


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-26 21:37       ` Fabiano Rosas
@ 2026-01-27 20:36         ` Peter Xu
  2026-01-28 12:30           ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-01-27 20:36 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Lukas Straub, qemu-devel, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> >> > +{
> >> > +    /*
> >> > +     * While we're still in precopy state (not yet in colo state), we copy
> >> > +     * received pages to both guest and cache. No need to set dirty bits,
> >> > +     * since guest and cache memory are in sync.
> >> > +     */
> >> > +    if (migration_incoming_in_colo_state()) {  
> >> 
> >> What's the relationship between migration_incoming_in_colo_state() and
> >> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
> >> migration_incoming_colo_enabled affect multifd as well?
> >
> > So first off migration_incoming_colo_enabled() and migrate_colo()
> > are equivalent in practice since
> > 121ccedc2b migration: block incoming colo when capability is disabled
> >
> > (I have some cleanup patches lying around, but that will be for later)
> >
> 
> Ok, I think those are important because when having multifd and
> non-multifd code for the same feature, it's useful to be able to compare
> the two. So some degree of uniformity would be nice.

I second.  We can drop those in this series before adding multifd support,
likely together with MIG_CMD_ENABLE_COLO as well; I don't think COLO needs
to worry about old binaries.  It should always use the same QEMU binary on
both sides.

The patch needs to rename MIG_CMD_ENABLE_COLO to MIG_CMD_DEPRECATED_0 or
something, to make the rest MIG_CMD compatible to old binaries, though.

> 
> > For colo, we do normal precopy migration and at the end we go into colo
> > state and then
> > migration_incoming_in_colo_state() will be true.
> >
> > Here we check migrate_colo() outside of these functions as Peter
> > requested that in a previous version.
> >
> >> 
> >> The multifd recv threads will be running until after
> >> process_incoming_migration_bh(), which is when
> >> migration_incoming_disable_colo() runs.
> >
> > That is not an issue as we use migrate_colo() here.
> >
> >> 
> >> Also, is the colo_cache guaranteed to be around until multifd threads
> >> exit?
> >
> > This is an issue. I will fix it in the next version.
> >
> >>
> >> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> >> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
> >> > +    }
> >> > +}
> >> > +
> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
> >> > +{
> >> > +    if (!migration_incoming_in_colo_state()) {
> >> > +        for (int i = 0; i < p->normal_num; i++) {
> >> > +            void *guest = p->block->host + p->normal[i];
> >> > +            void *cache = p->host + p->normal[i];
> >> > +            memcpy(guest, cache, multifd_ram_page_size());
> >> > +        }  
> >> 
> >> I see some differences between what ram.c does and what multifd will do
> >> after this patch regarding which flags are checked and order of copies
> >> (code below):
> >> 
> >> ram.c:
> >> 
> >>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
> >>   Reads from stream into colo_cache.
> >>   
> >>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
> >>   Reads from stream into guest and then memcpy into colo_cache.
> >> 
> >>   - !migration_incoming_colo_enabled
> >>   Reads from stream into guest.
> >> 
> >> multifd.c:
> >> 
> >>   - migrate_colo:
> >>   Reads from stream into colo_cache.
> >>   
> >>   - !migration_incoming_in_colo_state:
> >>   memcpy from colo_cache into guest.
> >> 
> >>   - !migration_incoming_colo_enabled
> >>   ???
> >> 
> >> The resulting state should be the same, but I wonder if we want to i) use
> >> the same checks in multifd
> >
> > migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
> > not using it here. migrate_colo() is much easier to reason about.
> >
> >> and ii) when not in colo state, copy first
> >> into guest (using readv) and later memcpy into the colo_cache.
> >
> > I think it is easier the way it is now.
> >
> >> 
> >> ---
> >> ram.c:
> >> 
> >> host = host_from_ram_block_offset(block, addr);
> >> if (migration_incoming_colo_enabled()) {
> >>     if (migration_incoming_in_colo_state()) {
> >>         host = colo_cache_from_block_offset(block, addr, true);
> >>     } else {
> >>         host_bak = colo_cache_from_block_offset(block, addr, false);
> >>     }
> >> }
> >> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> >> if (host_bak) {
> >>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
> >> }
> >> 
> >> multifd:
> >> 
> >> if (migrate_colo()) {
> >>     p->host = p->block->colo_cache;
> >> }
> >> 
> >> for (int i = 0; i < p->normal_num; i++) {
> >>     p->iov[i].iov_base = p->host + p->normal[i];
> >> }
> >> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> >> 
> >> if (!migration_incoming_in_colo_state()) {
> >>     for (int i = 0; i < p->normal_num; i++) {
> >>         void *guest = p->block->host + p->normal[i];
> >>         void *cache = p->host + p->normal[i];
> >>         memcpy(guest, cache, multifd_ram_page_size());
> >>     }
> >> }
> >> ---
> >> 
> >> > +        for (int i = 0; i < p->zero_num; i++) {
> >> > +            void *guest = p->block->host + p->zero[i];
> >> > +            memset(guest, 0, multifd_ram_page_size());
> >> > +        }  
> >> 
> >> At multifd_nocomp_recv, there will be a call to
> >> multifd_recv_zero_page_process(), which by that point will have p->host
> >> == p->block->colo_cache, so it looks like that function will do some
> >> zero page processing in the colo_cache, setting the rb->receivedmap for
> >> pages in the colo_cache and potentially also doing a memcpy. Is this
> >> intended?
> >
> > rb->receivedmap is only for postcopy, right? So it doesn't apply with
> > colo.
> >
> 
> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
> page causing multiple page faults"). So it seems we might be doing extra
> work on top of the colo_cache.

IIUC not extra, but exactly what will be needed.

The logic was about "in a vanilla precopy, if we see one page arriving the
1st time we don't need to zero the buffer because the buffer should be zero
allocated".

In COLO's case, COLO always puts RAM data into colo_cache, hence it should
apply to colo_cache too, avoiding unnecessary memset() for colo_cache
instead.

E.g. colo_cache is allocated from qemu_anon_ram_alloc(), it's also
guaranteed to be zeros when never touched.

> 
> >> 
> >> I'm thinking that maybe it would overall be better to hook colo directly
> >> in to multifd_nocomp_recv:
> >
> > But then it will only work for nocomp, right? It feels like the wrong
> > level of abstraction to me.
> >
> 
> Ah, nocomp != ram indeed.
> 
> >> 
> >> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
> >> {
> >>     uint32_t flags;
> >> 
> >>     if (migrate_mapped_ram()) {
> >>         return multifd_file_recv_data(p, errp);
> >>     }
> >> 
> >>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> >> 
> >>     if (flags != MULTIFD_FLAG_NOCOMP) {
> >>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
> >>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
> >>         return -1;
> >>     }
> >> 
> >> +    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
> >> +        p->host = p->block->colo_cache;
> >> +    } // or else{}, depending on how deal with zero pages in the cache
> >> 
> >>     multifd_recv_zero_page_process(p);
> >> 
> >>     if (!p->normal_num) {
> >>         return 0;
> >>     }
> >> 
> >>     for (int i = 0; i < p->normal_num; i++) {
> >>         p->iov[i].iov_base = p->host + p->normal[i];
> >>         p->iov[i].iov_len = multifd_ram_page_size();
> >>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
> >>     }
> >> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
> >> +    if (ret != 0) {
> >> +        return ret;
> >> +    }
> >> +
> >> +    if (migration_incoming_colo_enabled()) {
> >> +        multifd_colo_process_recv();
> >> +    }
> >> 
> >>     return ret;
> >> }
> >> 
> >> 
> >> > +    }
> >> > +}
> >> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> >> > new file mode 100644
> >> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> >> > --- /dev/null
> >> > +++ b/migration/multifd-colo.h
> >> > @@ -0,0 +1,26 @@
> >> > +/*
> >> > + * SPDX-License-Identifier: GPL-2.0-or-later
> >> > + *
> >> > + * multifd colo header
> >> > + *
> >> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> >> > + *
> >> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> > + * See the COPYING file in the top-level directory.
> >> > + */
> >> > +
> >> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> >> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
> >> > +
> >> > +#ifdef CONFIG_REPLICATION
> >> > +
> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
> >> > +
> >> > +#else
> >> > +
> >> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> >> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> >> > +
> >> > +#endif
> >> > +#endif
> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> >> > --- a/migration/multifd-nocomp.c
> >> > +++ b/migration/multifd-nocomp.c
> >> > @@ -16,6 +16,7 @@
> >> >  #include "file.h"
> >> >  #include "migration-stats.h"
> >> >  #include "multifd.h"
> >> > +#include "multifd-colo.h"
> >> >  #include "options.h"
> >> >  #include "migration.h"
> >> >  #include "qapi/error.h"
> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >          return -1;
> >> >      }
> >> >  
> >> > -    p->host = p->block->host;
> >> >      for (i = 0; i < p->normal_num; i++) {
> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >> >  
> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >          p->zero[i] = offset;
> >> >      }
> >> >  
> >> > +    if (migrate_colo()) {
> >> > +        multifd_colo_prepare_recv(p);
> >> > +        assert(p->block->colo_cache);
> >> > +        p->host = p->block->colo_cache;  
> >> 
> >> Can't you just use p->block->colo_cache later? I don't see why p->host
> >> needs to be set beforehand even in the non-colo case.
> >
> > We should not touch the guest ram directly while in colo state, since
> > the incoming guest is running and we either want to receive and apply a
> > whole checkpoint with all ram into colo cache and all device state,
> > or if anything goes wrong during checkpointing, keep the currently
> > running guest on the incoming side in pristine state.
> >
> 
> I was asking about setting p->host at this specific point. I don't think
> any of this fits the unfill function. However, I see those were
> suggested by Peter so let's not go back and forth.

Actually I don't know why p->host existed before this work; IIUC we could
have always used p->block->host.  Maybe when Juan was developing this Juan
kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
reference.

IIUC, we could remove p->host, but when we need to access "the buffer of
the ramblock" we'll need to call a helper to fetch that (either ramblock's
buffer, or colo_cache, per migrate_colo()).  And it might be slightly
slower than p->host indeed.

> 
> > I have written more about colo migration here:
> >
> > https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
> > https://lore.kernel.org/qemu-devel/aXE1i9xJ81EWokYz@x1.local/
> >
> >> 
> >> > +    } else {
> >> > +        p->host = p->block->host;
> >> > +    }
> >> > +
> >> >      return 0;
> >> >  }
> >> >  
> >> > diff --git a/migration/multifd.c b/migration/multifd.c
> >> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
> >> > --- a/migration/multifd.c
> >> > +++ b/migration/multifd.c
> >> > @@ -29,6 +29,7 @@
> >> >  #include "qemu-file.h"
> >> >  #include "trace.h"
> >> >  #include "multifd.h"
> >> > +#include "multifd-colo.h"
> >> >  #include "options.h"
> >> >  #include "qemu/yank.h"
> >> >  #include "io/channel-file.h"
> >> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> >> >      int ret;
> >> >  
> >> >      ret = multifd_recv_state->ops->recv(p, errp);
> >> > +    if (ret != 0) {
> >> > +        return ret;
> >> > +    }
> >> > +
> >> > +    if (migrate_colo()) {
> >> > +        multifd_colo_process_recv(p);
> >> > +    }  
> >> 
> >> Either put all of colo hooks in multifd.c or multifd-nocomp.c. I think
> >> the latter is more appropriate as we have mapped_ram already in
> >> there. Let's drop patch 3 and put this in multifd_nocomp_recv().
> >
> > Again, it also should work with compression and multifd_nocomp_recv()
> > is for nocomp only.
> >
> >> 
> >> >  
> >> >      return ret;
> >> >  }
> >> > diff --git a/migration/multifd.h b/migration/multifd.h
> >> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
> >> > --- a/migration/multifd.h
> >> > +++ b/migration/multifd.h
> >> > @@ -279,7 +279,10 @@ typedef struct {
> >> >      uint64_t packets_recved;
> >> >      /* ramblock */
> >> >      RAMBlock *block;
> >> > -    /* ramblock host address */
> >> > +    /*
> >> > +     * Normally, it points to ramblock's host address.  When COLO
> >> > +     * is enabled, it points to the mirror cache for the ramblock.
> >> > +     */
> >> >      uint8_t *host;
> >> >      /* buffers to recv */
> >> >      struct iovec *iov;  
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 05/10] colo: Fix crash during device vmstate load
  2026-01-25 20:40 ` [PATCH v3 05/10] colo: Fix crash during device vmstate load Lukas Straub
@ 2026-01-27 20:38   ` Peter Xu
  2026-01-30 12:49     ` Lukas Straub
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-01-27 20:38 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Sun, Jan 25, 2026 at 09:40:10PM +0100, Lukas Straub wrote:
> With colo we load device vmstate during each checkpoint, on top of
> a vm that was already running. Some devices expect a reset before
> loading vmstate on such a previously running vm.
> 
> This fixes a crash when using COLO with Q35 machine.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

Yes makes sense, maybe you can add some comments into the code too since
this was overlooked before,

Reviewed-by: Peter Xu <peterx@redhat.com>

Have you tried to measure how many overheads will this introduce to loading
each snapshot?

> ---
>  migration/colo.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index db783f6fa77500386d923dd97e522883027e71d8..627b3706687036554eda3909b4194116a7640493 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -727,6 +727,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  
>      bql_lock();
>      vmstate_loading = true;
> +    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
>      colo_flush_ram_cache();
>      ret = qemu_load_device_state(fb, errp);
>      if (ret < 0) {
> 
> -- 
> 2.39.5
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
  2026-01-26 14:40   ` Fabiano Rosas
@ 2026-01-27 20:49   ` Peter Xu
  2026-01-30 10:24     ` Lukas Straub
  2026-01-28 12:32   ` Fabiano Rosas
  2 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-01-27 20:49 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:
> +void migration_test_add_colo(MigrationTestEnv *env)
> +{
> +    if (!env->has_kvm) {
> +        g_test_skip("COLO requires KVM accelerator");
> +        return;
> +    }

I'm OK if you want to explicitly bypass others, but could you explanation
why?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-27 20:36         ` Peter Xu
@ 2026-01-28 12:30           ` Fabiano Rosas
  2026-01-28 14:09             ` Peter Xu
  2026-02-03  9:47             ` Lukas Straub
  0 siblings, 2 replies; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-28 12:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: Lukas Straub, qemu-devel, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:
>> Lukas Straub <lukasstraub2@web.de> writes:
>> 
>> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
>> >> > +{
>> >> > +    /*
>> >> > +     * While we're still in precopy state (not yet in colo state), we copy
>> >> > +     * received pages to both guest and cache. No need to set dirty bits,
>> >> > +     * since guest and cache memory are in sync.
>> >> > +     */
>> >> > +    if (migration_incoming_in_colo_state()) {  
>> >> 
>> >> What's the relationship between migration_incoming_in_colo_state() and
>> >> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
>> >> migration_incoming_colo_enabled affect multifd as well?
>> >
>> > So first off migration_incoming_colo_enabled() and migrate_colo()
>> > are equivalent in practice since
>> > 121ccedc2b migration: block incoming colo when capability is disabled
>> >
>> > (I have some cleanup patches lying around, but that will be for later)
>> >
>> 
>> Ok, I think those are important because when having multifd and
>> non-multifd code for the same feature, it's useful to be able to compare
>> the two. So some degree of uniformity would be nice.
>
> I second.  We can drop those in this series before adding multifd support,
> likely together with MIG_CMD_ENABLE_COLO as well; I don't think COLO needs
> to worry about old binaries.  It should always use the same QEMU binary on
> both sides.
>
> The patch needs to rename MIG_CMD_ENABLE_COLO to MIG_CMD_DEPRECATED_0 or
> something, to make the rest MIG_CMD compatible to old binaries, though.
>
>> 
>> > For colo, we do normal precopy migration and at the end we go into colo
>> > state and then
>> > migration_incoming_in_colo_state() will be true.
>> >
>> > Here we check migrate_colo() outside of these functions as Peter
>> > requested that in a previous version.
>> >
>> >> 
>> >> The multifd recv threads will be running until after
>> >> process_incoming_migration_bh(), which is when
>> >> migration_incoming_disable_colo() runs.
>> >
>> > That is not an issue as we use migrate_colo() here.
>> >
>> >> 
>> >> Also, is the colo_cache guaranteed to be around until multifd threads
>> >> exit?
>> >
>> > This is an issue. I will fix it in the next version.
>> >
>> >>
>> >> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
>> >> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
>> >> > +    }
>> >> > +}
>> >> > +
>> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
>> >> > +{
>> >> > +    if (!migration_incoming_in_colo_state()) {
>> >> > +        for (int i = 0; i < p->normal_num; i++) {
>> >> > +            void *guest = p->block->host + p->normal[i];
>> >> > +            void *cache = p->host + p->normal[i];
>> >> > +            memcpy(guest, cache, multifd_ram_page_size());
>> >> > +        }  
>> >> 
>> >> I see some differences between what ram.c does and what multifd will do
>> >> after this patch regarding which flags are checked and order of copies
>> >> (code below):
>> >> 
>> >> ram.c:
>> >> 
>> >>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
>> >>   Reads from stream into colo_cache.
>> >>   
>> >>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
>> >>   Reads from stream into guest and then memcpy into colo_cache.
>> >> 
>> >>   - !migration_incoming_colo_enabled
>> >>   Reads from stream into guest.
>> >> 
>> >> multifd.c:
>> >> 
>> >>   - migrate_colo:
>> >>   Reads from stream into colo_cache.
>> >>   
>> >>   - !migration_incoming_in_colo_state:
>> >>   memcpy from colo_cache into guest.
>> >> 
>> >>   - !migration_incoming_colo_enabled
>> >>   ???
>> >> 
>> >> The resulting state should be the same, but I wonder if we want to i) use
>> >> the same checks in multifd
>> >
>> > migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
>> > not using it here. migrate_colo() is much easier to reason about.
>> >
>> >> and ii) when not in colo state, copy first
>> >> into guest (using readv) and later memcpy into the colo_cache.
>> >
>> > I think it is easier the way it is now.
>> >
>> >> 
>> >> ---
>> >> ram.c:
>> >> 
>> >> host = host_from_ram_block_offset(block, addr);
>> >> if (migration_incoming_colo_enabled()) {
>> >>     if (migration_incoming_in_colo_state()) {
>> >>         host = colo_cache_from_block_offset(block, addr, true);
>> >>     } else {
>> >>         host_bak = colo_cache_from_block_offset(block, addr, false);
>> >>     }
>> >> }
>> >> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>> >> if (host_bak) {
>> >>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
>> >> }
>> >> 
>> >> multifd:
>> >> 
>> >> if (migrate_colo()) {
>> >>     p->host = p->block->colo_cache;
>> >> }
>> >> 
>> >> for (int i = 0; i < p->normal_num; i++) {
>> >>     p->iov[i].iov_base = p->host + p->normal[i];
>> >> }
>> >> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> >> 
>> >> if (!migration_incoming_in_colo_state()) {
>> >>     for (int i = 0; i < p->normal_num; i++) {
>> >>         void *guest = p->block->host + p->normal[i];
>> >>         void *cache = p->host + p->normal[i];
>> >>         memcpy(guest, cache, multifd_ram_page_size());
>> >>     }
>> >> }
>> >> ---
>> >> 
>> >> > +        for (int i = 0; i < p->zero_num; i++) {
>> >> > +            void *guest = p->block->host + p->zero[i];
>> >> > +            memset(guest, 0, multifd_ram_page_size());
>> >> > +        }  
>> >> 
>> >> At multifd_nocomp_recv, there will be a call to
>> >> multifd_recv_zero_page_process(), which by that point will have p->host
>> >> == p->block->colo_cache, so it looks like that function will do some
>> >> zero page processing in the colo_cache, setting the rb->receivedmap for
>> >> pages in the colo_cache and potentially also doing a memcpy. Is this
>> >> intended?
>> >
>> > rb->receivedmap is only for postcopy, right? So it doesn't apply with
>> > colo.
>> >
>> 
>> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
>> page causing multiple page faults"). So it seems we might be doing extra
>> work on top of the colo_cache.
>
> IIUC not extra, but exactly what will be needed.
>
> The logic was about "in a vanilla precopy, if we see one page arriving the
> 1st time we don't need to zero the buffer because the buffer should be zero
> allocated".
>
> In COLO's case, COLO always puts RAM data into colo_cache, hence it should
> apply to colo_cache too, avoiding unnecessary memset() for colo_cache
> instead.
>
> E.g. colo_cache is allocated from qemu_anon_ram_alloc(), it's also
> guaranteed to be zeros when never touched.
>
>> 
>> >> 
>> >> I'm thinking that maybe it would overall be better to hook colo directly
>> >> in to multifd_nocomp_recv:
>> >
>> > But then it will only work for nocomp, right? It feels like the wrong
>> > level of abstraction to me.
>> >
>> 
>> Ah, nocomp != ram indeed.
>> 
>> >> 
>> >> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
>> >> {
>> >>     uint32_t flags;
>> >> 
>> >>     if (migrate_mapped_ram()) {
>> >>         return multifd_file_recv_data(p, errp);
>> >>     }
>> >> 
>> >>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> >> 
>> >>     if (flags != MULTIFD_FLAG_NOCOMP) {
>> >>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
>> >>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
>> >>         return -1;
>> >>     }
>> >> 
>> >> +    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
>> >> +        p->host = p->block->colo_cache;
>> >> +    } // or else{}, depending on how deal with zero pages in the cache
>> >> 
>> >>     multifd_recv_zero_page_process(p);
>> >> 
>> >>     if (!p->normal_num) {
>> >>         return 0;
>> >>     }
>> >> 
>> >>     for (int i = 0; i < p->normal_num; i++) {
>> >>         p->iov[i].iov_base = p->host + p->normal[i];
>> >>         p->iov[i].iov_len = multifd_ram_page_size();
>> >>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
>> >>     }
>> >> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> >> +    if (ret != 0) {
>> >> +        return ret;
>> >> +    }
>> >> +
>> >> +    if (migration_incoming_colo_enabled()) {
>> >> +        multifd_colo_process_recv();
>> >> +    }
>> >> 
>> >>     return ret;
>> >> }
>> >> 
>> >> 
>> >> > +    }
>> >> > +}
>> >> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
>> >> > new file mode 100644
>> >> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
>> >> > --- /dev/null
>> >> > +++ b/migration/multifd-colo.h
>> >> > @@ -0,0 +1,26 @@
>> >> > +/*
>> >> > + * SPDX-License-Identifier: GPL-2.0-or-later
>> >> > + *
>> >> > + * multifd colo header
>> >> > + *
>> >> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
>> >> > + *
>> >> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> >> > + * See the COPYING file in the top-level directory.
>> >> > + */
>> >> > +
>> >> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
>> >> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
>> >> > +
>> >> > +#ifdef CONFIG_REPLICATION
>> >> > +
>> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
>> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
>> >> > +
>> >> > +#else
>> >> > +
>> >> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
>> >> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
>> >> > +
>> >> > +#endif
>> >> > +#endif
>> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
>> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
>> >> > --- a/migration/multifd-nocomp.c
>> >> > +++ b/migration/multifd-nocomp.c
>> >> > @@ -16,6 +16,7 @@
>> >> >  #include "file.h"
>> >> >  #include "migration-stats.h"
>> >> >  #include "multifd.h"
>> >> > +#include "multifd-colo.h"
>> >> >  #include "options.h"
>> >> >  #include "migration.h"
>> >> >  #include "qapi/error.h"
>> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >> >          return -1;
>> >> >      }
>> >> >  
>> >> > -    p->host = p->block->host;
>> >> >      for (i = 0; i < p->normal_num; i++) {
>> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
>> >> >  
>> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >> >          p->zero[i] = offset;
>> >> >      }
>> >> >  
>> >> > +    if (migrate_colo()) {
>> >> > +        multifd_colo_prepare_recv(p);
>> >> > +        assert(p->block->colo_cache);
>> >> > +        p->host = p->block->colo_cache;  
>> >> 
>> >> Can't you just use p->block->colo_cache later? I don't see why p->host
>> >> needs to be set beforehand even in the non-colo case.
>> >
>> > We should not touch the guest ram directly while in colo state, since
>> > the incoming guest is running and we either want to receive and apply a
>> > whole checkpoint with all ram into colo cache and all device state,
>> > or if anything goes wrong during checkpointing, keep the currently
>> > running guest on the incoming side in pristine state.
>> >
>> 
>> I was asking about setting p->host at this specific point. I don't think
>> any of this fits the unfill function. However, I see those were
>> suggested by Peter so let's not go back and forth.
>
> Actually I don't know why p->host existed before this work; IIUC we could
> have always used p->block->host.  Maybe when Juan was developing this Juan
> kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
> reference.
>

Maybe p->block was being reset at some point and p->host was passed
being the point where the (whatever) lock was release. I checked and
today there's no such thing. The p->mutex seems to be there just to
protect against this in multifd_recv_sync_main:

WITH_QEMU_LOCK_GUARD(&p->mutex) {
    if (multifd_recv_state->packet_num < p->packet_num) {
        multifd_recv_state->packet_num = p->packet_num;
    }
}

> IIUC, we could remove p->host, but when we need to access "the buffer of
> the ramblock" we'll need to call a helper to fetch that (either ramblock's
> buffer, or colo_cache, per migrate_colo()).  And it might be slightly
> slower than p->host indeed.
>

Yeah, let's keep it, the compression code also uses it, there's no point
removing it now.

>> 
>> > I have written more about colo migration here:
>> >
>> > https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
>> > https://lore.kernel.org/qemu-devel/aXE1i9xJ81EWokYz@x1.local/
>> >
>> >> 
>> >> > +    } else {
>> >> > +        p->host = p->block->host;
>> >> > +    }
>> >> > +
>> >> >      return 0;
>> >> >  }
>> >> >  
>> >> > diff --git a/migration/multifd.c b/migration/multifd.c
>> >> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
>> >> > --- a/migration/multifd.c
>> >> > +++ b/migration/multifd.c
>> >> > @@ -29,6 +29,7 @@
>> >> >  #include "qemu-file.h"
>> >> >  #include "trace.h"
>> >> >  #include "multifd.h"
>> >> > +#include "multifd-colo.h"
>> >> >  #include "options.h"
>> >> >  #include "qemu/yank.h"
>> >> >  #include "io/channel-file.h"
>> >> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>> >> >      int ret;
>> >> >  
>> >> >      ret = multifd_recv_state->ops->recv(p, errp);
>> >> > +    if (ret != 0) {
>> >> > +        return ret;
>> >> > +    }
>> >> > +
>> >> > +    if (migrate_colo()) {
>> >> > +        multifd_colo_process_recv(p);
>> >> > +    }  
>> >> 
>> >> Either put all of colo hooks in multifd.c or multifd-nocomp.c. I think
>> >> the latter is more appropriate as we have mapped_ram already in
>> >> there. Let's drop patch 3 and put this in multifd_nocomp_recv().
>> >
>> > Again, it also should work with compression and multifd_nocomp_recv()
>> > is for nocomp only.
>> >
>> >> 
>> >> >  
>> >> >      return ret;
>> >> >  }
>> >> > diff --git a/migration/multifd.h b/migration/multifd.h
>> >> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
>> >> > --- a/migration/multifd.h
>> >> > +++ b/migration/multifd.h
>> >> > @@ -279,7 +279,10 @@ typedef struct {
>> >> >      uint64_t packets_recved;
>> >> >      /* ramblock */
>> >> >      RAMBlock *block;
>> >> > -    /* ramblock host address */
>> >> > +    /*
>> >> > +     * Normally, it points to ramblock's host address.  When COLO
>> >> > +     * is enabled, it points to the mirror cache for the ramblock.
>> >> > +     */
>> >> >      uint8_t *host;
>> >> >      /* buffers to recv */
>> >> >      struct iovec *iov;  
>> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
  2026-01-26 14:40   ` Fabiano Rosas
  2026-01-27 20:49   ` Peter Xu
@ 2026-01-28 12:32   ` Fabiano Rosas
  2 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-28 12:32 UTC (permalink / raw)
  To: Lukas Straub, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Lukas Straub <lukasstraub2@web.de> writes:

> Add a COLO migration test for COLO migration and failover.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  MAINTAINERS                        |   1 +
>  tests/qtest/meson.build            |   7 +-
>  tests/qtest/migration-test.c       |   1 +
>  tests/qtest/migration/colo-tests.c | 199 +++++++++++++++++++++++++++++++++++++
>  tests/qtest/migration/framework.h  |   5 +
>  5 files changed, 212 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 883f0a8f4eb92d0bf0f89fcab4674ccc4aed1cc1..2a8b9b2d051883c1b7adce9c1afec80d16a317f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3856,6 +3856,7 @@ F: migration/colo*
>  F: migration/multifd-colo.*
>  F: include/migration/colo.h
>  F: include/migration/failover.h
> +F: tests/qtest/migration/colo-tests.c
>  F: docs/COLO-FT.txt
>  
>  COLO Proxy
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index dfb83650c643d884daad53a66034ab7aa8c45509..624f7744ec9bd81c8823075b966bc95f7750a667 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -371,6 +371,11 @@ if gnutls.found()
>    endif
>  endif
>  
> +migration_colo_files = []
> +if get_option('replication').allowed()
> +  migration_colo_files = [files('migration/colo-tests.c')]
> +endif
> +
>  qtests = {
>    'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
>    'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
> @@ -382,7 +387,7 @@ qtests = {
>                               'migration/migration-util.c') + dbus_vmstate1,
>    'erst-test': files('erst-test.c'),
>    'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
> -  'migration-test': test_migration_files + migration_tls_files,
> +  'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
>    'pxe-test': files('boot-sector.c'),
>    'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
>                            'pnv-xive2-nvpg_bar.c'),
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -55,6 +55,7 @@ int main(int argc, char **argv)
>      migration_test_add_precopy(env);
>      migration_test_add_cpr(env);
>      migration_test_add_misc(env);
> +    migration_test_add_colo(env);
>  
>      ret = g_test_run();
>  
> diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0586970e206f01ed6e7aa3429321aefc1de7be37
> --- /dev/null
> +++ b/tests/qtest/migration/colo-tests.c
> @@ -0,0 +1,199 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * QTest testcases for COLO migration
> + *
> + * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest.h"
> +#include "migration/framework.h"
> +#include "migration/migration-qmp.h"
> +#include "migration/migration-util.h"
> +#include "qemu/module.h"
> +
> +static int test_colo_common(MigrateCommon *args,
> +                            bool failover_during_checkpoint,
> +                            bool primary_failover)
> +{
> +    QTestState *from, *to;
> +    void *data_hook = NULL;
> +
> +    /*
> +     * For the COLO test, both VMs will run in parallel. Thus both VMs want to
> +     * open the image read/write at the same time. Using read-only=on is not
> +     * possible here, because ide-hd does not support read-only backing image.
> +     *
> +     * So use -snapshot, where each qemu instance creates its own writable
> +     * snapshot internally while leaving the real image read-only.
> +     */
> +    args->start.opts_source = "-snapshot";
> +    args->start.opts_target = "-snapshot";
> +
> +    /*
> +     * COLO migration code logs many errors when the migration socket
> +     * is shut down, these are expected so we hide them here.
> +     */
> +    args->start.hide_stderr = true;
> +
> +    args->start.oob = true;
> +    args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
> +
> +    if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
> +        return -1;
> +    }
> +
> +    migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
> +
> +    if (args->start_hook) {
> +        data_hook = args->start_hook(from, to);
> +    }
> +
> +    migrate_ensure_converge(from);
> +    wait_for_serial("src_serial");
> +
> +    migrate_qmp(from, to, args->connect_uri, NULL, "{}");
> +
> +    wait_for_migration_status(from, "colo", NULL);
> +    wait_for_resume(to, get_dst());
> +
> +    wait_for_serial("src_serial");
> +    wait_for_serial("dest_serial");
> +
> +    /* wait for 3 checkpoints */
> +    for (int i = 0; i < 3; i++) {
> +        qtest_qmp_eventwait(to, "RESUME");
> +        wait_for_serial("src_serial");
> +        wait_for_serial("dest_serial");
> +    }
> +
> +    if (failover_during_checkpoint) {
> +        qtest_qmp_eventwait(to, "STOP");
> +    }
> +    if (primary_failover) {
> +        qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> +                                            "'arguments': {'instances':"
> +                                                "[{'type': 'migration'}]}}");
> +        qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
> +        wait_for_serial("src_serial");
> +    } else {
> +        qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
> +                                        "'arguments': {'instances':"
> +                                            "[{'type': 'migration'}]}}");
> +        qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
> +        wait_for_serial("dest_serial");
> +    }
> +
> +    if (args->end_hook) {
> +        args->end_hook(from, to, data_hook);
> +    }
> +
> +    migrate_end(from, to, !primary_failover);
> +
> +    return 0;
> +}
> +
> +static void test_colo_plain_common(MigrateCommon *args,
> +                                   bool failover_during_checkpoint,
> +                                   bool primary_failover)
> +{
> +    args->listen_uri = "tcp:127.0.0.1:0";
> +    test_colo_common(args, failover_during_checkpoint, primary_failover);
> +}
> +
> +static void *hook_start_multifd(QTestState *from, QTestState *to)
> +{
> +    return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
> +}
> +
> +static void test_colo_multifd_common(MigrateCommon *args,
> +                                     bool failover_during_checkpoint,
> +                                     bool primary_failover)
> +{
> +    args->listen_uri = "defer";
> +    args->start_hook = hook_start_multifd;
> +    args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
> +    test_colo_common(args, failover_during_checkpoint, primary_failover);
> +}
> +
> +static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
> +{
> +    test_colo_plain_common(args, false, true);
> +}
> +
> +static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
> +{
> +    test_colo_plain_common(args, false, false);
> +}
> +
> +static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
> +{
> +    test_colo_multifd_common(args, false, true);
> +}
> +
> +static void test_colo_multifd_secondary_failover(char *name,
> +                                                 MigrateCommon *args)
> +{
> +    test_colo_multifd_common(args, false, false);
> +}
> +
> +static void test_colo_plain_primary_failover_checkpoint(char *name,
> +                                                        MigrateCommon *args)
> +{
> +    test_colo_plain_common(args, true, true);
> +}
> +
> +static void test_colo_plain_secondary_failover_checkpoint(char *name,
> +                                                          MigrateCommon *args)
> +{
> +    test_colo_plain_common(args, true, false);
> +}
> +
> +static void test_colo_multifd_primary_failover_checkpoint(char *name,
> +                                                          MigrateCommon *args)
> +{
> +    test_colo_multifd_common(args, true, true);
> +}
> +
> +static void test_colo_multifd_secondary_failover_checkpoint(char *name,
> +                                                            MigrateCommon *args)
> +{
> +    test_colo_multifd_common(args, true, false);
> +}
> +
> +void migration_test_add_colo(MigrationTestEnv *env)
> +{
> +    if (!env->has_kvm) {
> +        g_test_skip("COLO requires KVM accelerator");
> +        return;
> +    }
> +
> +    if (!env->full_set) {
> +        return;
> +    }
> +
> +    migration_test_add("/migration/colo/plain/primary_failover",
> +                       test_colo_plain_primary_failover);
> +    migration_test_add("/migration/colo/plain/secondary_failover",
> +                       test_colo_plain_secondary_failover);
> +
> +    migration_test_add("/migration/colo/multifd/primary_failover",
> +                       test_colo_multifd_primary_failover);
> +    migration_test_add("/migration/colo/multifd/secondary_failover",
> +                       test_colo_multifd_secondary_failover);
> +
> +    migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
> +                       test_colo_plain_primary_failover_checkpoint);
> +    migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
> +                       test_colo_plain_secondary_failover_checkpoint);
> +
> +    migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
> +                       test_colo_multifd_primary_failover_checkpoint);
> +    migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
> +                       test_colo_multifd_secondary_failover_checkpoint);
> +}
> diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
> index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
> --- a/tests/qtest/migration/framework.h
> +++ b/tests/qtest/migration/framework.h
> @@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
>  void migration_test_add_precopy(MigrationTestEnv *env);
>  void migration_test_add_cpr(MigrationTestEnv *env);
>  void migration_test_add_misc(MigrationTestEnv *env);
> +#ifdef CONFIG_REPLICATION
> +void migration_test_add_colo(MigrationTestEnv *env);
> +#else
> +static inline void migration_test_add_colo(MigrationTestEnv *env) {};
> +#endif
>  
>  #endif /* TEST_FRAMEWORK_H */

It survived my stress run. It hit once the race at migration_shutdown()
where current_migration is already freed, but we can ignore that because
it's preexisting.

Tested-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-28 12:30           ` Fabiano Rosas
@ 2026-01-28 14:09             ` Peter Xu
  2026-01-28 20:02               ` Fabiano Rosas
  2026-02-03  9:47             ` Lukas Straub
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-01-28 14:09 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Lukas Straub, qemu-devel, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

On Wed, Jan 28, 2026 at 09:30:24AM -0300, Fabiano Rosas wrote:
> >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          p->zero[i] = offset;
> >> >> >      }
> >> >> >  
> >> >> > +    if (migrate_colo()) {
> >> >> > +        multifd_colo_prepare_recv(p);
> >> >> > +        assert(p->block->colo_cache);
> >> >> > +        p->host = p->block->colo_cache;  
> >> >> 
> >> >> Can't you just use p->block->colo_cache later? I don't see why p->host
> >> >> needs to be set beforehand even in the non-colo case.
> >> >
> >> > We should not touch the guest ram directly while in colo state, since
> >> > the incoming guest is running and we either want to receive and apply a
> >> > whole checkpoint with all ram into colo cache and all device state,
> >> > or if anything goes wrong during checkpointing, keep the currently
> >> > running guest on the incoming side in pristine state.
> >> >
> >> 
> >> I was asking about setting p->host at this specific point. I don't think
> >> any of this fits the unfill function. However, I see those were
> >> suggested by Peter so let's not go back and forth.
> >
> > Actually I don't know why p->host existed before this work; IIUC we could
> > have always used p->block->host.  Maybe when Juan was developing this Juan
> > kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
> > reference.
> >
> 
> Maybe p->block was being reset at some point and p->host was passed
> being the point where the (whatever) lock was release. I checked and
> today there's no such thing. The p->mutex seems to be there just to
> protect against this in multifd_recv_sync_main:
> 
> WITH_QEMU_LOCK_GUARD(&p->mutex) {
>     if (multifd_recv_state->packet_num < p->packet_num) {
>         multifd_recv_state->packet_num = p->packet_num;
>     }
> }

It should be protected by various checks over migration_is_running().

E.g., QMP device-add & device-del are forbidden so no new pc-dimm hotplug /
removal allowed.  Similarly, virtio_mem_is_busy() would return true during
migration too.

We should definitely make sure ramblock will not be reset during the whole
lifecycle of migration; I believe we're not ready for that..

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-28 14:09             ` Peter Xu
@ 2026-01-28 20:02               ` Fabiano Rosas
  0 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2026-01-28 20:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Lukas Straub, qemu-devel, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jan 28, 2026 at 09:30:24AM -0300, Fabiano Rosas wrote:
>> >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >> >> >          p->zero[i] = offset;
>> >> >> >      }
>> >> >> >  
>> >> >> > +    if (migrate_colo()) {
>> >> >> > +        multifd_colo_prepare_recv(p);
>> >> >> > +        assert(p->block->colo_cache);
>> >> >> > +        p->host = p->block->colo_cache;  
>> >> >> 
>> >> >> Can't you just use p->block->colo_cache later? I don't see why p->host
>> >> >> needs to be set beforehand even in the non-colo case.
>> >> >
>> >> > We should not touch the guest ram directly while in colo state, since
>> >> > the incoming guest is running and we either want to receive and apply a
>> >> > whole checkpoint with all ram into colo cache and all device state,
>> >> > or if anything goes wrong during checkpointing, keep the currently
>> >> > running guest on the incoming side in pristine state.
>> >> >
>> >> 
>> >> I was asking about setting p->host at this specific point. I don't think
>> >> any of this fits the unfill function. However, I see those were
>> >> suggested by Peter so let's not go back and forth.
>> >
>> > Actually I don't know why p->host existed before this work; IIUC we could
>> > have always used p->block->host.  Maybe when Juan was developing this Juan
>> > kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
>> > reference.
>> >
>> 
>> Maybe p->block was being reset at some point and p->host was passed
>> being the point where the (whatever) lock was release. I checked and
>> today there's no such thing. The p->mutex seems to be there just to
>> protect against this in multifd_recv_sync_main:
>> 
>> WITH_QEMU_LOCK_GUARD(&p->mutex) {
>>     if (multifd_recv_state->packet_num < p->packet_num) {
>>         multifd_recv_state->packet_num = p->packet_num;
>>     }
>> }
>
> It should be protected by various checks over migration_is_running().
>
> E.g., QMP device-add & device-del are forbidden so no new pc-dimm hotplug /
> removal allowed.  Similarly, virtio_mem_is_busy() would return true during
> migration too.
>
> We should definitely make sure ramblock will not be reset during the whole
> lifecycle of migration; I believe we're not ready for that..

The pointer reset, not the block. Anyway, it doesn't happen.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-27 20:49   ` Peter Xu
@ 2026-01-30 10:24     ` Lukas Straub
  2026-02-02 14:26       ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-30 10:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 3323 bytes --]

On Tue, 27 Jan 2026 15:49:31 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:
> > +void migration_test_add_colo(MigrationTestEnv *env)
> > +{
> > +    if (!env->has_kvm) {
> > +        g_test_skip("COLO requires KVM accelerator");
> > +        return;
> > +    }  
> 
> I'm OK if you want to explicitly bypass others, but could you explanation
> why?
> 
> Thanks,
> 

It used to hang with TCG. Now it crashes, since
migration_bitmap_sync_precopy assumes bql is held. Something for later.

#6  0x00007ffff7471517 in __assert_fail
    (assertion=assertion@entry=0x555555f17aee "bql_locked() != locked", file=file@entry=0x555555f17ab0 "../system/cpus.c", line=line@entry=535, function=function@entry=0x55555609bfd0 <__PRETTY_FUNCTION__.9> "bql_update_status") at ./assert/assert.c:105
#7  0x0000555555b09f1e in bql_update_status (locked=locked@entry=false) at ../system/cpus.c:535
#8  0x0000555555ec60e7 in qemu_mutex_pre_unlock (mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-common.h:57
#9  qemu_mutex_pre_unlock (line=164, file=0x555555efe1dc "../cpu-common.c", mutex=0x555557166700 <bql>) at ../util/qemu-thread-common.h:48
#10 qemu_cond_wait_impl (cond=0x5555571442c0 <qemu_work_cond>, mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-posix.c:224
#11 0x000055555589e6c8 in do_run_on_cpu (cpu=<optimized out>, func=<optimized out>, data=..., mutex=0x555557166700 <bql>) at ../cpu-common.c:164
#12 0x0000555555b17a06 in memory_global_after_dirty_log_sync () at ../system/memory.c:2938
#13 0x0000555555b55b47 in migration_bitmap_sync (rs=0x7fffe8001340, last_stage=last_stage@entry=true) at ../migration/ram.c:1157
#14 0x0000555555b56721 in migration_bitmap_sync_precopy (last_stage=last_stage@entry=true) at ../migration/ram.c:1195
#15 0x0000555555b59f8a in ram_save_complete (f=0x5555575db620, opaque=<optimized out>) at ../migration/ram.c:3381
#16 0x0000555555b5e4f5 in qemu_savevm_complete (se=se@entry=0x5555574c0d80, f=f@entry=0x5555575db620) at ../migration/savevm.c:1521
#17 0x0000555555b60437 in qemu_savevm_state_complete_precopy_iterable (f=f@entry=0x5555575db620, in_postcopy=in_postcopy@entry=false) at ../migration/savevm.c:1627
#18 0x0000555555b60a4f in qemu_savevm_state_complete_precopy (iterable_only=true, f=0x5555575db620) at ../migration/savevm.c:1719
#19 qemu_savevm_live_state (f=0x5555575db620) at ../migration/savevm.c:1855
#20 0x0000555555b65ed9 in colo_do_checkpoint_transaction (fb=<optimized out>, bioc=<optimized out>, s=0x5555574c0070) at ../migration/colo.c:474
#21 colo_process_checkpoint (s=0x5555574c0070) at ../migration/colo.c:592
#22 migrate_start_colo_process (s=0x5555574c0070) at ../migration/colo.c:655
#23 0x0000555555b4971e in migration_iteration_finish (s=0x5555574c0070) at ../migration/migration.c:3297
#24 migration_thread (opaque=opaque@entry=0x5555574c0070) at ../migration/migration.c:3584
#25 0x0000555555ec58c0 in qemu_thread_start (args=0x5555576583e0) at ../util/qemu-thread-posix.c:393
#26 0x00007ffff74d2aa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#27 0x00007ffff755fc6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 05/10] colo: Fix crash during device vmstate load
  2026-01-27 20:38   ` Peter Xu
@ 2026-01-30 12:49     ` Lukas Straub
  2026-02-02 14:12       ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-01-30 12:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

On Tue, 27 Jan 2026 15:38:55 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Sun, Jan 25, 2026 at 09:40:10PM +0100, Lukas Straub wrote:
> > With colo we load device vmstate during each checkpoint, on top of
> > a vm that was already running. Some devices expect a reset before
> > loading vmstate on such a previously running vm.
> > 
> > This fixes a crash when using COLO with Q35 machine.
> > 
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>  
> 
> Yes makes sense, maybe you can add some comments into the code too since
> this was overlooked before,
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Have you tried to measure how many overheads will this introduce to loading
> each snapshot?

It's a large overhead actually, between 10-20 milliseconds.

Regards,
Lukas Straub

> 
> > ---
> >  migration/colo.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/migration/colo.c b/migration/colo.c
> > index db783f6fa77500386d923dd97e522883027e71d8..627b3706687036554eda3909b4194116a7640493 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -727,6 +727,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
> >  
> >      bql_lock();
> >      vmstate_loading = true;
> > +    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
> >      colo_flush_ram_cache();
> >      ret = qemu_load_device_state(fb, errp);
> >      if (ret < 0) {
> > 
> > -- 
> > 2.39.5
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 05/10] colo: Fix crash during device vmstate load
  2026-01-30 12:49     ` Lukas Straub
@ 2026-02-02 14:12       ` Peter Xu
  2026-02-03  9:25         ` Lukas Straub
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-02-02 14:12 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Jan 30, 2026 at 01:49:42PM +0100, Lukas Straub wrote:
> On Tue, 27 Jan 2026 15:38:55 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Sun, Jan 25, 2026 at 09:40:10PM +0100, Lukas Straub wrote:
> > > With colo we load device vmstate during each checkpoint, on top of
> > > a vm that was already running. Some devices expect a reset before
> > > loading vmstate on such a previously running vm.
> > > 
> > > This fixes a crash when using COLO with Q35 machine.
> > > 
> > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>  
> > 
> > Yes makes sense, maybe you can add some comments into the code too since
> > this was overlooked before,
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> > Have you tried to measure how many overheads will this introduce to loading
> > each snapshot?
> 
> It's a large overhead actually, between 10-20 milliseconds.

This can be mentioned in the commit message.

IIUC reset() may or may not be required while loading a snapshot.
Normally, a device reset() should reset all dev registers and internal
states, OTOH loadvm() will reload most of them once more.. so less
efficient.

Maybe there's chance to "fix" q35 instead reducing this overhead, but I'll
leave that to be your call; to me this fix is clean from maint POV.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-01-30 10:24     ` Lukas Straub
@ 2026-02-02 14:26       ` Peter Xu
  2026-02-03  9:18         ` Lukas Straub
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-02-02 14:26 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Jan 30, 2026 at 11:24:02AM +0100, Lukas Straub wrote:
> On Tue, 27 Jan 2026 15:49:31 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:
> > > +void migration_test_add_colo(MigrationTestEnv *env)
> > > +{
> > > +    if (!env->has_kvm) {
> > > +        g_test_skip("COLO requires KVM accelerator");
> > > +        return;
> > > +    }  
> > 
> > I'm OK if you want to explicitly bypass others, but could you explanation
> > why?
> > 
> > Thanks,
> > 
> 
> It used to hang with TCG. Now it crashes, since
> migration_bitmap_sync_precopy assumes bql is held. Something for later.

If we want to keep COLO around and be serious, let's try to make COLO the
same standard we target for migration in general whenever possible.  We
shouldn't randomly workaround bugs.  We should fix it.

It looks to me there's some locking issue instead.

Iterator's complete() requires BQL.  Would a patch like below makes sense
to you?

diff --git a/migration/colo.c b/migration/colo.c
index db783f6fa7..b3ea137120 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -458,8 +458,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     /* Note: device state is saved into buffer */
     ret = qemu_save_device_state(fb);
 
-    bql_unlock();
     if (ret < 0) {
+        bql_unlock();
         goto out;
     }
 
@@ -473,6 +473,9 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
      */
     qemu_savevm_live_state(s->to_dst_file);
 
+    /* Save live state requires BQL */
+    bql_unlock();
+
     qemu_fflush(fb);
 
     /*

> 
> #6  0x00007ffff7471517 in __assert_fail
>     (assertion=assertion@entry=0x555555f17aee "bql_locked() != locked", file=file@entry=0x555555f17ab0 "../system/cpus.c", line=line@entry=535, function=function@entry=0x55555609bfd0 <__PRETTY_FUNCTION__.9> "bql_update_status") at ./assert/assert.c:105
> #7  0x0000555555b09f1e in bql_update_status (locked=locked@entry=false) at ../system/cpus.c:535
> #8  0x0000555555ec60e7 in qemu_mutex_pre_unlock (mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-common.h:57
> #9  qemu_mutex_pre_unlock (line=164, file=0x555555efe1dc "../cpu-common.c", mutex=0x555557166700 <bql>) at ../util/qemu-thread-common.h:48
> #10 qemu_cond_wait_impl (cond=0x5555571442c0 <qemu_work_cond>, mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-posix.c:224
> #11 0x000055555589e6c8 in do_run_on_cpu (cpu=<optimized out>, func=<optimized out>, data=..., mutex=0x555557166700 <bql>) at ../cpu-common.c:164
> #12 0x0000555555b17a06 in memory_global_after_dirty_log_sync () at ../system/memory.c:2938
> #13 0x0000555555b55b47 in migration_bitmap_sync (rs=0x7fffe8001340, last_stage=last_stage@entry=true) at ../migration/ram.c:1157
> #14 0x0000555555b56721 in migration_bitmap_sync_precopy (last_stage=last_stage@entry=true) at ../migration/ram.c:1195
> #15 0x0000555555b59f8a in ram_save_complete (f=0x5555575db620, opaque=<optimized out>) at ../migration/ram.c:3381
> #16 0x0000555555b5e4f5 in qemu_savevm_complete (se=se@entry=0x5555574c0d80, f=f@entry=0x5555575db620) at ../migration/savevm.c:1521
> #17 0x0000555555b60437 in qemu_savevm_state_complete_precopy_iterable (f=f@entry=0x5555575db620, in_postcopy=in_postcopy@entry=false) at ../migration/savevm.c:1627
> #18 0x0000555555b60a4f in qemu_savevm_state_complete_precopy (iterable_only=true, f=0x5555575db620) at ../migration/savevm.c:1719
> #19 qemu_savevm_live_state (f=0x5555575db620) at ../migration/savevm.c:1855
> #20 0x0000555555b65ed9 in colo_do_checkpoint_transaction (fb=<optimized out>, bioc=<optimized out>, s=0x5555574c0070) at ../migration/colo.c:474
> #21 colo_process_checkpoint (s=0x5555574c0070) at ../migration/colo.c:592
> #22 migrate_start_colo_process (s=0x5555574c0070) at ../migration/colo.c:655
> #23 0x0000555555b4971e in migration_iteration_finish (s=0x5555574c0070) at ../migration/migration.c:3297
> #24 migration_thread (opaque=opaque@entry=0x5555574c0070) at ../migration/migration.c:3584
> #25 0x0000555555ec58c0 in qemu_thread_start (args=0x5555576583e0) at ../util/qemu-thread-posix.c:393
> #26 0x00007ffff74d2aa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
> #27 0x00007ffff755fc6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78



-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-02-02 14:26       ` Peter Xu
@ 2026-02-03  9:18         ` Lukas Straub
  2026-02-03 21:21           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2026-02-03  9:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 5754 bytes --]

On Mon, 2 Feb 2026 09:26:06 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Jan 30, 2026 at 11:24:02AM +0100, Lukas Straub wrote:
> > On Tue, 27 Jan 2026 15:49:31 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:  
> > > > +void migration_test_add_colo(MigrationTestEnv *env)
> > > > +{
> > > > +    if (!env->has_kvm) {
> > > > +        g_test_skip("COLO requires KVM accelerator");
> > > > +        return;
> > > > +    }    
> > > 
> > > I'm OK if you want to explicitly bypass others, but could you explanation
> > > why?
> > > 
> > > Thanks,
> > >   
> > 
> > It used to hang with TCG. Now it crashes, since
> > migration_bitmap_sync_precopy assumes bql is held. Something for later.  
> 
> If we want to keep COLO around and be serious, let's try to make COLO the
> same standard we target for migration in general whenever possible.  We
> shouldn't randomly workaround bugs.  We should fix it.
> 
> It looks to me there's some locking issue instead.
> 
> Iterator's complete() requires BQL.  Would a patch like below makes sense
> to you?
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index db783f6fa7..b3ea137120 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -458,8 +458,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>      /* Note: device state is saved into buffer */
>      ret = qemu_save_device_state(fb);
>  
> -    bql_unlock();
>      if (ret < 0) {
> +        bql_unlock();
>          goto out;
>      }
>  
> @@ -473,6 +473,9 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>       */
>      qemu_savevm_live_state(s->to_dst_file);
>  
> +    /* Save live state requires BQL */
> +    bql_unlock();
> +
>      qemu_fflush(fb);
>  
>      /*

I already tested that and it works. However, we have to be very careful
around the locking here and I don't think it is safe to take the bql on
the primary here:

The secondary has the bql held at this point:

    colo_receive_check_message(mis->from_src_file,
                       COLO_MESSAGE_VMSTATE_SEND, &local_err);
    ...
    bql_lock();
    cpu_synchronize_all_states();
    ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
    bql_unlock();

On the primary there is a filter-mirror mirroring incoming packets to
the secondary filter-redirector. However since the secondary migration
holds bql the receiving filter is blocked and will not receive anything
from the socket. Thus filter-mirror on the primary also may get blocked
during send and block the mainloop (It uses blocking IO).

Now if the primary migration thread wants to take the bql it will
deadlock.

So I think this is something to fix in a separate series since it is
more involved.

Regards,
Lukas Straub

> 
> > 
> > #6  0x00007ffff7471517 in __assert_fail
> >     (assertion=assertion@entry=0x555555f17aee "bql_locked() != locked", file=file@entry=0x555555f17ab0 "../system/cpus.c", line=line@entry=535, function=function@entry=0x55555609bfd0 <__PRETTY_FUNCTION__.9> "bql_update_status") at ./assert/assert.c:105
> > #7  0x0000555555b09f1e in bql_update_status (locked=locked@entry=false) at ../system/cpus.c:535
> > #8  0x0000555555ec60e7 in qemu_mutex_pre_unlock (mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-common.h:57
> > #9  qemu_mutex_pre_unlock (line=164, file=0x555555efe1dc "../cpu-common.c", mutex=0x555557166700 <bql>) at ../util/qemu-thread-common.h:48
> > #10 qemu_cond_wait_impl (cond=0x5555571442c0 <qemu_work_cond>, mutex=0x555557166700 <bql>, file=0x555555efe1dc "../cpu-common.c", line=164) at ../util/qemu-thread-posix.c:224
> > #11 0x000055555589e6c8 in do_run_on_cpu (cpu=<optimized out>, func=<optimized out>, data=..., mutex=0x555557166700 <bql>) at ../cpu-common.c:164
> > #12 0x0000555555b17a06 in memory_global_after_dirty_log_sync () at ../system/memory.c:2938
> > #13 0x0000555555b55b47 in migration_bitmap_sync (rs=0x7fffe8001340, last_stage=last_stage@entry=true) at ../migration/ram.c:1157
> > #14 0x0000555555b56721 in migration_bitmap_sync_precopy (last_stage=last_stage@entry=true) at ../migration/ram.c:1195
> > #15 0x0000555555b59f8a in ram_save_complete (f=0x5555575db620, opaque=<optimized out>) at ../migration/ram.c:3381
> > #16 0x0000555555b5e4f5 in qemu_savevm_complete (se=se@entry=0x5555574c0d80, f=f@entry=0x5555575db620) at ../migration/savevm.c:1521
> > #17 0x0000555555b60437 in qemu_savevm_state_complete_precopy_iterable (f=f@entry=0x5555575db620, in_postcopy=in_postcopy@entry=false) at ../migration/savevm.c:1627
> > #18 0x0000555555b60a4f in qemu_savevm_state_complete_precopy (iterable_only=true, f=0x5555575db620) at ../migration/savevm.c:1719
> > #19 qemu_savevm_live_state (f=0x5555575db620) at ../migration/savevm.c:1855
> > #20 0x0000555555b65ed9 in colo_do_checkpoint_transaction (fb=<optimized out>, bioc=<optimized out>, s=0x5555574c0070) at ../migration/colo.c:474
> > #21 colo_process_checkpoint (s=0x5555574c0070) at ../migration/colo.c:592
> > #22 migrate_start_colo_process (s=0x5555574c0070) at ../migration/colo.c:655
> > #23 0x0000555555b4971e in migration_iteration_finish (s=0x5555574c0070) at ../migration/migration.c:3297
> > #24 migration_thread (opaque=opaque@entry=0x5555574c0070) at ../migration/migration.c:3584
> > #25 0x0000555555ec58c0 in qemu_thread_start (args=0x5555576583e0) at ../util/qemu-thread-posix.c:393
> > #26 0x00007ffff74d2aa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
> > #27 0x00007ffff755fc6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78  
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 05/10] colo: Fix crash during device vmstate load
  2026-02-02 14:12       ` Peter Xu
@ 2026-02-03  9:25         ` Lukas Straub
  0 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-02-03  9:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]

On Mon, 2 Feb 2026 09:12:33 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Jan 30, 2026 at 01:49:42PM +0100, Lukas Straub wrote:
> > On Tue, 27 Jan 2026 15:38:55 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Sun, Jan 25, 2026 at 09:40:10PM +0100, Lukas Straub wrote:  
> > > > With colo we load device vmstate during each checkpoint, on top of
> > > > a vm that was already running. Some devices expect a reset before
> > > > loading vmstate on such a previously running vm.
> > > > 
> > > > This fixes a crash when using COLO with Q35 machine.
> > > > 
> > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>    
> > > 
> > > Yes makes sense, maybe you can add some comments into the code too since
> > > this was overlooked before,
> > > 
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > 
> > > Have you tried to measure how many overheads will this introduce to loading
> > > each snapshot?  
> > 
> > It's a large overhead actually, between 10-20 milliseconds.  
> 
> This can be mentioned in the commit message.
> 
> IIUC reset() may or may not be required while loading a snapshot.
> Normally, a device reset() should reset all dev registers and internal
> states, OTOH loadvm() will reload most of them once more.. so less
> efficient.
> 
> Maybe there's chance to "fix" q35 instead reducing this overhead, but I'll
> leave that to be your call; to me this fix is clean from maint POV.
> 
> Thanks,
> 

Yes, I think this fix is fine for now. It more correct like this and we
can improve performance later while keeping it correct.

Regards,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/10] multifd: Add COLO support
  2026-01-28 12:30           ` Fabiano Rosas
  2026-01-28 14:09             ` Peter Xu
@ 2026-02-03  9:47             ` Lukas Straub
  1 sibling, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-02-03  9:47 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Peter Xu, qemu-devel, Laurent Vivier, Paolo Bonzini, Zhang Chen,
	Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 5345 bytes --]

On Wed, 28 Jan 2026 09:30:24 -0300
Fabiano Rosas <farosas@suse.de> wrote:

> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:  
> >> Lukas Straub <lukasstraub2@web.de> writes:
> >>   
> >> >> [...]
> >> >>   
> >> >> > +        for (int i = 0; i < p->zero_num; i++) {
> >> >> > +            void *guest = p->block->host + p->zero[i];
> >> >> > +            memset(guest, 0, multifd_ram_page_size());
> >> >> > +        }    
> >> >> 
> >> >> At multifd_nocomp_recv, there will be a call to
> >> >> multifd_recv_zero_page_process(), which by that point will have p->host
> >> >> == p->block->colo_cache, so it looks like that function will do some
> >> >> zero page processing in the colo_cache, setting the rb->receivedmap for
> >> >> pages in the colo_cache and potentially also doing a memcpy. Is this
> >> >> intended?  
> >> >
> >> > rb->receivedmap is only for postcopy, right? So it doesn't apply with
> >> > colo.
> >> >  
> >> 
> >> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
> >> page causing multiple page faults"). So it seems we might be doing extra
> >> work on top of the colo_cache.  
> >
> > IIUC not extra, but exactly what will be needed.
> >
> > The logic was about "in a vanilla precopy, if we see one page arriving the
> > 1st time we don't need to zero the buffer because the buffer should be zero
> > allocated".
> >
> > In COLO's case, COLO always puts RAM data into colo_cache, hence it should
> > apply to colo_cache too, avoiding unnecessary memset() for colo_cache
> > instead.
> >
> > E.g. colo_cache is allocated from qemu_anon_ram_alloc(), it's also
> > guaranteed to be zeros when never touched.
> >  
> >>   
> >> >> 
> >> >> I'm thinking that maybe it would overall be better to hook colo directly
> >> >> in to multifd_nocomp_recv:  
> >> >
> >> > But then it will only work for nocomp, right? It feels like the wrong
> >> > level of abstraction to me.
> >> >  
> >> 
> >> Ah, nocomp != ram indeed.
> >>   
> >> >> 
> >> >> > [...]
> >> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> >> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> >> >> > --- a/migration/multifd-nocomp.c
> >> >> > +++ b/migration/multifd-nocomp.c
> >> >> > @@ -16,6 +16,7 @@
> >> >> >  #include "file.h"
> >> >> >  #include "migration-stats.h"
> >> >> >  #include "multifd.h"
> >> >> > +#include "multifd-colo.h"
> >> >> >  #include "options.h"
> >> >> >  #include "migration.h"
> >> >> >  #include "qapi/error.h"
> >> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          return -1;
> >> >> >      }
> >> >> >  
> >> >> > -    p->host = p->block->host;
> >> >> >      for (i = 0; i < p->normal_num; i++) {
> >> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >> >> >  
> >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          p->zero[i] = offset;
> >> >> >      }
> >> >> >  
> >> >> > +    if (migrate_colo()) {
> >> >> > +        multifd_colo_prepare_recv(p);
> >> >> > +        assert(p->block->colo_cache);
> >> >> > +        p->host = p->block->colo_cache;    
> >> >> 
> >> >> Can't you just use p->block->colo_cache later? I don't see why p->host
> >> >> needs to be set beforehand even in the non-colo case.  
> >> >
> >> > We should not touch the guest ram directly while in colo state, since
> >> > the incoming guest is running and we either want to receive and apply a
> >> > whole checkpoint with all ram into colo cache and all device state,
> >> > or if anything goes wrong during checkpointing, keep the currently
> >> > running guest on the incoming side in pristine state.
> >> >  
> >> 
> >> I was asking about setting p->host at this specific point. I don't think
> >> any of this fits the unfill function. However, I see those were
> >> suggested by Peter so let's not go back and forth.  
> >
> > Actually I don't know why p->host existed before this work; IIUC we could
> > have always used p->block->host.  Maybe when Juan was developing this Juan
> > kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
> > reference.
> >  
> 
> Maybe p->block was being reset at some point and p->host was passed
> being the point where the (whatever) lock was release. I checked and
> today there's no such thing. The p->mutex seems to be there just to
> protect against this in multifd_recv_sync_main:
> 
> WITH_QEMU_LOCK_GUARD(&p->mutex) {
>     if (multifd_recv_state->packet_num < p->packet_num) {
>         multifd_recv_state->packet_num = p->packet_num;
>     }
> }
> 
> > IIUC, we could remove p->host, but when we need to access "the buffer of
> > the ramblock" we'll need to call a helper to fetch that (either ramblock's
> > buffer, or colo_cache, per migrate_colo()).  And it might be slightly
> > slower than p->host indeed.
> >  
> 
> Yeah, let's keep it, the compression code also uses it, there's no point
> removing it now.
> 

Actually p->host was there first p->block was added later for COLO in
5d1d1fcf4 multifd: Add the ramblock to MultiFDRecvParams

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-02-03  9:18         ` Lukas Straub
@ 2026-02-03 21:21           ` Peter Xu
  2026-02-06 19:11             ` Lukas Straub
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2026-02-03 21:21 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Tue, Feb 03, 2026 at 10:18:22AM +0100, Lukas Straub wrote:
> On Mon, 2 Feb 2026 09:26:06 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Fri, Jan 30, 2026 at 11:24:02AM +0100, Lukas Straub wrote:
> > > On Tue, 27 Jan 2026 15:49:31 -0500
> > > Peter Xu <peterx@redhat.com> wrote:
> > >   
> > > > On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:  
> > > > > +void migration_test_add_colo(MigrationTestEnv *env)
> > > > > +{
> > > > > +    if (!env->has_kvm) {
> > > > > +        g_test_skip("COLO requires KVM accelerator");
> > > > > +        return;
> > > > > +    }    
> > > > 
> > > > I'm OK if you want to explicitly bypass others, but could you explanation
> > > > why?
> > > > 
> > > > Thanks,
> > > >   
> > > 
> > > It used to hang with TCG. Now it crashes, since
> > > migration_bitmap_sync_precopy assumes bql is held. Something for later.  
> > 
> > If we want to keep COLO around and be serious, let's try to make COLO the
> > same standard we target for migration in general whenever possible.  We
> > shouldn't randomly workaround bugs.  We should fix it.
> > 
> > It looks to me there's some locking issue instead.
> > 
> > Iterator's complete() requires BQL.  Would a patch like below makes sense
> > to you?
> > 
> > diff --git a/migration/colo.c b/migration/colo.c
> > index db783f6fa7..b3ea137120 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -458,8 +458,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> >      /* Note: device state is saved into buffer */
> >      ret = qemu_save_device_state(fb);
> >  
> > -    bql_unlock();
> >      if (ret < 0) {
> > +        bql_unlock();
> >          goto out;
> >      }
> >  
> > @@ -473,6 +473,9 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> >       */
> >      qemu_savevm_live_state(s->to_dst_file);
> >  
> > +    /* Save live state requires BQL */
> > +    bql_unlock();
> > +
> >      qemu_fflush(fb);
> >  
> >      /*
> 
> I already tested that and it works. However, we have to be very careful
> around the locking here and I don't think it is safe to take the bql on
> the primary here:
> 
> The secondary has the bql held at this point:

This is definitely an interesting piece of code... one question:

> 
>     colo_receive_check_message(mis->from_src_file,
>                        COLO_MESSAGE_VMSTATE_SEND, &local_err);
>     ...
>     bql_lock();
>     cpu_synchronize_all_states();

Why this is needed at all? ^^^^^^^^^^^^^^^

The qemu_loadvm_state_main() line right below should only load RAM.  I
don't see how it has anything to do with CPU register states..

>     ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
>     bql_unlock();
> 
> On the primary there is a filter-mirror mirroring incoming packets to
> the secondary filter-redirector. However since the secondary migration
> holds bql the receiving filter is blocked and will not receive anything
> from the socket. Thus filter-mirror on the primary also may get blocked
> during send and block the mainloop (It uses blocking IO).

Hmm... could you explain why a blocking IO operation to mirror some packets
require holding BQL?  This sounds wrong on its own.

> 
> Now if the primary migration thread wants to take the bql it will
> deadlock.
> 
> So I think this is something to fix in a separate series since it is
> more involved.

Yes it might be involved, but this is really not something like "let's make
it simple for now and improve it later".  This is "OK this function
_requires_ this lock, but let's not take this lock and leave it for
later".  It's not something we can put aside, afaiu.  We should really fix
it.

How far do you think we can fix it?  Could you explain the problem better?

It might be helpful if you can reproduce the hang, then attach the logs
from both QEMU on a full thread backtrace dump.  I'll see what I can help.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/10] migration-test: Add COLO migration unit test
  2026-02-03 21:21           ` Peter Xu
@ 2026-02-06 19:11             ` Lukas Straub
  0 siblings, 0 replies; 37+ messages in thread
From: Lukas Straub @ 2026-02-06 19:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 4810 bytes --]

On Tue, 3 Feb 2026 16:21:05 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Tue, Feb 03, 2026 at 10:18:22AM +0100, Lukas Straub wrote:
> > On Mon, 2 Feb 2026 09:26:06 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Fri, Jan 30, 2026 at 11:24:02AM +0100, Lukas Straub wrote:  
> > > > On Tue, 27 Jan 2026 15:49:31 -0500
> > > > Peter Xu <peterx@redhat.com> wrote:
> > > >     
> > > > > On Sun, Jan 25, 2026 at 09:40:11PM +0100, Lukas Straub wrote:    
> > > > > > +void migration_test_add_colo(MigrationTestEnv *env)
> > > > > > +{
> > > > > > +    if (!env->has_kvm) {
> > > > > > +        g_test_skip("COLO requires KVM accelerator");
> > > > > > +        return;
> > > > > > +    }      
> > > > > 
> > > > > I'm OK if you want to explicitly bypass others, but could you explanation
> > > > > why?
> > > > > 
> > > > > Thanks,
> > > > >     
> > > > 
> > > > It used to hang with TCG. Now it crashes, since
> > > > migration_bitmap_sync_precopy assumes bql is held. Something for later.    
> > > 
> > > If we want to keep COLO around and be serious, let's try to make COLO the
> > > same standard we target for migration in general whenever possible.  We
> > > shouldn't randomly workaround bugs.  We should fix it.
> > > 
> > > It looks to me there's some locking issue instead.
> > > 
> > > Iterator's complete() requires BQL.  Would a patch like below makes sense
> > > to you?
> > > 
> > > diff --git a/migration/colo.c b/migration/colo.c
> > > index db783f6fa7..b3ea137120 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -458,8 +458,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> > >      /* Note: device state is saved into buffer */
> > >      ret = qemu_save_device_state(fb);
> > >  
> > > -    bql_unlock();
> > >      if (ret < 0) {
> > > +        bql_unlock();
> > >          goto out;
> > >      }
> > >  
> > > @@ -473,6 +473,9 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
> > >       */
> > >      qemu_savevm_live_state(s->to_dst_file);
> > >  
> > > +    /* Save live state requires BQL */
> > > +    bql_unlock();
> > > +
> > >      qemu_fflush(fb);
> > >  
> > >      /*  
> > 
> > I already tested that and it works. However, we have to be very careful
> > around the locking here and I don't think it is safe to take the bql on
> > the primary here:
> > 
> > The secondary has the bql held at this point:  
> 
> This is definitely an interesting piece of code... one question:
> 
> > 
> >     colo_receive_check_message(mis->from_src_file,
> >                        COLO_MESSAGE_VMSTATE_SEND, &local_err);
> >     ...
> >     bql_lock();
> >     cpu_synchronize_all_states();  
> 
> Why this is needed at all? ^^^^^^^^^^^^^^^
> 
> The qemu_loadvm_state_main() line right below should only load RAM.  I
> don't see how it has anything to do with CPU register states..

You are right we don't need this and the lock is needed here. Then I'm
fine with removing the lock here and adding one on the primary side.

> 
> >     ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
> >     bql_unlock();
> > 
> > On the primary there is a filter-mirror mirroring incoming packets to
> > the secondary filter-redirector. However since the secondary migration
> > holds bql the receiving filter is blocked and will not receive anything
> > from the socket. Thus filter-mirror on the primary also may get blocked
> > during send and block the mainloop (It uses blocking IO).  
> 
> Hmm... could you explain why a blocking IO operation to mirror some packets
> require holding BQL?  This sounds wrong on its own.

Yes there is no need for the BQL, it just is wrong. The tap fd gets a
POLLIN event, main loop takes BQL and calls the tap fd callback. Tap
reads a packet from the fd and calls qemu_send_packet_async() which
puts it through the net-filters and filter-mirror does a blocking send,
blocking the main loop while BQL is held.

> 
> > 
> > Now if the primary migration thread wants to take the bql it will
> > deadlock.
> > 
> > So I think this is something to fix in a separate series since it is
> > more involved.  
> 
> Yes it might be involved, but this is really not something like "let's make
> it simple for now and improve it later".  This is "OK this function
> _requires_ this lock, but let's not take this lock and leave it for
> later".  It's not something we can put aside, afaiu.  We should really fix
> it..
> 
> How far do you think we can fix it?  Could you explain the problem better?
> 
> It might be helpful if you can reproduce the hang, then attach the logs
> from both QEMU on a full thread backtrace dump.  I'll see what I can help.
> 
> Thanks,
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2026-02-06 19:12 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-01-25 20:40 ` [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-01-25 20:40 ` [PATCH v3 02/10] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
2026-01-25 20:40 ` [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv() Lukas Straub
2026-01-26 12:51   ` Fabiano Rosas
2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
2026-01-26 10:36   ` Zhang Chen
2026-01-26 11:13     ` Lukas Straub
2026-01-26 14:33   ` Fabiano Rosas
2026-01-26 19:33     ` Lukas Straub
2026-01-26 21:37       ` Fabiano Rosas
2026-01-27 20:36         ` Peter Xu
2026-01-28 12:30           ` Fabiano Rosas
2026-01-28 14:09             ` Peter Xu
2026-01-28 20:02               ` Fabiano Rosas
2026-02-03  9:47             ` Lukas Straub
2026-01-25 20:40 ` [PATCH v3 05/10] colo: Fix crash during device vmstate load Lukas Straub
2026-01-27 20:38   ` Peter Xu
2026-01-30 12:49     ` Lukas Straub
2026-02-02 14:12       ` Peter Xu
2026-02-03  9:25         ` Lukas Straub
2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
2026-01-26 14:40   ` Fabiano Rosas
2026-01-27 20:49   ` Peter Xu
2026-01-30 10:24     ` Lukas Straub
2026-02-02 14:26       ` Peter Xu
2026-02-03  9:18         ` Lukas Straub
2026-02-03 21:21           ` Peter Xu
2026-02-06 19:11             ` Lukas Straub
2026-01-28 12:32   ` Fabiano Rosas
2026-01-25 20:40 ` [PATCH v3 07/10] Convert colo main documentation to restructuredText Lukas Straub
2026-01-25 20:40 ` [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes Lukas Straub
2026-01-26 10:21   ` Zhang Chen
2026-01-26 10:56     ` Lukas Straub
2026-01-25 20:40 ` [PATCH v3 09/10] qemu-colo.rst: Add my copyright Lukas Straub
2026-01-26 10:23   ` Zhang Chen
2026-01-25 20:40 ` [PATCH v3 10/10] qemu-colo.rst: Simplify the block replication setup Lukas Straub

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.