All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test
@ 2026-02-20 19:51 Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 01/19] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
                   ` (19 more replies)
  0 siblings, 20 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub, Juan Quintela

Hello everyone,
This has some cleanups for and adds multifd support and migration unit tests
for COLO migration.

Regards,
Lukas

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
Changes in v10:
- multifd: always kick the main thread
- always open the return path socket on source
- Link to v9: https://lore.kernel.org/qemu-devel/20260218-colo_unit_test_multifd-v9-0-d8dbdb0ca6f6@web.de

Changes in v9:
- Rebase onto master
- Fix two rare bugs discovered during sresstesting the colo unit test
- Link to v8: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v8-0-7f9e5f7d082b@web.de

Changes in v8:
- Fix peter's review comments
- Link to v7: https://lore.kernel.org/qemu-devel/20260210-colo_unit_test_multifd-v7-0-23bd32f36828@web.de

Changes in v7:
- Fix peter's review comments
- Link to v6: https://lore.kernel.org/qemu-devel/20260206-colo_unit_test_multifd-v6-0-27779dda139d@web.de

Changes in v6:
- Fix the crash when running COLO with TCG accel.
- Link to v5: https://lore.kernel.org/qemu-devel/20260203-colo_unit_test_multifd-v5-0-57508b7389f6@web.de

Changes in v5:
- Remove unused inmports from multifd-colo.c
- Mention the checkpoint overhead of reset to the Q35 fix
- Link to v4: https://lore.kernel.org/qemu-devel/20260130-colo_unit_test_multifd-v4-0-7115ab6f0e77@web.de

Changes in v4:
- Add cleanup patches to remove migration_incoming_colo_enabled() and MIG_CMD_ENABLE_COLO
- Add more comments to the colo unit test
- Call colo_release_ram_cache() after multifd threads terminate
- Link to v3: https://lore.kernel.org/qemu-devel/20260125-colo_unit_test_multifd-v3-0-ae926ccd8eae@web.de

Changes in v3:
- Fix peter's review comments.
- Fix COLO with Q35 machine
- Link to v2: https://lore.kernel.org/qemu-devel/20260117-colo_unit_test_multifd-v2-0-ab521777fa51@web.de

Changes in v2:
- Fix review comments
- Hide stderr in colo migration test since the logged errors are expected
- Add benchmarking data for multifd
- Add myself as maintainer for COLO migration framework
- Link to v1: https://lore.kernel.org/qemu-devel/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de

---
Lukas Straub (19):
      MAINTAINERS: Add myself as maintainer for COLO migration framework
      MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
      colo: Setup ram cache in normal migration path
      colo: Replace migration_incoming_colo_enabled() with migrate_colo()
      colo: Remove ENABLE_COLO savevm command and mark it as deprecated
      ram: Remove colo special-casing
      multifd: Move ram state receive into multifd_ram_state_recv()
      multifd: Add COLO support
      Call colo_release_ram_cache() after multifd threads terminate
      colo: Fix crash during device vmstate load
      colo: Hold the BQL while sending ram state
      colo: Do not hold the BQL while receiving ram state.
      migration-test: Add COLO migration unit test
      Convert colo main documentation to restructuredText
      qemu-colo.rst: Miscellaneous changes
      qemu-colo.rst: Add my copyright
      qemu-colo.rst: Simplify the block replication setup
      multifd: Fix hang if send thread errors during sync
      migration: Always open s->rp_state.from_dst_file on the source

 MAINTAINERS                        |   6 +-
 docs/COLO-FT.txt                   | 334 ----------------------------------
 docs/system/index.rst              |   1 +
 docs/system/qemu-colo.rst          | 362 +++++++++++++++++++++++++++++++++++++
 include/migration/colo.h           |   3 -
 migration/colo.c                   |  50 ++---
 migration/meson.build              |   2 +-
 migration/migration.c              | 138 ++++++--------
 migration/multifd-colo.c           |  44 +++++
 migration/multifd-colo.h           |  26 +++
 migration/multifd-nocomp.c         |  10 +-
 migration/multifd.c                |  26 ++-
 migration/multifd.h                |   5 +-
 migration/ram.c                    |  12 +-
 migration/savevm.c                 |  37 +---
 migration/savevm.h                 |   1 -
 migration/trace-events             |   1 -
 tests/qtest/meson.build            |   7 +-
 tests/qtest/migration-test.c       |   1 +
 tests/qtest/migration/colo-tests.c | 198 ++++++++++++++++++++
 tests/qtest/migration/framework.h  |   5 +
 21 files changed, 770 insertions(+), 499 deletions(-)
---
base-commit: 07f97d5da04a9f97e273de85c76f5017d8135a6e
change-id: 20251230-colo_unit_test_multifd-8bf58dcebd46

Best regards,
-- 
Lukas Straub <lukasstraub2@web.de>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v10 01/19] MAINTAINERS: Add myself as maintainer for COLO migration framework
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 02/19] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

I am ready to maintain it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e0c481e2125d81298a52caf7f7b51c3e4f026d85..2458241e3cd45673d7228a3fecf2fe94f3b51b77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3866,6 +3866,7 @@ F: qapi/yank.json
 
 COLO Framework
 M: Hailiang Zhang <zhanghailiang@xfusion.com>
+M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*
 F: include/migration/colo.h

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 02/19] MAINTAINERS: Remove Hailiang Zhang from COLO migration framework
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 01/19] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 03/19] colo: Setup ram cache in normal migration path Lukas Straub
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

His last email to the mailing list is from December 2021:
https://lore.kernel.org/qemu-devel/20211214075424.6920-1-zhanghailiang@xfusion.com/

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2458241e3cd45673d7228a3fecf2fe94f3b51b77..3c4eb3bc744e23dc3d49d14b24a0d576d7bd60d6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3865,7 +3865,6 @@ F: include/qemu/yank.h
 F: qapi/yank.json
 
 COLO Framework
-M: Hailiang Zhang <zhanghailiang@xfusion.com>
 M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 03/19] colo: Setup ram cache in normal migration path
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 01/19] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 02/19] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 04/19] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Since
121ccedc2b migration: block incoming colo when capability is disabled

x-colo capability needs to be always enabled on the incoming side.
So migration_incoming_colo_enabled() and migrate_colo() are equivalent
with migrate_colo() being easier to reason about since it is always true
during the whole migration.

Use migrate_colo() to initialize the ram cache in the normal migration path.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/migration.c | 18 ++++++++++++++----
 migration/savevm.c    | 14 +-------------
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a5b0465ed30cb812cb294ab901c7a37fe6157dc6..c2b9621190f8678ac6b32d6794d9bcb6ffa5e402 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -630,10 +630,6 @@ int migration_incoming_enable_colo(Error **errp)
         return -EINVAL;
     }
 
-    if (ram_block_discard_disable(true)) {
-        error_setg(errp, "COLO: cannot disable RAM discard");
-        return -EBUSY;
-    }
     migration_colo_enabled = true;
     return 0;
 }
@@ -770,6 +766,20 @@ process_incoming_migration_co(void *opaque)
 
     assert(mis->from_src_file);
 
+    if (migrate_colo()) {
+        if (ram_block_discard_disable(true)) {
+            error_setg(&local_err, "COLO: cannot disable RAM discard");
+            goto fail;
+        }
+
+        ret = colo_init_ram_cache(&local_err);
+        if (ret) {
+            error_prepend(&local_err, "failed to init colo RAM cache: %d: ",
+                          ret);
+            goto fail;
+        }
+    }
+
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
     migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/savevm.c b/migration/savevm.c
index 3a16c467b25b8d93b7d40bd0db751158e0278b4f..b88851cdb7974314b8481646a1dd19642887f210 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2427,19 +2427,7 @@ static int loadvm_process_enable_colo(MigrationIncomingState *mis,
                                       Error **errp)
 {
     ERRP_GUARD();
-    int ret;
-
-    ret = migration_incoming_enable_colo(errp);
-    if (ret < 0) {
-        return ret;
-    }
-
-    ret = colo_init_ram_cache(errp);
-    if (ret) {
-        error_prepend(errp, "failed to init colo RAM cache: %d: ", ret);
-        migration_incoming_disable_colo();
-    }
-    return ret;
+    return migration_incoming_enable_colo(errp);
 }
 
 static int loadvm_postcopy_handle_switchover_start(Error **errp)

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 04/19] colo: Replace migration_incoming_colo_enabled() with migrate_colo()
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (2 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 03/19] colo: Setup ram cache in normal migration path Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 05/19] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Since
121ccedc2b migration: block incoming colo when capability is disabled

x-colo capability needs to be always enabled on the incoming side.
So migration_incoming_colo_enabled() and migrate_colo() are equivalent
with migrate_colo() being easier to reason about since it is always true
during the whole migration.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 include/migration/colo.h | 1 -
 migration/colo.c         | 2 +-
 migration/migration.c    | 9 ++-------
 migration/ram.c          | 2 +-
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index d4fe422e4d335d3bef4f860f56400fcd73287a0e..2496a968cc1ce709f706c0efe57e4f765f163d3c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -27,7 +27,6 @@ bool migration_in_colo_state(void);
 /* loadvm */
 int migration_incoming_enable_colo(Error **errp);
 void migration_incoming_disable_colo(void);
-bool migration_incoming_colo_enabled(void);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
diff --git a/migration/colo.c b/migration/colo.c
index f7a5bd3619a49a3b4a8306973de7a4411cc0df58..97a224c39c49ff2269f375db47112458cab0b4cb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
     QemuThread th;
 
     assert(bql_locked());
-    assert(migration_incoming_colo_enabled());
+    assert(migrate_colo());
 
     qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO,
                        colo_process_incoming_thread,
diff --git a/migration/migration.c b/migration/migration.c
index c2b9621190f8678ac6b32d6794d9bcb6ffa5e402..ef6aac53343f2217cd1aa37e493483703068d1ff 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -605,11 +605,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
 }
 
 static bool migration_colo_enabled;
-bool migration_incoming_colo_enabled(void)
-{
-    return migration_colo_enabled;
-}
-
 void migration_incoming_disable_colo(void)
 {
     ram_block_discard_disable(false);
@@ -739,7 +734,7 @@ static void process_incoming_migration_bh(void *opaque)
         } else {
             runstate_set(RUN_STATE_PAUSED);
         }
-    } else if (migration_incoming_colo_enabled()) {
+    } else if (migrate_colo()) {
         migration_incoming_disable_colo();
         vm_start();
     } else {
@@ -807,7 +802,7 @@ process_incoming_migration_co(void *opaque)
         goto fail;
     }
 
-    if (migration_incoming_colo_enabled()) {
+    if (migrate_colo()) {
         /* yield until COLO exit */
         colo_incoming_co();
     }
diff --git a/migration/ram.c b/migration/ram.c
index fc7ece2c1a10f34aa5a91f58cbe42ea418d7c078..aebf77aa0b861e00516d6f1090aebefdd0d97e54 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4370,7 +4370,7 @@ static int ram_load_precopy(QEMUFile *f)
              * speed of the migration, but it obviously reduce the downtime of
              * back-up all SVM'S memory in COLO preparing stage.
              */
-            if (migration_incoming_colo_enabled()) {
+            if (migrate_colo()) {
                 if (migration_incoming_in_colo_state()) {
                     /* In COLO stage, put all pages into cache temporarily */
                     host = colo_cache_from_block_offset(block, addr, true);

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 05/19] colo: Remove ENABLE_COLO savevm command and mark it as deprecated
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (3 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 04/19] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 06/19] ram: Remove colo special-casing Lukas Straub
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

No need for it anymore now that x-colo capability is required
on incoming side. There is also no need to send it for backwards
compatibility since we only support COLO with the same version on
both sides.

We mark the command code as deprecated and now error out if such
a unhandled command is encountered in loadvm_process_command().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 include/migration/colo.h |  2 --
 migration/migration.c    | 31 -------------------------------
 migration/savevm.c       | 25 +++++--------------------
 migration/savevm.h       |  1 -
 migration/trace-events   |  1 -
 5 files changed, 5 insertions(+), 55 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2496a968cc1ce709f706c0efe57e4f765f163d3c..8f94054a10760d0f2598f080643f45f9944cf051 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -25,8 +25,6 @@ void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
 /* loadvm */
-int migration_incoming_enable_colo(Error **errp);
-void migration_incoming_disable_colo(void);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
diff --git a/migration/migration.c b/migration/migration.c
index ef6aac53343f2217cd1aa37e493483703068d1ff..dba5d6ede579da42693d5270ede9660fb145238a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -604,31 +604,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
     return migrate_send_rp_message_req_pages(mis, rb, start);
 }
 
-static bool migration_colo_enabled;
-void migration_incoming_disable_colo(void)
-{
-    ram_block_discard_disable(false);
-    migration_colo_enabled = false;
-}
-
-int migration_incoming_enable_colo(Error **errp)
-{
-#ifndef CONFIG_REPLICATION
-    error_setg(errp, "ENABLE_COLO command come in migration stream, but the "
-               "replication module is not built in");
-    return -ENOTSUP;
-#endif
-
-    if (!migrate_colo()) {
-        error_setg(errp, "ENABLE_COLO command come in migration stream"
-                   ", but x-colo capability is not set");
-        return -EINVAL;
-    }
-
-    migration_colo_enabled = true;
-    return 0;
-}
-
 void migrate_add_address(SocketAddress *address)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
@@ -735,7 +710,6 @@ static void process_incoming_migration_bh(void *opaque)
             runstate_set(RUN_STATE_PAUSED);
         }
     } else if (migrate_colo()) {
-        migration_incoming_disable_colo();
         vm_start();
     } else {
         runstate_set(global_state_get_runstate());
@@ -3534,11 +3508,6 @@ static void *migration_thread(void *opaque)
         qemu_savevm_send_postcopy_advise(s->to_dst_file);
     }
 
-    if (migrate_colo()) {
-        /* Notify migration destination that we enable COLO */
-        qemu_savevm_send_colo_enable(s->to_dst_file);
-    }
-
     if (migrate_auto_converge()) {
         /* Start RAMBlock dirty bitmap sync timer */
         cpu_throttle_dirty_sync_timer(true);
diff --git a/migration/savevm.c b/migration/savevm.c
index b88851cdb7974314b8481646a1dd19642887f210..197c89e0e659b889409c3dc97518920ea9c1824f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -90,7 +90,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
-    MIG_CMD_ENABLE_COLO,       /* Enable COLO */
+    MIG_CMD_DEPRECATED_0,      /* Prior to 10.2, used as MIG_CMD_ENABLE_COLO */
     MIG_CMD_POSTCOPY_RESUME,   /* resume postcopy on dest */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_SWITCHOVER_START,  /* Switchover start notification */
@@ -1103,12 +1103,6 @@ static void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
-void qemu_savevm_send_colo_enable(QEMUFile *f)
-{
-    trace_savevm_send_colo_enable();
-    qemu_savevm_command_send(f, MIG_CMD_ENABLE_COLO, 0, NULL);
-}
-
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
 {
     uint32_t buf;
@@ -2423,13 +2417,6 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
     return 0;
 }
 
-static int loadvm_process_enable_colo(MigrationIncomingState *mis,
-                                      Error **errp)
-{
-    ERRP_GUARD();
-    return migration_incoming_enable_colo(errp);
-}
-
 static int loadvm_postcopy_handle_switchover_start(Error **errp)
 {
     SaveStateEntry *se;
@@ -2513,7 +2500,7 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
                 return ret;
             }
         }
-        break;
+        return 0;
 
     case MIG_CMD_PING:
         tmp32 = qemu_get_be32(f);
@@ -2524,7 +2511,7 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
             return -1;
         }
         migrate_send_rp_pong(mis, tmp32);
-        break;
+        return 0;
 
     case MIG_CMD_PACKAGED:
         return loadvm_handle_cmd_packaged(mis, errp);
@@ -2548,14 +2535,12 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len, errp);
 
-    case MIG_CMD_ENABLE_COLO:
-        return loadvm_process_enable_colo(mis, errp);
-
     case MIG_CMD_SWITCHOVER_START:
         return loadvm_postcopy_handle_switchover_start(errp);
     }
 
-    return 0;
+    error_setg(errp, "MIG_CMD 0x%x deprecated (len 0x%x)", cmd, len);
+    return -EINVAL;
 }
 
 /*
diff --git a/migration/savevm.h b/migration/savevm.h
index 2ba0881f3bd28dba10e52e16657a0b67b19bb00b..b3d1e8a13ca9a535a6990560ed2d64739ebe730e 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -65,7 +65,6 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
-void qemu_savevm_send_colo_enable(QEMUFile *f);
 int qemu_save_device_state(QEMUFile *f, Error **errp);
 int qemu_loadvm_state(QEMUFile *f, Error **errp);
 void qemu_loadvm_state_cleanup(MigrationIncomingState *mis);
diff --git a/migration/trace-events b/migration/trace-events
index 90629f828f80b51500776ae2171724369e194573..60e5087e38beccb98588fdffec7deff9a7f92c88 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -37,7 +37,6 @@ savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
-savevm_send_colo_enable(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_send_switchover_start(void) ""
 savevm_state_setup(void) ""

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 06/19] ram: Remove colo special-casing
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (4 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 05/19] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 07/19] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

We only enter colo state after the precopy migration is finished
so this if is always taken.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/ram.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index aebf77aa0b861e00516d6f1090aebefdd0d97e54..979751f61b30d6c4b878866b5011507e7c519176 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3116,12 +3116,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
     RAMBlock *block;
     int ret, max_hg_page_size;
 
-    /* migration has already setup the bitmap, reuse it. */
-    if (!migration_in_colo_state()) {
-        if (ram_init_all(rsp, errp) != 0) {
-            return -1;
-        }
+    assert(!migration_in_colo_state());
+
+    if (ram_init_all(rsp, errp) != 0) {
+        return -1;
     }
+
     (*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f;
 
     /*

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 07/19] multifd: Move ram state receive into multifd_ram_state_recv()
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (5 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 06/19] ram: Remove colo special-casing Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 08/19] multifd: Add COLO support Lukas Straub
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

This is in preparation for the next patch.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/multifd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index ad6261688fdf98a5c7f4ee9fb80ba2901201a33e..332e6fc58053462419f3171f6c320ac37648ef7b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1253,6 +1253,15 @@ static int multifd_device_state_recv(MultiFDRecvParams *p, Error **errp)
     return ret;
 }
 
+static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
+{
+    int ret;
+
+    ret = multifd_recv_state->ops->recv(p, errp);
+
+    return ret;
+}
+
 static void *multifd_recv_thread(void *opaque)
 {
     MigrationState *s = migrate_get_current();
@@ -1387,7 +1396,7 @@ static void *multifd_recv_thread(void *opaque)
                 assert(use_packets);
                 ret = multifd_device_state_recv(p, &local_err);
             } else {
-                ret = multifd_recv_state->ops->recv(p, &local_err);
+                ret = multifd_ram_state_recv(p, &local_err);
             }
             if (ret != 0) {
                 break;

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 08/19] multifd: Add COLO support
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (6 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 07/19] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 09/19] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub, Juan Quintela

Like in the normal ram_load() path, put the received pages into the
colo cache and mark the pages in the bitmap so that they will be
flushed to the guest later.

Multifd with COLO is useful to reduce the VM pause time during checkpointing
for latency sensitive workloads. In such workloads the worst-case latency
is especially important.

Also, this is already worth it for the precopy phase as it helps with
converging. Moreover, multifd migration is the preferred way to do migration
nowadays and this allows to use multifd compression with COLO.

Benchmark:
Cluster nodes
 - Intel Xenon E5-2630 v3
 - 48Gb RAM
 - 10G Ethernet
Guest
 - Windows Server 2016
 - 6Gb RAM
 - 4 cores
Workload
 - Upload a file to the guest with SMB to simulate moderate
   memory dirtying
 - Measure the memory transfer time portion of each checkpoint
 - 600ms COLO checkpoint interval

Results
Plain
 idle mean: 4.50ms 99per: 10.33ms
 load mean: 24.30ms 99per: 78.05ms
Multifd-4
 idle mean: 6.48ms 99per: 10.41ms
 load mean: 14.12ms 99per: 31.27ms

Evaluation
While multifd has slightly higher latency when the guest idles, it is
10ms faster under load and more importantly it's worst case latency is
less than 1/2 of plain under load as can be seen in the 99. Percentile.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS                |  1 +
 migration/meson.build      |  2 +-
 migration/multifd-colo.c   | 44 ++++++++++++++++++++++++++++++++++++++++++++
 migration/multifd-colo.h   | 26 ++++++++++++++++++++++++++
 migration/multifd-nocomp.c | 10 +++++++++-
 migration/multifd.c        |  8 ++++++++
 migration/multifd.h        |  5 ++++-
 7 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3c4eb3bc744e23dc3d49d14b24a0d576d7bd60d6..5519ea4e163229a9bbc06318a0ee06d88ba6a8a1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3868,6 +3868,7 @@ COLO Framework
 M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*
+F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
 F: docs/COLO-FT.txt
diff --git a/migration/meson.build b/migration/meson.build
index c7f39bdb55239ecb0e775c77b90a1aa9e6a4a9ce..c9f0f5f9f2137536497e53e960ce70654ad1b394 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -39,7 +39,7 @@ system_ss.add(files(
 ), gnutls, zlib)
 
 if get_option('replication').allowed()
-  system_ss.add(files('colo-failover.c', 'colo.c'))
+  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
 else
   system_ss.add(files('colo-stubs.c'))
 endif
diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
new file mode 100644
index 0000000000000000000000000000000000000000..f160c6543414d3e157a444d613c96df4c5f0e602
--- /dev/null
+++ b/migration/multifd-colo.c
@@ -0,0 +1,44 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo implementation
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "multifd.h"
+#include "multifd-colo.h"
+#include "migration/colo.h"
+#include "system/ramblock.h"
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p)
+{
+    /*
+     * While we're still in precopy state (not yet in colo state), we copy
+     * received pages to both guest and cache. No need to set dirty bits,
+     * since guest and cache memory are in sync.
+     */
+    if (migration_incoming_in_colo_state()) {
+        colo_record_bitmap(p->block, p->normal, p->normal_num);
+        colo_record_bitmap(p->block, p->zero, p->zero_num);
+    }
+}
+
+void multifd_colo_process_recv(MultiFDRecvParams *p)
+{
+    if (!migration_incoming_in_colo_state()) {
+        for (int i = 0; i < p->normal_num; i++) {
+            void *guest = p->block->host + p->normal[i];
+            void *cache = p->host + p->normal[i];
+            memcpy(guest, cache, multifd_ram_page_size());
+        }
+        for (int i = 0; i < p->zero_num; i++) {
+            void *guest = p->block->host + p->zero[i];
+            memset(guest, 0, multifd_ram_page_size());
+        }
+    }
+}
diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
new file mode 100644
index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
--- /dev/null
+++ b/migration/multifd-colo.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo header
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
+#define QEMU_MIGRATION_MULTIFD_COLO_H
+
+#ifdef CONFIG_REPLICATION
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p);
+void multifd_colo_process_recv(MultiFDRecvParams *p);
+
+#else
+
+static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
+static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
+
+#endif
+#endif
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -16,6 +16,7 @@
 #include "file.h"
 #include "migration-stats.h"
 #include "multifd.h"
+#include "multifd-colo.h"
 #include "options.h"
 #include "migration.h"
 #include "qapi/error.h"
@@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
         return -1;
     }
 
-    p->host = p->block->host;
     for (i = 0; i < p->normal_num; i++) {
         uint64_t offset = be64_to_cpu(packet->offset[i]);
 
@@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->zero[i] = offset;
     }
 
+    if (migrate_colo()) {
+        multifd_colo_prepare_recv(p);
+        assert(p->block->colo_cache);
+        p->host = p->block->colo_cache;
+    } else {
+        p->host = p->block->host;
+    }
+
     return 0;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -29,6 +29,7 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "multifd.h"
+#include "multifd-colo.h"
 #include "options.h"
 #include "qemu/yank.h"
 #include "io/channel-file.h"
@@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
     int ret;
 
     ret = multifd_recv_state->ops->recv(p, errp);
+    if (ret != 0) {
+        return ret;
+    }
+
+    if (migrate_colo()) {
+        multifd_colo_process_recv(p);
+    }
 
     return ret;
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -279,7 +279,10 @@ typedef struct {
     uint64_t packets_recved;
     /* ramblock */
     RAMBlock *block;
-    /* ramblock host address */
+    /*
+     * Normally, it points to ramblock's host address.  When COLO
+     * is enabled, it points to the mirror cache for the ramblock.
+     */
     uint8_t *host;
     /* buffers to recv */
     struct iovec *iov;

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 09/19] Call colo_release_ram_cache() after multifd threads terminate
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (7 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 08/19] multifd: Add COLO support Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 10/19] colo: Fix crash during device vmstate load Lukas Straub
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

The multifd threads still may access the colo cache, so release it
only after they terminate.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c      | 3 ---
 migration/migration.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 97a224c39c49ff2269f375db47112458cab0b4cb..96102c9d0fb3b60e063c81ce07b730011bc4919f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -949,7 +949,4 @@ void coroutine_fn colo_incoming_co(void)
     /* Wait checkpoint incoming thread exit before free resource */
     qemu_thread_join(&th);
     bql_lock();
-
-    /* We hold the global BQL, so it is safe here */
-    colo_release_ram_cache();
 }
diff --git a/migration/migration.c b/migration/migration.c
index dba5d6ede579da42693d5270ede9660fb145238a..f36d42ef657bdf26d78ca642d77a9b76e1c0c174 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -454,6 +454,9 @@ void migration_incoming_state_destroy(void)
      * BQL and retake unconditionally.
      */
     assert(bql_locked());
+    if (migrate_colo()) {
+        colo_release_ram_cache();
+    }
     qemu_loadvm_state_cleanup(mis);
 
     if (mis->to_src_file) {

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 10/19] colo: Fix crash during device vmstate load
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (8 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 09/19] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 11/19] colo: Hold the BQL while sending ram state Lukas Straub
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

With colo we load device vmstate during each checkpoint, on top of
a vm that was already running. Some devices expect a reset before
loading vmstate on such a previously running vm.

This fixes a crash when using COLO with Q35 machine.

The reset adds 10-20ms overhead to the checkpointing proces in my
testing.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 96102c9d0fb3b60e063c81ce07b730011bc4919f..dc7cfa81ef7db78e3ee372642de48567c5bc06eb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -729,6 +729,12 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 
     bql_lock();
     vmstate_loading = true;
+    /*
+     * With colo we load device vmstate during each checkpoint, on top of
+     * a vm that was already running. Some devices expect a reset before
+     * loading vmstate on such a previously running vm.
+     */
+    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
     colo_flush_ram_cache();
     ret = qemu_load_device_state(fb, errp);
     if (ret < 0) {

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 11/19] colo: Hold the BQL while sending ram state
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (9 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 10/19] colo: Fix crash during device vmstate load Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 12/19] colo: Do not hold the BQL while receiving " Lukas Straub
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

qemu_savevm_state_complete_precopy() requires that BQL is held.

This fixes a crash when running with TCG accel.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index dc7cfa81ef7db78e3ee372642de48567c5bc06eb..3297aa593cd9f87bf1013598464cc581a9d23531 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -455,9 +455,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 
     /* Note: device state is saved into buffer */
     ret = qemu_save_device_state(fb, &local_err);
-
-    bql_unlock();
     if (ret < 0) {
+        bql_unlock();
         goto out;
     }
 
@@ -471,6 +470,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
      */
     qemu_savevm_state_complete_precopy_iterable(s->to_dst_file, false);
     qemu_savevm_state_end(s->to_dst_file);
+    bql_unlock();
 
     /*
      * We need the size of the VMstate data in Secondary side,

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 12/19] colo: Do not hold the BQL while receiving ram state.
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (10 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 11/19] colo: Hold the BQL while sending ram state Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 13/19] migration-test: Add COLO migration unit test Lukas Straub
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

We only receive ram into the colo cache here and don't touch anything
else, so the BQL is not needed here.

Move cpu_synchronize_all_states() downwards, before we apply the received
checkpoint. It turns out that qemu_system_reset() already calls it
for us.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 3297aa593cd9f87bf1013598464cc581a9d23531..ce02c71d8857d470be434bdf3a9cacad3baab0d5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -686,11 +686,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
         return;
     }
 
-    bql_lock();
-    cpu_synchronize_all_states();
     ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
-    bql_unlock();
-
     if (ret < 0) {
         return;
     }
@@ -733,6 +729,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
      * With colo we load device vmstate during each checkpoint, on top of
      * a vm that was already running. Some devices expect a reset before
      * loading vmstate on such a previously running vm.
+     *
+     * NOTE: qemu_system_reset() calls cpu_synchronize_all_states() for us
      */
     qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
     colo_flush_ram_cache();

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 13/19] migration-test: Add COLO migration unit test
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (11 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 12/19] colo: Do not hold the BQL while receiving " Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 14/19] Convert colo main documentation to restructuredText Lukas Straub
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Add a COLO migration test for COLO migration and failover.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS                        |   1 +
 tests/qtest/meson.build            |   7 +-
 tests/qtest/migration-test.c       |   1 +
 tests/qtest/migration/colo-tests.c | 198 +++++++++++++++++++++++++++++++++++++
 tests/qtest/migration/framework.h  |   5 +
 5 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5519ea4e163229a9bbc06318a0ee06d88ba6a8a1..a587a9233ac556a420f8784c4e3217e5c5b99f34 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3871,6 +3871,7 @@ F: migration/colo*
 F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
+F: tests/qtest/migration/colo-tests.c
 F: docs/COLO-FT.txt
 
 COLO Proxy
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 25fdbc798010b19e8ec9b6ab55e02d3fb5741398..6a46e2a767de12d978d910ddb6de175bce9810b8 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -374,6 +374,11 @@ if gnutls.found()
   endif
 endif
 
+migration_colo_files = []
+if get_option('replication').allowed()
+  migration_colo_files = [files('migration/colo-tests.c')]
+endif
+
 qtests = {
   'aspeed_hace-test': files('aspeed-hace-utils.c', 'aspeed_hace-test.c'),
   'aspeed_smc-test': files('aspeed-smc-utils.c', 'aspeed_smc-test.c'),
@@ -385,7 +390,7 @@ qtests = {
                              'migration/migration-util.c') + dbus_vmstate1,
   'erst-test': files('erst-test.c'),
   'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
-  'migration-test': test_migration_files + migration_tls_files,
+  'migration-test': test_migration_files + migration_tls_files + migration_colo_files,
   'pxe-test': files('boot-sector.c'),
   'pnv-xive2-test': files('pnv-xive2-common.c', 'pnv-xive2-flush-sync.c',
                           'pnv-xive2-nvpg_bar.c'),
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 08936871741535c926eeac40a7d7c3f461c72fd0..e582f05c7dc2673dbd05a936df8feb6c964b5bbc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -55,6 +55,7 @@ int main(int argc, char **argv)
     migration_test_add_precopy(env);
     migration_test_add_cpr(env);
     migration_test_add_misc(env);
+    migration_test_add_colo(env);
 
     ret = g_test_run();
 
diff --git a/tests/qtest/migration/colo-tests.c b/tests/qtest/migration/colo-tests.c
new file mode 100644
index 0000000000000000000000000000000000000000..598a1d3821ed0a90318732702027cebad47352fd
--- /dev/null
+++ b/tests/qtest/migration/colo-tests.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * QTest testcases for COLO migration
+ *
+ * Copyright (c) 2025 Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "migration/framework.h"
+#include "migration/migration-qmp.h"
+#include "migration/migration-util.h"
+#include "qemu/module.h"
+
+static int test_colo_common(MigrateCommon *args,
+                            bool failover_during_checkpoint,
+                            bool primary_failover)
+{
+    QTestState *from, *to;
+    void *data_hook = NULL;
+
+    /*
+     * For the COLO test, both VMs will run in parallel. Thus both VMs want to
+     * open the image read/write at the same time. Using read-only=on is not
+     * possible here, because ide-hd does not support read-only backing image.
+     *
+     * So use -snapshot, where each qemu instance creates its own writable
+     * snapshot internally while leaving the real image read-only.
+     */
+    args->start.opts_source = "-snapshot";
+    args->start.opts_target = "-snapshot";
+
+    /*
+     * COLO migration code logs many errors when the migration socket
+     * is shut down, these are expected so we hide them here.
+     */
+    args->start.hide_stderr = true;
+
+    /*
+     * Test with yank with out of band capability since that is how it is
+     * used in production.
+     */
+    args->start.oob = true;
+    args->start.caps[MIGRATION_CAPABILITY_X_COLO] = true;
+
+    if (migrate_start(&from, &to, args->listen_uri, &args->start)) {
+        return -1;
+    }
+
+    migrate_set_parameter_int(from, "x-checkpoint-delay", 300);
+
+    if (args->start_hook) {
+        data_hook = args->start_hook(from, to);
+    }
+
+    migrate_ensure_converge(from);
+    wait_for_serial("src_serial");
+
+    migrate_qmp(from, to, args->connect_uri, NULL, "{}");
+
+    wait_for_migration_status(from, "colo", NULL);
+    wait_for_resume(to, get_dst());
+
+    wait_for_serial("src_serial");
+    wait_for_serial("dest_serial");
+
+    /* wait for 3 checkpoints */
+    for (int i = 0; i < 3; i++) {
+        qtest_qmp_eventwait(to, "RESUME");
+        wait_for_serial("src_serial");
+        wait_for_serial("dest_serial");
+    }
+
+    if (failover_during_checkpoint) {
+        qtest_qmp_eventwait(to, "STOP");
+    }
+    if (primary_failover) {
+        qtest_qmp_assert_success(from, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+                                            "'arguments': {'instances':"
+                                                "[{'type': 'migration'}]}}");
+        qtest_qmp_assert_success(from, "{'execute': 'x-colo-lost-heartbeat'}");
+        wait_for_serial("src_serial");
+    } else {
+        qtest_qmp_assert_success(to, "{'exec-oob': 'yank', 'id': 'yank-cmd', "
+                                        "'arguments': {'instances':"
+                                            "[{'type': 'migration'}]}}");
+        qtest_qmp_assert_success(to, "{'execute': 'x-colo-lost-heartbeat'}");
+        wait_for_serial("dest_serial");
+    }
+
+    if (args->end_hook) {
+        args->end_hook(from, to, data_hook);
+    }
+
+    migrate_end(from, to, !primary_failover);
+
+    return 0;
+}
+
+static void test_colo_plain_common(MigrateCommon *args,
+                                   bool failover_during_checkpoint,
+                                   bool primary_failover)
+{
+    args->listen_uri = "tcp:127.0.0.1:0";
+    test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void *hook_start_multifd(QTestState *from, QTestState *to)
+{
+    return migrate_hook_start_precopy_tcp_multifd_common(from, to, "none");
+}
+
+static void test_colo_multifd_common(MigrateCommon *args,
+                                     bool failover_during_checkpoint,
+                                     bool primary_failover)
+{
+    args->listen_uri = "defer";
+    args->start_hook = hook_start_multifd;
+    args->start.caps[MIGRATION_CAPABILITY_MULTIFD] = true;
+    test_colo_common(args, failover_during_checkpoint, primary_failover);
+}
+
+static void test_colo_plain_primary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_plain_common(args, false, true);
+}
+
+static void test_colo_plain_secondary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_plain_common(args, false, false);
+}
+
+static void test_colo_multifd_primary_failover(char *name, MigrateCommon *args)
+{
+    test_colo_multifd_common(args, false, true);
+}
+
+static void test_colo_multifd_secondary_failover(char *name,
+                                                 MigrateCommon *args)
+{
+    test_colo_multifd_common(args, false, false);
+}
+
+static void test_colo_plain_primary_failover_checkpoint(char *name,
+                                                        MigrateCommon *args)
+{
+    test_colo_plain_common(args, true, true);
+}
+
+static void test_colo_plain_secondary_failover_checkpoint(char *name,
+                                                          MigrateCommon *args)
+{
+    test_colo_plain_common(args, true, false);
+}
+
+static void test_colo_multifd_primary_failover_checkpoint(char *name,
+                                                          MigrateCommon *args)
+{
+    test_colo_multifd_common(args, true, true);
+}
+
+static void test_colo_multifd_secondary_failover_checkpoint(char *name,
+                                                            MigrateCommon *args)
+{
+    test_colo_multifd_common(args, true, false);
+}
+
+void migration_test_add_colo(MigrationTestEnv *env)
+{
+    if (!env->full_set) {
+        return;
+    }
+
+    migration_test_add("/migration/colo/plain/primary_failover",
+                       test_colo_plain_primary_failover);
+    migration_test_add("/migration/colo/plain/secondary_failover",
+                       test_colo_plain_secondary_failover);
+
+    migration_test_add("/migration/colo/multifd/primary_failover",
+                       test_colo_multifd_primary_failover);
+    migration_test_add("/migration/colo/multifd/secondary_failover",
+                       test_colo_multifd_secondary_failover);
+
+    migration_test_add("/migration/colo/plain/primary_failover_checkpoint",
+                       test_colo_plain_primary_failover_checkpoint);
+    migration_test_add("/migration/colo/plain/secondary_failover_checkpoint",
+                       test_colo_plain_secondary_failover_checkpoint);
+
+    migration_test_add("/migration/colo/multifd/primary_failover_checkpoint",
+                       test_colo_multifd_primary_failover_checkpoint);
+    migration_test_add("/migration/colo/multifd/secondary_failover_checkpoint",
+                       test_colo_multifd_secondary_failover_checkpoint);
+}
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index 40984d04930da2d181326d9f6a742bde49018103..80eef758932ce9c301ed6c0f6383d18756144870 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -264,5 +264,10 @@ void migration_test_add_file(MigrationTestEnv *env);
 void migration_test_add_precopy(MigrationTestEnv *env);
 void migration_test_add_cpr(MigrationTestEnv *env);
 void migration_test_add_misc(MigrationTestEnv *env);
+#ifdef CONFIG_REPLICATION
+void migration_test_add_colo(MigrationTestEnv *env);
+#else
+static inline void migration_test_add_colo(MigrationTestEnv *env) {};
+#endif
 
 #endif /* TEST_FRAMEWORK_H */

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 14/19] Convert colo main documentation to restructuredText
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (12 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 13/19] migration-test: Add COLO migration unit test Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 15/19] qemu-colo.rst: Miscellaneous changes Lukas Straub
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS               |   2 +-
 docs/COLO-FT.txt          | 334 ------------------------------------------
 docs/system/index.rst     |   1 +
 docs/system/qemu-colo.rst | 360 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+), 335 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a587a9233ac556a420f8784c4e3217e5c5b99f34..486c361ceac893175a8cb4f9ae2e8ac8202137ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3872,7 +3872,7 @@ F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
 F: tests/qtest/migration/colo-tests.c
-F: docs/COLO-FT.txt
+F: docs/system/qemu-colo.rst
 
 COLO Proxy
 M: Zhang Chen <zhangckid@gmail.com>
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
deleted file mode 100644
index 2283a09c080b8996f9767eeb415e8d4fbdc940af..0000000000000000000000000000000000000000
--- a/docs/COLO-FT.txt
+++ /dev/null
@@ -1,334 +0,0 @@
-COarse-grained LOck-stepping Virtual Machines for Non-stop Service
-----------------------------------------
-Copyright (c) 2016 Intel Corporation
-Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
-This document gives an overview of COLO's design and how to use it.
-
-== Background ==
-Virtual machine (VM) replication is a well known technique for providing
-application-agnostic software-implemented hardware fault tolerance,
-also known as "non-stop service".
-
-COLO (COarse-grained LOck-stepping) is a high availability solution.
-Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
-same request from client, and generate response in parallel too.
-If the response packets from PVM and SVM are identical, they are released
-immediately. Otherwise, a VM checkpoint (on demand) is conducted.
-
-== Architecture ==
-
-The architecture of COLO is shown in the diagram below.
-It consists of a pair of networked physical nodes:
-The primary node running the PVM, and the secondary node running the SVM
-to maintain a valid replica of the PVM.
-PVM and SVM execute in parallel and generate output of response packets for
-client requests according to the application semantics.
-
-The incoming packets from the client or external network are received by the
-primary node, and then forwarded to the secondary node, so that both the PVM
-and the SVM are stimulated with the same requests.
-
-COLO receives the outbound packets from both the PVM and SVM and compares them
-before allowing the output to be sent to clients.
-
-The SVM is qualified as a valid replica of the PVM, as long as it generates
-identical responses to all client requests. Once the differences in the outputs
-are detected between the PVM and SVM, COLO withholds transmission of the
-outbound packets until it has successfully synchronized the PVM state to the SVM.
-
-  Primary Node                                                            Secondary Node
-+------------+  +-----------------------+       +------------------------+  +------------+
-|            |  |       HeartBeat       +<----->+       HeartBeat        |  |            |
-| Primary VM |  +-----------+-----------+       +-----------+------------+  |Secondary VM|
-|            |              |                               |               |            |
-|            |  +-----------|-----------+       +-----------|------------+  |            |
-|            |  |QEMU   +---v----+      |       |QEMU  +----v---+        |  |            |
-|            |  |       |Failover|      |       |      |Failover|        |  |            |
-|            |  |       +--------+      |       |      +--------+        |  |            |
-|            |  |   +---------------+   |       |   +---------------+    |  |            |
-|            |  |   | VM Checkpoint +-------------->+ VM Checkpoint |    |  |            |
-|            |  |   +---------------+   |       |   +---------------+    |  |            |
-|Requests<--------------------------\ /-----------------\ /--------------------->Requests|
-|            |  |                   ^ ^ |       |       | |              |  |            |
-|Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
-|            |  |               | | | | |       |  | |  | |              |  |            |
-|            |  | +-----------+ | | | | |       |  | |  | | +----------+ |  |            |
-|            |  | | COLO disk | | | | | |       |  | |  | | | COLO disk| |  |            |
-|            |  | |   Manager +---------------------------->| Manager  | |  |            |
-|            |  | ++----------+ v v | | |       |  | v  v | +---------++ |  |            |
-|            |  |  |+-----------+-+-+-++|       | ++-+--+-+---------+ |  |  |            |
-|            |  |  ||   COLO Proxy     ||       | |   COLO Proxy    | |  |  |            |
-|            |  |  || (compare packet  ||       | |(adjust sequence | |  |  |            |
-|            |  |  ||and mirror packet)||       | |    and ACK)     | |  |  |            |
-|            |  |  |+------------+---+-+|       | +-----------------+ |  |  |            |
-+------------+  +-----------------------+       +------------------------+  +------------+
-+------------+     |             |   |                                |     +------------+
-| VM Monitor |     |             |   |                                |     | VM Monitor |
-+------------+     |             |   |                                |     +------------+
-+---------------------------------------+       +----------------------------------------+
-|   Kernel         |             |   |  |       |   Kernel            |                  |
-+---------------------------------------+       +----------------------------------------+
-                   |             |   |                                |
-    +--------------v+  +---------v---+--+       +------------------+ +v-------------+
-    |   Storage     |  |External Network|       | External Network | |   Storage    |
-    +---------------+  +----------------+       +------------------+ +--------------+
-
-
-== Components introduction ==
-
-You can see there are several components in COLO's diagram of architecture.
-Their functions are described below.
-
-HeartBeat:
-Runs on both the primary and secondary nodes, to periodically check platform
-availability. When the primary node suffers a hardware fail-stop failure,
-the heartbeat stops responding, the secondary node will trigger a failover
-as soon as it determines the absence.
-
-COLO disk Manager:
-When primary VM writes data into image, the colo disk manager captures this data
-and sends it to secondary VM's which makes sure the context of secondary VM's
-image is consistent with the context of primary VM 's image.
-For more details, please refer to docs/block-replication.txt.
-
-Checkpoint/Failover Controller:
-Modifications of save/restore flow to realize continuous migration,
-to make sure the state of VM in Secondary side is always consistent with VM in
-Primary side.
-
-COLO Proxy:
-Delivers packets to Primary and Secondary, and then compare the responses from
-both side. Then decide whether to start a checkpoint according to some rules.
-Please refer to docs/colo-proxy.txt for more information.
-
-Note:
-HeartBeat has not been implemented yet, so you need to trigger failover process
-by using 'x-colo-lost-heartbeat' command.
-
-== COLO operation status ==
-
-+-----------------+
-|                 |
-|    Start COLO   |
-|                 |
-+--------+--------+
-         |
-         |  Main qmp command:
-         |  migrate-set-capabilities with x-colo
-         |  migrate
-         |
-         v
-+--------+--------+
-|                 |
-|  COLO running   |
-|                 |
-+--------+--------+
-         |
-         |  Main qmp command:
-         |  x-colo-lost-heartbeat
-         |  or
-         |  some error happened
-         v
-+--------+--------+
-|                 |  send qmp event:
-|  COLO failover  |  COLO_EXIT
-|                 |
-+-----------------+
-
-COLO use the qmp command to switch and report operation status.
-The diagram just shows the main qmp command, you can get the detail
-in test procedure.
-
-== Test procedure ==
-Note: Here we are running both instances on the same host for testing,
-change the IP Addresses if you want to run it on two hosts. Initially
-127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
-
-== Startup qemu ==
-1. Primary:
-Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
-You don't need to change any IP's here, because 0.0.0.0 listens on any
-interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
-instance.
-
-# imagefolder="/mnt/vms/colo-test-primary"
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
-   -device piix3-usb-uhci -device usb-tablet -name primary \
-   -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-   -device rtl8139,id=e0,netdev=hn0 \
-   -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
-   -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
-   -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
-   -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
-   -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
-   -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
-   -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
-   -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
-   -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
-   -object iothread,id=iothread1 \
-   -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
-outdev=compare_out0,iothread=iothread1 \
-   -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
-
-2. Secondary:
-Note: Active and hidden images need to be created only once and the
-size should be the same as primary.qcow2. Again, you don't need to change
-any IP's here, except for the $primary_ip variable.
-
-# imagefolder="/mnt/vms/colo-test-secondary"
-# primary_ip=127.0.0.1
-
-# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
-
-# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
-
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
-   -device piix3-usb-uhci -device usb-tablet -name secondary \
-   -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-   -device rtl8139,id=e0,netdev=hn0 \
-   -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
-   -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
-   -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
-   -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
-   -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
-   -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
-   -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
-top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
-file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
-file.backing.backing=parent0 \
-   -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
-children.0=childs0 \
-   -incoming tcp:0.0.0.0:9998
-
-
-3. On Secondary VM's QEMU monitor, issue command
-{"execute":"qmp_capabilities"}
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
-{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
-
-Note:
-  a. The qmp command nbd-server-start and nbd-server-add must be run
-     before running the qmp command migrate on primary QEMU
-  b. Active disk, hidden disk and nbd target's length should be the
-     same.
-  c. It is better to put active disk and hidden disk in ramdisk. They
-     will be merged into the parent disk on failover.
-
-4. On Primary VM's QEMU monitor, issue command:
-{"execute":"qmp_capabilities"}
-{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
-{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
-
-  Note:
-  a. There should be only one NBD Client for each primary disk.
-  b. The qmp command line must be run after running qmp command line in
-     secondary qemu.
-
-5. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
-You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
-to change the idle checkpoint period time
-
-6. Failover test
-You can kill one of the VMs and Failover on the surviving VM:
-
-If you killed the Secondary, then follow "Primary Failover". After that,
-if you want to resume the replication, follow "Primary resume replication"
-
-If you killed the Primary, then follow "Secondary Failover". After that,
-if you want to resume the replication, follow "Secondary resume replication"
-
-== Primary Failover ==
-The Secondary died, resume on the Primary
-
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
-{"execute": "object-del", "arguments":{ "id": "comp0" } }
-{"execute": "object-del", "arguments":{ "id": "iothread1" } }
-{"execute": "object-del", "arguments":{ "id": "m0" } }
-{"execute": "object-del", "arguments":{ "id": "redire0" } }
-{"execute": "object-del", "arguments":{ "id": "redire1" } }
-{"execute": "x-colo-lost-heartbeat" }
-
-== Secondary Failover ==
-The Primary died, resume on the Secondary and prepare to become the new Primary
-
-{"execute": "nbd-server-stop"}
-{"execute": "x-colo-lost-heartbeat"}
-
-{"execute": "object-del", "arguments":{ "id": "f2" } }
-{"execute": "object-del", "arguments":{ "id": "f1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red1" } }
-{"execute": "chardev-remove", "arguments":{ "id": "red0" } }
-
-{"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
-{"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
-
-== Primary resume replication ==
-Resume replication after new Secondary is up.
-
-Start the new Secondary (Steps 2 and 3 above), then on the Primary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
-
-Note:
-If this Primary previously was a Secondary, then we need to insert the
-filters before the filter-rewriter by using the
-""insert": "before", "position": "id=rew0"" Options. See below.
-
-== Secondary resume replication ==
-Become Primary and resume replication after new Secondary is up. Note
-that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
-
-Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
-then on the old Secondary:
-{"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
-
-Wait until disk is synced, then:
-{"execute": "stop"}
-{"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
-
-{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
-{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
-
-{"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
-{"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
-{"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
-{"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
-
-{"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
-{"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
-
-== TODO ==
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 427b020483104f6589878bbf255a367ae114c61b..6268c41aea9c74dc3e59d896b5ae082360bfbb1a 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -41,3 +41,4 @@ or Hypervisor.Framework.
    igvm
    vm-templating
    sriov
+   qemu-colo
diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
new file mode 100644
index 0000000000000000000000000000000000000000..4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2
--- /dev/null
+++ b/docs/system/qemu-colo.rst
@@ -0,0 +1,360 @@
+Qemu COLO Fault Tolerance
+=========================
+
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO's design and how to use it.
+
+Background
+----------
+Virtual machine (VM) replication is a well known technique for providing
+application-agnostic software-implemented hardware fault tolerance,
+also known as "non-stop service".
+
+COLO (COarse-grained LOck-stepping) is a high availability solution.
+Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
+same request from client, and generate response in parallel too.
+If the response packets from PVM and SVM are identical, they are released
+immediately. Otherwise, a VM checkpoint (on demand) is conducted.
+
+Architecture
+------------
+The architecture of COLO is shown in the diagram below.
+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that both the PVM
+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once the differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the SVM.
+
+Overview::
+
+      Primary Node                                                            Secondary Node
+    +------------+  +-----------------------+       +------------------------+  +------------+
+    |            |  |       HeartBeat       +<----->+       HeartBeat        |  |            |
+    | Primary VM |  +-----------+-----------+       +-----------+------------+  |Secondary VM|
+    |            |              |                               |               |            |
+    |            |  +-----------|-----------+       +-----------|------------+  |            |
+    |            |  |QEMU   +---v----+      |       |QEMU  +----v---+        |  |            |
+    |            |  |       |Failover|      |       |      |Failover|        |  |            |
+    |            |  |       +--------+      |       |      +--------+        |  |            |
+    |            |  |   +---------------+   |       |   +---------------+    |  |            |
+    |            |  |   | VM Checkpoint +-------------->+ VM Checkpoint |    |  |            |
+    |            |  |   +---------------+   |       |   +---------------+    |  |            |
+    |Requests<--------------------------\ /-----------------\ /--------------------->Requests|
+    |            |  |                   ^ ^ |       |       | |              |  |            |
+    |Responses+---------------------\ /-|-|------------\ /-------------------------+Responses|
+    |            |  |               | | | | |       |  | |  | |              |  |            |
+    |            |  | +-----------+ | | | | |       |  | |  | | +----------+ |  |            |
+    |            |  | | COLO disk | | | | | |       |  | |  | | | COLO disk| |  |            |
+    |            |  | |   Manager +---------------------------->| Manager  | |  |            |
+    |            |  | ++----------+ v v | | |       |  | v  v | +---------++ |  |            |
+    |            |  |  |+-----------+-+-+-++|       | ++-+--+-+---------+ |  |  |            |
+    |            |  |  ||   COLO Proxy     ||       | |   COLO Proxy    | |  |  |            |
+    |            |  |  || (compare packet  ||       | |(adjust sequence | |  |  |            |
+    |            |  |  ||and mirror packet)||       | |    and ACK)     | |  |  |            |
+    |            |  |  |+------------+---+-+|       | +-----------------+ |  |  |            |
+    +------------+  +-----------------------+       +------------------------+  +------------+
+    +------------+     |             |   |                                |     +------------+
+    | VM Monitor |     |             |   |                                |     | VM Monitor |
+    +------------+     |             |   |                                |     +------------+
+    +---------------------------------------+       +----------------------------------------+
+    |   Kernel         |             |   |  |       |   Kernel            |                  |
+    +---------------------------------------+       +----------------------------------------+
+                       |             |   |                                |
+        +--------------v+  +---------v---+--+       +------------------+ +v-------------+
+        |   Storage     |  |External Network|       | External Network | |   Storage    |
+        +---------------+  +----------------+       +------------------+ +--------------+
+
+Components introduction
+^^^^^^^^^^^^^^^^^^^^^^^
+You can see there are several components in COLO's diagram of architecture.
+Their functions are described below.
+
+HeartBeat
+~~~~~~~~~
+Runs on both the primary and secondary nodes, to periodically check platform
+availability. When the primary node suffers a hardware fail-stop failure,
+the heartbeat stops responding, the secondary node will trigger a failover
+as soon as it determines the absence.
+
+COLO disk Manager
+~~~~~~~~~~~~~~~~~
+When primary VM writes data into image, the colo disk manager captures this data
+and sends it to secondary VM's which makes sure the context of secondary VM's
+image is consistent with the context of primary VM 's image.
+For more details, please refer to docs/block-replication.txt.
+
+Checkpoint/Failover Controller
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Modifications of save/restore flow to realize continuous migration,
+to make sure the state of VM in Secondary side is always consistent with VM in
+Primary side.
+
+COLO Proxy
+~~~~~~~~~~
+Delivers packets to Primary and Secondary, and then compare the responses from
+both side. Then decide whether to start a checkpoint according to some rules.
+Please refer to docs/colo-proxy.txt for more information.
+
+Note:
+HeartBeat has not been implemented yet, so you need to trigger failover process
+by using 'x-colo-lost-heartbeat' command.
+
+COLO operation status
+^^^^^^^^^^^^^^^^^^^^^
+
+Overview::
+
+    +-----------------+
+    |                 |
+    |    Start COLO   |
+    |                 |
+    +--------+--------+
+             |
+             |  Main qmp command:
+             |  migrate-set-capabilities with x-colo
+             |  migrate
+             |
+             v
+    +--------+--------+
+    |                 |
+    |  COLO running   |
+    |                 |
+    +--------+--------+
+             |
+             |  Main qmp command:
+             |  x-colo-lost-heartbeat
+             |  or
+             |  some error happened
+             v
+    +--------+--------+
+    |                 |  send qmp event:
+    |  COLO failover  |  COLO_EXIT
+    |                 |
+    +-----------------+
+
+
+COLO use the qmp command to switch and report operation status.
+The diagram just shows the main qmp command, you can get the detail
+in test procedure.
+
+Test procedure
+--------------
+Note: Here we are running both instances on the same host for testing,
+change the IP Addresses if you want to run it on two hosts. Initially
+``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
+
+Startup qemu
+^^^^^^^^^^^^
+**1. Primary**:
+Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+You don't need to change any IP's here, because ``0.0.0.0`` listens on any
+interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
+instance::
+
+    # imagefolder="/mnt/vms/colo-test-primary"
+
+    # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+       -device piix3-usb-uhci -device usb-tablet -name primary \
+       -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+       -device rtl8139,id=e0,netdev=hn0 \
+       -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
+       -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
+       -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
+       -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
+       -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
+       -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
+       -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
+       -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
+       -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
+       -object iothread,id=iothread1 \
+       -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
+    outdev=compare_out0,iothread=iothread1 \
+       -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+    children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
+
+
+**2. Secondary**:
+Note: Active and hidden images need to be created only once and the
+size should be the same as ``primary.qcow2``. Again, you don't need to change
+any IP's here, except for the ``$primary_ip`` variable::
+
+    # imagefolder="/mnt/vms/colo-test-secondary"
+    # primary_ip=127.0.0.1
+
+    # qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
+
+    # qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
+
+    # qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp stdio \
+       -device piix3-usb-uhci -device usb-tablet -name secondary \
+       -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+       -device rtl8139,id=e0,netdev=hn0 \
+       -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect-ms=1000 \
+       -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect-ms=1000 \
+       -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
+       -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
+       -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
+       -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
+       -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
+    top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
+    file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
+    file.backing.backing=parent0 \
+       -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+    children.0=childs0 \
+       -incoming tcp:0.0.0.0:9998
+
+
+**3.** On Secondary VM's QEMU monitor, issue command::
+
+    {"execute":"qmp_capabilities"}
+    {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": "9999"} } } }
+    {"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } }
+
+Note:
+  a. The qmp command ``nbd-server-start`` and ``nbd-server-add`` must be run
+     before running the qmp command migrate on primary QEMU
+  b. Active disk, hidden disk and nbd target's length should be the
+     same.
+  c. It is better to put active disk and hidden disk in ramdisk. They
+     will be merged into the parent disk on failover.
+
+**4.** On Primary VM's QEMU monitor, issue command::
+
+    {"execute":"qmp_capabilities"}
+    {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+  a. There should be only one NBD Client for each primary disk.
+  b. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+**5.** After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
+You can issue command ``{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }``
+to change the idle checkpoint period time
+
+Failover test
+^^^^^^^^^^^^^
+You can kill one of the VMs and Failover on the surviving VM:
+
+If you killed the Secondary, then follow "Primary Failover".
+After that, if you want to resume the replication, follow "Primary resume replication"
+
+If you killed the Primary, then follow "Secondary Failover".
+After that, if you want to resume the replication, follow "Secondary resume replication"
+
+Primary Failover
+~~~~~~~~~~~~~~~~
+The Secondary died, resume on the Primary::
+
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+    {"execute": "object-del", "arguments":{ "id": "comp0" } }
+    {"execute": "object-del", "arguments":{ "id": "iothread1" } }
+    {"execute": "object-del", "arguments":{ "id": "m0" } }
+    {"execute": "object-del", "arguments":{ "id": "redire0" } }
+    {"execute": "object-del", "arguments":{ "id": "redire1" } }
+    {"execute": "x-colo-lost-heartbeat" }
+
+Secondary Failover
+~~~~~~~~~~~~~~~~~~
+The Primary died, resume on the Secondary and prepare to become the new Primary::
+
+    {"execute": "nbd-server-stop"}
+    {"execute": "x-colo-lost-heartbeat"}
+
+    {"execute": "object-del", "arguments":{ "id": "f2" } }
+    {"execute": "object-del", "arguments":{ "id": "f1" } }
+    {"execute": "chardev-remove", "arguments":{ "id": "red1" } }
+    {"execute": "chardev-remove", "arguments":{ "id": "red0" } }
+
+    {"execute": "chardev-add", "arguments":{ "id": "mirror0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9003" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare1", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "0.0.0.0", "port": "9004" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare0-0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9001" } }, "server": false } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare_out", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": true } } } }
+    {"execute": "chardev-add", "arguments":{ "id": "compare_out0", "backend": {"type": "socket", "data": {"addr": { "type": "inet", "data": { "host": "127.0.0.1", "port": "9005" } }, "server": false } } } }
+
+Primary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Resume replication after new Secondary is up.
+
+Start the new Secondary (Steps 2 and 3 above), then on the Primary::
+
+    {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.2:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+    {"execute": "stop"}
+    {"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
+
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+    {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.2:9998" } }
+
+Note:
+If this Primary previously was a Secondary, then we need to insert the
+filters before the filter-rewriter by using the
+""insert": "before", "position": "id=rew0"" Options. See below.
+
+Secondary resume replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Become Primary and resume replication after new Secondary is up. Note
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
+
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
+then on the old Secondary::
+
+    {"execute": "drive-mirror", "arguments":{ "device": "colo-disk0", "job-id": "resync", "target": "nbd://127.0.0.1:9999/parent0", "mode": "existing", "format": "raw", "sync": "full"} }
+
+Wait until disk is synced, then::
+
+    {"execute": "stop"}
+    {"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
+
+    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire1", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "outdev": "compare0" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "iothread", "id": "iothread1" } }
+    {"execute": "object-add", "arguments":{ "qom-type": "colo-compare", "id": "comp0", "primary_in": "compare0-0", "secondary_in": "compare1", "outdev": "compare_out0", "iothread": "iothread1" } }
+
+    {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
+    {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
+
+TODO
+----
+1. Support shared storage.
+2. Develop the heartbeat part.
+3. Reduce checkpoint VM’s downtime while doing checkpoint.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 15/19] qemu-colo.rst: Miscellaneous changes
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (13 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 14/19] Convert colo main documentation to restructuredText Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 16/19] qemu-colo.rst: Add my copyright Lukas Straub
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 4b5fbbf398f8a5c4ea6baad615bde94b2b4678d2..a70e61aa09391cda933031535fa982d27cf6654b 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -1,13 +1,6 @@
 Qemu COLO Fault Tolerance
 =========================
 
-| Copyright (c) 2016 Intel Corporation
-| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-| Copyright (c) 2016 Fujitsu, Corp.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
 This document gives an overview of COLO's design and how to use it.
 
 Background
@@ -82,8 +75,8 @@ Overview::
         |   Storage     |  |External Network|       | External Network | |   Storage    |
         +---------------+  +----------------+       +------------------+ +--------------+
 
-Components introduction
-^^^^^^^^^^^^^^^^^^^^^^^
+Components
+^^^^^^^^^^
 You can see there are several components in COLO's diagram of architecture.
 Their functions are described below.
 
@@ -157,14 +150,21 @@ in test procedure.
 
 Test procedure
 --------------
-Note: Here we are running both instances on the same host for testing,
+
+Setup
+^^^^^
+
+Here we are running both instances on the same host for testing,
 change the IP Addresses if you want to run it on two hosts. Initially
 ``127.0.0.1`` is the Primary Host and ``127.0.0.2`` is the Secondary Host.
 
+COLO uses double the guest ram size on the secondary side. The Qemu version
+should be the same on both hosts.
+
 Startup qemu
 ^^^^^^^^^^^^
 **1. Primary**:
-Note: Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
+Initially, ``$imagefolder/primary.qcow2`` needs to be copied to all hosts.
 You don't need to change any IP's here, because ``0.0.0.0`` listens on any
 interface. The chardev's with ``127.0.0.1`` IP's loopback to the local qemu
 instance::
@@ -192,7 +192,7 @@ instance::
 
 
 **2. Secondary**:
-Note: Active and hidden images need to be created only once and the
+Active and hidden images need to be created only once and the
 size should be the same as ``primary.qcow2``. Again, you don't need to change
 any IP's here, except for the ``$primary_ip`` variable::
 
@@ -353,8 +353,9 @@ Wait until disk is synced, then::
     {"execute": "migrate-set-capabilities", "arguments":{ "capabilities": [ {"capability": "x-colo", "state": true } ] } }
     {"execute": "migrate", "arguments":{ "uri": "tcp:127.0.0.1:9998" } }
 
-TODO
-----
-1. Support shared storage.
-2. Develop the heartbeat part.
-3. Reduce checkpoint VM’s downtime while doing checkpoint.
+| Copyright (c) 2016 Intel Corporation
+| Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+| Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 16/19] qemu-colo.rst: Add my copyright
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (14 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 15/19] qemu-colo.rst: Miscellaneous changes Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 17/19] qemu-colo.rst: Simplify the block replication setup Lukas Straub
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

I have so far contributed 61 commits to the colo project, waranting
the addition of my copyright to this file.

Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index a70e61aa09391cda933031535fa982d27cf6654b..75abbd80298df79223cb8e70064a5dc83d70f4eb 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -356,6 +356,7 @@ Wait until disk is synced, then::
 | Copyright (c) 2016 Intel Corporation
 | Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
 | Copyright (c) 2016 Fujitsu, Corp.
+| Copyright (c) 2026 Lukas Straub <lukasstraub2@web.de>
 
 This work is licensed under the terms of the GNU GPL, version 2 or later.
 See the COPYING file in the top-level directory.

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 17/19] qemu-colo.rst: Simplify the block replication setup
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (15 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 16/19] qemu-colo.rst: Add my copyright Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-20 19:51 ` [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync Lukas Straub
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

On the primary side we don't actually need the replication
block driver, since it only passes trough all IO.
So simplify the setup and also use 'blockdev-add' instead of
'human-monitor-command'.

This is how my clients use colo in production.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 docs/system/qemu-colo.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/system/qemu-colo.rst b/docs/system/qemu-colo.rst
index 75abbd80298df79223cb8e70064a5dc83d70f4eb..f7d3b6439cf3401a58c412634239d1a43999a10e 100644
--- a/docs/system/qemu-colo.rst
+++ b/docs/system/qemu-colo.rst
@@ -240,8 +240,8 @@ Note:
 **4.** On Primary VM's QEMU monitor, issue command::
 
     {"execute":"qmp_capabilities"}
-    {"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "nbd0" } }
     {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } }
     {"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } }
 
@@ -269,7 +269,7 @@ Primary Failover
 The Secondary died, resume on the Primary::
 
     {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} }
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } }
+    {"execute": "blockdev-del", "arguments": {"node-name": "nbd0"} }
     {"execute": "object-del", "arguments":{ "id": "comp0" } }
     {"execute": "object-del", "arguments":{ "id": "iothread1" } }
     {"execute": "object-del", "arguments":{ "id": "m0" } }
@@ -309,8 +309,8 @@ Wait until disk is synced, then::
     {"execute": "stop"}
     {"execute": "block-job-cancel", "arguments":{ "device": "resync"} }
 
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.2", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
 
     {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
     {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }
@@ -341,8 +341,8 @@ Wait until disk is synced, then::
     {"execute": "stop"}
     {"execute": "block-job-cancel", "arguments":{ "device": "resync" } }
 
-    {"execute": "human-monitor-command", "arguments":{ "command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0"}}
-    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "replication0" } }
+    {"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "nbd0", "server": {"type": "inet", "host": "127.0.0.1", "port": "9999"}, "export": "parent0", "detect-zeroes": "on"} }
+    {"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "node": "nbd0" } }
 
     {"execute": "object-add", "arguments":{ "qom-type": "filter-mirror", "id": "m0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "tx", "outdev": "mirror0" } }
     {"execute": "object-add", "arguments":{ "qom-type": "filter-redirector", "id": "redire0", "insert": "before", "position": "id=rew0", "netdev": "hn0", "queue": "rx", "indev": "compare_out" } }

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (16 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 17/19] qemu-colo.rst: Simplify the block replication setup Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-26 15:25   ` Peter Xu
  2026-02-20 19:51 ` [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source Lukas Straub
  2026-02-26 10:44 ` [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  19 siblings, 1 reply; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

When a send thread encounters an error (as is the case with yank),
it sets multifd_send_state->exiting and the other threads exit too.
This races with multifd_send_sync_main() which now hangs at
qemu_sem_wait(&p->sem_sync) in multifd_send_sync_main() line 647
as it waits for threads that have exited.

Fix this by kicking the semaphores when exiting the send threads.

I encountered this hang when stress testing the colo unit test,
though I was unable to write a migration test to reliably hit this.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/multifd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 220ed8564960fdabc58e4baa069dd252c8ad293c..7762aab8e0702672d3730f27e9c9ee3b86500f0c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -772,9 +772,14 @@ out:
         assert(local_err);
         trace_multifd_send_error(p->id);
         multifd_send_error_propagate(local_err);
-        multifd_send_kick_main(p);
     }
 
+    /*
+     * Always kick the main thread: The main thread might wait on this thread
+     * while another thread encounters an error and signals this thread to exit.
+     */
+    multifd_send_kick_main(p);
+
     rcu_unregister_thread();
     trace_multifd_send_thread_end(p->id, p->packets_sent);
 

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (17 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync Lukas Straub
@ 2026-02-20 19:51 ` Lukas Straub
  2026-02-26 15:49   ` Peter Xu
  2026-02-26 10:44 ` [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
  19 siblings, 1 reply; 27+ messages in thread
From: Lukas Straub @ 2026-02-20 19:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Lukas Straub

qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
so always open the return path socket on the source. This allows
us to reuse the return path in other parts like colo. Also take
the proper locks in colo while we're at it.

This fixes a crash due to a race between migrate_cancel() and
the colo thread shutting down.

Before, the rp socket is opened just before the rp thread is started
and closed after it terminates and postcopy fast path is closed.
Now it's the same, only the rp socket stays open until migration_cleanup().

If there is a rp thread, the rp socket is shut down at the end of migration,
but the file is still open. COLO is not compatible with postcopy, so this is
safe as no one else uses the rp socket after this point.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 migration/colo.c      | 29 ++++++-------------
 migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
 2 files changed, 44 insertions(+), 62 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
      * The s->rp_state.from_dst_file and s->to_dst_file may use the
      * same fd, but we still shutdown the fd for twice, it is harmless.
      */
-    if (s->to_dst_file) {
-        qemu_file_shutdown(s->to_dst_file);
-    }
-    if (s->rp_state.from_dst_file) {
-        qemu_file_shutdown(s->rp_state.from_dst_file);
+    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+        if (s->to_dst_file) {
+            qemu_file_shutdown(s->to_dst_file);
+        }
+        if (s->rp_state.from_dst_file) {
+            qemu_file_shutdown(s->rp_state.from_dst_file);
+        }
     }
 
     old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
@@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
     Error *local_err = NULL;
     int ret;
 
+    assert(s->rp_state.from_dst_file);
     if (get_colo_mode() != COLO_MODE_PRIMARY) {
         error_report("COLO mode must be COLO_MODE_PRIMARY");
         return;
@@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
 
     failover_init_state();
 
-    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
-    if (!s->rp_state.from_dst_file) {
-        error_report("Open QEMUFile from_dst_file failed");
-        goto out;
-    }
-
     packets_compare_notifier.notify = colo_compare_notify_checkpoint;
     colo_compare_register_notifier(&packets_compare_notifier);
 
@@ -634,16 +631,6 @@ out:
     colo_compare_unregister_notifier(&packets_compare_notifier);
     timer_free(s->colo_delay_timer);
     qemu_event_destroy(&s->colo_checkpoint_event);
-
-    /*
-     * Must be called after failover BH is completed,
-     * Or the failover BH may shutdown the wrong fd that
-     * re-used by other threads after we release here.
-     */
-    if (s->rp_state.from_dst_file) {
-        qemu_fclose(s->rp_state.from_dst_file);
-        s->rp_state.from_dst_file = NULL;
-    }
 }
 
 void migrate_start_colo_process(MigrationState *s)
diff --git a/migration/migration.c b/migration/migration.c
index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 static bool migration_switchover_start(MigrationState *s, Error **errp);
-static bool close_return_path_on_source(MigrationState *s);
+static bool stop_return_path_thread_on_source(MigrationState *s);
 static void migration_completion_end(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
@@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
     cpr_state_close();
     cpr_transfer_source_destroy(s);
 
-    close_return_path_on_source(s);
+    stop_return_path_thread_on_source(s);
 
     if (s->migration_thread_running) {
         bql_unlock();
@@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
+    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+        tmp = s->rp_state.from_dst_file;
+        s->rp_state.from_dst_file = NULL;
+    }
+    if (tmp) {
+        qemu_fclose(tmp);
+    }
+
     assert(!migration_is_active());
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
     return true;
 }
 
-/*
- * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
- * existed) in a safe way.
- */
-static void migration_release_dst_files(MigrationState *ms)
-{
-    QEMUFile *file = NULL;
-
-    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-        /*
-         * Reset the from_dst_file pointer first before releasing it, as we
-         * can't block within lock section
-         */
-        file = ms->rp_state.from_dst_file;
-        ms->rp_state.from_dst_file = NULL;
-    }
-
-    /*
-     * Do the same to postcopy fast path socket too if there is.  No
-     * locking needed because this qemufile should only be managed by
-     * return path thread.
-     */
-    if (ms->postcopy_qemufile_src) {
-        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
-        qemu_file_shutdown(ms->postcopy_qemufile_src);
-        qemu_fclose(ms->postcopy_qemufile_src);
-        ms->postcopy_qemufile_src = NULL;
-    }
-
-    qemu_fclose(file);
-}
-
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -2388,9 +2364,9 @@ out:
     return NULL;
 }
 
-static void open_return_path_on_source(MigrationState *ms)
+static void start_return_path_thread_on_source(MigrationState *ms)
 {
-    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
+    assert(ms->rp_state.from_dst_file);
 
     trace_open_return_path_on_source();
 
@@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
 }
 
 /* Return true if error detected, or false otherwise */
-static bool close_return_path_on_source(MigrationState *ms)
+static bool stop_return_path_thread_on_source(MigrationState *ms)
 {
     if (!ms->rp_state.rp_thread_created) {
         return false;
@@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
 
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
-    migration_release_dst_files(ms);
+    /*
+     * Close the postcopy fast path socket if there is one.
+     * No locking needed because this qemufile should only be managed by
+     * return path thread which we just stopped.
+     */
+    if (ms->postcopy_qemufile_src) {
+        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
+        qemu_file_shutdown(ms->postcopy_qemufile_src);
+        qemu_fclose(ms->postcopy_qemufile_src);
+        ms->postcopy_qemufile_src = NULL;
+    }
     trace_migration_return_path_end_after();
 
     /* Return path will persist the error in MigrationState when quit */
@@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    if (close_return_path_on_source(s)) {
+    if (stop_return_path_thread_on_source(s)) {
         goto fail;
     }
 
@@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
          * path and just wait for the thread to finish. It will be
          * re-created when we resume.
          */
-        close_return_path_on_source(s);
+        stop_return_path_thread_on_source(s);
+        QEMUFile *rp_file;
+        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+            rp_file = s->rp_state.from_dst_file;
+            s->rp_state.from_dst_file = NULL;
+        }
+        if (rp_file) {
+            qemu_fclose(rp_file);
+        }
 
         /*
          * Current channel is possibly broken. Release it.  Note that this is
@@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
     if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
         goto fail;
     }
+    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
 
     /*
      * Open the return path. For postcopy, it is used exclusively. For
@@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_return_path()) {
-        open_return_path_on_source(s);
+        start_return_path_thread_on_source(s);
     }
 
     /*

-- 
2.39.5



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test
  2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
                   ` (18 preceding siblings ...)
  2026-02-20 19:51 ` [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source Lukas Straub
@ 2026-02-26 10:44 ` Lukas Straub
  19 siblings, 0 replies; 27+ messages in thread
From: Lukas Straub @ 2026-02-26 10:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert, Juan Quintela

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

On Fri, 20 Feb 2026 20:51:22 +0100
Lukas Straub <lukasstraub2@web.de> wrote:

> Hello everyone,
> This has some cleanups for and adds multifd support and migration unit tests
> for COLO migration.
> 
> Regards,
> Lukas
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
> Changes in v10:
> - multifd: always kick the main thread
> - always open the return path socket on source
> - Link to v9: https://lore.kernel.org/qemu-devel/20260218-colo_unit_test_multifd-v9-0-d8dbdb0ca6f6@web.de
> 

Gentle ping.

Best regards,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync
  2026-02-20 19:51 ` [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync Lukas Straub
@ 2026-02-26 15:25   ` Peter Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2026-02-26 15:25 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Feb 20, 2026 at 08:51:40PM +0100, Lukas Straub wrote:
> When a send thread encounters an error (as is the case with yank),
> it sets multifd_send_state->exiting and the other threads exit too.
> This races with multifd_send_sync_main() which now hangs at
> qemu_sem_wait(&p->sem_sync) in multifd_send_sync_main() line 647
> as it waits for threads that have exited.
> 
> Fix this by kicking the semaphores when exiting the send threads.
> 
> I encountered this hang when stress testing the colo unit test,
> though I was unable to write a migration test to reliably hit this.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-20 19:51 ` [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source Lukas Straub
@ 2026-02-26 15:49   ` Peter Xu
  2026-02-27 11:49     ` Lukas Straub
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2026-02-26 15:49 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Feb 20, 2026 at 08:51:41PM +0100, Lukas Straub wrote:
> qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
> so always open the return path socket on the source. This allows
> us to reuse the return path in other parts like colo. Also take
> the proper locks in colo while we're at it.
> 
> This fixes a crash due to a race between migrate_cancel() and
> the colo thread shutting down.
> 
> Before, the rp socket is opened just before the rp thread is started
> and closed after it terminates and postcopy fast path is closed.
> Now it's the same, only the rp socket stays open until migration_cleanup().
> 
> If there is a rp thread, the rp socket is shut down at the end of migration,
> but the file is still open. COLO is not compatible with postcopy, so this is
> safe as no one else uses the rp socket after this point.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  migration/colo.c      | 29 ++++++-------------
>  migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
>  2 files changed, 44 insertions(+), 62 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
>       * The s->rp_state.from_dst_file and s->to_dst_file may use the
>       * same fd, but we still shutdown the fd for twice, it is harmless.
>       */
> -    if (s->to_dst_file) {
> -        qemu_file_shutdown(s->to_dst_file);
> -    }
> -    if (s->rp_state.from_dst_file) {
> -        qemu_file_shutdown(s->rp_state.from_dst_file);
> +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> +        if (s->to_dst_file) {
> +            qemu_file_shutdown(s->to_dst_file);
> +        }
> +        if (s->rp_state.from_dst_file) {
> +            qemu_file_shutdown(s->rp_state.from_dst_file);
> +        }

Nit: these "start to take mutex for..." changes should belong to a separate
patch.

>      }
>  
>      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> @@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
>      Error *local_err = NULL;
>      int ret;
>  
> +    assert(s->rp_state.from_dst_file);
>      if (get_colo_mode() != COLO_MODE_PRIMARY) {
>          error_report("COLO mode must be COLO_MODE_PRIMARY");
>          return;
> @@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
>  
>      failover_init_state();
>  
> -    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> -    if (!s->rp_state.from_dst_file) {
> -        error_report("Open QEMUFile from_dst_file failed");
> -        goto out;
> -    }
> -
>      packets_compare_notifier.notify = colo_compare_notify_checkpoint;
>      colo_compare_register_notifier(&packets_compare_notifier);
>  
> @@ -634,16 +631,6 @@ out:
>      colo_compare_unregister_notifier(&packets_compare_notifier);
>      timer_free(s->colo_delay_timer);
>      qemu_event_destroy(&s->colo_checkpoint_event);
> -
> -    /*
> -     * Must be called after failover BH is completed,
> -     * Or the failover BH may shutdown the wrong fd that
> -     * re-used by other threads after we release here.
> -     */
> -    if (s->rp_state.from_dst_file) {
> -        qemu_fclose(s->rp_state.from_dst_file);
> -        s->rp_state.from_dst_file = NULL;
> -    }
>  }
>  
>  void migrate_start_colo_process(MigrationState *s)
> diff --git a/migration/migration.c b/migration/migration.c
> index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
>  
>  static bool migration_object_check(MigrationState *ms, Error **errp);
>  static bool migration_switchover_start(MigrationState *s, Error **errp);
> -static bool close_return_path_on_source(MigrationState *s);
> +static bool stop_return_path_thread_on_source(MigrationState *s);
>  static void migration_completion_end(MigrationState *s);
>  
>  static void migration_downtime_start(MigrationState *s)
> @@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
>      cpr_state_close();
>      cpr_transfer_source_destroy(s);
>  
> -    close_return_path_on_source(s);
> +    stop_return_path_thread_on_source(s);
>  
>      if (s->migration_thread_running) {
>          bql_unlock();
> @@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
>          qemu_fclose(tmp);
>      }
>  
> +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> +        tmp = s->rp_state.from_dst_file;
> +        s->rp_state.from_dst_file = NULL;
> +    }
> +    if (tmp) {
> +        qemu_fclose(tmp);
> +    }
> +
>      assert(!migration_is_active());
>  
>      if (s->state == MIGRATION_STATUS_CANCELLING) {
> @@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
>      return true;
>  }
>  
> -/*
> - * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> - * existed) in a safe way.
> - */
> -static void migration_release_dst_files(MigrationState *ms)
> -{
> -    QEMUFile *file = NULL;
> -
> -    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> -        /*
> -         * Reset the from_dst_file pointer first before releasing it, as we
> -         * can't block within lock section
> -         */
> -        file = ms->rp_state.from_dst_file;
> -        ms->rp_state.from_dst_file = NULL;
> -    }
> -
> -    /*
> -     * Do the same to postcopy fast path socket too if there is.  No
> -     * locking needed because this qemufile should only be managed by
> -     * return path thread.
> -     */
> -    if (ms->postcopy_qemufile_src) {
> -        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> -        qemu_file_shutdown(ms->postcopy_qemufile_src);
> -        qemu_fclose(ms->postcopy_qemufile_src);
> -        ms->postcopy_qemufile_src = NULL;
> -    }
> -
> -    qemu_fclose(file);
> -}
> -
>  /*
>   * Handles messages sent on the return path towards the source VM
>   *
> @@ -2388,9 +2364,9 @@ out:
>      return NULL;
>  }
>  
> -static void open_return_path_on_source(MigrationState *ms)
> +static void start_return_path_thread_on_source(MigrationState *ms)

Changing this seems OK, but I'm totally confused why you deleted
migration_release_dst_files().  That's the helper to properly close the two
possible qemufiles that rp thread uses.  IIUC you can keep it then use it
in either postcopy_pause() or migration_cleanup() directly.

Then you need to remove the chunk [1] below, making the function only do
"stop" but not close.

>  {
> -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> +    assert(ms->rp_state.from_dst_file);

I don't see why this change is a must, to open from_dst_file earlier.  Can
we keep it as before, or would you justify it in a separate patch?

>  
>      trace_open_return_path_on_source();
>  
> @@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
>  }
>  
>  /* Return true if error detected, or false otherwise */
> -static bool close_return_path_on_source(MigrationState *ms)
> +static bool stop_return_path_thread_on_source(MigrationState *ms)
>  {
>      if (!ms->rp_state.rp_thread_created) {
>          return false;
> @@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
>  
>      qemu_thread_join(&ms->rp_state.rp_thread);
>      ms->rp_state.rp_thread_created = false;
> -    migration_release_dst_files(ms);
> +    /*
> +     * Close the postcopy fast path socket if there is one.
> +     * No locking needed because this qemufile should only be managed by
> +     * return path thread which we just stopped.
> +     */
> +    if (ms->postcopy_qemufile_src) {
> +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> +        qemu_fclose(ms->postcopy_qemufile_src);
> +        ms->postcopy_qemufile_src = NULL;
> +    }

[1]

>      trace_migration_return_path_end_after();
>  
>      /* Return path will persist the error in MigrationState when quit */
> @@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
>          goto fail;
>      }
>  
> -    if (close_return_path_on_source(s)) {
> +    if (stop_return_path_thread_on_source(s)) {
>          goto fail;
>      }
>  
> @@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
>           * path and just wait for the thread to finish. It will be
>           * re-created when we resume.
>           */
> -        close_return_path_on_source(s);
> +        stop_return_path_thread_on_source(s);
> +        QEMUFile *rp_file;
> +        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> +            rp_file = s->rp_state.from_dst_file;
> +            s->rp_state.from_dst_file = NULL;
> +        }
> +        if (rp_file) {
> +            qemu_fclose(rp_file);
> +        }

Open-code this is going backwards.  Please see if we can reuse
migration_release_dst_files() at least.

Thanks,

>  
>          /*
>           * Current channel is possibly broken. Release it.  Note that this is
> @@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
>      if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
>          goto fail;
>      }
> +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
>  
>      /*
>       * Open the return path. For postcopy, it is used exclusively. For
> @@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
>       * QEMU uses the return path.
>       */
>      if (migrate_postcopy_ram() || migrate_return_path()) {
> -        open_return_path_on_source(s);
> +        start_return_path_thread_on_source(s);
>      }
>  
>      /*
> 
> -- 
> 2.39.5
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-26 15:49   ` Peter Xu
@ 2026-02-27 11:49     ` Lukas Straub
  2026-02-27 17:22       ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Lukas Straub @ 2026-02-27 11:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 11996 bytes --]

On Thu, 26 Feb 2026 10:49:07 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Feb 20, 2026 at 08:51:41PM +0100, Lukas Straub wrote:
> > qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
> > so always open the return path socket on the source. This allows
> > us to reuse the return path in other parts like colo. Also take
> > the proper locks in colo while we're at it.
> > 
> > This fixes a crash due to a race between migrate_cancel() and
> > the colo thread shutting down.
> > 
> > Before, the rp socket is opened just before the rp thread is started
> > and closed after it terminates and postcopy fast path is closed.
> > Now it's the same, only the rp socket stays open until migration_cleanup().
> > 
> > If there is a rp thread, the rp socket is shut down at the end of migration,
> > but the file is still open. COLO is not compatible with postcopy, so this is
> > safe as no one else uses the rp socket after this point.
> > 
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  migration/colo.c      | 29 ++++++-------------
> >  migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
> >  2 files changed, 44 insertions(+), 62 deletions(-)
> > 
> > diff --git a/migration/colo.c b/migration/colo.c
> > index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
> >       * The s->rp_state.from_dst_file and s->to_dst_file may use the
> >       * same fd, but we still shutdown the fd for twice, it is harmless.
> >       */
> > -    if (s->to_dst_file) {
> > -        qemu_file_shutdown(s->to_dst_file);
> > -    }
> > -    if (s->rp_state.from_dst_file) {
> > -        qemu_file_shutdown(s->rp_state.from_dst_file);
> > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > +        if (s->to_dst_file) {
> > +            qemu_file_shutdown(s->to_dst_file);
> > +        }
> > +        if (s->rp_state.from_dst_file) {
> > +            qemu_file_shutdown(s->rp_state.from_dst_file);
> > +        }  
> 
> Nit: these "start to take mutex for..." changes should belong to a separate
> patch.
> 

Will do.

> >      }
> >  
> >      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > @@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
> >      Error *local_err = NULL;
> >      int ret;
> >  
> > +    assert(s->rp_state.from_dst_file);
> >      if (get_colo_mode() != COLO_MODE_PRIMARY) {
> >          error_report("COLO mode must be COLO_MODE_PRIMARY");
> >          return;
> > @@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
> >  
> >      failover_init_state();
> >  
> > -    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > -    if (!s->rp_state.from_dst_file) {
> > -        error_report("Open QEMUFile from_dst_file failed");
> > -        goto out;
> > -    }
> > -
> >      packets_compare_notifier.notify = colo_compare_notify_checkpoint;
> >      colo_compare_register_notifier(&packets_compare_notifier);
> >  
> > @@ -634,16 +631,6 @@ out:
> >      colo_compare_unregister_notifier(&packets_compare_notifier);
> >      timer_free(s->colo_delay_timer);
> >      qemu_event_destroy(&s->colo_checkpoint_event);
> > -
> > -    /*
> > -     * Must be called after failover BH is completed,
> > -     * Or the failover BH may shutdown the wrong fd that
> > -     * re-used by other threads after we release here.
> > -     */
> > -    if (s->rp_state.from_dst_file) {
> > -        qemu_fclose(s->rp_state.from_dst_file);
> > -        s->rp_state.from_dst_file = NULL;
> > -    }
> >  }
> >  
> >  void migrate_start_colo_process(MigrationState *s)
> > diff --git a/migration/migration.c b/migration/migration.c
> > index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
> >  
> >  static bool migration_object_check(MigrationState *ms, Error **errp);
> >  static bool migration_switchover_start(MigrationState *s, Error **errp);
> > -static bool close_return_path_on_source(MigrationState *s);
> > +static bool stop_return_path_thread_on_source(MigrationState *s);
> >  static void migration_completion_end(MigrationState *s);
> >  
> >  static void migration_downtime_start(MigrationState *s)
> > @@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
> >      cpr_state_close();
> >      cpr_transfer_source_destroy(s);
> >  
> > -    close_return_path_on_source(s);
> > +    stop_return_path_thread_on_source(s);
> >  
> >      if (s->migration_thread_running) {
> >          bql_unlock();
> > @@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
> >          qemu_fclose(tmp);
> >      }
> >  
> > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > +        tmp = s->rp_state.from_dst_file;
> > +        s->rp_state.from_dst_file = NULL;
> > +    }
> > +    if (tmp) {
> > +        qemu_fclose(tmp);
> > +    }
> > +
> >      assert(!migration_is_active());
> >  
> >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > @@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
> >      return true;
> >  }
> >  
> > -/*
> > - * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> > - * existed) in a safe way.
> > - */
> > -static void migration_release_dst_files(MigrationState *ms)
> > -{
> > -    QEMUFile *file = NULL;
> > -
> > -    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> > -        /*
> > -         * Reset the from_dst_file pointer first before releasing it, as we
> > -         * can't block within lock section
> > -         */
> > -        file = ms->rp_state.from_dst_file;
> > -        ms->rp_state.from_dst_file = NULL;
> > -    }
> > -
> > -    /*
> > -     * Do the same to postcopy fast path socket too if there is.  No
> > -     * locking needed because this qemufile should only be managed by
> > -     * return path thread.
> > -     */
> > -    if (ms->postcopy_qemufile_src) {
> > -        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > -        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > -        qemu_fclose(ms->postcopy_qemufile_src);
> > -        ms->postcopy_qemufile_src = NULL;
> > -    }
> > -
> > -    qemu_fclose(file);
> > -}
> > -
> >  /*
> >   * Handles messages sent on the return path towards the source VM
> >   *
> > @@ -2388,9 +2364,9 @@ out:
> >      return NULL;
> >  }
> >  
> > -static void open_return_path_on_source(MigrationState *ms)
> > +static void start_return_path_thread_on_source(MigrationState *ms)  
> 
> Changing this seems OK, but I'm totally confused why you deleted
> migration_release_dst_files().  That's the helper to properly close the two
> possible qemufiles that rp thread uses.

Okay, so my reasoning here is:
I think that closing ms->postcopy_qemufile_src is very closely
related to cleaning up the rp thread so I moved that to
stop_return_path_thread_on_source().

Meanwhile I think s->rp_state.from_dst_file is more closely related to
s->to_dst_file and in fact I think we can remove s->rp_state.from_dst_file
entirely in the future and use s->to_dst_file for send *and* receive
just like you would with a normal socket.

So I close s->rp_state.from_dst_file right besides s->to_dst_file in
this patch. And I do the same open-coded file lock dance like
s->to_dst_file already does.

> IIUC you can keep it then use it
> in either postcopy_pause() or migration_cleanup() directly.

I can do that if you wish.
Should I keep closing ms->postcopy_qemufile_src inside
migration_release_dst_files() or stop_return_path_thread_on_source() ?

> 
> Then you need to remove the chunk [1] below, making the function only do
> "stop" but not close.
> 
> >  {
> > -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> > +    assert(ms->rp_state.from_dst_file);  
> 
> I don't see why this change is a must, to open from_dst_file earlier.  Can
> we keep it as before, or would you justify it in a separate patch?

Well, the issue is that opening the return path file is behind this if:

 if (migrate_postcopy_ram() || migrate_return_path()) {
-        open_return_path_on_source(s);
+        start_return_path_thread_on_source(s);
 }

And I want to always open the return path file, without also starting
the return path thread.

> 
> >  
> >      trace_open_return_path_on_source();
> >  
> > @@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
> >  }
> >  
> >  /* Return true if error detected, or false otherwise */
> > -static bool close_return_path_on_source(MigrationState *ms)
> > +static bool stop_return_path_thread_on_source(MigrationState *ms)
> >  {
> >      if (!ms->rp_state.rp_thread_created) {
> >          return false;
> > @@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
> >  
> >      qemu_thread_join(&ms->rp_state.rp_thread);
> >      ms->rp_state.rp_thread_created = false;
> > -    migration_release_dst_files(ms);
> > +    /*
> > +     * Close the postcopy fast path socket if there is one.
> > +     * No locking needed because this qemufile should only be managed by
> > +     * return path thread which we just stopped.
> > +     */
> > +    if (ms->postcopy_qemufile_src) {
> > +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > +        qemu_fclose(ms->postcopy_qemufile_src);
> > +        ms->postcopy_qemufile_src = NULL;
> > +    }  
> 
> [1]
> 
> >      trace_migration_return_path_end_after();
> >  
> >      /* Return path will persist the error in MigrationState when quit */
> > @@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
> >          goto fail;
> >      }
> >  
> > -    if (close_return_path_on_source(s)) {
> > +    if (stop_return_path_thread_on_source(s)) {
> >          goto fail;
> >      }
> >  
> > @@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
> >           * path and just wait for the thread to finish. It will be
> >           * re-created when we resume.
> >           */
> > -        close_return_path_on_source(s);
> > +        stop_return_path_thread_on_source(s);
> > +        QEMUFile *rp_file;
> > +        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > +            rp_file = s->rp_state.from_dst_file;
> > +            s->rp_state.from_dst_file = NULL;
> > +        }
> > +        if (rp_file) {
> > +            qemu_fclose(rp_file);
> > +        }  
> 
> Open-code this is going backwards.  Please see if we can reuse
> migration_release_dst_files() at least.
> 
> Thanks,
> 
> >  
> >          /*
> >           * Current channel is possibly broken. Release it.  Note that this is
> > @@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
> >      if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
> >          goto fail;
> >      }
> > +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> >  
> >      /*
> >       * Open the return path. For postcopy, it is used exclusively. For
> > @@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
> >       * QEMU uses the return path.
> >       */
> >      if (migrate_postcopy_ram() || migrate_return_path()) {
> > -        open_return_path_on_source(s);
> > +        start_return_path_thread_on_source(s);
> >      }
> >
> >
> >  
> >      /*
> > 
> > -- 
> > 2.39.5
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-27 11:49     ` Lukas Straub
@ 2026-02-27 17:22       ` Peter Xu
  2026-02-27 17:59         ` Lukas Straub
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2026-02-27 17:22 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Feb 27, 2026 at 12:49:03PM +0100, Lukas Straub wrote:
> On Thu, 26 Feb 2026 10:49:07 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Fri, Feb 20, 2026 at 08:51:41PM +0100, Lukas Straub wrote:
> > > qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
> > > so always open the return path socket on the source. This allows
> > > us to reuse the return path in other parts like colo. Also take
> > > the proper locks in colo while we're at it.
> > > 
> > > This fixes a crash due to a race between migrate_cancel() and
> > > the colo thread shutting down.
> > > 
> > > Before, the rp socket is opened just before the rp thread is started
> > > and closed after it terminates and postcopy fast path is closed.
> > > Now it's the same, only the rp socket stays open until migration_cleanup().
> > > 
> > > If there is a rp thread, the rp socket is shut down at the end of migration,
> > > but the file is still open. COLO is not compatible with postcopy, so this is
> > > safe as no one else uses the rp socket after this point.
> > > 
> > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > ---
> > >  migration/colo.c      | 29 ++++++-------------
> > >  migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
> > >  2 files changed, 44 insertions(+), 62 deletions(-)
> > > 
> > > diff --git a/migration/colo.c b/migration/colo.c
> > > index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
> > >       * The s->rp_state.from_dst_file and s->to_dst_file may use the
> > >       * same fd, but we still shutdown the fd for twice, it is harmless.
> > >       */
> > > -    if (s->to_dst_file) {
> > > -        qemu_file_shutdown(s->to_dst_file);
> > > -    }
> > > -    if (s->rp_state.from_dst_file) {
> > > -        qemu_file_shutdown(s->rp_state.from_dst_file);
> > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > +        if (s->to_dst_file) {
> > > +            qemu_file_shutdown(s->to_dst_file);
> > > +        }
> > > +        if (s->rp_state.from_dst_file) {
> > > +            qemu_file_shutdown(s->rp_state.from_dst_file);
> > > +        }  
> > 
> > Nit: these "start to take mutex for..." changes should belong to a separate
> > patch.
> > 
> 
> Will do.
> 
> > >      }
> > >  
> > >      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > > @@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
> > >      Error *local_err = NULL;
> > >      int ret;
> > >  
> > > +    assert(s->rp_state.from_dst_file);
> > >      if (get_colo_mode() != COLO_MODE_PRIMARY) {
> > >          error_report("COLO mode must be COLO_MODE_PRIMARY");
> > >          return;
> > > @@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
> > >  
> > >      failover_init_state();
> > >  
> > > -    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > > -    if (!s->rp_state.from_dst_file) {
> > > -        error_report("Open QEMUFile from_dst_file failed");
> > > -        goto out;
> > > -    }
> > > -
> > >      packets_compare_notifier.notify = colo_compare_notify_checkpoint;
> > >      colo_compare_register_notifier(&packets_compare_notifier);
> > >  
> > > @@ -634,16 +631,6 @@ out:
> > >      colo_compare_unregister_notifier(&packets_compare_notifier);
> > >      timer_free(s->colo_delay_timer);
> > >      qemu_event_destroy(&s->colo_checkpoint_event);
> > > -
> > > -    /*
> > > -     * Must be called after failover BH is completed,
> > > -     * Or the failover BH may shutdown the wrong fd that
> > > -     * re-used by other threads after we release here.
> > > -     */
> > > -    if (s->rp_state.from_dst_file) {
> > > -        qemu_fclose(s->rp_state.from_dst_file);
> > > -        s->rp_state.from_dst_file = NULL;
> > > -    }
> > >  }
> > >  
> > >  void migrate_start_colo_process(MigrationState *s)
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
> > >  
> > >  static bool migration_object_check(MigrationState *ms, Error **errp);
> > >  static bool migration_switchover_start(MigrationState *s, Error **errp);
> > > -static bool close_return_path_on_source(MigrationState *s);
> > > +static bool stop_return_path_thread_on_source(MigrationState *s);
> > >  static void migration_completion_end(MigrationState *s);
> > >  
> > >  static void migration_downtime_start(MigrationState *s)
> > > @@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
> > >      cpr_state_close();
> > >      cpr_transfer_source_destroy(s);
> > >  
> > > -    close_return_path_on_source(s);
> > > +    stop_return_path_thread_on_source(s);
> > >  
> > >      if (s->migration_thread_running) {
> > >          bql_unlock();
> > > @@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
> > >          qemu_fclose(tmp);
> > >      }
> > >  
> > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > +        tmp = s->rp_state.from_dst_file;
> > > +        s->rp_state.from_dst_file = NULL;
> > > +    }
> > > +    if (tmp) {
> > > +        qemu_fclose(tmp);
> > > +    }
> > > +
> > >      assert(!migration_is_active());
> > >  
> > >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > > @@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
> > >      return true;
> > >  }
> > >  
> > > -/*
> > > - * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> > > - * existed) in a safe way.
> > > - */
> > > -static void migration_release_dst_files(MigrationState *ms)
> > > -{
> > > -    QEMUFile *file = NULL;
> > > -
> > > -    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> > > -        /*
> > > -         * Reset the from_dst_file pointer first before releasing it, as we
> > > -         * can't block within lock section
> > > -         */
> > > -        file = ms->rp_state.from_dst_file;
> > > -        ms->rp_state.from_dst_file = NULL;
> > > -    }
> > > -
> > > -    /*
> > > -     * Do the same to postcopy fast path socket too if there is.  No
> > > -     * locking needed because this qemufile should only be managed by
> > > -     * return path thread.
> > > -     */
> > > -    if (ms->postcopy_qemufile_src) {
> > > -        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > -        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > -        qemu_fclose(ms->postcopy_qemufile_src);
> > > -        ms->postcopy_qemufile_src = NULL;
> > > -    }
> > > -
> > > -    qemu_fclose(file);
> > > -}
> > > -
> > >  /*
> > >   * Handles messages sent on the return path towards the source VM
> > >   *
> > > @@ -2388,9 +2364,9 @@ out:
> > >      return NULL;
> > >  }
> > >  
> > > -static void open_return_path_on_source(MigrationState *ms)
> > > +static void start_return_path_thread_on_source(MigrationState *ms)  
> > 
> > Changing this seems OK, but I'm totally confused why you deleted
> > migration_release_dst_files().  That's the helper to properly close the two
> > possible qemufiles that rp thread uses.
> 
> Okay, so my reasoning here is:
> I think that closing ms->postcopy_qemufile_src is very closely
> related to cleaning up the rp thread so I moved that to
> stop_return_path_thread_on_source().

Hmm OK, I had that feeling you wanted to move all qemufile cleanups into
the cleanup function.

Actually I think that may still be easier for you, e.g. you can at least
reuse the migration_release_dst_files().  I don't see much benefit yet on
open code the preempt qemufiles..

> 
> Meanwhile I think s->rp_state.from_dst_file is more closely related to
> s->to_dst_file and in fact I think we can remove s->rp_state.from_dst_file
> entirely in the future and use s->to_dst_file for send *and* receive
> just like you would with a normal socket.
> 
> So I close s->rp_state.from_dst_file right besides s->to_dst_file in
> this patch. And I do the same open-coded file lock dance like
> s->to_dst_file already does.

Let's not do open coded things.  That makes things worse.

If you also agree we can cleanup qemufile alawys only until migration
cleanup then we can stick with it for now.

> 
> > IIUC you can keep it then use it
> > in either postcopy_pause() or migration_cleanup() directly.
> 
> I can do that if you wish.
> Should I keep closing ms->postcopy_qemufile_src inside
> migration_release_dst_files() or stop_return_path_thread_on_source() ?

If you use migration_release_dst_files() in both (1) postcopy pause and (2)
migration cleanup, IIUC we should covered all cases.  But please double
check.

> 
> > 
> > Then you need to remove the chunk [1] below, making the function only do
> > "stop" but not close.
> > 
> > >  {
> > > -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> > > +    assert(ms->rp_state.from_dst_file);  
> > 
> > I don't see why this change is a must, to open from_dst_file earlier.  Can
> > we keep it as before, or would you justify it in a separate patch?
> 
> Well, the issue is that opening the return path file is behind this if:
> 
>  if (migrate_postcopy_ram() || migrate_return_path()) {
> -        open_return_path_on_source(s);
> +        start_return_path_thread_on_source(s);
>  }
> 
> And I want to always open the return path file, without also starting
> the return path thread.

I never notice this small difference.. then logically COLO should just
depend on return-path capability.

IIUC the simplest for your series to move on is:

  - If COLO always require return-path (I hope you know the best... e.g. do
    you enable return-path cap for your colo deployment?), we can add that
    enforcement in migrate_caps_check().  It's an ABI break only to COLO,
    but if you're OK, I don't see it an issue.

  - Just add one more condition above into:

  if (migrate_postcopy_ram() || migrate_return_path() || migrate_colo()) {
          open_return_path_on_source(s);
  }

  Instead of making rp complicated; after all many places assume rp thread
  is there when rp qemufile is there, vice versa.  Changing that needs more
  monitoring of code base, IMHO.  Better stick with it to be simple.

> 
> > 
> > >  
> > >      trace_open_return_path_on_source();
> > >  
> > > @@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
> > >  }
> > >  
> > >  /* Return true if error detected, or false otherwise */
> > > -static bool close_return_path_on_source(MigrationState *ms)
> > > +static bool stop_return_path_thread_on_source(MigrationState *ms)
> > >  {
> > >      if (!ms->rp_state.rp_thread_created) {
> > >          return false;
> > > @@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
> > >  
> > >      qemu_thread_join(&ms->rp_state.rp_thread);
> > >      ms->rp_state.rp_thread_created = false;
> > > -    migration_release_dst_files(ms);
> > > +    /*
> > > +     * Close the postcopy fast path socket if there is one.
> > > +     * No locking needed because this qemufile should only be managed by
> > > +     * return path thread which we just stopped.
> > > +     */
> > > +    if (ms->postcopy_qemufile_src) {
> > > +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > +        qemu_fclose(ms->postcopy_qemufile_src);
> > > +        ms->postcopy_qemufile_src = NULL;
> > > +    }  
> > 
> > [1]
> > 
> > >      trace_migration_return_path_end_after();
> > >  
> > >      /* Return path will persist the error in MigrationState when quit */
> > > @@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
> > >          goto fail;
> > >      }
> > >  
> > > -    if (close_return_path_on_source(s)) {
> > > +    if (stop_return_path_thread_on_source(s)) {
> > >          goto fail;
> > >      }
> > >  
> > > @@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
> > >           * path and just wait for the thread to finish. It will be
> > >           * re-created when we resume.
> > >           */
> > > -        close_return_path_on_source(s);
> > > +        stop_return_path_thread_on_source(s);
> > > +        QEMUFile *rp_file;
> > > +        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > +            rp_file = s->rp_state.from_dst_file;
> > > +            s->rp_state.from_dst_file = NULL;
> > > +        }
> > > +        if (rp_file) {
> > > +            qemu_fclose(rp_file);
> > > +        }  
> > 
> > Open-code this is going backwards.  Please see if we can reuse
> > migration_release_dst_files() at least.
> > 
> > Thanks,
> > 
> > >  
> > >          /*
> > >           * Current channel is possibly broken. Release it.  Note that this is
> > > @@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
> > >      if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
> > >          goto fail;
> > >      }
> > > +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > >  
> > >      /*
> > >       * Open the return path. For postcopy, it is used exclusively. For
> > > @@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
> > >       * QEMU uses the return path.
> > >       */
> > >      if (migrate_postcopy_ram() || migrate_return_path()) {
> > > -        open_return_path_on_source(s);
> > > +        start_return_path_thread_on_source(s);
> > >      }
> > >
> > >
> > >  
> > >      /*
> > > 
> > > -- 
> > > 2.39.5
> > >   
> > 
> 



-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-27 17:22       ` Peter Xu
@ 2026-02-27 17:59         ` Lukas Straub
  2026-02-28  0:22           ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Lukas Straub @ 2026-02-27 17:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 15776 bytes --]

On Fri, 27 Feb 2026 12:22:18 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Feb 27, 2026 at 12:49:03PM +0100, Lukas Straub wrote:
> > On Thu, 26 Feb 2026 10:49:07 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Fri, Feb 20, 2026 at 08:51:41PM +0100, Lukas Straub wrote:  
> > > > qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
> > > > so always open the return path socket on the source. This allows
> > > > us to reuse the return path in other parts like colo. Also take
> > > > the proper locks in colo while we're at it.
> > > > 
> > > > This fixes a crash due to a race between migrate_cancel() and
> > > > the colo thread shutting down.
> > > > 
> > > > Before, the rp socket is opened just before the rp thread is started
> > > > and closed after it terminates and postcopy fast path is closed.
> > > > Now it's the same, only the rp socket stays open until migration_cleanup().
> > > > 
> > > > If there is a rp thread, the rp socket is shut down at the end of migration,
> > > > but the file is still open. COLO is not compatible with postcopy, so this is
> > > > safe as no one else uses the rp socket after this point.
> > > > 
> > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > > ---
> > > >  migration/colo.c      | 29 ++++++-------------
> > > >  migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
> > > >  2 files changed, 44 insertions(+), 62 deletions(-)
> > > > 
> > > > diff --git a/migration/colo.c b/migration/colo.c
> > > > index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
> > > > --- a/migration/colo.c
> > > > +++ b/migration/colo.c
> > > > @@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
> > > >       * The s->rp_state.from_dst_file and s->to_dst_file may use the
> > > >       * same fd, but we still shutdown the fd for twice, it is harmless.
> > > >       */
> > > > -    if (s->to_dst_file) {
> > > > -        qemu_file_shutdown(s->to_dst_file);
> > > > -    }
> > > > -    if (s->rp_state.from_dst_file) {
> > > > -        qemu_file_shutdown(s->rp_state.from_dst_file);
> > > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > +        if (s->to_dst_file) {
> > > > +            qemu_file_shutdown(s->to_dst_file);
> > > > +        }
> > > > +        if (s->rp_state.from_dst_file) {
> > > > +            qemu_file_shutdown(s->rp_state.from_dst_file);
> > > > +        }    
> > > 
> > > Nit: these "start to take mutex for..." changes should belong to a separate
> > > patch.
> > >   
> > 
> > Will do.
> >   
> > > >      }
> > > >  
> > > >      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > > > @@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
> > > >      Error *local_err = NULL;
> > > >      int ret;
> > > >  
> > > > +    assert(s->rp_state.from_dst_file);
> > > >      if (get_colo_mode() != COLO_MODE_PRIMARY) {
> > > >          error_report("COLO mode must be COLO_MODE_PRIMARY");
> > > >          return;
> > > > @@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
> > > >  
> > > >      failover_init_state();
> > > >  
> > > > -    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > > > -    if (!s->rp_state.from_dst_file) {
> > > > -        error_report("Open QEMUFile from_dst_file failed");
> > > > -        goto out;
> > > > -    }
> > > > -
> > > >      packets_compare_notifier.notify = colo_compare_notify_checkpoint;
> > > >      colo_compare_register_notifier(&packets_compare_notifier);
> > > >  
> > > > @@ -634,16 +631,6 @@ out:
> > > >      colo_compare_unregister_notifier(&packets_compare_notifier);
> > > >      timer_free(s->colo_delay_timer);
> > > >      qemu_event_destroy(&s->colo_checkpoint_event);
> > > > -
> > > > -    /*
> > > > -     * Must be called after failover BH is completed,
> > > > -     * Or the failover BH may shutdown the wrong fd that
> > > > -     * re-used by other threads after we release here.
> > > > -     */
> > > > -    if (s->rp_state.from_dst_file) {
> > > > -        qemu_fclose(s->rp_state.from_dst_file);
> > > > -        s->rp_state.from_dst_file = NULL;
> > > > -    }
> > > >  }
> > > >  
> > > >  void migrate_start_colo_process(MigrationState *s)
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
> > > >  
> > > >  static bool migration_object_check(MigrationState *ms, Error **errp);
> > > >  static bool migration_switchover_start(MigrationState *s, Error **errp);
> > > > -static bool close_return_path_on_source(MigrationState *s);
> > > > +static bool stop_return_path_thread_on_source(MigrationState *s);
> > > >  static void migration_completion_end(MigrationState *s);
> > > >  
> > > >  static void migration_downtime_start(MigrationState *s)
> > > > @@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
> > > >      cpr_state_close();
> > > >      cpr_transfer_source_destroy(s);
> > > >  
> > > > -    close_return_path_on_source(s);
> > > > +    stop_return_path_thread_on_source(s);
> > > >  
> > > >      if (s->migration_thread_running) {
> > > >          bql_unlock();
> > > > @@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
> > > >          qemu_fclose(tmp);
> > > >      }
> > > >  
> > > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > +        tmp = s->rp_state.from_dst_file;
> > > > +        s->rp_state.from_dst_file = NULL;
> > > > +    }
> > > > +    if (tmp) {
> > > > +        qemu_fclose(tmp);
> > > > +    }
> > > > +
> > > >      assert(!migration_is_active());
> > > >  
> > > >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > > > @@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
> > > >      return true;
> > > >  }
> > > >  
> > > > -/*
> > > > - * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> > > > - * existed) in a safe way.
> > > > - */
> > > > -static void migration_release_dst_files(MigrationState *ms)
> > > > -{
> > > > -    QEMUFile *file = NULL;
> > > > -
> > > > -    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> > > > -        /*
> > > > -         * Reset the from_dst_file pointer first before releasing it, as we
> > > > -         * can't block within lock section
> > > > -         */
> > > > -        file = ms->rp_state.from_dst_file;
> > > > -        ms->rp_state.from_dst_file = NULL;
> > > > -    }
> > > > -
> > > > -    /*
> > > > -     * Do the same to postcopy fast path socket too if there is.  No
> > > > -     * locking needed because this qemufile should only be managed by
> > > > -     * return path thread.
> > > > -     */
> > > > -    if (ms->postcopy_qemufile_src) {
> > > > -        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > > -        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > > -        qemu_fclose(ms->postcopy_qemufile_src);
> > > > -        ms->postcopy_qemufile_src = NULL;
> > > > -    }
> > > > -
> > > > -    qemu_fclose(file);
> > > > -}
> > > > -
> > > >  /*
> > > >   * Handles messages sent on the return path towards the source VM
> > > >   *
> > > > @@ -2388,9 +2364,9 @@ out:
> > > >      return NULL;
> > > >  }
> > > >  
> > > > -static void open_return_path_on_source(MigrationState *ms)
> > > > +static void start_return_path_thread_on_source(MigrationState *ms)    
> > > 
> > > Changing this seems OK, but I'm totally confused why you deleted
> > > migration_release_dst_files().  That's the helper to properly close the two
> > > possible qemufiles that rp thread uses.  
> > 
> > Okay, so my reasoning here is:
> > I think that closing ms->postcopy_qemufile_src is very closely
> > related to cleaning up the rp thread so I moved that to
> > stop_return_path_thread_on_source().  
> 
> Hmm OK, I had that feeling you wanted to move all qemufile cleanups into
> the cleanup function.
> 
> Actually I think that may still be easier for you, e.g. you can at least
> reuse the migration_release_dst_files().  I don't see much benefit yet on
> open code the preempt qemufiles..
> 
> > 
> > Meanwhile I think s->rp_state.from_dst_file is more closely related to
> > s->to_dst_file and in fact I think we can remove s->rp_state.from_dst_file
> > entirely in the future and use s->to_dst_file for send *and* receive
> > just like you would with a normal socket.
> > 
> > So I close s->rp_state.from_dst_file right besides s->to_dst_file in
> > this patch. And I do the same open-coded file lock dance like
> > s->to_dst_file already does.  
> 
> Let's not do open coded things.  That makes things worse.
> 
> If you also agree we can cleanup qemufile alawys only until migration
> cleanup then we can stick with it for now.
> 
> >   
> > > IIUC you can keep it then use it
> > > in either postcopy_pause() or migration_cleanup() directly.  
> > 
> > I can do that if you wish.
> > Should I keep closing ms->postcopy_qemufile_src inside
> > migration_release_dst_files() or stop_return_path_thread_on_source() ?  
> 
> If you use migration_release_dst_files() in both (1) postcopy pause and (2)
> migration cleanup, IIUC we should covered all cases.  But please double
> check.

Even better idea: Add a helper that closes s->to_dst_file and s->rp_state.from_dst_file
and use that in postcopy pause and migration cleanup.

What do you think?

> >   
> > > 
> > > Then you need to remove the chunk [1] below, making the function only do
> > > "stop" but not close.
> > >   
> > > >  {
> > > > -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> > > > +    assert(ms->rp_state.from_dst_file);    
> > > 
> > > I don't see why this change is a must, to open from_dst_file earlier.  Can
> > > we keep it as before, or would you justify it in a separate patch?  
> > 
> > Well, the issue is that opening the return path file is behind this if:
> > 
> >  if (migrate_postcopy_ram() || migrate_return_path()) {
> > -        open_return_path_on_source(s);
> > +        start_return_path_thread_on_source(s);
> >  }
> > 
> > And I want to always open the return path file, without also starting
> > the return path thread.  
> 
> I never notice this small difference.. then logically COLO should just
> depend on return-path capability.
> 
> IIUC the simplest for your series to move on is:
> 
>   - If COLO always require return-path (I hope you know the best... e.g. do
>     you enable return-path cap for your colo deployment?), we can add that
>     enforcement in migrate_caps_check().  It's an ABI break only to COLO,
>     but if you're OK, I don't see it an issue.
> 
>   - Just add one more condition above into:
> 
>   if (migrate_postcopy_ram() || migrate_return_path() || migrate_colo()) {
>           open_return_path_on_source(s);
>   }
> 
>   Instead of making rp complicated; after all many places assume rp thread
>   is there when rp qemufile is there, vice versa.  Changing that needs more
>   monitoring of code base, IMHO.  Better stick with it to be simple.

That doesn't work either, because now you have the rp thread running
and to stop the thread we
qemu_file_shutdown(ms->rp_state.from_dst_file).
And we're back to square one because we can't reuse the shut
s->rp_state.from_dst_file in colo.

In fact looking at the code, qemu_file_shutdown() shuts both
ms->rp_state.from_dst_file *and* s->to_dst_file because
it calls qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH) and
f->ioc is the same io channel for both files.

So after stop_return_path_thread_on_source() our connection the the
secondary is severed. That's now workable.

> 
> >   
> > >   
> > > >  
> > > >      trace_open_return_path_on_source();
> > > >  
> > > > @@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
> > > >  }
> > > >  
> > > >  /* Return true if error detected, or false otherwise */
> > > > -static bool close_return_path_on_source(MigrationState *ms)
> > > > +static bool stop_return_path_thread_on_source(MigrationState *ms)
> > > >  {
> > > >      if (!ms->rp_state.rp_thread_created) {
> > > >          return false;
> > > > @@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
> > > >  
> > > >      qemu_thread_join(&ms->rp_state.rp_thread);
> > > >      ms->rp_state.rp_thread_created = false;
> > > > -    migration_release_dst_files(ms);
> > > > +    /*
> > > > +     * Close the postcopy fast path socket if there is one.
> > > > +     * No locking needed because this qemufile should only be managed by
> > > > +     * return path thread which we just stopped.
> > > > +     */
> > > > +    if (ms->postcopy_qemufile_src) {
> > > > +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > > +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > > +        qemu_fclose(ms->postcopy_qemufile_src);
> > > > +        ms->postcopy_qemufile_src = NULL;
> > > > +    }    
> > > 
> > > [1]
> > >   
> > > >      trace_migration_return_path_end_after();
> > > >  
> > > >      /* Return path will persist the error in MigrationState when quit */
> > > > @@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
> > > >          goto fail;
> > > >      }
> > > >  
> > > > -    if (close_return_path_on_source(s)) {
> > > > +    if (stop_return_path_thread_on_source(s)) {
> > > >          goto fail;
> > > >      }
> > > >  
> > > > @@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
> > > >           * path and just wait for the thread to finish. It will be
> > > >           * re-created when we resume.
> > > >           */
> > > > -        close_return_path_on_source(s);
> > > > +        stop_return_path_thread_on_source(s);
> > > > +        QEMUFile *rp_file;
> > > > +        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > +            rp_file = s->rp_state.from_dst_file;
> > > > +            s->rp_state.from_dst_file = NULL;
> > > > +        }
> > > > +        if (rp_file) {
> > > > +            qemu_fclose(rp_file);
> > > > +        }    
> > > 
> > > Open-code this is going backwards.  Please see if we can reuse
> > > migration_release_dst_files() at least.
> > > 
> > > Thanks,
> > >   
> > > >  
> > > >          /*
> > > >           * Current channel is possibly broken. Release it.  Note that this is
> > > > @@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
> > > >      if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
> > > >          goto fail;
> > > >      }
> > > > +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > > >  
> > > >      /*
> > > >       * Open the return path. For postcopy, it is used exclusively. For
> > > > @@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
> > > >       * QEMU uses the return path.
> > > >       */
> > > >      if (migrate_postcopy_ram() || migrate_return_path()) {
> > > > -        open_return_path_on_source(s);
> > > > +        start_return_path_thread_on_source(s);
> > > >      }
> > > >
> > > >
> > > >  
> > > >      /*
> > > > 
> > > > -- 
> > > > 2.39.5
> > > >     
> > >   
> >   
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source
  2026-02-27 17:59         ` Lukas Straub
@ 2026-02-28  0:22           ` Peter Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2026-02-28  0:22 UTC (permalink / raw)
  To: Lukas Straub
  Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Zhang Chen, Hailiang Zhang, Markus Armbruster, Li Zhijian,
	Dr. David Alan Gilbert

On Fri, Feb 27, 2026 at 06:59:14PM +0100, Lukas Straub wrote:
> On Fri, 27 Feb 2026 12:22:18 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Fri, Feb 27, 2026 at 12:49:03PM +0100, Lukas Straub wrote:
> > > On Thu, 26 Feb 2026 10:49:07 -0500
> > > Peter Xu <peterx@redhat.com> wrote:
> > >   
> > > > On Fri, Feb 20, 2026 at 08:51:41PM +0100, Lukas Straub wrote:  
> > > > > qemu_file_get_return_path() can not fail (See also commit 3f9d6e77b)
> > > > > so always open the return path socket on the source. This allows
> > > > > us to reuse the return path in other parts like colo. Also take
> > > > > the proper locks in colo while we're at it.
> > > > > 
> > > > > This fixes a crash due to a race between migrate_cancel() and
> > > > > the colo thread shutting down.
> > > > > 
> > > > > Before, the rp socket is opened just before the rp thread is started
> > > > > and closed after it terminates and postcopy fast path is closed.
> > > > > Now it's the same, only the rp socket stays open until migration_cleanup().
> > > > > 
> > > > > If there is a rp thread, the rp socket is shut down at the end of migration,
> > > > > but the file is still open. COLO is not compatible with postcopy, so this is
> > > > > safe as no one else uses the rp socket after this point.
> > > > > 
> > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > > > > ---
> > > > >  migration/colo.c      | 29 ++++++-------------
> > > > >  migration/migration.c | 77 ++++++++++++++++++++++++---------------------------
> > > > >  2 files changed, 44 insertions(+), 62 deletions(-)
> > > > > 
> > > > > diff --git a/migration/colo.c b/migration/colo.c
> > > > > index ce02c71d8857d470be434bdf3a9cacad3baab0d5..0dee33f1145b81af276cf318e2984deae9ae0527 100644
> > > > > --- a/migration/colo.c
> > > > > +++ b/migration/colo.c
> > > > > @@ -173,11 +173,13 @@ static void primary_vm_do_failover(void)
> > > > >       * The s->rp_state.from_dst_file and s->to_dst_file may use the
> > > > >       * same fd, but we still shutdown the fd for twice, it is harmless.
> > > > >       */
> > > > > -    if (s->to_dst_file) {
> > > > > -        qemu_file_shutdown(s->to_dst_file);
> > > > > -    }
> > > > > -    if (s->rp_state.from_dst_file) {
> > > > > -        qemu_file_shutdown(s->rp_state.from_dst_file);
> > > > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > > +        if (s->to_dst_file) {
> > > > > +            qemu_file_shutdown(s->to_dst_file);
> > > > > +        }
> > > > > +        if (s->rp_state.from_dst_file) {
> > > > > +            qemu_file_shutdown(s->rp_state.from_dst_file);
> > > > > +        }    
> > > > 
> > > > Nit: these "start to take mutex for..." changes should belong to a separate
> > > > patch.
> > > >   
> > > 
> > > Will do.
> > >   
> > > > >      }
> > > > >  
> > > > >      old_state = failover_set_state(FAILOVER_STATUS_ACTIVE,
> > > > > @@ -537,6 +539,7 @@ static void colo_process_checkpoint(MigrationState *s)
> > > > >      Error *local_err = NULL;
> > > > >      int ret;
> > > > >  
> > > > > +    assert(s->rp_state.from_dst_file);
> > > > >      if (get_colo_mode() != COLO_MODE_PRIMARY) {
> > > > >          error_report("COLO mode must be COLO_MODE_PRIMARY");
> > > > >          return;
> > > > > @@ -544,12 +547,6 @@ static void colo_process_checkpoint(MigrationState *s)
> > > > >  
> > > > >      failover_init_state();
> > > > >  
> > > > > -    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > > > > -    if (!s->rp_state.from_dst_file) {
> > > > > -        error_report("Open QEMUFile from_dst_file failed");
> > > > > -        goto out;
> > > > > -    }
> > > > > -
> > > > >      packets_compare_notifier.notify = colo_compare_notify_checkpoint;
> > > > >      colo_compare_register_notifier(&packets_compare_notifier);
> > > > >  
> > > > > @@ -634,16 +631,6 @@ out:
> > > > >      colo_compare_unregister_notifier(&packets_compare_notifier);
> > > > >      timer_free(s->colo_delay_timer);
> > > > >      qemu_event_destroy(&s->colo_checkpoint_event);
> > > > > -
> > > > > -    /*
> > > > > -     * Must be called after failover BH is completed,
> > > > > -     * Or the failover BH may shutdown the wrong fd that
> > > > > -     * re-used by other threads after we release here.
> > > > > -     */
> > > > > -    if (s->rp_state.from_dst_file) {
> > > > > -        qemu_fclose(s->rp_state.from_dst_file);
> > > > > -        s->rp_state.from_dst_file = NULL;
> > > > > -    }
> > > > >  }
> > > > >  
> > > > >  void migrate_start_colo_process(MigrationState *s)
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index f36d42ef657bdf26d78ca642d77a9b76e1c0c174..8caa56940beef12de33a799695cf486c8fbd471c 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -97,7 +97,7 @@ static GSList *migration_blockers[MIG_MODE__MAX];
> > > > >  
> > > > >  static bool migration_object_check(MigrationState *ms, Error **errp);
> > > > >  static bool migration_switchover_start(MigrationState *s, Error **errp);
> > > > > -static bool close_return_path_on_source(MigrationState *s);
> > > > > +static bool stop_return_path_thread_on_source(MigrationState *s);
> > > > >  static void migration_completion_end(MigrationState *s);
> > > > >  
> > > > >  static void migration_downtime_start(MigrationState *s)
> > > > > @@ -1278,7 +1278,7 @@ static void migration_cleanup(MigrationState *s)
> > > > >      cpr_state_close();
> > > > >      cpr_transfer_source_destroy(s);
> > > > >  
> > > > > -    close_return_path_on_source(s);
> > > > > +    stop_return_path_thread_on_source(s);
> > > > >  
> > > > >      if (s->migration_thread_running) {
> > > > >          bql_unlock();
> > > > > @@ -1307,6 +1307,14 @@ static void migration_cleanup(MigrationState *s)
> > > > >          qemu_fclose(tmp);
> > > > >      }
> > > > >  
> > > > > +    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > > +        tmp = s->rp_state.from_dst_file;
> > > > > +        s->rp_state.from_dst_file = NULL;
> > > > > +    }
> > > > > +    if (tmp) {
> > > > > +        qemu_fclose(tmp);
> > > > > +    }
> > > > > +
> > > > >      assert(!migration_is_active());
> > > > >  
> > > > >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > > > > @@ -2187,38 +2195,6 @@ static bool migrate_handle_rp_resume_ack(MigrationState *s,
> > > > >      return true;
> > > > >  }
> > > > >  
> > > > > -/*
> > > > > - * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> > > > > - * existed) in a safe way.
> > > > > - */
> > > > > -static void migration_release_dst_files(MigrationState *ms)
> > > > > -{
> > > > > -    QEMUFile *file = NULL;
> > > > > -
> > > > > -    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> > > > > -        /*
> > > > > -         * Reset the from_dst_file pointer first before releasing it, as we
> > > > > -         * can't block within lock section
> > > > > -         */
> > > > > -        file = ms->rp_state.from_dst_file;
> > > > > -        ms->rp_state.from_dst_file = NULL;
> > > > > -    }
> > > > > -
> > > > > -    /*
> > > > > -     * Do the same to postcopy fast path socket too if there is.  No
> > > > > -     * locking needed because this qemufile should only be managed by
> > > > > -     * return path thread.
> > > > > -     */
> > > > > -    if (ms->postcopy_qemufile_src) {
> > > > > -        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > > > -        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > > > -        qemu_fclose(ms->postcopy_qemufile_src);
> > > > > -        ms->postcopy_qemufile_src = NULL;
> > > > > -    }
> > > > > -
> > > > > -    qemu_fclose(file);
> > > > > -}
> > > > > -
> > > > >  /*
> > > > >   * Handles messages sent on the return path towards the source VM
> > > > >   *
> > > > > @@ -2388,9 +2364,9 @@ out:
> > > > >      return NULL;
> > > > >  }
> > > > >  
> > > > > -static void open_return_path_on_source(MigrationState *ms)
> > > > > +static void start_return_path_thread_on_source(MigrationState *ms)    
> > > > 
> > > > Changing this seems OK, but I'm totally confused why you deleted
> > > > migration_release_dst_files().  That's the helper to properly close the two
> > > > possible qemufiles that rp thread uses.  
> > > 
> > > Okay, so my reasoning here is:
> > > I think that closing ms->postcopy_qemufile_src is very closely
> > > related to cleaning up the rp thread so I moved that to
> > > stop_return_path_thread_on_source().  
> > 
> > Hmm OK, I had that feeling you wanted to move all qemufile cleanups into
> > the cleanup function.
> > 
> > Actually I think that may still be easier for you, e.g. you can at least
> > reuse the migration_release_dst_files().  I don't see much benefit yet on
> > open code the preempt qemufiles..
> > 
> > > 
> > > Meanwhile I think s->rp_state.from_dst_file is more closely related to
> > > s->to_dst_file and in fact I think we can remove s->rp_state.from_dst_file
> > > entirely in the future and use s->to_dst_file for send *and* receive
> > > just like you would with a normal socket.
> > > 
> > > So I close s->rp_state.from_dst_file right besides s->to_dst_file in
> > > this patch. And I do the same open-coded file lock dance like
> > > s->to_dst_file already does.  
> > 
> > Let's not do open coded things.  That makes things worse.
> > 
> > If you also agree we can cleanup qemufile alawys only until migration
> > cleanup then we can stick with it for now.
> > 
> > >   
> > > > IIUC you can keep it then use it
> > > > in either postcopy_pause() or migration_cleanup() directly.  
> > > 
> > > I can do that if you wish.
> > > Should I keep closing ms->postcopy_qemufile_src inside
> > > migration_release_dst_files() or stop_return_path_thread_on_source() ?  
> > 
> > If you use migration_release_dst_files() in both (1) postcopy pause and (2)
> > migration cleanup, IIUC we should covered all cases.  But please double
> > check.
> 
> Even better idea: Add a helper that closes s->to_dst_file and s->rp_state.from_dst_file
> and use that in postcopy pause and migration cleanup.
> 
> What do you think?

If you mean having the helper close all three possible qemufiles, then it
sounds OK to me.  As long as your new code will still properly manage the
preempt channel and not open-code it I'm OK.

> 
> > >   
> > > > 
> > > > Then you need to remove the chunk [1] below, making the function only do
> > > > "stop" but not close.
> > > >   
> > > > >  {
> > > > > -    ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> > > > > +    assert(ms->rp_state.from_dst_file);    
> > > > 
> > > > I don't see why this change is a must, to open from_dst_file earlier.  Can
> > > > we keep it as before, or would you justify it in a separate patch?  
> > > 
> > > Well, the issue is that opening the return path file is behind this if:
> > > 
> > >  if (migrate_postcopy_ram() || migrate_return_path()) {
> > > -        open_return_path_on_source(s);
> > > +        start_return_path_thread_on_source(s);
> > >  }
> > > 
> > > And I want to always open the return path file, without also starting
> > > the return path thread.  
> > 
> > I never notice this small difference.. then logically COLO should just
> > depend on return-path capability.
> > 
> > IIUC the simplest for your series to move on is:
> > 
> >   - If COLO always require return-path (I hope you know the best... e.g. do
> >     you enable return-path cap for your colo deployment?), we can add that
> >     enforcement in migrate_caps_check().  It's an ABI break only to COLO,
> >     but if you're OK, I don't see it an issue.
> > 
> >   - Just add one more condition above into:
> > 
> >   if (migrate_postcopy_ram() || migrate_return_path() || migrate_colo()) {
> >           open_return_path_on_source(s);
> >   }
> > 
> >   Instead of making rp complicated; after all many places assume rp thread
> >   is there when rp qemufile is there, vice versa.  Changing that needs more
> >   monitoring of code base, IMHO.  Better stick with it to be simple.
> 
> That doesn't work either, because now you have the rp thread running
> and to stop the thread we
> qemu_file_shutdown(ms->rp_state.from_dst_file).

Do you mean this line?

    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
        if (migrate_has_error(ms) && ms->rp_state.from_dst_file) {
            qemu_file_shutdown(ms->rp_state.from_dst_file);
        }
    }

We don't shut it down if the migration succeeded, but rely on RP_SHUT
message.  I think it should still be true for COLO.  If something is wrong,
migration will fail and it won't switch to COLO state.  Otherwise it won't
shut so IIUC COLO can reuse it.

> And we're back to square one because we can't reuse the shut
> s->rp_state.from_dst_file in colo.
> 
> In fact looking at the code, qemu_file_shutdown() shuts both
> ms->rp_state.from_dst_file *and* s->to_dst_file because
> it calls qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH) and
> f->ioc is the same io channel for both files.
> 
> So after stop_return_path_thread_on_source() our connection the the
> secondary is severed. That's now workable.
> 
> > 
> > >   
> > > >   
> > > > >  
> > > > >      trace_open_return_path_on_source();
> > > > >  
> > > > > @@ -2402,7 +2378,7 @@ static void open_return_path_on_source(MigrationState *ms)
> > > > >  }
> > > > >  
> > > > >  /* Return true if error detected, or false otherwise */
> > > > > -static bool close_return_path_on_source(MigrationState *ms)
> > > > > +static bool stop_return_path_thread_on_source(MigrationState *ms)
> > > > >  {
> > > > >      if (!ms->rp_state.rp_thread_created) {
> > > > >          return false;
> > > > > @@ -2424,7 +2400,17 @@ static bool close_return_path_on_source(MigrationState *ms)
> > > > >  
> > > > >      qemu_thread_join(&ms->rp_state.rp_thread);
> > > > >      ms->rp_state.rp_thread_created = false;
> > > > > -    migration_release_dst_files(ms);
> > > > > +    /*
> > > > > +     * Close the postcopy fast path socket if there is one.
> > > > > +     * No locking needed because this qemufile should only be managed by
> > > > > +     * return path thread which we just stopped.
> > > > > +     */
> > > > > +    if (ms->postcopy_qemufile_src) {
> > > > > +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> > > > > +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> > > > > +        qemu_fclose(ms->postcopy_qemufile_src);
> > > > > +        ms->postcopy_qemufile_src = NULL;
> > > > > +    }    
> > > > 
> > > > [1]
> > > >   
> > > > >      trace_migration_return_path_end_after();
> > > > >  
> > > > >      /* Return path will persist the error in MigrationState when quit */
> > > > > @@ -2787,7 +2773,7 @@ static void migration_completion(MigrationState *s)
> > > > >          goto fail;
> > > > >      }
> > > > >  
> > > > > -    if (close_return_path_on_source(s)) {
> > > > > +    if (stop_return_path_thread_on_source(s)) {
> > > > >          goto fail;
> > > > >      }
> > > > >  
> > > > > @@ -2941,7 +2927,15 @@ static MigThrError postcopy_pause(MigrationState *s)
> > > > >           * path and just wait for the thread to finish. It will be
> > > > >           * re-created when we resume.
> > > > >           */
> > > > > -        close_return_path_on_source(s);
> > > > > +        stop_return_path_thread_on_source(s);
> > > > > +        QEMUFile *rp_file;
> > > > > +        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> > > > > +            rp_file = s->rp_state.from_dst_file;
> > > > > +            s->rp_state.from_dst_file = NULL;
> > > > > +        }
> > > > > +        if (rp_file) {
> > > > > +            qemu_fclose(rp_file);
> > > > > +        }    
> > > > 
> > > > Open-code this is going backwards.  Please see if we can reuse
> > > > migration_release_dst_files() at least.
> > > > 
> > > > Thanks,
> > > >   
> > > > >  
> > > > >          /*
> > > > >           * Current channel is possibly broken. Release it.  Note that this is
> > > > > @@ -3758,6 +3752,7 @@ void migration_start_outgoing(MigrationState *s)
> > > > >      if (!qemu_file_set_blocking(s->to_dst_file, true, &local_err)) {
> > > > >          goto fail;
> > > > >      }
> > > > > +    s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
> > > > >  
> > > > >      /*
> > > > >       * Open the return path. For postcopy, it is used exclusively. For
> > > > > @@ -3765,7 +3760,7 @@ void migration_start_outgoing(MigrationState *s)
> > > > >       * QEMU uses the return path.
> > > > >       */
> > > > >      if (migrate_postcopy_ram() || migrate_return_path()) {
> > > > > -        open_return_path_on_source(s);
> > > > > +        start_return_path_thread_on_source(s);
> > > > >      }
> > > > >
> > > > >
> > > > >  
> > > > >      /*
> > > > > 
> > > > > -- 
> > > > > 2.39.5
> > > > >     
> > > >   
> > >   
> > 
> > 
> > 
> 



-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2026-02-28  0:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-20 19:51 [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-02-20 19:51 ` [PATCH v10 01/19] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-02-20 19:51 ` [PATCH v10 02/19] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
2026-02-20 19:51 ` [PATCH v10 03/19] colo: Setup ram cache in normal migration path Lukas Straub
2026-02-20 19:51 ` [PATCH v10 04/19] colo: Replace migration_incoming_colo_enabled() with migrate_colo() Lukas Straub
2026-02-20 19:51 ` [PATCH v10 05/19] colo: Remove ENABLE_COLO savevm command and mark it as deprecated Lukas Straub
2026-02-20 19:51 ` [PATCH v10 06/19] ram: Remove colo special-casing Lukas Straub
2026-02-20 19:51 ` [PATCH v10 07/19] multifd: Move ram state receive into multifd_ram_state_recv() Lukas Straub
2026-02-20 19:51 ` [PATCH v10 08/19] multifd: Add COLO support Lukas Straub
2026-02-20 19:51 ` [PATCH v10 09/19] Call colo_release_ram_cache() after multifd threads terminate Lukas Straub
2026-02-20 19:51 ` [PATCH v10 10/19] colo: Fix crash during device vmstate load Lukas Straub
2026-02-20 19:51 ` [PATCH v10 11/19] colo: Hold the BQL while sending ram state Lukas Straub
2026-02-20 19:51 ` [PATCH v10 12/19] colo: Do not hold the BQL while receiving " Lukas Straub
2026-02-20 19:51 ` [PATCH v10 13/19] migration-test: Add COLO migration unit test Lukas Straub
2026-02-20 19:51 ` [PATCH v10 14/19] Convert colo main documentation to restructuredText Lukas Straub
2026-02-20 19:51 ` [PATCH v10 15/19] qemu-colo.rst: Miscellaneous changes Lukas Straub
2026-02-20 19:51 ` [PATCH v10 16/19] qemu-colo.rst: Add my copyright Lukas Straub
2026-02-20 19:51 ` [PATCH v10 17/19] qemu-colo.rst: Simplify the block replication setup Lukas Straub
2026-02-20 19:51 ` [PATCH v10 18/19] multifd: Fix hang if send thread errors during sync Lukas Straub
2026-02-26 15:25   ` Peter Xu
2026-02-20 19:51 ` [PATCH v10 19/19] migration: Always open s->rp_state.from_dst_file on the source Lukas Straub
2026-02-26 15:49   ` Peter Xu
2026-02-27 11:49     ` Lukas Straub
2026-02-27 17:22       ` Peter Xu
2026-02-27 17:59         ` Lukas Straub
2026-02-28  0:22           ` Peter Xu
2026-02-26 10:44 ` [PATCH v10 00/19] migration: Add COLO multifd support and COLO migration unit test Lukas Straub

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.