* [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup
@ 2025-02-21  6:36 Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

- Fix the broken RDMA migration
- Disable RDMA + postcopy
- Some cleanups
- Finally, add a qtest for RDMA

Changes since V1 [0]:
Add some separate patches to refactor and clean up based on V1

[0] https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhijian@fujitsu.com/

Li Zhijian (8):
  migration: Prioritize RDMA in ram_save_target_page()
  migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check
  migration: Kill RAM_SAVE_CONTROL_NOT_SUPP
  migration: Integrate control_save_page() logic into
    ram_save_target_page()
  migration: Add migration_capabilities_and_transport_compatible()
    helper
  migraion: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Add qtest for migration over RDMA

 MAINTAINERS                           |  1 +
 migration/migration.c                 | 40 ++++++++++++-----
 migration/ram.c                       | 41 +++++------------
 migration/rdma.c                      | 12 +++--
 migration/rdma.h                      |  3 +-
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 7 files changed, 153 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0




* [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page()
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-24 19:55   ` Peter Xu
  2025-02-21  6:36 ` [PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check Li Zhijian via
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors:

destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implements its own protocol to send pages to the
destination side, so hand the page over to RDMA first to prevent it from
being saved by another protocol.
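
The ordering can be pictured with a small toy model. This is only a
sketch and not QEMU code: the struct, the enum and model_save_target_page()
are hypothetical names, and the flags stand in for migrate_rdma(),
migrate_multifd() and the zero-page detection mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: which path ends up consuming a page. */
enum save_path { PATH_RDMA, PATH_ZERO_PAGE, PATH_MULTIFD, PATH_PLAIN };

struct model_caps {
    bool rdma;                  /* stands in for migrate_rdma() */
    bool multifd;               /* stands in for migrate_multifd() */
    bool page_is_zero;          /* save_zero_page() would take the page */
    bool legacy_zero_detection; /* ZERO_PAGE_DETECTION_LEGACY selected */
};

/* RDMA is consulted before any other path, as in the patched ordering. */
enum save_path model_save_target_page(const struct model_caps *c)
{
    assert(c != NULL);

    if (c->rdma) {
        return PATH_RDMA;
    }
    if ((!c->multifd || c->legacy_zero_detection) && c->page_is_zero) {
        return PATH_ZERO_PAGE;
    }
    if (c->multifd) {
        return PATH_MULTIFD;
    }
    return PATH_PLAIN;
}
```

In this model, a zero page on an RDMA migration is reported as PATH_RDMA
rather than PATH_ZERO_PAGE, which is the behaviour the reordering is
meant to restore.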

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/ram.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
+    /* Hand over to RDMA first */
+    if (control_save_page(pss, offset, &res)) {
+        return res;
+    }
+
     if (!migrate_multifd()
         || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
         if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return ram_save_multifd_page(block, offset);
     }
 
-    if (control_save_page(pss, offset, &res)) {
-        return res;
-    }
-
     return ram_save_page(rs, pss);
 }
 
-- 
2.44.0




* [PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP Li Zhijian via
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

qemu_rdma_save_page() no longer returns RAM_SAVE_CONTROL_NOT_SUPP since
commit a4832d299dd ("migration/rdma: Check sooner if we are in postcopy
for save_page()"), so the check for it is redundant.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..af8e6234a9f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3290,8 +3290,7 @@ int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
 
     int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-    if (ret != RAM_SAVE_CONTROL_DELAYED &&
-        ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+    if (ret != RAM_SAVE_CONTROL_DELAYED) {
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
-- 
2.44.0




* [PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page() Li Zhijian via
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Refactor the migration control logic by eliminating the
`RAM_SAVE_CONTROL_NOT_SUPP` return value within the migration codebase.

This involves moving the checks for RDMA migration status and postcopy
state from rdma_control_save_page() to control_save_page().

With this change, control_save_page() now takes responsibility for
determining whether RDMA operations can proceed, based on the state of
migration.
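
A minimal sketch of the resulting contract, under stated assumptions:
model_control_save_page() and MODEL_CONTROL_DELAYED are hypothetical
names, and the RDMA return value is passed in as a parameter instead of
being obtained from rdma_control_save_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the value of RAM_SAVE_CONTROL_DELAYED in migration/rdma.h. */
#define MODEL_CONTROL_DELAYED (-2000)

/*
 * Returns true if the RDMA path handled the page. In that case *pages is
 * 1 for a delayed (queued) write, otherwise the raw return value, which
 * is negative on error.
 */
bool model_control_save_page(bool rdma, bool in_postcopy,
                             int rdma_ret, int *pages)
{
    assert(pages != NULL);

    if (rdma && !in_postcopy) {
        *pages = (rdma_ret == MODEL_CONTROL_DELAYED) ? 1 : rdma_ret;
        return true;
    }

    /* Not an RDMA precopy migration: the caller falls through. */
    return false;
}
```

The caller decides up front whether RDMA applies, so the callee no
longer needs a dedicated "not supported" return value.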

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/ram.c  | 19 ++++++++++---------
 migration/rdma.c |  4 +---
 migration/rdma.h |  3 +--
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..b7157b9b175 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1155,18 +1155,19 @@ static bool control_save_page(PageSearchStatus *pss,
 {
     int ret;
 
-    ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
-                                 TARGET_PAGE_SIZE);
-    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-        return false;
-    }
+    if (migrate_rdma() && !migration_in_postcopy()) {
+        ret = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+                                     offset, TARGET_PAGE_SIZE);
 
-    if (ret == RAM_SAVE_CONTROL_DELAYED) {
-        *pages = 1;
+        if (ret == RAM_SAVE_CONTROL_DELAYED) {
+            *pages = 1;
+        } else {
+            *pages = ret;
+        }
         return true;
     }
-    *pages = ret;
-    return true;
+
+    return false;
 }
 
 /*
diff --git a/migration/rdma.c b/migration/rdma.c
index af8e6234a9f..c6876347e1e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,9 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
-        return RAM_SAVE_CONTROL_NOT_SUPP;
-    }
+    assert(migrate_rdma());
 
     int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress *host_port, Error **errp);
 #define RAM_CONTROL_ROUND     1
 #define RAM_CONTROL_FINISH    3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    return RAM_SAVE_CONTROL_NOT_SUPP;
+    g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0




* [PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page()
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (2 preceding siblings ...)
  2025-02-21  6:36 ` [PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper Li Zhijian via
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Refactor the page saving logic by integrating the control_save_page()
function directly into ram_save_target_page(). This change consolidates the
RDMA migration decision-making process into a single function, enhancing
clarity and maintainability.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/ram.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b7157b9b175..e07651aee8d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,33 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *        < 0 - error
- *        > 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-                              ram_addr_t offset, int *pages)
-{
-    int ret;
-
-    if (migrate_rdma() && !migration_in_postcopy()) {
-        ret = rdma_control_save_page(pss->pss_channel, pss->block->offset,
-                                     offset, TARGET_PAGE_SIZE);
-
-        if (ret == RAM_SAVE_CONTROL_DELAYED) {
-            *pages = 1;
-        } else {
-            *pages = ret;
-        }
-        return true;
-    }
-
-    return false;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1966,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     int res;
 
     /* Hand over to RDMA first */
-    if (control_save_page(pss, offset, &res)) {
+    if (migrate_rdma() && !migration_in_postcopy()) {
+        res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+                                     offset, TARGET_PAGE_SIZE);
+
+        if (res == RAM_SAVE_CONTROL_DELAYED) {
+            res = 1;
+        }
         return res;
     }
 
-- 
2.44.0




* [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (3 preceding siblings ...)
  2025-02-21  6:36 ` [PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page() Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-24 19:58   ` Peter Xu
  2025-02-21  6:36 ` [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram Li Zhijian via
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Similar to migration_channels_and_transport_compatible(), introduce a
new helper migration_capabilities_and_transport_compatible() to check
whether the capabilities are compatible with the transport.

For now, only the capabilities vs. RDMA transport checks are moved into
this function.
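
The shape of such a check can be sketched in isolation. This is an
illustration only: model_rdma_capability_conflict() is a hypothetical
name, the booleans stand in for the transport type, migrate_xbzrle() and
migrate_multifd(), and a returned string replaces error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns NULL when the capabilities are compatible with the transport,
 * otherwise a message describing the first conflict found.
 */
const char *model_rdma_capability_conflict(bool is_rdma, bool xbzrle,
                                           bool multifd)
{
    if (!is_rdma) {
        return NULL;    /* only the RDMA transport is restricted here */
    }
    if (xbzrle) {
        return "RDMA and XBZRLE can't be used together";
    }
    if (multifd) {
        return "RDMA and multifd can't be used together";
    }
    return NULL;
}
```

Keeping all transport-specific capability conflicts in one place means
the incoming and outgoing paths can share the same validation.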

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/migration.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..2eacae25e0e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,30 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
     return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+                                                Error **errp)
+{
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+        if (migrate_xbzrle()) {
+            error_setg(errp, "RDMA and XBZRLE can't be used together");
+            return false;
+        }
+        if (migrate_multifd()) {
+            error_setg(errp, "RDMA and multifd can't be used together");
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
+{
+    return migration_channels_and_transport_compatible(addr, errp) &&
+           migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
     uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +740,7 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
@@ -735,14 +759,6 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
         }
 #ifdef CONFIG_RDMA
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-        if (migrate_xbzrle()) {
-            error_setg(errp, "RDMA and XBZRLE can't be used together");
-            return;
-        }
-        if (migrate_multifd()) {
-            error_setg(errp, "RDMA and multifd can't be used together");
-            return;
-        }
         rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2175,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
-- 
2.44.0




* [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (4 preceding siblings ...)
  2025-02-21  6:36 ` [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-24 19:58   ` Peter Xu
  2025-02-21  6:36 ` [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
  2025-02-21  6:36 ` [PATCH v2 8/8] migration: Add qtest for migration over RDMA Li Zhijian via
  7 siblings, 1 reply; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/migration.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2eacae25e0e..d414a4b1379 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -251,6 +251,10 @@ migration_capabilities_and_transport_compatible(MigrationAddress *addr,
             error_setg(errp, "RDMA and multifd can't be used together");
             return false;
         }
+        if (migrate_postcopy_ram()) {
+            error_setg(errp, "RDMA and postcopy-ram can't be used together");
+            return false;
+        }
     }
 
     return true;
-- 
2.44.0




* [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (5 preceding siblings ...)
  2025-02-21  6:36 ` [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-24 20:00   ` Peter Xu
  2025-02-21  6:36 ` [PATCH v2 8/8] migration: Add qtest for migration over RDMA Li Zhijian via
  7 siblings, 1 reply; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() checks that follow migrate_rdma().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/ram.c  | 2 +-
 migration/rdma.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e07651aee8d..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     int res;
 
     /* Hand over to RDMA first */
-    if (migrate_rdma() && !migration_in_postcopy()) {
+    if (migrate_rdma()) {
         res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
                                      offset, TARGET_PAGE_SIZE);
 
diff --git a/migration/rdma.c b/migration/rdma.c
index c6876347e1e..0349dd4a8b8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3826,7 +3826,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return 0;
     }
 
@@ -3858,7 +3858,8 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
 
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    /* Hand over to RDMA first */
+    if (!migrate_rdma()) {
         return 0;
     }
 
-- 
2.44.0




* [PATCH v2 8/8] migration: Add qtest for migration over RDMA
  2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (6 preceding siblings ...)
  2025-02-21  6:36 ` [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-21  6:36 ` Li Zhijian via
  2025-02-24 20:01   ` Peter Xu
  7 siblings, 1 reply; 19+ messages in thread
From: Li Zhijian via @ 2025-02-21  6:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

This qtest requires an RDMA (RoCE) link on the host.
To make the test work smoothly, introduce
scripts/rdma-migration-helper.sh to:
- set up a new Soft-RoCE (aka RXE) link when run as root
- detect an existing RoCE link

The test is skipped if no RoCE link is available:
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests
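
The test reads the helper's output (the IPv4 address of the RoCE link)
through a pipe. A minimal sketch of accumulating such output into a
fixed-size buffer follows; model_read_all() is a hypothetical name and
any FILE stream is accepted, so it is not tied to popen().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append everything readable from 'in' to 'buf' (capacity 'cap' bytes),
 * keeping it NUL-terminated. Returns the number of characters stored.
 * The offset advances by the length of each newly read chunk only.
 */
size_t model_read_all(FILE *in, char *buf, size_t cap)
{
    size_t idx = 0;

    assert(in != NULL && buf != NULL);
    if (cap == 0) {
        return 0;
    }
    buf[0] = '\0';

    while (idx + 1 < cap &&
           fgets(buf + idx, (int)(cap - idx), in) != NULL) {
        idx += strlen(buf + idx);
    }

    return idx;
}
```

Measuring from buf + idx rather than from the start of the buffer keeps
the offset correct when the output arrives in more than one chunk.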

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 MAINTAINERS                           |  1 +
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
 R: Peter Xu <peterx@redhat.com>
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang <yong.huang@smartx.com>
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 00000000000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+    ip -4 -o addr show dev "$1" |
+        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+        tr -d '\n'
+}
+
+has_soft_rdma()
+{
+    rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+    (
+        cd /sys/class/net &&
+            for i in *; do
+                [ -e "$i" ] || continue
+                [ "$i" = "lo" ] && continue
+                [ "$(<"$i/addr_len")" = 6 ] || continue
+                [ "$(<"$i/carrier")" = 1 ] || continue
+
+                has_soft_rdma "$i" && break
+                [ "$operation" = "setup" ] &&
+                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
+            done
+        has_soft_rdma "$i" || return
+        get_ipv4_addr "$i"
+    )
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+    rdma_rxe_setup_detect
+else
+    echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
     test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+    const char *argument = (geteuid() == 0) ? "setup" : "detect";
+    char cmd[1024];
+
+    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+    FILE *pipe = popen(cmd, "r");
+    if (pipe == NULL) {
+        perror("Failed to run script");
+        return -1;
+    }
+
+    int idx = 0;
+    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+        idx += strlen(buffer + idx);
+    }
+
+    int status = pclose(pipe);
+    if (status == -1) {
+        perror("Error reported by pclose()");
+        return -1;
+    } else if (WIFEXITED(status)) {
+        return WEXITSTATUS(status);
+    }
+
+    return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+    char buffer[128] = {};
+
+    if (new_rdma_link(buffer)) {
+        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
+                    "To enable the test:\n"
+                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
+                    "or\n"
+                    "(2) Run the test with root privilege\n");
+        return;
+    }
+
+    /*
+     * TODO: query a free port instead of hard code.
+     * 29200=('R'+'D'+'M'+'A')*100
+     **/
+    g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+    MigrateCommon args = {
+        .listen_uri = uri,
+        .connect_uri = uri,
+    };
+
+    test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
                        test_multifd_tcp_uri_none);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+    migration_test_add("/migration/precopy/rdma/plain",
+                       test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0




* Re: [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page()
  2025-02-21  6:36 ` [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
@ 2025-02-24 19:55   ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2025-02-24 19:55 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Fri, Feb 21, 2025 at 02:36:05PM +0800, Li Zhijian wrote:
> Address an error in RDMA-based migration by ensuring RDMA is prioritized
> when saving pages in `ram_save_target_page()`.
> 
> Previously, the RDMA protocol's page-saving step was placed after other
> protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
> failures characterized by unknown control messages and state loading errors
> destination:
> (qemu) qemu-system-x86_64: Unknown control message QEMU FILE
> qemu-system-x86_64: error while loading state section id 1(ram)
> qemu-system-x86_64: load of migration failed: Operation not permitted
> source:
> (qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
> qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
> qemu-system-x86_64: rdma migration: recv polling control error!
> qemu-system-x86_64: warning: Early error. Sending error.
> qemu-system-x86_64: warning: rdma migration: send polling control error
> 
> RDMA migration implemented its own protocol/method to send pages to
> destination side, hand over to RDMA first to prevent pages being saved by
> other protocol.
> 
> Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
  2025-02-21  6:36 ` [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper Li Zhijian via
@ 2025-02-24 19:58   ` Peter Xu
  2025-02-25  6:37     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2025-02-24 19:58 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Fri, Feb 21, 2025 at 02:36:09PM +0800, Li Zhijian wrote:
> Similar to migration_channels_and_transport_compatible(), introduce a
> new helper migration_capabilities_and_transport_compatible() to check if
> the capabilites is compatible with the transport.
> 
> Currently, only move the capabilities vs RDMA transport to this
> function.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Yeah this is even better, thanks.

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram
  2025-02-21  6:36 ` [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram Li Zhijian via
@ 2025-02-24 19:58   ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2025-02-24 19:58 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Fri, Feb 21, 2025 at 02:36:10PM +0800, Li Zhijian wrote:
> It's believed that RDMA + postcopy-ram has been broken for a while.
> Rather than spending time re-enabling it, let's simply disable it as a
> trade-off.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-21  6:36 ` [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-24 20:00   ` Peter Xu
  2025-02-25  6:21     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2025-02-24 20:00 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Fri, Feb 21, 2025 at 02:36:11PM +0800, Li Zhijian wrote:
> Since we have disabled RDMA + postcopy, it's safe to remove
> the migration_in_postcopy()  that follows the migration_rdma().
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  migration/ram.c  | 2 +-
>  migration/rdma.c | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e07651aee8d..c363034c882 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>      int res;
>  
>      /* Hand over to RDMA first */
> -    if (migrate_rdma() && !migration_in_postcopy()) {

This line was only just added in a previous patch.

Would it be better to move 5/6 earlier, then squash 2/3/4/7 so that the
series doesn't add something only to remove it again?  I feel the four
patches could be squashed into one or two once they are reordered.

> +    if (migrate_rdma()) {
>          res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
>                                       offset, TARGET_PAGE_SIZE);
>  
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c6876347e1e..0349dd4a8b8 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3826,7 +3826,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name)
>  
>  int rdma_registration_start(QEMUFile *f, uint64_t flags)
>  {
> -    if (!migrate_rdma() || migration_in_postcopy()) {
> +    if (!migrate_rdma()) {
>          return 0;
>      }
>  
> @@ -3858,7 +3858,8 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
>      RDMAControlHeader head = { .len = 0, .repeat = 1 };
>      int ret;
>  
> -    if (!migrate_rdma() || migration_in_postcopy()) {
> +    /* Hand over to RDMA first */
> +    if (!migrate_rdma()) {
>          return 0;
>      }
>  
> -- 
> 2.44.0
> 

-- 
Peter Xu




* Re: [PATCH v2 8/8] migration: Add qtest for migration over RDMA
  2025-02-21  6:36 ` [PATCH v2 8/8] migration: Add qtest for migration over RDMA Li Zhijian via
@ 2025-02-24 20:01   ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2025-02-24 20:01 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Fri, Feb 21, 2025 at 02:36:12PM +0800, Li Zhijian wrote:
> This qtest requires there is a RDMA(RoCE) link in the host.
> In order to make the test work smoothly, introduce a
> scripts/rdma-migration-helper.sh to
> - setup a new Soft-RoCE(aka RXE) if it's root
> - detect existing RoCE link
> 
> Test will be skipped if there is no available RoCE link.
>  # Start of rdma tests
>  # Running /x86_64/migration/precopy/rdma/plain
>  ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>  There is no available rdma link to run RDMA migration test.
>  To enable the test:
>  (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
>  or
>  (2) Run the test with root privilege
> 
>  # End of rdma tests
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-24 20:00   ` Peter Xu
@ 2025-02-25  6:21     ` Zhijian Li (Fujitsu) via
  2025-02-25 14:50       ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-25  6:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini



On 25/02/2025 04:00, Peter Xu wrote:
> On Fri, Feb 21, 2025 at 02:36:11PM +0800, Li Zhijian wrote:
>> Since we have disabled RDMA + postcopy, it's safe to remove
>> the migration_in_postcopy()  that follows the migration_rdma().
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   migration/ram.c  | 2 +-
>>   migration/rdma.c | 5 +++--
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index e07651aee8d..c363034c882 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>>       int res;
>>   
>>       /* Hand over to RDMA first */
>> -    if (migrate_rdma() && !migration_in_postcopy()) {
> 
> This line was just added in the previous patch.
> 
> Would it be better to move 5/6 above, then somehow squash 2/3/4/7, so that we
> don't add something only to remove it again?

Yeah, that sounds good to me.
I tried reordering the patches and squashing the previous 2/3/4 into a single one.

So the new layout will look like this:

e5b1998ad30 migration: Add qtest for migration over RDMA
9a1b87e2db6 migration: Unfold control_save_page()  << this one squashed previous 2/3/4
b6ccd49e934 migration/rdma: Remove redundant migration_in_postcopy checks
c7c4209db6f migration: disable RDMA + postcopy-ram
0463b54d7f9 migration: Add migration_capabilities_and_transport_compatible() helper
21c76dcabee migration: Prioritize RDMA in ram_save_target_page()


Thanks
Zhijian


> I feel like the four patches
> could be squashed into 1 or 2 when reordering them.
> 
>> +    if (migrate_rdma()) {
>>           res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
>>                                        offset, TARGET_PAGE_SIZE);
>>   
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index c6876347e1e..0349dd4a8b8 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -3826,7 +3826,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name)
>>   
>>   int rdma_registration_start(QEMUFile *f, uint64_t flags)
>>   {
>> -    if (!migrate_rdma() || migration_in_postcopy()) {
>> +    if (!migrate_rdma()) {
>>           return 0;
>>       }
>>   
>> @@ -3858,7 +3858,8 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
>>       RDMAControlHeader head = { .len = 0, .repeat = 1 };
>>       int ret;
>>   
>> -    if (!migrate_rdma() || migration_in_postcopy()) {
>> +    /* Hand over to RDMA first */
>> +    if (!migrate_rdma()) {
>>           return 0;
>>       }
>>   
>> -- 
>> 2.44.0
>>
> 


* Re: [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
  2025-02-24 19:58   ` Peter Xu
@ 2025-02-25  6:37     ` Zhijian Li (Fujitsu) via
  2025-02-25 14:48       ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-25  6:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini



On 25/02/2025 03:58, Peter Xu wrote:
> On Fri, Feb 21, 2025 at 02:36:09PM +0800, Li Zhijian wrote:
>> Similar to migration_channels_and_transport_compatible(), introduce a
>> new helper migration_capabilities_and_transport_compatible() to check if
>> the capabilites is compatible with the transport.
>>
>> Currently, only move the capabilities vs RDMA transport to this
>> function.
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> Yeah this is even better, thanks.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Hi Peter,

Thinking one step further, this patch looks promising and covers most
situations. However, on the destination side, if the user first sets up
the incoming side (with the startup parameter '-incoming xxx' or the
'migrate_incoming xxx' command) and then runs
'migrate_set_capability xxx on', the function
migration_capabilities_and_transport_compatible() will not be called to
check compatibility, which might lead to a migration failure.

To address this without introducing a new member 'transport' into the
MigrationState structure, the code might need to be adjusted as in the
diff below.

The question is whether we need to consider it now (in this patch set)?

  static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
  {
      return migration_channels_and_transport_compatible(addr, errp) &&
-           migration_capabilities_and_transport_compatible(addr, errp);
+           migration_capabilities_and_transport_compatible(addr->transport,
+               migrate_get_current()->capabilities, errp);
  }

  static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
diff --git a/migration/options.c b/migration/options.c
index bb259d192a9..59f0ed5b528 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -439,6 +439,29 @@ static bool migrate_incoming_started(void)
      return !!migration_incoming_get_current()->transport_data;
  }
  
+bool
+migration_capabilities_and_transport_compatible(MigrationAddressType transport,
+                                                bool *new_caps,
+                                                Error **errp)
+{
+    if (transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
+            error_setg(errp, "RDMA and XBZRLE can't be used together");
+            return false;
+        }
+        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp, "RDMA and multifd can't be used together");
+            return false;
+        }
+        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+            error_setg(errp, "RDMA and postcopy-ram can't be used together");
+            return false;
+        }
+    }
+
+    return true;
+}
+
  /**
   * @migration_caps_check - check capability compatibility
   *
@@ -602,6 +625,15 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
          }
      }
  
+    /*
+     * In destination side, check the cases that capability is being set
+     * after incoming thread has started.
+    */
+    if (migrate_rdma() &&
+        !migration_capabilities_and_transport_compatible(
+            MIGRATION_ADDRESS_TYPE_RDMA, new_caps, errp)) {
+        return false;
+    }
      return true;
  }
  
diff --git a/migration/options.h b/migration/options.h
index 762be4e641a..ca6a40e7545 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -58,6 +58,9 @@ bool migrate_tls(void);
  /* capabilities helpers */
  
  bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
+bool
+migration_capabilities_and_transport_compatible(MigrationAddressType transport,
+                                                bool *new_caps, Error **errp);

> 
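
For readers following the thread, the ordering gap described in this message
can be modelled in a few lines of standalone C. This is only a toy sketch
under stated assumptions: the enum, the globals and every toy_* function are
hypothetical names made up for illustration, and none of it is QEMU code. The
'recheck' flag switches between the behaviour being reported (capability
changes never look at the transport) and the proposed behaviour (they do).

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, none is QEMU code. */
enum toy_cap {
    TOY_CAP_XBZRLE,
    TOY_CAP_MULTIFD,
    TOY_CAP_POSTCOPY_RAM,
    TOY_CAP__MAX
};

static bool toy_caps[TOY_CAP__MAX]; /* stands in for the global capability list */
static bool toy_incoming_rdma;      /* set once an RDMA incoming channel is up */

/* True if none of the capabilities rejected for RDMA is enabled in caps. */
static bool toy_rdma_caps_ok(const bool *caps)
{
    return !caps[TOY_CAP_XBZRLE] && !caps[TOY_CAP_MULTIFD] &&
           !caps[TOY_CAP_POSTCOPY_RAM];
}

/* Models starting an RDMA incoming side: caps are checked at start time. */
static bool toy_start_incoming_rdma(void)
{
    if (!toy_rdma_caps_ok(toy_caps)) {
        return false;
    }
    toy_incoming_rdma = true;
    return true;
}

/*
 * Models setting one capability.  With recheck == false the transport is
 * never consulted (the gap described above); with recheck == true the
 * proposed capability set is validated against an active RDMA transport.
 */
static bool toy_set_cap(enum toy_cap cap, bool on, bool recheck)
{
    bool new_caps[TOY_CAP__MAX];

    for (int i = 0; i < TOY_CAP__MAX; i++) {
        new_caps[i] = toy_caps[i];
    }
    new_caps[cap] = on;

    if (recheck && toy_incoming_rdma && !toy_rdma_caps_ok(new_caps)) {
        return false;
    }
    toy_caps[cap] = on;
    return true;
}

/* Clears all state so scenarios can be run back to back. */
static void toy_reset(void)
{
    for (int i = 0; i < TOY_CAP__MAX; i++) {
        toy_caps[i] = false;
    }
    toy_incoming_rdma = false;
}
```

Starting the incoming side and then calling toy_set_cap(TOY_CAP_MULTIFD,
true, false) succeeds and leaves an incompatible pair in place; the same call
with recheck set to true is refused and the capability stays off.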


* Re: [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
  2025-02-25  6:37     ` Zhijian Li (Fujitsu) via
@ 2025-02-25 14:48       ` Peter Xu
  2025-02-26  6:34         ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2025-02-25 14:48 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini

On Tue, Feb 25, 2025 at 06:37:21AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 25/02/2025 03:58, Peter Xu wrote:
> > On Fri, Feb 21, 2025 at 02:36:09PM +0800, Li Zhijian wrote:
> >> Similar to migration_channels_and_transport_compatible(), introduce a
> >> new helper migration_capabilities_and_transport_compatible() to check if
> >> the capabilites is compatible with the transport.
> >>
> >> Currently, only move the capabilities vs RDMA transport to this
> >> function.
> >>
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > 
> > Yeah this is even better, thanks.
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Hi Peter,
> 
> Thinking one step further, this patch looks promising and covers most
> situations. However, on the destination side, if the user first sets up
> the incoming side (with the startup parameter '-incoming xxx' or the
> 'migrate_incoming xxx' command) and then runs
> 'migrate_set_capability xxx on', the function
> migration_capabilities_and_transport_compatible() will not be called to
> check compatibility, which might lead to a migration failure.
> 
> To address this without introducing a new member 'transport' into the
> MigrationState structure, the code might need to be adjusted as in the
> diff below.
> 
> The question is whether we need to consider it now (in this patch set)?

We can do that in one patch.

> 
>   static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
>   {
>       return migration_channels_and_transport_compatible(addr, errp) &&
> -           migration_capabilities_and_transport_compatible(addr, errp);
> +           migration_capabilities_and_transport_compatible(addr->transport,
> +               migrate_get_current()->capabilities, errp);

Here IMHO we could make migration_capabilities_and_transport_compatible()
take addr+errp like before, then..

>   }
> 
>   static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
> diff --git a/migration/options.c b/migration/options.c
> index bb259d192a9..59f0ed5b528 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -439,6 +439,29 @@ static bool migrate_incoming_started(void)
>       return !!migration_incoming_get_current()->transport_data;
>   }
>   
> +bool
> +migration_capabilities_and_transport_compatible(MigrationAddressType transport,
> +                                                bool *new_caps,
> +                                                Error **errp)
> +{

..  fetch the global capability list here and feed it to the check.

> +    if (transport == MIGRATION_ADDRESS_TYPE_RDMA) {

[1]

> +        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
> +            error_setg(errp, "RDMA and XBZRLE can't be used together");
> +            return false;
> +        }
> +        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
> +            error_setg(errp, "RDMA and multifd can't be used together");
> +            return false;
> +        }
> +        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
> +            error_setg(errp, "RDMA and postcopy-ram can't be used together");
> +            return false;
> +        }

We could introduce migration_rdma_caps_check(&caps, errp) for this chunk
(starting from [1]), then...

> +    }
> +
> +    return true;
> +}
> +
>   /**
>    * @migration_caps_check - check capability compatibility
>    *
> @@ -602,6 +625,15 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>           }
>       }
>   
> +    /*
> +     * In destination side, check the cases that capability is being set
> +     * after incoming thread has started.
> +    */
> +    if (migrate_rdma() &&
> +        !migration_capabilities_and_transport_compatible(
> +            MIGRATION_ADDRESS_TYPE_RDMA, new_caps, errp)) {
> +        return false;
> +    }

... use migration_rdma_caps_check() here, which might be slightly more readable:

  if (migrate_rdma() && !migration_rdma_caps_check(new_caps, errp)) {
      return false;
  }

>       return true;
>   }
>   
> diff --git a/migration/options.h b/migration/options.h
> index 762be4e641a..ca6a40e7545 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -58,6 +58,9 @@ bool migrate_tls(void);
>   /* capabilities helpers */
>   
>   bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
> +bool
> +migration_capabilities_and_transport_compatible(MigrationAddressType transport,
> +                                                bool *new_caps, Error **errp);
> 
> > 
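
Put together, the shape suggested in the inline comments above might look
roughly like the following standalone sketch. It is not the V4 patch and not
QEMU code: the Sketch* types, the sketch_* names and the plain string used
for error reporting are simplified, hypothetical stand-ins (QEMU itself
reports errors through Error ** and error_setg()). The only points it tries
to show are that the RDMA-specific checks live in one helper, that the
transport check keeps its addr+errp signature and reads the global
capability list itself, and that the capability-setting path reuses the same
helper on the proposed list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the QEMU types in the thread. */
typedef const char *SketchError;
enum {
    SKETCH_CAP_XBZRLE,
    SKETCH_CAP_MULTIFD,
    SKETCH_CAP_POSTCOPY_RAM,
    SKETCH_CAP__MAX
};
typedef enum { SKETCH_ADDR_SOCKET, SKETCH_ADDR_RDMA } SketchAddrType;
typedef struct { SketchAddrType transport; } SketchAddr;

static bool sketch_global_caps[SKETCH_CAP__MAX]; /* global capability list */
static bool sketch_rdma_active;                  /* stands in for migrate_rdma() */

/* The chunk marked [1]: capability checks that only apply to RDMA. */
static bool sketch_rdma_caps_check(const bool *caps, SketchError *errp)
{
    if (caps[SKETCH_CAP_XBZRLE]) {
        *errp = "RDMA and XBZRLE can't be used together";
        return false;
    }
    if (caps[SKETCH_CAP_MULTIFD]) {
        *errp = "RDMA and multifd can't be used together";
        return false;
    }
    if (caps[SKETCH_CAP_POSTCOPY_RAM]) {
        *errp = "RDMA and postcopy-ram can't be used together";
        return false;
    }
    return true;
}

/* Keeps the addr+errp signature and fetches the global list itself. */
static bool sketch_caps_and_transport_compatible(const SketchAddr *addr,
                                                 SketchError *errp)
{
    if (addr->transport == SKETCH_ADDR_RDMA) {
        return sketch_rdma_caps_check(sketch_global_caps, errp);
    }
    return true;
}

/* Tail of the capability-setting check: same helper, proposed caps. */
static bool sketch_caps_check(const bool *new_caps, SketchError *errp)
{
    if (sketch_rdma_active && !sketch_rdma_caps_check(new_caps, errp)) {
        return false;
    }
    return true;
}
```

With this split, both orderings (transport first or capability first) end up
in the same three checks, so the error messages cannot drift apart.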

-- 
Peter Xu




* Re: [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-25  6:21     ` Zhijian Li (Fujitsu) via
@ 2025-02-25 14:50       ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2025-02-25 14:50 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini

On Tue, Feb 25, 2025 at 06:21:20AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 25/02/2025 04:00, Peter Xu wrote:
> > On Fri, Feb 21, 2025 at 02:36:11PM +0800, Li Zhijian wrote:
> >> Since we have disabled RDMA + postcopy, it's safe to remove
> >> the migration_in_postcopy()  that follows the migration_rdma().
> >>
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >> ---
> >>   migration/ram.c  | 2 +-
> >>   migration/rdma.c | 5 +++--
> >>   2 files changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index e07651aee8d..c363034c882 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1939,7 +1939,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
> >>       int res;
> >>   
> >>       /* Hand over to RDMA first */
> >> -    if (migrate_rdma() && !migration_in_postcopy()) {
> > 
> > This line was just added in the previous patch.
> > 
> > Would it be better to move 5/6 above, then somehow squash 2/3/4/7, so that we
> > don't add something only to remove it again?
> 
> Yeah, that sounds good to me.
> I tried reordering the patches and squashing the previous 2/3/4 into a single one.
> 
> So the new layout will look like this:
> 
> e5b1998ad30 migration: Add qtest for migration over RDMA
> 9a1b87e2db6 migration: Unfold control_save_page()  << this one squashed previous 2/3/4
> b6ccd49e934 migration/rdma: Remove redundant migration_in_postcopy checks
> c7c4209db6f migration: disable RDMA + postcopy-ram
> 0463b54d7f9 migration: Add migration_capabilities_and_transport_compatible() helper
> 21c76dcabee migration: Prioritize RDMA in ram_save_target_page()

I'll have another look when repost, but so far looks good, thanks.

-- 
Peter Xu




* Re: [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper
  2025-02-25 14:48       ` Peter Xu
@ 2025-02-26  6:34         ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 19+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-26  6:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini



On 25/02/2025 22:48, Peter Xu wrote:
> On Tue, Feb 25, 2025 at 06:37:21AM +0000, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 25/02/2025 03:58, Peter Xu wrote:
>>> On Fri, Feb 21, 2025 at 02:36:09PM +0800, Li Zhijian wrote:
>>>> Similar to migration_channels_and_transport_compatible(), introduce a
>>>> new helper migration_capabilities_and_transport_compatible() to check if
>>>> the capabilites is compatible with the transport.
>>>>
>>>> Currently, only move the capabilities vs RDMA transport to this
>>>> function.
>>>>
>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>>
>>> Yeah this is even better, thanks.
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>
>> Hi Peter,
>>
>> Thinking one step further, this patch looks promising and covers most
>> situations. However, on the destination side, if the user first sets up
>> the incoming side (with the startup parameter '-incoming xxx' or the
>> 'migrate_incoming xxx' command) and then runs
>> 'migrate_set_capability xxx on', the function
>> migration_capabilities_and_transport_compatible() will not be called to
>> check compatibility, which might lead to a migration failure.
>>
>> To address this without introducing a new member 'transport' into the
>> MigrationState structure, the code might need to be adjusted as in the
>> diff below.
>>
>> The question is whether we need to consider it now (in this patch set)?
> 
> We can do that in one patch.

Okay, please ignore V3 and take another look at V4, which integrates your
suggestions below.


Thanks

> 
>>
>>    static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
>>    {
>>        return migration_channels_and_transport_compatible(addr, errp) &&
>> -           migration_capabilities_and_transport_compatible(addr, errp);
>> +           migration_capabilities_and_transport_compatible(addr->transport,
>> +               migrate_get_current()->capabilities, errp);
> 
> Here IMHO we could make migration_capabilities_and_transport_compatible()
> take addr+errp like before, then..
> 
>>    }
>>
>>    static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
>> diff --git a/migration/options.c b/migration/options.c
>> index bb259d192a9..59f0ed5b528 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -439,6 +439,29 @@ static bool migrate_incoming_started(void)
>>        return !!migration_incoming_get_current()->transport_data;
>>    }
>>    
>> +bool
>> +migration_capabilities_and_transport_compatible(MigrationAddressType transport,
>> +                                                bool *new_caps,
>> +                                                Error **errp)
>> +{
> 
> ..  fetch the global capability list here and feed it to the check.
> 
>> +    if (transport == MIGRATION_ADDRESS_TYPE_RDMA) {
> 
> [1]
> 
>> +        if (new_caps[MIGRATION_CAPABILITY_XBZRLE]) {
>> +            error_setg(errp, "RDMA and XBZRLE can't be used together");
>> +            return false;
>> +        }
>> +        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
>> +            error_setg(errp, "RDMA and multifd can't be used together");
>> +            return false;
>> +        }
>> +        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>> +            error_setg(errp, "RDMA and postcopy-ram can't be used together");
>> +            return false;
>> +        }
> 
> We could introduce migration_rdma_caps_check(&caps, errp) for this chunk
> (starting from [1]), then...
> 
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>    /**
>>     * @migration_caps_check - check capability compatibility
>>     *
>> @@ -602,6 +625,15 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>>            }
>>        }
>>    
>> +    /*
>> +     * In destination side, check the cases that capability is being set
>> +     * after incoming thread has started.
>> +    */
>> +    if (migrate_rdma() &&
>> +        !migration_capabilities_and_transport_compatible(
>> +            MIGRATION_ADDRESS_TYPE_RDMA, new_caps, errp)) {
>> +        return false;
>> +    }
> 
> ... use migration_rdma_caps_check() here, which might be slightly more readable:
> 
>    if (migrate_rdma() && !migration_rdma_caps_check(new_caps, errp)) {
>        return false;
>    }
> 
>>        return true;
>>    }
>>    
>> diff --git a/migration/options.h b/migration/options.h
>> index 762be4e641a..ca6a40e7545 100644
>> --- a/migration/options.h
>> +++ b/migration/options.h
>> @@ -58,6 +58,9 @@ bool migrate_tls(void);
>>    /* capabilities helpers */
>>    
>>    bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
>> +bool
>> +migration_capabilities_and_transport_compatible(MigrationAddressType transport,
>> +                                                bool *new_caps, Error **errp);
>>
>>>
> 

Thread overview: 19+ messages
2025-02-21  6:36 [PATCH v2 0/8] migration/rdma: fixes, refactor and cleanup Li Zhijian via
2025-02-21  6:36 ` [PATCH v2 1/8] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
2025-02-24 19:55   ` Peter Xu
2025-02-21  6:36 ` [PATCH v2 2/8] migration/rdma: Remove redundant RAM_SAVE_CONTROL_NOT_SUPP check Li Zhijian via
2025-02-21  6:36 ` [PATCH v2 3/8] migration: Kill RAM_SAVE_CONTROL_NOT_SUPP Li Zhijian via
2025-02-21  6:36 ` [PATCH v2 4/8] migration: Integrate control_save_page() logic into ram_save_target_page() Li Zhijian via
2025-02-21  6:36 ` [PATCH v2 5/8] migration: Add migration_capabilities_and_transport_compatible() helper Li Zhijian via
2025-02-24 19:58   ` Peter Xu
2025-02-25  6:37     ` Zhijian Li (Fujitsu) via
2025-02-25 14:48       ` Peter Xu
2025-02-26  6:34         ` Zhijian Li (Fujitsu) via
2025-02-21  6:36 ` [PATCH v2 6/8] migraion: disable RDMA + postcopy-ram Li Zhijian via
2025-02-24 19:58   ` Peter Xu
2025-02-21  6:36 ` [PATCH v2 7/8] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
2025-02-24 20:00   ` Peter Xu
2025-02-25  6:21     ` Zhijian Li (Fujitsu) via
2025-02-25 14:50       ` Peter Xu
2025-02-21  6:36 ` [PATCH v2 8/8] migration: Add qtest for migration over RDMA Li Zhijian via
2025-02-24 20:01   ` Peter Xu
