qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup
@ 2025-02-26  6:30 Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

- It fix the RDMA migration broken issue
- disable RDMA + postcopy
- some cleanups
- Add a qtest for RDMA at last

Changes since V3:
- check RDMA and capabilities are compatible on both sides # renamed from
  previous V3's "migration: Add migration_capabilities_and_transport_compatible()"

Changes since V2:
- squash previous 2/3/4 to '[PATCH v3 5/6] migration: Unfold  control_save_page()'
- reorder the patch layout to prevent recently added code from being deleted again.
- collect Reviewed tags from Peter

Changes since V1[0]:
Add some saparate patches to refactor and cleanup based on V1

[0] https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhijian@fujitsu.com/


Li Zhijian (6):
  migration: Prioritize RDMA in ram_save_target_page()
  migration: check RDMA and capabilities are compatible on both sides
  migration: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Unfold control_save_page()
  migration: Add qtest for migration over RDMA

 MAINTAINERS                           |  1 +
 migration/migration.c                 | 30 ++++++++-----
 migration/options.c                   | 25 +++++++++++
 migration/options.h                   |  1 +
 migration/ram.c                       | 41 +++++------------
 migration/rdma.c                      | 11 ++---
 migration/rdma.h                      |  3 +-
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 9 files changed, 168 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page()
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
   collect Reviewed tags
---
 migration/ram.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
+    /* Hand over to RDMA first */
+    if (control_save_page(pss, offset, &res)) {
+        return res;
+    }
+
     if (!migrate_multifd()
         || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
         if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return ram_save_multifd_page(block, offset);
     }
 
-    if (control_save_page(pss, offset, &res)) {
-        return res;
-    }
-
     return ram_save_page(rs, pss);
 }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:46   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Depending on the order of starting RDMA and setting capability,
the following scenarios can be categorized into the following scenarios:
Source:
 S1: [set capabilities] -> [Start RDMA outgoing]
Destination:
 D1: [set capabilities] -> [Start RDMA incoming]
 D2: [Start RDMA incoming] -> [set capabilities]

Previously, compatibility between RDMA and capabilities was verified only
in scenario D1, potentially causing migration failures in other situations.

For scenarios S1 and D1, we can seamlessly incorporate
migration_transport_compatible() to address compatibility between
channels and capabilities vs transport.

For scenario D2, ensure compatibility within migrate_caps_check().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---

V4:
  - Remove Reviewed-tag and cover above D2 scenario

V3:
  - collect Reviewed tag
  - reorder: 5th -> 2nd
---
 migration/migration.c | 30 ++++++++++++++++++++----------
 migration/options.c   | 21 +++++++++++++++++++++
 migration/options.h   |  1 +
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..0736d3a6728 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,24 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
     return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+                                                Error **errp)
+{
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+        return migrate_rdma_caps_check(migrate_get_current()->capabilities,
+                                       errp);
+    }
+
+    return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
+{
+    return migration_channels_and_transport_compatible(addr, errp) &&
+           migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
     uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +734,7 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
@@ -735,14 +753,6 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
         }
 #ifdef CONFIG_RDMA
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-        if (migrate_xbzrle()) {
-            error_setg(errp, "RDMA and XBZRLE can't be used together");
-            return;
-        }
-        if (migrate_multifd()) {
-            error_setg(errp, "RDMA and multifd can't be used together");
-            return;
-        }
         rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2169,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
diff --git a/migration/options.c b/migration/options.c
index bb259d192a9..c6f18df5864 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -439,6 +439,20 @@ static bool migrate_incoming_started(void)
     return !!migration_incoming_get_current()->transport_data;
 }
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp)
+{
+    if (caps[MIGRATION_CAPABILITY_XBZRLE]) {
+        error_setg(errp, "RDMA and XBZRLE can't be used together");
+        return false;
+    }
+    if (caps[MIGRATION_CAPABILITY_MULTIFD]) {
+        error_setg(errp, "RDMA and multifd can't be used together");
+        return false;
+    }
+
+    return true;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -602,6 +616,13 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         }
     }
 
+    /*
+     * On destination side, check the cases that capability is being set
+     * after incoming thread has started.
+    */
+    if (migrate_rdma() && !migrate_rdma_caps_check(new_caps, errp)) {
+        return false;
+    }
     return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 762be4e641a..82d839709e7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -57,6 +57,7 @@ bool migrate_tls(void);
 
 /* capabilities helpers */
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp);
 bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
 
 /* parameters */
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/6] migration: disable RDMA + postcopy-ram
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  - collect Reviewed tag
  - reoder: 6th -> 3th
---
 migration/options.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index c6f18df5864..527ba05d413 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -449,6 +449,10 @@ bool migrate_rdma_caps_check(bool *caps, Error **errp)
         error_setg(errp, "RDMA and multifd can't be used together");
         return false;
     }
+    if (caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+        error_setg(errp, "RDMA and postcopy-ram can't be used together");
+        return false;
+    }
 
     return true;
 }
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (2 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:49   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  reorder: 7th->4th
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..e5b4ac599b1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,7 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3829,7 +3829,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return 0;
     }
 
@@ -3861,7 +3861,7 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
 
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return 0;
     }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (3 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:51   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

control_save_page() is for RDMA only, unfold it to make the code more
clear.
In addition:
 - Similar to other branches style in ram_save_target_page(), involve RDMA
   only if the condition 'migrate_rdma()' is true.
 - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  squash previous 2nd, 3th, 4th into one patch
---
 migration/ram.c  | 34 +++++++---------------------------
 migration/rdma.c |  7 ++-----
 migration/rdma.h |  3 +--
 3 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,32 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *        < 0 - error
- *        > 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-                              ram_addr_t offset, int *pages)
-{
-    int ret;
-
-    ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
-                                 TARGET_PAGE_SIZE);
-    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-        return false;
-    }
-
-    if (ret == RAM_SAVE_CONTROL_DELAYED) {
-        *pages = 1;
-        return true;
-    }
-    *pages = ret;
-    return true;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1965,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     int res;
 
     /* Hand over to RDMA first */
-    if (control_save_page(pss, offset, &res)) {
+    if (migrate_rdma()) {
+        res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+                                     offset, TARGET_PAGE_SIZE);
+
+        if (res == RAM_SAVE_CONTROL_DELAYED) {
+            res = 1;
+        }
         return res;
     }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index e5b4ac599b1..08eb924ffaa 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,14 +3284,11 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    if (!migrate_rdma()) {
-        return RAM_SAVE_CONTROL_NOT_SUPP;
-    }
+    assert(migrate_rdma());
 
     int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-    if (ret != RAM_SAVE_CONTROL_DELAYED &&
-        ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+    if (ret != RAM_SAVE_CONTROL_DELAYED) {
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress *host_port, Error **errp);
 #define RAM_CONTROL_ROUND     1
 #define RAM_CONTROL_FINISH    3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    return RAM_SAVE_CONTROL_NOT_SUPP;
+    g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (4 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-28 13:49   ` Fabiano Rosas
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

This qtest requires there is a RDMA(RoCE) link in the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- setup a new Soft-RoCE(aka RXE) if it's root
- detect existing RoCE link

Test will be skipped if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 MAINTAINERS                           |  1 +
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
 R: Peter Xu <peterx@redhat.com>
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang <yong.huang@smartx.com>
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 00000000000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+    ip -4 -o addr show dev "$1" |
+        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+        tr -d '\n'
+}
+
+has_soft_rdma()
+{
+    rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+    (
+        cd /sys/class/net &&
+            for i in *; do
+                [ -e "$i" ] || continue
+                [ "$i" = "lo" ] && continue
+                [ "$(<"$i/addr_len")" = 6 ] || continue
+                [ "$(<"$i/carrier")" = 1 ] || continue
+
+                has_soft_rdma "$i" && break
+                [ "$operation" = "setup" ] &&
+                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
+            done
+        has_soft_rdma "$i" || return
+        get_ipv4_addr "$i"
+    )
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+    rdma_rxe_setup_detect
+else
+    echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
     test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+    const char *argument = (geteuid() == 0) ? "setup" : "detect";
+    char cmd[1024];
+
+    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+    FILE *pipe = popen(cmd, "r");
+    if (pipe == NULL) {
+        perror("Failed to run script");
+        return -1;
+    }
+
+    int idx = 0;
+    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+        idx += strlen(buffer);
+    }
+
+    int status = pclose(pipe);
+    if (status == -1) {
+        perror("Error reported by pclose()");
+        return -1;
+    } else if (WIFEXITED(status)) {
+        return WEXITSTATUS(status);
+    }
+
+    return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+    char buffer[128] = {};
+
+    if (new_rdma_link(buffer)) {
+        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
+                    "To enable the test:\n"
+                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
+                    "or\n"
+                    "(2) Run the test with root privilege\n");
+        return;
+    }
+
+    /*
+     * TODO: query a free port instead of hard code.
+     * 29200=('R'+'D'+'M'+'A')*100
+     **/
+    g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+    MigrateCommon args = {
+        .listen_uri = uri,
+        .connect_uri = uri,
+    };
+
+    test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
                        test_multifd_tcp_uri_none);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+    migration_test_add("/migration/precopy/rdma/plain",
+                       test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
@ 2025-02-26 15:46   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:46 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:39PM +0800, Li Zhijian wrote:
> Depending on the order of starting RDMA and setting capability,
> the following scenarios can be categorized into the following scenarios:
> Source:
>  S1: [set capabilities] -> [Start RDMA outgoing]
> Destination:
>  D1: [set capabilities] -> [Start RDMA incoming]
>  D2: [Start RDMA incoming] -> [set capabilities]
> 
> Previously, compatibility between RDMA and capabilities was verified only
> in scenario D1, potentially causing migration failures in other situations.
> 
> For scenarios S1 and D1, we can seamlessly incorporate
> migration_transport_compatible() to address compatibility between
> channels and capabilities vs transport.
> 
> For scenario D2, ensure compatibility within migrate_caps_check().
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-26 15:49   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:49 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:41PM +0800, Li Zhijian wrote:
> Since we have disabled RDMA + postcopy, it's safe to remove
> the migration_in_postcopy() that follows the migrate_rdma().
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
@ 2025-02-26 15:51   ` Peter Xu
  2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:51 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
> control_save_page() is for RDMA only, unfold it to make the code more
> clear.
> In addition:
>  - Similar to other branches style in ram_save_target_page(), involve RDMA
>    only if the condition 'migrate_rdma()' is true.
>  - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

[...]

> @@ -56,7 +55,7 @@ static inline
>  int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>                             ram_addr_t offset, size_t size)
>  {
> -    return RAM_SAVE_CONTROL_NOT_SUPP;
> +    g_assert_not_reached();
>  }

Not sure if some compiler will be unhappy on the retval not provided, but
anyway we'll see..

Reviewed-by: Peter Xu <peterx@redhat.com>

>  #endif
>  #endif
> -- 
> 2.44.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26 15:51   ` Peter Xu
@ 2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
  2025-02-27 16:43       ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-27  0:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini



On 26/02/2025 23:51, Peter Xu wrote:
> On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
>> control_save_page() is for RDMA only, unfold it to make the code more
>> clear.
>> In addition:
>>   - Similar to other branches style in ram_save_target_page(), involve RDMA
>>     only if the condition 'migrate_rdma()' is true.
>>   - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> [...]
> 
>> @@ -56,7 +55,7 @@ static inline
>>   int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>>                              ram_addr_t offset, size_t size)
>>   {
>> -    return RAM_SAVE_CONTROL_NOT_SUPP;
>> +    g_assert_not_reached();
>>   }
> 
> Not sure if some compiler will be unhappy on the retval not provided, but
> anyway we'll see..

There is no problem in fedora 40(gcc 14.2.1) and ubuntu2204(gcc 11.4.0) with --disable-rdma.

I also noticed we have a few existing same usage:

1708 bool ram_write_tracking_compatible(void)
1709 {
1710     g_assert_not_reached();
1711 }
1712
1713 int ram_write_tracking_start(void)
1714 {
1715     g_assert_not_reached();
1716 }
1717
1718 void ram_write_tracking_stop(void)
1719 {
1720     g_assert_not_reached();
1721 }


I also asked the AI/Deepseek-R1, pasted a piece of his answer

```
3. Why No Warning for Missing return? 🚨
Typical case: A non-void function missing a return triggers -Wreturn-type warnings (enabled by -Wall).
This case: The noreturn annotation ensures no execution path exists beyond g_assert_not_reached(). Because the compiler recognizes this, no warning is necessary.

Conclusion
GCC trusts the noreturn annotation of g_assert_not_reached(), recognizing that the function’s control flow ends there. Thus, no warning is emitted. For code safety, ensure assertions are active or add fallback code if needed.
```

> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
>>   #endif
>>   #endif
>> -- 
>> 2.44.0
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
@ 2025-02-27 16:43       ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-27 16:43 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini

On Thu, Feb 27, 2025 at 12:42:30AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 26/02/2025 23:51, Peter Xu wrote:
> > On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
> >> control_save_page() is for RDMA only, unfold it to make the code more
> >> clear.
> >> In addition:
> >>   - Similar to other branches style in ram_save_target_page(), involve RDMA
> >>     only if the condition 'migrate_rdma()' is true.
> >>   - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
> >>
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > 
> > [...]
> > 
> >> @@ -56,7 +55,7 @@ static inline
> >>   int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
> >>                              ram_addr_t offset, size_t size)
> >>   {
> >> -    return RAM_SAVE_CONTROL_NOT_SUPP;
> >> +    g_assert_not_reached();
> >>   }
> > 
> > Not sure if some compiler will be unhappy on the retval not provided, but
> > anyway we'll see..
> 
> There is no problem in fedora 40(gcc 14.2.1) and ubuntu2204(gcc 11.4.0) with --disable-rdma.
> 
> I also noticed we have a few existing same usage:
> 
> 1708 bool ram_write_tracking_compatible(void)
> 1709 {
> 1710     g_assert_not_reached();
> 1711 }
> 1712
> 1713 int ram_write_tracking_start(void)
> 1714 {
> 1715     g_assert_not_reached();
> 1716 }
> 1717
> 1718 void ram_write_tracking_stop(void)
> 1719 {
> 1720     g_assert_not_reached();
> 1721 }

Right.

The other question is what about G_DISABLE_ASSERT, then I found this:

osdep.h:

#ifdef G_DISABLE_ASSERT
#error building with G_DISABLE_ASSERT is not supported
#endif

So yeah, we should be good.

> 
> 
> I also asked the AI/Deepseek-R1, pasted a piece of his answer
> 
> ```
> 3. Why No Warning for Missing return? 🚨
> Typical case: A non-void function missing a return triggers -Wreturn-type warnings (enabled by -Wall).
> This case: The noreturn annotation ensures no execution path exists beyond g_assert_not_reached(). Because the compiler recognizes this, no warning is necessary.
> 
> Conclusion
> GCC trusts the noreturn annotation of g_assert_not_reached(), recognizing that the function’s control flow ends there. Thus, no warning is emitted. For code safety, ensure assertions are active or add fallback code if needed.
> ```
> 
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> >>   #endif
> >>   #endif
> >> -- 
> >> 2.44.0
> >>
> > 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
@ 2025-02-28 13:49   ` Fabiano Rosas
  2025-03-03  2:10     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2025-02-28 13:49 UTC (permalink / raw)
  To: Li Zhijian via, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Li Zhijian

Li Zhijian via <qemu-devel@nongnu.org> writes:

> This qtest requires there is a RDMA(RoCE) link in the host.
> In order to make the test work smoothly, introduce a
> scripts/rdma-migration-helper.sh to
> - setup a new Soft-RoCE(aka RXE) if it's root
> - detect existing RoCE link
>
> Test will be skipped if there is no available RoCE link.
>  # Start of rdma tests
>  # Running /x86_64/migration/precopy/rdma/plain
>  ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>  There is no available rdma link to run RDMA migration test.
>  To enable the test:
>  (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test

sudo scripts/rdma-migration-helper.sh setup
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
--full -r /x86_64/migration/precopy/rdma/plain

# {
#     "error": {
#         "class": "GenericError",
#         "desc": "RDMA ERROR: rdma migration: error registering 0 control!"
#     }
# }

>  or
>  (2) Run the test with root privilege

This one works fine.

>
>  # End of rdma tests
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  MAINTAINERS                           |  1 +
>  scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
>  tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
>  3 files changed, 106 insertions(+)
>  create mode 100755 scripts/rdma-migration-helper.sh
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3848d37a38d..15360fcdc4b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
>  R: Peter Xu <peterx@redhat.com>
>  S: Odd Fixes
>  F: migration/rdma*
> +F: scripts/rdma-migration-helper.sh
>  
>  Migration dirty limit and dirty page rate
>  M: Hyman Huang <yong.huang@smartx.com>
> diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
> new file mode 100755
> index 00000000000..66557d9e267
> --- /dev/null
> +++ b/scripts/rdma-migration-helper.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash
> +

I'd prefer a command -v rdma check around here. With the way the script
pipes commands into one another will cause bash to emit a couple of
"rdma: command not found" in case rdma command is not present.

> +# Copied from blktests
> +get_ipv4_addr()
> +{
> +    ip -4 -o addr show dev "$1" |
> +        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
> +        tr -d '\n'
> +}
> +
> +has_soft_rdma()
> +{
> +    rdma link | grep -q " netdev $1[[:blank:]]*\$"
> +}
> +
> +rdma_rxe_setup_detect()
> +{
> +    (
> +        cd /sys/class/net &&
> +            for i in *; do
> +                [ -e "$i" ] || continue
> +                [ "$i" = "lo" ] && continue
> +                [ "$(<"$i/addr_len")" = 6 ] || continue
> +                [ "$(<"$i/carrier")" = 1 ] || continue
> +
> +                has_soft_rdma "$i" && break
> +                [ "$operation" = "setup" ] &&
> +                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
> +            done
> +        has_soft_rdma "$i" || return
> +        get_ipv4_addr "$i"
> +    )
> +}
> +
> +operation=${1:-setup}
> +
> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
> +    rdma_rxe_setup_detect
> +else
> +    echo "Usage: $0 [setup | detect]"
> +fi

What happened to the cleanup option? I think I missed some discussion on
this... We can't expect people to know how to clean this up without any
hint.

> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index ba273d10b9a..bf97f4e9325 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
>      test_precopy_common(&args);
>  }
>  
> +#ifdef CONFIG_RDMA
> +
> +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
> +static int new_rdma_link(char *buffer)
> +{
> +    const char *argument = (geteuid() == 0) ? "setup" : "detect";
> +    char cmd[1024];
> +
> +    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
> +
> +    FILE *pipe = popen(cmd, "r");

This needs to be silenced, otherwise messages from the script will break
TAP output. I suggest:

    bool verbose = g_getenv("QTEST_LOG");

    snprintf(cmd, sizeof(cmd), "%s %s %s", RDMA_MIGRATION_HELPER, argument,
             verbose ? "" : "2>/dev/null");

> +    if (pipe == NULL) {
> +        perror("Failed to run script");
> +        return -1;
> +    }
> +
> +    int idx = 0;
> +    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
> +        idx += strlen(buffer);
> +    }
> +
> +    int status = pclose(pipe);
> +    if (status == -1) {
> +        perror("Error reported by pclose()");
> +        return -1;
> +    } else if (WIFEXITED(status)) {
> +        return WEXITSTATUS(status);
> +    }
> +
> +    return -1;
> +}
> +
> +static void test_precopy_rdma_plain(void)
> +{
> +    char buffer[128] = {};
> +
> +    if (new_rdma_link(buffer)) {
> +        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
> +                    "To enable the test:\n"
> +                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
> +                    "or\n"
> +                    "(2) Run the test with root privilege\n");

g_test_skip() needs a one-line message, otherwise it breaks TAP
output. You can turn this into a g_test_message(), put it under
QTEST_LOG=1 and add a g_test_skip("no rdma link available") below.

> +        return;
> +    }
> +
> +    /*
> +     * TODO: query a free port instead of hard code.
> +     * 29200=('R'+'D'+'M'+'A')*100
> +     **/
> +    g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
> +
> +    MigrateCommon args = {
> +        .listen_uri = uri,
> +        .connect_uri = uri,
> +    };
> +
> +    test_precopy_common(&args);
> +}
> +#endif
> +
>  static void test_precopy_tcp_plain(void)
>  {
>      MigrateCommon args = {
> @@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
>                         test_multifd_tcp_uri_none);
>      migration_test_add("/migration/multifd/tcp/plain/cancel",
>                         test_multifd_tcp_cancel);
> +#ifdef CONFIG_RDMA
> +    migration_test_add("/migration/precopy/rdma/plain",
> +                       test_precopy_rdma_plain);
> +#endif
>  }
>  
>  void migration_test_add_precopy(MigrationTestEnv *env)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-28 13:49   ` Fabiano Rosas
@ 2025-03-03  2:10     ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 14+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-03-03  2:10 UTC (permalink / raw)
  To: Fabiano Rosas, Li Zhijian via; +Cc: Peter Xu, Laurent Vivier, Paolo Bonzini

Fabiano

Thanks for your testing.

On 28/02/2025 21:49, Fabiano Rosas wrote:
> Li Zhijian via <qemu-devel@nongnu.org> writes:
> 
>> This qtest requires there is a RDMA(RoCE) link in the host.
>> In order to make the test work smoothly, introduce a
>> scripts/rdma-migration-helper.sh to
>> - setup a new Soft-RoCE(aka RXE) if it's root
>> - detect existing RoCE link
>>
>> Test will be skipped if there is no available RoCE link.
>>   # Start of rdma tests
>>   # Running /x86_64/migration/precopy/rdma/plain
>>   ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>>   There is no available rdma link to run RDMA migration test.
>>   To enable the test:
>>   (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
> 
> sudo scripts/rdma-migration-helper.sh setup
> QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
> --full -r /x86_64/migration/precopy/rdma/plain
> 
> # {
> #     "error": {
> #         "class": "GenericError",
> #         "desc": "RDMA ERROR: rdma migration: error registering 0 control!"
> #     }
> # }
> 


1333 static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
1334 {
1335     rdma->wr_data[idx].control_mr = ibv_reg_mr(rdma->pd,
1336             rdma->wr_data[idx].control, RDMA_CONTROL_MAX_BUFFER,
1337             IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);  <<<=== It failed here
            
1338     if (rdma->wr_data[idx].control_mr) {
1339         rdma->total_registrations++;
1340         return 0;
1341     }
1342     return -1;
1343 }


It appears to have failed at ibv_reg_mr()
  
This worked on my Ubuntu2204 and Fedora40. I wonder if your distro's security policy
is preventing MR registration without root privileges...?




>>   or
>>   (2) Run the test with root privilege
> 
> This one works fine.
> 
>>
>>   # End of rdma tests
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   MAINTAINERS                           |  1 +
>>   scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
>>   tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
>>   3 files changed, 106 insertions(+)
>>   create mode 100755 scripts/rdma-migration-helper.sh
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 3848d37a38d..15360fcdc4b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
>>   R: Peter Xu <peterx@redhat.com>
>>   S: Odd Fixes
>>   F: migration/rdma*
>> +F: scripts/rdma-migration-helper.sh
>>   
>>   Migration dirty limit and dirty page rate
>>   M: Hyman Huang <yong.huang@smartx.com>
>> diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
>> new file mode 100755
>> index 00000000000..66557d9e267
>> --- /dev/null
>> +++ b/scripts/rdma-migration-helper.sh
>> @@ -0,0 +1,41 @@
>> +#!/bin/bash
>> +
> 
> I'd prefer a command -v rdma check around here. With the way the script
> pipes commands into one another will cause bash to emit a couple of
> "rdma: command not found" in case rdma command is not present.
> 

It sounds good to me.


>> +# Copied from blktests
>> +get_ipv4_addr()
>> +{
>> +    ip -4 -o addr show dev "$1" |
>> +        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
>> +        tr -d '\n'
>> +}
>> +
>> +has_soft_rdma()
>> +{
>> +    rdma link | grep -q " netdev $1[[:blank:]]*\$"
>> +}
>> +
>> +rdma_rxe_setup_detect()
>> +{
>> +    (
>> +        cd /sys/class/net &&
>> +            for i in *; do
>> +                [ -e "$i" ] || continue
>> +                [ "$i" = "lo" ] && continue
>> +                [ "$(<"$i/addr_len")" = 6 ] || continue
>> +                [ "$(<"$i/carrier")" = 1 ] || continue
>> +
>> +                has_soft_rdma "$i" && break
>> +                [ "$operation" = "setup" ] &&
>> +                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
>> +            done
>> +        has_soft_rdma "$i" || return
>> +        get_ipv4_addr "$i"
>> +    )
>> +}
>> +
>> +operation=${1:-setup}
>> +
>> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
>> +    rdma_rxe_setup_detect
>> +else
>> +    echo "Usage: $0 [setup | detect]"
>> +fi
> 
> What happened to the cleanup option? I think I missed some discussion on
> this... We can't expect people to know how to clean this up without any
> hint.

Nothing special, one reason could be to keep it as simple as possible in the beginning.

I'm fine to add it back.


> 
>> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
>> index ba273d10b9a..bf97f4e9325 100644
>> --- a/tests/qtest/migration/precopy-tests.c
>> +++ b/tests/qtest/migration/precopy-tests.c
>> @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
>>       test_precopy_common(&args);
>>   }
>>   
>> +#ifdef CONFIG_RDMA
>> +
>> +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
>> +static int new_rdma_link(char *buffer)
>> +{
>> +    const char *argument = (geteuid() == 0) ? "setup" : "detect";
>> +    char cmd[1024];
>> +
>> +    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
>> +
>> +    FILE *pipe = popen(cmd, "r");
> 
> This needs to be silenced, otherwise messages from the script will break
> TAP output. I suggest:
> 
>      bool verbose = g_getenv("QTEST_LOG");
> 
>      snprintf(cmd, sizeof(cmd), "%s %s %s", RDMA_MIGRATION_HELPER, argument,
>               verbose ? "" : "2>/dev/null");
> 

It sound good to me, i will update it.


>> +    if (pipe == NULL) {
>> +        perror("Failed to run script");
>> +        return -1;
>> +    }
>> +
>> +    int idx = 0;
>> +    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
>> +        idx += strlen(buffer);
>> +    }
>> +
>> +    int status = pclose(pipe);
>> +    if (status == -1) {
>> +        perror("Error reported by pclose()");
>> +        return -1;
>> +    } else if (WIFEXITED(status)) {
>> +        return WEXITSTATUS(status);
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> +static void test_precopy_rdma_plain(void)
>> +{
>> +    char buffer[128] = {};
>> +
>> +    if (new_rdma_link(buffer)) {
>> +        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
>> +                    "To enable the test:\n"
>> +                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
>> +                    "or\n"
>> +                    "(2) Run the test with root privilege\n");
> 
> g_test_skip() needs a one-line message, otherwise it breaks TAP
> output. You can turn this into a g_test_message(), put it under
> QTEST_LOG=1 and add a g_test_skip("no rdma link available") below.


Ditto.

Thanks
Zhijian

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-03-03  2:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
2025-02-26 15:46   ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
2025-02-26 15:49   ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
2025-02-26 15:51   ` Peter Xu
2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
2025-02-27 16:43       ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
2025-02-28 13:49   ` Fabiano Rosas
2025-03-03  2:10     ` Zhijian Li (Fujitsu) via

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).