All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup
@ 2025-02-26  6:30 Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

- It fix the RDMA migration broken issue
- disable RDMA + postcopy
- some cleanups
- Add a qtest for RDMA at last

Changes since V3:
- check RDMA and capabilities are compatible on both sides # renamed from
  previous V3's "migration: Add migration_capabilities_and_transport_compatible()"

Changes since V2:
- squash previous 2/3/4 to '[PATCH v3 5/6] migration: Unfold  control_save_page()'
- reorder the patch layout to prevent recently added code from being deleted again.
- collect Reviewed tags from Peter

Changes since V1[0]:
Add some saparate patches to refactor and cleanup based on V1

[0] https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhijian@fujitsu.com/


Li Zhijian (6):
  migration: Prioritize RDMA in ram_save_target_page()
  migration: check RDMA and capabilities are compatible on both sides
  migration: disable RDMA + postcopy-ram
  migration/rdma: Remove redundant migration_in_postcopy checks
  migration: Unfold control_save_page()
  migration: Add qtest for migration over RDMA

 MAINTAINERS                           |  1 +
 migration/migration.c                 | 30 ++++++++-----
 migration/options.c                   | 25 +++++++++++
 migration/options.h                   |  1 +
 migration/ram.c                       | 41 +++++------------
 migration/rdma.c                      | 11 ++---
 migration/rdma.h                      |  3 +-
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 9 files changed, 168 insertions(+), 49 deletions(-)
 create mode 100755 scripts/rdma-migration-helper.sh

-- 
2.44.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page()
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
   collect Reviewed tags
---
 migration/ram.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb2..424df6d9f13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
+    /* Hand over to RDMA first */
+    if (control_save_page(pss, offset, &res)) {
+        return res;
+    }
+
     if (!migrate_multifd()
         || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
         if (save_zero_page(rs, pss, offset)) {
@@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return ram_save_multifd_page(block, offset);
     }
 
-    if (control_save_page(pss, offset, &res)) {
-        return res;
-    }
-
     return ram_save_page(rs, pss);
 }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:46   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Depending on the order of starting RDMA and setting capability,
the following scenarios can be categorized into the following scenarios:
Source:
 S1: [set capabilities] -> [Start RDMA outgoing]
Destination:
 D1: [set capabilities] -> [Start RDMA incoming]
 D2: [Start RDMA incoming] -> [set capabilities]

Previously, compatibility between RDMA and capabilities was verified only
in scenario D1, potentially causing migration failures in other situations.

For scenarios S1 and D1, we can seamlessly incorporate
migration_transport_compatible() to address compatibility between
channels and capabilities vs transport.

For scenario D2, ensure compatibility within migrate_caps_check().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---

V4:
  - Remove Reviewed-tag and cover above D2 scenario

V3:
  - collect Reviewed tag
  - reorder: 5th -> 2nd
---
 migration/migration.c | 30 ++++++++++++++++++++----------
 migration/options.c   | 21 +++++++++++++++++++++
 migration/options.h   |  1 +
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e5..0736d3a6728 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -238,6 +238,24 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
     return true;
 }
 
+static bool
+migration_capabilities_and_transport_compatible(MigrationAddress *addr,
+                                                Error **errp)
+{
+    if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
+        return migrate_rdma_caps_check(migrate_get_current()->capabilities,
+                                       errp);
+    }
+
+    return true;
+}
+
+static bool migration_transport_compatible(MigrationAddress *addr, Error **errp)
+{
+    return migration_channels_and_transport_compatible(addr, errp) &&
+           migration_capabilities_and_transport_compatible(addr, errp);
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
     uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -716,7 +734,7 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
@@ -735,14 +753,6 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
         }
 #ifdef CONFIG_RDMA
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_RDMA) {
-        if (migrate_xbzrle()) {
-            error_setg(errp, "RDMA and XBZRLE can't be used together");
-            return;
-        }
-        if (migrate_multifd()) {
-            error_setg(errp, "RDMA and multifd can't be used together");
-            return;
-        }
         rdma_start_incoming_migration(&addr->u.rdma, errp);
 #endif
     } else if (addr->transport == MIGRATION_ADDRESS_TYPE_EXEC) {
@@ -2159,7 +2169,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     }
 
     /* transport mechanism not suitable for migration? */
-    if (!migration_channels_and_transport_compatible(addr, errp)) {
+    if (!migration_transport_compatible(addr, errp)) {
         return;
     }
 
diff --git a/migration/options.c b/migration/options.c
index bb259d192a9..c6f18df5864 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -439,6 +439,20 @@ static bool migrate_incoming_started(void)
     return !!migration_incoming_get_current()->transport_data;
 }
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp)
+{
+    if (caps[MIGRATION_CAPABILITY_XBZRLE]) {
+        error_setg(errp, "RDMA and XBZRLE can't be used together");
+        return false;
+    }
+    if (caps[MIGRATION_CAPABILITY_MULTIFD]) {
+        error_setg(errp, "RDMA and multifd can't be used together");
+        return false;
+    }
+
+    return true;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
@@ -602,6 +616,13 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         }
     }
 
+    /*
+     * On destination side, check the cases that capability is being set
+     * after incoming thread has started.
+    */
+    if (migrate_rdma() && !migrate_rdma_caps_check(new_caps, errp)) {
+        return false;
+    }
     return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 762be4e641a..82d839709e7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -57,6 +57,7 @@ bool migrate_tls(void);
 
 /* capabilities helpers */
 
+bool migrate_rdma_caps_check(bool *caps, Error **errp);
 bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
 
 /* parameters */
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/6] migration: disable RDMA + postcopy-ram
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  - collect Reviewed tag
  - reoder: 6th -> 3th
---
 migration/options.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index c6f18df5864..527ba05d413 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -449,6 +449,10 @@ bool migrate_rdma_caps_check(bool *caps, Error **errp)
         error_setg(errp, "RDMA and multifd can't be used together");
         return false;
     }
+    if (caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+        error_setg(errp, "RDMA and postcopy-ram can't be used together");
+        return false;
+    }
 
     return true;
 }
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (2 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:49   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  reorder: 7th->4th
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76fb0349238..e5b4ac599b1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,7 +3284,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3829,7 +3829,7 @@ int rdma_block_notification_handle(QEMUFile *f, const char *name)
 
 int rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return 0;
     }
 
@@ -3861,7 +3861,7 @@ int rdma_registration_stop(QEMUFile *f, uint64_t flags)
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
 
-    if (!migrate_rdma() || migration_in_postcopy()) {
+    if (!migrate_rdma()) {
         return 0;
     }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (3 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-26 15:51   ` Peter Xu
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

control_save_page() is for RDMA only, unfold it to make the code more
clear.
In addition:
 - Similar to other branches style in ram_save_target_page(), involve RDMA
   only if the condition 'migrate_rdma()' is true.
 - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3:
  squash previous 2nd, 3th, 4th into one patch
---
 migration/ram.c  | 34 +++++++---------------------------
 migration/rdma.c |  7 ++-----
 migration/rdma.h |  3 +--
 3 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f13..c363034c882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,32 +1143,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     return len;
 }
 
-/*
- * @pages: the number of pages written by the control path,
- *        < 0 - error
- *        > 0 - number of pages written
- *
- * Return true if the pages has been saved, otherwise false is returned.
- */
-static bool control_save_page(PageSearchStatus *pss,
-                              ram_addr_t offset, int *pages)
-{
-    int ret;
-
-    ret = rdma_control_save_page(pss->pss_channel, pss->block->offset, offset,
-                                 TARGET_PAGE_SIZE);
-    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
-        return false;
-    }
-
-    if (ret == RAM_SAVE_CONTROL_DELAYED) {
-        *pages = 1;
-        return true;
-    }
-    *pages = ret;
-    return true;
-}
-
 /*
  * directly send the page to the stream
  *
@@ -1965,7 +1939,13 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     int res;
 
     /* Hand over to RDMA first */
-    if (control_save_page(pss, offset, &res)) {
+    if (migrate_rdma()) {
+        res = rdma_control_save_page(pss->pss_channel, pss->block->offset,
+                                     offset, TARGET_PAGE_SIZE);
+
+        if (res == RAM_SAVE_CONTROL_DELAYED) {
+            res = 1;
+        }
         return res;
     }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index e5b4ac599b1..08eb924ffaa 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3284,14 +3284,11 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    if (!migrate_rdma()) {
-        return RAM_SAVE_CONTROL_NOT_SUPP;
-    }
+    assert(migrate_rdma());
 
     int ret = qemu_rdma_save_page(f, block_offset, offset, size);
 
-    if (ret != RAM_SAVE_CONTROL_DELAYED &&
-        ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+    if (ret != RAM_SAVE_CONTROL_DELAYED) {
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
diff --git a/migration/rdma.h b/migration/rdma.h
index f55f28bbed1..8eeb0117b91 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,7 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress *host_port, Error **errp);
 #define RAM_CONTROL_ROUND     1
 #define RAM_CONTROL_FINISH    3
 
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
@@ -56,7 +55,7 @@ static inline
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
                            ram_addr_t offset, size_t size)
 {
-    return RAM_SAVE_CONTROL_NOT_SUPP;
+    g_assert_not_reached();
 }
 #endif
 #endif
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
                   ` (4 preceding siblings ...)
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
@ 2025-02-26  6:30 ` Li Zhijian via
  2025-02-28 13:49   ` Fabiano Rosas
  5 siblings, 1 reply; 14+ messages in thread
From: Li Zhijian via @ 2025-02-26  6:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Laurent Vivier, Paolo Bonzini,
	Li Zhijian

This qtest requires there is a RDMA(RoCE) link in the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- setup a new Soft-RoCE(aka RXE) if it's root
- detect existing RoCE link

Test will be skipped if there is no available RoCE link.
 # Start of rdma tests
 # Running /x86_64/migration/precopy/rdma/plain
 ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
 There is no available rdma link to run RDMA migration test.
 To enable the test:
 (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
 or
 (2) Run the test with root privilege

 # End of rdma tests

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 MAINTAINERS                           |  1 +
 scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
 tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)
 create mode 100755 scripts/rdma-migration-helper.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3848d37a38d..15360fcdc4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
 R: Peter Xu <peterx@redhat.com>
 S: Odd Fixes
 F: migration/rdma*
+F: scripts/rdma-migration-helper.sh
 
 Migration dirty limit and dirty page rate
 M: Hyman Huang <yong.huang@smartx.com>
diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
new file mode 100755
index 00000000000..66557d9e267
--- /dev/null
+++ b/scripts/rdma-migration-helper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# Copied from blktests
+get_ipv4_addr()
+{
+    ip -4 -o addr show dev "$1" |
+        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
+        tr -d '\n'
+}
+
+has_soft_rdma()
+{
+    rdma link | grep -q " netdev $1[[:blank:]]*\$"
+}
+
+rdma_rxe_setup_detect()
+{
+    (
+        cd /sys/class/net &&
+            for i in *; do
+                [ -e "$i" ] || continue
+                [ "$i" = "lo" ] && continue
+                [ "$(<"$i/addr_len")" = 6 ] || continue
+                [ "$(<"$i/carrier")" = 1 ] || continue
+
+                has_soft_rdma "$i" && break
+                [ "$operation" = "setup" ] &&
+                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
+            done
+        has_soft_rdma "$i" || return
+        get_ipv4_addr "$i"
+    )
+}
+
+operation=${1:-setup}
+
+if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
+    rdma_rxe_setup_detect
+else
+    echo "Usage: $0 [setup | detect]"
+fi
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index ba273d10b9a..bf97f4e9325 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
     test_precopy_common(&args);
 }
 
+#ifdef CONFIG_RDMA
+
+#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
+static int new_rdma_link(char *buffer)
+{
+    const char *argument = (geteuid() == 0) ? "setup" : "detect";
+    char cmd[1024];
+
+    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
+
+    FILE *pipe = popen(cmd, "r");
+    if (pipe == NULL) {
+        perror("Failed to run script");
+        return -1;
+    }
+
+    int idx = 0;
+    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
+        idx += strlen(buffer);
+    }
+
+    int status = pclose(pipe);
+    if (status == -1) {
+        perror("Error reported by pclose()");
+        return -1;
+    } else if (WIFEXITED(status)) {
+        return WEXITSTATUS(status);
+    }
+
+    return -1;
+}
+
+static void test_precopy_rdma_plain(void)
+{
+    char buffer[128] = {};
+
+    if (new_rdma_link(buffer)) {
+        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
+                    "To enable the test:\n"
+                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
+                    "or\n"
+                    "(2) Run the test with root privilege\n");
+        return;
+    }
+
+    /*
+     * TODO: query a free port instead of hard code.
+     * 29200=('R'+'D'+'M'+'A')*100
+     **/
+    g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
+
+    MigrateCommon args = {
+        .listen_uri = uri,
+        .connect_uri = uri,
+    };
+
+    test_precopy_common(&args);
+}
+#endif
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
                        test_multifd_tcp_uri_none);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
+#ifdef CONFIG_RDMA
+    migration_test_add("/migration/precopy/rdma/plain",
+                       test_precopy_rdma_plain);
+#endif
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides
  2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
@ 2025-02-26 15:46   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:46 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:39PM +0800, Li Zhijian wrote:
> Depending on the order of starting RDMA and setting capability,
> the following scenarios can be categorized into the following scenarios:
> Source:
>  S1: [set capabilities] -> [Start RDMA outgoing]
> Destination:
>  D1: [set capabilities] -> [Start RDMA incoming]
>  D2: [Start RDMA incoming] -> [set capabilities]
> 
> Previously, compatibility between RDMA and capabilities was verified only
> in scenario D1, potentially causing migration failures in other situations.
> 
> For scenarios S1 and D1, we can seamlessly incorporate
> migration_transport_compatible() to address compatibility between
> channels and capabilities vs transport.
> 
> For scenario D2, ensure compatibility within migrate_caps_check().
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks
  2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
@ 2025-02-26 15:49   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:49 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:41PM +0800, Li Zhijian wrote:
> Since we have disabled RDMA + postcopy, it's safe to remove
> the migration_in_postcopy() that follows the migrate_rdma().
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
@ 2025-02-26 15:51   ` Peter Xu
  2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-02-26 15:51 UTC (permalink / raw)
  To: Li Zhijian; +Cc: qemu-devel, Fabiano Rosas, Laurent Vivier, Paolo Bonzini

On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
> control_save_page() is for RDMA only, unfold it to make the code more
> clear.
> In addition:
>  - Similar to other branches style in ram_save_target_page(), involve RDMA
>    only if the condition 'migrate_rdma()' is true.
>  - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

[...]

> @@ -56,7 +55,7 @@ static inline
>  int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>                             ram_addr_t offset, size_t size)
>  {
> -    return RAM_SAVE_CONTROL_NOT_SUPP;
> +    g_assert_not_reached();
>  }

Not sure if some compiler will be unhappy on the retval not provided, but
anyway we'll see..

Reviewed-by: Peter Xu <peterx@redhat.com>

>  #endif
>  #endif
> -- 
> 2.44.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-26 15:51   ` Peter Xu
@ 2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
  2025-02-27 16:43       ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-27  0:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini



On 26/02/2025 23:51, Peter Xu wrote:
> On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
>> control_save_page() is for RDMA only, unfold it to make the code more
>> clear.
>> In addition:
>>   - Similar to other branches style in ram_save_target_page(), involve RDMA
>>     only if the condition 'migrate_rdma()' is true.
>>   - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> [...]
> 
>> @@ -56,7 +55,7 @@ static inline
>>   int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>>                              ram_addr_t offset, size_t size)
>>   {
>> -    return RAM_SAVE_CONTROL_NOT_SUPP;
>> +    g_assert_not_reached();
>>   }
> 
> Not sure if some compiler will be unhappy on the retval not provided, but
> anyway we'll see..

There is no problem in fedora 40(gcc 14.2.1) and ubuntu2204(gcc 11.4.0) with --disable-rdma.

I also noticed we have a few existing same usage:

1708 bool ram_write_tracking_compatible(void)
1709 {
1710     g_assert_not_reached();
1711 }
1712
1713 int ram_write_tracking_start(void)
1714 {
1715     g_assert_not_reached();
1716 }
1717
1718 void ram_write_tracking_stop(void)
1719 {
1720     g_assert_not_reached();
1721 }


I also asked the AI/Deepseek-R1, pasted a piece of his answer

```
3. Why No Warning for Missing return? 🚨
Typical case: A non-void function missing a return triggers -Wreturn-type warnings (enabled by -Wall).
This case: The noreturn annotation ensures no execution path exists beyond g_assert_not_reached(). Because the compiler recognizes this, no warning is necessary.

Conclusion
GCC trusts the noreturn annotation of g_assert_not_reached(), recognizing that the function’s control flow ends there. Thus, no warning is emitted. For code safety, ensure assertions are active or add fallback code if needed.
```

> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
>>   #endif
>>   #endif
>> -- 
>> 2.44.0
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/6] migration: Unfold control_save_page()
  2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
@ 2025-02-27 16:43       ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2025-02-27 16:43 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Laurent Vivier,
	Paolo Bonzini

On Thu, Feb 27, 2025 at 12:42:30AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 26/02/2025 23:51, Peter Xu wrote:
> > On Wed, Feb 26, 2025 at 02:30:42PM +0800, Li Zhijian wrote:
> >> control_save_page() is for RDMA only, unfold it to make the code more
> >> clear.
> >> In addition:
> >>   - Similar to other branches style in ram_save_target_page(), involve RDMA
> >>     only if the condition 'migrate_rdma()' is true.
> >>   - Further simplify the code by removing the RAM_SAVE_CONTROL_NOT_SUPP.
> >>
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > 
> > [...]
> > 
> >> @@ -56,7 +55,7 @@ static inline
> >>   int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
> >>                              ram_addr_t offset, size_t size)
> >>   {
> >> -    return RAM_SAVE_CONTROL_NOT_SUPP;
> >> +    g_assert_not_reached();
> >>   }
> > 
> > Not sure if some compiler will be unhappy on the retval not provided, but
> > anyway we'll see..
> 
> There is no problem in fedora 40(gcc 14.2.1) and ubuntu2204(gcc 11.4.0) with --disable-rdma.
> 
> I also noticed we have a few existing same usage:
> 
> 1708 bool ram_write_tracking_compatible(void)
> 1709 {
> 1710     g_assert_not_reached();
> 1711 }
> 1712
> 1713 int ram_write_tracking_start(void)
> 1714 {
> 1715     g_assert_not_reached();
> 1716 }
> 1717
> 1718 void ram_write_tracking_stop(void)
> 1719 {
> 1720     g_assert_not_reached();
> 1721 }

Right.

The other question is what about G_DISABLE_ASSERT, then I found this:

osdep.h:

#ifdef G_DISABLE_ASSERT
#error building with G_DISABLE_ASSERT is not supported
#endif

So yeah, we should be good.

> 
> 
> I also asked the AI/Deepseek-R1, pasted a piece of his answer
> 
> ```
> 3. Why No Warning for Missing return? 🚨
> Typical case: A non-void function missing a return triggers -Wreturn-type warnings (enabled by -Wall).
> This case: The noreturn annotation ensures no execution path exists beyond g_assert_not_reached(). Because the compiler recognizes this, no warning is necessary.
> 
> Conclusion
> GCC trusts the noreturn annotation of g_assert_not_reached(), recognizing that the function’s control flow ends there. Thus, no warning is emitted. For code safety, ensure assertions are active or add fallback code if needed.
> ```
> 
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> >>   #endif
> >>   #endif
> >> -- 
> >> 2.44.0
> >>
> > 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
@ 2025-02-28 13:49   ` Fabiano Rosas
  2025-03-03  2:10     ` Zhijian Li (Fujitsu) via
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2025-02-28 13:49 UTC (permalink / raw)
  To: Li Zhijian via, qemu-devel
  Cc: Peter Xu, Laurent Vivier, Paolo Bonzini, Li Zhijian

Li Zhijian via <qemu-devel@nongnu.org> writes:

> This qtest requires there is a RDMA(RoCE) link in the host.
> In order to make the test work smoothly, introduce a
> scripts/rdma-migration-helper.sh to
> - setup a new Soft-RoCE(aka RXE) if it's root
> - detect existing RoCE link
>
> Test will be skipped if there is no available RoCE link.
>  # Start of rdma tests
>  # Running /x86_64/migration/precopy/rdma/plain
>  ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>  There is no available rdma link to run RDMA migration test.
>  To enable the test:
>  (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test

sudo scripts/rdma-migration-helper.sh setup
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
--full -r /x86_64/migration/precopy/rdma/plain

# {
#     "error": {
#         "class": "GenericError",
#         "desc": "RDMA ERROR: rdma migration: error registering 0 control!"
#     }
# }

>  or
>  (2) Run the test with root privilege

This one works fine.

>
>  # End of rdma tests
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  MAINTAINERS                           |  1 +
>  scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
>  tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
>  3 files changed, 106 insertions(+)
>  create mode 100755 scripts/rdma-migration-helper.sh
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3848d37a38d..15360fcdc4b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
>  R: Peter Xu <peterx@redhat.com>
>  S: Odd Fixes
>  F: migration/rdma*
> +F: scripts/rdma-migration-helper.sh
>  
>  Migration dirty limit and dirty page rate
>  M: Hyman Huang <yong.huang@smartx.com>
> diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
> new file mode 100755
> index 00000000000..66557d9e267
> --- /dev/null
> +++ b/scripts/rdma-migration-helper.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash
> +

I'd prefer a command -v rdma check around here. With the way the script
pipes commands into one another will cause bash to emit a couple of
"rdma: command not found" in case rdma command is not present.

> +# Copied from blktests
> +get_ipv4_addr()
> +{
> +    ip -4 -o addr show dev "$1" |
> +        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
> +        tr -d '\n'
> +}
> +
> +has_soft_rdma()
> +{
> +    rdma link | grep -q " netdev $1[[:blank:]]*\$"
> +}
> +
> +rdma_rxe_setup_detect()
> +{
> +    (
> +        cd /sys/class/net &&
> +            for i in *; do
> +                [ -e "$i" ] || continue
> +                [ "$i" = "lo" ] && continue
> +                [ "$(<"$i/addr_len")" = 6 ] || continue
> +                [ "$(<"$i/carrier")" = 1 ] || continue
> +
> +                has_soft_rdma "$i" && break
> +                [ "$operation" = "setup" ] &&
> +                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
> +            done
> +        has_soft_rdma "$i" || return
> +        get_ipv4_addr "$i"
> +    )
> +}
> +
> +operation=${1:-setup}
> +
> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
> +    rdma_rxe_setup_detect
> +else
> +    echo "Usage: $0 [setup | detect]"
> +fi

What happened to the cleanup option? I think I missed some discussion on
this... We can't expect people to know how to clean this up without any
hint.

> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index ba273d10b9a..bf97f4e9325 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
>      test_precopy_common(&args);
>  }
>  
> +#ifdef CONFIG_RDMA
> +
> +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
> +static int new_rdma_link(char *buffer)
> +{
> +    const char *argument = (geteuid() == 0) ? "setup" : "detect";
> +    char cmd[1024];
> +
> +    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
> +
> +    FILE *pipe = popen(cmd, "r");

This needs to be silenced, otherwise messages from the script will break
TAP output. I suggest:

    bool verbose = g_getenv("QTEST_LOG");

    snprintf(cmd, sizeof(cmd), "%s %s %s", RDMA_MIGRATION_HELPER, argument,
             verbose ? "" : "2>/dev/null");

> +    if (pipe == NULL) {
> +        perror("Failed to run script");
> +        return -1;
> +    }
> +
> +    int idx = 0;
> +    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
> +        idx += strlen(buffer);
> +    }
> +
> +    int status = pclose(pipe);
> +    if (status == -1) {
> +        perror("Error reported by pclose()");
> +        return -1;
> +    } else if (WIFEXITED(status)) {
> +        return WEXITSTATUS(status);
> +    }
> +
> +    return -1;
> +}
> +
> +static void test_precopy_rdma_plain(void)
> +{
> +    char buffer[128] = {};
> +
> +    if (new_rdma_link(buffer)) {
> +        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
> +                    "To enable the test:\n"
> +                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
> +                    "or\n"
> +                    "(2) Run the test with root privilege\n");

g_test_skip() needs a one-line message, otherwise it breaks TAP
output. You can turn this into a g_test_message(), put it under
QTEST_LOG=1 and add a g_test_skip("no rdma link available") below.

> +        return;
> +    }
> +
> +    /*
> +     * TODO: query a free port instead of hard code.
> +     * 29200=('R'+'D'+'M'+'A')*100
> +     **/
> +    g_autofree char *uri = g_strdup_printf("rdma:%s:29200", buffer);
> +
> +    MigrateCommon args = {
> +        .listen_uri = uri,
> +        .connect_uri = uri,
> +    };
> +
> +    test_precopy_common(&args);
> +}
> +#endif
> +
>  static void test_precopy_tcp_plain(void)
>  {
>      MigrateCommon args = {
> @@ -1124,6 +1184,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
>                         test_multifd_tcp_uri_none);
>      migration_test_add("/migration/multifd/tcp/plain/cancel",
>                         test_multifd_tcp_cancel);
> +#ifdef CONFIG_RDMA
> +    migration_test_add("/migration/precopy/rdma/plain",
> +                       test_precopy_rdma_plain);
> +#endif
>  }
>  
>  void migration_test_add_precopy(MigrationTestEnv *env)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 6/6] migration: Add qtest for migration over RDMA
  2025-02-28 13:49   ` Fabiano Rosas
@ 2025-03-03  2:10     ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 14+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-03-03  2:10 UTC (permalink / raw)
  To: Fabiano Rosas, Li Zhijian via; +Cc: Peter Xu, Laurent Vivier, Paolo Bonzini

Fabiano

Thanks for your testing.

On 28/02/2025 21:49, Fabiano Rosas wrote:
> Li Zhijian via <qemu-devel@nongnu.org> writes:
> 
>> This qtest requires there is a RDMA(RoCE) link in the host.
>> In order to make the test work smoothly, introduce a
>> scripts/rdma-migration-helper.sh to
>> - setup a new Soft-RoCE(aka RXE) if it's root
>> - detect existing RoCE link
>>
>> Test will be skipped if there is no available RoCE link.
>>   # Start of rdma tests
>>   # Running /x86_64/migration/precopy/rdma/plain
>>   ok 1 /x86_64/migration/precopy/rdma/plain # SKIP
>>   There is no available rdma link to run RDMA migration test.
>>   To enable the test:
>>   (1) Run 'scripts/rdma-migration-helper.sh setup' with root and rerun the test
> 
> sudo scripts/rdma-migration-helper.sh setup
> QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
> --full -r /x86_64/migration/precopy/rdma/plain
> 
> # {
> #     "error": {
> #         "class": "GenericError",
> #         "desc": "RDMA ERROR: rdma migration: error registering 0 control!"
> #     }
> # }
> 


1333 static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
1334 {
1335     rdma->wr_data[idx].control_mr = ibv_reg_mr(rdma->pd,
1336             rdma->wr_data[idx].control, RDMA_CONTROL_MAX_BUFFER,
1337             IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);  <<<=== It failed here
            
1338     if (rdma->wr_data[idx].control_mr) {
1339         rdma->total_registrations++;
1340         return 0;
1341     }
1342     return -1;
1343 }


It appears to have failed at ibv_reg_mr()
  
This worked on my Ubuntu2204 and Fedora40. I wonder if your distro's security policy
is preventing MR registration without root privileges...?




>>   or
>>   (2) Run the test with root privilege
> 
> This one works fine.
> 
>>
>>   # End of rdma tests
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   MAINTAINERS                           |  1 +
>>   scripts/rdma-migration-helper.sh      | 41 +++++++++++++++++
>>   tests/qtest/migration/precopy-tests.c | 64 +++++++++++++++++++++++++++
>>   3 files changed, 106 insertions(+)
>>   create mode 100755 scripts/rdma-migration-helper.sh
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 3848d37a38d..15360fcdc4b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3480,6 +3480,7 @@ R: Li Zhijian <lizhijian@fujitsu.com>
>>   R: Peter Xu <peterx@redhat.com>
>>   S: Odd Fixes
>>   F: migration/rdma*
>> +F: scripts/rdma-migration-helper.sh
>>   
>>   Migration dirty limit and dirty page rate
>>   M: Hyman Huang <yong.huang@smartx.com>
>> diff --git a/scripts/rdma-migration-helper.sh b/scripts/rdma-migration-helper.sh
>> new file mode 100755
>> index 00000000000..66557d9e267
>> --- /dev/null
>> +++ b/scripts/rdma-migration-helper.sh
>> @@ -0,0 +1,41 @@
>> +#!/bin/bash
>> +
> 
> I'd prefer a command -v rdma check around here. With the way the script
> pipes commands into one another will cause bash to emit a couple of
> "rdma: command not found" in case rdma command is not present.
> 

It sounds good to me.


>> +# Copied from blktests
>> +get_ipv4_addr()
>> +{
>> +    ip -4 -o addr show dev "$1" |
>> +        sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p' |
>> +        tr -d '\n'
>> +}
>> +
>> +has_soft_rdma()
>> +{
>> +    rdma link | grep -q " netdev $1[[:blank:]]*\$"
>> +}
>> +
>> +rdma_rxe_setup_detect()
>> +{
>> +    (
>> +        cd /sys/class/net &&
>> +            for i in *; do
>> +                [ -e "$i" ] || continue
>> +                [ "$i" = "lo" ] && continue
>> +                [ "$(<"$i/addr_len")" = 6 ] || continue
>> +                [ "$(<"$i/carrier")" = 1 ] || continue
>> +
>> +                has_soft_rdma "$i" && break
>> +                [ "$operation" = "setup" ] &&
>> +                    rdma link add "${i}_rxe" type rxe netdev "$i" && break
>> +            done
>> +        has_soft_rdma "$i" || return
>> +        get_ipv4_addr "$i"
>> +    )
>> +}
>> +
>> +operation=${1:-setup}
>> +
>> +if [ "$operation" == "setup" ] || [ "$operation" == "detect" ]; then
>> +    rdma_rxe_setup_detect
>> +else
>> +    echo "Usage: $0 [setup | detect]"
>> +fi
> 
> What happened to the cleanup option? I think I missed some discussion on
> this... We can't expect people to know how to clean this up without any
> hint.

Nothing special, one reason could be to keep it as simple as possible in the beginning.

I'm fine to add it back.


> 
>> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
>> index ba273d10b9a..bf97f4e9325 100644
>> --- a/tests/qtest/migration/precopy-tests.c
>> +++ b/tests/qtest/migration/precopy-tests.c
>> @@ -99,6 +99,66 @@ static void test_precopy_unix_dirty_ring(void)
>>       test_precopy_common(&args);
>>   }
>>   
>> +#ifdef CONFIG_RDMA
>> +
>> +#define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
>> +static int new_rdma_link(char *buffer)
>> +{
>> +    const char *argument = (geteuid() == 0) ? "setup" : "detect";
>> +    char cmd[1024];
>> +
>> +    snprintf(cmd, sizeof(cmd), "%s %s", RDMA_MIGRATION_HELPER, argument);
>> +
>> +    FILE *pipe = popen(cmd, "r");
> 
> This needs to be silenced, otherwise messages from the script will break
> TAP output. I suggest:
> 
>      bool verbose = g_getenv("QTEST_LOG");
> 
>      snprintf(cmd, sizeof(cmd), "%s %s %s", RDMA_MIGRATION_HELPER, argument,
>               verbose ? "" : "2>/dev/null");
> 

It sound good to me, i will update it.


>> +    if (pipe == NULL) {
>> +        perror("Failed to run script");
>> +        return -1;
>> +    }
>> +
>> +    int idx = 0;
>> +    while (fgets(buffer + idx, 128 - idx, pipe) != NULL) {
>> +        idx += strlen(buffer);
>> +    }
>> +
>> +    int status = pclose(pipe);
>> +    if (status == -1) {
>> +        perror("Error reported by pclose()");
>> +        return -1;
>> +    } else if (WIFEXITED(status)) {
>> +        return WEXITSTATUS(status);
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> +static void test_precopy_rdma_plain(void)
>> +{
>> +    char buffer[128] = {};
>> +
>> +    if (new_rdma_link(buffer)) {
>> +        g_test_skip("\nThere is no available rdma link to run RDMA migration test.\n"
>> +                    "To enable the test:\n"
>> +                    "(1) Run \'" RDMA_MIGRATION_HELPER " setup\' with root and rerun the test\n"
>> +                    "or\n"
>> +                    "(2) Run the test with root privilege\n");
> 
> g_test_skip() needs a one-line message, otherwise it breaks TAP
> output. You can turn this into a g_test_message(), put it under
> QTEST_LOG=1 and add a g_test_skip("no rdma link available") below.


Ditto.

Thanks
Zhijian

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-03-03  2:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-26  6:30 [PATCH v4 0/6] migration/rdma: fixes, refactor and cleanup Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 1/6] migration: Prioritize RDMA in ram_save_target_page() Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 2/6] migration: check RDMA and capabilities are compatible on both sides Li Zhijian via
2025-02-26 15:46   ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 3/6] migration: disable RDMA + postcopy-ram Li Zhijian via
2025-02-26  6:30 ` [PATCH v4 4/6] migration/rdma: Remove redundant migration_in_postcopy checks Li Zhijian via
2025-02-26 15:49   ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 5/6] migration: Unfold control_save_page() Li Zhijian via
2025-02-26 15:51   ` Peter Xu
2025-02-27  0:42     ` Zhijian Li (Fujitsu) via
2025-02-27 16:43       ` Peter Xu
2025-02-26  6:30 ` [PATCH v4 6/6] migration: Add qtest for migration over RDMA Li Zhijian via
2025-02-28 13:49   ` Fabiano Rosas
2025-03-03  2:10     ` Zhijian Li (Fujitsu) via

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.